D360-VLM v4.0.0
D360-VLM v4.0.0 is the V4 unified enterprise document intelligence model for extraction, classification, and structured JSON generation from business documents, including noisy and partially degraded scans.
What's New In V4
- Migrates to one canonical
d360_vlmruntime with legacy V3 API/inference wrappers for backward compatibility. - Adds release gates for field extraction F1, document classification accuracy, confidence calibration, and JSON schema validity.
- Introduces a normalized business document schema (
business-doc-v1) and a LLaVA-style image+instruction+JSON supervision manifest. - Ships budget-guarded Modal training + packaging workflow for repeatable enterprise releases.
Models Used
- Primary V4 base model:
HuggingFaceTB/SmolVLM-500M-Instruct - Training objective: Vision-to-JSON instruction tuning (image + prompt -> structured response)
- Runtime surface: Unified
d360_vlmengine + V3 compatibility endpoints
V4 Architecture
- Backbone:
AutoModelForVision2Seqinitialized fromHuggingFaceTB/SmolVLM-500M-Instruct - Processor:
AutoProcessormultimodal tokenization for image + text sequences - Supervision format: LLaVA-style examples (
<image>, business prompt, target JSON response) - Serving layer: Canonical FastAPI app in
d360_vlm.api.appwith V3-compatible routes and wrappers - Quality gates: Extraction/classification metrics, confidence buckets (ECE), schema checks, and release readiness flags
Training Summary (Current V4 Run)
- Run name:
d360_vlm_v4 - Train rows:
unknown - Eval rows:
unknown - Elapsed minutes:
unknown - Projected spend (USD):
unknown(budget:unknown) - Hardware profile: Modal L4 GPU with runtime budget guard
Evaluation Signals
- Release ready:
None - Field F1:
None - Document type accuracy:
None - Schema valid rate:
None
Included In This HF Repo
model/trained model artifactsmodel.pyruntime wrapperd360_vlm_v3_inference.pycompatibility inference APId360_vlm_v3_api.pycompatibility FastAPI entrypointeval_gate_results.jsongate reportmodel_card.jsonstructured release metadata
Intended Use
- Enterprise document workflows: forms, invoices, receipts, IDs, and mixed-layout business documents.
- Structured extraction and classification into downstream JSON contracts.
Limitations
- Performance is dataset-dependent; always validate on your own documents.
- Degraded handwriting and extreme low-resolution scans may reduce quality.
- Human review is recommended for high-risk compliance and financial workflows.
Repository
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for abrarali113/d360_vlm_unified
Base model
HuggingFaceTB/SmolLM2-360M
Quantized
HuggingFaceTB/SmolLM2-360M-Instruct
Quantized
HuggingFaceTB/SmolVLM-500M-Instruct