# unsloth-sft-vlm-qwen35-final
LoRA adapter fine-tuned from Qwen3.5-0.8B for visual-language DFK image classification. Trained using the SITA framework https://github.com/aitf-its-tim3-dfk/SITA.
Note: this is the checkpoint from Workshop 2, not the final one; the training pipeline has changed since. It is published so that DFK-2 can begin trialling integration, and we recommend switching to the final checkpoint once it becomes available.
## Model Details

### Model Description
This is a LoRA adapter for Qwen3.5-0.8B, fine-tuned as a Vision-Language Model (VLM) using Unsloth's SFT pipeline. The model is trained to analyze images and classify them for DFK detection tasks in Indonesian.
- Developed by: DFK Tim 3 ITS
- Model type: Vision-Language Model (VLM) — LoRA adapter
- Language(s): Indonesian
- Finetuned from: unsloth/Qwen3.5-0.8B
### Model Sources

- Repository: [SITA](https://github.com/aitf-its-tim3-dfk/SITA)
## Uses

### Direct Use
Image-based content moderation classification. Given an image, the model produces a structured analysis with a classification label.
### Out-of-Scope Use
This model is not intended for general-purpose vision-language tasks. It is specialized for the DFK disinformation detection pipeline.
## Training Details

### Training Data
Custom DFK VLM dataset (dfk_vlm_dataset_v1) with a 90/10 train/eval split, loaded from CSV (images_v2.csv).
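The 90/10 split can be sketched as follows. This is a minimal illustration, not SITA's actual loader: the row fields mirror the prompt template on this card, but the function name, field names, and the synthetic rows standing in for images_v2.csv are assumptions.

```python
import random

def train_eval_split(rows, eval_frac=0.1, seed=3407):
    """Shuffle rows deterministically, then split off an eval fraction."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_eval = max(1, int(len(rows) * eval_frac))
    return rows[n_eval:], rows[:n_eval]

# Synthetic stand-in for rows read from images_v2.csv
# (image path, title, context text, label, analysis).
rows = [
    {"image": f"img_{i}.png", "title": f"t{i}", "text": f"x{i}",
     "label": "spam", "analisis": "..."}
    for i in range(100)
]
train_rows, eval_rows = train_eval_split(rows)
print(len(train_rows), len(eval_rows))  # 90 10
```

Fixing the shuffle seed (3407, matching the training seed above) keeps the split reproducible across runs.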
### Prompt Template

Each sample is formatted as a single user/assistant exchange using the qwen3.5_chatml template:
```text
<|im_start|>user
Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar dari sebuah unggahan media sosial, tentukan label kategori pelanggaran dan berikan analisis detail mengenai pelanggaran yang ditemukan.
Judul: {title}
Konteks: {text}
<image>
<|im_end|>
<|im_start|>assistant
Label: {label}
Analisis: {analisis}
<|im_end|>
```

(The Indonesian instruction translates to: "You are an expert social media content analyst. Given a screenshot of a social media post, determine the violation category label and provide a detailed analysis of the violations found." `Judul` = title, `Konteks` = context.)
The model is trained on responses only (train_on_responses_only: true).
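A sample can be rendered into this layout roughly as below. This is an illustrative sketch only: the exact string produced by Unsloth's qwen3.5_chatml template may differ, and the helper name is an assumption. With `train_on_responses_only`, the loss is computed only on tokens after the `<|im_start|>assistant\n` marker.

```python
INSTRUCTION = (
    "Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan "
    "layar dari sebuah unggahan media sosial, tentukan label kategori pelanggaran "
    "dan berikan analisis detail mengenai pelanggaran yang ditemukan."
)

def format_sample(title, text, label, analisis):
    """Render one training sample in the chatml layout from the card."""
    user = (
        f"<|im_start|>user\n{INSTRUCTION}\n"
        f"Judul: {title}\nKonteks: {text}\n<image>\n<|im_end|>"
    )
    assistant = f"<|im_start|>assistant\nLabel: {label}\nAnalisis: {analisis}\n<|im_end|>"
    return user + "\n" + assistant

sample = format_sample("Contoh judul", "Contoh konteks", "spam",
                       "Unggahan berisi tautan berulang.")
print(sample.count("<|im_start|>"))  # 2: one user turn, one assistant turn
```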
### Training Procedure

Trained with the SITA framework using the following config (configs/vlmconf.yaml):

#### Training Hyperparameters
| Parameter | Value |
|---|---|
| Training regime | fp32 (4-bit quantization disabled) |
| LoRA r | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| LoRA target modules | all-linear |
| Finetune vision layers | true |
| Finetune language layers | true |
| Finetune attention modules | true |
| Finetune MLP modules | true |
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-4 |
| Gradient accumulation steps | 1 |
| Max sequence length | 2048 |
| Optimizer | AdamW 8-bit |
| Gradient checkpointing | unsloth |
| Seed | 3407 |
| Chat template | qwen3.5_chatml |
| Train on responses only | true |
### Trainer

- Trainer: `unsloth_vlm_sft` (Unsloth VLM SFT trainer)
- Instruction part: `<|im_start|>user\n`
- Response part: `<|im_start|>assistant\n`
## Evaluation

- Evaluator: `vlm_gen`
- Max new tokens: 512
- Temperature: 0.0 (greedy decoding)
- BERTScore model: `bert-base-multilingual-cased`
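Since the model emits its answer in the `Label:` / `Analisis:` format from the prompt template, downstream code needs to split the generated text back into a label and an analysis before scoring. A minimal sketch of such a parser, assuming the hypothetical helper name `parse_prediction` and a well-formed generation:

```python
def parse_prediction(generated: str):
    """Extract (label, analysis) from a generated response.

    Returns None for either field if its line is missing.
    """
    label, analisis = None, None
    for line in generated.splitlines():
        line = line.strip()
        if line.startswith("Label:"):
            label = line[len("Label:"):].strip()
        elif line.startswith("Analisis:"):
            analisis = line[len("Analisis:"):].strip()
    return label, analisis

out = "Label: spam\nAnalisis: Unggahan berisi tautan berulang."
label, analysis = parse_prediction(out)
print(label)  # spam
```

The parsed label can then be compared exactly against the gold label, while the free-text analysis is scored with BERTScore using `bert-base-multilingual-cased`.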
### Framework versions

- PEFT 0.19.0