Nemotron-3-Nano-30B-A3B BODHI distillation

LoRA fine-tune of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 on the espressovi/BODHI-distillation dataset, with the adapter weights merged back into the base model for standalone use.
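Merging a LoRA adapter folds the low-rank update into the base weight, W_merged = W + (alpha / r) * B @ A, so no adapter has to be loaded at inference time. The following is a minimal pure-Python sketch of that arithmetic on toy matrices. The shapes, values, and helper names (`matmul`, `merge_lora`) are illustrative assumptions and are not taken from this model; the card does not say which tool performed the actual merge.

```python
# Toy illustration of a LoRA merge: W_merged = W + (alpha / r) * (B @ A).
# Shapes and values are made up for the example; they are not from the model.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Return the base weight with the scaled low-rank update added."""
    scale = alpha / r
    delta = matmul(b, a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# 2x2 base weight with a rank-1 adapter (r=1) and alpha=2, so scale = 2.0,
# the same alpha / r ratio as the r=64, alpha=128 setting listed under Training.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x in
B = [[0.5], [0.0]]  # out x r
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

After the merge, the adapter matrices A and B can be discarded; the merged weight has the same shape as the base weight.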

Training

Base: NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 (hybrid Mamba2 + attention + MoE)
Method: LoRA, r=64, alpha=128, applied to the attention projections only (q_proj, k_proj, v_proj, o_proj)
Sequence length: 8192
Compute: 8× A100 80 GB, ZeRO-2 + gradient checkpointing, 1 epoch (~860 steps)
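The settings above could be expressed roughly as follows. This is a hedged sketch, not the author's training script: it assumes the PEFT library for LoRA and DeepSpeed for ZeRO-2, and the bf16 setting, the `task_type` value, and the names `LORA_KWARGS`, `DEEPSPEED_ZERO2`, and `build_lora_config` are assumptions introduced for illustration. Only r, alpha, the target modules, the sequence length, and the ZeRO stage come from the card.

```python
import json

# Values stated in the card.
MAX_SEQ_LEN = 8192
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: causal language modeling objective
}

# Effective LoRA scaling factor applied to the low-rank update (alpha / r).
LORA_SCALE = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

# Minimal DeepSpeed config for ZeRO stage 2.
# bf16 is an assumption based on the BF16 base checkpoint.
DEEPSPEED_ZERO2 = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
}


def build_lora_config():
    """Build a peft.LoraConfig from the kwargs above (requires `peft`)."""
    from peft import LoraConfig  # imported lazily so the sketch loads without peft

    return LoraConfig(**LORA_KWARGS)


# Serialize the DeepSpeed config, e.g. to save as a JSON file for the launcher.
ds_json = json.dumps(DEEPSPEED_ZERO2, indent=2)
print(ds_json)
```

Gradient checkpointing is normally enabled on the model or trainer rather than in the DeepSpeed config, so it is left out of the dictionary here.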

License

Inherits the NVIDIA Open Model License from the base model.

Safetensors

Model size: 32B params
Tensor type: F32, BF16

Model tree for Krish2002/Nemotron-3-Nano-30B-A3B-bodhi-distil

Base model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
Finetuned (4): this model