Nemotron-3-Nano-30B-A3B BODHI distillation

LoRA fine-tune of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 on the espressovi/BODHI-distillation dataset, with the adapter weights merged back into the base model for standalone use.
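
Because the adapter is merged, the checkpoint should load like any other causal language model, with no separate adapter step. The following is a minimal sketch, assuming the Hugging Face `transformers` and `torch` packages are installed and that the repository id is `Krish2002/Nemotron-3-Nano-30B-A3B-bodhi-distil`. The helper names `load_model` and `generate_once` are hypothetical, and `trust_remote_code=True` is an assumption that may not be required depending on your `transformers` version.

```python
MODEL_ID = "Krish2002/Nemotron-3-Nano-30B-A3B-bodhi-distil"


def load_model(model_id: str = MODEL_ID):
    """Load the merged checkpoint in BF16 (hypothetical helper).

    Heavy imports are kept inside the function so that importing this
    file does not require a GPU or trigger a download.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code is an assumption: hybrid Mamba2/attention/MoE
    # architectures sometimes ship custom modeling code in the repo.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # BF16 matches the base model's dtype
        device_map="auto",           # requires the `accelerate` package
        trust_remote_code=True,
    )
    return tokenizer, model


def generate_once(prompt: str, max_new_tokens: int = 64) -> str:
    """Run a single greedy generation (hypothetical helper)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_once("...")` downloads the full checkpoint on first use, so expect it to need several GPUs or substantial offloading for a model of this size.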

Training

  • Base: NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 (hybrid Mamba2 + attention + MoE)
  • Method: LoRA, r=64, alpha=128, attention projections only (q_proj, k_proj, v_proj, o_proj)
  • Sequence length: 8192
  • Compute: 8× A100 80 GB, ZeRO-2 + gradient checkpointing, 1 epoch (~860 steps)
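
The LoRA settings listed above can be written down as a small config object. This is a sketch under stated assumptions, not the author's training script: the card does not say which library was used, so the code only mirrors the argument names of PEFT's `LoraConfig` (`r`, `lora_alpha`, `target_modules`, `task_type`) in a plain dictionary. The class `LoraHyperparams`, the helpers `to_peft_kwargs` and `adapter_param_count`, and the `task_type` value are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """Hyperparameters as stated in the Training list (hypothetical container)."""
    r: int = 64
    lora_alpha: int = 128
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.r


def to_peft_kwargs(hp: LoraHyperparams) -> dict:
    """Build keyword arguments that could be passed to peft.LoraConfig(**kwargs).

    task_type="CAUSAL_LM" is an assumption; the card does not state it.
    """
    return {
        "r": hp.r,
        "lora_alpha": hp.lora_alpha,
        "target_modules": list(hp.target_modules),
        "task_type": "CAUSAL_LM",
    }


def adapter_param_count(hp: LoraHyperparams, d_in: int, d_out: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    The update is B @ A with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return hp.r * (d_in + d_out)


hp = LoraHyperparams()
print(hp.scaling)          # alpha / r
print(to_peft_kwargs(hp))
```

With r=64 and alpha=128 the update is scaled by a factor of 2. Restricting `target_modules` to the attention projections leaves the Mamba2 and MoE expert weights frozen, which keeps the trainable parameter count small relative to the full model.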

License

Inherits the NVIDIA Open Model License from the base model.

