Nemotron-3-Nano-30B-A3B BODHI distillation
LoRA fine-tune of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 on the espressovi/BODHI-distillation dataset, with the adapter merged back into the base weights for standalone use.
Training
- Base: NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 (hybrid Mamba2 + attention + MoE)
- Method: LoRA, r=64, alpha=128, attention modules only (q/k/v/o_proj)
- Sequence length: 8192
- Compute: 8x A100 80 GB, ZeRO-2 with gradient checkpointing, 1 epoch (~860 steps)
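
As a rough illustration of what the LoRA settings above imply, the sketch below computes the update scaling and shows how an adapter is folded into a base weight. It assumes the standard LoRA formulation (update scaled by alpha / r, merged as W + scaling * B A); the card does not say whether a variant such as rank-stabilized scaling was used. The names `LoraSettings`, `adapter_param_count`, and `merge_weight` are hypothetical helpers, not part of the training code, and the 4096 x 4096 layer shape is a placeholder rather than the model's real projection size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters as listed in the Training section."""
    r: int = 64
    alpha: int = 128
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

    @property
    def scaling(self) -> float:
        # Assumption: standard LoRA, which scales the low-rank update by alpha / r.
        return self.alpha / self.r


def adapter_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (r x d_in) and B has shape (d_out x r).
    """
    return r * d_in + d_out * r


def merge_weight(W, A, B, scaling):
    """Return W + scaling * (B @ A) using plain nested lists.

    W is d_out x d_in, B is d_out x r, A is r x d_in.
    """
    d_out, d_in, r = len(W), len(W[0]), len(A)
    return [
        [
            W[i][j] + scaling * sum(B[i][k] * A[k][j] for k in range(r))
            for j in range(d_in)
        ]
        for i in range(d_out)
    ]


settings = LoraSettings()
print(settings.scaling)  # 2.0

# Hypothetical layer shape, chosen only for illustration.
print(adapter_param_count(d_in=4096, d_out=4096, r=settings.r))  # 524288
```

Once the merge has been applied to every targeted projection, the adapter matrices are no longer needed at inference time, which is what makes the merged checkpoint usable on its own.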
License
Inherits the NVIDIA Open Model License from the base model.