Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated about 16 hours ago • 49
NeMo Gym Collection Collection of RL verifiable data for NeMo Gym • 13 items • Updated about 13 hours ago • 15
Devstral 2 Collection A couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents. • 3 items • Updated 4 days ago • 31
Article Transformers v5: Simple model definitions powering the AI ecosystem • 13 days ago • 234
Mistral Large 3 Collection A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated 11 days ago • 74
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated 11 days ago • 124
💻 Local SmolLMs Collection SmolLM models in MLC, ONNX and GGUF format for local applications + in-browser demos • 14 items • Updated May 5 • 55
SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated May 5 • 82
SmolVLM Collection State-of-the-art compact VLMs for on-device applications: Base, Synthetic, and Instruct. Check our blog: https://huggingface.co/blog/smolvlm • 5 items • Updated May 5 • 41
📚 LLM pretraining datasets Collection A collection of datasets for LLM pretraining • 9 items • Updated May 5 • 15
dLLM & dMLLM Collection (M)LLMs based on discrete diffusion models and related techniques • 16 items • Updated Jul 23 • 2