---
base_model: Tesslate/OmniCoder-9B
tags:
- llama-cpp
- gguf
- qwen3.5
- omnicoder
- tesslate
- code
- agent
license: apache-2.0
---

# OmniCoder-9B-GGUF
### GGUF quantizations of [OmniCoder-9B](https://huggingface.co/Tesslate/OmniCoder-9B)
[](https://opensource.org/licenses/Apache-2.0)
[](https://huggingface.co/Tesslate/OmniCoder-9B)
---
## Available Quantizations
| Quantization | Size | Use Case |
|:---|---:|:---|
| `Q2_K` | ~3.8 GB | Extreme compression, lowest quality |
| `Q3_K_S` | ~4.3 GB | Small footprint |
| `Q3_K_M` | ~4.6 GB | Small footprint, balanced |
| `Q3_K_L` | ~4.9 GB | Small footprint, higher quality |
| `Q4_0` | ~5.3 GB | Good balance |
| `Q4_K_S` | ~5.4 GB | Good balance |
| **`Q4_K_M`** | **~5.7 GB** | **Recommended for most users** |
| `Q5_0` | ~6.3 GB | High quality |
| `Q5_K_S` | ~6.3 GB | High quality |
| `Q5_K_M` | ~6.5 GB | High quality, balanced |
| `Q6_K` | ~7.4 GB | Near-lossless |
| `Q8_0` | ~9.5 GB | Highest quality quantization |
| `BF16` | ~17.9 GB | Full precision |
## Usage
```bash
# Install llama.cpp
brew install llama.cpp # macOS
# or build from source: https://github.com/ggml-org/llama.cpp
# Interactive chat
llama-cli --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -p "Your prompt" -c 8192
# Server mode (OpenAI-compatible API)
llama-server --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -c 8192
```
---
**Built by [Tesslate](https://tesslate.com)** | See full model card: [OmniCoder-9B](https://huggingface.co/Tesslate/OmniCoder-9B)