# fedora-copr/granite-4.0-h-small-quantized.w8a8
This is a W8A8 INT8 quantized version of ibm-granite/granite-4.0-h-small.
## Model Details

- **Quantized by:** Jiri Podivin <jpodivin@redhat.com>
- **Architecture:** Granite-4.0 Hybrid MoE (Mamba + Transformer)
- **Quantization:** INT8 weights and activations (W8A8)
- **Engine support:** vLLM (0.6.0+)
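W8A8 means both the weights (W8) and the activations (A8) are stored as 8-bit integers, each paired with a floating-point scale. A minimal, self-contained sketch of symmetric per-tensor int8 quantization (illustrative only; the actual kernels run inside vLLM's compressed-tensors path):

```python
# Symmetric per-tensor int8 quantization: x ≈ scale * q, with q in [-128, 127].
# Illustrative sketch of the numeric scheme behind "W8A8", not vLLM's implementation.

def quantize_int8(values):
    """Map floats to int8 codes plus a single per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from int8 codes."""
    return [scale * v for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each reconstructed value lies within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, approx))
```

The same idea is applied independently to activations at runtime, which is what allows int8 matrix-multiply kernels to be used end to end.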
## Implementation

The quantization was performed using the llm-compressor library.
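llm-compressor drives quantization through a recipe of modifiers. The exact recipe used for this model is not published in this card, so the fragment below is only a hypothetical sketch of a common W8A8 setup (SmoothQuant followed by GPTQ); the modifier names follow llm-compressor's recipe format, but the stage name and parameter values here are assumptions:

```yaml
# Hypothetical W8A8 recipe sketch -- not the recipe actually used for this model.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
    GPTQModifier:
      ignore: ["lm_head"]
      targets: ["Linear"]
      scheme: "W8A8"
```

Recipes like this are passed to llm-compressor's one-shot entry point together with the base model and a small calibration dataset.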
## vLLM Serving

```shell
vllm serve fedora-copr/granite-4.0-h-small-quantized.w8a8 --quantization compressed-tensors
```