# fedora-copr/granite-4.0-h-small-quantized.w8a8
This is a W8A8 INT8 quantized version of ibm-granite/granite-4.0-h-small.
## Model Details

- **Quantized by:** Jiri Podivin <jpodivin@redhat.com>
- **Architecture:** Granite-4.0 Hybrid MoE (Mamba + Transformer)
- **Quantization:** INT8 weights and activations (W8A8)
- **Engine support:** vLLM (0.6.0+)
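W8A8 means both the weights (W8) and the activations (A8) are stored as 8-bit integers, each paired with a floating-point scale. A minimal, self-contained sketch of symmetric per-tensor int8 quantization (illustrative only; the actual kernels run inside vLLM's compressed-tensors path):

```python
# Symmetric per-tensor int8 quantization: x ≈ scale * q, with q in [-128, 127].
# Illustrative sketch of the numeric scheme behind "W8A8", not vLLM's implementation.

def quantize_int8(values):
    """Map floats to int8 codes plus a single per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from int8 codes."""
    return [scale * v for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each reconstructed value lies within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, approx))
```

The same idea is applied independently to activations at runtime, which is what allows int8 matrix-multiply kernels to be used end to end.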
## Implementation

The quantization was performed using the llm-compressor library.
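llm-compressor drives quantization through a recipe of modifiers. The exact recipe used for this model is not published in this card, so the fragment below is only a hypothetical sketch of a common W8A8 setup (SmoothQuant followed by GPTQ); the modifier names follow llm-compressor's recipe format, but the stage name and parameter values here are assumptions:

```yaml
# Hypothetical W8A8 recipe sketch -- not the recipe actually used for this model.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
    GPTQModifier:
      ignore: ["lm_head"]
      targets: ["Linear"]
      scheme: "W8A8"
```

Recipes like this are passed to llm-compressor's one-shot entry point together with the base model and a small calibration dataset.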
## vLLM Serving

```shell
vllm serve fedora-copr/granite-4.0-h-small-quantized.w8a8 --quantization compressed-tensors
```