G4-PRISM-PRO — PRISM Dynamic Quantization
Gemma 4 31B-It PRISM-PRO-Dynamic-Quant
- PRISM-PRO: Production model with over-refusal and bias mechanisms removed using the state-of-the-art PRISM pipeline.
- DQ: Per-tensor-class mixed-precision allocation derived entirely from weight-structure sensitivity analysis, not from closed or gated datasets.
Created by Ex0bit
💡 This model is free for active members, or available for purchase to all others here: https://ko-fi.com/s/0daff0e074. Support my research and development efforts; members receive access to the latest PRISM-PRO model drops on day 0.
Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-4-31B-it |
| Architecture | Gemma 4 ISWA (Interleaved Sliding Window Attention) |
| Parameters | 31B dense |
| Quantization | PRISM-PRO-DYNAMIC-QUANT |
| Achieved BPW | 5.02 |
| File Size | ~19 GB |
| Context Length | 262,144 tokens |
| Modalities | Text, Image, Video, Audio |
| Creator | Ex0bit |
Supported Modalities
- Text: Full instruction-following and chat
- Image: Vision understanding via SigLIP encoder (280 tokens/image)
- Video: Gemma4VideoProcessor (32 frames, pooled)
- Audio: 40 ms per token, 750-token sequence length
PRISM-DQ Quantization
This model uses PRISM-PRO Dynamic Quantization — a proprietary per-tensor-class mixed-precision allocation that assigns different quantization types to different tensor classes based on 7 structural weight metrics.
Unlike uniform quantization (Q4_K_M, Q5_K_M), PRISM-DQ analyzes each tensor class's sensitivity to quantization error and allocates precision where it matters most. Attention projections receive higher precision than FFN layers, with block-level overrides that protect early layers (high downstream error propagation).
No calibration data, no importance matrices, no training data required.
The result: BF16-equivalent quality at 5.02 bits per weight, roughly a 68% size reduction with no measurable quality loss.
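The exact metrics and thresholds behind PRISM-DQ are not published. The sketch below is purely illustrative (all metric choices, weightings, cutoffs, and quant-type names are hypothetical): it shows the general idea of assigning a quantization type per tensor class from weight statistics alone, with a block-level override protecting early layers, and no calibration data.

```python
import numpy as np

# Hypothetical sketch of per-tensor-class precision allocation.
# Metrics, weights, and thresholds are illustrative, NOT the PRISM-DQ pipeline.

def sensitivity_score(w: np.ndarray) -> float:
    """Crude structural sensitivity proxy combining dynamic range,
    kurtosis (outlier-heaviness), and effective rank of the matrix."""
    flat = w.ravel()
    spread = np.ptp(flat) / (np.std(flat) + 1e-8)        # outlier spread
    z = (flat - flat.mean()) / (flat.std() + 1e-8)
    kurt = np.mean(z ** 4)                               # tail heaviness
    s = np.linalg.svd(w, compute_uv=False)
    p = s / s.sum()
    eff_rank = np.exp(-(p * np.log(p + 1e-12)).sum())    # spectral entropy
    return 0.4 * np.log1p(spread) + 0.4 * np.log1p(kurt) + 0.2 * np.log1p(eff_rank)

def pick_qtype(score: float, layer_idx: int, n_layers: int) -> str:
    # Block-level override: protect early layers, where quantization
    # error propagates through all downstream computation.
    if layer_idx < n_layers // 8:
        score += 1.0
    if score > 4.0:
        return "Q8_0"
    if score > 3.0:
        return "Q6_K"
    return "Q4_K"

# Example: heavy-tailed attention projections tend to score higher
# (and thus get more bits) than well-behaved FFN tensors.
gen = np.random.default_rng(0)
tensors = {
    "blk.0.attn_q.weight": gen.standard_t(df=3, size=(256, 256)),
    "blk.20.ffn_up.weight": gen.normal(size=(256, 256)),
}
for name, w in tensors.items():
    layer = int(name.split(".")[1])
    print(name, "->", pick_qtype(sensitivity_score(w), layer, n_layers=48))
```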
Usage
llama.cpp
```bash
llama-server --model Gemma4-PRISM-PRO-DQ-GGUF/gemma4-prism-pro-dq.gguf \
  --port 8080 -ngl 99
```
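Once running, llama-server exposes an OpenAI-compatible endpoint. A quick smoke test against the server started above (the prompt is just an example):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize PRISM-DQ in one sentence."}],
    "temperature": 0.7
  }'
```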
LM Studio
Download the GGUF file and load it in LM Studio. The model will be detected as Gemma 4 31B.
Ollama
```bash
ollama create g4-prism -f Modelfile
```
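The command above expects a Modelfile next to the GGUF. A minimal sketch (the path and parameter values are illustrative):

```
FROM ./gemma4-prism-pro-dq.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```

Then run the model with `ollama run g4-prism`.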
Refusal & Bias Removal
This model has been processed to remove bias, over-refusals, and propaganda from the base google/gemma-4-31B-it using the state-of-the-art PRISM pipeline.
License
Apache 2.0 (inherited from google/gemma-4-31B-it)
Credits
- Creator: Ex0bit
- Base model: Google DeepMind
- Quantization engine: PRISM-DQ by Ex0bit