G4-PRISM-PRO — PRISM Dynamic Quantization
Gemma 4 31B-It PRISM-PRO-Dynamic-Quant
- PRISM-PRO: Production model with over-refusal and bias mechanisms removed using the state-of-the-art PRISM pipeline.
- DQ: Per-tensor-class mixed-precision allocation derived entirely from weight-structure sensitivity analysis, not from closed or gated datasets.
Created by Ex0bit
💡 This model is free for active members, or available for purchase to all others here: https://ko-fi.com/s/0daff0e074. Support my research and development efforts; members receive access to the latest PRISM-PRO model drops on day 0.
Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-4-31B-it |
| Architecture | Gemma 4 ISWA (Interleaved Sliding Window Attention) |
| Parameters | 31B dense |
| Quantization | PRISM-PRO-DYNAMIC-QUANT |
| Achieved BPW | 5.02 |
| File Size | ~19 GB |
| Context Length | 262,144 tokens |
| Modalities | Text, Image, Video, Audio |
| Creator | Ex0bit |
Supported Modalities
- Text: Full instruction-following and chat
- Image: Vision understanding via SigLIP encoder (280 tokens/image)
- Video: Gemma4VideoProcessor (32 frames, pooled)
- Audio: 40 ms per token, 750-token sequence length
PRISM-DQ Quantization
This model uses PRISM-PRO Dynamic Quantization — a proprietary per-tensor-class mixed-precision allocation that assigns different quantization types to different tensor classes based on 7 structural weight metrics.
Unlike uniform quantization (Q4_K_M, Q5_K_M), PRISM-DQ analyzes each tensor class's sensitivity to quantization error and allocates precision where it matters most. Attention projections receive higher precision than FFN layers, with block-level overrides that protect early layers (high downstream error propagation).
No calibration data, no importance matrices, no training data required.
The result: BF16-equivalent quality at 5.02 bits per weight, roughly a 68% size reduction with no measurable quality loss.
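The exact metrics and thresholds behind PRISM-DQ are not published. The sketch below is purely illustrative (all metric choices, weightings, cutoffs, and quant-type names are hypothetical): it shows the general idea of assigning a quantization type per tensor class from weight statistics alone, with a block-level override protecting early layers, and no calibration data.

```python
import numpy as np

# Hypothetical sketch of per-tensor-class precision allocation.
# Metrics, weights, and thresholds are illustrative, NOT the PRISM-DQ pipeline.

def sensitivity_score(w: np.ndarray) -> float:
    """Crude structural sensitivity proxy combining dynamic range,
    kurtosis (outlier-heaviness), and effective rank of the matrix."""
    flat = w.ravel()
    spread = np.ptp(flat) / (np.std(flat) + 1e-8)        # outlier spread
    z = (flat - flat.mean()) / (flat.std() + 1e-8)
    kurt = np.mean(z ** 4)                               # tail heaviness
    s = np.linalg.svd(w, compute_uv=False)
    p = s / s.sum()
    eff_rank = np.exp(-(p * np.log(p + 1e-12)).sum())    # spectral entropy
    return 0.4 * np.log1p(spread) + 0.4 * np.log1p(kurt) + 0.2 * np.log1p(eff_rank)

def pick_qtype(score: float, layer_idx: int, n_layers: int) -> str:
    # Block-level override: protect early layers, where quantization
    # error propagates through all downstream computation.
    if layer_idx < n_layers // 8:
        score += 1.0
    if score > 4.0:
        return "Q8_0"
    if score > 3.0:
        return "Q6_K"
    return "Q4_K"

# Example: heavy-tailed attention projections tend to score higher
# (and thus get more bits) than well-behaved FFN tensors.
gen = np.random.default_rng(0)
tensors = {
    "blk.0.attn_q.weight": gen.standard_t(df=3, size=(256, 256)),
    "blk.20.ffn_up.weight": gen.normal(size=(256, 256)),
}
for name, w in tensors.items():
    layer = int(name.split(".")[1])
    print(name, "->", pick_qtype(sensitivity_score(w), layer, n_layers=48))
```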
Usage
llama.cpp
```bash
llama-server --model Gemma4-PRISM-PRO-DQ-GGUF/gemma4-prism-pro-dq.gguf \
  --port 8080 -ngl 99
```
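Once running, llama-server exposes an OpenAI-compatible endpoint. A quick smoke test against the server started above (the prompt is just an example):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize PRISM-DQ in one sentence."}],
    "temperature": 0.7
  }'
```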
LM Studio
Download the GGUF file and load it in LM Studio. The model will be detected as Gemma 4 31B.
Ollama
```bash
ollama create g4-prism -f Modelfile
```
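The command above expects a Modelfile next to the GGUF. A minimal sketch (the path and parameter values are illustrative):

```
FROM ./gemma4-prism-pro-dq.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```

Then run the model with `ollama run g4-prism`.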
Refusal & Bias Removal
This model has been processed to remove bias, over-refusals, and propaganda from the base google/gemma-4-31B-it using the state-of-the-art PRISM pipeline.
License
Apache 2.0 (inherited from google/gemma-4-31B-it)
Credits
- Creator: Ex0bit
- Base model: Google DeepMind
- Quantization engine: PRISM-DQ by Ex0bit