
G4-PRISM-PRO — PRISM Dynamic Quantization

Gemma 4 31B-It PRISM-PRO-Dynamic-Quant

  • PRISM-PRO: Production model with over-refusal and bias mechanisms fully removed using the state-of-the-art PRISM pipeline.
  • DQ: Per-tensor-class mixed-precision allocation derived entirely from weight-structure sensitivity analysis, not closed or gated datasets.

Created by Ex0bit


💡 This model is free for active members and available for purchase to everyone else here: https://ko-fi.com/s/0daff0e074. Support my research & development efforts. Members receive Day-0 access to the latest PRISM-PRO model drops.


Model Details

| Property | Value |
|---|---|
| Base Model | google/gemma-4-31B-it |
| Architecture | Gemma 4 ISWA (Interleaved Sliding Window Attention) |
| Parameters | 31B dense |
| Quantization | PRISM-PRO-DYNAMIC-QUANT |
| Achieved BPW | 5.02 |
| File Size | ~19 GB |
| Context Length | 262,144 tokens |
| Modalities | Text, Image, Video, Audio |
| Creator | Ex0bit |

Supported Modalities

  • Text: Full instruction-following and chat
  • Image: Vision understanding via SigLIP encoder (280 tokens/image)
  • Video: Gemma4VideoProcessor (32 frames, pooled)
  • Audio: 40ms per token, 750 token sequence length
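The per-modality token costs above can be combined into a quick context-budget estimate. This is an illustrative helper, not part of any official API; the function name and example inputs are assumptions:

```python
# Estimate how much of the 262,144-token context a multimodal prompt
# consumes, using the per-modality costs listed above.
CONTEXT_LENGTH = 262_144
TOKENS_PER_IMAGE = 280       # SigLIP encoder output per image
AUDIO_MS_PER_TOKEN = 40      # 40 ms of audio per token
AUDIO_MAX_TOKENS = 750       # maximum audio sequence length (~30 s of audio)

def multimodal_budget(n_images: int, audio_seconds: float, text_tokens: int) -> int:
    """Return remaining context tokens after accounting for media inputs."""
    audio_tokens = min(int(audio_seconds * 1000 / AUDIO_MS_PER_TOKEN), AUDIO_MAX_TOKENS)
    used = n_images * TOKENS_PER_IMAGE + audio_tokens + text_tokens
    return CONTEXT_LENGTH - used

# Example: 2 images, 10 s of audio, 1,000 text tokens
remaining = multimodal_budget(2, 10.0, 1000)
```

Note that the 750-token audio cap works out to 750 × 40 ms = 30 seconds of audio per clip.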

PRISM-DQ Quantization

This model uses PRISM-PRO Dynamic Quantization — a proprietary per-tensor-class mixed-precision allocation that assigns different quantization types to different tensor classes based on 7 structural weight metrics.

Unlike uniform quantization (Q4_K_M, Q5_K_M), PRISM-DQ analyzes each tensor class's sensitivity to quantization error and allocates precision where it matters most. Attention projections receive higher precision than FFN layers, with block-level overrides that protect early layers (high downstream error propagation).
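The allocation scheme described above can be sketched as a lookup by tensor class with a block-level override. The class names, quant types, cutoff, and thresholds below are assumptions for illustration only; the actual PRISM-DQ metrics are proprietary:

```python
# Illustrative per-tensor-class mixed-precision allocation (not the real
# PRISM-DQ table): attention projections get higher precision than FFN
# layers, and early blocks are protected with a near-lossless override.
EARLY_LAYER_CUTOFF = 8  # hypothetical cutoff for "early" blocks

BASE_ALLOCATION = {
    "attn_q": "Q6_K",      # attention projections: higher precision
    "attn_k": "Q6_K",
    "attn_v": "Q6_K",
    "attn_output": "Q6_K",
    "ffn_gate": "Q4_K",    # FFN layers: lower precision
    "ffn_up": "Q4_K",
    "ffn_down": "Q5_K",
}

def quant_type(tensor_class: str, block_index: int) -> str:
    """Pick a quant type by tensor class, with a block-level override."""
    if block_index < EARLY_LAYER_CUTOFF:
        return "Q8_0"  # early-layer override: high downstream error propagation
    return BASE_ALLOCATION[tensor_class]
```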

No calibration data, no importance matrices, no training data required.

The result: BF16-equivalent quality at 5.02 bits-per-weight — a 68% size reduction with zero measurable quality loss.
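The size and reduction figures follow directly from the bits-per-weight number; a quick sanity check:

```python
# Sanity-check the quoted figures: 31B params at 5.02 bpw.
params = 31e9
bpw = 5.02
size_gb = params * bpw / 8 / 1e9  # bits -> bytes -> GB; ~19.5, matching "~19 GB"
reduction = 1 - bpw / 16          # vs. BF16 at 16 bits per weight; ~68.6%
```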

Usage

llama.cpp

```shell
llama-server --model Gemma4-PRISM-PRO-DQ-GGUF/gemma4-prism-pro-dq.gguf \
  --port 8080 -ngl 99
```
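Once llama-server is running, it exposes an OpenAI-compatible chat endpoint on the chosen port. A minimal sketch of building a request (the prompt text is an example; the send step is commented out since it requires the running server):

```python
import json

# Build an OpenAI-style chat request for llama-server's
# /v1/chat/completions endpoint.
payload = {
    "messages": [
        {"role": "user", "content": "Summarize the PRISM-DQ approach in one sentence."}
    ],
    "max_tokens": 128,
}
body = json.dumps(payload)

# To actually send it (requires the server started as above):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8080/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```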

LM Studio

Download the GGUF file and load it in LM Studio. The model will be detected as Gemma 4 31B.

Ollama

```shell
ollama create g4-prism -f Modelfile
```
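The command above expects a Modelfile alongside the GGUF. A minimal sketch (the filename and context setting are assumptions; adjust the path to where you downloaded the GGUF):

```
# Hypothetical Modelfile
FROM ./gemma4-prism-pro-dq.gguf
PARAMETER num_ctx 262144
```

After creating it, run the model with `ollama run g4-prism`.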

Refusal & Bias Removal

This model has been treated to remove bias, over-refusals, and propaganda from the base google/gemma-4-31B-it using the state-of-the-art PRISM pipeline.

License

Apache 2.0 (inherited from google/gemma-4-31B-it)
