# Gemma 4 Sparse Autoencoder (SAE)
First open-source Sparse Autoencoder trained on Gemma 4 26B activations.
Built for the ErnOSAgent neural interpretability pipeline.
## Architecture
| Parameter | Value |
|---|---|
| Features | 131,072 |
| Model Dimension | 2,816 |
| Expansion Factor | 46.5× |
| Format | SafeTensors |
| Source Model | Gemma 4 26B IT (Q4_K_M) |
| Training Hardware | Apple M3 Ultra (512GB RAM) |
| Extraction Layer | Last-layer residual stream |
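The expansion factor in the table follows directly from the feature count and model dimension:

```python
# Expansion factor = number of SAE features / residual-stream dimension
n_features = 131_072
d_model = 2_816
print(f"{n_features / d_model:.1f}x")  # prints 46.5x
```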
## Files

- `gemma4_sae_1m.safetensors`: SAE encoder/decoder weights (2.8GB)
- `feature_map.json`: 195 labeled features via automated probing
## Usage

### With ErnOSAgent (Rust)

```bash
# Place weights in the data directory
mkdir -p ~/.ernosagent/sae_training/

# Download weights
huggingface-cli download MettaMazza/gemma4-sae gemma4_sae_1m.safetensors --local-dir ~/.ernosagent/sae_training/

# Download feature map
huggingface-cli download MettaMazza/gemma4-sae feature_map.json --local-dir ~/.ernosagent/sae_training/

# Run ErnOSAgent; the SAE loads automatically
cd ~/Desktop/ErnOSAgent && cargo run --release
```
### With Python

```python
from safetensors import safe_open
import numpy as np

with safe_open("gemma4_sae_1m.safetensors", framework="numpy") as f:
    encoder = f.get_tensor("encoder.weight")  # [131072, 2816]
    decoder = f.get_tensor("decoder.weight")  # [2816, 131072]
    bias = f.get_tensor("encoder.bias")       # [131072]

# Encode activations -> sparse features
activations = np.random.randn(2816).astype(np.float32)  # from the model
features = np.maximum(0, encoder @ activations + bias)  # ReLU

# Top-k active features
top_k = np.argsort(features)[-20:][::-1]
for idx in top_k:
    if features[idx] > 0:
        print(f"Feature {idx}: {features[idx]:.3f}")
```
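The decoder maps sparse features back to the residual-stream space. A minimal reconstruction sketch, using toy dimensions and random stand-in weights (the real checkpoint uses `d_model=2816`, `n_features=131072`, `k=64`):

```python
import numpy as np

# Toy dimensions for illustration; load the real weights from
# gemma4_sae_1m.safetensors as shown above.
d_model, n_features, k = 8, 32, 4
rng = np.random.default_rng(0)
encoder = rng.standard_normal((n_features, d_model)).astype(np.float32)
decoder = rng.standard_normal((d_model, n_features)).astype(np.float32)
bias = np.zeros(n_features, dtype=np.float32)

activations = rng.standard_normal(d_model).astype(np.float32)

# Encode, keep only the k largest features (TopK sparsity), decode.
features = np.maximum(0, encoder @ activations + bias)
sparse = np.zeros_like(features)
top = np.argpartition(features, -k)[-k:]
sparse[top] = features[top]
reconstruction = decoder @ sparse  # back in the d_model-dim residual space
```

The reconstruction error between `reconstruction` and `activations` is the quantity the SAE was trained to minimize.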
## Feature Map

`feature_map.json` maps 195 human-interpretable labels to SAE feature indices, assigned via automated probing. Categories include:
- Reasoning: Chain-of-thought, logical deduction, mathematical reasoning
- Safety: Refusal, deception detection, bias detection, power-seeking
- Cognitive: Creativity, recall, planning, context integration
- Emotional: Valence, arousal, emotional tone detection
- Technical: Code generation, technical depth, language detection
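A minimal sketch of looking up labels for active features. The schema shown here (a flat mapping from feature index to label) is an assumption; inspect the actual `feature_map.json` before relying on it.

```python
import json

# Hypothetical schema: feature index (as a string key) -> label.
# The real feature_map.json layout may differ; check the file first.
feature_map = json.loads('{"4096": "refusal", "8192": "chain-of-thought"}')

def label_for(idx: int, fmap: dict) -> str:
    """Return the probed label for an SAE feature index, if any."""
    return fmap.get(str(idx), f"feature {idx} (unlabeled)")

print(label_for(4096, feature_map))  # prints: refusal
print(label_for(7, feature_map))     # prints: feature 7 (unlabeled)
```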
## Training

Trained using ErnOSAgent's native SAE training pipeline (`cargo run -- --train-sae`):

- Activation Collection: Extract 2,816-dim residual-stream vectors from Gemma 4 26B via llama.cpp's native `/embedding` endpoint
- Training: TopK sparse autoencoder with gradient descent (k=64, LR=3e-4)
- Probing: Automated feature labeling via targeted prompt pairs
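The TopK training objective can be sketched as a single gradient step in numpy, with toy dimensions standing in for the real ones (the actual pipeline is Rust; real hyperparameters are `d_model=2816`, `n_features=131072`, `k=64`, `lr=3e-4`):

```python
import numpy as np

# Toy TopK-SAE training step; gradients are written out by hand.
d, f, k, lr = 16, 64, 4, 3e-4
rng = np.random.default_rng(0)
W_enc = (rng.standard_normal((f, d)) * 0.1).astype(np.float32)
W_dec = W_enc.T.copy()  # decoder initialized as encoder transpose (common choice)
b = np.zeros(f, dtype=np.float32)

x = rng.standard_normal(d).astype(np.float32)  # one activation vector

# Forward: encode, keep the k largest pre-activations, decode.
pre = W_enc @ x + b
z = np.zeros_like(pre)
top = np.argpartition(pre, -k)[-k:]
z[top] = np.maximum(pre[top], 0.0)
x_hat = W_dec @ z
err = x_hat - x
loss = float(err @ err)  # squared reconstruction error

# Backward: gradients flow only through the active (top-k, positive) features.
g_z = W_dec.T @ (2.0 * err)
W_dec -= lr * np.outer(2.0 * err, z)
active = z > 0
W_enc[active] -= lr * np.outer(g_z[active], x)
b[active] -= lr * g_z[active]
```

Looping this step over the collected activation vectors gives the basic training procedure; the real pipeline adds batching and bookkeeping not shown here.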
## License

MIT, same as ErnOSAgent.
## Citation

```bibtex
@misc{mettamazza2026gemma4sae,
  title={Gemma 4 Sparse Autoencoder for Neural Interpretability},
  author={MettaMazza},
  year={2026},
  url={https://huggingface.co/MettaMazza/gemma4-sae}
}
```