Why are the vision tower layers not included in the ComfyUI version?
Why are the vision tower layers not included?
Simply because LTX-2 does not use the vision capabilities at all in its pipeline. The ComfyUI version strips out the vision components to save VRAM and disk space, since they are never used. LTX-2 only uses the text-encoding capabilities to generate the embeddings.
Umm, it does. LTX can do image-to-video, and vision can describe an image, which can then be used for image-to-video generation. That's why the Gemma text encoder provided by ComfyUI includes the vision layers. Two recent nodes require them and otherwise throw mismatch errors.
The vision tower layers are not used because LTX-2's architecture doesn't process images through Gemma at all. Image-to-video works by encoding the input image through the VAE and replacing the latent at the first frame. Gemma only ever sees text.
The source code describes the text encoder as "Gemma text encoder implementation with tokenizers, feature extractors, and separate encoders for audio-video and video-only generation." No vision components: https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-core/README.md
The architecture is: Gemma 3 Backbone processes text tokens into embeddings, then a Multi-Layer Feature Extractor aggregates from the decoder layers, then a Text Connector feeds into the DiT. The input is text tokens through decoder layers only. The vision tower is a separate component (SigLIP-based) that is never referenced in this pipeline.
Image conditioning in the pipeline uses "Replacing Latents." It encodes the image via the VAE and replaces the latent at a specific frame. Gemma is never involved: https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/README.md
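To make the "replacing latents" idea concrete, here is a toy sketch in plain Python. It is not the LTX-2 code: `encode_image_stub` is a hypothetical stand-in for a real VAE encoder, latents are nested lists rather than tensors, and all names are made up for illustration. The point is only that the image enters through the latent sequence, so no language model is involved in this step.

```python
# Toy illustration of image conditioning by replacing a latent frame.
# Latents are nested lists here: video_latents[frame][channel].
# encode_image_stub is a hypothetical placeholder, not a real VAE.

def encode_image_stub(pixels, channels=4):
    """Hypothetical 'VAE': squash a flat list of pixel values to a short latent."""
    mean = sum(pixels) / len(pixels)
    return [mean * (c + 1) for c in range(channels)]


def replace_latent_at_frame(video_latents, image_latent, frame_index=0):
    """Return a copy of video_latents with one frame swapped for image_latent."""
    if not 0 <= frame_index < len(video_latents):
        raise IndexError("frame_index is outside the latent sequence")
    if len(image_latent) != len(video_latents[frame_index]):
        raise ValueError("image latent and frame latent have different sizes")
    conditioned = [list(frame) for frame in video_latents]  # leave the input untouched
    conditioned[frame_index] = list(image_latent)
    return conditioned


# Three frames of placeholder latents, four channels each.
noise_latents = [[0.0] * 4 for _ in range(3)]
image_latent = encode_image_stub([0.5, 0.5, 0.5, 0.5])
conditioned = replace_latent_at_frame(noise_latents, image_latent, frame_index=0)
```

Only frame 0 of `conditioned` carries the image latent; the other frames keep their original values, and the text embeddings from Gemma would be supplied to the DiT separately.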
Regarding the mismatch errors you're seeing, this is a known ComfyUI regression, not an LTX-2 issue. A recent ComfyUI update added multimodal/vision support to their Gemma 3 loader for other models, which broke LTX-2 workflows. The new loader reserves/shifts tokens for vision tasks that LTX-2 doesn't use, corrupting spatial alignment and causing shape mismatches. The fix is to roll back the ComfyUI commit or use the LTXAVTextEncoderLoader node instead: https://github.com/Comfy-Org/ComfyUI/issues/11920
The ComfyUI version strips the vision weights intentionally to save VRAM and disk space. The mismatch errors are caused by ComfyUI's loader trying to account for vision layers that LTX-2 never uses.
If you're still hitting issues, can you share which nodes you're using and your workflow so I can try to replicate it? So far I haven't been able to on my system here, although I haven't used LTX2.3 yet.
I was talking about the image-to-text output from the TextGenerate node, and then using that output as the positive text. It has nothing to do with what LTX does.
Ahh ok thanks for that update, it makes sense now. The TextGenerateLTX2Prompt node uses Gemma for prompt enhancement and can optionally take an image input for image analysis/captioning. That part would use Gemma's vision capabilities. This seems to be new in ComfyUI core nodes and was introduced February 20th this year. It wasn't available when I originally made these text encoders. So that explains the confusion I had regarding the issue.
This is separate from LTX-2's actual generation pipeline. LTX-2 itself only uses Gemma as a text encoder to generate embeddings that condition the DiT. The Gemma model packaged for LTX2 in ComfyUI had the vision weights intentionally removed, so it cannot be used for vision capabilities. If you're connecting an image to that node with the stripped-down Gemma, that would explain the mismatch errors. Maybe the newer LTX2 workflows use a Gemma with vision+text capabilities.
If you need vision-based prompt enhancement, you'd need to load the full Gemma 3 12B with vision weights, not the LTX2-specific text-only version.
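If you're not sure which kind of file you have, one way to check is to read the tensor names from the `.safetensors` header, which is a length-prefixed JSON block at the start of the file. The sketch below uses only the standard library. The name fragments in `VISION_MARKERS` are assumptions based on common Gemma 3 checkpoint naming, and a repackaged file may use different names, so adjust them to what you actually see.

```python
import json
import struct


def list_tensor_names(path):
    """Read tensor names from a .safetensors header without loading any weights."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


# Assumed name fragments for vision weights; not guaranteed for every checkpoint.
VISION_MARKERS = ("vision_tower", "vision_model", "multi_modal_projector")


def find_vision_tensors(names, markers=VISION_MARKERS):
    """Return the tensor names that contain any of the marker fragments."""
    return [n for n in names if any(m in n for m in markers)]


# Example (hypothetical path):
# names = list_tensor_names("models/text_encoders/your_gemma_file.safetensors")
# print(len(find_vision_tensors(names)), "vision tensors found")
```

An empty result suggests a text-only file, which would not work for image input in the prompt-enhancement node.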
Since this must be new, judging by the recent comments, I can look at making a new model with vision capabilities. Keep in mind, though, that Gemma itself is rather censored, and given its training data, it didn't learn many taboo subjects. So even without the refusals it still won't know a lot of things. I wrote more about this here: https://huggingface.co/DreamFast/gemma-3-12b-it-heretic/discussions/3
Thanks for the update; it helped make clear what the issue is. I should add a quick note to the README.md about this and this node.
You can apply the Gemma uncensored LoRA to the CLIP with kijai's nodes: use Load LoRA (model and clip) and ONLY APPLY IT TO TextGenerate, so it won't affect the LTX CLIP.
https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/loras/gemma-3-12b-it-abliterated_lora_rank64_bf16.safetensors
https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/loras/gemma-3-12b-it-abliterated_heretic_lora_rank64_bf16.safetensors
Also, NVFP4 is now supported in ComfyUI (files created with the comfy kitchen script), so please upload those too. I am using gemma_3_12B_it_nvfp4_uncalibrated.safetensors. It saves a lot of memory.
Check out version 2, with vision support and NVFP4: https://huggingface.co/DreamFast/gemma-3-12b-it-heretic-v2. I'll leave this thread open so it's easier for others to read.
Tested okay for me here. Let me know how it goes!
