Instructions to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jackrong/Negentropy-claude-opus-4.7-4B-GGUF",
    filename="Negentropy-claude-opus-4.7-4B-BF16.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Negentropy-claude-opus-4.7-4B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jackrong/Negentropy-claude-opus-4.7-4B-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
- Ollama
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with Ollama:
ollama run hf.co/Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
- Unsloth Studio
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Negentropy-claude-opus-4.7-4B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Negentropy-claude-opus-4.7-4B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Negentropy-claude-opus-4.7-4B-GGUF to start chatting
- Pi
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with Docker Model Runner:
docker model run hf.co/Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
- Lemonade
How to use Jackrong/Negentropy-claude-opus-4.7-4B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Negentropy-claude-opus-4.7-4B-GGUF-Q4_K_M
List all available models
lemonade list
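Once `llama-server` from the llama.cpp instructions above is running, it can also be called from Python. The following is a minimal sketch using only the standard library. It assumes the server listens on `http://localhost:8080/v1` (the base URL used in the Pi configuration above) and returns the usual OpenAI-style response shape; the helper names `build_chat_request` and `chat` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at the same base URL as in the
# Pi configuration above.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "Jackrong/Negentropy-claude-opus-4.7-4B-GGUF:Q4_K_M"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # Assumes the OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Prove that there is no largest prime number.")` should return the model's reply as a string, including any `<think>...</think>` block the server passes through.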
🌀 Negentropy-claude-opus-4.7-4B: A Reasoning Experimental Model Based on Trace Inversion
💡 Abstract
Based on current public information, commercial models such as OpenAI's GPT series and Anthropic's Claude series hide their true internal reasoning chains. What we see through APIs or frontend interfaces is often just "Reasoning Bubbles": highly compressed summaries of the original, much longer reasoning content. For small models that aim to improve through data distillation, these compressed chains do not provide enough step-level learning signal. Worse, because the logical leaps are large and the intermediate derivations are missing, forcing a small model to learn these summaries directly often leaves it confused and unable to acquire real reasoning ability.
🚀 Negentropy-claude-opus-4.7-4B is a 4B-level lightweight reasoning-enhanced model. Its name is derived from Negentropy in information theory, symbolizing the reconstruction of highly ordered, logically rigorous reasoning chains from compressed and fragmented information. It also draws inspiration from the movie ♻️Tenet (which I recently watched but didn't quite understand😄).
✨ Unlike other distillation models in the community that perform simple output distillation from closed-source commercial models, this model employs the cutting-edge "Trace Inversion" technique: using the Trace-Inverter-4B model to restore the compressed "Reasoning Bubbles" leaked by Claude-Opus-4.7 in public interfaces into full Chains-of-Thought (CoT), using them as high-value supervisory signals for fine-tuning.
Core: This model validates that we can recover logical details with strong supervisory power through reverse engineering. This allows a 4B-level model to achieve a sense of reasoning progression and logical rigor close to top-tier commercial models.
📖 1. Negentropy Theory: Why Do We Need "Inversion"? (Introduction)
In the current model distillation paradigm, we face a massive "Information Entropy" trap:
- Commercial Black Boxes: Models like Claude-Opus or GPT-5.5 often provide only the final answer (Answer-only) or extremely compressed summaries (Reasoning Bubbles) via API.
- Information Loss: This compression process significantly increases the data's "entropy," causing small models to learn only "imitated conclusions" rather than "reasoning logic" due to the lack of explanations for intermediate logical leaps.
The core idea of Negentropy-claude-opus-4.7-4B is to expand these compressed signals back into detailed, coherent reasoning trajectories through a "Negentropy" (disorder-reducing) process. We believe that the essence of reasoning lies in the process, not the result. Thinking chains completed through inversion provide small models with higher-quality "logical gradients."
🚀 2. Model Positioning
This model aims to bridge the gap between "lightweight" and "deep reasoning." It retains the extremely high inference speed and low memory footprint of a 4B model while exhibiting a distinct "Opus style" in its logical expression.
It is best suited for:
- 🧩 Complex Logical Orchestration: Tasks requiring multi-step reasoning to reach a conclusion.
- 🧪 High-Quality Synthetic Data Generation: Serving as a teacher model to generate initial reasoning chains.
- 💻 Localized Inference Engine: Implementing "thinking" processes similar to large models on edge devices.
- 🔄 Reasoning Alignment Research: Studying how small models inherit the thinking patterns of large models.
🧪 3. The Negentropy Pipeline: Data Construction and Training
❤️ This project was completed alongside Kyle Hessling. Special thanks to him for providing the compute cluster support that made large-scale inversion data fine-tuning possible. You can find him on X here: 🔗 @KyleHessling1
The training data for this model went through a "three-stage purification" process:
🛠 Stage 1: Trace Capture
We collected and cleaned datasets from the community: lordx64/reasoning-distill-claude-opus-4-7-max and Roman1111111/claude-opus-4.6-10000x. These include questions, final answers, and accompanying compressed reasoning summaries from Claude-Opus-4.6/4.7. At this point, the logical chains in the data are broken.
🛠 Stage 2: Trace Inversion
Using a specially trained Jackrong/Trace-Inverter-4B model as a "decoder," we took the broken summaries from Stage 1 as a skeleton and combined them with the question and answer to reverse-generate the full reasoning trace (Full_trace).
- Input: Problem + Claude-4.7 Answer + Compressed Bubbles
- Output: Full Reconstructed Reasoning Trace (Negentropy Trace)
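The Stage 2 input described above can be sketched as a small prompt builder. This is an illustration only: the section labels and the closing instruction are hypothetical, since the actual Trace-Inverter-4B prompt template is not documented in this card, and `InversionExample` and `build_inversion_prompt` are names introduced here.

```python
from dataclasses import dataclass


@dataclass
class InversionExample:
    problem: str   # original question
    answer: str    # final answer from the source dataset
    bubbles: list  # compressed reasoning summaries ("Reasoning Bubbles")


def build_inversion_prompt(example):
    """Assemble the three Stage 2 inputs into one prompt string.

    The section labels are hypothetical; the real inverter template
    may differ.
    """
    # Drop empty summaries and render the rest as a bullet skeleton.
    bubble_lines = "\n".join(
        f"- {b.strip()}" for b in example.bubbles if b.strip()
    )
    return (
        f"Problem:\n{example.problem.strip()}\n\n"
        f"Final answer:\n{example.answer.strip()}\n\n"
        f"Compressed reasoning summary:\n{bubble_lines}\n\n"
        "Reconstruct the full step-by-step reasoning trace."
    )
```

The inverter's completion for such a prompt would then play the role of the reconstructed trace that Stage 3 trains on.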
🛠 Stage 3: SFT
Using the Unsloth framework for deep fine-tuning, the base model (Qwen3.5-4B) learned these inverted thinking chains, which possess a "strong model logic feel."
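One plausible way to turn an inverted trace into a supervised fine-tuning record is to wrap the trace in `<think>` tags and append the final answer. The sketch below assumes a generic chat-style `messages` layout; the record format actually used for this model is not stated in the card, and `format_sft_example` is a hypothetical helper.

```python
def format_sft_example(problem, full_trace, answer):
    """Return a chat-style training record with the trace in <think> tags.

    Assumption: a generic {"messages": [...]} layout, not the author's
    confirmed training format.
    """
    assistant_text = (
        f"<think>\n{full_trace.strip()}\n</think>\n{answer.strip()}"
    )
    return {
        "messages": [
            {"role": "user", "content": problem.strip()},
            {"role": "assistant", "content": assistant_text},
        ]
    }
```

A record in this shape can be rendered by a tokenizer's chat template before training, so the model sees the reasoning and the answer as one assistant turn.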
Why 4B? The 4B model has shown surprisingly good cost-performance in handling inversion tasks. It absorbs logical templates quickly and keeps very high token output speed during inference, which makes it well suited to high-frequency reasoning tasks. More importantly, relatively little public data from commercial API models is currently available; that amount is not enough to train a much larger model, which would likely underfit.
✨ 4. Key Capabilities
- 🌀 Deep Logical Expansion: When facing complex problems, the model no longer jumps to conclusions but presents layered thinking steps similar to the Claude style.
- 🎯 Answer Consistency: Because the training data originates from Claude-Opus inversions, the model's answers are more accurate and stable while its reasoning stays logically coherent.
- ⚡ Ultra-Fast Response: As a 4B model, it can achieve response speeds hundreds of times faster than commercial APIs on consumer-grade GPUs like RTX 4090/5090.
- 🏗 Structured Thinking: Native support for <think>...</think> tags, with clearly structured output logic that is easy for downstream programs to parse.
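As an illustration of how a downstream program might parse the `<think>` tags, here is a minimal sketch that separates the reasoning from the final answer. It assumes at most one `<think>...</think>` block per response; `split_reasoning` is a name introduced for this example.

```python
import re

# Non-greedy match so only the first <think> block is captured;
# DOTALL lets "." span newlines inside the block.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split model output into (reasoning, answer).

    Returns an empty reasoning string when no <think> block is present.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Keeping the two parts separate lets an application log or hide the reasoning while showing only the answer to end users.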
📝 5. Recommended Usage
Example:
User: Prove that there is no largest prime number.
Assistant:
<think>
1. Use proof by contradiction: Assume there exists a largest prime number P.
2. Construct a new number Q, where Q is the product of all known primes plus 1: Q = (2 * 3 * 5 * ... * P) + 1.
3. Analyze the divisibility of Q: Q leaves a remainder of 1 when divided by any of the known primes (2 to P).
4. Conclusion branches:
- If Q is prime, then Q > P, contradicting the assumption that P is the largest prime.
- If Q is composite, it must have a prime factor. This prime factor cannot be any of the primes from 2 to P, so there must exist a prime number greater than P.
5. In either case, we derive that there exists a prime number larger than P.
</think>
Conclusion: By proof by contradiction, we have proven that there are infinitely many prime numbers, and no largest prime exists.
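The construction in steps 2 to 4 can be checked numerically on a small case. The sketch below uses the first six primes as a stand-in for "all primes up to P" (an illustrative choice, not part of the proof) and a simple trial-division helper.

```python
from math import prod


def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


primes = [2, 3, 5, 7, 11, 13]
q = prod(primes) + 1  # 30030 + 1 = 30031

# Step 3: Q leaves remainder 1 for every prime in the list.
remainders = [q % p for p in primes]

# Step 4: Q may be composite, but its prime factors lie outside the list.
factor = smallest_prime_factor(q)
```

Here Q = 30031 = 59 × 509, so Q is composite and its smallest prime factor, 59, is larger than 13. This is the second branch of step 4: Q need not be prime itself, but it always exposes a prime missing from the list.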
⚠️ 6. Limitations
- Hallucination Risk: While the model has learned the form of reasoning, it may still generate "pseudo-reasoning" that looks plausible but is logically flawed when facing extremely difficult, unknown fields.
- Knowledge Boundaries: The upper limit of the model's capability is restricted by the knowledge base of the base model itself; inversion technology primarily enhances "thinking patterns" rather than "encyclopedic knowledge."
- Not for High-Risk Decisions: Please do not use the generated reasoning processes directly for critical decision-making scenarios such as medical, financial, or legal matters.
🔭 7. Future Vision
Negentropy-claude-opus-4.7-4B is just the first step in our "Negentropy Project."
- Scaling Up: We plan to migrate this method to 9B, 27B, and even 35B MoE models to explore the ceiling of inversion technology on models with larger parameter counts.
- Self-Evolution: Using Reinforcement Learning (RL) to enable the model to learn self-correction and logical closure on top of inversion.
🙏 8. Acknowledgements
- Thanks to Claude-Opus-4.7 for providing (albeit passively) high-quality logical signals.
- Thanks to the paper How to Steal Reasoning Without Reasoning Traces (arXiv:2603.07267) for providing the theoretical foundation.
📖 9. Citation
@misc{jackrong_negentropy_47_4b,
title = {Negentropy-claude-opus-4.7-4B: Reasoning Enhancement via Trace Inversion},
author = {Jackrong},
year = {2026},
publisher = {Hugging Face}
}

