Instructions to use ProtoNeuron-3/Nucleus-V-1.5-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ProtoNeuron-3/Nucleus-V-1.5-7B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="ProtoNeuron-3/Nucleus-V-1.5-7B",
	filename="Nucleus-V-1.5-7B_4_k_m.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use ProtoNeuron-3/Nucleus-V-1.5-7B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ProtoNeuron-3/Nucleus-V-1.5-7B
# Run inference directly in the terminal:
llama-cli -hf ProtoNeuron-3/Nucleus-V-1.5-7B

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ProtoNeuron-3/Nucleus-V-1.5-7B
# Run inference directly in the terminal:
llama-cli -hf ProtoNeuron-3/Nucleus-V-1.5-7B

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ProtoNeuron-3/Nucleus-V-1.5-7B
# Run inference directly in the terminal:
./llama-cli -hf ProtoNeuron-3/Nucleus-V-1.5-7B

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ProtoNeuron-3/Nucleus-V-1.5-7B
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ProtoNeuron-3/Nucleus-V-1.5-7B

Use Docker

docker model run hf.co/ProtoNeuron-3/Nucleus-V-1.5-7B

LM Studio
Jan
Ollama
How to use ProtoNeuron-3/Nucleus-V-1.5-7B with Ollama:
```
ollama run hf.co/ProtoNeuron-3/Nucleus-V-1.5-7B
```

Unsloth Studio new

How to use ProtoNeuron-3/Nucleus-V-1.5-7B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ProtoNeuron-3/Nucleus-V-1.5-7B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ProtoNeuron-3/Nucleus-V-1.5-7B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ProtoNeuron-3/Nucleus-V-1.5-7B to start chatting

Pi new

How to use ProtoNeuron-3/Nucleus-V-1.5-7B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ProtoNeuron-3/Nucleus-V-1.5-7B

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "ProtoNeuron-3/Nucleus-V-1.5-7B"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use ProtoNeuron-3/Nucleus-V-1.5-7B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ProtoNeuron-3/Nucleus-V-1.5-7B

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default ProtoNeuron-3/Nucleus-V-1.5-7B

Run Hermes

hermes

Docker Model Runner
How to use ProtoNeuron-3/Nucleus-V-1.5-7B with Docker Model Runner:
```
docker model run hf.co/ProtoNeuron-3/Nucleus-V-1.5-7B
```

Lemonade

How to use ProtoNeuron-3/Nucleus-V-1.5-7B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull ProtoNeuron-3/Nucleus-V-1.5-7B

Run and chat with the model

lemonade run user.Nucleus-V-1.5-7B-{{QUANT_TAG}}

List all available models

lemonade list

⚡ NEUATOMIC: NUCLEUS V1.5

THE LOGIC COMPRESSION BREAKTHROUGH

🤯 WORLD-CLASS REASONING, LAPTOP EFFICIENCY.

The industry claimed you need 175 Billion parameters for superior logic. We proved them wrong with 7 Billion. NeuAtomic: Nucleus V1.5 is engineered not just for performance, but for unprecedented cognitive density.

We compressed the logical capacity of an entire server farm into a 4.5 GB footprint.

👑 THE WORLD'S BEST 7B MODEL FOR REASONING EFFICIENCY.

🔬 THE AUDITED TRUTH: BENCHMARK BREAKDOWN

Our model was subjected to the industry-standard GSM8K (Grade School Math 8K) benchmark, which measures complex, multi-step reasoning—the ultimate test of an LLM's intelligence.

Metric	NeuAtomic Nucleus V1.5	Industry Baseline (GPT-3.5 Legacy)	The Competitive Edge
Parameters	7 Billion	175 Billion	25X Smaller
Reasoning Score (GSM8K Pass@1)	74.00% (AUDIT-PROOF)	~ 57.0% (Est. Base)	CRUSHES GPT-3.5
Inference Footprint	4-bit (~ 4.5 GB)	N/A	Deployable on a Laptop
Efficiency Index (Score/GB)	~ 16.4	~ 0.16 (Estimated)	100X More Parameter-Efficient

"Nucleus V1.5 achieves a 74.00% GSM8K score on a 4-bit model, a performance previously considered impossible for this parameter size. This validates our superior training methodology."

🛠️ CORE TECHNOLOGY: THE NEUATOMIC DIFFERENCE

Nucleus V1.5 is the result of a proprietary training methodology designed for extreme logical compression and inference efficiency.

Architecture: Optimized 7B Core, derived from the Qwen architecture. (The base architecture was the starting point; the performance is the result of our custom engineering.)
Training Focus: Deep Logical Compression—ensuring maximum reasoning capacity within the smallest footprint.
Identity Guard: The model maintains a rigid, hardened persona ("The Nucleus"), making it resilient against common prompt injection and role-play attacks.
Deployment Standard: Ships in the Q4_K_M GGUF format for best-in-class compatibility and speed across consumer hardware (via llama.cpp).

💡 DEPLOYMENT & USE CASES

NeuAtomic: Nucleus V1.5 is ideal for applications requiring high-fidelity logical processing where latency and cost are critical:

Algorithmic Trading & Financial Analysis.
Complex Data Validation & Querying.
Automated STEM Problem Solving.
Low-Cost, Edge-Based Reasoning Servers.

📥 GET STARTED

Download: Get the NeuAtomic_V2_Nucleus_Q4_K_M.gguf file from [Link to Hugging Face or Repository].
Prerequisites: Install the necessary backend for optimal performance.
```
pip install llama-cpp-python
```

Python Example (Inference):

from llama_cpp import Llama

# Load the highly efficient 4-bit model
llm = Llama(
    model_path="./NeuAtomic_V2_Nucleus_Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1 # Use GPU if available
)

# Test the core reasoning capability
prompt = "Q: I have 5 shirts. It takes 3 hours to dry 1 shirt in the sun. How long will it take to dry all 5 shirts together?\nA: Let's think step by step."

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.2, # Low temperature for factual output
    stop=["Q:"],
    echo=True
)

print(output['choices'][0]['text'])

The giants are too slow. Efficiency is the new intelligence. — The NeuAtomic Team

Downloads last month: 11

GGUF

Model size

8B params

Architecture

qwen2

Hardware compatibility

We're not able to determine the quantization variants.

View all variants

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Evaluation results

GSM8K Pass@1
self-reported

0.740