Instructions for using ValiantLabs/gpt-oss-20b-ShiningValiant3 with libraries and local apps.

Libraries

How to use ValiantLabs/gpt-oss-20b-ShiningValiant3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ValiantLabs/gpt-oss-20b-ShiningValiant3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; a bare pipe(messages) call discards it when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ValiantLabs/gpt-oss-20b-ShiningValiant3")
model = AutoModelForCausalLM.from_pretrained("ValiantLabs/gpt-oss-20b-ShiningValiant3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ValiantLabs/gpt-oss-20b-ShiningValiant3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ValiantLabs/gpt-oss-20b-ShiningValiant3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ValiantLabs/gpt-oss-20b-ShiningValiant3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
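
The same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the server started by `vllm serve` is listening on `localhost:8000` as in the curl call above. The helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

# Assumed values, copied from the curl example; adjust them to your setup.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "ValiantLabs/gpt-oss-20b-ShiningValiant3"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=600):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the parsed JSON response. Building the request separately from sending it makes the payload easy to inspect without a live server.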

Use Docker

docker model run hf.co/ValiantLabs/gpt-oss-20b-ShiningValiant3

SGLang

How to use ValiantLabs/gpt-oss-20b-ShiningValiant3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ValiantLabs/gpt-oss-20b-ShiningValiant3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ValiantLabs/gpt-oss-20b-ShiningValiant3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ValiantLabs/gpt-oss-20b-ShiningValiant3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ValiantLabs/gpt-oss-20b-ShiningValiant3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
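
The curl calls above return a JSON body in the OpenAI chat completions format, where the assistant's text sits under `choices[0].message.content`. The sketch below pulls that text out of an already-parsed response. The function name `extract_reply` is hypothetical, and the sample dictionary is hand-written for illustration; it is not real model output.

```python
def extract_reply(response):
    """Return the assistant text from a parsed chat completions response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written placeholder shaped like an OpenAI-compatible response.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<model reply here>"},
        }
    ]
}

print(extract_reply(sample_response))
```

Depending on the server and its configuration, reasoning text may be returned in a separate field or inline with the answer; this sketch only reads `content`.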

Docker Model Runner
How to use ValiantLabs/gpt-oss-20b-ShiningValiant3 with Docker Model Runner:
```
docker model run hf.co/ValiantLabs/gpt-oss-20b-ShiningValiant3
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Support our open-source dataset and model releases!

Shining Valiant 3 is available on the following base models: Qwen3-1.7B, Qwen3-4B, Qwen3-8B, Ministral-3-14B-Reasoning-2512, gpt-oss-20b

Shining Valiant 3 is a science, AI design, and general reasoning specialist built on gpt-oss-20b.

Finetuned on our newest science reasoning data, generated with DeepSeek R1 0528!
AI to build AI: our high-difficulty AI reasoning data makes Shining Valiant 3 your friend for building with current AI tech and discovering new innovations and improvements!
Improved general and creative reasoning to supplement problem-solving and general chat performance.
Small model sizes allow the model to run locally on desktop and mobile, and enable very fast server inference!

Prompting Guide

Shining Valiant 3 uses the gpt-oss-20b prompt format.

Shining Valiant 3 is a reasoning finetune; a reasoning level of "high" is generally recommended.
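
The upstream gpt-oss documentation describes setting the reasoning level through the system prompt, for example `Reasoning: high`. The sketch below builds a message list that way. It assumes this finetune keeps the upstream behaviour, and the helper name `build_messages` is hypothetical.

```python
ALLOWED_LEVELS = ("low", "medium", "high")


def build_messages(user_prompt, reasoning_level="high"):
    """Return chat messages with the reasoning level set in the system prompt."""
    if reasoning_level not in ALLOWED_LEVELS:
        raise ValueError(f"reasoning_level must be one of {ALLOWED_LEVELS}")
    return [
        # Assumption: the model reads "Reasoning: <level>" from the system
        # prompt, as described for the base gpt-oss models.
        {"role": "system", "content": f"Reasoning: {reasoning_level}"},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Explain why reversible cellular automata matter.")
print(messages[0]["content"])
```

The resulting list can be passed to a Transformers text-generation pipeline in place of a plain user-only message list.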

NOTE: This release of Shining Valiant 3 currently uses bf16 for all parameters. Consider quantized models if you're not looking to use bf16.
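
To get a rough sense of what bf16 means for memory, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 21 billion total parameters for the gpt-oss-20b base; that is an approximate figure, so check the base model's documentation for the exact count. It covers weights only and ignores the KV cache, activations, and any quantization overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: approximate total parameter count of the gpt-oss-20b base model.
APPROX_PARAMS = 21e9

for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(APPROX_PARAMS, bits):.1f} GB")
```

Under that assumption the bf16 weights alone come to about 42 GB, which is why quantized variants are worth considering on smaller GPUs or local machines.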

Example inference script, based on the one provided for gpt-oss-20b, to get started:

from transformers import pipeline
import torch

model_id = "ValiantLabs/gpt-oss-20b-ShiningValiant3"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Reversible Cellular Automata (RCAs) are CAs that have an inverse rule, allowing the simulation to run backward in time. Explain the theoretical significance of RCAs in the context of modeling physical laws that are time-symmetric. Describe the additional constraints that must be placed on a rule set to ensure it is reversible and discuss the challenges in constructing non-trivial reversible rules."},
]

outputs = pipe(
    messages,
    max_new_tokens=12000,
)
print(outputs[0]["generated_text"][-1])