How to use FILM6912/typhoon2.5-qwen3-30b-a3b with libraries and local apps.

Libraries

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FILM6912/typhoon2.5-qwen3-30b-a3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("FILM6912/typhoon2.5-qwen3-30b-a3b")
model = AutoModelForCausalLM.from_pretrained("FILM6912/typhoon2.5-qwen3-30b-a3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="FILM6912/typhoon2.5-qwen3-30b-a3b",
	filename="GGUF/F16/typhoon2.5-qwen3-30b-a3b.F16-00001-of-00002.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
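
The call above returns an OpenAI-style completion dictionary rather than a plain string. Below is a minimal sketch of pulling the assistant text out of it, assuming the usual `choices[0]["message"]["content"]` layout. The helper name `extract_reply` and the sample dictionary are illustrative and not part of llama-cpp-python.

```python
from typing import Any, Dict


def extract_reply(completion: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        return ""
    # Content can be None (for example when the model emits only tool calls).
    return choices[0].get("message", {}).get("content") or ""


# Hand-written stand-in for the dictionary returned by llm.create_chat_completion(...).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample))
```

In real use, pass the return value of `llm.create_chat_completion(...)` instead of `sample`.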

Local Apps

llama.cpp

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16
# Run inference directly in the terminal:
llama-cli -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16
# Run inference directly in the terminal:
llama-cli -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16
# Run inference directly in the terminal:
./llama-cli -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Use Docker

docker model run hf.co/FILM6912/typhoon2.5-qwen3-30b-a3b:F16

vLLM

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FILM6912/typhoon2.5-qwen3-30b-a3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FILM6912/typhoon2.5-qwen3-30b-a3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
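
The same request can be issued from Python without extra dependencies. The sketch below builds the POST request with the standard library and leaves the actual send commented out, since it needs the server to be running. The helper name `build_chat_request` is illustrative; the URL and model name are the ones used in the curl call.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",
    "FILM6912/typhoon2.5-qwen3-30b-a3b",
    "What is the capital of France?",
)

# Sending requires the vLLM server to be running:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```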

Use Docker

docker model run hf.co/FILM6912/typhoon2.5-qwen3-30b-a3b:F16

SGLang

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FILM6912/typhoon2.5-qwen3-30b-a3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FILM6912/typhoon2.5-qwen3-30b-a3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FILM6912/typhoon2.5-qwen3-30b-a3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FILM6912/typhoon2.5-qwen3-30b-a3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use FILM6912/typhoon2.5-qwen3-30b-a3b with Ollama:
```
ollama run hf.co/FILM6912/typhoon2.5-qwen3-30b-a3b:F16
```

Unsloth Studio

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FILM6912/typhoon2.5-qwen3-30b-a3b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FILM6912/typhoon2.5-qwen3-30b-a3b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for FILM6912/typhoon2.5-qwen3-30b-a3b to start chatting

Pi

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "FILM6912/typhoon2.5-qwen3-30b-a3b:F16"
        }
      ]
    }
  }
}
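
If you would rather script this step, the following sketch writes the same provider entry using Python's standard library. It assumes the file layout shown above and keeps any other providers already present in the file. The helper `add_llama_cpp_provider` is hypothetical and not part of Pi.

```python
import json
from pathlib import Path


def add_llama_cpp_provider(config_path: Path, model_id: str) -> dict:
    """Merge a llama-cpp provider entry into models.json, keeping other providers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("providers", {})["llama-cpp"] = {
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example (writes to the path named above):
# add_llama_cpp_provider(
#     Path.home() / ".pi" / "agent" / "models.json",
#     "FILM6912/typhoon2.5-qwen3-30b-a3b:F16",
# )
```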

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Run Hermes

hermes

Docker Model Runner
How to use FILM6912/typhoon2.5-qwen3-30b-a3b with Docker Model Runner:
```
docker model run hf.co/FILM6912/typhoon2.5-qwen3-30b-a3b:F16
```

Lemonade

How to use FILM6912/typhoon2.5-qwen3-30b-a3b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull FILM6912/typhoon2.5-qwen3-30b-a3b:F16

Run and chat with the model

lemonade run user.typhoon2.5-qwen3-30b-a3b-F16

List all available models

lemonade list

Typhoon2.5-Qwen3-30B-A3B: Thai Large Language Model (Instruct)

Typhoon2.5-Qwen3-30B-A3B is an instruction-tuned Thai 🇹🇭 large language model with 3 billion active parameters, a 256K context length, and function-calling capabilities. It is based on Qwen3 30B A3B.

Performance

Model Description

Model type: A 30B instruct mixture-of-experts (3B active) decoder-only model based on the Qwen3 architecture.
Requirement: transformers 4.51.0 or newer.
Primary Language(s): Thai 🇹🇭 and English 🇬🇧
Context Length: 256K
License: Apache 2.0 License
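
As a rough guide to hardware sizing, the weight footprint can be estimated from the parameter count and data type. The sketch below assumes about 31 billion parameters stored in BF16 (2 bytes each), as listed in this model's Safetensors metadata. It counts weights only and ignores the KV cache, activations, and framework overhead.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB (weights only)."""
    return num_params * bytes_per_param / 2**30


# Assumption: ~31e9 parameters in BF16 (2 bytes per parameter).
print(f"{weight_memory_gib(31e9):.1f} GiB")
```

This comes to a little under 58 GiB. Although only about 3 billion parameters are active per token, a mixture-of-experts model typically needs all experts resident in memory, so the full parameter count is the relevant figure here.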

Usage Example

This code snippet shows how to use the Typhoon2.5-Qwen3-30B-A3B model for Thai or English text generation using the transformers library. It includes setting up the model and tokenizer, formatting chat messages in a system-user style, and generating a response.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "scb10x/typhoon2.5-qwen3-30b-a3b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a male AI assistant named Typhoon created by SCB 10X to be helpful, harmless, and honest. Typhoon is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks. Typhoon responds directly to all human messages without unnecessary affirmations or filler phrases like “Certainly!”, “Of course!”, “Absolutely!”, “Great!”, “Sure!”, etc. Specifically, Typhoon avoids starting responses with the word “Certainly” in any way. Typhoon follows this information in all languages, and always responds to the user in the language they use or request. Typhoon is now being connected with a human. Write in fluid, conversational prose, Show genuine interest in understanding requests, Express appropriate emotions and empathy. Also showing information in term that is easy to understand and visualized."},
    {"role": "user", "content": "ขอสูตรไก่ย่าง"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.05,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

Deploy as Server

This section shows how to run Typhoon2.5 as an OpenAI-compatible API server using vLLM.

pip install vllm
vllm serve scb10x/typhoon2.5-qwen3-30b-a3b --max-model-len 8192 --tool-call-parser hermes --enable-auto-tool-choice --gpu-memory-utilization 0.95
# Adjust --max-model-len based on your available memory
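
When choosing `--max-model-len`, it can help to estimate how much KV-cache memory a single sequence needs. The sketch below uses the standard per-token formula: 2 × layers × KV heads × head dimension × bytes per element. The architecture values (48 layers, 4 KV heads, head dimension 128) are assumptions based on the Qwen3-30B-A3B base configuration and should be checked against this model's `config.json`. vLLM's actual allocation also depends on its block size and other overheads.

```python
def kv_cache_bytes(
    num_tokens: int,
    num_layers: int = 48,        # assumption: verify in config.json
    num_kv_heads: int = 4,       # assumption: verify in config.json
    head_dim: int = 128,         # assumption: verify in config.json
    bytes_per_element: int = 2,  # BF16/FP16 cache
) -> int:
    """Estimate KV-cache bytes for one sequence of `num_tokens` tokens."""
    # Factor 2 accounts for storing both keys and values.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_element
    return num_tokens * per_token


# 8192 matches the command above; 262144 is one reading of the 256K context length.
for context in (8192, 262144):
    print(context, f"{kv_cache_bytes(context) / 2**20:.0f} MiB")
```

Under these assumptions a single 8192-token sequence needs 768 MiB of cache, while a full 262,144-token sequence needs 24 GiB on top of the weights.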

Using Tools

You can pass tool definitions to the vLLM-powered OpenAI-compatible API to enable function calling.

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

def get_weather(location: str, unit: str):
    return f"Getting the weather for {location} in {unit}..."

tool_functions = {"get_weather": get_weather}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
    extra_body={
      "repetition_penalty": 1.05
    }
)

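# The model may answer in plain text instead of calling a tool, so check first.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    tool_call = tool_calls[0].function
    print(f"Function called: {tool_call.name}")
    print(f"Arguments: {tool_call.arguments}")
    # Dispatch through the registry defined above instead of hard-coding the function.
    result = tool_functions[tool_call.name](**json.loads(tool_call.arguments))
    print(f"Result: {result}")
else:
    print(response.choices[0].message.content)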
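
The example stops after running the tool locally. To let the model phrase a final answer, the usual OpenAI-style pattern is to append the assistant's tool call and a `tool` message carrying the result, then request another completion. Below is a minimal sketch of building those messages with plain dictionaries. The helper name `build_followup_messages` and the call ID are illustrative, and the commented request reuses `client` and `tools` from the example above.

```python
import json
from typing import Any, Dict, List


def build_followup_messages(
    messages: List[Dict[str, Any]],
    call_id: str,
    name: str,
    arguments: str,
    result: str,
) -> List[Dict[str, Any]]:
    """Return a new message list with the tool call and its result appended."""
    assistant_turn = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": call_id,
                "type": "function",
                "function": {"name": name, "arguments": arguments},
            }
        ],
    }
    # The tool message must reference the ID of the call it answers.
    tool_turn = {"role": "tool", "tool_call_id": call_id, "content": result}
    return [*messages, assistant_turn, tool_turn]


history = [{"role": "user", "content": "What's the weather like in San Francisco?"}]
followup = build_followup_messages(
    history,
    call_id="call_0",  # placeholder; use response.choices[0].message.tool_calls[0].id
    name="get_weather",
    arguments=json.dumps({"location": "San Francisco, CA", "unit": "celsius"}),
    result="Getting the weather for San Francisco, CA in celsius...",
)

# final = client.chat.completions.create(
#     model=client.models.list().data[0].id, messages=followup, tools=tools
# )
# print(final.choices[0].message.content)
```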

Sampling Parameters

For this model, we recommend a low temperature (the usage example uses 0.6) and setting "repetition_penalty" to 1.05 to improve output quality and reduce repetition.
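
One way to keep these settings consistent across backends is to define them once. The sketch below reuses the values from the usage example (temperature 0.6, top_p 0.95, repetition_penalty 1.05) and splits them for an OpenAI-compatible client, where `repetition_penalty` is not a standard field and is passed through `extra_body`, as in the tool example. The names `SAMPLING`, `OPENAI_NATIVE`, and `to_openai_kwargs` are illustrative.

```python
SAMPLING = {"temperature": 0.6, "top_p": 0.95, "repetition_penalty": 1.05}

# Fields the OpenAI chat completions API defines itself; anything else goes to extra_body.
OPENAI_NATIVE = {"temperature", "top_p"}


def to_openai_kwargs(sampling: dict) -> dict:
    """Split sampling settings into native arguments and an extra_body payload."""
    native = {k: v for k, v in sampling.items() if k in OPENAI_NATIVE}
    extra = {k: v for k, v in sampling.items() if k not in OPENAI_NATIVE}
    return {**native, "extra_body": extra} if extra else native


print(to_openai_kwargs(SAMPLING))
# With transformers: model.generate(input_ids, do_sample=True, **SAMPLING)
# With the OpenAI client: client.chat.completions.create(..., **to_openai_kwargs(SAMPLING))
```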

Intended Uses & Limitations

This is an instruction-tuned model that is still under development. It incorporates some guardrails, but it may still produce answers that are inaccurate, biased, or otherwise objectionable in response to user prompts. We recommend that developers assess these risks in the context of their use case.

Follow Us

https://twitter.com/opentyphoon

Support

https://discord.gg/us5gAYmrxw

Citation

If you find Typhoon2 useful for your work, please cite it using:

@misc{typhoon2,
      title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models}, 
      author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
      year={2024},
      eprint={2412.13702},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13702}, 
}

Model size: 31B params (Safetensors)
Tensor type: BF16

Model tree for FILM6912/typhoon2.5-qwen3-30b-a3b

Base model: typhoon-ai/typhoon2.5-qwen3-30b-a3b
Quantized (4): this model

Paper for FILM6912/typhoon2.5-qwen3-30b-a3b

Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models (arXiv 2412.13702, published Dec 18, 2024)
