GigaChat3-10B-A1.8B

ΠŸΡ€Π΅Π΄ΡΡ‚Π°Π²Π»ΡΠ΅ΠΌ GigaChat3-10B-A1.8B β€” Π΄ΠΈΠ°Π»ΠΎΠ³ΠΎΠ²ΡƒΡŽ модСль сСмСйства GigaChat. МодСль основана Π½Π° Π°Ρ€Ρ…ΠΈΡ‚Π΅ΠΊΡ‚ΡƒΡ€Π΅ Mixture-of-Experts (MoE) с 10B ΠΎΠ±Ρ‰ΠΈΡ… ΠΈ 1.8B Π°ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€ΠΎΠ². АрхитСктура Π²ΠΊΠ»ΡŽΡ‡Π°Π΅Ρ‚ Multi-head Latent Attention (MLA) ΠΈ Multi-Token Prediction (MTP), Π·Π° счСт Ρ‡Π΅Π³ΠΎ модСль ΠΎΠΏΡ‚ΠΈΠΌΠΈΠ·ΠΈΡ€ΠΎΠ²Π°Π½Π° для высокой пропускной способности (throughput) ΠΏΡ€ΠΈ инфСрСнсС. МодСль ΠΎΠ±ΡƒΡ‡Π΅Π½Π° ΠΏΠΎΠ²Π΅Ρ€Ρ… нашСй Π±Π°Π·ΠΎΠ²ΠΎΠΉ вСрсии (GigaChat3-10B-A1.8B-base) с ΠΏΠΎΠΌΠΎΡ‰ΡŒΡŽ высококачСствСнных SFT-Π΄Π°Π½Π½Ρ‹Ρ…. Данная вСрсия ΠΏΡ€Π΅Π΄Π½Π°Π·Π½Π°Ρ‡Π΅Π½Π° для Π²Ρ‹ΡΠΎΠΊΠΎΠΏΡ€ΠΎΠΈΠ·Π²ΠΎΠ΄ΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎΠ³ΠΎ инфСрСнса Π² fp8, модСль Π² bf16 β€” GigaChat3-10B-A1.8B. Π‘ΠΎΠ»ΡŒΡˆΠ΅ подробностСй Π² Ρ…Π°Π±Ρ€ ΡΡ‚Π°Ρ‚ΡŒΠ΅.

Model architecture

GigaChat3-10B-A1.8B ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΡƒΠ΅Ρ‚ ΠΊΠ°ΡΡ‚ΠΎΠΌΠ½ΡƒΡŽ MoE-Π°Ρ€Ρ…ΠΈΡ‚Π΅ΠΊΡ‚ΡƒΡ€Ρƒ:

Multi-head Latent Attention (MLA)

Instead of standard Multi-head Attention, the model uses MLA. MLA enables efficient inference by compressing the Key-Value (KV) cache into a latent vector, which significantly reduces memory requirements and speeds up processing.
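
As a rough illustration of the caching trick, here is a minimal PyTorch sketch with made-up dimensions (not the model's actual configuration): only one small latent vector per token is cached, and per-head K/V are re-expanded from it at attention time.

import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 2048, 512, 16, 128

down_kv = nn.Linear(d_model, d_latent, bias=False)        # compress hidden state -> latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand latent -> per-head keys
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand latent -> per-head values

x = torch.randn(1, 10, d_model)  # (batch, seq, hidden)
kv_cache = down_kv(x)            # (1, 10, d_latent): this is all that is stored per token

# At attention time the latent cache is expanded back into per-head K/V.
# (Real MLA also carries a small decoupled RoPE component, omitted here.)
k = up_k(kv_cache).view(1, 10, n_heads, d_head)
v = up_v(kv_cache).view(1, 10, n_heads, d_head)

# Cache footprint per token: d_latent floats vs. 2 * n_heads * d_head for full K/V.
print(d_latent, "vs", 2 * n_heads * d_head)  # 512 vs 4096, an 8x reduction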

Multi-Token Prediction (MTP)

The model is trained with a Multi-Token Prediction (MTP) objective. This lets the model predict several tokens per forward pass, which speeds up generation by up to 40% when combined with speculative/parallel decoding techniques.
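
The verification side can be sketched as follows (a greedy sketch with a hypothetical main_logits_fn; production engines such as vLLM and SGLang implement this internally): the draft tokens proposed by the MTP head are checked in a single forward pass of the main model, and the longest matching prefix is kept.

def speculative_step(main_logits_fn, draft_tokens, prefix):
    """One speculative decoding step: verify MTP drafts, keep the matching prefix.

    main_logits_fn(tokens) is assumed to return, for every position i, the
    greedy next token after tokens[:i+1] (one batched forward pass).
    """
    candidate = prefix + draft_tokens
    predicted = main_logits_fn(candidate)
    accepted = []
    for i, tok in enumerate(draft_tokens):
        pos = len(prefix) - 1 + i  # prediction made at the token preceding this draft
        if predicted[pos] == tok:
            accepted.append(tok)   # draft confirmed by the main model
        else:
            accepted.append(predicted[pos])  # main model's correction; stop here
            break
    else:
        # All drafts accepted: the final position yields one extra "bonus" token.
        accepted.append(predicted[len(candidate) - 1])
    return prefix + accepted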

Training data

The model was trained on 20T tokens. We added 10 languages, from Chinese and Arabic to Uzbek and Kazakh, and expanded the set of sources: books, academic data, code and math datasets. All data passes through deduplication, language filtering, and automatic quality checks based on heuristics and classifiers. A key contribution to quality came from synthetic data: we generated about 5.5 trillion tokens of it. The corpus includes question-answer pairs over texts, reverse-prompt chains for structuring data, LLM notes with in-text model commentary, and millions of synthetic math and competitive-programming problems with solutions (and synthetic tests) built on top of PromptCot.
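
For illustration only, a heavily simplified version of such a cleaning step might look like the sketch below (assumed logic; the production pipeline also applies fuzzy deduplication and learned quality classifiers):

import hashlib

def clean_corpus(docs):
    """Exact dedup via content hashing plus two toy heuristic quality filters."""
    seen = set()
    for text in docs:
        digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()
        if digest in seen:  # exact duplicate of an earlier document
            continue
        seen.add(digest)
        words = text.split()
        if len(words) < 20:  # heuristic: too short to be useful
            continue
        if len(set(words)) / len(words) < 0.3:  # heuristic: highly repetitive text
            continue
        yield text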

Inference

One of the key advantages of GigaChat3-10B-A1.8B is inference speed. The model (especially in MTP mode) delivers throughput comparable to that of significantly smaller dense models. We measured with vLLM v0.11.0, in bfloat16 with batch_size=1. Link to the code.

| Model | request_throughput (req/s) | output_throughput (tok/s) | total_token_throughput (tok/s) | mean_ttft_ms |
|---|---|---|---|---|
| Qwen3-1.7B | 1.689 | 357.308 | 726.093 | 11.824 |
| mtp-GigaChat3-10B-A1.8B-base | 1.533 | 333.620 | 678.894 | 26.345 |
| GigaChat3-10B-A1.8B-base | 1.077 | 234.363 | 476.912 | 31.053 |
| Qwen3-4B | 0.978 | 206.849 | 420.341 | 14.947 |
| Qwen3-8B | 0.664 | 140.432 | 285.375 | 16.663 |
| YandexGPT-5-Lite-8B-pretrain | 0.641 | 147.305 | 300.269 | 16.711 |
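
For a quick client-side sanity check of TTFT and streaming speed, something like the following can be run against an OpenAI-compatible endpoint (a sketch only; the endpoint and model name are assumptions, and this is not the benchmark script used for the table above):

import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
first_token_at = None
n_chunks = 0
stream = client.chat.completions.create(
    model="ai-sage/GigaChat3-10B-A1.8B",
    messages=[{"role": "user", "content": "ΠŸΡ€ΠΈΠ²Π΅Ρ‚! РасскаТи ΠΏΡ€ΠΎ MoE."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token
        n_chunks += 1
elapsed = time.perf_counter() - start
print(f"TTFT: {(first_token_at - start) * 1000:.1f} ms")
print(f"~{n_chunks / elapsed:.1f} chunks/s (roughly tokens/s for one-token deltas)")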

Benchmarks

Although the model has 10 billion total parameters, only 1.8 billion of them are active, so its direct peers are models with 3–4 billion parameters. However, thanks to its high generation speed, we also compare it against even more compact models.

| Metric | GigaChat 3 Lightning | Qwen3-1.7B-Instruct | Qwen3-4B-Instruct-2507 | SmolLM3 |
|---|---|---|---|---|
| MMLU_RU_FIVE_SHOT | 0.6833 | 0.4876 | 0.5972 | 0.4998 |
| RUBQ_ZERO_SHOT | 0.6516 | 0.2557 | 0.3170 | 0.6363 |
| MMLU_PRO_EN_FIVE_SHOT | 0.6061 | 0.410 | 0.6849 | 0.5013 |
| MMLU_EN_FIVE_SHOT | 0.7403 | 0.60 | 0.7080 | 0.5992 |
| BBH_THREE_SHOT | 0.4525 | 0.3317 | 0.7165 | 0.4161 |
| SuperGPQA | 0.2731 | 0.2092 | 0.3745 | 0.2459 |
| MATH_500_FOUR_SHOT | 0.7000 | 0.7520 | 0.8880 | 0.8020 |
| GPQA_COT_ZERO_SHOT | 0.3502 | 0.2651 | 0.5370 | 0.3704 |
| LiveCodeBench_ZERO_SHOT | 0.2031 | 0.0794 | 0.3046 | 0.1656 |
| HUMAN_EVAL_PLUS_ZERO_SHOT | 0.6951 | 0.6280 | 0.8780 | 0.7012 |

How to reproduce the model's metrics

# lm-eval[api]==0.4.9.1
# sglang[all]==0.5.5
# or
# vllm==0.11.2

export HF_ALLOW_CODE_EVAL=1

# bring up the sglang server

# 10B model
python -m sglang.launch_server --model-path <path_to_model> --host 127.0.0.1 --port 30000 --dtype auto --mem-fraction-static 0.88 --trust-remote-code --allow-auto-truncate --speculative-algorithm EAGLE --speculative-num-steps 1 --speculative-eagle-topk 1 --speculative-num-draft-tokens 2

# run the MMLU-Pro evaluation
python -m lm_eval --model sglang-generate --output_path <path_to_model> --batch_size 16 --model_args base_url=http://127.0.0.1:30000/generate,num_concurrent=16,tokenized_requests=True,max_length=131072,tokenizer=<path_to_model> --trust_remote_code --confirm_run_unsafe_code --num_fewshot 5 --tasks mmlu_pro

Usage example (Quickstart)

1. transformers

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "ai-sage/GigaChat3-10B-A1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)

messages = [
    {"role": "user", "content": "Π”ΠΎΠΊΠ°ΠΆΠΈ Ρ‚Π΅ΠΎΡ€Π΅ΠΌΡƒ ΠΎ Π½Π΅ΠΏΠΎΠ΄Π²ΠΈΠΆΠ½ΠΎΠΉ Ρ‚ΠΎΡ‡ΠΊΠ΅"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=1000)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=False)
print(result)

2. vLLM

Starting the server

# VLLM DeepGemm conflicts with our hidden dim size.
# Fix: Disable it via env var (VLLM_USE_DEEP_GEMM=0).
VLLM_USE_DEEP_GEMM=0 vllm serve ai-sage/GigaChat3-10B-A1.8B \
  --dtype "auto" \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": false}'

Example request

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GigaChat3-10B-A1.8B",
    "messages": [
      {
        "role": "user",
        "content": "Π”ΠΎΠΊΠ°ΠΆΠΈ Ρ‚Π΅ΠΎΡ€Π΅ΠΌΡƒ ΠΎ Π½Π΅ΠΏΠΎΠ΄Π²ΠΈΠΆΠ½ΠΎΠΉ Ρ‚ΠΎΡ‡ΠΊΠ΅"
      }
    ],
    "max_tokens": 400,
    "temperature": 0
  }'

3. SGLang

Starting the server

python -m sglang.launch_server \
  --model-path ai-sage/GigaChat3-10B-A1.8B \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype auto \
  --mem-fraction-static 0.88 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 1 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 2

Example request

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GigaChat3-10B-A1.8B",
    "messages": [
      {
        "role": "user",
        "content": "Π”ΠΎΠΊΠ°ΠΆΠΈ Ρ‚Π΅ΠΎΡ€Π΅ΠΌΡƒ ΠΎ Π½Π΅ΠΏΠΎΠ΄Π²ΠΈΠΆΠ½ΠΎΠΉ Ρ‚ΠΎΡ‡ΠΊΠ΅"
      }
    ],
    "max_tokens": 1000,
    "temperature": 0
  }'

Function call

1. transformers

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import json
import re
REGEX_FUNCTION_CALL_V3 = re.compile(r"function call<\|role_sep\|>\n(.*)$", re.DOTALL)
REGEX_CONTENT_PATTERN = re.compile(r"^(.*?)<\|message_sep\|>", re.DOTALL)
def parse_function_and_content(completion_str: str):
    """
    Using the regexes the user provided, attempt to extract function call and content.
    Returns (function_call_str_or_None, content_str_or_None)
    """

    function_call = None
    content = None

    m_func = REGEX_FUNCTION_CALL_V3.search(completion_str)
    if m_func:
        try:
            function_call = json.loads(m_func.group(1))
            if isinstance(function_call, dict) and "name" in function_call and "arguments" in function_call:
                if not isinstance(function_call["arguments"], dict):
                    function_call = None
            else:
                function_call = None
        except json.JSONDecodeError:
            function_call = None

            # on a failed function-call parse, return the raw completion as content
            return function_call, completion_str

    m_content = REGEX_CONTENT_PATTERN.search(completion_str)
    if m_content:
        content = m_content.group(1)
    else:
        # as a fallback, everything before the first message_sep marker if present
        if "<|message_sep|>" in completion_str:
            content = completion_str.split("<|message_sep|>")[0]
        else:
            content = completion_str

    return function_call, content

model_name = "ai-sage/GigaChat3-10B-A1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
tools = [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "ΠŸΠΎΠ»ΡƒΡ‡ΠΈΡ‚ΡŒ ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΡŽ ΠΎ Ρ‚Π΅ΠΊΡƒΡ‰Π΅ΠΉ ΠΏΠΎΠ³ΠΎΠ΄Π΅ Π² ΡƒΠΊΠ°Π·Π°Π½Π½ΠΎΠΌ Π³ΠΎΡ€ΠΎΠ΄Π΅.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "НазваниС Π³ΠΎΡ€ΠΎΠ΄Π° (Π½Π°ΠΏΡ€ΠΈΠΌΠ΅Ρ€, Москва, Казань)."
            }
          },
          "required": ["city"]
        }
      }
    }
]
messages = [
    {"role": "user", "content": "Какая сСйчас ΠΏΠΎΠ³ΠΎΠ΄Π° Π² МосквС?"}
]
input_tensor = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=1000)

result = parse_function_and_content(tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=False))[0]
print(result)
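
If parsing succeeds, result is a dict such as {"name": "get_weather", "arguments": {"city": "Москва"}} and can be dispatched directly. A hypothetical dispatch sketch (get_weather below is a stub standing in for the real tool):

def get_weather(city: str) -> str:
    return f"Π’ Π³ΠΎΡ€ΠΎΠ΄Π΅ {city} сСйчас +5 Β°C"  # stub implementation

if result is not None:
    tool_output = {"get_weather": get_weather}[result["name"]](**result["arguments"])
    print(tool_output)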

2. vLLM

Build a dev version (commit >= 21bb323).

Starting the server

# VLLM DeepGemm conflicts with our hidden dim size.
# Fix: Disable it via env var (VLLM_USE_DEEP_GEMM=0).
VLLM_USE_DEEP_GEMM=0 vllm serve ai-sage/GigaChat3-10B-A1.8B \
  --dtype "auto" \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": false}' \
  --enable-auto-tool-choice \
  --tool-call-parser gigachat3

Example request

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "ai-sage/GigaChat3-10B-A1.8B",
  "temperature": 0,
  "messages": [
    {
      "role": "user",
      "content": "Какая сСйчас ΠΏΠΎΠ³ΠΎΠ΄Π° Π² МосквС?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "ΠŸΠΎΠ»ΡƒΡ‡ΠΈΡ‚ΡŒ ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΡŽ ΠΎ Ρ‚Π΅ΠΊΡƒΡ‰Π΅ΠΉ ΠΏΠΎΠ³ΠΎΠ΄Π΅ Π² ΡƒΠΊΠ°Π·Π°Π½Π½ΠΎΠΌ Π³ΠΎΡ€ΠΎΠ΄Π΅.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "НазваниС Π³ΠΎΡ€ΠΎΠ΄Π° (Π½Π°ΠΏΡ€ΠΈΠΌΠ΅Ρ€, Москва, Казань)."
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}'
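
The same request can be issued with the OpenAI Python SDK against this server (a sketch; base_url and port follow the launch command above):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="ai-sage/GigaChat3-10B-A1.8B",
    messages=[{"role": "user", "content": "Какая сСйчас ΠΏΠΎΠ³ΠΎΠ΄Π° Π² МосквС?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "ΠŸΠΎΠ»ΡƒΡ‡ΠΈΡ‚ΡŒ ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΡŽ ΠΎ Ρ‚Π΅ΠΊΡƒΡ‰Π΅ΠΉ ΠΏΠΎΠ³ΠΎΠ΄Π΅ Π² ΡƒΠΊΠ°Π·Π°Π½Π½ΠΎΠΌ Π³ΠΎΡ€ΠΎΠ΄Π΅.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    temperature=0,
)
# With --enable-auto-tool-choice the call arrives as structured tool_calls.
print(resp.choices[0].message.tool_calls)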

3. SGLang

Build a dev version from this branch: https://github.com/sgl-project/sglang/pull/14765.

Starting the server

python -m sglang.launch_server \
  --model-path ai-sage/GigaChat3-10B-A1.8B \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype auto \
  --mem-fraction-static 0.88 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 1 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 2 \
  --tool-call-parser gigachat3

Example request

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "ai-sage/GigaChat3-10B-A1.8B",
  "temperature": 0,
  "messages": [
    {
      "role": "user",
      "content": "Какая сСйчас ΠΏΠΎΠ³ΠΎΠ΄Π° Π² МосквС?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "ΠŸΠΎΠ»ΡƒΡ‡ΠΈΡ‚ΡŒ ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΡŽ ΠΎ Ρ‚Π΅ΠΊΡƒΡ‰Π΅ΠΉ ΠΏΠΎΠ³ΠΎΠ΄Π΅ Π² ΡƒΠΊΠ°Π·Π°Π½Π½ΠΎΠΌ Π³ΠΎΡ€ΠΎΠ΄Π΅.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "НазваниС Π³ΠΎΡ€ΠΎΠ΄Π° (Π½Π°ΠΏΡ€ΠΈΠΌΠ΅Ρ€, Москва, Казань)."
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}'