# Mamba (State Space Model) - Turkish
This model demonstrates the efficiency of State Space Models (SSMs) on morphologically rich languages. It is a Mamba architecture (~115M parameters) trained from scratch on Turkish Wikipedia, achieving roughly **2x the inference speed** of a Transformer baseline of equivalent size.
## Benchmark Results
| Model Architecture | Throughput (tok/s) | Latency (ms/token) | Peak VRAM (MB) | Final Loss (500 Steps) |
|---|---|---|---|---|
| Transformer (GPT-2) | 67.53 | 14.81 | ~1786 | 6.81 |
| Mamba (Ours) | 131.09 | 7.63 | ~2469* | 20.58 |
> Note: VRAM usage for Mamba includes CUDA Graph overhead for maximum throughput.
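The throughput and latency figures above can be reproduced with a simple timing loop around `generate`. Below is a minimal sketch, not the original benchmark script: the prompt, token count, and warmup scheme are assumptions, and exact numbers will vary with the GPU and dtype.

```python
import time

import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("oguzatas/mamba-tr-project-tokenizer")
model = MambaLMHeadModel.from_pretrained("oguzatas/mamba-tr-project-mamba", dtype=torch.float32, device="cuda")

input_ids = tokenizer.encode("Yapay zeka", return_tensors="pt").to("cuda")
new_tokens = 256  # assumed generation length for the measurement
max_length = input_ids.shape[1] + new_tokens

# Warmup run so CUDA Graph capture is excluded from the timing.
model.generate(input_ids, max_length=max_length, cg=True)
torch.cuda.synchronize()

start = time.perf_counter()
model.generate(input_ids, max_length=max_length, cg=True)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"Throughput: {new_tokens / elapsed:.2f} tok/s")
print(f"Latency:    {1000 * elapsed / new_tokens:.2f} ms/token")
```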
## Model Details
- Architecture: Mamba (SSM)
- Parameters: ~115 Million (see the verification sketch after this list)
- Training Data: Turkish Wikipedia (Nov 2023)
- Context Length: 1024 tokens
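The parameter count is easy to verify once the checkpoint is downloaded. A minimal sketch, loading on CPU and reusing the checkpoint name from the Usage section below:

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Load on CPU just to inspect the parameter count; no GPU needed for this.
model = MambaLMHeadModel.from_pretrained("oguzatas/mamba-tr-project-mamba", dtype=torch.float32, device="cpu")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # expected: ~115M
```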
## Usage (Requires `mamba-ssm`)
```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import PreTrainedTokenizerFast

# 1. Load Tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("oguzatas/mamba-tr-project-tokenizer")

# 2. Load Mamba Model (manual load required for the custom architecture)
model = MambaLMHeadModel.from_pretrained("oguzatas/mamba-tr-project-mamba", dtype=torch.float32, device="cuda")

# 3. Generate
text = "Yapay zeka teknolojileri"
input_ids = tokenizer.encode(text, return_tensors="pt").to("cuda")
output = model.generate(input_ids, max_length=50, cg=True)  # cg=True enables CUDA Graphs
print(tokenizer.decode(output[0]))
```
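The snippet above decodes greedily. `mamba_ssm`'s `generate` also accepts `temperature`, `top_k`, and `top_p` for sampled output; continuing from the snippet above, a small sketch with illustrative (untuned) values:

```python
# Sampled generation; the temperature/top_k/top_p values are illustrative.
output = model.generate(
    input_ids,
    max_length=100,
    cg=True,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```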