# Mamba (State Space Model) - Turkish

This model demonstrates the efficiency of State Space Models (SSMs) on morphologically rich languages. It is a Mamba architecture (~115M parameters) trained from scratch on Turkish Wikipedia, reaching roughly **2x the inference throughput** of a Transformer baseline of equivalent size.

## Benchmark Results πŸ†

| Model Architecture | Throughput (tok/s) | Latency (ms/token) | Peak VRAM (MB) | Final Loss (500 Steps) |
|---|---|---|---|---|
| Transformer (GPT-2) | 67.53 | 14.81 | ~1786 | 6.81 |
| Mamba (Ours) | 131.09 | 7.63 | ~2469* | 20.58 |


> Note: VRAM usage for Mamba includes CUDA Graph overhead for maximum throughput.
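
The benchmark script itself is not part of this card. The following is a minimal sketch of how throughput, per-token latency, and peak VRAM figures of this kind can be measured with mamba_ssm's `generate`; the `measure_generation` helper, the prompt, and the warm-up/run counts are illustrative assumptions, not the original benchmark setup.

```python
import time
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import PreTrainedTokenizerFast

def measure_generation(model, input_ids, new_tokens=256, warmup=3, runs=10):
    """Time greedy generation and return (tokens/sec, ms/token)."""
    max_length = input_ids.shape[1] + new_tokens
    for _ in range(warmup):  # warm-up runs also trigger CUDA graph capture (cg=True)
        model.generate(input_ids, max_length=max_length, cg=True)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model.generate(input_ids, max_length=max_length, cg=True)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    generated = new_tokens * runs
    return generated / elapsed, 1000.0 * elapsed / generated

tokenizer = PreTrainedTokenizerFast.from_pretrained("oguzatas/mamba-tr-project-tokenizer")
model = MambaLMHeadModel.from_pretrained(
    "oguzatas/mamba-tr-project-mamba", dtype=torch.float32, device="cuda"
)
input_ids = tokenizer.encode("Yapay zeka teknolojileri", return_tensors="pt").to("cuda")

tok_per_s, ms_per_tok = measure_generation(model, input_ids)
print(f"throughput: {tok_per_s:.2f} tok/s, latency: {ms_per_tok:.2f} ms/token")
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**20:.0f} MB")
```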

## Model Details

- **Architecture:** Mamba (SSM)
- **Parameters:** ~115 Million
- **Training Data:** Turkish Wikipedia (Nov 2023)
- **Context Length:** 1024
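
The card lists the parameter count and context length but not the layer geometry. As a rough guide to how a Mamba model of this size can be instantiated from scratch with mamba_ssm, here is a sketch; `d_model=768`, `n_layer=24`, and `vocab_size=32000` are assumptions chosen to land near ~115M parameters, not the published configuration.

```python
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Hypothetical hyperparameters: the card does not publish d_model / n_layer /
# vocab_size, so these values are assumptions that land near ~115M parameters.
config = MambaConfig(
    d_model=768,
    n_layer=24,
    vocab_size=32000,  # assumed Turkish tokenizer size
)
model = MambaLMHeadModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")
```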

## Usage (Requires mamba-ssm)

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import PreTrainedTokenizerFast

# 1. Load Tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("oguzatas/mamba-tr-project-tokenizer")

# 2. Load Mamba Model (manual load required for the custom architecture)
model = MambaLMHeadModel.from_pretrained("oguzatas/mamba-tr-project-mamba", dtype=torch.float32, device="cuda")

# 3. Generate
text = "Yapay zeka teknolojileri"
input_ids = tokenizer.encode(text, return_tensors="pt").to("cuda")
output = model.generate(input_ids, max_length=50, cg=True)  # cg=True enables CUDA Graphs

print(tokenizer.decode(output[0]))
```
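
If greedy decoding is too repetitive, mamba_ssm's GenerationMixin also accepts sampling arguments. The snippet below is a sketch that continues from the usage example above (reusing `model`, `tokenizer`, and `input_ids`); the specific `temperature`/`top_k`/`top_p` values are illustrative, and argument names may differ across mamba-ssm versions.

```python
# Sampling instead of greedy decoding (continues from the usage example above).
output = model.generate(
    input_ids,
    max_length=100,
    cg=True,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(output[0]))
```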