SpeechT5 Crimean Tatar TTS - Sevil (Female Voice)

A text-to-speech model for the Crimean Tatar language (Qırımtatar tili), based on Microsoft SpeechT5.

Model Details

Base Model: microsoft/speecht5_tts
Voice: Sevil (female)
Training Data: 1,566 audio recordings
Text Format: Phonetic v4 mapping (Latin → English phonetic)

Phonetic Mapping

This model uses the phonetic v4 mapping to convert Crimean Tatar Latin script into an English-friendly phonetic representation before the text is passed to the processor:

Original	Phonetic	Sound
ğ	gh	soft g
ç	ch	ch
ş	sh	sh
ñ	ng	ng
ı	y	ы
ö	o	o
ü	u	u
j	zh	ж
c	dj	дж
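
The table above can be applied in a single pass with `str.translate`. This avoids a pitfall of chained `str.replace` calls: once ç has become `ch`, a later `c → dj` rule would rewrite it to `djh`. The sketch below is illustrative only. The names `PHONETIC_TABLE`, `map_phonetic`, and `map_chained` and the sample string are assumptions, and only the lowercase rows of the table are covered.

```python
# Sketch of the phonetic mapping from the table above (names are hypothetical).
PHONETIC_TABLE = {
    "ğ": "gh", "ç": "ch", "ş": "sh", "ñ": "ng", "ı": "y",
    "ö": "o", "ü": "u", "j": "zh", "c": "dj",
}

def map_phonetic(text: str) -> str:
    """Apply every rule in one pass so no rule sees another rule's output."""
    return text.translate(str.maketrans(PHONETIC_TABLE))

def map_chained(text: str) -> str:
    """Chained replace in table order, shown only to demonstrate the pitfall."""
    for old, new in PHONETIC_TABLE.items():
        text = text.replace(old, new)
    return text

print(map_phonetic("ağaç"))  # aghach
print(map_chained("ağaç"))   # aghadjh (the c in "ch" was rewritten by c → dj)
```

If the chained form is preferred, placing the `c → dj` rule before `ç → ch` gives the same result as the single-pass version.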

Usage

import soundfile as sf
import torch
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

# Load the fine-tuned model, its processor, and the HiFi-GAN vocoder
model_id = "servinosmanov/speecht5-crh-sevil"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Load speaker embedding (expects speaker_embedding.pt in the working directory)
speaker_embedding = torch.load("speaker_embedding.pt")

# Phonetic mapping, applied in a single pass so that the "c" in "ch" (from ç)
# is not rewritten again by the c -> dj rule
PHONETIC_MAP = str.maketrans({
    'ğ': 'gh', 'Ğ': 'GH',
    'ç': 'ch', 'Ç': 'CH',
    'ş': 'sh', 'Ş': 'SH',
    'ñ': 'ng', 'Ñ': 'NG',
    'ı': 'y', 'I': 'Y',
    'ö': 'o', 'Ö': 'O',  # uppercase forms added to mirror the lowercase rules
    'ü': 'u', 'Ü': 'U',
    'j': 'zh', 'J': 'ZH',
    'c': 'dj', 'C': 'DJ',
})

def to_phonetic(text):
    return text.translate(PHONETIC_MAP)

# Generate speech from the phonetically mapped text
text = to_phonetic("Selam aleykum")
inputs = processor(text=text, return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embedding.unsqueeze(0), vocoder=vocoder)

# Save audio at the 16 kHz sample rate
sf.write("output.wav", speech.numpy(), samplerate=16000)

Training

Epochs: 300
Batch size: 4
Gradient accumulation: 8
Learning rate: 1e-4
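
Batch size and gradient accumulation together determine how many recordings contribute to each optimizer update. The arithmetic below is a rough sketch that assumes a single device and that all 1,566 recordings were used for training with no held-out split. Neither assumption is stated in this card, so the step counts are approximate.

```python
import math

# Values taken from this card
num_recordings = 1566
epochs = 300
batch_size = 4
grad_accum_steps = 8

# Assumptions: one device, no validation split
effective_batch = batch_size * grad_accum_steps                   # 32
updates_per_epoch = math.ceil(num_recordings / effective_batch)   # 49
total_updates = updates_per_epoch * epochs                        # 14700

print(effective_batch, updates_per_epoch, total_updates)
```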

License

Apache 2.0

Citation

Osmanov, Servin (2025). speecht5-crh-sevil: Crimean Tatar Text-to-Speech Model. Hugging Face. https://huggingface.co/servinosmanov/speecht5-crh-sevil

Dataset: tts-crh-sevil-fixed

Format: Safetensors
Model size: 0.1B params
Tensor type: F32
