SpeechT5 Crimean Tatar TTS - Sevil (Female Voice)
A text-to-speech model for the Crimean Tatar language (Qırımtatar tili), based on Microsoft SpeechT5.
Model Details
- Base Model: microsoft/speecht5_tts
- Voice: Sevil (female)
- Training Data: 1,566 audio recordings
- Text Format: Phonetic v4 mapping (Latin → English phonetic)
Phonetic Mapping
This model uses the phonetic v4 mapping to convert Crimean Tatar Latin script into an English-friendly phonetic representation:
| Original | Phonetic | Sound |
|---|---|---|
| ğ | gh | soft g |
| ç | ch | ch |
| ş | sh | sh |
| ñ | ng | ng |
| ı | y | ы |
| ö | o | o |
| ü | u | u |
| j | zh | ж |
| c | dj | дж |
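The table can be applied in a single pass over the text. The following is a minimal sketch, assuming the table above is the complete lowercase mapping; the names `PHONETIC_V4`, `map_lowercase`, and `map_sequential` are illustrative and are not necessarily what was used to prepare the training data. It also shows why chained `str.replace` calls need care: the `c` produced by `ç → ch` can be picked up again by the `c → dj` rule.

```python
# Mapping copied from the table above; the name PHONETIC_V4 is illustrative.
PHONETIC_V4 = {
    "ğ": "gh", "ç": "ch", "ş": "sh", "ñ": "ng", "ı": "y",
    "ö": "o", "ü": "u", "j": "zh", "c": "dj",
}

# str.translate visits each input character exactly once, so letters that a
# rule produces are never remapped by another rule.
_TABLE = str.maketrans(PHONETIC_V4)


def map_lowercase(text: str) -> str:
    """Apply the table to lowercase Crimean Tatar Latin text in one pass."""
    return text.translate(_TABLE)


def map_sequential(text: str) -> str:
    """Naive variant for contrast: chained str.replace in table order."""
    for old, new in PHONETIC_V4.items():
        text = text.replace(old, new)
    return text


print(map_lowercase("qırımtatar tili"))  # qyrymtatar tili
print(map_lowercase("çoq"))              # choq
print(map_sequential("çoq"))             # djhoq ('ç' -> 'ch', then 'c' -> 'dj')
```

If sequential replacement is preferred, ordering the rules so that `j` is handled before `c`, and `c` before `ç`, avoids the cascade.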
Usage
import soundfile as sf
import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# Load the fine-tuned model, its processor, and the stock HiFi-GAN vocoder
model_id = "servinosmanov/speecht5-crh-sevil"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Load speaker embedding (expects speaker_embedding.pt in the working directory)
speaker_embedding = torch.load("speaker_embedding.pt")

# Phonetic mapping function
def to_phonetic(text):
    # Replacements run sequentially, so order matters: 'j' is mapped before 'c'
    # (which produces a 'j'), and 'c' before 'ç' (which produces a 'c').
    # Uppercase Ö and Ü are included for symmetry with the other letters.
    replacements = [
        ('ğ', 'gh'), ('Ğ', 'GH'),
        ('ş', 'sh'), ('Ş', 'SH'),
        ('ñ', 'ng'), ('Ñ', 'NG'),
        ('ı', 'y'), ('I', 'Y'),
        ('ö', 'o'), ('Ö', 'O'),
        ('ü', 'u'), ('Ü', 'U'),
        ('j', 'zh'), ('J', 'ZH'),
        ('c', 'dj'), ('C', 'DJ'),
        ('ç', 'ch'), ('Ç', 'CH'),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text

# Generate speech from the phonetic form of the input text
text = "Selam aleykum"
inputs = processor(text=to_phonetic(text), return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embedding.unsqueeze(0), vocoder=vocoder)

# Save audio at the 16 kHz sample rate used above
sf.write("output.wav", speech.numpy(), samplerate=16000)
Training
- Epochs: 300
- Batch size: 4
- Gradient accumulation: 8
- Learning rate: 1e-4
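For orientation, the settings above imply an effective batch size of 32. The sketch below works through the arithmetic; it assumes a single device and that all 1,566 recordings were used for training (no held-out split), neither of which is stated here, so the step counts are rough estimates only.

```python
import math

# Values taken from the Model Details and Training lists
num_recordings = 1566
batch_size = 4
grad_accum = 8
epochs = 300

# Assumptions (not stated in this card): one device, no validation split
effective_batch = batch_size * grad_accum
updates_per_epoch = math.ceil(num_recordings / effective_batch)
total_updates = updates_per_epoch * epochs

print(effective_batch, updates_per_epoch, total_updates)  # 32 49 14700
```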
License
Apache 2.0
Citation
Osmanov, Servin (2025). speecht5-crh-sevil: Crimean Tatar Text-to-Speech Model. HuggingFace. https://huggingface.co/servinosmanov/speecht5-crh-sevil
Related
- Dataset: tts-crh-sevil-fixed