SpeechT5 Crimean Tatar TTS - Sevil (Female Voice)

A text-to-speech model for the Crimean Tatar language (Qırımtatar tili), based on Microsoft SpeechT5.

Model Details

  • Base Model: microsoft/speecht5_tts
  • Voice: Sevil (female)
  • Training Data: 1,566 audio recordings
  • Text Format: Phonetic v4 mapping (Latin → English phonetic)

Phonetic Mapping

The model uses the phonetic v4 mapping to convert Crimean Tatar Latin script into an English-friendly phonetic representation:

Original   Phonetic   Sound
ğ          gh         soft g
ç          ch         ch
ş          sh         sh
ñ          ng         ng
ı          y          ы
ö          o          o
ü          u          u
j          zh         ж
c          dj         дж
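
The table can be applied in a single pass over the text, which matters because some outputs contain letters that are themselves inputs to other rules (ç becomes "ch", and c becomes "dj"). The sketch below is an illustration, not the author's preprocessing code: the names `PHONETIC_V4`, `to_phonetic_single_pass`, and `to_phonetic_sequential` are hypothetical, only the lowercase rows of the table are covered, and the sample strings are chosen purely to exercise the rules.

```python
# Lowercase rows of the phonetic v4 table (uppercase handling is omitted here).
PHONETIC_V4 = {
    "ğ": "gh", "ç": "ch", "ş": "sh", "ñ": "ng", "ı": "y",
    "ö": "o", "ü": "u", "j": "zh", "c": "dj",
}

# str.translate looks at each input character exactly once,
# so the output of one rule is never rewritten by another.
_TABLE = str.maketrans(PHONETIC_V4)

def to_phonetic_single_pass(text: str) -> str:
    return text.translate(_TABLE)

def to_phonetic_sequential(text: str) -> str:
    # Chained str.replace calls, shown only to illustrate the pitfall:
    # the "c" rule runs after the "ç" rule and rewrites its output.
    for old, new in PHONETIC_V4.items():
        text = text.replace(old, new)
    return text

print(to_phonetic_single_pass("çoq"))   # choq
print(to_phonetic_sequential("çoq"))    # djhoq
print(to_phonetic_single_pass("dağ"))   # dagh
print(to_phonetic_single_pass("cuma"))  # djuma
```

With chained replacements, the order of the rules changes the result; the single-pass version always matches the table row by row.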

Usage

import soundfile as sf
import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# Load the model, its processor, and the SpeechT5 HiFi-GAN vocoder
model_id = "servinosmanov/speecht5-crh-sevil"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Load the speaker embedding (speaker_embedding.pt must be in the working directory)
speaker_embedding = torch.load("speaker_embedding.pt")

# Phonetic v4 mapping, applied in a single pass so that the output of one
# rule (ç -> ch) is not rewritten by a later rule (c -> dj)
PHONETIC_MAP = str.maketrans({
    'ğ': 'gh', 'Ğ': 'GH',
    'ç': 'ch', 'Ç': 'CH',
    'ş': 'sh', 'Ş': 'SH',
    'ñ': 'ng', 'Ñ': 'NG',
    'ı': 'y', 'I': 'Y',
    'ö': 'o', 'ü': 'u',
    'j': 'zh', 'J': 'ZH',
    'c': 'dj', 'C': 'DJ',
})

def to_phonetic(text):
    return text.translate(PHONETIC_MAP)

# Generate speech from the phonetic form of the input text
text = "Selam aleykum"
inputs = processor(text=to_phonetic(text), return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embedding.unsqueeze(0), vocoder=vocoder)

# Save audio (SpeechT5 produces 16 kHz output)
sf.write("output.wav", speech.numpy(), samplerate=16000)

Training

  • Epochs: 300
  • Batch size: 4
  • Gradient accumulation: 8
  • Learning rate: 1e-4
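
With a batch size of 4 and 8 gradient-accumulation steps, each optimizer update sees 32 examples. The arithmetic below is a rough sketch of what that implies for the number of updates. It assumes that all 1,566 recordings were used for training, on a single device, with no held-out split, and that a final partial batch still triggers an update; none of this is stated in the card, and the variable names are illustrative.

```python
import math

num_recordings = 1566    # from Model Details
batch_size = 4           # per-step batch size
grad_accum_steps = 8     # gradient accumulation
epochs = 300

# Examples contributing to each optimizer update
effective_batch_size = batch_size * grad_accum_steps

# Assumption: the last, partial batch of an epoch still counts as one update
updates_per_epoch = math.ceil(num_recordings / effective_batch_size)
total_updates = updates_per_epoch * epochs

print(effective_batch_size, updates_per_epoch, total_updates)  # 32 49 14700
```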

License

Apache 2.0

Citation

Osmanov, Servin (2025). speecht5-crh-sevil: Crimean Tatar Text-to-Speech Model. HuggingFace. https://huggingface.co/servinosmanov/speecht5-crh-sevil
