Multilingual Whisper (Uz/En/Ru): Fine-tuned Speech-to-Text Model
A Whisper Small model fine-tuned to transcribe Uzbek, English, and Russian with comparable accuracy across the three languages.
The model is intended for real-world speech transcription; it was trained on a balanced multilingual dataset and performs competitively against strong open-source and commercial STT solutions.
Model Details
Model Description
This model extends OpenAI Whisper Small by fine-tuning it on a multilingual speech mixture, with the aim of delivering robust ASR performance for Uzbek, English, and Russian speakers.
The goal was to reduce the performance gap between languages, especially improving Uzbek speech recognition, where public ASR resources are scarce.
- Model type: Automatic Speech Recognition (ASR)
- Language(s): Uzbek 🇺🇿, English 🇬🇧, Russian 🇷🇺
- License: Apache-2.0
- Finetuned from: openai/whisper-small
- Intended usage: Real-time & offline speech-to-text
Training datasets:
- DavronSherbaev/uzbekvoice-filtered
- telegram-voice-messages (private collection)
- navaistt-open-datasets
- sovaai/russian-audiobooks
- librispeech
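The exact training mixture and preprocessing are not published here. As a rough sketch, a balanced multilingual mix can be assembled with the `datasets` library; the split names, column names, and mixing probabilities below are assumptions for illustration, not the recipe actually used:

```python
from datasets import load_dataset, interleave_datasets, Audio

# Public Uzbek corpus listed above; split and column names are assumed, not verified.
uz = load_dataset("DavronSherbaev/uzbekvoice-filtered", split="train", streaming=True)
# English portion, using the standard LibriSpeech release on the Hub.
en = load_dataset("librispeech_asr", "clean", split="train.100", streaming=True)

# Whisper expects 16 kHz mono audio.
uz = uz.cast_column("audio", Audio(sampling_rate=16_000))
en = en.cast_column("audio", Audio(sampling_rate=16_000))

# Interleave so each language contributes roughly equally during training.
mixed = interleave_datasets([uz, en], probabilities=[0.5, 0.5], seed=42)
```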
Evaluation
Word Error Rate (WER) Comparison
All WER results were obtained on the same test set: approximately 2 hours of real-world voice messages collected from public Telegram groups. The dataset will be made publicly available soon.
| Model | WER ↓ |
|---|---|
| Whisper-small-uz-v1 (this model) | 34.5% |
| Gemini (Commercial) | 36.21% |
| NavaiSTT v2 (Open-Source medium model) | 35.14% |
| Aisha STT (Commercial) | 41.71% |
The model achieves the lowest WER among the compared commercial and open-source Uzbek STT solutions, showing strong generalization to informal, real-world speech.
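Once the test set is released, WER can be recomputed with the Hugging Face `evaluate` library; the transcripts below are placeholders for illustration:

```python
import evaluate

wer_metric = evaluate.load("wer")

# references: ground-truth transcripts, predictions: model outputs (placeholder examples)
references = ["salom dunyo", "hello world"]
predictions = ["salom dunye", "hello world"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2%}")
```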
Usage Example
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

model_id = "OvozifyLabs/whisper-small-uz-v1"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load audio, downmix to mono, and resample to the 16 kHz input Whisper expects
audio, sr = torchaudio.load("audio.wav")
audio = audio.mean(dim=0)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
    sr = 16000

inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```
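Whisper detects the spoken language automatically. To pin the output to a specific language (for example Uzbek), the decoder prompt can be forced; a minimal sketch reusing the `processor`, `model`, and `inputs` objects from the block above (recent `transformers` versions also accept `language=` and `task=` directly in `generate`):

```python
# Force Uzbek transcription instead of relying on automatic language detection.
forced_decoder_ids = processor.get_decoder_prompt_ids(language="uzbek", task="transcribe")

with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        forced_decoder_ids=forced_decoder_ids,
    )

print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```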