Multilingual Whisper (Uz/En/Ru) — Fine-tuned Speech-to-Text Model

A fine-tuned Whisper Small model optimized to transcribe Uzbek, English, and Russian equally well.
Trained on a balanced multilingual dataset, it is intended for real-world speech transcription and performs competitively against strong open-source and commercial STT solutions.

Model Details

Model Description

This model extends OpenAI Whisper Small by fine-tuning it on a multilingual speech mixture, aiming to deliver robust ASR performance for Uzbek, English, and Russian speakers.
The goal was to reduce the performance gap between languages, in particular by improving Uzbek speech recognition, for which public ASR resources are scarce.

Model type: Automatic Speech Recognition (ASR)
Language(s): Uzbek 🇺🇿, English 🇬🇧, Russian 🇷🇺
License: Apache-2.0
Finetuned from: openai/whisper-small
Intended usage: Real-time & offline speech-to-text

Training datasets:

DavronSherbaev/uzbekvoice-filtered
telegram-voice-messages (private collection)
navaistt-open-datasets
sovaai/russian-audiobooks
librispeech

Evaluation

Word Error Rate (WER) Comparison

All WER results were obtained using the same test set. The test set consists of real-world voice messages collected from public Telegram groups. It contains approximately 2 hours of audio data in total. The dataset will be made publicly available soon.
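The card does not spell out how WER was computed. The standard definition is WER = (S + D + I) / N. Here S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment, and N is the number of reference words. Over a whole test set, edits and reference words are usually summed across utterances before dividing. The sketch below implements that definition with a word-level Levenshtein distance. It is an illustration, not the authors' evaluation script. It splits on whitespace and applies no text normalization (casing, punctuation, Latin/Cyrillic script handling for Uzbek), and any of these choices can shift the reported numbers. The function names `word_edits` and `corpus_wer` are hypothetical.

```python
def word_edits(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (or match if equal)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        total_edits += word_edits(reference, hypothesis)
        total_words += len(reference.split())
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words
```

For example, `corpus_wer([("men bugun uyga boraman", "men bugun uyda boraman")])` gives 0.25: one substitution over four reference words.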

| Model | WER ↓ (lower is better) |
|---|---|
| Whisper-small-uz-v1 | 34.5% |
| Gemini (Commercial) | 36.21% |
| NavaiSTT v2 (Open-Source medium model) | 35.14% |
| Aisha STT (Commercial) | 41.71% |

On this test set, the model achieves the lowest WER of the systems compared, outperforming both commercial and open-source Uzbek STT models. This suggests good generalization to informal real-world speech.
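The margins in the comparison table vary considerably in size. The short script below computes the absolute (percentage-point) and relative WER reductions from the reported figures. The variable and function names are illustrative.

```python
# WER values (%) copied from the comparison table above.
OURS = 34.5
BASELINES = {
    "Gemini (Commercial)": 36.21,
    "NavaiSTT v2 (Open-Source medium model)": 35.14,
    "Aisha STT (Commercial)": 41.71,
}


def relative_reduction(ours: float, baseline: float) -> float:
    """Relative WER reduction of `ours` versus `baseline`, in percent."""
    return (baseline - ours) / baseline * 100


for name, wer in BASELINES.items():
    absolute = wer - OURS
    relative = relative_reduction(OURS, wer)
    print(f"{name}: {absolute:.2f} pts absolute, {relative:.1f}% relative")
```

Based on these figures, the gap to NavaiSTT v2 is 0.64 points (about 1.8% relative). The relative gaps are about 4.7% for Gemini and 17.3% for Aisha STT. The card reports no confidence intervals, so the smallest margin in particular should be read with care on a test set of roughly 2 hours.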

Usage Example

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

model_id = "OvozifyLabs/whisper-small-uz-v1"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Whisper's feature extractor expects 16 kHz mono audio as a 1-D array.
audio, sr = torchaudio.load("audio.wav")  # shape: (channels, samples)
audio = audio.mean(dim=0)  # downmix to mono
if sr != 16000:
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16000)

inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```
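The checkpoint targets three languages, so you may want to fix the decoding language yourself instead of relying on Whisper's automatic language detection. Below is a minimal sketch. It assumes a recent `transformers` release in which Whisper's `generate` accepts `language` and `task` keyword arguments. The helper `generate_kwargs_for` and its language table are hypothetical. The card does not say whether the model was fine-tuned with language tokens, so compare the output with and without pinning.

```python
# Hypothetical helper (not part of the model card): Whisper language codes
# for the three languages this checkpoint targets.
WHISPER_LANGUAGE_CODES = {"uzbek": "uz", "english": "en", "russian": "ru"}


def generate_kwargs_for(language: str) -> dict:
    """Return keyword arguments that pin Whisper's decoder to one language."""
    key = language.strip().lower()
    if key not in WHISPER_LANGUAGE_CODES:
        supported = ", ".join(sorted(WHISPER_LANGUAGE_CODES))
        raise ValueError(f"unsupported language {language!r}; expected one of: {supported}")
    # task="transcribe" keeps the output in the spoken language (no translation).
    return {"language": WHISPER_LANGUAGE_CODES[key], "task": "transcribe"}
```

With the `model` and `inputs` from the usage example, the call becomes `model.generate(inputs.input_features, **generate_kwargs_for("uzbek"))`.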


Model size: 0.2B params (Safetensors, F32 tensors)
