metadata
language:
- az
license: apache-2.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- whisper
- azerbaijani
- asr
- speech
- fine-tuned
base_model: openai/whisper-small
datasets:
- LocalDoc/azerbaijani_asr
- LocalDoc/fleurs-azerbaijani-asr
metrics:
- wer
- cer
model-index:
- name: azerbaijani-whisper-small
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      type: LocalDoc/fleurs-azerbaijani-asr
      name: FLEURS Azerbaijani
      split: test
    metrics:
    - type: wer
      value: 20.54
      name: WER
    - type: cer
      value: 5.72
      name: CER
Azerbaijani Whisper Small
Fine-tuned openai/whisper-small for Azerbaijani automatic speech recognition.
Performance
| Model | Params | WER | CER |
|---|---|---|---|
| whisper-small (baseline) | 242M | 52.17% | 14.52% |
| whisper-medium (baseline) | 769M | 34.54% | 9.00% |
| whisper-large-v3 (baseline) | 1543M | 21.00% | 5.51% |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% |
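This model achieves a lower WER than whisper-large-v3 (20.54% vs. 21.00%) at a slightly higher CER (5.72% vs. 5.51%), while being roughly 6x smaller (242M vs. 1543M parameters).
All models were evaluated on the FLEURS Azerbaijani test set.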
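For readers who want to reproduce numbers like those in the table, here is a minimal sketch of corpus-level WER and CER. The evaluation script behind this card is not shown, so everything here is an assumption: the function names (`normalize`, `edit_distance`, `error_rate`) are hypothetical, the normalization is plain lowercasing plus ASCII punctuation removal, and spaces count as characters for CER.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace.

    Assumption: this only approximates the card's normalization. Python's
    str.lower() is not locale-aware, so the Azerbaijani dotted/dotless I
    pair is not handled specially here.
    """
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit: str = "word") -> float:
    """Corpus-level error rate in percent: total edits / total reference units.

    unit="word" gives WER, unit="char" gives CER (spaces included).
    """
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        ref, hyp = normalize(ref), normalize(hyp)
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return 100.0 * edits / total


refs = ["Salam, dünya!"]
hyps = ["salam dunya"]
print(error_rate(refs, hyps, "word"))  # one of two words is wrong
print(error_rate(refs, hyps, "char"))  # one of eleven characters is wrong
```

Because insertions count as errors, WER computed this way can exceed 100% when a hypothesis contains many more words than the reference.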
Usage
pip install --upgrade transformers
import numpy as np
import librosa
import soundfile as sf
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the fine-tuned processor and model from the Hub
processor = WhisperProcessor.from_pretrained("LocalDoc/azerbaijani-whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("LocalDoc/azerbaijani-whisper-small")

# Read the audio and downmix multi-channel input to mono
audio, sr = sf.read("audio.wav")
if len(audio.shape) > 1:
    audio = audio.mean(axis=1)

# Resample to the 16 kHz rate the model expects
audio = librosa.resample(np.asarray(audio, dtype=np.float32), orig_sr=sr, target_sr=16000)
sr = 16000

# Extract input features and force Azerbaijani transcription
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
forced_ids = processor.get_decoder_prompt_ids(language="az", task="transcribe")

with torch.no_grad():
    ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(text)
Note: Audio must be 16 kHz mono. If your audio has a different sample rate, use librosa.resample() as shown above. Passing audio without resampling will produce incorrect results.
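To check up front whether a file already meets the 16 kHz mono requirement, a quick header inspection can help. The sketch below uses Python's standard `wave` module; the helper name `needs_conversion` is hypothetical, and it assumes the input is an uncompressed PCM WAV file (`wave` raises `wave.Error` for other encodings).

```python
import wave


def needs_conversion(path: str, target_sr: int = 16000) -> bool:
    """Return True if the PCM WAV at `path` is not already mono at `target_sr`."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() != 1 or wav.getframerate() != target_sr
```

If `needs_conversion("audio.wav")` returns `True`, the file needs the downmix and resampling steps from the usage example before it is passed to the processor.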
Requirements
pip install transformers torch soundfile librosa
Benchmark Details
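All models were evaluated on the FLEURS Azerbaijani test split (921 samples) with the same text normalization (lowercased, punctuation removed). RTF is the real-time factor, i.e. processing time divided by audio duration, so lower is faster.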
| Model | Params | WER | CER | RTF (GPU) |
|---|---|---|---|---|
| whisper-tiny | 38M | 104.48% | 53.93% | 0.033 |
| whisper-base | 73M | 82.63% | 30.35% | 0.032 |
| whisper-small | 242M | 52.17% | 14.52% | 0.053 |
| whisper-medium | 769M | 34.54% | 9.00% | 0.097 |
| whisper-large-v3 | 1543M | 21.00% | 5.51% | 0.129 |
| whisper-large-v3-turbo | 809M | 22.99% | 6.55% | 0.024 |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% | ~0.05 |
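The RTF column can be approximated with a simple timing wrapper. This is a minimal sketch, not the measurement procedure used for the table: the card does not state the hardware, batch size, or warm-up policy, and `real_time_factor` and its `transcribe` callback are hypothetical names.

```python
import time


def real_time_factor(transcribe, audio, audio_seconds: float) -> float:
    """Wall-clock time of transcribe(audio) divided by the audio duration.

    Assumption: `transcribe` is any callable that blocks until the
    transcription is finished.
    """
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Stand-in workload: pretend a 1-second clip takes about 50 ms to process.
rtf = real_time_factor(lambda _audio: time.sleep(0.05), None, audio_seconds=1.0)
print(f"RTF: {rtf:.3f}")
```

When timing a real model on GPU, call `torch.cuda.synchronize()` inside the callback before it returns, since CUDA kernels run asynchronously, and discard the first run as warm-up.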
License
Apache 2.0