How to use JoanKinoti/afrihubert-kikuyu-luhya with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="JoanKinoti/afrihubert-kikuyu-luhya")

# Load model directly
from transformers import AutoProcessor, AutoModelForCTC

processor = AutoProcessor.from_pretrained("JoanKinoti/afrihubert-kikuyu-luhya")
model = AutoModelForCTC.from_pretrained("JoanKinoti/afrihubert-kikuyu-luhya")
```
AfriHuBERT: Kikuyu + Luhya ASR
Model Card for JoanKinoti/afrihubert-kikuyu-luhya
A multilingual automatic speech recognition (ASR) model for Kikuyu and Luhya, two Bantu languages spoken primarily in Kenya. Built on top of AfriHuBERT using a two-stage fine-tuning approach with LoRA adaptation to prevent catastrophic forgetting.
Model Details
Model Description
This model performs automatic speech recognition for two Kenyan Bantu languages, Kikuyu and Luhya. It was fine-tuned from AfriHuBERT using a two-stage approach: first fully fine-tuned on Kikuyu, then LoRA-adapted jointly on both Kikuyu and Luhya to add Luhya support without catastrophic forgetting of Kikuyu.
- Developed by: Joan Kinoti
- Model type: Automatic Speech Recognition (CTC)
- Language(s): Kikuyu (`ki`), Luhya (`luy`)
- License: MIT
- Finetuned from: ajesujoba/AfriHuBERT
Model Sources
- Repository: https://huggingface.co/JoanKinoti/afrihubert-kikuyu-luhya
Direct Use
Transcribe spoken Kikuyu or Luhya audio to text. Suitable for:
- Voice interfaces for Kenyan languages
- Transcription pipelines for Kikuyu and Luhya audio content
- Research on low-resource African language ASR
Downstream Use
Can be integrated into larger pipelines for translation, keyword spotting, or voice-controlled applications targeting Kikuyu and Luhya speakers.
Bias, Risks, and Limitations
- Trained on a limited dataset; performance may degrade on out-of-domain speakers or recording conditions
- Character-level vocabulary may struggle with loanwords or proper nouns not seen during training
- Performance may vary across dialects within Kikuyu and Luhya
How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
import soundfile as sf
from transformers import HubertForCTC, Wav2Vec2Processor

model = HubertForCTC.from_pretrained("JoanKinoti/afrihubert-kikuyu-luhya")
processor = Wav2Vec2Processor.from_pretrained("JoanKinoti/afrihubert-kikuyu-luhya")

def transcribe(audio_path: str) -> str:
    audio, sr = sf.read(audio_path)
    # Resample to 16 kHz if needed
    if sr != 16000:
        import torchaudio
        audio = torchaudio.functional.resample(
            torch.tensor(audio).unsqueeze(0), sr, 16000
        ).squeeze(0).numpy()
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]

print(transcribe("your_audio.wav"))
```
Training Details
Training Data
- Kikuyu: MCAA1-MSU/anv_data_ke, used for the initial full fine-tune on Kikuyu speech.
- Luhya: Mozilla Data Collective (Luhya), used jointly with Kikuyu during the LoRA adaptation stage.
Both datasets consist of audio recordings with corresponding transcriptions. Audio was resampled to 16kHz mono and transcriptions were character-level tokenized using a shared 37-token vocabulary covering both languages.
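The character-level tokenization step can be sketched as below. This is an illustrative reconstruction only: the actual 37-token vocabulary is not reproduced in this card, and the special tokens follow common Wav2Vec2/CTC conventions rather than anything the card specifies.

```python
# Illustrative sketch: building a shared character-level CTC vocabulary
# from transcriptions in both languages. Special-token choices ("|",
# "[UNK]", "[PAD]") are assumptions following Wav2Vec2 conventions.
def build_char_vocab(transcriptions):
    """Map each unique character to an id, plus CTC special tokens."""
    chars = sorted(set("".join(t.lower() for t in transcriptions)))
    vocab = {c: i for i, c in enumerate(chars)}
    # "|" replaces space as the word delimiter; "[PAD]" doubles as the
    # CTC blank token.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab

# Example with placeholder strings standing in for real transcriptions:
print(build_char_vocab(["mwega muno", "bukhala"]))
```

A vocabulary built this way from text in both languages gives the single shared token set the two-stage training relies on.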
Training Procedure
Stage 1 (full fine-tune on Kikuyu): AfriHuBERT was fully fine-tuned on Kikuyu speech data, producing a strong Kikuyu ASR checkpoint.
Stage 2 (joint LoRA adaptation on Kikuyu + Luhya): LoRA adapters were applied on top of the Kikuyu checkpoint and trained jointly on both languages using balanced sampling to prevent catastrophic forgetting. The final LoRA adapters were merged into the base model for deployment.
Preprocessing
- Audio resampled to 16kHz mono
- Peak normalization with soft tanh clipping
- Maximum audio length: 20 seconds
- Character-level tokenization with 37-token joint vocabulary
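The audio-side steps above can be sketched as follows. "Soft tanh clipping" is interpreted here as scaling to unit peak and then bounding the waveform with tanh; the exact constants and formulation used in training are not stated in the card, so treat this as an assumption.

```python
# Sketch of the preprocessing bullets above. The tanh-clipping
# interpretation and the epsilon constant are assumptions.
import numpy as np

def normalize_audio(audio: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    peak = np.max(np.abs(audio))
    audio = audio / (peak + eps)   # peak normalization toward [-1, 1]
    return np.tanh(audio)          # soft clipping keeps values in (-1, 1)

def truncate(audio: np.ndarray, sr: int = 16000, max_seconds: float = 20.0) -> np.ndarray:
    return audio[: int(sr * max_seconds)]  # enforce the 20-second maximum
```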
Training Hyperparameters
- Training regime: fp16 mixed precision
- Optimizer: AdamW
- LM head learning rate: 1e-3
- LoRA layers learning rate: 3e-5
- Warmup ratio: 0.1
- Training steps: 43,200
- Trainable parameters: 618,277 (~0.65% of total)
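The two learning rates listed above (1e-3 for the LM head, 3e-5 for the LoRA layers) can be expressed with PyTorch optimizer parameter groups, as in the sketch below. Matching parameters by the substring `"lm_head"` is an assumption about how the model names its modules.

```python
# Sketch: AdamW with separate learning rates for the CTC head and the
# LoRA adapter layers, via parameter groups. Name matching is assumed.
import torch

def build_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    head_params, lora_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue  # frozen base weights are skipped entirely
        (head_params if "lm_head" in name else lora_params).append(p)
    return torch.optim.AdamW([
        {"params": head_params, "lr": 1e-3},  # LM (CTC) head
        {"params": lora_params, "lr": 3e-5},  # LoRA adapter layers
    ])
```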
Evaluation
Testing Data, Factors & Metrics
Metrics
Word Error Rate (WER): the word-level edit distance between predicted and reference transcriptions, divided by the number of reference words. Lower is better.
Character Error Rate (CER): the character-level edit distance (substitutions, deletions, and insertions) divided by the total number of characters in the reference text. Lower is better.
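Both metrics reduce to a Levenshtein distance over different units, which a minimal reference implementation makes concrete (the card does not say which scoring tool was used, so this is purely illustrative):

```python
# Minimal reference implementation of WER and CER as defined above:
# edit distance over words / characters, divided by reference length.
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(ref: str, hyp: str) -> float:
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref: str, hyp: str) -> float:
    return edit_distance(list(ref), list(hyp)) / len(ref)

# One substituted character turns one word of three wrong, so WER far
# exceeds CER, mirroring the gap seen in the results table:
print(wer("mwega muno ni", "mwega mumo ni"), cer("mwega muno ni", "mwega mumo ni"))
```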
Results
| Language | Samples | WER | CER |
|---|---|---|---|
| Kikuyu | 2,033 | 39.9% | 8.8% |
| Luhya | 668 | 44.2% | 13.6% |
Summary
The model achieves strong character-level accuracy on both languages, with a Kikuyu CER of 8.8% and a Luhya CER of 13.6%. The higher WER relative to CER is expected for a character-level CTC model: individual character substitutions cause entire word mismatches. Sample predictions show the model captures most phonetic content correctly, with errors mainly on similar-sounding characters and word boundaries.
The results demonstrate that joint LoRA training successfully preserved Kikuyu performance while adding Luhya support, with no catastrophic forgetting observed.
Model Card Authors
Joan Kinoti
Model Card Contact
JoanKinoti on Hugging Face