Instructions for using sudoping01/bambara-asr-llm-exp1-all with libraries and local apps.

Libraries

How to use sudoping01/bambara-asr-llm-exp1-all with PEFT:

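# The repository is gated: accept its access conditions on the Hub and
# authenticate (e.g. `huggingface-cli login`) before downloading.
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model the adapter was fine-tuned from, then attach the LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained("sudoping01/bambara-llm-exp3-merged")
model = PeftModel.from_pretrained(base_model, "sudoping01/bambara-asr-llm-exp1-all")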

Transformers

How to use sudoping01/bambara-asr-llm-exp1-all with Transformers:

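# Use a pipeline as a high-level helper.
# The image prompt below is the Hub's generic placeholder example; this
# adapter was fine-tuned for Bambara ASR.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="sudoping01/bambara-asr-llm-exp1-all")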
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("sudoping01/bambara-asr-llm-exp1-all")
model = AutoModelForImageTextToText.from_pretrained("sudoping01/bambara-asr-llm-exp1-all")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens (the prompt is sliced off).
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use sudoping01/bambara-asr-llm-exp1-all with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sudoping01/bambara-asr-llm-exp1-all"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sudoping01/bambara-asr-llm-exp1-all",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
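The same OpenAI-compatible request can be sent from Python using only the standard library. This is a minimal sketch, assuming a server started as above is listening on `localhost:8000` (pass `http://localhost:30000` for the SGLang server below). The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response parsing assumes the standard chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "sudoping01/bambara-asr-llm-exp1-all"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request("What is the capital of France?"))` mirrors the curl call. Building and sending are kept separate so the request can be inspected without a live server.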

Use Docker

docker model run hf.co/sudoping01/bambara-asr-llm-exp1-all

SGLang

How to use sudoping01/bambara-asr-llm-exp1-all with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sudoping01/bambara-asr-llm-exp1-all" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sudoping01/bambara-asr-llm-exp1-all",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sudoping01/bambara-asr-llm-exp1-all" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sudoping01/bambara-asr-llm-exp1-all",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Axolotl config

axolotl version: 0.12.2

base_model: sudoping01/bambara-llm-exp3-merged
processor_type: AutoProcessor
hub_model_id: sudoping01/bambara-asr-llm-exp1

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true


skip_prepare_dataset: true  
remove_unused_columns: false
sample_packing: false


ddp: true
ddp_find_unused_parameters: true

# Template and tokens
chat_template: gemma3n
eot_tokens:
  - <end_of_turn>
special_tokens:
  eot_token: <end_of_turn>


datasets:
  - path: instruction_dataset_asr_axolotl_format.jsonl
    type: chat_template


val_set_size: 0.01
output_dir: ./outputs/bambara-gemma3n-asr-lora-exp1


adapter: lora
lora_r: 32  # Reduced from 64 for stability
lora_alpha: 64  # Reduced from 128 for stability
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

# Sequence and batch settings - conservative for audio
sequence_len: 4096
pad_to_sequence_len: false
micro_batch_size: 8  # Increased: 8x H100 GPUs can handle larger batches
gradient_accumulation_steps: 2

# Training parameters
num_epochs: 6
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-4  # Slightly higher as per research
warmup_ratio: 0.1  # Increased warmup for multimodal
weight_decay: 0.0  # Set to 0 for multimodal


bf16: true  # Must be true, not auto
tf32: false
load_in_4bit: false  # Keep false for quality
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# Monitoring
logging_steps: 1  # More frequent for debugging
saves_per_epoch: 2
evals_per_epoch: 2

# ASR metrics
metrics:
  - name: wer
  - name: cer
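Two LoRA settings in the config above can be checked by hand. Because `lora_target_modules` is a single string, PEFT treats it as a regular expression that must fully match a module name (via `re.fullmatch`), assuming axolotl passes the string through unchanged; and standard (non-rsLoRA) LoRA in PEFT scales the low-rank update by `lora_alpha / lora_r`. The module names below are illustrative assumptions in the style of Gemma 3n's language-model layers, not a listing of this model's actual modules.

```python
import re

# Pattern copied verbatim from lora_target_modules above.
TARGET_PATTERN = r"model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj"

LORA_R = 32
LORA_ALPHA = 64


def is_lora_target(module_name: str) -> bool:
    """Return True if the module name fully matches the target pattern."""
    return re.fullmatch(TARGET_PATTERN, module_name) is not None


# Hypothetical module names for illustration; not read from the real model.
example_names = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.17.mlp.down_proj",
    "model.language_model.layers.0.self_attn.q_norm",
    "model.audio_tower.conformer.0.attention.attn.q_proj",
]

for name in example_names:
    print(f"{name}: {is_lora_target(name)}")

# Standard LoRA scaling applied to the adapter's low-rank update.
scaling = LORA_ALPHA / LORA_R
print(f"LoRA scaling (alpha / r): {scaling}")
```

Only the attention and MLP projections of the language-model layers match; norms and any non-`language_model` modules are left frozen. The dots in the pattern are unescaped, so each matches any character, which makes the pattern slightly more permissive but does not change the result for names of this shape.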

bambara-asr-llm-exp1

This model is a fine-tuned version of sudoping01/bambara-llm-exp3-merged on the instruction_dataset_asr_axolotl_format.jsonl dataset. It achieves the following results on the evaluation set:

Loss: 0.0887
Memory/max Mem Active (GiB): 18.76
Memory/max Mem Allocated (GiB): 18.76
Memory/device Mem Reserved (GiB): 19.99

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 128
total_eval_batch_size: 64
optimizer: adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 350
training_steps: 3508
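The reported totals follow from the per-device settings: each optimizer step accumulates `gradient_accumulation_steps` micro-batches on each of the `num_devices` GPUs, while evaluation does no accumulation. The sketch below reproduces that arithmetic and the learning-rate curve. It assumes the standard linear-warmup, half-cosine-decay shape (as in Hugging Face's `get_cosine_schedule_with_warmup`, with no minimum-LR floor) and that `warmup_ratio: 0.1` from the config is converted to steps by truncation; the helper names are hypothetical.

```python
import math

# Values taken from the hyperparameter list and the axolotl config above.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 8
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1
TOTAL_STEPS = 3508


def total_train_batch_size(per_device, grad_accum, num_devices):
    """Samples consumed per optimizer step across all GPUs."""
    return per_device * grad_accum * num_devices


def total_eval_batch_size(per_device, num_devices):
    """Samples per evaluation forward pass across all GPUs (no accumulation)."""
    return per_device * num_devices


def warmup_steps(total_steps, ratio):
    """Convert a warmup ratio to whole steps by truncation (assumed rounding)."""
    return int(total_steps * ratio)


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size(PER_DEVICE_TRAIN_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
print("total_eval_batch_size:", total_eval_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))
print("warmup steps:", warmup_steps(TOTAL_STEPS, WARMUP_RATIO))
for s in (0, 175, 350, 1929, 3508):
    print(f"step {s}: lr = {cosine_lr(s):.2e}")
```

Under these assumptions the totals come out to 128 and 64 and the warmup to 350 steps (0.1 × 3508 = 350.8, truncated), matching the reported values; rounding up instead would give 351.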

Training results

Training Loss	Epoch	Step	Validation Loss	Mem Active (GiB)	Mem Allocated (GiB)	Mem Reserved (GiB)
No log	0	0	2.3603	18.76	18.76	19.99
0.5427	0.5009	293	0.5847	18.76	18.76	19.99
0.3592	1.0017	586	0.4359	18.76	18.76	19.99
0.3552	1.5026	879	0.3764	18.76	18.76	19.99
0.2928	2.0034	1172	0.3247	18.76	18.76	19.99
0.2413	2.5043	1465	0.2867	18.76	18.76	19.99
0.2799	3.0051	1758	0.2314	18.76	18.76	19.99
0.1091	3.5060	2051	0.2033	18.76	18.76	19.99
0.115	4.0068	2344	0.1539	18.76	18.76	19.99
0.0835	4.5077	2637	0.1185	18.76	18.76	19.99
0.0722	5.0085	2930	0.1021	18.76	18.76	19.99
0.0945	5.5094	3223	0.0887	18.76	18.76	19.99
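The validation column can be checked programmatically. This sketch copies the step, epoch, and validation-loss columns from the table above and confirms that validation loss falls at every evaluation, so the lowest value, 0.0887, comes from the last evaluation at step 3223, which is the loss reported for the evaluation set earlier in this card. The variable and function names are illustrative.

```python
# (step, epoch, validation_loss) copied from the training-results table.
EVAL_LOG = [
    (0, 0.0, 2.3603),
    (293, 0.5009, 0.5847),
    (586, 1.0017, 0.4359),
    (879, 1.5026, 0.3764),
    (1172, 2.0034, 0.3247),
    (1465, 2.5043, 0.2867),
    (1758, 3.0051, 0.2314),
    (2051, 3.5060, 0.2033),
    (2344, 4.0068, 0.1539),
    (2637, 4.5077, 0.1185),
    (2930, 5.0085, 0.1021),
    (3223, 5.5094, 0.0887),
]


def best_checkpoint(log):
    """Return the row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def strictly_decreasing(values):
    """True if every value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


losses = [row[2] for row in EVAL_LOG]
print("best:", best_checkpoint(EVAL_LOG))
print("validation loss strictly decreasing:", strictly_decreasing(losses))
```

Training loss, by contrast, is not monotonic (it rises from 0.2413 to 0.2799 between steps 1465 and 1758), which is why the validation column is the better signal here.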

Framework versions

PEFT 0.17.0
Transformers 4.55.2
Pytorch 2.6.0+cu124
Datasets 4.0.0
Tokenizers 0.21.4
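To compare a local environment against the versions listed above, a small check with `importlib.metadata` is enough. This is a sketch; the distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption about how the packages were installed. Only the public version is compared, so `2.6.0+cu124` matches any `2.6.0` build.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in the "Framework versions" section of this card.
EXPECTED = {
    "peft": "0.17.0",
    "transformers": "4.55.2",
    "torch": "2.6.0+cu124",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def public_version(v: str) -> str:
    """Drop a local version label such as '+cu124'."""
    return v.split("+", 1)[0]


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed or None)} for packages that differ."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed is None or public_version(installed) != public_version(wanted):
            mismatches[package] = (wanted, installed)
    return mismatches


for pkg, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{pkg}: card lists {wanted}, found {installed}")
```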


Model tree for sudoping01/bambara-asr-llm-exp1-all

Base model: sudoping01/maliba-llm
Adapter (2 in total): this model