# Madlad-400-3B-MT ONNX Optimized
This repository contains an ONNX export of the [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) model,
restructured for reduced memory consumption following the NLLB optimization approach.
## Model Description
- **Base Model**: jbochi/madlad400-3b-mt
- **Optimization**: Component separation for reduced RAM usage
- **Target**: Mobile and edge deployment
- **Format**: ONNX with separated components
## File Structure
### Optimized Components (`/model/`)
- `madlad_encoder.onnx` - Encoder component
- `madlad_decoder.onnx` - Decoder component
- `madlad_decoder.onnx_data` - Decoder weights data
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `spiece.model` - SentencePiece tokenizer model
- `inference_script.py` - Python inference script
### Original Models (`/original_models/`)
- Complete original ONNX exports for reference
## Optimization Benefits
1. **Memory Reduction**: Shared components are separated out so they are not duplicated in memory
2. **Mobile Ready**: Optimized for deployment on mobile devices
3. **Modular**: Components can be loaded independently as needed
## Usage
```python
# Basic usage with the optimized models
from transformers import T5Tokenizer
import onnxruntime as ort

# Load the tokenizer from the model/ subfolder of this repo
tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Load the ONNX sessions; the decoder's external weights in
# madlad_decoder.onnx_data are picked up automatically from the same directory
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx")
decoder_session = ort.InferenceSession("model/madlad_decoder.onnx")

# For detailed inference, see inference_script.py
```
## Translation Example
```python
# Input format: <2xx> text (where xx is the target language code)
text = "<2pt> I love pizza!" # Translate to Portuguese
# Expected output: "Eu amo pizza!"
```
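For a complete end-to-end run, `inference_script.py` in this repo is the reference. Below is a minimal greedy-decoding sketch. The tensor names used here (`input_ids`, `attention_mask`, `encoder_hidden_states`, `encoder_attention_mask`) are assumptions based on standard Optimum T5 exports; verify them against `session.get_inputs()` for these files.

```python
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)
encoder = ort.InferenceSession("model/madlad_encoder.onnx")
decoder = ort.InferenceSession("model/madlad_decoder.onnx")

def translate(text, target_lang, max_new_tokens=128):
    # Prepend the target-language tag, e.g. "<2pt> I love pizza!"
    inputs = tokenizer(f"<2{target_lang}> {text}", return_tensors="np")
    input_ids = inputs["input_ids"].astype(np.int64)
    attention_mask = inputs["attention_mask"].astype(np.int64)

    # Encode once; every decoding step reuses the encoder states.
    encoder_states = encoder.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask}
    )[0]

    # T5-style models start decoding from the pad token.
    decoder_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int64)
    for _ in range(max_new_tokens):
        # Without a KV cache, the full prefix is re-decoded each step.
        logits = decoder.run(None, {
            "input_ids": decoder_ids,
            "encoder_hidden_states": encoder_states,
            "encoder_attention_mask": attention_mask,
        })[0]
        next_id = int(logits[0, -1].argmax())  # greedy pick
        decoder_ids = np.concatenate(
            [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1
        )
        if next_id == tokenizer.eos_token_id:
            break
    return tokenizer.decode(decoder_ids[0], skip_special_tokens=True)

print(translate("I love pizza!", "pt"))  # expected: "Eu amo pizza!"
```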
## Language Codes
This model supports translation to 400+ languages. Use the format `<2xx>` where `xx` is the target language code:
- `<2pt>` - Portuguese
- `<2es>` - Spanish
- `<2fr>` - French
- `<2de>` - German
- And many more (the full set can be enumerated from the tokenizer, as sketched below)
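Assuming the language tags are stored as vocabulary tokens, as in the upstream SentencePiece model, they can be listed like this:

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Collect every vocabulary token of the form <2...>; on the upstream
# tokenizer these are the target-language tags.
lang_tags = sorted(
    tok for tok in tokenizer.get_vocab()
    if tok.startswith("<2") and tok.endswith(">")
)
print(len(lang_tags), lang_tags[:5])
```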
## Performance Notes
- **Original Model Size**: ~3.3B parameters
- **Memory Optimization**: Reduced RAM usage through component separation
- **Inference Speed**: The encoder runs once per input; only the decoder is re-run for each generated token
## Technical Details
### Optimization Approach
This optimization follows the same principles used for NLLB models:
1. **Component Separation**: Split encoder/decoder into separate files
2. **Weight Deduplication**: Avoid loading shared weights multiple times
3. **Memory Efficiency**: Load only the required component at each stage of inference (see the staged-loading sketch below)
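One way to exploit the modular layout is staged loading: run the encoder, release it, then bring up the decoder. A sketch, under the same tensor-name assumptions as the usage example above:

```python
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)
inputs = tokenizer("<2pt> I love pizza!", return_tensors="np")

# Stage 1: load only the encoder, run it once, then drop the session so
# its weights can be reclaimed before the decoder is mapped.
encoder = ort.InferenceSession("model/madlad_encoder.onnx")
encoder_states = encoder.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})[0]
del encoder

# Stage 2: load the decoder only after the encoder is released, so peak
# RAM is roughly max(encoder, decoder) rather than their sum.
decoder = ort.InferenceSession("model/madlad_decoder.onnx")
```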
### Export Process
The models were exported using:
```bash
optimum-cli export onnx --model jbochi/madlad400-3b-mt --task text2text-generation-with-past --optimize O3
```
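The same export can be driven from Python. A sketch assuming Optimum's `main_export` API, whose `optimize` argument mirrors the CLI's `--optimize` flag (the output directory name here is hypothetical):

```python
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="jbochi/madlad400-3b-mt",
    output="madlad400-onnx",  # hypothetical output directory
    task="text2text-generation-with-past",
    optimize="O3",
)
```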
## Requirements
```
torch>=1.9.0
transformers>=4.20.0
onnxruntime>=1.12.0
sentencepiece>=0.1.95
optimum[onnxruntime]>=1.14.0
```
## Citation
```bibtex
@misc{madlad-onnx-optimized,
  title={Madlad-400-3B-MT ONNX Optimized},
  author={manancode},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/manancode/madlad400-3b-mt-onnx-optimized}
}
```
## Credits
- **Base Model**: [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) by @jbochi
- **Optimization Technique**: Inspired by NLLB ONNX optimizations
- **Export Tools**: HuggingFace Optimum
## License
This work is based on the original Madlad-400 model. Please refer to the original model's license terms.