# Madlad-400-3B-MT ONNX Optimized

This repository contains the ONNX export of the [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) model, optimized for reduced memory consumption following the NLLB optimization approach.
## Model Description

- **Base Model**: jbochi/madlad400-3b-mt
- **Optimization**: Component separation for reduced RAM usage
- **Target**: Mobile and edge deployment
- **Format**: ONNX with separated components
## File Structure

### Optimized Components (`/model/`)

- `madlad_encoder.onnx` - Encoder component
- `madlad_decoder.onnx` - Decoder component
- `madlad_decoder.onnx_data` - External weight data for the decoder
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `spiece.model` - SentencePiece tokenizer model
- `inference_script.py` - Python inference script

### Original Models (`/original_models/`)

- Complete original ONNX exports for reference
## Optimization Benefits

1. **Memory Reduction**: Shared components are separated to avoid duplication
2. **Mobile Ready**: Optimized for deployment on mobile devices
3. **Modular**: Components can be loaded independently as needed
## Usage

```python
# Basic usage with the optimized models
from transformers import T5Tokenizer
import onnxruntime as ort

# Load tokenizer
tokenizer = T5Tokenizer.from_pretrained("manancode/madlad400-3b-mt-onnx-optimized", subfolder="model")

# Load ONNX models (paths assume the repository files have been downloaded locally)
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx")
decoder_session = ort.InferenceSession("model/madlad_decoder.onnx")

# For detailed inference, see inference_script.py
```
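For a quick end-to-end test without `inference_script.py`, the sketch below runs greedy decoding with the two sessions loaded above. It is a minimal sketch, not the repository's reference implementation: the input/output names (`input_ids`, `attention_mask`, `encoder_hidden_states`, `encoder_attention_mask`) assume a standard Optimum T5 export without cache inputs; inspect `session.get_inputs()` to confirm them, and note that a with-past decoder will additionally expect past-key-value tensors.

```python
import numpy as np

def translate(text, max_new_tokens=128):
    # Encode once; the encoder output is reused at every decoding step
    inputs = tokenizer(text, return_tensors="np")
    encoder_hidden = encoder_session.run(
        None,
        {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
    )[0]

    # T5-style models start decoding from the pad token
    decoder_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int64)
    for _ in range(max_new_tokens):
        logits = decoder_session.run(
            None,
            {
                "input_ids": decoder_ids,
                "encoder_hidden_states": encoder_hidden,
                "encoder_attention_mask": inputs["attention_mask"],
            },
        )[0]
        next_id = int(logits[0, -1].argmax())  # greedy pick
        if next_id == tokenizer.eos_token_id:
            break
        decoder_ids = np.concatenate(
            [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1
        )
    return tokenizer.decode(decoder_ids[0], skip_special_tokens=True)
```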
## Translation Example

```python
# Input format: <2xx> text (where xx is the target language code)
text = "<2pt> I love pizza!"  # Translate to Portuguese
# Expected output: "Eu amo pizza!"
```
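With the hypothetical `translate()` helper sketched in the Usage section above, the example becomes a one-liner:

```python
print(translate("<2pt> I love pizza!"))  # expected: "Eu amo pizza!"
```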
## Language Codes

This model supports translation to 400+ languages. Use the format `<2xx>` where `xx` is the target language code:

- `<2pt>` - Portuguese
- `<2es>` - Spanish
- `<2fr>` - French
- `<2de>` - German
- And many more...
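The language tags are ordinary tokens in the SentencePiece vocabulary, so one way to check whether a given code is supported is to look it up in the tokenizer. This is an illustrative check, not an official API of this model:

```python
# A tag is supported if it maps to a real vocabulary entry rather than <unk>
def is_supported(lang_tag):
    return tokenizer.convert_tokens_to_ids(lang_tag) != tokenizer.unk_token_id

print(is_supported("<2pt>"))  # expected True
print(is_supported("<2zz>"))  # expected False for an unknown code
```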
## Performance Notes

- **Original Model Size**: ~3.3B parameters
- **Memory Optimization**: Reduced RAM usage through component separation
- **Inference**: The encoder runs once per input; only the decoder is invoked at each generation step
## Technical Details

### Optimization Approach

This optimization follows the same principles used for NLLB models:

1. **Component Separation**: Split encoder/decoder into separate files
2. **Weight Deduplication**: Avoid loading shared weights multiple times
3. **Memory Efficiency**: Load only the required components during inference (see the session-setup sketch below)
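Peak RAM can be trimmed further at the ONNX Runtime level. The snippet below is a sketch using standard `onnxruntime` session options; the right trade-off depends on the target device:

```python
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_mem_pattern = False    # skip memory-pattern pre-allocation
opts.enable_cpu_mem_arena = False  # avoid arena over-allocation on small devices

# Load only the component you need, when you need it
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx", sess_options=opts)
```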
### Export Process

The models were exported using:

```bash
# optimum-cli requires an output directory as the final argument;
# "onnx_output/" below is a placeholder
optimum-cli export onnx --model jbochi/madlad400-3b-mt \
  --task text2text-generation-with-past --optimize O3 onnx_output/
```
## Requirements

```
torch>=1.9.0
transformers>=4.20.0
onnxruntime>=1.12.0
sentencepiece>=0.1.95
optimum[onnxruntime]>=1.14.0
```
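These can be installed in one step, for example:

```bash
pip install "torch>=1.9.0" "transformers>=4.20.0" "onnxruntime>=1.12.0" \
    "sentencepiece>=0.1.95" "optimum[onnxruntime]>=1.14.0"
```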
## Citation

```bibtex
@misc{madlad-onnx-optimized,
  title={Madlad-400-3B-MT ONNX Optimized},
  author={manancode},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/manancode/madlad400-3b-mt-onnx-optimized}
}
```
## Credits

- **Base Model**: [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) by @jbochi
- **Optimization Technique**: Inspired by NLLB ONNX optimizations
- **Export Tools**: Hugging Face Optimum

## License

This work is based on the original Madlad-400 model. Please refer to the original model's license terms.