# Madlad-400-3B-MT ONNX Optimized
This repository contains an ONNX export of the [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) model,
restructured for reduced memory consumption following the NLLB optimization approach.
## Model Description
- **Base Model**: jbochi/madlad400-3b-mt
- **Optimization**: Component separation for reduced RAM usage
- **Target**: Mobile and edge deployment
- **Format**: ONNX with separated components
## File Structure
### Optimized Components (`/model/`)
- `madlad_encoder.onnx` - Encoder component
- `madlad_decoder.onnx` - Decoder component
- `madlad_decoder.onnx_data` - Decoder weights data
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `spiece.model` - SentencePiece tokenizer model
- `inference_script.py` - Python inference script
### Original Models (`/original_models/`)
- Complete original ONNX exports for reference
## Optimization Benefits
1. **Memory Reduction**: Shared components are separated out so they are not duplicated in memory
2. **Mobile Ready**: Optimized for deployment on mobile devices
3. **Modular**: Components can be loaded independently as needed
## Usage
```python
# Basic usage with the optimized models
from transformers import T5Tokenizer
import onnxruntime as ort

# Load the tokenizer from the model/ subfolder of this repo
tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Load the ONNX sessions; the decoder's external weights in
# madlad_decoder.onnx_data are picked up automatically from the same directory
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx")
decoder_session = ort.InferenceSession("model/madlad_decoder.onnx")

# For detailed inference, see inference_script.py
```
## Translation Example
```python
# Input format: <2xx> text (where xx is the target language code)
text = "<2pt> I love pizza!" # Translate to Portuguese
# Expected output: "Eu amo pizza!"
```
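For a complete end-to-end run, `inference_script.py` in this repo is the reference. Below is a minimal greedy-decoding sketch. The tensor names used here (`input_ids`, `attention_mask`, `encoder_hidden_states`, `encoder_attention_mask`) are assumptions based on standard Optimum T5 exports; verify them against `session.get_inputs()` for these files.

```python
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)
encoder = ort.InferenceSession("model/madlad_encoder.onnx")
decoder = ort.InferenceSession("model/madlad_decoder.onnx")

def translate(text, target_lang, max_new_tokens=128):
    # Prepend the target-language tag, e.g. "<2pt> I love pizza!"
    inputs = tokenizer(f"<2{target_lang}> {text}", return_tensors="np")
    input_ids = inputs["input_ids"].astype(np.int64)
    attention_mask = inputs["attention_mask"].astype(np.int64)

    # Encode once; every decoding step reuses the encoder states.
    encoder_states = encoder.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask}
    )[0]

    # T5-style models start decoding from the pad token.
    decoder_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int64)
    for _ in range(max_new_tokens):
        # Without a KV cache, the full prefix is re-decoded each step.
        logits = decoder.run(None, {
            "input_ids": decoder_ids,
            "encoder_hidden_states": encoder_states,
            "encoder_attention_mask": attention_mask,
        })[0]
        next_id = int(logits[0, -1].argmax())  # greedy pick
        decoder_ids = np.concatenate(
            [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1
        )
        if next_id == tokenizer.eos_token_id:
            break
    return tokenizer.decode(decoder_ids[0], skip_special_tokens=True)

print(translate("I love pizza!", "pt"))  # expected: "Eu amo pizza!"
```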
## Language Codes
This model supports translation to 400+ languages. Use the format `<2xx>` where `xx` is the target language code:
- `<2pt>` - Portuguese
- `<2es>` - Spanish
- `<2fr>` - French
- `<2de>` - German
- And many more (the full set can be enumerated from the tokenizer, as sketched below)
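Assuming the language tags are stored as vocabulary tokens, as in the upstream SentencePiece model, they can be listed like this:

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Collect every vocabulary token of the form <2...>; on the upstream
# tokenizer these are the target-language tags.
lang_tags = sorted(
    tok for tok in tokenizer.get_vocab()
    if tok.startswith("<2") and tok.endswith(">")
)
print(len(lang_tags), lang_tags[:5])
```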
## Performance Notes
- **Original Model Size**: ~3.3B parameters
- **Memory Optimization**: Reduced RAM usage through component separation
- **Inference Speed**: The encoder runs once per input; only the decoder is re-run for each generated token
## Technical Details
### Optimization Approach
This optimization follows the same principles used for NLLB models:
1. **Component Separation**: Split encoder/decoder into separate files
2. **Weight Deduplication**: Avoid loading shared weights multiple times
3. **Memory Efficiency**: Load only the required component at each stage of inference (see the staged-loading sketch below)
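One way to exploit the modular layout is staged loading: run the encoder, release it, then bring up the decoder. A sketch, under the same tensor-name assumptions as the usage example above:

```python
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)
inputs = tokenizer("<2pt> I love pizza!", return_tensors="np")

# Stage 1: load only the encoder, run it once, then drop the session so
# its weights can be reclaimed before the decoder is mapped.
encoder = ort.InferenceSession("model/madlad_encoder.onnx")
encoder_states = encoder.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})[0]
del encoder

# Stage 2: load the decoder only after the encoder is released, so peak
# RAM is roughly max(encoder, decoder) rather than their sum.
decoder = ort.InferenceSession("model/madlad_decoder.onnx")
```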
### Export Process
The models were exported using:
```bash
optimum-cli export onnx --model jbochi/madlad400-3b-mt --task text2text-generation-with-past --optimize O3
```
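The same export can be driven from Python. A sketch assuming Optimum's `main_export` API, whose `optimize` argument mirrors the CLI's `--optimize` flag (the output directory name here is hypothetical):

```python
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="jbochi/madlad400-3b-mt",
    output="madlad400-onnx",  # hypothetical output directory
    task="text2text-generation-with-past",
    optimize="O3",
)
```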
## Requirements
```
torch>=1.9.0
transformers>=4.20.0
onnxruntime>=1.12.0
sentencepiece>=0.1.95
optimum[onnxruntime]>=1.14.0
```
## Citation
```bibtex
@misc{madlad-onnx-optimized,
  title={Madlad-400-3B-MT ONNX Optimized},
  author={manancode},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/manancode/madlad400-3b-mt-onnx-optimized}
}
```
## Credits
- **Base Model**: [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) by @jbochi
- **Optimization Technique**: Inspired by NLLB ONNX optimizations
- **Export Tools**: HuggingFace Optimum
## License
This work is based on the original Madlad-400 model. Please refer to the original model's license terms.