---
title: Advanced Multilingual Image Describer
emoji: 🌍
colorFrom: purple
colorTo: indigo
sdk: streamlit
sdk_version: "1.32.0"
app_file: app.py
pinned: false
---

# 🌍 Advanced Multilingual Image Describer

**No translation APIs • Native multilingual support • Recent vision-language models**

## 🚀 Features

- **Direct multilingual captioning** - No separate translation step
- **Three vision-language models** - LLaVA 1.5, Qwen-VL-Chat, and Moondream 2
- **10 languages** - Native output in English, Chinese, Spanish, French, German, Italian, Russian, Japanese, Korean, and Arabic (coverage varies by model)
- **Fast & efficient** - Optimized for Hugging Face Spaces hardware
- **Clean interface** - Simple and intuitive

## 🤖 Supported Models

### LLaVA 1.5 (7B)
- **Languages**: English, Chinese, Spanish, French, German, Italian, Russian, Japanese, Korean, Arabic
- **Best for**: High-quality, detailed descriptions
- **Size**: 7 billion parameters

### Qwen-VL-Chat
- **Languages**: English, Chinese, Japanese, Korean, French, German, Spanish, Russian
- **Best for**: Conversational responses
- **Size**: 9.6 billion parameters

### Moondream 2
- **Languages**: English, Spanish, French, German
- **Best for**: Fast inference with a small memory footprint
- **Size**: 1.4 billion parameters

## 🌐 How It Works

1. **Select a model** from the sidebar.
2. **Choose the output language.**
3. **Upload an image** (JPG, PNG, or WebP).
4. **Click "Generate Description".**
5. **Read the description**, generated directly in the selected language.

## ⚡ Performance

- **Inference time**: 2-10 seconds per image, depending on the model and hardware
- **Memory usage**: ~8-16 GB RAM
- **Quality**: Natural, human-like descriptions
- **Output**: Generated natively in the target language, not translated

## 🛠️ Technical Details

- **Framework**: Streamlit + Transformers
- **Models**: Open vision-language models from the Hugging Face Hub
- **Deployment**: Hugging Face Spaces (CPU/GPU)
- **Code**: Pure Python, no external APIs

## 📋 File Structure