---
title: RAG Observability Platform
emoji: 🚀
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---

# RAG Observability Platform 🚀

## Project Overview

The **RAG Observability Platform** is a production-grade Retrieval-Augmented Generation (RAG) system that demonstrates advanced MLOps practices and a hybrid cloud-local deployment strategy. It combines ML inference optimization on Apple Silicon GPUs with MLOps observability frameworks for enterprise-ready applications.

---

## What This Project Does

### Core Functionality

1. **Local RAG Pipeline (Mac M4)**
   - Ingests unstructured text documents
   - Chunks documents using recursive text splitting
   - Generates embeddings via sentence-transformers (optimized for Apple Silicon via MPS acceleration)
   - Stores embeddings in ChromaDB (local vector database)
   - Retrieves relevant context and generates answers with the Llama 3.2 3B model via MLX
2. **Cloud Deployment (Hugging Face Spaces)**
   - Docker containerization for reproducible deployment
   - Automatic fallback to CPU-based inference when MLX is unavailable
   - Streamlit web UI for interactive chat with documents
   - Graceful degradation: maintains functionality across platforms
3. **Experiment Tracking (Dagshub + MLflow)**
   - Logs all ingestion runs with parameters and metrics
   - Centralized experiment monitoring from the local machine
   - Version control for code and data via Git + DVC
   - Remote MLflow server for team collaboration

### Technical Highlights

- **Cross-Platform Optimization**: Native M4 GPU (via MLX) for local development; CPU fallback for the cloud
- **Infrastructure as Code**: Docker + UV for reproducible environments
- **Modern Python Stack**: LangChain (LCEL), Pydantic, asyncio-ready
- **MLOps Best Practices**: Experiment tracking, dependency management, secrets handling

---

## Key Highlights

1. **GPU Optimization**: Understanding when to use specialized tools (MLX for Apple Silicon) vs. standard libraries (PyTorch)
2. **Cross-Platform Development**: Device abstraction, graceful fallbacks, testing on multiple architectures (see the ingestion sketch after this list)
3. **Dependency Management**: Using UV for faster resolution, managing optional dependencies (local vs. cloud groups)
4. **MLOps Practices**: Experiment tracking, versioning data + code, secrets management
5. **Production Deployment**: Docker best practices, environment variable injection, port mapping
6. **Modern Python**: Type hints, LangChain LCEL (functional composition), error handling
7. **Troubleshooting**: Resolved Python version mismatches, binary file handling in Git, and device compatibility issues
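To make the device-abstraction and ingestion points concrete, here is a minimal sketch of a device-aware ingestion step (recursive chunking, MPS-or-CPU embeddings, ChromaDB persistence). The file path, chunk sizes, collection name, and embedding model are illustrative placeholders, not the repository's actual configuration.

```python
import torch
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Prefer the Apple Silicon GPU (MPS) when available; fall back to CPU in the cloud.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Load and chunk an unstructured text document (path and sizes are placeholders).
docs = TextLoader("data/sample.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100).split_documents(docs)

# Embed the chunks with sentence-transformers on the selected device.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": device},
)

# Persist the embeddings in a local ChromaDB collection.
vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    collection_name="docs",
    persist_directory="chroma_db",
)
```

The same device check is what lets the pipeline run unchanged on an M4 locally and on a CPU-only Space in the cloud.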
---

## Why This Project Stands Out

- **Full Stack**: From local GPU optimization to cloud deployment
- **Senior-Level Considerations**:
  - Device compatibility across platforms
  - Graceful degradation (MLX → Transformers fallback)
  - Secrets management without pushing `.env`
  - Experiment observability
- **Modern Tooling**: UV (faster than pip), MLX (Apple Silicon optimization), LangChain LCEL (declarative chains)
- **Problem Solving**: Resolved real-world issues (ONNX version compatibility, Docker base image mismatch, GPU device detection)

---

## GitHub/Portfolio Presentation

**Repository Structure** (visible on GitHub):

```
rag-observability-platform/
├── src/
│   ├── ingestion/    (document loading, chunking, embedding)
│   ├── retrieval/    (RAG chain with LCEL)
│   └── generation/   (MLX wrapper, device handling)
├── app/frontend/     (Streamlit UI)
├── Dockerfile        (cloud deployment)
├── pyproject.toml    (UV dependency management)
└── README.md         (project documentation)
```

**Git History** (visible in commits):

- Clean, semantic commits showing progression
- Branching strategy: `master` → `mvp` → `frontend`/`backend`
- Demonstrates understanding of a collaborative workflow

---

## Anticipated Questions & Answers

1. **"Why MLX instead of PyTorch?"** - MLX is optimized for Apple Silicon; PyTorch in CPU mode is 10x slower on the M4
2. **"How do you handle the MLX import error in Docker?"** - Try/except with a fallback to Transformers and dynamic device selection (see the sketch below)
3. **"Why use Dagshub for this portfolio project?"** - Demonstrates understanding of MLOps practices and the ability to connect local experiments to remote tracking
4. **"What would you do at scale?"** - Move to managed inference (HF Inference API), DVC for larger datasets, and Kubernetes for orchestration

---
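As referenced in question 2, here is a minimal sketch of the try/except fallback. The model IDs and the `load_generator` helper are illustrative assumptions; the repository's actual wrapper in `src/generation/` may be organized differently.

```python
def load_generator(model_id: str = "mlx-community/Llama-3.2-3B-Instruct-4bit"):
    """Prefer MLX on Apple Silicon; fall back to CPU Transformers in the Docker/cloud path."""
    try:
        # mlx_lm is only installable on Apple Silicon, so this import fails inside the container.
        from mlx_lm import load, generate

        model, tokenizer = load(model_id)

        def answer(prompt: str, max_tokens: int = 256) -> str:
            return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)

        return answer
    except ImportError:
        # Cloud path: run the same model family on CPU via Hugging Face Transformers.
        from transformers import pipeline

        pipe = pipeline(
            "text-generation",
            model="meta-llama/Llama-3.2-3B-Instruct",
            device=-1,  # CPU
        )

        def answer(prompt: str, max_tokens: int = 256) -> str:
            return pipe(prompt, max_new_tokens=max_tokens, return_full_text=False)[0]["generated_text"]

        return answer
```

Because both branches return the same callable, the Streamlit frontend can call `answer(prompt)` without knowing which backend was loaded, which is what keeps behavior consistent across platforms.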