---
license: apple-amlr
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
tags:
- rag
- compression
- retrieval
- instruction-tuned
- generation
library_name: transformers
---

# CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

# CLaRa-7B-Instruct (Compression-16 & Compression-128)

The **CLaRa-7B-Instruct** model is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×). It supports instruction-following QA directly from compressed document representations.

**Training recipe:** Instruction tuning on QA-style tasks, built on top of the base semantic compression model.

**Benchmarks:** Strong instruction-following performance under 16× compression.

---

## More details and usage examples

Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)

GitHub: https://github.com/apple/ml-clara

Video (from @Fahd Mirza): https://youtu.be/al2VoAKn8GU?si=Q8bq7QNMaTvcArwa

---

## Example Usage (Instruction-Tuned Inference)

```python
from transformers import AutoModel

# Load the 16x-compression checkpoint. The path below is a local copy of the
# model; point it at wherever you downloaded CLaRa-7B-Instruct.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
    trust_remote_code=True,  # required: CLaRa ships its own modeling code
).to("cuda")

# One list of candidate passages per question (passage text truncated here).
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
        "Alsobia is a genus of flowering plants in the family Gesneriaceae...",
    ]
]

questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]

# Instruction-tuned usage: answer each question from its compressed passages.
out = unirag.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64,
)
print("Generated answer:", out)
```
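The example passes `questions` and `documents` as parallel lists; judging from it, `documents[i]` holds the candidate passages for `questions[i]`. To batch several questions, a small helper can assemble both lists from a question-to-passages mapping. The `build_batch` function below is hypothetical and not part of the CLaRa API; it is a minimal sketch that assumes the alignment described above, so check the GitHub repository for the exact input contract of `generate_from_text`.

```python
from typing import Dict, List, Tuple


def build_batch(qa_inputs: Dict[str, List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: turn {question: [passages]} into parallel lists.

    Assumption: documents[i] holds the passages for questions[i], matching
    the single-question example. Dict insertion order is preserved.
    """
    questions: List[str] = []
    documents: List[List[str]] = []
    for question, passages in qa_inputs.items():
        if not passages:
            # An empty passage list would leave nothing to compress for this question.
            raise ValueError(f"No passages supplied for question: {question!r}")
        questions.append(question)
        documents.append(list(passages))
    return questions, documents


questions, documents = build_batch({
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?": [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
    ],
    "Which family does the genus Alsobia belong to?": [
        "Alsobia is a genus of flowering plants in the family Gesneriaceae...",
    ],
})

# With the model loaded as in the usage example:
# out = unirag.generate_from_text(questions=questions, documents=documents, max_new_tokens=64)
```

Keeping the mapping as the single source of truth avoids the two lists drifting out of alignment when questions are added or removed.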