---
license: mit
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
tags:
- text-embeddings
- sentence-transformers
- llm2vec
- medical
- chest-xray
- radiology
- clinical-nlp
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
---

# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis

LLM2Vec4CXR is a text encoder optimized for chest X-ray report analysis and medical text understanding. It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).

## Model Description

LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy. This design enhances the semantic representation of chest X-ray reports, making the model robust across different reporting styles and effective even with domain-specific abbreviations. It improves performance on clinical text similarity, retrieval, and interpretation tasks.

### Key Features

- **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
- **Pooling Mode**: Latent attention (trained weights automatically loaded)
- **Bidirectional Processing**: Enabled for better context understanding
- **Medical Domain**: Specialized for chest X-ray report analysis
- **Max Length**: 512 tokens
- **Precision**: bfloat16
- **Automatic Loading**: Latent attention weights are automatically loaded from safetensors
- **Simple API**: Built-in methods for similarity computation and instruction-based encoding

## Training Details

### Training Data

- Fully fine-tuned on chest X-ray reports and medical text data
- Training focused on understanding pleural effusion status and other chest X-ray findings

### Training Configuration

- **Pooling Mode**: `latent_attention` (512 latents, 8 attention heads; modified from the base model)
- **Enable Bidirectional**: True
- **Max Length**: 512 tokens
- **Torch Dtype**: bfloat16
- **Full Fine-tuning**: All model weights were updated during training

## Usage

### Installation

```bash
# Only transformers and torch are needed
pip install transformers torch
```

### Basic Usage

```python
import torch
from transformers import AutoModel

# Load the model - that's it!
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Simple text encoding
report = "Small left pleural effusion with basal atelectasis."
embedding = model.encode_text([report])
print(embedding.shape)  # torch.Size([1, 2048])

# Multiple texts at once
reports = [
    "No acute cardiopulmonary abnormality.",
    "Small bilateral pleural effusions.",
    "Large left pleural effusion with compressive atelectasis.",
]
embeddings = model.encode_text(reports)
print(embeddings.shape)  # torch.Size([3, 2048])
```
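Because `encode_text` returns ordinary tensors, the embeddings above can be compared directly with cosine similarity. A minimal sketch that continues the Basic Usage snippet (it reuses the `embeddings` tensor from above and assumes only standard PyTorch):

```python
import torch.nn.functional as F

# Compare the first report ("No acute cardiopulmonary abnormality.") against
# the two effusion reports; a higher score indicates closer semantic content.
# The cast to float32 is only for broad dtype support across PyTorch versions.
sims = F.cosine_similarity(embeddings[0:1].float(), embeddings[1:].float(), dim=-1)
print(sims)  # tensor of shape (2,)
```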
### Instruction-Based Encoding and Similarity

```python
import torch
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Instruction-based task with separator
instruction = "Determine the status of the pleural effusion."
report = "There is a small increase in the left-sided effusion."
query = instruction + "!@#$%^&*()" + report

# Compare against multiple candidates
candidates = [
    "No pleural effusion",
    "Pleural effusion present",
    "Worsening pleural effusion",
    "Improving pleural effusion",
]

# One-line similarity computation
scores = model.compute_similarities(query, candidates)
print(scores)  # tensor([0.7171, 0.8270, 0.9155, 0.8113], device='cuda:0')

best_match = candidates[torch.argmax(scores)]
print(f"Best match: {best_match}")  # Best match: Worsening pleural effusion
```

### Medical Report Retrieval Example

```python
import torch
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Instruction for retrieval
instruction = "Retrieve semantically similar reports"
query_report = "Small left pleural effusion with basal atelectasis."
query = instruction + "!@#$%^&*()" + query_report

# Candidate reports
candidates = [
    "No acute cardiopulmonary abnormality.",
    "Small left pleural effusion is present.",
    "Large right pleural effusion causing compressive atelectasis.",
    "Heart size is normal with no evidence of pleural effusion.",
]

# Compute similarities
scores = model.compute_similarities(query, candidates)

# Get most similar
best_idx = torch.argmax(scores)
print(f"Most similar: {candidates[best_idx]}")
print(f"Score: {scores[best_idx]:.4f}")
```
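The same pattern extends to larger candidate pools by ranking every score at once. A minimal sketch of top-k retrieval, reusing the `model`, `query`, and `candidates` variables from the example above (the choice of `k` is illustrative):

```python
import torch

# Score all candidates against the query and keep the k best matches.
scores = model.compute_similarities(query, candidates)
k = min(3, len(candidates))
top_scores, top_indices = torch.topk(scores, k)

for rank, (score, idx) in enumerate(zip(top_scores.tolist(), top_indices.tolist()), start=1):
    print(f"{rank}. {candidates[idx]} (score: {score:.4f})")
```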
## API Reference

The model provides three main methods:

### `encode_text(texts, max_length=512)`

Simple text encoding for one or more texts.

**Parameters:**
- `texts`: List of strings or a single string
- `max_length`: Maximum sequence length (default: 512)

**Returns:** Tensor of shape `(batch_size, 2048)`

### Instruction-based encoding

Encodes a list of texts, each optionally combining an instruction and a report with the separator (see the Instruction-Based Encoding example above).

**Parameters:**
- `texts`: List of strings with optional separator
- `separator`: String separator (default: `'!@#$%^&*()'`)
- `max_length`: Maximum sequence length (default: 512)

**Returns:** Tensor of shape `(batch_size, 2048)`

### `compute_similarities(query_text, candidate_texts, separator='!@#$%^&*()', max_length=512)`

Computes cosine similarity between a query and a list of candidate texts.

**Parameters:**
- `query_text`: Single query string
- `candidate_texts`: List of candidate strings
- `separator`: String separator (default: `'!@#$%^&*()'`)
- `max_length`: Maximum sequence length (default: 512)

**Returns:** Tensor of shape `(num_candidates,)` with cosine similarity scores
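The defaults listed above can also be passed explicitly. A minimal sketch that calls `compute_similarities` with its documented keyword arguments spelled out, reusing the `model` loaded in the Usage section (the query and candidate strings here are illustrative):

```python
import torch

instruction = "Determine the status of the pleural effusion."
report = "The previously seen effusion has resolved."
query = instruction + "!@#$%^&*()" + report
candidates = ["No pleural effusion", "Worsening pleural effusion"]

# Same call as in the examples above, with the documented defaults written out.
scores = model.compute_similarities(
    query,
    candidates,
    separator="!@#$%^&*()",
    max_length=512,
)
print(candidates[torch.argmax(scores)])
```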
## Technical Specifications

- **Model Type**: Bidirectional language model (LLM2Vec)
- **Architecture**: LlamaBiModel (modified Llama 3.2) with latent attention pooling
- **Parameters**: ~1B
- **Hidden Size**: 2048
- **Input Length**: Up to 512 tokens
- **Output Dimension**: 2048
- **Precision**: bfloat16
- **Dependencies**: Only transformers and torch

## Intended Use

### Primary Use Cases

- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports
- **Clinical Text Similarity**: Compare medical texts for semantic similarity
- **Medical Information Retrieval**: Find relevant medical reports or findings
- **Clinical NLP Research**: Foundation model for medical text analysis

### Limitations

- Specialized for chest X-ray reports; may not generalize to other medical domains
- Requires careful preprocessing for optimal performance
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis

## Evaluation

The model has been evaluated on chest X-ray report analysis tasks, particularly for:

- Text retrieval and encoding
- Medical text similarity comparison
- Clinical finding extraction

### Sample Performance

The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks. **LLM2Vec4CXR** shows stronger performance in:

- Handling medical abbreviations and radiological terminology
- Capturing fine-grained semantic differences in chest X-ray reports
- Understanding clinical context and temporal changes

## Related Resources

📄 **Paper**: [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)

🔗 **Related Projects**:
- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays

## Citation

If you use this model in your research, please cite:

```bibtex
@article{ko2025exploring,
  title={Exploring the Capabilities of LLM Encoders for Image--Text Retrieval in Chest X-rays},
  author={Ko, Hanbin and Cho, Gihun and Baek, Inhyeok and Kim, Donguk and Koo, Joonbeom and Kim, Changi and Lee, Dongheon and Park, Chang Min},
  journal={arXiv preprint arXiv:2509.15234},
  year={2025}
}
```

## Acknowledgments

This model is built upon:

- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models

## License

This model is licensed under the MIT License.