Model Card for Model ID
A fine-tuned Mistral-7B-Instruct-v0.3 model specifically trained for generating medical rationales and explanations. The model was trained using QLoRA on a custom dataset of medical rationales.
Model Details
Model Description
This model is a fine-tuned version of Mistral-7B-Instruct-v0.3, optimized for generating detailed medical rationales and explanations. It is mainly intended for use in METEORA rerankers within medical RAG systems. It was trained with QLoRA (Low-Rank Adaptation on a 4-bit quantized base model) on a dataset of medical reasoning tasks. On the held-out evaluation set, perplexity and average cross-entropy loss each fell by roughly 80% relative to the base model.
- Developed by: Chidiebere Okoene
- Model type: Causal Language Model (Decoder-only)
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.3
Model Sources [optional]
- Repository: https://github.com/ChidiOkoene/METEORA_Med-Reraker/tree/feat/V1
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Direct Use
This model is intended for generating medical rationales, explanations, and reasoning for healthcare-related queries. It can be used by:
- Medical educators creating teaching materials
- Healthcare professionals seeking second opinions or explanations
- Medical students learning diagnostic reasoning
- Researchers exploring medical AI applications
Downstream Use [optional]
This model can be integrated into:
- METEORA Reranker for Medical RAG systems
- Clinical decision support systems
- Healthcare chatbots for patient education
- Medical documentation assistants
Out-of-Scope Use
This model should not be used for:
- Direct patient diagnosis without human supervision
- Making treatment decisions without clinical validation
- Replacing licensed medical professionals
- Generating medical advice for serious conditions
Bias, Risks, and Limitations
- Training Data Bias: The model was trained on a specific dataset of medical rationales and may not cover all medical specialties or rare conditions
- Accuracy Limitations: While performance improved significantly, the model may still generate incorrect or incomplete information
- Temporal Limitations: Medical knowledge evolves rapidly, and the model may not reflect the latest guidelines or research
- Demographic Biases: The training data may not adequately represent all patient populations
Recommendations
- Always verify model outputs with current medical literature and guidelines
- Use this model as an educational tool rather than a diagnostic tool
- Implement human oversight for any clinical applications
- Regularly update the model with new medical knowledge
- Disclose the AI-assisted nature of generated content to end users
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "chidiokoene/mistral-7b-med-rationales-finetuned"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate rationales
def generate_rationale(prompt):
    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # The decoded text contains the prompt followed by the generated rationales
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = (
    "Given the user query below, generate 3 concise rationales (1–2 sentences each) "
    "describing what evidence a correct passage should contain.\n"
    "Explain the mechanism of action of metformin in type 2 diabetes."
)
rationale = generate_rationale(prompt)
print(rationale)
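Inside a reranking pipeline, the prompt usually has to be built per query, and the generated text split into individual rationales. The sketch below shows one way to do both. The helper names `build_rationale_prompt` and `parse_rationales` are hypothetical, and the parser assumes the model emits one rationale per line with an optional number or bullet prefix, which this card does not specify.

```python
import re
from typing import List

# Instruction text taken from the example prompt; the {n} slot is an
# illustrative generalization, not something the card documents.
PROMPT_TEMPLATE = (
    "Given the user query below, generate {n} concise rationales "
    "(1–2 sentences each) describing what evidence a correct passage "
    "should contain.\n{query}"
)


def build_rationale_prompt(query: str, n: int = 3) -> str:
    """Hypothetical helper: fill the instruction template for one query."""
    return PROMPT_TEMPLATE.format(n=n, query=query.strip())


# Matches a leading "1.", "2)", "-", "*" or "•" marker followed by whitespace.
_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_rationales(generated: str, prompt: str = "") -> List[str]:
    """Hypothetical helper: split generated text into rationale strings.

    Assumes one rationale per non-empty line (an assumption about the
    output format). If the decoded text still starts with the prompt,
    the prompt is removed first.
    """
    if prompt and generated.startswith(prompt):
        generated = generated[len(prompt):]
    rationales = []
    for line in generated.splitlines():
        cleaned = _PREFIX.sub("", line).strip()
        if cleaned:
            rationales.append(cleaned)
    return rationales
```

Passing the prompt to `parse_rationales` matters because `tokenizer.decode(outputs[0], ...)` returns the prompt together with the continuation.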
Training Details
Training Data
The model was fine-tuned on a proprietary dataset of medical rationales with 11,362 training examples and 3,246 validation examples. Each example pairs a medical question with a detailed explanatory rationale.
Training Procedure
Preprocessing [optional]
- Text was tokenized using the Mistral tokenizer
- Sequences were truncated or padded to 1024 tokens
- Special tokens were added for instruction following
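The truncate-or-pad step can be illustrated on plain lists of token IDs. This is a sketch, not the preprocessing code used for training: the function name `pad_or_truncate`, right-side padding, and the pad ID of 0 are all assumptions. The attention mask marks real tokens with 1 and padding with 0.

```python
from typing import List, Tuple

MAX_LENGTH = 1024  # sequence length stated in the card


def pad_or_truncate(
    token_ids: List[int],
    max_length: int = MAX_LENGTH,
    pad_id: int = 0,  # assumed placeholder; the real pad token ID is not documented
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]        # truncate on the right
    n_pad = max_length - len(kept)       # padding needed, 0 if truncated
    input_ids = kept + [pad_id] * n_pad  # pad on the right (assumption)
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask
```

With a Transformers tokenizer the same effect is normally obtained by calling it with `truncation=True, padding="max_length", max_length=1024`, provided a pad token has been set.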
Training Hyperparameters
- Training regime: bf16 mixed precision with QLoRA
- Learning rate: 2e-4
- Batch size: 2, with 4 gradient accumulation steps (effective batch size of 8)
- Epochs: 3
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
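A few quantities follow directly from the hyperparameters listed above. The sketch below collects them in a hypothetical `TrainingSetup` dataclass and derives the effective batch size, the LoRA scaling factor (alpha divided by rank in the standard LoRA formulation), and an approximate count of optimizer steps. The step count assumes a single GPU, the 11,362 training examples reported under Training Data, and no dropped partial batches; the actual trainer may round differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters copied from the card; the class itself is illustrative."""

    learning_rate: float = 2e-4
    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    epochs: int = 3
    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    num_train_examples: int = 11_362  # from the Training Data section

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update (single GPU assumed).
        return self.per_device_batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def approx_optimizer_steps(self) -> int:
        # Rounded up per epoch; real trainers may handle the remainder differently.
        steps_per_epoch = math.ceil(self.num_train_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs


setup = TrainingSetup()
print(setup.effective_batch_size, setup.lora_scaling, setup.approx_optimizer_steps)
```

Under these assumptions, each optimizer update sees 8 examples and the low-rank update is scaled by a factor of 2.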
Speeds, Sizes, Times [optional]
- Training time: ~13 hours on a single GPU with 15GB VRAM
- Model size: ~15GB (4-bit quantized)
- Inference speed: ~2.9 samples/second
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a held-out validation set of 1,624 medical rationale examples.
Factors
[More Information Needed]
Metrics
- Perplexity (lower is better)
- Average cross-entropy loss (lower is better)
- Inference speed (samples per second)
Results
| Metric | Baseline Model | Fine-tuned Model | Improvement |
|---|---|---|---|
| Perplexity | 7.78 | 1.51 | 80.6% |
| Average Loss | 2.05 | 0.41 | 79.9% |
| Inference Speed (samples/sec) | 5.17 | 2.91 | -43.7% |
Relative to the base model, perplexity fell by 80.6% and average loss by 79.9%, indicating a much closer fit to the medical rationale data. Throughput dropped to about 56% of the baseline. This reduction is expected because the LoRA adapter adds extra computation to each forward pass.
{
  "baseline_model": {
    "perplexity": 7.784124134664591,
    "average_loss": 2.0520862921697764,
    "loss_std": 0.2737355939406239,
    "evaluation_time_seconds": 313.9927325248718,
    "samples_per_second": 5.1720942295101064
  },
  "fine_tuned_model": {
    "perplexity": 1.5100232168650496,
    "average_loss": 0.4121250261159502,
    "loss_std": 0.147794492117157,
    "evaluation_time_seconds": 557.3957495689392,
    "samples_per_second": 2.9135493072129037
  },
  "comparison": {
    "perplexity_improvement_percent": 80.60124439510734,
    "loss_improvement_percent": 79.9167789537647,
    "relative_speed": 0.5633210026586989
  },
  "evaluation_parameters": {
    "max_length": 1024,
    "batch_size": 1,
    "num_samples_evaluated": 1624
  }
}
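The reported numbers are internally consistent: perplexity is the exponential of the average cross-entropy loss, and the improvement figures are relative reductions against the baseline. The short check below recomputes them from the loss and throughput values in the JSON above. The function names are illustrative, not taken from the evaluation script.

```python
import math


def perplexity(avg_loss: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(avg_loss)


def relative_reduction_percent(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100.0


baseline_loss = 2.0520862921697764
finetuned_loss = 0.4121250261159502
baseline_speed = 5.1720942295101064  # samples per second
finetuned_speed = 2.9135493072129037

ppl_base = perplexity(baseline_loss)
ppl_ft = perplexity(finetuned_loss)
ppl_reduction = relative_reduction_percent(ppl_base, ppl_ft)
loss_reduction = relative_reduction_percent(baseline_loss, finetuned_loss)
relative_speed = finetuned_speed / baseline_speed

print(f"baseline perplexity:   {ppl_base:.2f}")
print(f"fine-tuned perplexity: {ppl_ft:.2f}")
print(f"perplexity reduction:  {ppl_reduction:.1f}%")
print(f"loss reduction:        {loss_reduction:.1f}%")
print(f"relative speed:        {relative_speed:.3f}")
```

Because perplexity is an exponential of the loss, the two improvement percentages need not match; here they happen to land close together.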
Summary
Fine-tuning reduced perplexity from 7.78 to 1.51 and average loss from 2.05 to 0.41 on the evaluation set, so the model fits medical rationale text far better than the base Mistral-7B-Instruct-v0.3 model. The trade-off is lower throughput: 2.91 versus 5.17 samples per second.
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
Hardware Type: NVIDIA GPU with 15GB VRAM
Hours used: ~13 hours for training
Carbon Emitted: Estimated based on Machine Learning Impact calculator
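An estimate of this kind multiplies energy use by the carbon intensity of the local grid. A rough sketch of that arithmetic is below. Only the 13 training hours come from this card; the GPU power draw and grid carbon intensity are placeholder values chosen for illustration, so the printed figure is not a measured result for this model.

```python
def estimate_co2_kg(
    gpu_power_watts: float,
    hours: float,
    grid_kg_co2_per_kwh: float,
) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity."""
    energy_kwh = gpu_power_watts / 1000.0 * hours
    return energy_kwh * grid_kg_co2_per_kwh


TRAINING_HOURS = 13           # from the card
ASSUMED_GPU_WATTS = 70.0      # placeholder, not stated in the card
ASSUMED_GRID_INTENSITY = 0.4  # placeholder kg CO2eq per kWh, not stated in the card

estimate = estimate_co2_kg(ASSUMED_GPU_WATTS, TRAINING_HOURS, ASSUMED_GRID_INTENSITY)
print(f"Illustrative estimate: {estimate:.2f} kg CO2eq")
```

Replacing the two placeholders with the actual GPU's power draw and the training region's grid intensity gives a figure that can be reported under Carbon Emitted.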
Technical Specifications [optional]
Model Architecture and Objective
Architecture: Transformer-based decoder-only model
Objective: Causal language modeling with instruction tuning
Parameters: 7 billion
Context length: 4096 tokens
Compute Infrastructure
[More Information Needed]
Hardware
Single GPU training
Software
PyTorch, Transformers, PEFT, Accelerate
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]