# Model Card for FinetunedLAMAtoR1-001-3B

## Model Details

### Technical Specifications

#### Model Architecture and Objective
- Base Model: Llama-3.2-3B-Instruct
- Architecture: Causal Decoder-Only Transformer
- Hidden Size: 3072
- Layers: 28
- Heads: 24
- Parameters: ~3.21B (loaded in 4-bit quantization)
- Precision: float16 compute (LoRA training and inference)
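
These values match the base model's published configuration. As a quick sanity check, you can inspect the config with transformers (a minimal sketch; only the config file is fetched, no weights are downloaded):

```python
from transformers import AutoConfig

# Inspect the base model's architecture; expected values per the card are noted.
config = AutoConfig.from_pretrained("unsloth/Llama-3.2-3B-Instruct")
print(config.hidden_size)           # 3072
print(config.num_hidden_layers)     # 28
print(config.num_attention_heads)   # 24
```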
#### Compute Infrastructure
- Hardware: Tesla T4 GPU (Google Colab)
- VRAM Usage: ~2.24 GB (model weights) plus training overhead
- Quantization: 4-bit (QLoRA) via `bitsandbytes` (see the sketch below)
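
The exact quantization settings used for training are not listed in this card, but a 4-bit QLoRA-style load with bitsandbytes typically looks like the following sketch (the NF4 and double-quantization values are common defaults, not a verbatim record of this run):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit (NF4) quantization config in the style used for QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the base model in 4-bit; this is what keeps VRAM usage around ~2.2 GB.
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```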
#### Model Weights

- Type: LoRA adapter (PEFT)
- Adapter File Size: ~92 MB
- Total Saved Size: ~108 MB
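
The adapter is not a standalone model; it is applied on top of the base weights. A minimal sketch with peft, reusing `base_model` from the quantization example above:

```python
from peft import PeftModel

# Attach the ~92 MB LoRA adapter to the quantized base model loaded above;
# the wrapped model can then be used like any causal LM.
model = PeftModel.from_pretrained(
    base_model,
    "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B",
)
```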
### Model Description
This model is a fine-tuned version of unsloth/Llama-3.2-3B-Instruct designed to mimic reflective, human-like stream-of-consciousness reasoning. It was trained using Unsloth on the ServiceNow-AI/R1-Distill-SFT dataset.
The model uses a specific system prompt to trigger a "thinking" (chain-of-thought) phase before it gives the final answer, aiming to replicate the reasoning behaviour seen in models like DeepSeek-R1.
- Developed by: Muhammad Shaheer Khan
- Model type: Causal Language Model (LoRA Fine-tune)
- Language(s) (NLP): English
- License: Llama 3.2 Community License
- Finetuned from model: unsloth/Llama-3.2-3B-Instruct
## Uses

### Direct Use
The model is intended for reasoning tasks where explainability and step-by-step logic are required. It excels at math problems, logic puzzles, and complex queries requiring iterative thought.
**System Prompt:** To activate the reasoning capabilities, you must use the following system prompt:
"You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer."
## How to Get Started with the Model

You can use the model with the unsloth library for up to 2x faster inference, or with standard Hugging Face transformers (a transformers sketch follows the Unsloth example below).

### Using Unsloth (Recommended)
```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the fine-tuned model and its tokenizer in 4-bit
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Apply the Llama 3.1 chat template used during fine-tuning
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

# The reasoning prompt is prepended to the problem and sent as a single user message
sys_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
"""

message = sys_prompt.format("If a dozen eggs cost $60, how much does one egg cost?")
messages = [{"role": "user", "content": message}]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    input_ids = inputs,
    max_new_tokens = 1024,
    use_cache = True,
    temperature = 1.5,
    min_p = 0.1,
)
print(tokenizer.batch_decode(outputs))
```
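
### Using Hugging Face Transformers

If you prefer not to install Unsloth, a roughly equivalent sketch with transformers and peft is shown below. It assumes the tokenizer and chat template are saved alongside the adapter in this repository; if they are not, load the tokenizer from the base model instead.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the LoRA adapter together with its base model, plus the tokenizer.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B")

# Same reasoning prompt as in the Unsloth example above.
sys_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
"""

messages = [{"role": "user", "content": sys_prompt.format("If a dozen eggs cost $60, how much does one egg cost?")}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.5,
    min_p=0.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```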