---
license: apache-2.0
tags:
- gptq
- quantized
- causal-lm
- transformers
- pytorch
- phi-2
- text-generation
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/phi-2
inference: true
---

# 🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ-quantized version of Microsoft's **Phi-2** model, optimized for efficient inference using `gptqmodel`.

## 📌 Model Details

- **Base Model**: Microsoft Phi-2
- **Quantization**: GPTQ (4-bit)
- **Quantizer**: `GPTQModel`
- **Framework**: PyTorch + Hugging Face Transformers
- **Device Support**: CUDA (GPU)
- **License**: Apache 2.0

## 🚀 Features

- ✅ Lightweight: 4-bit quantization significantly reduces memory usage
- ✅ Fast inference: suited to deployment on consumer GPUs
- ✅ Compatible: works with `transformers`, `optimum`, and `gptqmodel`
- ✅ CUDA-accelerated: uses the GPU automatically when one is available

## 📚 Usage

This model is ready to use with the Hugging Face `transformers` library; a minimal loading sketch is given in the Quick Start section at the end of this card.

## 🧪 Intended Use

- Research and development
- Prototyping generative applications
- Fast inference in environments with limited GPU memory

## 📖 References

- Microsoft Phi-2: https://huggingface.co/microsoft/phi-2
- GPTQModel: https://github.com/ModelCloud/GPTQModel
- Transformers: https://github.com/huggingface/transformers

## ⚖️ License

This model is distributed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
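
## 💻 Quick Start

A minimal inference sketch, assuming `transformers`, `optimum`, and `gptqmodel` (or `auto-gptq`) are installed and a CUDA GPU is available. The repo ID below is a hypothetical placeholder; substitute the actual Hub path of this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo ID -- replace with this model's actual Hub path.
model_id = "your-username/phi-2-gptq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the quantized weights on the GPU (requires `accelerate`)
)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Transformers should detect the GPTQ quantization config stored alongside the weights, so no extra quantization arguments are needed at load time.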