# 🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ-quantized version of the **Phi-2** model by Microsoft, optimized for efficient inference using `gptqmodel`.

## 📌 Model Details

- **Base Model**: Microsoft Phi-2
- **Quantization**: GPTQ (4-bit)
- **Quantizer**: `GPTQModel` (see the sketch below)
- **Framework**: PyTorch + HuggingFace Transformers
- **Device Support**: CUDA (GPU)
- **License**: Apache 2.0
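
For reference, a minimal sketch of how a checkpoint like this is typically produced with `GPTQModel`. Only `bits=4` comes from the details above; the calibration set, `group_size=128`, and the output path are illustrative assumptions, not recorded settings of this checkpoint:

```python
# Illustrative sketch only: quantizing microsoft/phi-2 to 4-bit GPTQ with GPTQModel.
# group_size=128 and the calibration data are assumptions, not this repo's settings.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# A small calibration corpus; any representative text dataset works.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(512))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)  # 4-bit, per the details above

model = GPTQModel.load("microsoft/phi-2", quant_config)  # load the FP16 base model
model.quantize(calibration_dataset)                      # run GPTQ calibration
model.save("phi-2-gptq-4bit")                            # write the quantized weights
```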

## 🚀 Features

- ✅ Lightweight: 4-bit quantization significantly reduces memory usage
- ✅ Fast Inference: ideal for deployment on consumer GPUs
- ✅ Compatible: works with `transformers`, `optimum`, and `gptqmodel`
- ✅ CUDA-accelerated: automatically uses the GPU for speed

## 📖 Usage

This model is ready to use with the Hugging Face `transformers` library.
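
A minimal generation sketch, assuming `gptqmodel` (or `optimum`) is installed so `transformers` can load the GPTQ weights; the model id below is a placeholder for this repository's actual id:

```python
# Minimal sketch: load the quantized model and generate text.
# "<this-repo-id>" is a placeholder -- replace it with this repository's id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder for this repo on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on the GPU when one is available
    torch_dtype=torch.float16,  # half precision for the remaining FP tensors
)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`device_map="auto"` matches the CUDA support noted above; note that GPTQ kernels generally expect a CUDA device.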

## 🧪 Intended Use

- Research and development
- Prototyping generative applications
- Fast inference environments with limited GPU memory

## 📚 References

- Microsoft Phi-2: https://huggingface.co/microsoft/phi-2
- GPTQModel: https://github.com/ModelCloud/GPTQModel
- Transformers: https://github.com/huggingface/transformers

## ⚖️ License

This model is distributed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).