STiFLeR7 committed
Commit a15c650 · verified · 1 Parent(s): 43f428f

Update README.md

Files changed (1):
  1. README.md +39 -3
README.md CHANGED
@@ -1,3 +1,39 @@
- ---
- license: apache-2.0
- ---
+ # 🧠 Phi-2 GPTQ (Quantized)
+
+ This repository provides a 4-bit GPTQ-quantized version of Microsoft's **Phi-2** model, optimized for efficient inference with `gptqmodel`.
+
+ ## 📌 Model Details
+
+ - **Base Model**: Microsoft Phi-2
+ - **Quantization**: GPTQ (4-bit)
+ - **Quantizer**: `GPTQModel` (see the sketch after this list)
+ - **Framework**: PyTorch + Hugging Face Transformers
+ - **Device Support**: CUDA (GPU)
+ - **License**: Apache 2.0
+
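+ For reference, a minimal sketch of how a 4-bit GPTQ export of Phi-2 can be produced with `gptqmodel`. The bit width matches this repo, but the group size and calibration texts are illustrative assumptions; the exact recipe used here is not recorded, and the API may differ across `gptqmodel` versions:
+
+ ```python
+ # Illustrative sketch only -- not the exact recipe used for this repository.
+ from gptqmodel import GPTQModel, QuantizeConfig
+
+ # 4-bit matches this repo; group_size=128 is a common default (assumed here).
+ quant_config = QuantizeConfig(bits=4, group_size=128)
+
+ # A real run needs a few hundred representative calibration samples.
+ calibration_data = ["GPTQ calibrates per-layer rounding on sample text."]
+
+ model = GPTQModel.load("microsoft/phi-2", quant_config)
+ model.quantize(calibration_data)
+ model.save("phi-2-gptq-4bit")
+ ```
+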
+ ## 🚀 Features
+
+ - ✅ Lightweight: 4-bit quantization significantly reduces memory usage
+ - ✅ Fast inference: ideal for deployment on consumer GPUs
+ - ✅ Compatible: works with `transformers`, `optimum`, and `gptqmodel`
+ - ✅ CUDA-accelerated: automatically uses the GPU for speed
+
+ ## 📚 Usage
+
+ This model is ready to use with the Hugging Face `transformers` library; a minimal loading sketch follows.
+
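+ A minimal sketch, assuming `transformers`, `optimum`, `gptqmodel`, and `accelerate` are installed; the repository id below is a placeholder, so substitute this repo's actual path:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Placeholder id for illustration -- replace with this repository's real path.
+ model_id = "STiFLeR7/phi2-gptq"
+
+ # GPTQ checkpoints load through the standard transformers API;
+ # device_map="auto" places the quantized layers on the available GPU.
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ prompt = "Explain GPTQ quantization in one sentence."
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+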
+ ## 🧪 Intended Use
+
+ - Research and development
+ - Prototyping generative applications
+ - Fast inference environments with limited GPU memory
+
+ ## 📖 References
+
+ - Microsoft Phi-2: https://huggingface.co/microsoft/phi-2
+ - GPTQModel: https://github.com/ModelCloud/GPTQModel
+ - Transformers: https://github.com/huggingface/transformers
+
+ ## ⚖️ License
+
+ This model is distributed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).