---
license: apache-2.0
tags:
- gptq
- quantized
- causal-lm
- transformers
- pytorch
- phi-2
- text-generation
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/phi-2
inference: true
---

# 🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ-quantized version of Microsoft's **Phi-2** model, optimized for efficient inference using `gptqmodel`.

## 📌 Model Details

- **Base Model**: Microsoft Phi-2
- **Quantization**: GPTQ (4-bit)
- **Quantizer**: `GPTQModel`
- **Framework**: PyTorch + Hugging Face Transformers
- **Device Support**: CUDA (GPU)
- **License**: Apache 2.0

## 🚀 Features

- ✅ Lightweight: 4-bit quantization significantly reduces memory usage
- ✅ Fast inference: suited to deployment on consumer GPUs
- ✅ Compatible: works with `transformers`, `optimum`, and `gptqmodel`
- ✅ CUDA-accelerated: uses the GPU automatically when one is available

## 📚 Usage

This model is ready to use with the Hugging Face `transformers` library; a minimal loading sketch is given in the Quick Start section at the end of this card.

## 🧪 Intended Use

- Research and development
- Prototyping generative applications
- Fast inference in environments with limited GPU memory

## 📖 References

- Microsoft Phi-2: https://huggingface.co/microsoft/phi-2
- GPTQModel: https://github.com/ModelCloud/GPTQModel
- Transformers: https://github.com/huggingface/transformers

## ⚖️ License

This model is distributed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
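
## 💻 Quick Start

A minimal inference sketch, assuming `transformers`, `optimum`, and `gptqmodel` (or `auto-gptq`) are installed and a CUDA GPU is available. The repo ID below is a hypothetical placeholder; substitute the actual Hub path of this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo ID -- replace with this model's actual Hub path.
model_id = "your-username/phi-2-gptq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the quantized weights on the GPU (requires `accelerate`)
)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Transformers should detect the GPTQ quantization config stored alongside the weights, so no extra quantization arguments are needed at load time.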