Models

7,067

Active filters: gptq

ChenMnZ/Llama-2-13b-EfficientQAT-w2g128-BitBLAS

Text Generation • 51B • Updated Jul 22, 2024 • 6

ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-BitBLAS

Text Generation • 51B • Updated Jul 22, 2024 • 9

ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-GPTQ

Text Generation • 13B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-2-13b-EfficientQAT-w4g128-BitBLAS

Text Generation • 51B • Updated Jul 22, 2024 • 5

Xu-Ouyang/pythia-2.8b-deduped-int4-step129000-GPTQ-wikitext2

Text Generation • 3B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-2-13b-EfficientQAT-w4g128-GPTQ

Text Generation • 13B • Updated Jul 22, 2024 • 12

ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-BitBLAS

Text Generation • 274B • Updated Jul 22, 2024 • 2

ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-GPTQ

Text Generation • 69B • Updated Jul 22, 2024 • 13

ChenMnZ/Llama-2-70b-EfficientQAT-w2g64-GPTQ

Text Generation • 69B • Updated Jul 22, 2024 • 8

ChenMnZ/Llama-2-70b-EfficientQAT-w4g128-BitBLAS

Text Generation • 275B • Updated Jul 22, 2024 • 5

ChenMnZ/Llama-2-70b-EfficientQAT-w4g128-GPTQ

Text Generation • 69B • Updated Jul 22, 2024 • 21

Xu-Ouyang/pythia-2.8b-deduped-int3-step14000-GPTQ-wikitext2

Text Generation • 3B • Updated Jul 22, 2024 • 4

Xu-Ouyang/pythia-12b-deduped-int3-step14000-GPTQ-wikitext2

Text Generation • 11B • Updated Jul 22, 2024 • 5

ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-GPTQ

Text Generation • 7B • Updated Jul 22, 2024 • 14

ChenMnZ/Llama-2-7b-EfficientQAT-w2g64-GPTQ

Text Generation • 7B • Updated Jul 22, 2024 • 8 • 1

Xu-Ouyang/pythia-2.8b-deduped-int3-step29000-GPTQ-wikitext2

Text Generation • 3B • Updated Jul 22, 2024 • 4

ModelCloud/gemma-2-27b-it-gptq-4bit

Text Generation • 28B • Updated Jul 23, 2024 • 33 • 12

ChenMnZ/Llama-2-7b-EfficientQAT-w4g128-GPTQ

Text Generation • 7B • Updated Jul 22, 2024 • 14

ChenMnZ/Llama-3-70b-EfficientQAT-w2g128-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 6

ChenMnZ/Llama-3-70b-EfficientQAT-w2g64-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 7

ChenMnZ/Llama-3-70b-EfficientQAT-w4g128-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 7

Xu-Ouyang/pythia-2.8b-deduped-int3-step43000-GPTQ-wikitext2

Text Generation • 3B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g128-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 6

Llamarider222/Mixtral_8x7B_GPTQ

Text Generation • 47B • Updated Jul 22, 2024 • 15

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g64-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 7

ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-BitBLAS

Text Generation • 26B • Updated Jul 22, 2024 • 5

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w4g128-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-2-7b-EfficientQAT-w2g64-BitBLAS

Text Generation • 26B • Updated Jul 22, 2024 • 6

Xu-Ouyang/pythia-2.8b-deduped-int3-step57000-GPTQ-wikitext2

Text Generation • 3B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-2-7b-EfficientQAT-w4g128-BitBLAS

Text Generation • 26B • Updated Jul 22, 2024 • 4