Instructions to use TheBloke/medalpaca-13B-GPTQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use TheBloke/medalpaca-13B-GPTQ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TheBloke/medalpaca-13B-GPTQ")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheBloke/medalpaca-13B-GPTQ")
model = AutoModelForCausalLM.from_pretrained("TheBloke/medalpaca-13B-GPTQ")

Local Apps

vLLM

How to use TheBloke/medalpaca-13B-GPTQ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TheBloke/medalpaca-13B-GPTQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TheBloke/medalpaca-13B-GPTQ",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
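
The same request can be sent from Python. What follows is a minimal sketch using only the standard library, not an official client: the helper names `build_completion_request` and `complete` are made up for illustration, and the default `base_url` assumes the server from the command above is listening on port 8000. The response parsing assumes the usual OpenAI-style completion shape, with the generated text under `choices[0].text`.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="TheBloke/medalpaca-13B-GPTQ",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response shape: {"choices": [{"text": "..."}, ...], ...}
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` needs a running server; passing a different `base_url` (for example one ending in port 30000) points the same helper at another OpenAI-compatible endpoint.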

Use Docker

docker model run hf.co/TheBloke/medalpaca-13B-GPTQ

SGLang

How to use TheBloke/medalpaca-13B-GPTQ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TheBloke/medalpaca-13B-GPTQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TheBloke/medalpaca-13B-GPTQ",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TheBloke/medalpaca-13B-GPTQ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TheBloke/medalpaca-13B-GPTQ",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use TheBloke/medalpaca-13B-GPTQ with Docker Model Runner:
```
docker model run hf.co/TheBloke/medalpaca-13B-GPTQ
```

Can't load the model in webui

by GyroO - opened Apr 28, 2023

Discussion

GyroO

Apr 28, 2023

When I try to use this model I get this error

I don't know why this is happening.
I am able to load your other models, like vicuna and wizardlm, normally.

DaTruAndi

Apr 30, 2023

Different problem here:
/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/modeling_utils.py", line 449, in load_state_dict
with open(checkpoint_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'models/TheBloke_medalpaca-13B-GPTQ-4bit/pytorch_model-00001-of-00006.bin'

tokenizer_config file also looks a bit funky.
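
One plausible reading of this traceback (an assumption, not something confirmed in the thread) is that the loader fell back to the plain Transformers path and followed a sharded-checkpoint index that names `.bin` shards which are not in the folder. A rough way to check is sketched below. The function name `missing_shards` is hypothetical. The index file name `pytorch_model.bin.index.json` and its `weight_map` field follow the usual Transformers sharded-checkpoint layout.

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return shard file names listed in the index but absent from model_dir."""
    model_dir = Path(model_dir)
    index_path = model_dir / "pytorch_model.bin.index.json"
    if not index_path.exists():
        return []  # no sharded-checkpoint index, so nothing to compare against
    with open(index_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    # weight_map maps each parameter name to the shard file that stores it.
    expected = sorted(set(weight_map.values()))
    return [name for name in expected if not (model_dir / name).exists()]
```

For the folder in the error above, that would be `missing_shards("models/TheBloke_medalpaca-13B-GPTQ-4bit")`. A non-empty result would suggest the model is being treated as an unquantized checkpoint.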

TheBloke

Owner Apr 30, 2023

This will happen if you don't configure the GPTQ parameters:
bits = 4
groupsize = 128
model_type = llama

Then save those params for this model.

I've just edited the README to add instructions for easy download and use in text-generation-webui. Here they are:

How to easily download and use this model in text-generation-webui

Open the text-generation-webui UI as normal.

Click the Model tab.
Under Download custom model or LoRA, enter TheBloke/medalpaca-13B-GPTQ-4bit.
Click Download.
Wait until it says it's finished downloading.
Click the Refresh icon next to Model in the top left.
In the Model drop-down, choose the model you just downloaded, medalpaca-13B-GPTQ-4bit.
If you see an error in the bottom right, ignore it - it's temporary.
Fill out the GPTQ parameters on the right: Bits = 4, Groupsize = 128, model_type = Llama
Click Save settings for this model in the top right.
Click Reload the Model in the top right.
Once it says it's loaded, click the Text Generation tab and enter a prompt!
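
For anyone who starts the web UI from a command line instead, the same settings can be sketched as launch arguments. This is an illustration under stated assumptions, not the project's documented interface: `--wbits` and `--groupsize` are inferred from the `shared.args.wbits` and `shared.args.groupsize` names that appear in a traceback further down this thread, while `--model` and `--model_type` are assumed to mirror the UI fields. The folder naming (slash replaced by underscore) matches the `models/TheBloke_medalpaca-13B-GPTQ-4bit` path seen in the earlier error. Both helper names are hypothetical.

```python
def local_model_name(repo_id):
    """Folder name used under models/: the repo id with '/' replaced by '_'."""
    return repo_id.replace("/", "_")


def build_launch_args(repo_id, wbits=4, groupsize=128, model_type="llama"):
    """Assemble a hypothetical server.py command line with the GPTQ settings."""
    return [
        "python", "server.py",
        "--model", local_model_name(repo_id),
        "--wbits", str(wbits),          # Bits = 4
        "--groupsize", str(groupsize),  # Groupsize = 128
        "--model_type", model_type,     # model_type = Llama
    ]


print(" ".join(build_launch_args("TheBloke/medalpaca-13B-GPTQ-4bit")))
```

Check the flag names against the help output of your installed version before relying on them.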

alochiai

May 5, 2023

OK, thank you, but I tried to use pre_layer = 12 to load it on my GTX 1060 6GB and received the error below. Can anything be done?

Traceback (most recent call last):
  File "I:\oobabooga\oobabooga_windows\text-generation-webui\server.py", line 103, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "I:\oobabooga\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
    model = load_quantized(model_name)
  File "I:\oobabooga\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 176, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
  File "I:\oobabooga\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 226, in load_quant
    model.load_state_dict(safe_load(checkpoint))
  File "I:\oobabooga\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
Missing key(s) in state_dict: “model.layers.0.self_attn.k_proj.bias”, “model.layers.0.self_attn.o_proj.bias”, “model.layers.0.self_attn.q_proj.bias”, “model.layers.0.self_attn.v_proj.bias”, “model.layers.0.mlp.down_proj.bias”, “model.layers.0.mlp.gate_proj.bias”, “model.layers.0.mlp.up_proj.bias”, “model.layers.1.self_attn.k_proj.bias”, “model.layers.1.self_attn.o_proj.bias”, “model.layers.1.self_attn.q_proj.bias”, “model.layers.1.self_attn.v_proj.bias”, “model.layers.1.mlp.down_proj.bias”, “model.layers.1.mlp.gate_proj.bias”, “model.layers.1.mlp.up_proj.bias”, “model.layers.2.self_attn.k_proj.bias”, “model.layers.2.self_attn.o_proj.bias”, “model.layers.2.self_attn.q_proj.bias”, “model.layers.2.self_attn.v_proj.bias”, “model.layers.2.mlp.down_proj.bias”, “model.layers.2.mlp.gate_proj.bias”, “model.layers.2.mlp.up_proj.bias”, “model.layers.3.self_attn.k_proj.bias”, “model.layers.3.self_attn.o_proj.bias”, “model.layers.3.self_attn.q_proj.bias”, “model.layers.3.self_attn.v_proj.bias”, “model.layers.3.mlp.down_proj.bias”, “model.layers.3.mlp.gate_proj.bias”, “model.layers.3.mlp.up_proj.bias”, “model.layers.4.self_attn.k_proj.bias”, “model.layers.4.self_attn.o_proj.bias”, “model.layers.4.self_attn.q_proj.bias”, “model.layers.4.self_attn.v_proj.bias”, “model.layers.4.mlp.down_proj.bias”, “model.layers.4.mlp.gate_proj.bias”, “model.layers.4.mlp.up_proj.bias”, “model.layers.5.self_attn.k_proj.bias”, “model.layers.5.self_attn.o_proj.bias”, “model.layers.5.self_attn.q_proj.bias”, “model.layers.5.self_attn.v_proj.bias”, “model.layers.5.mlp.down_proj.bias”, “model.layers.5.mlp.gate_proj.bias”, “model.layers.5.mlp.up_proj.bias”, “model.layers.6.self_attn.k_proj.bias”, “model.layers.6.self_attn.o_proj.bias”, “model.layers.6.self_attn.q_proj.bias”, “model.layers.6.self_attn.v_proj.bias”, “model.layers.6.mlp.down_proj.bias”, “model.layers.6.mlp.gate_proj.bias”, “model.layers.6.mlp.up_proj.bias”, “model.layers.7.self_attn.k_proj.bias”, “model.layers.7.self_attn.o_proj.bias”, 
“model.layers.7.self_attn.q_proj.bias”, “model.layers.7.self_attn.v_proj.bias”, “model.layers.7.mlp.down_proj.bias”, “model.layers.7.mlp.gate_proj.bias”, “model.layers.7.mlp.up_proj.bias”, “model.layers.8.self_attn.k_proj.bias”, “model.layers.8.self_attn.o_proj.bias”, “model.layers.8.self_attn.q_proj.bias”, “model.layers.8.self_attn.v_proj.bias”, “model.layers.8.mlp.down_proj.bias”, “model.layers.8.mlp.gate_proj.bias”, “model.layers.8.mlp.up_proj.bias”, “model.layers.9.self_attn.k_proj.bias”, “model.layers.9.self_attn.o_proj.bias”, “model.layers.9.self_attn.q_proj.bias”, “model.layers.9.self_attn.v_proj.bias”, “model.layers.9.mlp.down_proj.bias”, “model.layers.9.mlp.gate_proj.bias”, “model.layers.9.mlp.up_proj.bias”, “model.layers.10.self_attn.k_proj.bias”, “model.layers.10.self_attn.o_proj.bias”, “model.layers.10.self_attn.q_proj.bias”, “model.layers.10.self_attn.v_proj.bias”, “model.layers.10.mlp.down_proj.bias”, “model.layers.10.mlp.gate_proj.bias”, “model.layers.10.mlp.up_proj.bias”, “model.layers.11.self_attn.k_proj.bias”, “model.layers.11.self_attn.o_proj.bias”, “model.layers.11.self_attn.q_proj.bias”, “model.layers.11.self_attn.v_proj.bias”, “model.layers.11.mlp.down_proj.bias”, “model.layers.11.mlp.gate_proj.bias”, “model.layers.11.mlp.up_proj.bias”, “model.layers.12.self_attn.k_proj.bias”, “model.layers.12.self_attn.o_proj.bias”, “model.layers.12.self_attn.q_proj.bias”, “model.layers.12.self_attn.v_proj.bias”, “model.layers.12.mlp.down_proj.bias”, “model.layers.12.mlp.gate_proj.bias”, “model.layers.12.mlp.up_proj.bias”, “model.layers.13.self_attn.k_proj.bias”, “model.layers.13.self_attn.o_proj.bias”, “model.layers.13.self_attn.q_proj.bias”, “model.layers.13.self_attn.v_proj.bias”, “model.layers.13.mlp.down_proj.bias”, “model.layers.13.mlp.gate_proj.bias”, “model.layers.13.mlp.up_proj.bias”, “model.layers.14.self_attn.k_proj.bias”, “model.layers.14.self_attn.o_proj.bias”, “model.layers.14.self_attn.q_proj.bias”, “model.layers.14.self_attn.v_proj.bias”, 
“model.layers.14.mlp.down_proj.bias”, “model.layers.14.mlp.gate_proj.bias”, “model.layers.14.mlp.up_proj.bias”, “model.layers.15.self_attn.k_proj.bias”, “model.layers.15.self_attn.o_proj.bias”, “model.layers.15.self_attn.q_proj.bias”, “model.layers.15.self_attn.v_proj.bias”, “model.layers.15.mlp.down_proj.bias”, “model.layers.15.mlp.gate_proj.bias”, “model.layers.15.mlp.up_proj.bias”, “model.layers.16.self_attn.k_proj.bias”, “model.layers.16.self_attn.o_proj.bias”, “model.layers.16.self_attn.q_proj.bias”, “model.layers.16.self_attn.v_proj.bias”, “model.layers.16.mlp.down_proj.bias”, “model.layers.16.mlp.gate_proj.bias”, “model.layers.16.mlp.up_proj.bias”, “model.layers.17.self_attn.k_proj.bias”, “model.layers.17.self_attn.o_proj.bias”, “model.layers.17.self_attn.q_proj.bias”, “model.layers.17.self_attn.v_proj.bias”, “model.layers.17.mlp.down_proj.bias”, “model.layers.17.mlp.gate_proj.bias”, “model.layers.17.mlp.up_proj.bias”, “model.layers.18.self_attn.k_proj.bias”, “model.layers.18.self_attn.o_proj.bias”, “model.layers.18.self_attn.q_proj.bias”, “model.layers.18.self_attn.v_proj.bias”, “model.layers.18.mlp.down_proj.bias”, “model.layers.18.mlp.gate_proj.bias”, “model.layers.18.mlp.up_proj.bias”, “model.layers.19.self_attn.k_proj.bias”, “model.layers.19.self_attn.o_proj.bias”, “model.layers.19.self_attn.q_proj.bias”, “model.layers.19.self_attn.v_proj.bias”, “model.layers.19.mlp.down_proj.bias”, “model.layers.19.mlp.gate_proj.bias”, “model.layers.19.mlp.up_proj.bias”, “model.layers.20.self_attn.k_proj.bias”, “model.layers.20.self_attn.o_proj.bias”, “model.layers.20.self_attn.q_proj.bias”, “model.layers.20.self_attn.v_proj.bias”, “model.layers.20.mlp.down_proj.bias”, “model.layers.20.mlp.gate_proj.bias”, “model.layers.20.mlp.up_proj.bias”, “model.layers.21.self_attn.k_proj.bias”, “model.layers.21.self_attn.o_proj.bias”, “model.layers.21.self_attn.q_proj.bias”, “model.layers.21.self_attn.v_proj.bias”, “model.layers.21.mlp.down_proj.bias”, 
“model.layers.21.mlp.gate_proj.bias”, “model.layers.21.mlp.up_proj.bias”, “model.layers.22.self_attn.k_proj.bias”, “model.layers.22.self_attn.o_proj.bias”, “model.layers.22.self_attn.q_proj.bias”, “model.layers.22.self_attn.v_proj.bias”, “model.layers.22.mlp.down_proj.bias”, “model.layers.22.mlp.gate_proj.bias”, “model.layers.22.mlp.up_proj.bias”, “model.layers.23.self_attn.k_proj.bias”, “model.layers.23.self_attn.o_proj.bias”, “model.layers.23.self_attn.q_proj.bias”, “model.layers.23.self_attn.v_proj.bias”, “model.layers.23.mlp.down_proj.bias”, “model.layers.23.mlp.gate_proj.bias”, “model.layers.23.mlp.up_proj.bias”, “model.layers.24.self_attn.k_proj.bias”, “model.layers.24.self_attn.o_proj.bias”, “model.layers.24.self_attn.q_proj.bias”, “model.layers.24.self_attn.v_proj.bias”, “model.layers.24.mlp.down_proj.bias”, “model.layers.24.mlp.gate_proj.bias”, “model.layers.24.mlp.up_proj.bias”, “model.layers.25.self_attn.k_proj.bias”, “model.layers.25.self_attn.o_proj.bias”, “model.layers.25.self_attn.q_proj.bias”, “model.layers.25.self_attn.v_proj.bias”, “model.layers.25.mlp.down_proj.bias”, “model.layers.25.mlp.gate_proj.bias”, “model.layers.25.mlp.up_proj.bias”, “model.layers.26.self_attn.k_proj.bias”, “model.layers.26.self_attn.o_proj.bias”, “model.layers.26.self_attn.q_proj.bias”, “model.layers.26.self_attn.v_proj.bias”, “model.layers.26.mlp.down_proj.bias”, “model.layers.26.mlp.gate_proj.bias”, “model.layers.26.mlp.up_proj.bias”, “model.layers.27.self_attn.k_proj.bias”, “model.layers.27.self_attn.o_proj.bias”, “model.layers.27.self_attn.q_proj.bias”, “model.layers.27.self_attn.v_proj.bias”, “model.layers.27.mlp.down_proj.bias”, “model.layers.27.mlp.gate_proj.bias”, “model.layers.27.mlp.up_proj.bias”, “model.layers.28.self_attn.k_proj.bias”, “model.layers.28.self_attn.o_proj.bias”, “model.layers.28.self_attn.q_proj.bias”, “model.layers.28.self_attn.v_proj.bias”, “model.layers.28.mlp.down_proj.bias”, “model.layers.28.mlp.gate_proj.bias”, 
“model.layers.28.mlp.up_proj.bias”, “model.layers.29.self_attn.k_proj.bias”, “model.layers.29.self_attn.o_proj.bias”, “model.layers.29.self_attn.q_proj.bias”, “model.layers.29.self_attn.v_proj.bias”, “model.layers.29.mlp.down_proj.bias”, “model.layers.29.mlp.gate_proj.bias”, “model.layers.29.mlp.up_proj.bias”, “model.layers.30.self_attn.k_proj.bias”, “model.layers.30.self_attn.o_proj.bias”, “model.layers.30.self_attn.q_proj.bias”, “model.layers.30.self_attn.v_proj.bias”, “model.layers.30.mlp.down_proj.bias”, “model.layers.30.mlp.gate_proj.bias”, “model.layers.30.mlp.up_proj.bias”, “model.layers.31.self_attn.k_proj.bias”, “model.layers.31.self_attn.o_proj.bias”, “model.layers.31.self_attn.q_proj.bias”, “model.layers.31.self_attn.v_proj.bias”, “model.layers.31.mlp.down_proj.bias”, “model.layers.31.mlp.gate_proj.bias”, “model.layers.31.mlp.up_proj.bias”, “model.layers.32.self_attn.k_proj.bias”, “model.layers.32.self_attn.o_proj.bias”, “model.layers.32.self_attn.q_proj.bias”, “model.layers.32.self_attn.v_proj.bias”, “model.layers.32.mlp.down_proj.bias”, “model.layers.32.mlp.gate_proj.bias”, “model.layers.32.mlp.up_proj.bias”, “model.layers.33.self_attn.k_proj.bias”, “model.layers.33.self_attn.o_proj.bias”, “model.layers.33.self_attn.q_proj.bias”, “model.layers.33.self_attn.v_proj.bias”, “model.layers.33.mlp.down_proj.bias”, “model.layers.33.mlp.gate_proj.bias”, “model.layers.33.mlp.up_proj.bias”, “model.layers.34.self_attn.k_proj.bias”, “model.layers.34.self_attn.o_proj.bias”, “model.layers.34.self_attn.q_proj.bias”, “model.layers.34.self_attn.v_proj.bias”, “model.layers.34.mlp.down_proj.bias”, “model.layers.34.mlp.gate_proj.bias”, “model.layers.34.mlp.up_proj.bias”, “model.layers.35.self_attn.k_proj.bias”, “model.layers.35.self_attn.o_proj.bias”, “model.layers.35.self_attn.q_proj.bias”, “model.layers.35.self_attn.v_proj.bias”, “model.layers.35.mlp.down_proj.bias”, “model.layers.35.mlp.gate_proj.bias”, “model.layers.35.mlp.up_proj.bias”, 
“model.layers.36.self_attn.k_proj.bias”, “model.layers.36.self_attn.o_proj.bias”, “model.layers.36.self_attn.q_proj.bias”, “model.layers.36.self_attn.v_proj.bias”, “model.layers.36.mlp.down_proj.bias”, “model.layers.36.mlp.gate_proj.bias”, “model.layers.36.mlp.up_proj.bias”, “model.layers.37.self_attn.k_proj.bias”, “model.layers.37.self_attn.o_proj.bias”, “model.layers.37.self_attn.q_proj.bias”, “model.layers.37.self_attn.v_proj.bias”, “model.layers.37.mlp.down_proj.bias”, “model.layers.37.mlp.gate_proj.bias”, “model.layers.37.mlp.up_proj.bias”, “model.layers.38.self_attn.k_proj.bias”, “model.layers.38.self_attn.o_proj.bias”, “model.layers.38.self_attn.q_proj.bias”, “model.layers.38.self_attn.v_proj.bias”, “model.layers.38.mlp.down_proj.bias”, “model.layers.38.mlp.gate_proj.bias”, “model.layers.38.mlp.up_proj.bias”, “model.layers.39.self_attn.k_proj.bias”, “model.layers.39.self_attn.o_proj.bias”, “model.layers.39.self_attn.q_proj.bias”, “model.layers.39.self_attn.v_proj.bias”, “model.layers.39.mlp.down_proj.bias”, “model.layers.39.mlp.gate_proj.bias”, “model.layers.39.mlp.up_proj.bias”.
Unexpected key(s) in state_dict: “model.layers.0.self_attn.k_proj.g_idx”, “model.layers.0.self_attn.o_proj.g_idx”, “model.layers.0.self_attn.q_proj.g_idx”, “model.layers.0.self_attn.v_proj.g_idx”, “model.layers.0.mlp.down_proj.g_idx”, “model.layers.0.mlp.gate_proj.g_idx”, “model.layers.0.mlp.up_proj.g_idx”, “model.layers.1.self_attn.k_proj.g_idx”, “model.layers.1.self_attn.o_proj.g_idx”, “model.layers.1.self_attn.q_proj.g_idx”, “model.layers.1.self_attn.v_proj.g_idx”, “model.layers.1.mlp.down_proj.g_idx”, “model.layers.1.mlp.gate_proj.g_idx”, “model.layers.1.mlp.up_proj.g_idx”, “model.layers.2.self_attn.k_proj.g_idx”, “model.layers.2.self_attn.o_proj.g_idx”, “model.layers.2.self_attn.q_proj.g_idx”, “model.layers.2.self_attn.v_proj.g_idx”, “model.layers.2.mlp.down_proj.g_idx”, “model.layers.2.mlp.gate_proj.g_idx”, “model.layers.2.mlp.up_proj.g_idx”, “model.layers.3.self_attn.k_proj.g_idx”, “model.layers.3.self_attn.o_proj.g_idx”, “model.layers.3.self_attn.q_proj.g_idx”, “model.layers.3.self_attn.v_proj.g_idx”, “model.layers.3.mlp.down_proj.g_idx”, “model.layers.3.mlp.gate_proj.g_idx”, “model.layers.3.mlp.up_proj.g_idx”, “model.layers.4.self_attn.k_proj.g_idx”, “model.layers.4.self_attn.o_proj.g_idx”, “model.layers.4.self_attn.q_proj.g_idx”, “model.layers.4.self_attn.v_proj.g_idx”, “model.layers.4.mlp.down_proj.g_idx”, “model.layers.4.mlp.gate_proj.g_idx”, “model.layers.4.mlp.up_proj.g_idx”, “model.layers.5.self_attn.k_proj.g_idx”, “model.layers.5.self_attn.o_proj.g_idx”, “model.layers.5.self_attn.q_proj.g_idx”, “model.layers.5.self_attn.v_proj.g_idx”, “model.layers.5.mlp.down_proj.g_idx”, “model.layers.5.mlp.gate_proj.g_idx”, “model.layers.5.mlp.up_proj.g_idx”, “model.layers.6.self_attn.k_proj.g_idx”, “model.layers.6.self_attn.o_proj.g_idx”, “model.layers.6.self_attn.q_proj.g_idx”, “model.layers.6.self_attn.v_proj.g_idx”, “model.layers.6.mlp.down_proj.g_idx”, “model.layers.6.mlp.gate_proj.g_idx”, “model.layers.6.mlp.up_proj.g_idx”, 
“model.layers.7.self_attn.k_proj.g_idx”, “model.layers.7.self_attn.o_proj.g_idx”, “model.layers.7.self_attn.q_proj.g_idx”, “model.layers.7.self_attn.v_proj.g_idx”, “model.layers.7.mlp.down_proj.g_idx”, “model.layers.7.mlp.gate_proj.g_idx”, “model.layers.7.mlp.up_proj.g_idx”, “model.layers.8.self_attn.k_proj.g_idx”, “model.layers.8.self_attn.o_proj.g_idx”, “model.layers.8.self_attn.q_proj.g_idx”, “model.layers.8.self_attn.v_proj.g_idx”, “model.layers.8.mlp.down_proj.g_idx”, “model.layers.8.mlp.gate_proj.g_idx”, “model.layers.8.mlp.up_proj.g_idx”, “model.layers.9.self_attn.k_proj.g_idx”, “model.layers.9.self_attn.o_proj.g_idx”, “model.layers.9.self_attn.q_proj.g_idx”, “model.layers.9.self_attn.v_proj.g_idx”, “model.layers.9.mlp.down_proj.g_idx”, “model.layers.9.mlp.gate_proj.g_idx”, “model.layers.9.mlp.up_proj.g_idx”, “model.layers.10.self_attn.k_proj.g_idx”, “model.layers.10.self_attn.o_proj.g_idx”, “model.layers.10.self_attn.q_proj.g_idx”, “model.layers.10.self_attn.v_proj.g_idx”, “model.layers.10.mlp.down_proj.g_idx”, “model.layers.10.mlp.gate_proj.g_idx”, “model.layers.10.mlp.up_proj.g_idx”, “model.layers.11.self_attn.k_proj.g_idx”, “model.layers.11.self_attn.o_proj.g_idx”, “model.layers.11.self_attn.q_proj.g_idx”, “model.layers.11.self_attn.v_proj.g_idx”, “model.layers.11.mlp.down_proj.g_idx”, “model.layers.11.mlp.gate_proj.g_idx”, “model.layers.11.mlp.up_proj.g_idx”, “model.layers.12.self_attn.k_proj.g_idx”, “model.layers.12.self_attn.o_proj.g_idx”, “model.layers.12.self_attn.q_proj.g_idx”, “model.layers.12.self_attn.v_proj.g_idx”, “model.layers.12.mlp.down_proj.g_idx”, “model.layers.12.mlp.gate_proj.g_idx”, “model.layers.12.mlp.up_proj.g_idx”, “model.layers.13.self_attn.k_proj.g_idx”, “model.layers.13.self_attn.o_proj.g_idx”, “model.layers.13.self_attn.q_proj.g_idx”, “model.layers.13.self_attn.v_proj.g_idx”, “model.layers.13.mlp.down_proj.g_idx”, “model.layers.13.mlp.gate_proj.g_idx”, “model.layers.13.mlp.up_proj.g_idx”, 
“model.layers.14.self_attn.k_proj.g_idx”, “model.layers.14.self_attn.o_proj.g_idx”, “model.layers.14.self_attn.q_proj.g_idx”, “model.layers.14.self_attn.v_proj.g_idx”, “model.layers.14.mlp.down_proj.g_idx”, “model.layers.14.mlp.gate_proj.g_idx”, “model.layers.14.mlp.up_proj.g_idx”, “model.layers.15.self_attn.k_proj.g_idx”, “model.layers.15.self_attn.o_proj.g_idx”, “model.layers.15.self_attn.q_proj.g_idx”, “model.layers.15.self_attn.v_proj.g_idx”, “model.layers.15.mlp.down_proj.g_idx”, “model.layers.15.mlp.gate_proj.g_idx”, “model.layers.15.mlp.up_proj.g_idx”, “model.layers.16.self_attn.k_proj.g_idx”, “model.layers.16.self_attn.o_proj.g_idx”, “model.layers.16.self_attn.q_proj.g_idx”, “model.layers.16.self_attn.v_proj.g_idx”, “model.layers.16.mlp.down_proj.g_idx”, “model.layers.16.mlp.gate_proj.g_idx”, “model.layers.16.mlp.up_proj.g_idx”, “model.layers.17.self_attn.k_proj.g_idx”, “model.layers.17.self_attn.o_proj.g_idx”, “model.layers.17.self_attn.q_proj.g_idx”, “model.layers.17.self_attn.v_proj.g_idx”, “model.layers.17.mlp.down_proj.g_idx”, “model.layers.17.mlp.gate_proj.g_idx”, “model.layers.17.mlp.up_proj.g_idx”, “model.layers.18.self_attn.k_proj.g_idx”, “model.layers.18.self_attn.o_proj.g_idx”, “model.layers.18.self_attn.q_proj.g_idx”, “model.layers.18.self_attn.v_proj.g_idx”, “model.layers.18.mlp.down_proj.g_idx”, “model.layers.18.mlp.gate_proj.g_idx”, “model.layers.18.mlp.up_proj.g_idx”, “model.layers.19.self_attn.k_proj.g_idx”, “model.layers.19.self_attn.o_proj.g_idx”, “model.layers.19.self_attn.q_proj.g_idx”, “model.layers.19.self_attn.v_proj.g_idx”, “model.layers.19.mlp.down_proj.g_idx”, “model.layers.19.mlp.gate_proj.g_idx”, “model.layers.19.mlp.up_proj.g_idx”, “model.layers.20.self_attn.k_proj.g_idx”, “model.layers.20.self_attn.o_proj.g_idx”, “model.layers.20.self_attn.q_proj.g_idx”, “model.layers.20.self_attn.v_proj.g_idx”, “model.layers.20.mlp.down_proj.g_idx”, “model.layers.20.mlp.gate_proj.g_idx”, “model.layers.20.mlp.up_proj.g_idx”, 
“model.layers.21.self_attn.k_proj.g_idx”, “model.layers.21.self_attn.o_proj.g_idx”, “model.layers.21.self_attn.q_proj.g_idx”, “model.layers.21.self_attn.v_proj.g_idx”, “model.layers.21.mlp.down_proj.g_idx”, “model.layers.21.mlp.gate_proj.g_idx”, “model.layers.21.mlp.up_proj.g_idx”, “model.layers.22.self_attn.k_proj.g_idx”, “model.layers.22.self_attn.o_proj.g_idx”, “model.layers.22.self_attn.q_proj.g_idx”, “model.layers.22.self_attn.v_proj.g_idx”, “model.layers.22.mlp.down_proj.g_idx”, “model.layers.22.mlp.gate_proj.g_idx”, “model.layers.22.mlp.up_proj.g_idx”, “model.layers.23.self_attn.k_proj.g_idx”, “model.layers.23.self_attn.o_proj.g_idx”, “model.layers.23.self_attn.q_proj.g_idx”, “model.layers.23.self_attn.v_proj.g_idx”, “model.layers.23.mlp.down_proj.g_idx”, “model.layers.23.mlp.gate_proj.g_idx”, “model.layers.23.mlp.up_proj.g_idx”, “model.layers.24.self_attn.k_proj.g_idx”, “model.layers.24.self_attn.o_proj.g_idx”, “model.layers.24.self_attn.q_proj.g_idx”, “model.layers.24.self_attn.v_proj.g_idx”, “model.layers.24.mlp.down_proj.g_idx”, “model.layers.24.mlp.gate_proj.g_idx”, “model.layers.24.mlp.up_proj.g_idx”, “model.layers.25.self_attn.k_proj.g_idx”, “model.layers.25.self_attn.o_proj.g_idx”, “model.layers.25.self_attn.q_proj.g_idx”, “model.layers.25.self_attn.v_proj.g_idx”, “model.layers.25.mlp.down_proj.g_idx”, “model.layers.25.mlp.gate_proj.g_idx”, “model.layers.25.mlp.up_proj.g_idx”, “model.layers.26.self_attn.k_proj.g_idx”, “model.layers.26.self_attn.o_proj.g_idx”, “model.layers.26.self_attn.q_proj.g_idx”, “model.layers.26.self_attn.v_proj.g_idx”, “model.layers.26.mlp.down_proj.g_idx”, “model.layers.26.mlp.gate_proj.g_idx”, “model.layers.26.mlp.up_proj.g_idx”, “model.layers.27.self_attn.k_proj.g_idx”, “model.layers.27.self_attn.o_proj.g_idx”, “model.layers.27.self_attn.q_proj.g_idx”, “model.layers.27.self_attn.v_proj.g_idx”, “model.layers.27.mlp.down_proj.g_idx”, “model.layers.27.mlp.gate_proj.g_idx”, “model.layers.27.mlp.up_proj.g_idx”, 
“model.layers.28.self_attn.k_proj.g_idx”, “model.layers.28.self_attn.o_proj.g_idx”, “model.layers.28.self_attn.q_proj.g_idx”, “model.layers.28.self_attn.v_proj.g_idx”, “model.layers.28.mlp.down_proj.g_idx”, “model.layers.28.mlp.gate_proj.g_idx”, “model.layers.28.mlp.up_proj.g_idx”, “model.layers.29.self_attn.k_proj.g_idx”, “model.layers.29.self_attn.o_proj.g_idx”, “model.layers.29.self_attn.q_proj.g_idx”, “model.layers.29.self_attn.v_proj.g_idx”, “model.layers.29.mlp.down_proj.g_idx”, “model.layers.29.mlp.gate_proj.g_idx”, “model.layers.29.mlp.up_proj.g_idx”, “model.layers.30.self_attn.k_proj.g_idx”, “model.layers.30.self_attn.o_proj.g_idx”, “model.layers.30.self_attn.q_proj.g_idx”, “model.layers.30.self_attn.v_proj.g_idx”, “model.layers.30.mlp.down_proj.g_idx”, “model.layers.30.mlp.gate_proj.g_idx”, “model.layers.30.mlp.up_proj.g_idx”, “model.layers.31.self_attn.k_proj.g_idx”, “model.layers.31.self_attn.o_proj.g_idx”, “model.layers.31.self_attn.q_proj.g_idx”, “model.layers.31.self_attn.v_proj.g_idx”, “model.layers.31.mlp.down_proj.g_idx”, “model.layers.31.mlp.gate_proj.g_idx”, “model.layers.31.mlp.up_proj.g_idx”, “model.layers.32.self_attn.k_proj.g_idx”, “model.layers.32.self_attn.o_proj.g_idx”, “model.layers.32.self_attn.q_proj.g_idx”, “model.layers.32.self_attn.v_proj.g_idx”, “model.layers.32.mlp.down_proj.g_idx”, “model.layers.32.mlp.gate_proj.g_idx”, “model.layers.32.mlp.up_proj.g_idx”, “model.layers.33.self_attn.k_proj.g_idx”, “model.layers.33.self_attn.o_proj.g_idx”, “model.layers.33.self_attn.q_proj.g_idx”, “model.layers.33.self_attn.v_proj.g_idx”, “model.layers.33.mlp.down_proj.g_idx”, “model.layers.33.mlp.gate_proj.g_idx”, “model.layers.33.mlp.up_proj.g_idx”, “model.layers.34.self_attn.k_proj.g_idx”, “model.layers.34.self_attn.o_proj.g_idx”, “model.layers.34.self_attn.q_proj.g_idx”, “model.layers.34.self_attn.v_proj.g_idx”, “model.layers.34.mlp.down_proj.g_idx”, “model.layers.34.mlp.gate_proj.g_idx”, “model.layers.34.mlp.up_proj.g_idx”, 
“model.layers.35.self_attn.k_proj.g_idx”, “model.layers.35.self_attn.o_proj.g_idx”, “model.layers.35.self_attn.q_proj.g_idx”, “model.layers.35.self_attn.v_proj.g_idx”, “model.layers.35.mlp.down_proj.g_idx”, “model.layers.35.mlp.gate_proj.g_idx”, “model.layers.35.mlp.up_proj.g_idx”, “model.layers.36.self_attn.k_proj.g_idx”, “model.layers.36.self_attn.o_proj.g_idx”, “model.layers.36.self_attn.q_proj.g_idx”, “model.layers.36.self_attn.v_proj.g_idx”, “model.layers.36.mlp.down_proj.g_idx”, “model.layers.36.mlp.gate_proj.g_idx”, “model.layers.36.mlp.up_proj.g_idx”, “model.layers.37.self_attn.k_proj.g_idx”, “model.layers.37.self_attn.o_proj.g_idx”, “model.layers.37.self_attn.q_proj.g_idx”, “model.layers.37.self_attn.v_proj.g_idx”, “model.layers.37.mlp.down_proj.g_idx”, “model.layers.37.mlp.gate_proj.g_idx”, “model.layers.37.mlp.up_proj.g_idx”, “model.layers.38.self_attn.k_proj.g_idx”, “model.layers.38.self_attn.o_proj.g_idx”, “model.layers.38.self_attn.q_proj.g_idx”, “model.layers.38.self_attn.v_proj.g_idx”, “model.layers.38.mlp.down_proj.g_idx”, “model.layers.38.mlp.gate_proj.g_idx”, “model.layers.38.mlp.up_proj.g_idx”, “model.layers.39.self_attn.k_proj.g_idx”, “model.layers.39.self_attn.o_proj.g_idx”, “model.layers.39.self_attn.q_proj.g_idx”, “model.layers.39.self_attn.v_proj.g_idx”, “model.layers.39.mlp.down_proj.g_idx”, “model.layers.39.mlp.gate_proj.g_idx”, “model.layers.39.mlp.up_proj.g_idx”.
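
Condensed, the two key lists above appear to follow one pattern: for each of layers 0 to 39 and each of seven projections, the model built by the loader expects a `.bias` tensor that the checkpoint lacks, and the checkpoint carries a `.g_idx` tensor that the model does not define. That usually points to a checkpoint saved by a different version of the quantization code than the one loading it, though that reading is an assumption here. A small sketch that regenerates and counts the keys (helper names are hypothetical, layer count and projection names are taken from the log):

```python
from collections import Counter

# The seven per-layer projections that appear in the error message.
PROJECTIONS = [
    "self_attn.k_proj", "self_attn.o_proj", "self_attn.q_proj", "self_attn.v_proj",
    "mlp.down_proj", "mlp.gate_proj", "mlp.up_proj",
]


def keys_with_suffix(suffix, num_layers=40):
    """Rebuild key names such as 'model.layers.0.self_attn.k_proj.bias'."""
    return [
        f"model.layers.{layer}.{proj}.{suffix}"
        for layer in range(num_layers)
        for proj in PROJECTIONS
    ]


def summarize(keys):
    """Count keys by their final dotted component (for example 'bias')."""
    return Counter(key.rsplit(".", 1)[-1] for key in keys)


missing = keys_with_suffix("bias")      # keys the model expects but cannot find
unexpected = keys_with_suffix("g_idx")  # keys the checkpoint has but the model lacks
print(summarize(missing), summarize(unexpected))
```

Running `summarize` on the real lists from a failed load gives a quick view of which tensor types differ, without reading every key.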

TheBloke

Owner May 5, 2023

OK yeah there's a problem with pre_layer on these models at the moment. I don't currently have a solution for that I'm afraid.

There will be new GPTQ code available in the next week or two which should hopefully resolve this.

alochiai

May 5, 2023

FYI, with the koala7b GPTQ 4-bit 128 model you made available here, this parameter works...

audioscavenger

Sep 24, 2023

You need to use the ExLlama_HF model loader, or it will crash.
