---
quantized_by: QuixiAI
pipeline_tag: text-generation
language:
- en
base_model_relation: quantized
base_model: QuixiAI/Ina-v11.1
license: cc-by-nc-4.0
---
# Llamacpp Quantizations of Ina-v11.1
**Ina** interprets persona definitions as *executable instructions*.
The model follows `<>` blocks with extremely high fidelity even during 10k–15k token erotic or dark-fiction role-play sessions.
Fine-tuned by **BaiAI** and **Eric Hartford (QuixiAI)** using QLoRA + DPO on large volumes of RP logs, creator-voice datasets, and persona modules.
Contributor Credits:
- "Cheshire Cat"
- [FitQueen666](https://huggingface.co/FitQueen666)
- [Jaroslavs Samcuks](https://huggingface.co/yarcat)
- [Eric Hartford](https://huggingface.co/QuixiAI)
---
Quantized using [llama.cpp](https://github.com/ggml-org/llama.cpp).
Original model: https://huggingface.co/QuixiAI/Ina-v11.1
Run them in [LM Studio](https://lmstudio.ai/), or run them directly with [llama.cpp](https://github.com/ggml-org/llama.cpp) or any other llama.cpp-based project.
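Once you have a GGUF file downloaded (see the table below), a minimal invocation with a recent llama.cpp build looks something like this; the quant choice, context size, and prompt are placeholders to adjust for your setup:
```
# Build or install llama.cpp first: https://github.com/ggml-org/llama.cpp
# -m   path to the GGUF file
# -ngl number of layers to offload to the GPU (use 0 for CPU-only)
# -c   context window in tokens
./llama-cli -m ./Ina-v11.1-Q4_K_M.gguf -ngl 99 -c 8192 -p "Hello, Ina."
```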
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Ina-v11.1-Q8_0.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/tree/main/Ina-v11.1-Q8_0) | Q8_0 | 70GB | true | Extremely high quality, generally unneeded but max available quant. |
| [Ina-v11.1-Q6_K.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/tree/main/Ina-v11.1-Q6_K) | Q6_K | 54GB | true | Very high quality, near perfect, *recommended*. |
| [Ina-v11.1-Q5_K_M.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q5_K_M.gguf) | Q5_K_M | 47GB | false | High quality, *recommended*. |
| [Ina-v11.1-Q5_K_S.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q5_K_S.gguf) | Q5_K_S | 38GB | false | High quality, *recommended*. |
| [Ina-v11.1-Q4_K_M.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q4_K_M.gguf) | Q4_K_M | 40GB | false | Good quality, default size for most use cases, *recommended*. |
| [Ina-v11.1-Q4_1.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q4_1.gguf) | Q4_1 | 41GB | false | Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. |
| [Ina-v11.1-Q4_K_S.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q4_K_S.gguf) | Q4_K_S | 38GB | false | Slightly lower quality with more space savings, *recommended*. |
| [Ina-v11.1-Q4_0.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q4_0.gguf) | Q4_0 | 37GB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| [Ina-v11.1-IQ4_NL.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-IQ4_NL.gguf) | IQ4_NL | 38GB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| [Ina-v11.1-IQ4_XS.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-IQ4_XS.gguf) | IQ4_XS | 36GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Ina-v11.1-Q3_K_L.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q3_K_L.gguf) | Q3_K_L | 35GB | false | Lower quality but usable, good for low RAM availability. |
| [Ina-v11.1-Q3_K_M.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q3_K_M.gguf) | Q3_K_M | 32GB | false | Low quality. |
| [Ina-v11.1-IQ3_M.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-IQ3_M.gguf) | IQ3_M | 30GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Ina-v11.1-Q3_K_S.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-Q3_K_S.gguf) | Q3_K_S | 29GB | false | Low quality, not recommended. |
| [Ina-v11.1-IQ3_XS.gguf](https://huggingface.co/QuixiAI/Ina-v11.1-gguf/blob/main/Ina-v11.1-IQ3_XS.gguf) | IQ3_XS | 27GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download QuixiAI/Ina-v11.1-gguf --include "Ina-v11.1-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download QuixiAI/Ina-v11.1-gguf --include "Ina-v11.1-Q8_0/*" --local-dir ./
```
You can either specify a new local-dir (Ina-v11.1-Q8_0) or download them all in place (./).
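For split quants, point llama.cpp at the first shard and it will load the remaining parts automatically. A sketch, assuming the Q8_0 quant was split into two parts (the actual shard count and filenames depend on the upload):
```
# Given the first shard, llama.cpp picks up -00002-of-00002.gguf and so on automatically
./llama-cli -m ./Ina-v11.1-Q8_0/Ina-v11.1-Q8_0-00001-of-00002.gguf -ngl 99 -p "Hello, Ina."
```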
## Which file should I choose?
A great write-up with charts comparing the performance of various quant types is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
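For example, with 48GB of VRAM, the 40GB Q4_K_M fits entirely on the GPU with headroom for context, while the 47GB Q5_K_M leaves almost none. If you're unsure how much memory you have, you can query it from the command line; a quick sketch for Nvidia GPUs on Linux (nvidia-smi ships with the driver, free with the OS):
```
# Total and used VRAM per GPU, in MiB
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
# Total system RAM
free -h
```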
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in the format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggml-org/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4 and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look at the I-quants. These are in the format IQX_X, like IQ3_M. They are newer and offer better performance for their size.
These I-quants can also be used on CPU, but will be slower than their K-quant equivalents, so you'll have to weigh speed against performance.
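Whichever quant you pick, if it doesn't fully fit in VRAM you can still split work between GPU and CPU by offloading only part of the model; a minimal sketch, where the layer count is a placeholder to tune against your available VRAM:
```
# Offload roughly as many layers as fit in VRAM; the rest run on the CPU
./llama-cli -m ./Ina-v11.1-IQ3_M.gguf -ngl 30 -c 8192 -p "Hello, Ina."
```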
## Credits
I copied [Bartowski](https://huggingface.co/bartowski)'s model card and made it my own, cheers!