Failure loading in text-generation-webui.

by Nafnlaus - opened Sep 24, 2023

Discussion

Nafnlaus

Sep 24, 2023

Using the command:

python server.py --model maddes8cht_tiiuae-falcon-40b-instruct-gguf --listen --listen-port 4664 --verbose --api --xformers

... (or specifying a specific gguf file), I get:

2023-09-24 22:26:10 INFO:Loading settings from settings.json...
Traceback (most recent call last):
  File "/home/user/text-generation-webui/server.py", line 216, in <module>
    model_settings = get_model_metadata(model_name)
  File "/home/user/text-generation-webui/modules/models_settings.py", line 31, in get_model_metadata
    for k in settings[pat]:
TypeError: 'NoneType' object is not iterable

I have no problem loading other ggufs.
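
For readers hitting the same traceback, here is a minimal sketch of one way this TypeError can arise. It is modelled only on the two lines visible in the traceback (`for k in settings[pat]:` inside `get_model_metadata`). Everything else is an assumption for illustration: the function bodies, the `settings` dictionary, and the idea that a matching pattern has an empty (None) value. The thread does not confirm that this was the actual cause.

```python
import re


def get_model_metadata_sketch(model_name, settings):
    """Hypothetical reconstruction: copy every setting whose pattern matches the model name."""
    model_settings = {}
    for pat in settings:
        if re.match(pat.lower(), model_name.lower()):
            # Fails with TypeError if the matching entry has no value (None).
            for k in settings[pat]:
                model_settings[k] = settings[pat][k]
    return model_settings


def get_model_metadata_guarded(model_name, settings):
    """Same idea, but an empty entry is treated as 'no settings' instead of crashing."""
    model_settings = {}
    for pat, values in settings.items():
        if re.match(pat.lower(), model_name.lower()):
            model_settings.update(values or {})
    return model_settings


# Hypothetical settings: the second pattern matches the model but carries no values.
example_settings = {
    ".*": {"truncation_length": 2048},
    ".*falcon-40b-instruct.*": None,
}
model = "maddes8cht_tiiuae-falcon-40b-instruct-gguf"

try:
    get_model_metadata_sketch(model, example_settings)
except TypeError as err:
    print("unguarded:", err)

print("guarded:", get_model_metadata_guarded(model, example_settings))
```

If something like this is the cause, the fix would be in the local settings files rather than in the GGUF file itself, which would also fit with other GGUFs loading fine.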

maddes8cht

Owner Sep 25, 2023

Can you specify on which quantization levels the error occurs?
All of them?
This was one of the very first models I quantized, and while things were not as smooth in the beginning, I'm pretty sure I tested them at least in Llama.cpp's main and server.

Did you get any of my other quantized models working?

Nafnlaus

Sep 25, 2023

All of them, sadly.

I haven't tried any of your other models yet - I was waiting for maddes8cht/ehartford-WizardLM-Uncensored-Falcon-40b, which it looks like is now online, so I'll download it and try it out as soon as I can. :)

Thanks for your work, BTW, on actual open-source-licensed models. I wish more people would pay attention to licensing! LLaMA 2's viral license is a lot more insidious than I think a lot of people realize: it harnesses the open-source community to do development for them and only allows generations from LLaMA 2 derivatives to be used for training other LLaMA 2-derived models, while retaining the full commercial rights if any project ever goes big. FYI, if you're looking for more open-source models, I noticed that TigerBot is Apache licensed. :)

maddes8cht

Owner Sep 25, 2023

I would really like to get this fixed.
So I am eager to know if the wizardLM files work, please give me feedback.
It would also be helpful to get a link to some examples of gguf files that do work for you.

Nafnlaus

Sep 25, 2023

I've tested one of the wizardLM files (the smallest one), and it does work. :) Will test more later this evening.

maddes8cht

Owner Sep 26, 2023 • edited Sep 26, 2023

Okay - that's strange.
I've tested both of them again, and they both work for me in Llama.cpp. I've also looked into the file with a hex viewer to check whether this one, the 40b instruct, has the same GGUF file version, which it does. In the end I was almost sure that none of my models would work for you and that it's possibly a problem with Oobabooga.
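
The hex-viewer check described above can also be scripted. This is a minimal sketch, assuming the usual GGUF layout in which the file starts with the 4-byte magic `GGUF` followed by a 32-bit version number, and assuming little-endian byte order. The function name `read_gguf_version` is made up for this example.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF version stored in the first 8 bytes of the file."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError("file is too short to contain a GGUF header")
    # 4 bytes of magic, then an unsigned 32-bit version (little-endian assumed).
    magic, version = struct.unpack("<4sI", header)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file, magic is {magic!r}")
    return version
```

Calling `read_gguf_version` on two downloaded files and comparing the results gives the same information as inspecting both headers by hand.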

But glad to hear that it works.
I tested with the files I quantized and then used for the upload.
I may have to upload the model again - maybe something went wrong with the upload. It took me a while to get the uploads right without always losing the connection with these really big files and an asymmetric internet connection that only provides a fraction of the speed for the upload.
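
One way to rule out a damaged upload without re-uploading is to compare checksums of the local file and a freshly downloaded copy. The following is a small sketch using only the standard library. The helper name `sha256_of_file` and the 1 MiB chunk size are choices made for this example; the file is read in chunks so that multi-gigabyte model files do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest of the original quantized file and the digest of the downloaded copy differ, the transfer is the likely culprit; if they match, the problem lies elsewhere.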

maddes8cht

Owner Sep 26, 2023 • edited Sep 26, 2023

"FYI, if you're looking for more open-source models, I noticed that TigerBot is Apache licensed. :)"

It's still a Llama 2 based model, and I don't think relicensing it as Apache 2 is totally OK. They are allowed to license their own work as Apache 2, which should mainly be their training dataset, but not the full resulting model. I think this is exactly what Meta's Llama 2 license prohibits.

So this is a blurry situation similar to that of the early Alpaca models based on Llama (1): what they released as open source was their dataset, but the status of the resulting models was very unclear.

BTW, TheBloke has already converted the TigerBot models.

My work here is about models he doesn't seem to care about anymore.
I don't see a point in duplicating work he has already done.

Nafnlaus

Sep 30, 2023

"It's still a Llama 2 based model" - are you sure about that? I was under the impression that it was a foundational model. Says it was trained on 300B tokens, which would be an awful lot for a non-foundational model.

But yeah, no need to duplicate work :)

autobots

Oct 15, 2023

It's broken again as of 9-ish days ago due to changes to llama.cpp.

ulymp

Nov 26, 2023

This model works for me with the latest llama.cpp. (Did not try in text-generation-webui though.)

maddes8cht

Owner Nov 27, 2023

Yes, all the Falcon models were updated sometime around the end of October or the beginning of November and should work on current Llama.cpp versions.
:)

davidhung

Nov 27, 2023 • edited Nov 27, 2023

I get this error when loading with llama.cpp on text gen

...
llama_model_loader: - type  f32:  242 tensors
llama_model_loader: - type q8_0:    1 tensors
llama_model_loader: - type q6_K:  241 tensors
ERROR: byte not found in vocab: '
'

Fixed this by reinstalling text gen with the newest version.

maddes8cht

Owner Nov 27, 2023

Llama.cpp is under heavy construction, so third-party software that uses it (like Oobabooga) needs to be updated as well to stay compatible.
There have been several changes regarding Falcon support in Llama.cpp, which might have taken some time to be implemented in Oobabooga.
