Instructions to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PORTULAN/gervasio-8b-portuguese-ptpt-decoder")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PORTULAN/gervasio-8b-portuguese-ptpt-decoder")
model = AutoModelForCausalLM.from_pretrained("PORTULAN/gervasio-8b-portuguese-ptpt-decoder")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- llama-cpp-python
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="PORTULAN/gervasio-8b-portuguese-ptpt-decoder",
    filename="gervasio-8b-portuguese-ptpt-decoder-F16.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with llama.cpp:
Install with Homebrew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16

# Run inference directly in the terminal:
llama-cli -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16

# Run inference directly in the terminal:
llama-cli -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16

# Run inference directly in the terminal:
./llama-cli -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Use Docker
```shell
docker model run hf.co/PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
- LM Studio
- Jan
- vLLM
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "PORTULAN/gervasio-8b-portuguese-ptpt-decoder"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PORTULAN/gervasio-8b-portuguese-ptpt-decoder",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
- SGLang
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "PORTULAN/gervasio-8b-portuguese-ptpt-decoder" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PORTULAN/gervasio-8b-portuguese-ptpt-decoder",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "PORTULAN/gervasio-8b-portuguese-ptpt-decoder" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PORTULAN/gervasio-8b-portuguese-ptpt-decoder",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Ollama
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with Ollama:
```shell
ollama run hf.co/PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
- Unsloth Studio new
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for PORTULAN/gervasio-8b-portuguese-ptpt-decoder to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for PORTULAN/gervasio-8b-portuguese-ptpt-decoder to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for PORTULAN/gervasio-8b-portuguese-ptpt-decoder to start chatting
```
- Pi new
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16" }
      ]
    }
  }
}
```

Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent new
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Run Hermes
```shell
hermes
```
- Docker Model Runner
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with Docker Model Runner:
```shell
docker model run hf.co/PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
- Lemonade
How to use PORTULAN/gervasio-8b-portuguese-ptpt-decoder with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull PORTULAN/gervasio-8b-portuguese-ptpt-decoder:F16
```
Run and chat with the model
```shell
lemonade run user.gervasio-8b-portuguese-ptpt-decoder-F16
```
List all available models
```shell
lemonade list
```
This is the model card for Gervásio 8B PTPT decoder.
This model is integrated in the Evaristo.ai chatbot, where its generative capabilities can be tried out on the fly through a GUI.
You may be interested also in some of the other models in the Albertina (encoders) and Serafim (sentence encoder) families.
Gervásio 8B PTPT
Gervásio 8B PTPT is an open decoder for the Portuguese language.
It is a decoder of the LLaMA family, based on the Transformer neural architecture and built on top of the LLaMA 3.1 8B Instruct model. It was further improved through additional training on language resources that include data sets of Portuguese prepared for this purpose, such as extraGLUE-Instruct, as well as other data sets whose release is being prepared (MMLU PT, Natural Instructions PT, a Wikipedia subset, and Provérbios PT).
Gervásio 8B PTPT is distributed free of charge under an open license, for research and commercial purposes alike, and, given its size, can be run on consumer-grade hardware.
Gervásio 8B PTPT is developed by NLX-Natural Language and Speech Group, at the University of Lisbon, Faculty of Sciences, Department of Informatics, Portugal.
For the record, its full name is Gervásio Produz Textos em Português, which yields the natural acronym GPT PT; it is known more shortly as Gervásio PT*, or, even more briefly, just as Gervásio among its acquaintances.
Model Description
The model has 8 billion parameters, distributed over 32 layers, with a hidden size of 4096, an intermediate size of 14336, and 32 attention heads. It uses rotary position embeddings (RoPE) and a tokenizer with a vocabulary of 128256 tokens.
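As a sanity check, the stated "8 billion parameters" can be roughly reproduced from these dimensions. The sketch below assumes grouped-query attention with 8 key/value heads and untied input/output embeddings, as in the underlying LLaMA 3.1 8B architecture (these two details are not stated above), and ignores the tiny normalization layers:

```python
# Rough parameter count from the dimensions given above (LLaMA 3.1 8B layout).
layers, hidden, intermediate, vocab = 32, 4096, 14336, 128256
heads, kv_heads = 32, 8            # 8 KV heads: assumption (GQA, as in LLaMA 3.1)
head_dim = hidden // heads         # 128
kv_dim = kv_heads * head_dim       # 1024

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, plus k and v projections
mlp = 3 * hidden * intermediate                   # gate, up, and down projections
embeddings = 2 * vocab * hidden                   # input + output embeddings (untied)

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B")  # ≈ 8.03B, consistent with "8 billion parameters"
```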
Training Data
Gervásio 8B PTPT was trained on various datasets, either native to European Portuguese or translated into it. For the latter, we selected only those datasets whose translation into European Portuguese preserves, in the target language, the linguistic properties at stake.
The training data comprises:
- extraGLUE-Instruct
- MMLU PT (multiple choice question answering).
- A subset of Natural Instructions (mostly multiple choice question answering tasks).
- A manually curated subset of Wikipedia.
- A manually curated list of proverbs.
Training Details
We applied supervised fine-tuning with a causal language modeling training objective following a zero-out technique during the fine-tuning process. Specifically, while the entire prompt and chat template received attention during fine-tuning, only the response tokens were subjected to back-propagation.
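The zero-out technique amounts to label masking: every token is present in the input, but prompt positions get a sentinel label that the loss function skips. A minimal sketch of this idea, with made-up token IDs (the actual Gervásio training code is not published in this card):

```python
# Illustrative sketch of the "zero-out" loss masking described above: the full
# prompt (chat template plus user turn) is attended to, but only response
# tokens contribute to back-propagation.
IGNORE_INDEX = -100  # label value that causal-LM cross-entropy losses skip

def build_labels(prompt_ids, response_ids):
    """Build SFT inputs and labels: prompt positions are masked with
    IGNORE_INDEX so the loss is computed on response tokens only."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

prompt = [101, 7592, 2129]      # hypothetical IDs for the templated prompt
response = [2024, 2017, 102]    # hypothetical IDs for the response
input_ids, labels = build_labels(prompt, response)
print(labels)  # [-100, -100, -100, 2024, 2017, 102]
```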
To accelerate training, the Fully Sharded Data Parallel (FSDP) paradigm was used over 10 L40S GPUs.
Performance
For testing, we use translations of the standard benchmarks GPQA Diamond, MMLU and MMLU Pro, as well as the CoPA, MRPC and RTE datasets in extraGLUE.
| Model | GPQA Diamond PT | MMLU PT | MMLU Pro PT | CoPA | MRPC | RTE | Average |
|---|---|---|---|---|---|---|---|
| Gervásio 8B PTPT | 34.85 | 62.15 | 36.79 | 87.00 | 77.45 | 77.62 | 62.64 |
| LLaMA 3.1 8B Instruct | 32.32 | 61.49 | 36.10 | 83.00 | 75.25 | 79.42 | 61.26 |
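The Average column is the plain mean of the six per-benchmark scores, which can be checked directly:

```python
# Recomputing the Average column of the table above.
scores = {
    "Gervásio 8B PTPT": [34.85, 62.15, 36.79, 87.00, 77.45, 77.62],
    "LLaMA 3.1 8B Instruct": [32.32, 61.49, 36.10, 83.00, 75.25, 79.42],
}
for model, s in scores.items():
    print(model, round(sum(s) / len(s), 2))  # 62.64 and 61.26, matching the table
```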
How to use
You can use this model directly with a pipeline for causal language modeling:
```python
>>> from transformers import pipeline
>>> generator = pipeline(model='PORTULAN/gervasio-8b-portuguese-ptpt-decoder')
>>> generator("A comida portuguesa é", max_new_tokens=10)
```
Please cite
```bibtex
@misc{gervasio,
  title={Advancing Generative AI for Portuguese with Open Decoder Gervásio PT-*},
  author={Rodrigo Santos and João Silva and Luís Gomes and João Rodrigues and António Branco},
  year={2024},
  eprint={2402.18766},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Please use the above canonical reference when using or citing this model.
Acknowledgments
The research reported here was partially supported by: PORTULAN CLARIN—Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020 and FCT—Fundação para a Ciência e Tecnologia under the grant PINFRA/22117/2016; innovation project ACCELERAT.AI - Multilingual Intelligent Contact Centers, funded by IAPMEI, I.P. - Agência para a Competitividade e Inovação I.P. under the grant C625734525-00462629, of Plano de Recuperação e Resiliência, call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização; research project "Hey, Hal, curb your hallucination! / Enhancing AI chatbots with enhanced RAG solutions", funded by FCT-Fundação para a Ciência e a Tecnologia under the grant 2024.07592.IACDC; project "CLARIN – Infraestrutura de Investigação para a Ciência e Tecnologia da Linguagem", funded by programme Lisboa2030 under the grant LISBOA2030-FEDER-01316900PORTULAN.