Instructions to use SebastianSchramm/Cerebras-GPT-111M-instruction with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use SebastianSchramm/Cerebras-GPT-111M-instruction with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SebastianSchramm/Cerebras-GPT-111M-instruction")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SebastianSchramm/Cerebras-GPT-111M-instruction")
model = AutoModelForCausalLM.from_pretrained("SebastianSchramm/Cerebras-GPT-111M-instruction")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SebastianSchramm/Cerebras-GPT-111M-instruction with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SebastianSchramm/Cerebras-GPT-111M-instruction"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SebastianSchramm/Cerebras-GPT-111M-instruction",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/SebastianSchramm/Cerebras-GPT-111M-instruction
- SGLang
How to use SebastianSchramm/Cerebras-GPT-111M-instruction with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "SebastianSchramm/Cerebras-GPT-111M-instruction" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SebastianSchramm/Cerebras-GPT-111M-instruction",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "SebastianSchramm/Cerebras-GPT-111M-instruction" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SebastianSchramm/Cerebras-GPT-111M-instruction",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use SebastianSchramm/Cerebras-GPT-111M-instruction with Docker Model Runner:
docker model run hf.co/SebastianSchramm/Cerebras-GPT-111M-instruction
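The curl calls shown for the vLLM and SGLang servers can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `complete` are illustrative, and it assumes a server is already running locally on port 8000 (vLLM) or 30000 (SGLang) as in the commands above, and that the response follows the OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request

MODEL_ID = "SebastianSchramm/Cerebras-GPT-111M-instruction"


def build_request(prompt, base_url="http://localhost:8000",
                  max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style completions response.
    return body["choices"][0]["text"]
```

For the SGLang server, pass `base_url="http://localhost:30000"`, for example `complete("Once upon a time,", base_url="http://localhost:30000")`.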
Instruction-tuned Cerebras GPT 111M
An instruction fine-tuned version of the smallest Cerebras-GPT model, which has only 111M parameters.
Model Description
Instruction fine-tuned cerebras-GPT-111M
Evaluation
The model has been evaluated on Hugging Face's Open LLM Leaderboard. See the leaderboard for more details: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Instruction fine-tuning raises the average score from 29.9 to 31.6, a gain of 1.7 points, or about 5.7% relative to the Cerebras base model:
| Model | Average | ARC (25-shot) | HellaSwag (10-shot) | MMLU (5-shot) | TruthfulQA (0-shot) |
|---|---|---|---|---|---|
| SebastianSchramm/Cerebras-GPT-111M-instruction | 31.6 | 24.3 | 26.2 | 26.5 | 49.5 |
| cerebras/Cerebras-GPT-111M | 29.9 | 20 | 26.7 | 26.7 | 46.3 |
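As a quick check of the reported improvement, the sketch below recomputes the averages and the relative gain from the four benchmark scores in the table. It assumes the Average column is the unweighted mean of the four benchmarks, which the numbers are consistent with.

```python
# Per-benchmark scores from the table: ARC, HellaSwag, MMLU, TruthfulQA.
scores = {
    "SebastianSchramm/Cerebras-GPT-111M-instruction": [24.3, 26.2, 26.5, 49.5],
    "cerebras/Cerebras-GPT-111M": [20.0, 26.7, 26.7, 46.3],
}

# Assumption: the Average column is the unweighted mean of the four scores.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

tuned = averages["SebastianSchramm/Cerebras-GPT-111M-instruction"]
base = averages["cerebras/Cerebras-GPT-111M"]

absolute_gain = tuned - base                # difference in points
relative_gain = absolute_gain / base * 100  # percent relative to the base model

print(f"{tuned:.1f} vs {base:.1f}: +{absolute_gain:.1f} points, +{relative_gain:.1f}%")
```

This prints `31.6 vs 29.9: +1.7 points, +5.7%`, so the 5.7% figure is a relative improvement rather than a difference in percentage points.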
Training data
The model was fine-tuned with the following data: alpaca_gpt4_data (data generated by GPT-4 using Alpaca prompts for fine-tuning LLMs) and alpaca_data_cleaned.
Prompt template
Fine-tuning was performed with the prompt template from Stanford Alpaca:
PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}
Usage
For best results at inference time, format the input according to the prompt template above.
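A minimal sketch of how a prompt might be built from the template and passed to the Transformers text-generation pipeline. The helper names `build_prompt` and `generate`, and the generation settings (`max_new_tokens=64`, greedy decoding), are illustrative assumptions and not part of the model card.

```python
# Alpaca-style templates, copied from the prompt template section.
PROMPT_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Pick the template with or without an input section and fill it in."""
    if input_text:
        return PROMPT_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


def generate(instruction: str, input_text: str = "", max_new_tokens: int = 64) -> str:
    """Run the model on a templated prompt (downloads the model on first use)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="SebastianSchramm/Cerebras-GPT-111M-instruction",
    )
    prompt = build_prompt(instruction, input_text)
    # return_full_text=False keeps only the continuation after the prompt.
    out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False,
               return_full_text=False)
    return out[0]["generated_text"].strip()
```

For example, `generate("Name three primary colors.")` uses the no-input template, while `generate("Translate to English.", "Guten Morgen")` uses the variant with an `### Input:` section.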
Model tree for SebastianSchramm/Cerebras-GPT-111M-instruction
Base model
cerebras/Cerebras-GPT-111M