---
base_model: Writer/Palmyra-Local-1.7B
tags:
- instruct
- finetune
- DPO
- distillation
- small
- local
- On Device
- Transformers.js
- Enterprise LLM
- Enterprise
- Enterprise ready
model_type: palmyra
model-index:
- name: Palmyra-Local-1.7B
  results: []
license: other
license_name: writer-open-model-license
license_link: https://writer.com/legal/open-model-license/
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://writer.com/legal/open-model-license/)
  and acknowledge Writer's [Privacy
  Policy](https://writer.com/legal/acceptable-use/).
extra_gated_fields:
  Name: text
  Email: text
  Organization or Affiliation: text
  Receive email updates and promotions on Writer products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
language:
- en
---

# Palmyra-Local-1.7B-Instruct

## Introduction

Palmyra-Local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7-billion-parameter instruction-tuned variant of Palmyra-Local, built for local deployment and optimized for enterprise-grade language understanding and generation.

Compared to earlier versions, Palmyra-Local brings the following enhancements:

- **Stronger domain reasoning in code and math**, powered by targeted expert tuning and curated domain datasets.
- **Improved instruction following**, including long-form generation (8K+ tokens), accurate handling of structured data such as tables, and consistent structured output, especially JSON.
- **Robust prompt handling**, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
- **Extended context support**, with a maximum context window of 128K tokens and generation of up to 8K tokens.
- **Multilingual capabilities**, supporting more than 29 languages, including English, Spanish, French, German, Chinese, Arabic, and Japanese.
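
Consistent JSON output is most useful when it is validated downstream before being consumed. A minimal sketch of such a check, assuming a "respond only in JSON" style prompt (the sample response string and helper name are illustrative, not part of the model's API):

```python
import json

def parse_model_json(text):
    """Return the parsed object if `text` is valid JSON, else None."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# Illustrative model response to a prompt that requested JSON-only output
sample = '{"product": "Acme Notebook", "in_stock": true, "quantity": 42}'
record = parse_model_json(sample)
print(record)  # a Python dict on success, None on malformed output
```

Returning `None` instead of raising lets a pipeline retry the generation or fall back gracefully when the model emits malformed JSON.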

This repository includes the **instruction-tuned Palmyra-Local 1.7B model**, with the following architecture details:

- **Type**: Causal language model
- **Training Stages**: Pretraining + instruction tuning
- **Architecture**: Transformer with RoPE positional encoding
- **Total Parameters**: 1.7B
- **Number of Layers**: 28
- **Attention**: Grouped-query attention (GQA)
|
| |
|
| | ## Training Details |
| | - Architecture: Palmyra |
| | - Training Method: From scratch |
| | - Attention Mechanism: GQA |
| | - Training Data: [~1T packed dataset] |
| |
|
| |
|

## Benchmark Results

| Benchmark | Palmyra-Local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|-----------------------|------------|-----------------------|-----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K | 81.00 | 73.20 | 88.6 | N/A | 75.6 |
| MATH | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average | 62.80 | 57.33 | N/A | N/A | N/A |
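
The Average row is the arithmetic mean of the six benchmark scores; it can be reproduced directly from the values in the table (shown here for the two models with complete results):

```python
# Per-benchmark scores copied from the table above, in row order:
# HumanEval, MBPP, GSM8K, MATH, MMLU, MMLU Pro
scores = {
    "Palmyra-Local-1.7B": [74.10, 66.86, 81.00, 60.94, 59.82, 34.10],
    "Qwen2.5-1.5B-Instruct": [61.60, 63.20, 73.20, 55.20, 58.37, 32.40],
}

for model, vals in scores.items():
    # Arithmetic mean, rounded to two decimals as in the table
    print(f"{model}: {sum(vals) / len(vals):.2f}")
# → Palmyra-Local-1.7B: 62.80
# → Qwen2.5-1.5B-Instruct: 57.33
```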

**Notes:**

- **HumanEval** and **MBPP**: benchmark scores for these tasks were not available for **GPT-4 mini**, **Llama-3.2-1B-Instruct**, and **Llama-3.2-3B-Instruct** in the sources published by the model creators.

## Usage

### Install dependencies

`requirements.txt`:

```txt
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
```

```bash
pip install -r requirements.txt
```
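
Before running inference, it can help to confirm that the pinned versions are actually the ones installed in the active environment. A small sketch using only the standard library (the package list is copied from `requirements.txt`; the helper name is illustrative):

```python
import importlib.metadata as md

# Versions pinned in requirements.txt
PINNED = {
    "transformers": "4.51.0",
    "torch": "2.6.0",
    "tokenizers": "0.21.1",
    "accelerate": "1.6.0",
}

def check(pkg, pinned):
    """Report the installed version of `pkg` against the pinned one."""
    try:
        installed = md.version(pkg)
    except md.PackageNotFoundError:
        return f"{pkg}: not installed (pinned: {pinned})"
    status = "ok" if installed == pinned else f"differs from pinned {pinned}"
    return f"{pkg}: {installed} ({status})"

for pkg, pinned in PINNED.items():
    print(check(pkg, pinned))
```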

---

### Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # replace with your Hugging Face access token

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in half precision to reduce memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Use the chat template if available; fall back to plain encoding otherwise
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Move inputs to the same device as the model
input_ids = input_ids.to(model.device)

# Generation config (do_sample=True so temperature/top_p take effect)
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
}

# Generate output
with torch.inference_mode():
    output_id = model.generate(input_ids, **gen_conf)

# Decode only the newly generated tokens
output_text = tokenizer.decode(output_id[0][input_ids.shape[1]:], skip_special_tokens=True)

print(output_text)
```

### Citation and Related Information

To cite this model:

```
@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A powerful LLM designed for on-device use}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2025,
  month = {March}
}
```

**Contact**: Hello@writer.com