---
library_name: transformers
license: apache-2.0
tags:
- math
- reasoning
- text-generation
- ads
- distillation
- code
language:
- en
pipeline_tag: text-generation
base_model: []
---

# Kai-30B-Instruct

A 30B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks, powered by our **ADS (Adaptive Dual-Search Distillation)** technique. It is the largest model in the Kai family.

## NoesisLab Privacy Policy for OpenRouter Integration

1. **Data Processing:** NoesisLab processes user prompts solely for the purpose of generating model inferences. We do not use any data transmitted through OpenRouter to train or fine-tune our models without explicit user consent.
2. **Data Retention:** We do not store personally identifiable information. Logs are kept for a maximum of 7 days, solely for debugging and service stability, after which they are permanently deleted.
3. **Security:** We implement industry-standard encryption to protect data in transit between OpenRouter and our inference endpoints.
4. **Third Parties:** We never sell or share user data with third-party organizations.

## Model Details

| Attribute | Value |
|---|---|
| **Model** | Kai-30B-Instruct |
| **Architecture** | Qwen2ForCausalLM |
| **Parameters** | ~30B |
| **Hidden size** | 5120 |
| **Intermediate size** | 27648 |
| **Layers** | 64 |
| **Attention heads** | 40 (8 KV heads, GQA) |
| **Context length** | 32768 |
| **Precision** | bfloat16 |
| **Vocab size** | 152064 |
| **Chat template** | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
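
As a quick sanity check, the hyperparameters above roughly reproduce the headline parameter count. This back-of-the-envelope sketch assumes a Qwen2-style block (GQA attention plus a SwiGLU MLP) with untied embeddings, and ignores biases and norm weights:

```python
# Back-of-the-envelope parameter count from the table above.
# Assumes untied embeddings and a SwiGLU MLP; biases/norms ignored.
hidden, intermediate, layers = 5120, 27648, 64
vocab, n_heads, n_kv_heads = 152064, 40, 8
head_dim = hidden // n_heads  # 128

# Attention: Q and O are hidden x hidden; K and V are hidden x (kv_heads * head_dim).
attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)
# SwiGLU MLP: gate, up, and down projections.
mlp = 3 * hidden * intermediate
# Untied input embedding plus LM head.
embeddings = 2 * vocab * hidden

total = layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # → ~32.8B parameters
```

The result lands a little above 30B, consistent with the card's rounded "~30B" figure.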

## Benchmark Results (5-shot, acc_norm)

| Benchmark | Kai-30B-Instruct | Llama-3 70B | Qwen2.5 32B | Yi-34B | Llama-3 8B | Mistral 7B | Llama-2 7B |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **ARC-C** | 64.0 | 83.0 | 70.5 | 65.3 | 60.1 | 55.5 | 53.0 |
| **HellaSwag** | 74.4 | 89.0 | 85.2 | 83.1 | 78.6 | 81.3 | 78.6 |
| **PIQA** | 84.8 | 85.0 | 84.1 | 82.5 | 79.8 | 82.1 | 78.1 |
| **Winogrande** | **86.4** | 83.0 | 78.2 | 76.4 | 73.0 | 74.0 | 69.1 |

![Benchmark Comparison](./benchmark_comparison.png)

## What is ADS?

**Adaptive Dual-Search Distillation (ADS)** treats model fine-tuning as a constrained optimization problem inspired by operations research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts to embedding-space entropy, pushing the model toward high-confidence predictions at difficult reasoning points without modifying the model architecture.
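
The card does not spell out the ADS objective, so the following is only a hedged sketch of the general primal-dual idea described above; `ads_step`, `entropy_budget`, and `eta` are hypothetical names, and the real method reportedly operates on embedding-space entropy rather than the output-distribution entropy used here:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    denom = sum(exps)
    return [e / denom for e in exps]

def ads_step(logits, target, lam, entropy_budget=1.0, eta=0.1):
    """One illustrative primal-dual step of an entropy-constrained loss.

    Primal loss: cross-entropy plus lam times the entropy violation.
    Dual update: lam <- max(0, lam + eta * (entropy - entropy_budget)),
    so the penalty grows wherever the model remains uncertain.
    """
    probs = softmax(logits)
    ce = -math.log(probs[target])
    entropy = -sum(p * math.log(p) for p in probs)
    violation = entropy - entropy_budget
    loss = ce + lam * max(violation, 0.0)
    new_lam = max(0.0, lam + eta * violation)  # dual ascent on the constraint
    return loss, new_lam

# Uniform logits = maximal uncertainty: the dual variable increases,
# tightening the penalty at this "difficult" point on the next step.
loss, lam = ads_step([0.0, 0.0, 0.0, 0.0], target=0, lam=0.0)
```

The stateful `lam` is what makes the penalty adaptive: it only accumulates where the entropy constraint keeps being violated.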

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in bfloat16 and shard it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-30B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-30B-Instruct")

# Build a ChatML prompt and open the assistant turn for generation.
messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.8,
    do_sample=True,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
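
The `apply_chat_template` call above renders the ChatML format listed under Model Details. For intuition, a hand-rolled equivalent might look like the sketch below; the authoritative template ships in the tokenizer config and may additionally inject a default system message:

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a message list in the ChatML format this model expects."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt

print(to_chatml([{"role": "user", "content": "What is 25 * 4?"}]))
```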

## Citation

```bibtex
@misc{noesislab2026kai30b,
  title={Kai-30B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-30B-Instruct}
}
```

## License

Apache 2.0