Update readme.md

README.md +242 -3

@@ -1,3 +1,242 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- code
+- industrial-code
+- reasoning
+- thinking
+- verilog
+- cuda
+- triton
+- chip-design
+- cad
+---
+# InCoder-32B-Thinking: Reasoning Code Model for Industrial Scenarios
+<div align="center">
+[![HuggingFace](https://img.shields.io/badge/🤗-Model%20Hub-yellow)](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Thinking)
+[![GitHub](https://img.shields.io/badge/GitHub-Industrial--Coder-blue)](https://github.com/CSJianYang/Industrial-Coder)
+[![arXiv](https://img.shields.io/badge/arXiv-2603.16790-red)](https://huggingface.co/papers/2603.16790)
+[![License](https://img.shields.io/badge/License-Apache%202.0-green)](LICENSE)
+</div>
+## Model Summary
+**InCoder-32B-Thinking** is the reasoning variant of the InCoder family. It extends [InCoder-32B](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder) with chain-of-thought reasoning via `<think>...</think>` tags, enabling step-by-step problem decomposition before generating code. This is particularly effective for complex industrial tasks that require multi-step reasoning — debugging RTL modules, optimizing GPU kernels, or diagnosing embedded firmware issues.
+For the instruction-tuned variant (without thinking), see [IndustrialCoder](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder). For the pre-trained base model, see [IndustrialCoder-Base](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Base).
+---
+## Key Results
+### General Code Benchmarks
+| Benchmark | InCoder-32B | InCoder-32B-Thinking |
+|---|:---:|:---:|
+| HumanEval+ | 89.6 | **91.5** |
+| MBPP+ | 78.3 | **80.1** |
+| BigCodeBench (Full) | 49.8 | **51.2** |
+| LiveCodeBench (Pass@1) | 49.14 | **52.3** |
+### Industrial Code Benchmarks
+| Benchmark | Domain | InCoder-32B | InCoder-32B-Thinking |
+|---|---|:---:|:---:|
+| VeriScope Score | Chip Design | 80.7 | **82.3** |
+| CAD-Coder Compile (%) | 3D Modeling | 82.0 | **84.0** |
+| KernelBench L1 (%) | GPU Optimization | 22.2 | **24.0** |
+> The thinking variant improves on every benchmark listed above. The largest gain is on LiveCodeBench (+3.16 points); the industrial benchmarks gain between 1.6 and 2.0 points.
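
The per-benchmark gains can be recomputed directly from the two tables above. The snippet below is only a small arithmetic check; the dictionary layout and variable names are illustrative, and the scores are copied verbatim from the tables.

```python
# Scores copied from the two tables above: (InCoder-32B, InCoder-32B-Thinking).
scores = {
    "HumanEval+": (89.6, 91.5),
    "MBPP+": (78.3, 80.1),
    "BigCodeBench (Full)": (49.8, 51.2),
    "LiveCodeBench (Pass@1)": (49.14, 52.3),
    "VeriScope Score": (80.7, 82.3),
    "CAD-Coder Compile (%)": (82.0, 84.0),
    "KernelBench L1 (%)": (22.2, 24.0),
}

# Absolute gain of the thinking variant, rounded to hide floating-point noise.
gains = {name: round(thinking - base, 2) for name, (base, thinking) in scores.items()}
largest = max(gains, key=gains.get)

# Print the benchmarks from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} +{gain:.2f}")
```
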
+---
+## Model Architecture
+The architecture is the same as InCoder-32B; only the post-training is thinking-aware:
+| Hyperparameter | Value |
+|---|---|
+| Parameters | ~32B |
+| Layers | 64 |
+| Hidden Size | 5,120 |
+| Attention Heads | 40 (8 KV heads, GQA) |
+| Max Context Length | 131,072 (128K) |
+| Positional Encoding | RoPE (θ = 500,000) |
+| Precision | BFloat16 |
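
As a rough sizing aid, the table values translate into memory estimates as sketched below. Two things here are assumptions rather than facts from the model card: the head dimension is taken to be `hidden_size / attention_heads`, and the parameter count is rounded to exactly 32e9. Activations and framework overhead are ignored.

```python
# Values from the table above.
num_layers = 64
hidden_size = 5120
num_attention_heads = 40
num_kv_heads = 8
max_context = 131_072
bytes_per_value = 2  # BFloat16

# Assumption: head_dim = hidden_size / num_attention_heads (not stated in the table).
head_dim = hidden_size // num_attention_heads

# With GQA, each token stores one key and one value vector per KV head per layer.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
kv_bytes_full_context = kv_bytes_per_token * max_context

# Assumption: exactly 32e9 parameters; the table only says "~32B".
weight_bytes = 32_000_000_000 * bytes_per_value

GIB = 1024 ** 3
print(f"head_dim: {head_dim}")
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at full context: {kv_bytes_full_context / GIB:.0f} GiB")
print(f"Weights: {weight_bytes / GIB:.1f} GiB")
```

Under these assumptions a single full-length sequence needs about 32 GiB of KV cache on top of roughly 60 GiB of weights, which is one reason to serve with a shorter maximum length when the full context is not needed.
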
+---
+## How Thinking Mode Works
+InCoder-32B-Thinking generates a reasoning trace inside `<think>...</think>` tags before producing the final answer. This allows the model to:
+1. **Decompose** complex problems into sub-tasks
+2. **Reason** about constraints, edge cases, and hardware semantics
+3. **Plan** the solution structure before writing code
+Example output:
+```
+<think>
+The user wants a UART transmitter module. Let me think through the design:
+1. Need a state machine: IDLE -> START_BIT -> DATA_BITS -> STOP_BIT
+2. 8N1 means: 8 data bits, no parity, 1 stop bit
+3. Need a baud rate counter derived from the clock frequency
+4. Shift register to serialize the 8-bit data LSB first
+</think>
+module uart_tx (
+    input wire clk,
+    ...
+```
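
Given the output format shown above, the trace can be separated from the answer with plain string operations. The helper below is a minimal sketch; `split_thinking` is a hypothetical name, not part of the model's tooling. It also covers two cases the example does not show: output with no trace at all, and a trace cut off before `</think>` (for instance when the token limit is reached).

```python
def split_thinking(output: str) -> tuple[str, str]:
    """Return (thinking, response) from raw model output.

    Assumes the trace, if any, is wrapped in <think>...</think>.
    """
    head, closing, tail = output.partition("</think>")
    if not closing:
        # No closing tag: either thinking was disabled, or the trace was
        # cut off before the answer began.
        if "<think>" in head:
            return head.split("<think>", 1)[1].strip(), ""
        return "", output.strip()
    # Drop the opening tag if it is present in the decoded text.
    thinking = head.split("<think>", 1)[-1].strip()
    return thinking, tail.strip()


thinking, response = split_thinking("<think>\nNeed a state machine.\n</think>\nmodule uart_tx;")
print(thinking)
print(response)
```
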
+You can **disable** thinking mode to get direct answers (behaves like the instruct variant):
+```python
+text = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True,
+    enable_thinking=False
+)
+```
+---
+## Usage
+### Installation
+```bash
+pip install transformers accelerate
+```
+### Thinking Mode (default)
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+model_id = "Multilingual-Multimodal-NLP/IndustrialCoder-Thinking"
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+messages = [
+    {"role": "user", "content": "Optimize this CUDA kernel for better memory coalescing:\n__global__ void add(float *a, float *b, float *c, int N) {\n    int i = threadIdx.x;\n    if (i < N) c[i] = a[i] + b[i];\n}"}
+]
+# Thinking mode (default) — model reasons before answering
+text = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+inputs = tokenizer([text], return_tensors="pt").to(model.device)
+with torch.no_grad():
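+    # Set do_sample=True explicitly so temperature/top_p/top_k take effect even if the generation config defaults to greedy decoding
+    out = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.85, top_k=20)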
+output = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False)
+# Parse thinking and response (split on the first </think> only, so the response is never truncated)
+if "</think>" in output:
+    thinking, _, response = output.partition("</think>")
+    thinking = thinking.replace("<think>", "", 1).strip()
+    response = response.strip()
+    print(f"Thinking:\n{thinking}\n\nResponse:\n{response}")
+else:
+    print(output)
+```
+### Non-Thinking Mode
+```python
+# Disable thinking for a direct answer without a reasoning trace (reuses `tokenizer` and `messages` from the example above)
+text = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True,
+    enable_thinking=False
+)
+```
+### With Tool Calls
+```python
+tools = [{
+    "type": "function",
+    "function": {
+        "name": "run_verilog_sim",
+        "description": "Run Verilog simulation with Icarus Verilog",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "code": {"type": "string", "description": "Verilog source code"},
+                "testbench": {"type": "string", "description": "Testbench code"}
+            }
+        }
+    }
+}]
+text = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True, tools=tools
+)
+```
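
Before executing a tool call produced by the model, it can help to check the call against the declared schema. The sketch below assumes the call has already been parsed into a dict with `name` and `arguments` keys; the model card does not specify the raw tool-call format, so that shape and the `validate_tool_call` helper are hypothetical. The schema is a trimmed copy of the one above.

```python
def validate_tool_call(call: dict, tools: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the call matches a schema."""
    schemas = {t["function"]["name"]: t["function"] for t in tools}
    name = call.get("name")
    if name not in schemas:
        return [f"unknown tool: {name!r}"]
    properties = schemas[name]["parameters"].get("properties", {})
    problems = []
    for key, value in call.get("arguments", {}).items():
        if key not in properties:
            problems.append(f"unexpected argument: {key!r}")
        elif properties[key].get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {key!r} should be a string")
    return problems


# Trimmed copy of the tool schema shown above.
tools = [{
    "type": "function",
    "function": {
        "name": "run_verilog_sim",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}, "testbench": {"type": "string"}},
        },
    },
}]

# Hypothetical parsed call, for illustration only.
call = {"name": "run_verilog_sim",
        "arguments": {"code": "module m; endmodule", "testbench": "module tb; endmodule"}}
print(validate_tool_call(call, tools))
```
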
+### Deployment with vLLM
+```bash
+vllm serve Multilingual-Multimodal-NLP/IndustrialCoder-Thinking \
+    --tensor-parallel-size 4 --max-model-len 32768 --trust-remote-code
+```
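
Once the server is running, it can be queried through vLLM's OpenAI-compatible chat endpoint. The client below is a sketch using only the standard library. It assumes the server listens on vLLM's default `localhost:8000`; `build_chat_payload` and `chat` are illustrative helper names. Disabling thinking relies on vLLM forwarding `chat_template_kwargs` to the chat template, and `top_k` is a vLLM extension to the OpenAI request schema.

```python
import json
import urllib.request

MODEL_ID = "Multilingual-Multimodal-NLP/IndustrialCoder-Thinking"
# Assumption: the server was started with the command above on the default port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_payload(prompt: str, enable_thinking: bool = True, **sampling) -> dict:
    """Build the JSON body for POST /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        **sampling,  # e.g. temperature, top_p, top_k, max_tokens
    }
    if not enable_thinking:
        # Passed through to the chat template, like enable_thinking=False in transformers.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


def chat(prompt: str, enable_thinking: bool = True, **sampling) -> str:
    """Send one user message and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt, enable_thinking, **sampling)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

A call such as `chat("Write a UART transmitter in Verilog.", temperature=0.6, top_p=0.85, top_k=20, max_tokens=8192)` then returns the reply text. Whether the reasoning trace appears inline in that text depends on how the server is configured.
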
+### Recommended Sampling Parameters
+| Use case | temperature | top_p | top_k | max_new_tokens |
+|---|:---:|:---:|:---:|:---:|
+| Thinking (default) | 0.6 | 0.85 | 20 | 8192 |
+| Non-thinking / precise | 0.2 | 0.95 | — | 4096 |
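
One way to keep these settings in a single place is a small helper that returns keyword arguments for `model.generate()`. This is a sketch: `sampling_kwargs` is a hypothetical name, and `do_sample=True` is added because, in `transformers`, the sampling parameters only apply when sampling is enabled.

```python
def sampling_kwargs(thinking: bool = True) -> dict:
    """Keyword arguments for model.generate(), taken from the table above."""
    if thinking:
        return {
            "do_sample": True,
            "temperature": 0.6,
            "top_p": 0.85,
            "top_k": 20,
            "max_new_tokens": 8192,
        }
    # The table leaves top_k unset for this row, so it is omitted here and
    # whatever the model's generation config specifies will apply.
    return {
        "do_sample": True,
        "temperature": 0.2,
        "top_p": 0.95,
        "max_new_tokens": 4096,
    }


print(sampling_kwargs(thinking=False))
```

Usage would then look like `model.generate(**inputs, **sampling_kwargs(thinking=True))`.
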
+---
+## Model Family
+| Model | Type | HuggingFace |
+|---|---|---|
+| InCoder-32B-Base | Pre-trained | [🤗 IndustrialCoder-Base](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Base) |
+| InCoder-32B | Instruct | [🤗 IndustrialCoder](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder) |
+| **InCoder-32B-Thinking** | **Reasoning** | [🤗 IndustrialCoder-Thinking](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Thinking) |
+| InCoder-32B-FP8 | FP8 Quantized | [🤗 IndustrialCoder-32B-FP8](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-32B-FP8) |
+| InCoder-32B-AWQ-INT4 | AWQ INT4 | [🤗 IndustrialCoder-32B-AWQ-INT4](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-32B-AWQ-INT4) |
+| InCoder-32B-GPTQ-INT4 | GPTQ INT4 | [🤗 IndustrialCoder-32B-GPTQ-INT4](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-32B-GPTQ-INT4) |
+---
+## Limitations & Disclaimers
+- The thinking trace may occasionally contain reasoning errors or hallucinated constraints — always verify the final code output.
+- For simple tasks, thinking mode adds latency; use `enable_thinking=False` for straightforward generation.
+- Based on failure analysis, the model may struggle with:
+  - **API Knowledge**: Linker errors from undefined HAL/CMSIS functions in embedded C.
+  - **Functional Semantics**: Producing compilable but functionally incorrect RTL under complex logic scenarios.
+  - **Optimization**: Correct but sub-optimal GPU kernel performance.
+Always review and test generated code in a sandboxed environment. Industrial code (RTL, embedded firmware, GPU kernels) requires expert review before deployment.
+---
+## Citation
+```bibtex
+@article{yang2026incoder,
+  title={InCoder-32B: Code Foundation Model for Industrial Scenarios},
+  author={Yang, Jian and Zhang, Wei and Wu, Jiajun and Cheng, Junhang and Guo, Shawn
+          and Wang, Haowen and Gu, Weicheng and Du, Yaxin and Li, Joseph and Xu, Fanglin
+          and others},
+  journal={arXiv preprint arXiv:2603.16790},
+  year={2026}
+}
+```