wuyuverse committed on
Commit
472b60a
·
verified ·
1 Parent(s): a76782c

Update readme.md

Files changed (1)
  1. README.md +242 -3
README.md CHANGED
@@ -1,3 +1,242 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ library_name: transformers
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - code
7
+ - industrial-code
8
+ - reasoning
9
+ - thinking
10
+ - verilog
11
+ - cuda
12
+ - triton
13
+ - chip-design
14
+ - cad
15
+ ---
16
+
17
+ # InCoder-32B-Thinking: Reasoning Code Model for Industrial Scenarios
18
+
19
+ <div align="center">
20
+
21
+ [![HuggingFace](https://img.shields.io/badge/🤗-Model%20Hub-yellow)](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Thinking)
22
+ [![GitHub](https://img.shields.io/badge/GitHub-Industrial--Coder-blue)](https://github.com/CSJianYang/Industrial-Coder)
23
+ [![arXiv](https://img.shields.io/badge/arXiv-2603.16790-red)](https://huggingface.co/papers/2603.16790)
24
+ [![License](https://img.shields.io/badge/License-Apache%202.0-green)](LICENSE)
25
+
26
+ </div>
27
+
28
+ ## Model Summary
29
+
30
+ **InCoder-32B-Thinking** is the reasoning variant of the InCoder family. It extends [InCoder-32B](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder) with chain-of-thought reasoning: the model writes a step-by-step problem decomposition inside `<think>...</think>` tags before generating code. This is most useful for industrial tasks that require multi-step reasoning, such as debugging RTL modules, optimizing GPU kernels, or diagnosing embedded firmware issues.
31
+
32
+ For the instruction-tuned variant (without thinking), see [IndustrialCoder](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder). For the pre-trained base model, see [IndustrialCoder-Base](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Base).
33
+
34
+ ---
35
+
36
+ ## Key Results
37
+
38
+ ### General Code Benchmarks
39
+
40
+ | Benchmark | InCoder-32B | InCoder-32B-Thinking |
41
+ |---|:---:|:---:|
42
+ | HumanEval+ | 89.6 | **91.5** |
43
+ | MBPP+ | 78.3 | **80.1** |
44
+ | BigCodeBench (Full) | 49.8 | **51.2** |
45
+ | LiveCodeBench (Pass@1) | 49.14 | **52.3** |
46
+
47
+ ### Industrial Code Benchmarks
48
+
49
+ | Benchmark | Domain | InCoder-32B | InCoder-32B-Thinking |
50
+ |---|---|:---:|:---:|
51
+ | VeriScope Score | Chip Design | 80.7 | **82.3** |
52
+ | CAD-Coder Compile (%) | 3D Modeling | 82.0 | **84.0** |
53
+ | KernelBench L1 (%) | GPU Optimization | 22.2 | **24.0** |
54
+
55
+ > The thinking variant improves on every benchmark listed above, by 1.4 to 3.2 points. The largest gain is on LiveCodeBench (+3.2), and the smallest is on BigCodeBench (+1.4).
56
+
57
+ ---
58
+
59
+ ## Model Architecture
60
+
61
+ The architecture is the same as InCoder-32B; only the post-training is thinking-aware:
62
+
63
+ | Hyperparameter | Value |
64
+ |---|---|
65
+ | Parameters | ~32B |
66
+ | Layers | 64 |
67
+ | Hidden Size | 5,120 |
68
+ | Attention Heads | 40 (8 KV heads, GQA) |
69
+ | Max Context Length | 131,072 (128K) |
70
+ | Positional Encoding | RoPE (θ = 500,000) |
71
+ | Precision | BFloat16 |
72
+
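The hyperparameters above are enough for a rough estimate of KV-cache memory, which is what usually limits long-context serving. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes the head dimension equals hidden size divided by attention heads (the table does not state it), and it ignores weights, activations, and any framework overhead.

```python
# Values taken from the hyperparameter table above.
NUM_LAYERS = 64
HIDDEN_SIZE = 5120
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 8
MAX_CONTEXT = 131_072
BYTES_PER_VALUE = 2  # BFloat16

# Assumption: head_dim = hidden_size / attention_heads (not stated in the table).
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS

GIB = 1024 ** 3


def kv_cache_bytes(num_tokens: int, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence of `num_tokens`."""
    # The leading factor 2 accounts for storing both K and V in every layer.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


print(f"head_dim           : {HEAD_DIM}")
print(f"KV cache per token : {kv_cache_bytes(1) / 1024:.0f} KiB")
print(f"KV cache at 32K    : {kv_cache_bytes(32_768) / GIB:.0f} GiB")
print(f"KV cache at 128K   : {kv_cache_bytes(MAX_CONTEXT) / GIB:.0f} GiB")
print(f"128K without GQA   : {kv_cache_bytes(MAX_CONTEXT, NUM_ATTENTION_HEADS) / GIB:.0f} GiB")
```

Under these assumptions, one full 128K sequence needs about 32 GiB of cache, a fifth of what 40 full KV heads would require.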
73
+ ---
74
+
75
+ ## How Thinking Mode Works
76
+
77
+ InCoder-32B-Thinking generates a reasoning trace inside `<think>...</think>` tags before producing the final answer. This allows the model to:
78
+
79
+ 1. **Decompose** complex problems into sub-tasks
80
+ 2. **Reason** about constraints, edge cases, and hardware semantics
81
+ 3. **Plan** the solution structure before writing code
82
+
83
+ Example output:
84
+ ```
85
+ <think>
86
+ The user wants a UART transmitter module. Let me think through the design:
87
+ 1. Need a state machine: IDLE -> START_BIT -> DATA_BITS -> STOP_BIT
88
+ 2. 8N1 means: 8 data bits, no parity, 1 stop bit
89
+ 3. Need a baud rate counter derived from the clock frequency
90
+ 4. Shift register to serialize the 8-bit data LSB first
91
+ </think>
92
+
93
+ module uart_tx (
94
+ input wire clk,
95
+ ...
96
+ ```
97
+
98
+ You can **disable** thinking mode to get direct answers (behaves like the instruct variant):
99
+ ```python
100
+ text = tokenizer.apply_chat_template(
101
+     messages, tokenize=False, add_generation_prompt=True,
102
+     enable_thinking=False
103
+ )
104
+ ```
105
+
106
+ ---
107
+
108
+ ## Usage
109
+
110
+ ### Installation
111
+
112
+ ```bash
113
+ pip install transformers accelerate
114
+ ```
115
+
116
+ ### Thinking Mode (default)
117
+
118
+ ```python
119
+ from transformers import AutoTokenizer, AutoModelForCausalLM
120
+ import torch
121
+
122
+ model_id = "Multilingual-Multimodal-NLP/IndustrialCoder-Thinking"
123
+
124
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
125
+ model = AutoModelForCausalLM.from_pretrained(
126
+     model_id,
127
+     torch_dtype=torch.bfloat16,
128
+     device_map="auto",
129
+     trust_remote_code=True,
130
+ )
131
+
132
+ messages = [
133
+ {"role": "user", "content": "Optimize this CUDA kernel for better memory coalescing:\n__global__ void add(float *a, float *b, float *c, int N) {\n int i = threadIdx.x;\n if (i < N) c[i] = a[i] + b[i];\n}"}
134
+ ]
135
+
136
+ # Thinking mode (default) — model reasons before answering
137
+ text = tokenizer.apply_chat_template(
138
+     messages, tokenize=False, add_generation_prompt=True
139
+ )
140
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
141
+
142
+ with torch.no_grad():
143
+     # do_sample=True is required; otherwise temperature/top_p/top_k are ignored
+     out = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.85, top_k=20)
144
+
145
+ output = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False)
146
+
147
+ # Parse thinking and response
148
+ if "</think>" in output:
149
+     thinking = output.split("</think>", 1)[0].replace("<think>\n", "").strip()
150
+     response = output.split("</think>", 1)[1].strip()
151
+     print(f"Thinking:\n{thinking}\n\nResponse:\n{response}")
152
+ else:
153
+     print(output)
154
+ ```
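The parsing step at the end of the example above can be factored into a small helper. This is a minimal sketch, not part of the model's API: `split_thinking` and `ParsedOutput` are hypothetical names, and it assumes the tags appear literally as `<think>` and `</think>` in the decoded text. It also tolerates outputs where the opening tag is missing, which can happen if the chat template already emits it as part of the prompt.

```python
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    thinking: str  # reasoning trace, "" if no closing tag was found
    response: str  # text after the first closing tag


def split_thinking(
    output: str, open_tag: str = "<think>", close_tag: str = "</think>"
) -> ParsedOutput:
    """Split decoded model output into (thinking, response)."""
    head, sep, tail = output.partition(close_tag)
    if not sep:
        # No closing tag: non-thinking mode, or generation stopped mid-trace.
        return ParsedOutput("", output.strip())
    # Drop the opening tag if present; keep the whole head otherwise.
    thinking = head.split(open_tag, 1)[-1].strip()
    return ParsedOutput(thinking, tail.strip())


parsed = split_thinking("<think>\nNeed a baud counter.\n</think>\n\nmodule uart_tx (")
print(parsed.thinking)
print(parsed.response)
```

Because the output is decoded with `skip_special_tokens=False`, the response may still end with an end-of-turn token, which this helper does not strip.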
155
+
156
+ ### Non-Thinking Mode
157
+
158
+ ```python
159
+ # Disable thinking — direct answer without reasoning trace
160
+ text = tokenizer.apply_chat_template(
161
+     messages, tokenize=False, add_generation_prompt=True,
162
+     enable_thinking=False
163
+ )
164
+ ```
165
+
166
+ ### With Tool Calls
167
+
168
+ ```python
169
+ tools = [{
170
+     "type": "function",
171
+     "function": {
172
+         "name": "run_verilog_sim",
173
+         "description": "Run Verilog simulation with Icarus Verilog",
174
+         "parameters": {
175
+             "type": "object",
176
+             "properties": {
177
+                 "code": {"type": "string", "description": "Verilog source code"},
178
+                 "testbench": {"type": "string", "description": "Testbench code"}
179
+             }
180
+         }
181
+     }
182
+ }]
183
+
184
+ text = tokenizer.apply_chat_template(
185
+     messages, tokenize=False, add_generation_prompt=True, tools=tools
186
+ )
187
+ ```
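The snippet above only renders the tool schema into the prompt; the caller still has to execute whatever tool the model requests and feed the result back. The sketch below shows one way to do that, under stated assumptions: the tool call has already been parsed into a dict with `name` and `arguments` keys (the model's raw tool-call format is not documented here), `dispatch_tool_call` and `TOOL_REGISTRY` are hypothetical names, and the `run_verilog_sim` handler is a placeholder that does not run a simulator.

```python
import json
from typing import Any, Callable, Dict


def run_verilog_sim(code: str, testbench: str) -> str:
    """Placeholder handler; a real one would run a simulator in a sandbox."""
    return f"simulated {len(code)} chars of design against {len(testbench)} chars of testbench"


# Maps the tool names declared in `tools` to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"run_verilog_sim": run_verilog_sim}


def dispatch_tool_call(call: Dict[str, Any]) -> Dict[str, str]:
    """Run one parsed tool call and wrap the outcome as a `tool` chat message."""
    name = call.get("name")
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        content = json.dumps({"error": f"unknown tool: {name}"})
    else:
        try:
            content = json.dumps({"result": handler(**call.get("arguments", {}))})
        except TypeError as exc:  # missing or unexpected arguments
            content = json.dumps({"error": str(exc)})
    return {"role": "tool", "name": str(name), "content": content}


message = dispatch_tool_call(
    {"name": "run_verilog_sim", "arguments": {"code": "module m; endmodule", "testbench": "module tb; endmodule"}}
)
print(message)
```

The returned message can be appended to `messages` before calling `apply_chat_template` again. Errors are returned as content rather than raised, so the model can see them and retry.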
188
+
189
+ ### Deployment with vLLM
190
+
191
+ ```bash
192
+ vllm serve Multilingual-Multimodal-NLP/IndustrialCoder-Thinking \
193
+     --tensor-parallel-size 4 --max-model-len 32768 --trust-remote-code
194
+ ```
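Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the standard library and makes two assumptions: the server listens on vLLM's default address (`http://localhost:8000`), and the `chat_template_kwargs` request field is forwarded to this model's chat template in the same way as `enable_thinking=False` in the Transformers examples. `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: default vLLM host and port; change if the server was started differently.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Multilingual-Multimodal-NLP/IndustrialCoder-Thinking"


def build_chat_request(prompt: str, thinking: bool = True, max_tokens: int = 4096) -> dict:
    """Build the JSON body for a /chat/completions request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if not thinking:
        # Assumption: passed through to the chat template, like enable_thinking=False.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


def send_chat_request(payload: dict, base_url: str = BASE_URL) -> str:
    """POST the payload and return the assistant message text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

A call such as `send_chat_request(build_chat_request("Write an 8N1 UART transmitter in Verilog."))` returns the generated text. Note that prompt and completion together must fit within the `--max-model-len` passed to `vllm serve`.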
195
+
196
+ ### Recommended Sampling Parameters
197
+
198
+ | Use case | temperature | top_p | top_k | max_new_tokens |
199
+ |---|:---:|:---:|:---:|:---:|
200
+ | Thinking (default) | 0.6 | 0.85 | 20 | 8192 |
201
+ | Non-thinking / precise | 0.2 | 0.95 | — | 4096 |
202
+
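The table above can be kept in one place as keyword arguments for `model.generate`. This is a small sketch with a hypothetical helper name, `generation_kwargs`; it adds `do_sample=True` because Transformers otherwise ignores the sampling parameters, and it leaves `top_k` unset in the non-thinking preset, so the model's generation config default applies there.

```python
def generation_kwargs(thinking: bool = True) -> dict:
    """Return the recommended sampling settings as model.generate() kwargs."""
    if thinking:
        return {
            "do_sample": True,
            "temperature": 0.6,
            "top_p": 0.85,
            "top_k": 20,
            "max_new_tokens": 8192,
        }
    # Non-thinking / precise: the table gives no top_k, so none is set here.
    return {
        "do_sample": True,
        "temperature": 0.2,
        "top_p": 0.95,
        "max_new_tokens": 4096,
    }


print(generation_kwargs(thinking=True))
print(generation_kwargs(thinking=False))
```

Usage with the earlier example would look like `model.generate(**inputs, **generation_kwargs(thinking=True))`.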
203
+ ---
204
+
205
+ ## Model Family
206
+
207
+ | Model | Type | HuggingFace |
208
+ |---|---|---|
209
+ | InCoder-32B-Base | Pre-trained | [🤗 IndustrialCoder-Base](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Base) |
210
+ | InCoder-32B | Instruct | [🤗 IndustrialCoder](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder) |
211
+ | **InCoder-32B-Thinking** | **Reasoning** | [🤗 IndustrialCoder-Thinking](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-Thinking) |
212
+ | InCoder-32B-FP8 | FP8 Quantized | [🤗 IndustrialCoder-32B-FP8](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-32B-FP8) |
213
+ | InCoder-32B-AWQ-INT4 | AWQ INT4 | [🤗 IndustrialCoder-32B-AWQ-INT4](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-32B-AWQ-INT4) |
214
+ | InCoder-32B-GPTQ-INT4 | GPTQ INT4 | [🤗 IndustrialCoder-32B-GPTQ-INT4](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder-32B-GPTQ-INT4) |
215
+
216
+ ---
217
+
218
+ ## Limitations & Disclaimers
219
+
220
+ - The thinking trace may occasionally contain reasoning errors or hallucinated constraints, so always verify the final code output.
221
+ - For simple tasks, thinking mode adds latency; use `enable_thinking=False` for straightforward generation.
222
+ - Based on failure analysis, the model may struggle with:
223
+   - **API Knowledge**: Linker errors from undefined HAL/CMSIS functions in embedded C.
224
+   - **Functional Semantics**: Producing compilable but functionally incorrect RTL under complex logic scenarios.
225
+   - **Optimization**: Correct but sub-optimal GPU kernel performance.
226
+
227
+ Always review and test generated code in a sandboxed environment. Industrial code (RTL, embedded firmware, GPU kernels) requires expert review before deployment.
228
+
229
+ ---
230
+
231
+ ## Citation
232
+
233
+ ```bibtex
234
+ @article{yang2026incoder,
235
+ title={InCoder-32B: Code Foundation Model for Industrial Scenarios},
236
+ author={Yang, Jian and Zhang, Wei and Wu, Jiajun and Cheng, Junhang and Guo, Shawn
237
+ and Wang, Haowen and Gu, Weicheng and Du, Yaxin and Li, Joseph and Xu, Fanglin
238
+ and others},
239
+ journal={arXiv preprint arXiv:2603.16790},
240
+ year={2026}
241
+ }
242
+ ```