# KiyEngine V3 (Mamba-MoE)

*"Where Linear Recurrence meets Sparse Intuition."*
KiyEngine V3 is a high-performance chess evaluation model utilizing a hybrid Mamba State-Space Model (SSM) and Sparse Mixture-of-Experts (MoE) architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.
## Highlights
- Architecture: Mamba-SSM core for linear-time sequence modeling
- MoE Strategy: 32 total experts (8 per layer) with Top-2 Gated Routing
- Training: converged to a final loss of 5.46
- Target Performance: Designed to bridge the gap between neural intuition and traditional brute-force search
## Model Architecture
Unlike traditional Transformers, KiyEngine V3 uses Mamba blocks to handle long-range game dependencies efficiently, coupled with a Sparse MoE layer whose experts specialize in different phases of the game (opening, middlegame, endgame). One block of this layout is sketched below.
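To make the layout concrete, here is a minimal sketch of one such block, assuming the `mamba-ssm` package (which requires a CUDA device) and a plain pre-norm residual structure. The class name `MambaMoEBlock` and the dense stand-in feed-forward are illustrative only; the shipped `modeling_kiyengine.py` implements the real sparse MoE layer (see the routing sketch under "Why MoE?").

```python
# Illustrative Mamba-MoE block layout -- NOT the shipped modeling_kiyengine.py.
# Assumes the `mamba-ssm` package (pip install mamba-ssm), which needs CUDA.
import torch.nn as nn
from mamba_ssm import Mamba


class MambaMoEBlock(nn.Module):
    """Pre-norm residual block: Mamba token mixing + feed-forward channel mixing."""

    def __init__(self, d_model=384, d_state=16, d_conv=4, expand=2, ffn=None):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = Mamba(d_model=d_model, d_state=d_state,
                           d_conv=d_conv, expand=expand)
        self.norm2 = nn.LayerNorm(d_model)
        # Dense stand-in FFN; KiyEngine V3 uses a sparse top-2 MoE layer here.
        self.ffn = ffn or nn.Sequential(
            nn.Linear(d_model, expand * d_model),
            nn.GELU(),
            nn.Linear(expand * d_model, d_model),
        )

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        x = x + self.mixer(self.norm1(x))      # linear-time sequence mixing
        x = x + self.ffn(self.norm2(x))        # per-token channel mixing
        return x
```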
### Hyperparameters
| Parameter | Value | Description |
|---|---|---|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (Total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (Squares × Pieces) |
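The `vocab_size` of 768 in the table above corresponds to 64 squares × 12 piece types. The exact tokenization used by KiyEngine V3 is not documented here, so the helper below is only a hypothetical illustration of such an encoding (square-major indexing with `python-chess` piece codes); the real model may order squares, pieces, or empty squares differently.

```python
# Hypothetical board-to-token encoding: index = square * 12 + piece_code,
# giving ids in [0, 768). The actual scheme used by KiyEngine V3 may differ.
import chess
import torch


def encode_board(board: chess.Board) -> torch.Tensor:
    tokens = []
    for square, piece in board.piece_map().items():
        # piece_type is 1..6 (pawn..king); add 6 for black pieces -> 0..11
        piece_code = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        tokens.append(square * 12 + piece_code)
    return torch.tensor(tokens, dtype=torch.long).unsqueeze(0)  # (1, n_pieces)


print(encode_board(chess.Board()).shape)  # torch.Size([1, 32]) for the start position
```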
## Usage

You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library:
```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()
print("KiyEngine V3 ready for inference.")

# Create a dummy input (Batch=1, Seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Call the model
    output = model(dummy_input)

# Reading the standard fields from KiyEngineOutput
print("Success!")
print(f"1. Policy Logits (move prediction): {output.policy_logits.shape}")
# Expected: torch.Size([1, 768]) -> probabilities over 768 possible moves
print(f"2. Value (position evaluation): {output.value.shape}")
# Expected: torch.Size([1, 1]) -> score from -1 (loss) to 1 (win)
print(f"3. Last Hidden State (internal representation): {output.last_hidden_state.shape}")
# Expected: torch.Size([1, 64, 384])
```
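Continuing the snippet above, the raw heads can be turned into something directly usable: a softmax over the policy logits gives a move distribution, and the value head is a scalar evaluation. How a move index maps back to an actual chess move is engine-specific and not shown here.

```python
import torch.nn.functional as F

# Continues the snippet above.
policy = F.softmax(output.policy_logits, dim=-1)  # (1, 768) move probabilities
best_move_index = policy.argmax(dim=-1).item()    # index into the move vocabulary
position_score = output.value.item()              # scalar roughly in [-1, 1]

print(f"Most likely move index: {best_move_index}")
print(f"Position evaluation: {position_score:+.3f}")
```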
## Training Progress

The model was trained on 1.5M+ high-quality Lichess games. The loss converged steadily, helped by stable MoE routing.
- Initial Loss: 7.78
- Final Loss: 5.46 (Epoch 10)
- Optimizer: AdamW with OneCycleLR (setup sketched below)
- Training Time: ~5 hours on a Tesla P100 GPU
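For reference, an AdamW + OneCycleLR setup of the kind listed above looks roughly like this, reusing `model` from the Usage section. The learning rate, weight decay, and schedule length are placeholders, not the values used to train the released weights.

```python
import torch

# Illustrative optimizer/scheduler setup; all hyperparameter values below are
# placeholders, not the ones used to train the released weights.
steps_per_epoch, epochs = 1_000, 10  # placeholder schedule length

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=epochs, steps_per_epoch=steps_per_epoch
)

# Inside the training loop, OneCycleLR is stepped once per optimizer step:
#     loss.backward(); optimizer.step(); scheduler.step()
```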
### Loss Curve

*(Training loss curve image to be added here.)*
## Repository Structure
```
KiyEngine-V3-Mamba-MoE/
├── model.safetensors            # Optimized weights (272 MB)
├── config.json                  # Model configuration
├── configuration_kiyengine.py   # Custom config class
├── modeling_kiyengine.py        # Core PyTorch implementation
└── README.md                    # This file
```
## Performance
| Metric | Value |
|---|---|
| Final Training Loss | 5.46 |
| Model Size | 272 MB |
| Parameters | 68.06M (calculated from architecture) |
| Inference Speed | TBD |
| Target ELO | TBD |
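The parameter count in the table can be cross-checked directly from the loaded weights (reusing `model` from the Usage section) rather than derived from the architecture:

```python
# Count parameters of the loaded model; should roughly match the 68.06M figure above.
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e6:.2f}M")
```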
## Roadmap
- Train V3 Mamba-MoE weights
- Push to Hugging Face Hub
- Implement native Rust inference via `candle-core`
- Integrate with UCI protocol for GUI play (Arena, CuteChess)
- Benchmark against Stockfish and Leela Chess Zero
- Add ONNX export for deployment
- Create interactive demo on Hugging Face Spaces
## Technical Details

### Why Mamba?
Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. Mamba's linear-time recurrence allows the model to process entire games efficiently while maintaining long-range dependencies.
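As a schematic of why the cost is linear, a discretized SSM updates one fixed-size hidden state per token. The toy scan below is a plain (non-selective) SSM with a scalar input channel, not Mamba's fused selective-scan kernel, but it shows the O(sequence length) recurrence:

```python
# Toy linear-time SSM scan (scalar input channel) -- a schematic only,
# not Mamba's selective-scan kernel.
import torch


def ssm_scan(x, A, B, C):
    """x: (seq_len,) inputs; A: (d_state, d_state); B, C: (d_state,)."""
    h = torch.zeros(A.shape[0])      # fixed-size hidden state
    y = torch.empty(x.shape[0])
    for t in range(x.shape[0]):      # one update per token -> linear in seq_len
        h = A @ h + B * x[t]         # h_t = A h_{t-1} + B x_t
        y[t] = C @ h                 # y_t = C h_t
    return y


y = ssm_scan(torch.randn(64), 0.9 * torch.eye(16), torch.ones(16), torch.ones(16))
print(y.shape)  # torch.Size([64])
```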
### Why MoE?
Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The Mixture-of-Experts architecture allows the model to:
- Specialize experts for different game phases
- Route each position to the two most relevant experts (see the sketch below)
- Maintain parameter efficiency while increasing model capacity
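A minimal sketch of the top-2 gated routing described above is shown next. The class name and the dense compute-then-mask formulation are illustrative only; a production MoE layer dispatches each token to just its two selected experts instead of running every expert on every token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoE(nn.Module):
    """Illustrative top-2 gated mixture of expert MLPs (not the shipped code)."""

    def __init__(self, d_model=384, n_experts=8, top_k=2, expansion=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, expansion * d_model),
                nn.GELU(),
                nn.Linear(expansion * d_model, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        scores = self.gate(x)                      # (batch, seq_len, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over the chosen 2
        out = torch.zeros_like(x)
        # Dense-for-clarity: every expert runs on every token, then routing masks
        # keep only the top-2 contributions. Real implementations dispatch sparsely.
        for e, expert in enumerate(self.experts):
            expert_out = expert(x)
            for k in range(self.top_k):
                mask = (idx[..., k] == e).unsqueeze(-1)
                out = out + mask * weights[..., k:k + 1] * expert_out
        return out


print(Top2MoE()(torch.randn(1, 64, 384)).shape)    # torch.Size([1, 64, 384])
```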
## Dataset
- Source: Lichess Database
- Games: 1.5M+ high-quality games
## Citation
If you use KiyEngine V3 in your research or projects, please cite:
```bibtex
@misc{kiyengine-v3-2026,
  author       = {Kiy-K},
  title        = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```
## License
This model is released under the MIT License. See the LICENSE file for details.
## Author
**Kiy-K**

*"Building the next generation of neural chess engines."*
- Hugging Face: @Kiy-K
- Contact: [email protected]
## Acknowledgments
- Mamba: Based on the Mamba architecture by Gu & Dao
- Dataset: Lichess Open Database
- Inspiration: Stockfish, Leela Chess Zero, and the broader chess AI community
## Limitations
- The released model is a neural evaluation component; full chess-engine functionality requires integrating it with a search algorithm (e.g., MCTS or alpha-beta). A minimal search sketch follows this list.
- Performance may vary across different game phases
- Requires further validation against established benchmarks
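As noted in the first limitation, the network only scores positions; actual play needs a search wrapper around it. The sketch below is a hypothetical one-ply greedy search using the value head with `python-chess`, reusing the (assumed) `encode_board` helper from the Hyperparameters section, and it further assumes the value is reported from the side to move's point of view.

```python
# Hypothetical one-ply search: pick the move leading to the position the model
# scores worst for the opponent. Assumes `encode_board` from above and that
# output.value is from the side to move's perspective.
import chess
import torch


def pick_move(board: chess.Board, model) -> chess.Move:
    best_move, best_score = None, float("inf")
    for move in board.legal_moves:
        board.push(move)
        with torch.no_grad():
            score = model(encode_board(board)).value.item()  # opponent to move
        board.pop()
        if score < best_score:
            best_move, best_score = move, score
    return best_move


print(pick_move(chess.Board(), model))
```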
Star this repo if you find it useful!

Made by Kiy-K.