β™ŸοΈ KiyEngine V3 (Mamba-MoE)

"Where Linear Recurrence meets Sparse Intuition."

KiyEngine V3 is a high-performance chess evaluation model utilizing a hybrid Mamba State-Space Model (SSM) and Sparse Mixture-of-Experts (MoE) architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.


πŸš€ Highlights

  • Architecture: Mamba-SSM core for linear-time sequence modeling
  • MoE Strategy: 32 total experts (8 per layer) with Top-2 Gated Routing
  • Training: converged to a final loss of 5.46
  • Target Performance: Designed to bridge the gap between neural intuition and traditional brute-force search

🧠 Model Architecture

Unlike traditional Transformers, KiyEngine V3 uses Mamba blocks to handle long-range game dependencies efficiently, coupled with a Sparse MoE layer to specialize in different phases of the game (Opening, Middlegame, Endgame).
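
The sketch below illustrates the intended block layout only; the actual implementation lives in modeling_kiyengine.py. It assumes the third-party mamba-ssm package for the SSM mixer (which needs a CUDA device for the forward pass) and uses a plain MLP as a stand-in for the sparse MoE feed-forward, which is sketched separately under "Why MoE?" below.

import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the `mamba-ssm` package; forward pass requires CUDA

class MambaMoEBlock(nn.Module):
    """Illustrative block: pre-normed Mamba mixer + feed-forward, each with a residual."""
    def __init__(self, d_model=384, d_state=16, d_conv=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = Mamba(d_model=d_model, d_state=d_state, d_conv=d_conv)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                      # placeholder; a sparse MoE FFN in the real model
            nn.Linear(d_model, 2 * d_model),           # expansion_factor = 2 from the table below
            nn.SiLU(),
            nn.Linear(2 * d_model, d_model),
        )

    def forward(self, x):                              # x: (batch, seq_len, d_model)
        x = x + self.mixer(self.norm1(x))              # linear-time sequence mixing
        x = x + self.ffn(self.norm2(x))                # per-token feed-forward (MoE in practice)
        return x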

Hyperparameters

Parameter          Value  Description
d_model            384    Hidden dimension size
n_layers           4      Number of Mamba-MoE blocks
n_experts          8      Experts per layer (Total: 32)
top_k              2      Experts activated per token
d_state            16     SSM state dimension
d_conv             4      Convolution kernel size
expansion_factor   2      MLP expansion ratio
vocab_size         768    Input representation (Squares Γ— Pieces)
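
To sanity-check these values against the published checkpoint, the configuration can be loaded and printed; the exact attribute names are defined by configuration_kiyengine.py in this repository.

from transformers import AutoConfig

# Print the checkpoint's configuration; it should echo the table above.
config = AutoConfig.from_pretrained("Kiy-K/KiyEngine-V3", trust_remote_code=True)
print(config)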

πŸ’» Usage

You can load and test the "brain" of KiyEngine V3 directly via the transformers library:

from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()

print("βœ… KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run a forward pass
    output = model(dummy_input)

    # βœ… Read the standard fields from KiyEngineOutput
    print("πŸŽ‰ Success!")
    print(f"1. Policy logits (move prediction):    {output.policy_logits.shape}")
    # Expected: torch.Size([1, 768]) -> scores over 768 possible moves

    print(f"2. Value (position evaluation):        {output.value.shape}")
    # Expected: torch.Size([1, 1])   -> score from -1 (loss) to 1 (win)

    print(f"3. Last hidden state (representation): {output.last_hidden_state.shape}")
    # Expected: torch.Size([1, 64, 384])

πŸ“ˆ Training Progress

The model was trained on 1.5M+ high-quality Lichess games. The loss converged smoothly, aided by stable MoE routing.

  • Initial Loss: 7.78
  • Final Loss: 5.46 (Epoch 10)
  • Optimizer: AdamW with a OneCycleLR schedule (see the sketch after this list)
  • Training Time: ~5 hours on a Tesla P100 GPU
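
A minimal sketch of the AdamW + OneCycleLR setup named above; the model, data, loss, and hyperparameter values here are placeholders, not the actual KiyEngine V3 training configuration.

import torch
import torch.nn as nn

model = nn.Linear(384, 1)                                  # stand-in for the real network
steps = 100                                                # total optimizer steps (illustrative)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=3e-4, total_steps=steps)

for step in range(steps):
    x, target = torch.randn(32, 384), torch.randn(32, 1)   # dummy batch
    loss = nn.functional.mse_loss(model(x), target)        # real training uses policy/value losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                        # OneCycleLR steps once per batch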

Loss Curve

(Consider adding a training curve image here)


πŸ“‚ Repository Structure

KiyEngine-V3-Mamba-MoE/
β”œβ”€β”€ model.safetensors              # Optimized weights (272MB)
β”œβ”€β”€ config.json                    # Model configuration
β”œβ”€β”€ configuration_kiyengine.py     # Custom config class
β”œβ”€β”€ modeling_kiyengine.py          # Core PyTorch implementation
└── README.md                      # This file

🎯 Performance

Metric               Value
Final Training Loss  5.46
Model Size           272 MB
Parameters           68.06M (calculated from architecture)
Inference Speed      TBD
Target ELO           TBD
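
The parameter count can be verified directly from the loaded checkpoint, reusing the same AutoModel call as in the Usage section above.

from transformers import AutoModel

# Count all parameters of the loaded checkpoint (reported as 68.06M above).
model = AutoModel.from_pretrained("Kiy-K/KiyEngine-V3", trust_remote_code=True)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M parameters")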

πŸ› οΈ Roadmap

  • Train V3 Mamba-MoE weights
  • Push to Hugging Face Hub
  • Implement native Rust inference via candle-core
  • Integrate with UCI protocol for GUI play (Arena, CuteChess)
  • Benchmark against Stockfish and Leela Chess Zero
  • Add ONNX export for deployment
  • Create interactive demo on Hugging Face Spaces

πŸ”¬ Technical Details

Why Mamba?

Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. Mamba's linear-time recurrence allows the model to process entire games efficiently while maintaining long-range dependencies.
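
To make the complexity argument concrete, here is a toy, non-selective state-space recurrence: each token triggers one constant-cost state update, so a sequence of length L is processed in O(L). The real Mamba block additionally makes the SSM parameters input-dependent and uses a fused scan kernel; this sketch only illustrates the linear-time idea.

import torch

def linear_ssm_scan(x, A, B, C):
    """x: (seq_len, d_in) -> y: (seq_len, d_in) with one state update per token."""
    seq_len, _ = x.shape
    h = torch.zeros(A.shape[0])            # hidden state carries long-range context
    ys = []
    for t in range(seq_len):               # O(seq_len) total work
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    return torch.stack(ys)

torch.manual_seed(0)
d_state, d_in, seq_len = 16, 8, 64
A = 0.9 * torch.eye(d_state)               # stable state transition
B = torch.randn(d_state, d_in) * 0.1
C = torch.randn(d_in, d_state) * 0.1
y = linear_ssm_scan(torch.randn(seq_len, d_in), A, B, C)
print(y.shape)                              # torch.Size([64, 8])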

Why MoE?

Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The Mixture-of-Experts architecture allows the model to:

  • Specialize experts for different game phases
  • Route positions to the most relevant expert
  • Maintain parameter efficiency while increasing model capacity (see the routing sketch below)
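
Below is a minimal sketch of top-2 gated routing over 8 experts, matching the hyperparameters in the table above; the repository's actual MoE layer is defined in modeling_kiyengine.py and may differ in details such as load balancing.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Each token is routed to its top_k experts; their outputs are mixed by gate weights."""
    def __init__(self, d_model=384, n_experts=8, top_k=2, expansion=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, expansion * d_model),
                          nn.SiLU(),
                          nn.Linear(expansion * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                               # x: (batch, seq_len, d_model)
        scores = self.gate(x)                           # (B, L, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(1, 64, 384)).shape)               # torch.Size([1, 64, 384])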

πŸ“Š Dataset

  • Source: Lichess Database
  • Games: 1.5M+ high-quality games

🀝 Citation

If you use KiyEngine V3 in your research or projects, please cite:

@misc{kiyengine-v3-2026,
  author = {Kiy-K},
  title = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}

πŸ“ License

This model is released under the MIT License. See the LICENSE file for details.


πŸ‘€ Author

Kiy-K
"Building the next generation of neural chess engines."


πŸ™ Acknowledgments

  • Mamba: Based on the Mamba architecture by Gu & Dao
  • Dataset: Lichess Open Database
  • Inspiration: Stockfish, Leela Chess Zero, and the broader chess AI community

⚠️ Limitations

  • Model is currently a neural network component and requires integration with a search algorithm (e.g., MCTS, Alpha-Beta) for full chess engine functionality
  • Performance may vary across different game phases
  • Requires further validation against established benchmarks

Star this repo if you find it useful! ⭐

Made with β™ŸοΈ and πŸ€– by Kiy-K
