β™ŸοΈ KiyEngine V3 (Mamba-MoE)

"Where Linear Recurrence meets Sparse Intuition."

KiyEngine V3 is a high-performance chess evaluation model utilizing a hybrid Mamba State-Space Model (SSM) and Sparse Mixture-of-Experts (MoE) architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.


πŸš€ Highlights

  • Architecture: Mamba-SSM core for linear-time sequence modeling
  • MoE Strategy: 32 total experts (8 per layer) with Top-2 Gated Routing
  • Training: converged to a final loss of 5.46
  • Target Performance: Designed to bridge the gap between neural intuition and traditional brute-force search

🧠 Model Architecture

Unlike traditional Transformers, KiyEngine V3 uses Mamba blocks to handle long-range game dependencies efficiently, coupled with a Sparse MoE layer to specialize in different phases of the game (Opening, Middlegame, Endgame).
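
The sketch below illustrates the intended block layout only; the actual implementation lives in modeling_kiyengine.py. It assumes the third-party mamba-ssm package for the SSM mixer (which needs a CUDA device for the forward pass) and uses a plain MLP as a stand-in for the sparse MoE feed-forward, which is sketched separately under "Why MoE?" below.

import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the `mamba-ssm` package; forward pass requires CUDA

class MambaMoEBlock(nn.Module):
    """Illustrative block: pre-normed Mamba mixer + feed-forward, each with a residual."""
    def __init__(self, d_model=384, d_state=16, d_conv=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = Mamba(d_model=d_model, d_state=d_state, d_conv=d_conv)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                      # placeholder; a sparse MoE FFN in the real model
            nn.Linear(d_model, 2 * d_model),           # expansion_factor = 2 from the table below
            nn.SiLU(),
            nn.Linear(2 * d_model, d_model),
        )

    def forward(self, x):                              # x: (batch, seq_len, d_model)
        x = x + self.mixer(self.norm1(x))              # linear-time sequence mixing
        x = x + self.ffn(self.norm2(x))                # per-token feed-forward (MoE in practice)
        return x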

Hyperparameters

Parameter          Value  Description
d_model            384    Hidden dimension size
n_layers           4      Number of Mamba-MoE blocks
n_experts          8      Experts per layer (Total: 32)
top_k              2      Experts activated per token
d_state            16     SSM state dimension
d_conv             4      Convolution kernel size
expansion_factor   2      MLP expansion ratio
vocab_size         768    Input representation (Squares Γ— Pieces)
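
To sanity-check these values against the published checkpoint, the configuration can be loaded and printed; the exact attribute names are defined by configuration_kiyengine.py in this repository.

from transformers import AutoConfig

# Print the checkpoint's configuration; it should echo the table above.
config = AutoConfig.from_pretrained("Kiy-K/KiyEngine-V3", trust_remote_code=True)
print(config)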

πŸ’» Usage

You can load and test the "brain" of KiyEngine V3 directly via the transformers library:

from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()

print("βœ… KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run a forward pass
    output = model(dummy_input)

    # βœ… Read the standard fields from KiyEngineOutput
    print("πŸŽ‰ Success!")
    print(f"1. Policy logits (move prediction):    {output.policy_logits.shape}")
    # Expected: torch.Size([1, 768]) -> scores over 768 possible moves

    print(f"2. Value (position evaluation):        {output.value.shape}")
    # Expected: torch.Size([1, 1])   -> score from -1 (loss) to 1 (win)

    print(f"3. Last hidden state (representation): {output.last_hidden_state.shape}")
    # Expected: torch.Size([1, 64, 384])

πŸ“ˆ Training Progress

The model was trained on 1.5M+ high-quality Lichess games. The loss converged smoothly, aided by stable MoE routing.

  • Initial Loss: 7.78
  • Final Loss: 5.46 (Epoch 10)
  • Optimizer: AdamW with a OneCycleLR schedule (see the sketch after this list)
  • Training Time: ~5 hours on a Tesla P100 GPU
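
A minimal sketch of the AdamW + OneCycleLR setup named above; the model, data, loss, and hyperparameter values here are placeholders, not the actual KiyEngine V3 training configuration.

import torch
import torch.nn as nn

model = nn.Linear(384, 1)                                  # stand-in for the real network
steps = 100                                                # total optimizer steps (illustrative)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=3e-4, total_steps=steps)

for step in range(steps):
    x, target = torch.randn(32, 384), torch.randn(32, 1)   # dummy batch
    loss = nn.functional.mse_loss(model(x), target)        # real training uses policy/value losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                        # OneCycleLR steps once per batch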

Loss Curve

(Consider adding a training curve image here)


πŸ“‚ Repository Structure

KiyEngine-V3-Mamba-MoE/
β”œβ”€β”€ model.safetensors              # Optimized weights (272MB)
β”œβ”€β”€ config.json                    # Model configuration
β”œβ”€β”€ configuration_kiyengine.py     # Custom config class
β”œβ”€β”€ modeling_kiyengine.py          # Core PyTorch implementation
└── README.md                      # This file

🎯 Performance

Metric               Value
Final Training Loss  5.46
Model Size           272 MB
Parameters           68.06M (calculated from architecture)
Inference Speed      TBD
Target ELO           TBD
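
The parameter count can be verified directly from the loaded checkpoint, reusing the same AutoModel call as in the Usage section above.

from transformers import AutoModel

# Count all parameters of the loaded checkpoint (reported as 68.06M above).
model = AutoModel.from_pretrained("Kiy-K/KiyEngine-V3", trust_remote_code=True)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M parameters")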

πŸ› οΈ Roadmap

  • Train V3 Mamba-MoE weights
  • Push to Hugging Face Hub
  • Implement native Rust inference via candle-core
  • Integrate with UCI protocol for GUI play (Arena, CuteChess)
  • Benchmark against Stockfish and Leela Chess Zero
  • Add ONNX export for deployment
  • Create interactive demo on Hugging Face Spaces

πŸ”¬ Technical Details

Why Mamba?

Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. Mamba's linear-time recurrence allows the model to process entire games efficiently while maintaining long-range dependencies.
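
To make the complexity argument concrete, here is a toy, non-selective state-space recurrence: each token triggers one constant-cost state update, so a sequence of length L is processed in O(L). The real Mamba block additionally makes the SSM parameters input-dependent and uses a fused scan kernel; this sketch only illustrates the linear-time idea.

import torch

def linear_ssm_scan(x, A, B, C):
    """x: (seq_len, d_in) -> y: (seq_len, d_in) with one state update per token."""
    seq_len, _ = x.shape
    h = torch.zeros(A.shape[0])            # hidden state carries long-range context
    ys = []
    for t in range(seq_len):               # O(seq_len) total work
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    return torch.stack(ys)

torch.manual_seed(0)
d_state, d_in, seq_len = 16, 8, 64
A = 0.9 * torch.eye(d_state)               # stable state transition
B = torch.randn(d_state, d_in) * 0.1
C = torch.randn(d_in, d_state) * 0.1
y = linear_ssm_scan(torch.randn(seq_len, d_in), A, B, C)
print(y.shape)                              # torch.Size([64, 8])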

Why MoE?

Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The Mixture-of-Experts architecture allows the model to:

  • Specialize experts for different game phases
  • Route positions to the most relevant expert
  • Maintain parameter efficiency while increasing model capacity (see the routing sketch below)
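
Below is a minimal sketch of top-2 gated routing over 8 experts, matching the hyperparameters in the table above; the repository's actual MoE layer is defined in modeling_kiyengine.py and may differ in details such as load balancing.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Each token is routed to its top_k experts; their outputs are mixed by gate weights."""
    def __init__(self, d_model=384, n_experts=8, top_k=2, expansion=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, expansion * d_model),
                          nn.SiLU(),
                          nn.Linear(expansion * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                               # x: (batch, seq_len, d_model)
        scores = self.gate(x)                           # (B, L, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(1, 64, 384)).shape)               # torch.Size([1, 64, 384])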

πŸ“Š Dataset

  • Source: Lichess Database
  • Games: 1.5M+ high-quality games

🀝 Citation

If you use KiyEngine V3 in your research or projects, please cite:

@misc{kiyengine-v3-2026,
  author = {Kiy-K},
  title = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}

πŸ“ License

This model is released under the MIT License. See the LICENSE file for details.


πŸ‘€ Author

Kiy-K
"Building the next generation of neural chess engines."


πŸ™ Acknowledgments

  • Mamba: Based on the Mamba architecture by Gu & Dao
  • Dataset: Lichess Open Database
  • Inspiration: Stockfish, Leela Chess Zero, and the broader chess AI community

⚠️ Limitations

  • Model is currently a neural network component and requires integration with a search algorithm (e.g., MCTS, Alpha-Beta) for full chess engine functionality
  • Performance may vary across different game phases
  • Requires further validation against established benchmarks

Star this repo if you find it useful! ⭐

Made with β™ŸοΈ and πŸ€– by Kiy-K
