Post 41
Neural Traffic Control: Orchestrating Multi-Path Reasoning 🚥
The future of AI isn't just about "better" models; it's about high-precision orchestration. We are moving from linear processing to Parallel MTP-Reasoning, where we manage neural traffic across stabilized, transparent, and recursive highways.
1️⃣ The Backbone: Stabilized High-Dimensional Routing (arXiv:2512.24880) Using DeepSeek's mHC (Manifold-Constrained Hyper-Connections), we solve the instability of deep MoE architectures. By projecting weight updates onto the Birkhoff Polytope (the set of doubly stochastic matrices, i.e., convex mixtures of permutations), we ensure that our "Simpsons-style" expert lanes keep their mathematical identity instead of smearing into one another. This is the architectural stability needed to run multiple reasoning paths without collapse.
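To make the Birkhoff constraint concrete, here is a minimal sketch, not the paper's implementation: Sinkhorn-Knopp iteration is one standard way to push a non-negative routing matrix toward the polytope of doubly stochastic matrices. The function name and iteration count are our own choices.

```python
import numpy as np

def sinkhorn_project(m, iters=50, eps=1e-9):
    """Approximately project a non-negative square matrix onto the
    Birkhoff Polytope (doubly stochastic matrices) by alternately
    normalizing rows and columns (Sinkhorn-Knopp)."""
    m = np.maximum(m, eps)  # keep entries positive so every normalization is defined
    for _ in range(iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

# Example: constrain a random 4-expert mixing matrix
routing = sinkhorn_project(np.random.rand(4, 4))
print(routing.sum(axis=0))  # ~[1. 1. 1. 1.]
print(routing.sum(axis=1))  # ~[1. 1. 1. 1.]
```

Because every row and column sums to one, no expert lane can silently absorb the others: the mixing stays a blend of permutations, which is exactly the "identity-preserving" property the post leans on.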
2️⃣ The Vision: Gemma Scope 2 & Feature Steering You can't steer what you can't see. Gemma Scope 2 provides the "X-ray" for our highways. Using Sparse Autoencoders (SAEs), our Meta-Controller identifies the features active in each expert lane. We don't just route data; we route intent, monitoring feature drift in real time.
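A minimal sketch of the monitoring idea, assuming a pre-trained SAE encoder: `w_enc`, `b_enc`, the threshold value, and the dimensions below are all hypothetical stand-ins, not the Gemma Scope API. Drift here is just Jaccard distance between the sets of active features at consecutive steps.

```python
import numpy as np

def sae_features(activation, w_enc, b_enc, threshold):
    """Encode one residual-stream activation into sparse SAE features.
    A JumpReLU-style gate zeroes any feature below its threshold."""
    pre = activation @ w_enc + b_enc
    return np.where(pre > threshold, pre, 0.0)

def feature_drift(prev, curr):
    """Jaccard distance between the active-feature sets of two steps:
    0.0 means identical feature sets, 1.0 means fully disjoint."""
    a, b = set(np.flatnonzero(prev)), set(np.flatnonzero(curr))
    return 0.0 if not (a | b) else 1.0 - len(a & b) / len(a | b)

# Hypothetical shapes: 2304-dim residual stream, 16384 SAE features
rng = np.random.default_rng(0)
w, b, t = rng.normal(size=(2304, 16384)), np.zeros(16384), 40.0
step1 = sae_features(rng.normal(size=2304), w, b, t)
step2 = sae_features(rng.normal(size=2304), w, b, t)
print(f"feature drift: {feature_drift(step1, step2):.2f}")
```

A drift score spiking toward 1.0 between steps is the signal the Meta-Controller would treat as a lane changing intent mid-thought.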
3️⃣ The Logic: Recursive Open Meta-Agents (arXiv:2512.24601) We integrate the ROMA (Recursive Open Meta-Agent) framework. Instead of a flat, single-pass response, the model runs a recursive loop, refining its internal state before any output is emitted. This is the "brain" of our [Meta-Controller GitHub Repo], letting the model simulate and discard weak lines of reasoning internally.
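The recursion pattern itself fits in a few lines. This is our sketch of the refine-then-emit loop, not the ROMA codebase: `draft_fn` and `critique_fn` are hypothetical callables (e.g., two prompts against the same Gemma 3 instance), and the acceptance threshold is arbitrary.

```python
def recursive_refine(task, draft_fn, critique_fn, depth=3, accept=0.8):
    """ROMA-style inner loop (sketch): draft, critique, and recurse on
    the feedback until the critic is satisfied or the recursion budget
    is spent. Nothing is emitted until the loop settles."""
    draft = draft_fn(task)
    score, feedback = critique_fn(task, draft)
    if score >= accept or depth == 0:
        return draft  # only now does output leave the inner loop
    # Fold the critique back into the task and descend one level
    return recursive_refine(f"{task}\n[revise]: {feedback}",
                            draft_fn, critique_fn, depth - 1, accept)
```

The key property: weak drafts never reach the user; they exist only as intermediate states inside the recursion.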
4️⃣ The Simulation: Parallel MTP-Reasoning This is where it all comes together: Multi-Token Prediction (MTP) meets parallel simulation. Our Python-driven controller runs three Gemma 3 instances in parallel (a minimal controller sketch follows the list below).
The Process: Three paths generated simultaneously.
The Filter: A 500-token lookahead window per path.
The Decision: The Meta-Controller uses SAE data from Gemma Scope to select the path with the highest logical fidelity.
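Here is the promised controller sketch, under stated assumptions: `model.generate` stands in for whatever inference API wraps each Gemma 3 instance, and `sae_fidelity` is a hypothetical scorer built on the Gemma Scope features from section 2️⃣.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PATHS = 3           # three parallel Gemma 3 instances
LOOKAHEAD_TOKENS = 500  # lookahead window from the post

def rollout(model, prompt, seed):
    """One speculative path: decode a full lookahead window.
    `model.generate` is a stand-in for your inference API."""
    return model.generate(prompt, max_new_tokens=LOOKAHEAD_TOKENS, seed=seed)

def select_path(models, prompt, sae_fidelity):
    """Run the rollouts in parallel and keep the one whose SAE feature
    profile scores highest under `sae_fidelity`."""
    with ThreadPoolExecutor(max_workers=NUM_PATHS) as pool:
        futures = [pool.submit(rollout, m, prompt, seed=i)
                   for i, m in enumerate(models)]
        paths = [f.result() for f in futures]
    return max(paths, key=sae_fidelity)
```

Threads suffice here because the heavy lifting happens inside the model backends; the controller itself only dispatches, waits, and arbitrates.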
The Result: A self-correcting, transparent, and multi-threaded reasoning engine. We aren't just scaling parameters; we are scaling architectural precision. 🧠