---
title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
- mcp-in-action-track-creative
- mcp-in-action-track-consumer
- Google
- Gemini
- Anthropic
- OpenAI
- HuggingFace
- ElevenLabs
---
# 🧠 Agentic Codenames Arena

**Watch, or join, LLMs battling it out in Codenames.**
**New to Codenames? No problem.** Go to the [How to Play section](#how-to-play) below or check out the example in the _How to Play_ tab in the app to get started.
---
## ✅ Hackathon Requirements:
* `Demo`: [Video on YouTube](https://youtu.be/E3IvBN8SqdA)
* `Social media post`: [My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)
* `My HuggingFace Username`: lucadipalma1998
---
## 🧩 What This App Does
**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*.
Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:
* **1 Boss**: Provides the clue and clue number for each turn.
* **1 Captain**: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.
The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.
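
As a rough illustration of the turn flow described above, here is a minimal plain-Python sketch. It does not use LangGraph and is not the app's actual code: `Team`, `play_turn`, and `majority_captain` are hypothetical names, and the lambdas stand in for LLM calls (or for a human Boss typing a clue). The only point is to show who sees what: the Boss receives the color key, everyone else only the words.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

Clue = tuple[str, int]  # (clue word, clue number)


@dataclass
class Team:
    # Hypothetical role signatures; each callable could wrap an LLM or a human.
    boss: Callable[[dict[str, str]], Clue]
    players: list[Callable[[Clue, list[str]], list[str]]]
    captain: Callable[[Clue, list[list[str]]], list[str]]


def play_turn(team: Team, key: dict[str, str]) -> tuple[Clue, list[str]]:
    """Run one turn: Boss gives a clue, Players suggest, Captain decides."""
    clue = team.boss(key)      # only the Boss receives the color key
    words = list(key)          # everyone else sees the words without colors
    suggestions = [player(clue, words) for player in team.players]
    return clue, team.captain(clue, suggestions)


def majority_captain(clue: Clue, suggestions: list[list[str]]) -> list[str]:
    """Toy Captain: pick the most frequently suggested words, up to the clue number."""
    votes = Counter(word for suggestion in suggestions for word in suggestion)
    return [word for word, _ in votes.most_common(clue[1])]


# Toy board taken from the How to Play example; stub agents replace the LLMs.
key = {"AIR": "red", "SPACE": "red", "WALL": "blue",
       "SATURN": "neutral", "STAFF": "killer"}
team = Team(
    boss=lambda key: ("atmosphere", 2),
    players=[
        lambda clue, words: ["AIR", "SPACE"],
        lambda clue, words: ["SPACE", "AIR", "SATURN"],
    ],
    captain=majority_captain,
)
clue, picks = play_turn(team, key)
```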
---
## 🤖 How It Works
### **LLM Teams**
Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace...
Each model plays autonomously using its own reasoning chain and game strategy.
### **Two Gameplay Modalities**
#### **1️⃣ Observation Mode — Watch AIs Battle**
Sit back and spectate.
See how different models reason about clues, decide associations, and occasionally produce *hilariously misaligned* guesses.
You'll see:
* Model-to-model conversations
* Reasoning traces
* Turn-by-turn decisions
* How each team coordinates across multiple rounds
Perfect for AI benchmarking, research, or just entertainment.
#### **2️⃣ Human Boss Mode — Enter the Fight**
Become the Boss for either team and give your own clue + number.
Your AI teammates will interpret your hint and take their guesses.
---
## 🧠 Why It’s Interesting
* **Compare LLM reasoning styles:**
Watch how different models interpret associations, analogies, and subtle semantic cues.
* **Analyze team dynamics:**
Some models coordinate beautifully. Others… not so much.
Observe emergent cooperation, miscommunication, or unexpected strategies.
* **Experiment with human–AI collaboration:**
Test how effective your clues are with LLM teammates.
Try pushing the limits with creative, cryptic, or minimalist hints.
---
## 🕹️ Main Features
* **Build teams by selecting providers** or choose `random` to generate a mixed-model team.
* **Switch between AI vs AI** and **Human vs AI** modes
* **Detailed per-turn logs** for all model decisions
* **Transparent reasoning chains**
* **Interactive UI** for watching matches play out
* **Match history & analytics dashboard**
---
## 📊 Stats & Analytics
All games played in the Arena are stored in a database.
The Stats section of the app includes:
* **Model win/loss rates** across all recorded matches
* **Performance comparisons** between model families (OpenAI vs Google vs …)
* **Historical match logs** for replay & analysis
* **Leaderboards** highlighting the best-performing models
This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
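
To make the win-rate and leaderboard idea concrete, here is a small sketch under stated assumptions. The record layout `(red_model, blue_model, winner_color)` and the model names are hypothetical; the app's real database schema is not described here.

```python
from collections import defaultdict


def win_rates(matches: list[tuple[str, str, str]]) -> dict[str, float]:
    """Compute wins / games played per model from (red, blue, winner_color) records."""
    played: dict[str, int] = defaultdict(int)
    won: dict[str, int] = defaultdict(int)
    for red_model, blue_model, winner_color in matches:
        played[red_model] += 1
        played[blue_model] += 1
        winner = red_model if winner_color == "red" else blue_model
        won[winner] += 1
    return {model: won[model] / played[model] for model in played}


def leaderboard(matches: list[tuple[str, str, str]]) -> list[tuple[str, float]]:
    """Sort models by win rate, best first."""
    return sorted(win_rates(matches).items(), key=lambda item: item[1], reverse=True)


# Hypothetical match history.
history = [
    ("model_a", "model_b", "red"),   # model_a wins
    ("model_b", "model_c", "red"),   # model_b wins
    ("model_c", "model_a", "blue"),  # model_a wins
]
rates = win_rates(history)
board = leaderboard(history)
```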
---
<a id="how-to-play"></a>
## ❓ How to Play
### 📝 Summary
Codenames is a word-association game where two teams compete to guess all their secret words before the opponents do. Each team has a Boss who **can see** a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin. The Boss gives one-word clues paired with a number, hinting at how many words on the board relate to that clue. Their teammates, **who cannot see any colors**, must discuss, interpret the clue, and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word, a neutral word, or the assassin can derail their progress or end the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.
### 💡 Let's see an example
What Bosses see (above) vs. what other players see (below):
<img src="assets/example.png" alt="Example board" width="400">
<img src="assets/no-color-board.png" alt="Example board" width="400">
### 👥 Team Roles
Each team has four members with distinct responsibilities:
* **1 Boss** 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
* **1 Captain** 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
* **2 Players** 💭: Collaborate with the Captain, propose interpretations and associations.
---
### 🎮 How a Turn Works
#### 1️⃣ Boss Gives a Clue
The Red Boss (seeing the board) might say:
> **"Atmosphere: 2"**
This clue suggests 2 red words are related to *atmosphere*. Looking at the board, the Boss is thinking of:
* **AIR** (part of the atmosphere)
* **SPACE** (beyond the atmosphere)
*⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.*
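
The one-word, one-number rule can be checked mechanically. The sketch below is only an illustration: `parse_clue` is a hypothetical helper, the `Word: N` text format is taken from the example above, and the pattern accepts ASCII letters only as a simplification.

```python
import re

# One run of letters, a colon, then one non-negative integer.
CLUE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(\d+)\s*$")


def parse_clue(text: str) -> tuple[str, int]:
    """Parse 'Word: N' into (word, number); reject anything else."""
    match = CLUE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"clue must be ONE word and ONE number, got {text!r}")
    return match.group(1), int(match.group(2))


clue = parse_clue("Atmosphere: 2")  # ("Atmosphere", 2)
```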
---
#### 2️⃣ Team Discussion
The Captain and Players discuss without seeing the colors:
* **Player 1:** “AIR feels like the safest bet — it's literally the atmosphere.”
* **Player 2:** “SPACE could connect because it's outside the atmosphere.”
---
#### 3️⃣ Captain Makes Final Selection
The Captain decides which words to touch, in order:
1. AIR ✅ (Red — Correct!)
2. SPACE ✅ (Red — Correct!)
The team can stop after any correct guess, or keep guessing up to the number given (plus one bonus guess carried over from previous turns, if applicable).
---
### ⚠️ Mistakes to Avoid
* Guessing **STAFF** (black, the killer word) ends the game **immediately**, and the guessing team **loses**!
* Guessing **WALL** (blue, an opponent’s word) ends the turn and gives that word to the Blue team.
* Guessing **SATURN** (beige, a neutral word) simply ends the turn.
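
The outcomes above, seen from the Red team's side, can be sketched as a small function. This is an illustrative assumption rather than the app's implementation: `resolve_guesses`, the outcome labels, and the color key are hypothetical, and `max_guesses` is left as a parameter because the exact bonus rule is decided elsewhere.

```python
# Color key for the example words used in this section.
KEY = {"AIR": "red", "SPACE": "red", "WALL": "blue",
       "SATURN": "neutral", "STAFF": "killer"}


def resolve_guesses(guesses: list[str], key: dict[str, str],
                    team: str, max_guesses: int) -> list[tuple[str, str]]:
    """Apply the Captain's ordered guesses; stop at the first word that is not the team's."""
    results = []
    for word in guesses[:max_guesses]:
        color = key[word]
        if color == team:
            results.append((word, "correct"))
            continue
        if color == "killer":
            outcome = "game_lost"                  # game ends immediately
        elif color == "neutral":
            outcome = "turn_ends"
        else:
            outcome = "turn_ends_opponent_scores"  # the word counts for the other team
        results.append((word, outcome))
        break
    return results


good_turn = resolve_guesses(["AIR", "SPACE"], KEY, team="red", max_guesses=2)
bad_turn = resolve_guesses(["AIR", "WALL", "SPACE"], KEY, team="red", max_guesses=3)
```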
---
### 🏆 Winning the Game
The game ends when:
* ✅ **A team finds all their colored words** → That team wins!
* ❌ **A team touches the killer word (STAFF)** → That team loses immediately!
---
### 💡 Strategy Tips
#### For the Boss:
* Try to link multiple words with creative clues
* Avoid clues that may lead to the killer or opponent’s words
* Consider associations your team might make
#### For Captain & Players:
* Discuss all possible interpretations
* Weigh how risky each candidate word is before touching it
* Don’t be afraid to stop early to avoid the killer word
* The Captain has final say but should consider all suggestions
---
## 🤝 Sponsors
Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.