---
title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
  - mcp-in-action-track-creative
  - mcp-in-action-track-consumer
  - Google
  - Gemini
  - Anthropic
  - OpenAI
  - HuggingFace
  - ElevenLabs
---


# 🧠 Agentic Codenames Arena

![Meme](assets/meme.png)

**Watch, or join, LLMs battling it out in Codenames.**

**New to Codenames? No problem.** Go to the [How to Play section](#how-to-play) below or check out the example in the _How to Play_ tab in the app to get started.

---

## ✅ Hackathon Requirements:

`Demo`: [Video on YouTube](https://youtu.be/E3IvBN8SqdA)

`Social media post`: [My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)

`My HuggingFace Username`: lucadipalma1998

---

## 🧩 What This App Does

**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*.  
Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:

* **1 Boss**: Provides the clue and clue number for each turn.
* **1 Captain**: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.


The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.

Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

![LangGraph Architecture](graph.png)
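
The flow of a single turn can be approximated in plain Python. This is only a minimal sketch and not the app's LangGraph code: `TurnState`, the role functions, and the stubbed logic inside them are assumptions made for illustration. In the real app each role is an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Hypothetical state handed from role to role during one turn."""
    board: list[str]
    clue: str = ""
    number: int = 0
    suggestions: list[list[str]] = field(default_factory=list)
    final_guesses: list[str] = field(default_factory=list)


def boss(state: TurnState) -> TurnState:
    # Stub: a real Boss is an LLM that sees the color-coded board.
    state.clue, state.number = "atmosphere", 2
    return state


def player(state: TurnState, picks: list[str]) -> TurnState:
    # Stub: each Player proposes board words it associates with the clue.
    state.suggestions.append([w for w in picks if w in state.board])
    return state


def captain(state: TurnState) -> TurnState:
    # Stub: rank words by how many Players proposed them, then keep
    # at most `number` words as the final guesses.
    votes: dict[str, int] = {}
    for suggestion in state.suggestions:
        for word in suggestion:
            votes[word] = votes.get(word, 0) + 1
    ranked = sorted(votes, key=lambda w: (-votes[w], w))
    state.final_guesses = ranked[: state.number]
    return state


def run_turn(board: list[str]) -> TurnState:
    # Boss -> Players -> Captain, mirroring the roles described above.
    state = boss(TurnState(board=board))
    state = player(state, ["AIR", "SPACE"])
    state = player(state, ["AIR", "SPACE", "SATURN"])
    return captain(state)
```

Calling `run_turn(["AIR", "SPACE", "SATURN", "WALL", "STAFF"])` leaves the Captain's choices in `final_guesses`.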

You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.

---

## 🤖 How It Works

### **LLM Teams**

Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace...
Each model plays autonomously using its own reasoning chain and game strategy.

### **Two Gameplay Modalities**

#### **1️⃣ Observation Mode — Watch AIs Battle**

Sit back and spectate.
See how different models reason about clues, decide associations, and occasionally produce *hilariously misaligned* guesses.

You'll see:

* Model-to-model conversations
* Reasoning traces
* Turn-by-turn decisions
* How each team coordinates across multiple rounds

Perfect for AI benchmarking, research, or just entertainment.

#### **2️⃣ Human Boss Mode — Enter the Fight**

Become the Boss for either team and give your own clue + number.
Your AI teammates will interpret your hint and take their guesses.

---

## 🧠 Why It’s Interesting

* **Compare LLM reasoning styles:**
  Watch how different models interpret associations, analogies, and subtle semantic cues.

* **Analyze team dynamics:**
  Some models coordinate beautifully. Others… not so much.
  Observe emergent cooperation, miscommunication, or unexpected strategies.

* **Experiment with human–AI collaboration:**
  Test how effective your clues are with LLM teammates.
  Try pushing the limits with creative, cryptic, or minimalist hints.

---

## 🕹️ Main Features

* **Build teams by selecting providers** or choose `random` to generate a mixed-model team.
* **Switch between AI vs AI** and **Human vs AI** modes
* **Detailed per-turn logs** for all model decisions
* **Transparent reasoning chains**
* **Interactive UI** for watching matches play out
* **Match history & analytics dashboard**
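
Team building with the `random` option could look roughly like the sketch below. The provider identifiers, the role names, and `build_team` itself are hypothetical and only illustrate the idea of filling unspecified seats with a randomly chosen provider.

```python
import random
from typing import Optional

# Hypothetical identifiers for the providers mentioned in this README.
PROVIDERS = ["openai", "google", "anthropic", "huggingface"]
ROLES = ["boss", "captain", "player_1", "player_2"]


def build_team(choices: dict[str, str],
               rng: Optional[random.Random] = None) -> dict[str, str]:
    """Map every role to a provider; 'random' or a missing role is drawn at random."""
    rng = rng or random.Random()
    team = {}
    for role in ROLES:
        choice = choices.get(role, "random")
        if choice == "random":
            choice = rng.choice(PROVIDERS)
        elif choice not in PROVIDERS:
            raise ValueError(f"Unknown provider: {choice!r}")
        team[role] = choice
    return team
```

Passing a seeded `random.Random` makes the mixed team reproducible, which helps when comparing matches.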

---

## 📊 Stats & Analytics

All games played in the Arena are stored in a database.
The Stats section of the app includes:

* **Model win/loss rates** across all recorded matches
* **Performance comparisons** between model families (OpenAI vs Google vs …)
* **Historical match logs** for replay & analysis
* **Leaderboards** highlighting the best-performing models
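
As a rough illustration of how win rates and a leaderboard can be derived from stored matches, here is a minimal sketch. The record layout (`red`, `blue`, `winner`) and the model names are assumptions; the app's actual database schema is not described here.

```python
from collections import Counter


def win_rates(matches: list[dict]) -> dict[str, float]:
    """Fraction of matches won by each model, over the matches it played in."""
    games, wins = Counter(), Counter()
    for match in matches:
        for team in ("red", "blue"):
            # Assumption: a model counts once per match, even if it
            # fills several roles on the same team.
            for model in set(match[team]):
                games[model] += 1
                if match["winner"] == team:
                    wins[model] += 1
    return {model: wins[model] / games[model] for model in games}


def leaderboard(matches: list[dict]) -> list[tuple[str, float]]:
    """Models sorted by win rate (best first), ties broken by name."""
    rates = win_rates(matches)
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```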

This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.

---
<a id="how-to-play"></a>
## ❓ How to Play


### 📝 Summary

Codenames is a word-association game where two teams compete to guess all their secret words before the opponents do. Each team has a Boss who **can see** a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin (called the killer word in this app). The Boss gives one-word clues paired with a number, hinting at how many words on the board relate to that clue. Their teammates, **who cannot see any colors**, must discuss, interpret the clue, and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word, a neutral word, or the killer word can derail their progress or end the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.

### 💡 Let's see an example

What Bosses see (above) vs. what other players see (below):

<img src="assets/example.png" alt="Example board" width="400">
<img src="assets/no-color-board.png" alt="Example board without colors" width="400">

### 👥 Team Roles

Each team has four members with distinct responsibilities:

* **1 Boss** 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
* **1 Captain** 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
* **2 Players** 💭: Collaborate with the Captain, propose interpretations and associations.

---

### 🎮 How a Turn Works

#### 1️⃣ Boss Gives a Clue

The Red Boss (seeing the board) might say:

> **"Atmosphere: 2"**

This clue suggests 2 red words are related to *atmosphere*. Looking at the board, the Boss is thinking of:

* **AIR** (part of the atmosphere)
* **SPACE** (beyond the atmosphere)

*⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.*
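
The "one word, one number" rule is easy to check mechanically. The sketch below assumes the clue arrives as text in the `Word: N` shape used in the example above; `parse_clue` and the requirement that the number be at least 1 are illustrative assumptions, not the app's actual validation.

```python
import re

# Assumed format: a single alphabetic word, a colon, and an integer.
CLUE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(\d+)\s*$")


def parse_clue(text: str) -> tuple[str, int]:
    """Return (WORD, number) or raise ValueError for a malformed clue."""
    match = CLUE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Clue must look like 'Word: N', got {text!r}")
    word, number = match.group(1).upper(), int(match.group(2))
    if number < 1:  # assumption: a clue points at one or more words
        raise ValueError("The clue number must be at least 1")
    return word, number
```

For example, `parse_clue("Atmosphere: 2")` is accepted, while a two-word clue such as `"Outer space: 2"` raises a `ValueError`.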

---

#### 2️⃣ Team Discussion

The Captain and Players discuss without seeing the colors:

* **Player 1:** “AIR feels like the safest bet — it's literally the atmosphere.”
* **Player 2:** “SPACE could connect because it's outside the atmosphere.”

---

#### 3️⃣ Captain Makes Final Selection

The Captain decides which words to touch, in order:

1. AIR ✅ (Red — Correct!)
2. SPACE ✅ (Red — Correct!)

The team can stop after any correct guess or continue up to the number given (+1 bonus from previous turns if applicable).

---

### ⚠️ Mistakes to Avoid

* Guessing **STAFF** (black, the killer word) ends the game **immediately**, and the guessing team **loses**!
* Guessing **WALL** (blue, an opponent’s word) ends the turn and gives that word to the Blue team.
* Guessing **SATURN** (beige, a neutral word) simply ends the turn.
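
The outcomes above can be sketched as a small function that applies the Captain's guesses in order and stops at the first one that ends the turn. The `Outcome` enum, the function names, and the word-to-color dictionary are hypothetical; only the color meanings come from the rules described here.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"    # own word: the team may keep guessing
    OPPONENT = "opponent"  # opponent's word: turn ends, opponent gets it
    NEUTRAL = "neutral"    # neutral word: turn ends
    KILLER = "killer"      # killer word: the guessing team loses


def resolve_guess(word: str, colors: dict[str, str], team: str) -> Outcome:
    """Classify one guess for `team` ("red" or "blue")."""
    color = colors[word]
    if color == "black":
        return Outcome.KILLER
    if color == "beige":
        return Outcome.NEUTRAL
    return Outcome.CORRECT if color == team else Outcome.OPPONENT


def play_guesses(guesses: list[str], colors: dict[str, str],
                 team: str) -> list[tuple[str, Outcome]]:
    """Apply guesses in order; stop after the first non-correct one."""
    results = []
    for word in guesses:
        outcome = resolve_guess(word, colors, team)
        results.append((word, outcome))
        if outcome is not Outcome.CORRECT:
            break
    return results
```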

---

### 🏆 Winning the Game

The game ends when:

* **A team finds all their colored words** → That team wins!
* **A team touches the killer word (STAFF)** → That team loses immediately!
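
A minimal sketch of the end-of-game check, assuming the game tracks how many words each team still has to find and which team, if any, touched the killer word. `check_winner` and its parameters are hypothetical names used only for illustration.

```python
from typing import Optional

TEAMS = ("red", "blue")


def check_winner(remaining: dict[str, int],
                 killer_touched_by: Optional[str] = None) -> Optional[str]:
    """Return the winning team, or None if the game is still running."""
    # Touching the killer word means an immediate loss, so the other team wins.
    if killer_touched_by is not None:
        return "blue" if killer_touched_by == "red" else "red"
    # Otherwise the first team with no words left to find wins.
    for team in TEAMS:
        if remaining[team] == 0:
            return team
    return None
```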

---

### 💡 Strategy Tips

#### For the Boss:

* Try to link multiple words with creative clues
* Avoid clues that may lead to the killer or opponent’s words
* Consider associations your team might make

#### For Captain & Players:

* Discuss all possible interpretations
* Consider risky words
* Don’t be afraid to stop early to avoid the killer word
* The Captain has final say but should consider all suggestions

---

## 🤝 Sponsors

Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.