---
title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
  - mcp-in-action-track-creative
  - mcp-in-action-track-consumer
  - Google
  - Gemini
  - Anthropic
  - OpenAI
  - HuggingFace
  - ElevenLabs
---


# 🧠 Agentic Codenames Arena

![Meme](assets/meme.png)

**Watch LLMs battle it out in Codenames, or join the game yourself.**

**New to Codenames? No problem.** Go to the [How to Play section](#how-to-play) below or check out the example in the _How to Play_ tab in the app to get started.

---

## ✅ Hackathon Requirements

`Demo`: [Video on YouTube](https://youtu.be/E3IvBN8SqdA)

`Social media post`: [My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)

`My HuggingFace Username`: lucadipalma1998

---

## 🧩 What This App Does

**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*.  
Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:

* **1 Boss**: Provides the clue and clue number for each turn.
* **1 Captain**: Coordinates the team’s reasoning, synthesizes the Players’ suggestions, and ultimately selects the final words to “touch”.
* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.


The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.

Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

![LangGraph Architecture](graph.png)
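
The diagram is the authoritative picture of the wiring. As a rough, framework-free illustration of the order in which the roles talk during one turn, here is a plain-Python sketch that assumes a single linear pass (Boss, then each Player, then the Captain). `TurnState`, `run_turn`, and `vote_captain` are hypothetical names; the app itself uses LangGraph nodes backed by real LLM calls, and its graph may loop between roles in ways this sketch does not.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TurnState:
    """Hypothetical state shared by the roles during one turn."""
    board: List[str]
    clue: str = ""
    number: int = 0
    suggestions: List[List[str]] = field(default_factory=list)
    selection: List[str] = field(default_factory=list)


def run_turn(
    state: TurnState,
    boss: Callable[[TurnState], Tuple[str, int]],
    players: List[Callable[[TurnState], List[str]]],
    captain: Callable[[TurnState], List[str]],
) -> TurnState:
    # 1. The Boss (the only role that sees the colors) gives a clue and a number.
    state.clue, state.number = boss(state)
    # 2. Each Player proposes candidate words for that clue.
    for player in players:
        state.suggestions.append(player(state))
    # 3. The Captain synthesizes the proposals into an ordered final selection.
    state.selection = captain(state)
    return state


def vote_captain(state: TurnState) -> List[str]:
    # Stub Captain: rank words by how many Players suggested them, keep `number` of them.
    votes = Counter(word for proposal in state.suggestions for word in proposal)
    return [word for word, _ in votes.most_common(state.number)]


result = run_turn(
    TurnState(board=["AIR", "SPACE", "STAFF", "WALL", "SATURN"]),
    boss=lambda s: ("Atmosphere", 2),
    players=[lambda s: ["AIR"], lambda s: ["SPACE", "AIR"]],
    captain=vote_captain,
)
print(result.selection)  # ['AIR', 'SPACE']
```

The vote-counting Captain is only a stand-in so the sketch runs on its own; in the Arena the Captain is an LLM that weighs the Players' arguments.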

You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.

---

## 🤖 How It Works

### **LLM Teams**

Build teams from models by several providers: OpenAI, Google, Anthropic, HuggingFace, and more.
Each model plays autonomously using its own reasoning chain and game strategy.

### **Two Gameplay Modalities**

#### **1️⃣ Observation Mode — Watch AIs Battle**

Sit back and spectate.
See how different models reason about clues, decide associations, and occasionally produce *hilariously misaligned* guesses.

You'll see:

* Model-to-model conversations
* Reasoning traces
* Turn-by-turn decisions
* How each team coordinates across multiple rounds

Perfect for AI benchmarking, research, or just entertainment.

#### **2️⃣ Human Boss Mode — Enter the Fight**

Become the Boss for either team and give your own clue and number.
Your AI teammates will interpret your hint and take their guesses.

---

## 🧠 Why It’s Interesting

* **Compare LLM reasoning styles:**
  Watch how different models interpret associations, analogies, and subtle semantic cues.

* **Analyze team dynamics:**
  Some models coordinate beautifully. Others… not so much.
  Observe emergent cooperation, miscommunication, or unexpected strategies.

* **Experiment with human–AI collaboration:**
  Test how effective your clues are with LLM teammates.
  Try pushing the limits with creative, cryptic, or minimalist hints.

---

## 🕹️ Main Features

* **Build teams by selecting providers**, or choose `random` to generate a mixed-model team
* **Switch between AI vs AI** and **Human vs AI** modes
* **Detailed per-turn logs** for all model decisions
* **Transparent reasoning chains**
* **Interactive UI** for watching matches play out
* **Match history & analytics dashboard**
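
How `random` is resolved is not documented here. One plausible reading, sketched below under that assumption, is that each role independently gets a provider chosen uniformly at random, while explicitly selected providers are kept as-is. `build_team`, `PROVIDERS`, and `ROLES` are hypothetical, and the provider list only includes those named earlier in this README.

```python
import random
from typing import Dict

# Providers named in this README; the app's actual list may differ.
PROVIDERS = ["OpenAI", "Google", "Anthropic", "HuggingFace"]
# One Boss, one Captain, and two Players per team.
ROLES = ["boss", "captain", "player_1", "player_2"]


def build_team(choices: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Map every role to a provider; roles left unspecified default to "random"."""
    team = {}
    for role in ROLES:
        choice = choices.get(role, "random")
        if choice == "random":
            choice = rng.choice(PROVIDERS)  # uniform pick, independently per role
        elif choice not in PROVIDERS:
            raise ValueError(f"unknown provider for {role}: {choice!r}")
        team[role] = choice
    return team


# Fix the Boss and draw the other three roles at random (seeded for reproducibility).
red_team = build_team({"boss": "Anthropic"}, random.Random(42))
print(red_team)
```

Seeding a dedicated `random.Random` instance keeps a mixed-model lineup reproducible, which is handy when comparing matches later.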

---

## 📊 Stats & Analytics

All games played in the Arena are stored in a database.
The Stats section of the app includes:

* **Model win/loss rates** across all recorded matches
* **Performance comparisons** between model families (OpenAI vs Google vs …)
* **Historical match logs** for replay & analysis
* **Leaderboards** highlighting the best-performing models

This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
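
The README does not describe the database schema or exactly how the rates are computed. As a minimal in-memory sketch, assuming each stored match records which models played on each side and which side won, per-model win/loss rates and a leaderboard could be derived like this. `leaderboard`, the record fields, and the `model-*` names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def leaderboard(matches: List[Dict]) -> List[Tuple[str, int, int, float]]:
    """Return (model, wins, losses, win_rate) rows, best win rate first."""
    wins: Dict[str, int] = defaultdict(int)
    games: Dict[str, int] = defaultdict(int)
    for match in matches:
        for side in ("red", "blue"):
            # A model filling several roles on one side counts once for that match.
            for model in set(match[side]):
                games[model] += 1
                if match["winner"] == side:
                    wins[model] += 1
    rows = [(m, wins[m], games[m] - wins[m], wins[m] / games[m]) for m in games]
    # Sort by win rate, then by number of wins, then alphabetically to break ties.
    return sorted(rows, key=lambda row: (-row[3], -row[1], row[0]))


matches = [
    {"red": ["model-a", "model-a", "model-b", "model-b"], "blue": ["model-c"] * 4, "winner": "red"},
    {"red": ["model-c"] * 4, "blue": ["model-a"] * 4, "winner": "blue"},
]
for row in leaderboard(matches):
    print(row)
```

With mixed teams the same model can appear on both sides of one match; this sketch then records one win and one loss for it, which is one of several reasonable conventions.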

---
<a id="how-to-play"></a>
## ❓ How to Play


### 📝 Summary

Codenames is a word-association game where two teams compete to guess all their secret words before the opponents do. Each team has a Boss who **can see** a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin (called the *killer word* in the rest of this guide). The Boss gives one-word clues paired with a number that hints at how many words on the board relate to that clue.

Their teammates, **who cannot see any colors**, must discuss the clue, interpret it, and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word, a neutral word, or the assassin can derail their progress or end the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.
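
The summary does not state the board size. In the classic tabletop edition the board has 25 words: 9 for the team that moves first, 8 for the other team, 7 neutral, and 1 assassin. Assuming the Arena follows that split (it may not), the Boss's hidden key could be dealt as below; `deal_board` is a hypothetical helper.

```python
import random
from typing import Dict, List


def deal_board(words: List[str], starting_team: str, rng: random.Random) -> Dict[str, str]:
    """Assign a hidden color to each of 25 words using the classic 9/8/7/1 split."""
    if len(words) != 25 or len(set(words)) != 25:
        raise ValueError("expected 25 distinct words")
    if starting_team not in ("red", "blue"):
        raise ValueError("starting_team must be 'red' or 'blue'")
    other_team = "blue" if starting_team == "red" else "red"
    colors = [starting_team] * 9 + [other_team] * 8 + ["neutral"] * 7 + ["assassin"]
    rng.shuffle(colors)  # scatter the colors randomly across the grid
    return dict(zip(words, colors))


key = deal_board([f"WORD{i}" for i in range(25)], "red", random.Random(7))
```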

### 💡 Let's see an example

What the Boss sees (first image) vs. what the other players see (second image):

<img src="assets/example.png" alt="Example board" width="400">
<img src="assets/no-color-board.png" alt="Example board without colors" width="400">
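
The only difference between the two views is whether the hidden key is visible. A minimal sketch of that masking, using words from the example board with the colors this guide gives them in the walkthrough and mistakes list; `player_view` is a hypothetical helper, and the assumption that touched words reveal their color to everyone follows the tabletop game.

```python
from typing import Dict, Optional, Set


def player_view(key: Dict[str, str], touched: Set[str]) -> Dict[str, Optional[str]]:
    """Non-Boss view: every word is visible, but its color only once it is touched."""
    return {word: (color if word in touched else None) for word, color in key.items()}


# The Boss sees this full key (colors as described in this guide's example).
boss_key = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "neutral", "STAFF": "assassin"}
print(player_view(boss_key, touched={"AIR"}))
```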

### 👥 Team Roles

Each team has four members with distinct responsibilities:

* **1 Boss** 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
* **1 Captain** 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
* **2 Players** 💭: Collaborate with the Captain, propose interpretations and associations.

---

### 🎮 How a Turn Works

#### 1️⃣ Boss Gives a Clue

The Red Boss (seeing the board) might say:

> **"Atmosphere: 2"**

This clue suggests 2 red words are related to *atmosphere*. Looking at the board, the Boss is thinking of:

* **AIR** (part of the atmosphere)
* **SPACE** (beyond the atmosphere)

*⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.*
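
The one-word, one-number rule is easy to check mechanically, which matters in Human Boss Mode where clues are typed in. Below is a hedged sketch that parses the `Word: N` format used in this example. The format, the positive-number requirement, and the tabletop rule that a clue may not be a word on the board are assumptions, not a description of the app's own validator. `parse_clue` is hypothetical, and its letters-only pattern is stricter than real clues (no hyphens, apostrophes, or accents).

```python
import re
from typing import List, Tuple

# One run of letters, a colon, and one integer, e.g. "Atmosphere: 2".
CLUE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(\d+)\s*$")


def parse_clue(text: str, board: List[str]) -> Tuple[str, int]:
    """Validate a Boss clue and return (word, number); raise ValueError otherwise."""
    match = CLUE_PATTERN.match(text)
    if match is None:
        raise ValueError("the clue must be ONE word and ONE number, e.g. 'Atmosphere: 2'")
    word, number = match.group(1), int(match.group(2))
    if number < 1:
        raise ValueError("the number must be at least 1")
    if word.upper() in {w.upper() for w in board}:
        raise ValueError("the clue cannot be one of the words on the board")
    return word, number


print(parse_clue("Atmosphere: 2", ["AIR", "SPACE", "STAFF", "WALL", "SATURN"]))  # ('Atmosphere', 2)
```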

---

#### 2️⃣ Team Discussion

The Captain and Players discuss without seeing the colors:

* **Player 1:** “AIR feels like the safest bet — it's literally the atmosphere.”
* **Player 2:** “SPACE could connect because it's outside the atmosphere.”

---

#### 3️⃣ Captain Makes Final Selection

The Captain decides which words to touch, in order:

1. AIR ✅ (Red — Correct!)
2. SPACE ✅ (Red — Correct!)

The team can stop after any correct guess, or keep guessing up to the clue number (plus one bonus guess from previous turns, if applicable).
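
The stopping rules above amount to a short loop over the Captain's ordered picks. A minimal sketch, assuming the key maps each word to `"red"`, `"blue"`, `"neutral"`, or `"assassin"`, and that the optional extra guess is passed in as `bonus` (this README does not spell out exactly when it applies); `play_guesses` is a hypothetical helper.

```python
from typing import Dict, List


def play_guesses(
    picks: List[str], key: Dict[str, str], team: str, number: int, bonus: int = 0
) -> List[str]:
    """Touch picks in order; stop at the guess limit or on the first non-team word."""
    limit = number + bonus  # bonus is 1 when the extra guess applies, otherwise 0
    touched: List[str] = []
    for word in picks[:limit]:
        touched.append(word)
        if key[word] != team:  # a neutral, opposing, or killer word ends the turn
            break
    return touched


key = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "neutral", "STAFF": "assassin"}
print(play_guesses(["AIR", "SPACE"], key, team="red", number=2))  # ['AIR', 'SPACE']
```

Stopping early simply means the Captain submits fewer picks than the limit allows.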

---

### ⚠️ Mistakes to Avoid

* Guessing **STAFF** (black, the killer word) ends the game **immediately**, and the guessing team **loses**!
* Guessing **WALL** (blue, the opponent’s word) ends the turn and gives that word to the Blue team.
* Guessing **SATURN** (beige, neutral) simply ends the turn.
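
These mistakes, plus a correct guess, map to four possible outcomes for a touched word. As a sketch, assuming the same word-to-color key as the example board (with `"assassin"` standing for the black killer word), a hypothetical `Outcome` enum and `resolve_touch` helper could encode them:

```python
from enum import Enum
from typing import Dict


class Outcome(Enum):
    CORRECT = "correct: keep guessing"
    NEUTRAL = "neutral: the turn ends"
    OPPONENT = "opponent's word: the turn ends and the word goes to the other team"
    KILLER = "killer word: the guessing team loses"


def resolve_touch(word: str, key: Dict[str, str], team: str) -> Outcome:
    """Classify one touched word from the guessing team's point of view."""
    color = key[word]
    if color == "assassin":
        return Outcome.KILLER
    if color == team:
        return Outcome.CORRECT
    if color == "neutral":
        return Outcome.NEUTRAL
    return Outcome.OPPONENT


key = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "neutral", "STAFF": "assassin"}
for word in ("AIR", "WALL", "SATURN", "STAFF"):
    print(word, "->", resolve_touch(word, key, team="red").value)
```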

---

### 🏆 Winning the Game

The game ends when:

* ✅ **A team finds all their colored words** → That team wins!
* ❌ **A team touches the killer word (STAFF)** → That team loses immediately!
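
Both end conditions can be checked after every touch. A minimal sketch, assuming the same key format, that `revealed` holds every word touched by either team (so, as in the tabletop game, an opponent's mistake can complete your set), and that the team that touched the killer word is passed in explicitly; `check_winner` is hypothetical.

```python
from typing import Dict, Optional, Set


def check_winner(
    key: Dict[str, str], revealed: Set[str], killer_touched_by: Optional[str] = None
) -> Optional[str]:
    """Return "red" or "blue" once the game is decided, otherwise None."""
    if killer_touched_by is not None:
        # Touching the killer word loses immediately, so the other team wins.
        return "blue" if killer_touched_by == "red" else "red"
    for team in ("red", "blue"):
        team_words = {word for word, color in key.items() if color == team}
        if team_words <= revealed:  # every word of this team has been found
            return team
    return None


key = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "neutral", "STAFF": "assassin"}
print(check_winner(key, {"AIR", "SPACE"}))  # red
```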

---

### 💡 Strategy Tips

#### For the Boss:

* Try to link multiple words with creative clues
* Avoid clues that may lead to the killer or opponent’s words
* Consider associations your team might make

#### For Captain & Players:

* Discuss all possible interpretations
* Identify risky words before touching any of them
* Don’t be afraid to stop early to avoid the killer word
* The Captain has final say but should consider all suggestions

---

## 🤝 Sponsors

Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.