---
title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
  - mcp-in-action-track-creative
  - mcp-in-action-track-consumer
  - Google
  - Gemini
  - Anthropic
  - OpenAI
  - HuggingFace
  - ElevenLabs
---

# 🧠 Agentic Codenames Arena

![Meme](assets/meme.png)

**Watch, or join, LLMs battling it out in Codenames.**

**New to Codenames? No problem.** Go to the [How to Play section](#how-to-play) below or check out the example in the _How to Play_ tab in the app to get started.

---

## ✅ Hackathon Requirements:

`Demo`: [Video on YouTube](https://youtu.be/E3IvBN8SqdA)

`Social media post`: [My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)

`My HuggingFace Username`: lucadipalma1998

---

## 🧩 What This App Does

**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*. Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:

* **1 Boss**: Provides the clue and clue number for each turn.
* **1 Captain**: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.

The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.

Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

![LangGraph Architecture](graph.png)

You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.

---

## 🤖 How It Works

### **LLM Teams**

Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace... Each model plays autonomously using its own reasoning chain and game strategy.

### **Two Gameplay Modes**

#### **1️⃣ Observation Mode: Watch AIs Battle**

Sit back and spectate. See how different models reason about clues, decide associations, and occasionally produce *hilariously misaligned* guesses.

You'll see:

* Model-to-model conversations
* Reasoning traces
* Turn-by-turn decisions
* How each team coordinates across multiple rounds

Perfect for AI benchmarking, research, or just entertainment.

#### **2️⃣ Human Boss Mode: Enter the Fight**

Become the Boss for either team and give your own clue + number. Your AI teammates will interpret your hint and take their guesses.

---

## 🧠 Why It’s Interesting

* **Compare LLM reasoning styles:** Watch how different models interpret associations, analogies, and subtle semantic cues.
* **Analyze team dynamics:** Some models coordinate beautifully. Others… not so much. Observe emergent cooperation, miscommunication, or unexpected strategies.
* **Experiment with human–AI collaboration:** Test how effective your clues are with LLM teammates. Try pushing the limits with creative, cryptic, or minimalist hints.

---

## 🕹️ Main Features

* **Build teams by selecting providers** or choose `random` to generate a mixed-model team.
* **Switch between AI vs AI** and **Human vs AI** modes
* **Detailed per-turn logs** for all model decisions
* **Transparent reasoning chains**
* **Interactive UI** for watching matches play out
* **Match history & analytics dashboard**

---

## 📊 Stats & Analytics

All games played in the Arena are stored in a database. The Stats section of the app includes:

* **Model win/loss rates** across all recorded matches
* **Performance comparisons** between model families (OpenAI vs Google vs …)
* **Historical match logs** for replay & analysis
* **Leaderboards** highlighting the best-performing models

This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.

---

## ❓ How to Play

### 📝 Summary

Codenames is a word-association game where two teams compete to guess all their secret words before the opponents do.

Each team has a Boss who **can see** a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin (the "killer word"). The Boss gives one-word clues paired with a number, hinting at how many words on the board relate to that clue.

Their teammates, **who cannot see any colors**, must discuss, interpret the clue, and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word, a neutral word, or the assassin can derail their progress or end the game instantly.

The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.

### 💡 Let's see an example

What Bosses see (above) vs. what other players see (below):

*(Two example board images: the Boss view with colors, then the player view without colors.)*

### 👥 Team Roles

Each team has four members with distinct responsibilities:

* **1 Boss** 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
* **1 Captain** 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
* **2 Players** 💭: Collaborate with the Captain, propose interpretations and associations.

---

### 🎮 How a Turn Works

#### 1️⃣ Boss Gives a Clue

The Red Boss (seeing the board) might say:

> **"Atmosphere: 2"**

This clue suggests 2 red words are related to *atmosphere*. Looking at the board, the Boss is thinking of:

* **AIR** (part of the atmosphere)
* **SPACE** (beyond the atmosphere)

*⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.*

---

#### 2️⃣ Team Discussion

The Captain and Players discuss without seeing the colors:

* **Player 1:** “AIR feels like the safest bet, since it's literally the atmosphere.”
* **Player 2:** “SPACE could connect because it's outside the atmosphere.”

---

#### 3️⃣ Captain Makes Final Selection

The Captain decides which words to touch, in order:

1. AIR ✅ (Red, correct!)
2. SPACE ✅ (Red, correct!)

The team can stop after any correct guess or continue up to the number given (+1 bonus from previous turns if applicable).

---

### ⚠️ Mistakes to Avoid

* Guessing **STAFF** (black, the killer word) ends the game **immediately**: the guessing team **loses**!
* Guessing **WALL** (blue, an opponent’s word) ends the turn and gives that word to the Blue team.
* Guessing **SATURN** (beige, a neutral word) simply ends the turn.

---

### 🏆 Winning the Game

The game ends when:

* ✅ **A team finds all their colored words** → That team wins!
* ❌ **A team touches the killer word (STAFF)** → That team loses immediately!
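
The turn and win rules above can be sketched in a few lines of plain Python. This is a minimal illustration and not the app's actual code: `Color`, `Board`, `play_turn`, and `demo_board` are hypothetical names, the `OTHER_RED` and `OTHER_BLUE` entries are placeholder words added so the tiny board does not end after one turn, and the "+1 bonus" is left to the caller through `max_guesses`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Color(Enum):
    RED = "red"
    BLUE = "blue"
    NEUTRAL = "neutral"  # beige cards
    KILLER = "killer"    # the single black card


@dataclass
class Board:
    # Hidden key (word -> color). In the game, only the Bosses see it.
    key: dict[str, Color]
    revealed: set[str] = field(default_factory=set)

    def remaining(self, team: Color) -> int:
        """Count the team's words that have not been touched yet."""
        return sum(1 for w, c in self.key.items()
                   if c is team and w not in self.revealed)


def other(team: Color) -> Color:
    return Color.BLUE if team is Color.RED else Color.RED


def play_turn(board: Board, team: Color, guesses: list[str],
              max_guesses: int) -> tuple[Optional[Color], int]:
    """Touch words in order; return (winner or None, words touched)."""
    touched = 0
    for word in guesses[:max_guesses]:
        if word not in board.key or word in board.revealed:
            raise ValueError(f"{word!r} is not an untouched board word")
        color = board.key[word]
        board.revealed.add(word)
        touched += 1
        if color is Color.KILLER:
            return other(team), touched  # guessing team loses at once
        for side in (Color.RED, Color.BLUE):
            if board.remaining(side) == 0:
                return side, touched     # all of one team's words found
        if color is not team:
            break                        # opponent or neutral word ends the turn
    return None, touched


def demo_board() -> Board:
    # Words taken from the example above, plus two placeholders.
    return Board(key={
        "AIR": Color.RED, "SPACE": Color.RED, "OTHER_RED": Color.RED,
        "WALL": Color.BLUE, "OTHER_BLUE": Color.BLUE,
        "SATURN": Color.NEUTRAL, "STAFF": Color.KILLER,
    })


if __name__ == "__main__":
    # Clue "Atmosphere: 2": both guesses are red, the game continues.
    print(play_turn(demo_board(), Color.RED, ["AIR", "SPACE"], 2))  # (None, 2)
```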

---

### 💡 Strategy Tips

#### For the Boss:

* Try to link multiple words with creative clues
* Avoid clues that may lead to the killer or opponent’s words
* Consider associations your team might make

#### For Captain & Players:

* Discuss all possible interpretations
* Consider risky words
* Don’t be afraid to stop early to avoid the killer word
* The Captain has final say but should consider all suggestions

---

## 🤝 Sponsors

Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.