# 12 ANGRY AGENTS - Product Requirements Document

## Overview

**Concept**: AI-powered jury deliberation simulation where 11 AI agents + 1 human player debate real criminal cases. A Judge narrator (ElevenLabs) orchestrates the experience.

**Track**: MCP in Action - Creative (potentially also Consumer)

**Core Value Prop**: True autonomous agent behavior - AI jurors reason, argue, persuade, and change their minds based on deliberation.

---

## Sponsor Integration

| Sponsor | Prize | Integration | Priority |
|---------|-------|-------------|----------|
| LlamaIndex | $1,000 | Case database RAG | HIGH |
| ElevenLabs | AirPods + $2K | Judge narrator voice | HIGH |
| Blaxel | $2,500 | Sandboxed agent execution | MEDIUM |
| Modal | $2,500 | Agent compute | MEDIUM |
| Gemini | $10K credits | Agent reasoning | HIGH |

---

## User Experience Flow

```
1. CASE PRESENTATION
   └─> Judge (ElevenLabs) narrates case summary
   └─> Evidence displayed via LlamaIndex RAG
   └─> Player reads case file

2. SIDE SELECTION
   └─> Player chooses: DEFEND (not guilty) or PROSECUTE (guilty)
   └─> Player commits - cannot change

3. INITIAL VOTE
   └─> All 12 jurors vote (randomized split based on case)
   └─> Vote tally shown: e.g., "7-5 GUILTY"

4. DELIBERATION LOOP
   └─> Random 1-4 agents speak per round
   └─> Player gets turn (choose strategy → AI crafts argument)
   └─> Conviction scores shift based on arguments
   └─> Votes may flip
   └─> Repeat until: votes stabilize OR player calls vote

5. FINAL VERDICT
   └─> Judge announces verdict (ElevenLabs)
   └─> Deliberation transcript available
   └─> No "win/lose" - just the experience
```

---

## Technical Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                           12 ANGRY AGENTS                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                       GRADIO UI LAYER                       │   │
│   │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │   │
│   │  │  Jury Box    │  │  Chat View   │  │  Case File   │       │   │
│   │  │  (12 seats)  │  │  (dialogue)  │  │  (evidence)  │       │   │
│   │  └──────────────┘  └──────────────┘  └──────────────┘       │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                  │                                  │
│                                  ▼                                  │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                     ORCHESTRATOR AGENT                      │   │
│   │  ┌──────────────────────────────────────────────────────┐   │   │
│   │  │ GameStateManager                                     │   │   │
│   │  │ - current_phase: presentation|deliberation|verdict   │   │   │
│   │  │ - round_number: int                                  │   │   │
│   │  │ - votes: Dict[agent_id, "guilty"|"not_guilty"]       │   │   │
│   │  │ - conviction_scores: Dict[agent_id, float]           │   │   │
│   │  │ - speaking_queue: List[agent_id]                     │   │   │
│   │  │ - deliberation_log: List[Turn]                       │   │   │
│   │  └──────────────────────────────────────────────────────┘   │   │
│   │                                                             │   │
│   │  ┌──────────────────────────────────────────────────────┐   │   │
│   │  │ TurnManager                                          │   │   │
│   │  │ - select_speakers(1-4 random)                        │   │   │
│   │  │ - check_vote_stability()                             │   │   │
│   │  │ - process_vote_changes()                             │   │   │
│   │  └──────────────────────────────────────────────────────┘   │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                  │                                  │
│             ┌────────────────────┼────────────────────┐             │
│             ▼                    ▼                    ▼             │
│      ┌─────────────┐    ┌─────────────────┐    ┌─────────────┐      │
│      │   JUDGE     │    │  JUROR AGENTS   │    │   PLAYER    │      │
│      │   AGENT     │    │   (11 total)    │    │   AGENT     │      │
│      │             │    │                 │    │             │      │
│      │ ElevenLabs  │    │ ┌─────────────┐ │    │ Hybrid I/O  │      │
│      │ TTS Output  │    │ │ AgentConfig │ │    │ Strategy    │      │
│      │             │    │ │ - persona   │ │    │ Selection   │      │
│      │ Narration   │    │ │ - model     │ │    │             │      │
│      │ Verdicts    │    │ │ - tools[]   │ │    │ Argument    │      │
│      │ Summaries   │    │ │ - memory    │ │    │ Crafting    │      │
│      └─────────────┘    │ └─────────────┘ │    └─────────────┘      │
│                         │                 │                         │
│                         │ ┌─────────────┐ │                         │
│                         │ │ JurorMemory │ │                         │
│                         │ │ - case_view │ │                         │
│                         │ │ - arguments │ │                         │
│                         │ │ - reactions │ │                         │
│                         │ │ - conviction│ │                         │
│                         │ └─────────────┘ │                         │
│                         └─────────────────┘                         │
│                                  │                                  │
│             ┌────────────────────┼────────────────────┐             │
│             ▼                    ▼                    ▼             │
│      ┌─────────────┐    ┌─────────────────┐    ┌─────────────┐      │
│      │ LLAMAINDEX  │    │     LITELLM     │    │   BLAXEL    │      │
│      │             │    │                 │    │             │      │
│      │ Case RAG    │    │  Model Router   │    │ Sandbox     │      │
│      │ Evidence    │    │  - Gemini       │    │ Execution   │      │
│      │ Precedents  │    │  - Claude       │    │             │      │
│      │             │    │  - GPT-4        │    │ Agent Tools │      │
│      └─────────────┘    │  - Local        │    │ (future)    │      │
│                         └─────────────────┘    └─────────────┘      │
│                                                                     │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                      MCP SERVER LAYER                       │   │
│   │  Tools exposed for external AI agents to play as juror      │   │
│   │  - mcp_join_jury(case_id) -> seat_assignment                │   │
│   │  - mcp_view_evidence(case_id) -> evidence_list              │   │
│   │  - mcp_make_argument(argument_type, content) -> response    │   │
│   │  - mcp_cast_vote(vote) -> confirmation                      │   │
│   │  - mcp_view_deliberation() -> transcript                    │   │
│   └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
```

---

## Data Models

### GameState

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Literal


@dataclass
class GameState:
    """Central game state - managed by Orchestrator."""

    # Session
    session_id: str
    case_id: str
    # Defaults to "setup" so a state can be created from session_id and case_id alone
    phase: Literal["setup", "presentation", "side_selection",
                   "initial_vote", "deliberation", "final_vote", "verdict"] = "setup"

    # Rounds
    round_number: int = 0
    max_rounds: int = 20  # Safety limit
    stability_threshold: int = 3  # Rounds without vote change to end
    rounds_without_change: int = 0

    # Votes
    votes: Dict[str, Literal["guilty", "not_guilty"]] = field(default_factory=dict)
    vote_history: List[Dict[str, str]] = field(default_factory=list)

    # Conviction scores (0.0 = certain not guilty, 1.0 = certain guilty)
    conviction_scores: Dict[str, float] = field(default_factory=dict)

    # Deliberation
    speaking_queue: List[str] = field(default_factory=list)
    # Quoted forward reference: DeliberationTurn is defined below
    deliberation_log: List["DeliberationTurn"] = field(default_factory=list)

    # Player
    player_side: Literal["defend", "prosecute"] | None = None
    player_seat: int = 7  # Which seat is the player


@dataclass
class DeliberationTurn:
    """A single turn in deliberation."""

    round_number: int
    speaker_id: str
    speaker_name: str
    argument_type: str  # "evidence", "emotional", "logical", "question", etc.
    content: str
    target_id: str | None = None  # Who they're addressing
    impact: Dict[str, float] = field(default_factory=dict)  # conviction changes
    timestamp: datetime = field(default_factory=datetime.now)
```

### Agent Configuration

```python
@dataclass
class JurorConfig:
    """Configuration for a single juror agent."""

    # Identity
    juror_id: str
    seat_number: int
    name: str
    emoji: str  # For display until sprites ready

    # Personality (affects reasoning style)
    archetype: str  # "rationalist", "empath", "cynic", etc.
    personality_prompt: str  # Detailed persona prompt

    # Behavior modifiers
    stubbornness: float  # 0.0-1.0, how hard to convince
    volatility: float  # 0.0-1.0, how much conviction swings
    influence: float  # 0.0-1.0, how persuasive to others
    verbosity: float  # 0.0-1.0, how long their arguments are

    # Model configuration
    model_provider: str  # "gemini", "openai", "anthropic", "local"
    model_id: str  # Specific model ID
    temperature: float = 0.7

    # Tools (future expansion)
    tools: List[str] = field(default_factory=list)  # ["web_search", "case_lookup"]

    # Memory
    memory_window: int = 10  # How many turns to remember in detail


@dataclass
class JurorMemory:
    """Memory state for a single juror.

    Every field except juror_id has a default, so a fresh memory can be
    created with JurorMemory(juror_id=...) as JurorAgent does.
    """

    juror_id: str

    # Case understanding
    case_summary: str = ""
    key_evidence: List[str] = field(default_factory=list)
    evidence_interpretations: Dict[str, str] = field(default_factory=dict)  # evidence_id -> interpretation

    # Deliberation memory (quoted forward reference: ArgumentMemory is defined below)
    arguments_heard: List["ArgumentMemory"] = field(default_factory=list)
    arguments_made: List[str] = field(default_factory=list)

    # Relationships
    opinions_of_others: Dict[str, float] = field(default_factory=dict)  # juror_id -> trust/agreement (-1 to 1)

    # Internal state
    current_conviction: float = 0.5  # 0.0-1.0; 0.5 = undecided
    conviction_history: List[float] = field(default_factory=list)
    reasoning_chain: List[str] = field(default_factory=list)  # Why they believe what they believe
    doubts: List[str] = field(default_factory=list)  # Things that could change their mind


@dataclass
class ArgumentMemory:
    """Memory of a single argument heard."""

    speaker_id: str
    content_summary: str
    argument_type: str
    persuasiveness: float  # How convincing it was to this juror
    counter_points: List[str]  # Thoughts against it
    round_heard: int
```

### Case Data Model

```python
@dataclass
class CriminalCase:
    """A criminal case for deliberation."""

    case_id: str
    title: str
    summary: str  # 2-3 paragraph overview

    # Charges
    charges: List[str]

    # Evidence (quoted forward reference: Evidence is defined below)
    evidence: List["Evidence"]

    # Witnesses (quoted forward reference: Witness is defined below)
    witnesses: List["Witness"]

    # Arguments
    prosecution_arguments: List[str]
    defense_arguments: List[str]

    # Defendant
    defendant: "Defendant"

    # Metadata
    difficulty: Literal["clear_guilty", "clear_innocent", "ambiguous"]
    themes: List[str]  # ["eyewitness", "circumstantial", "forensic", etc.]
    # For display
    year: int
    jurisdiction: str


@dataclass
class Evidence:
    """A piece of evidence."""

    evidence_id: str
    type: str  # "physical", "testimonial", "documentary", "forensic"
    description: str
    strength_prosecution: float  # 0.0-1.0
    strength_defense: float  # 0.0-1.0
    contestable: bool
    contest_reason: str | None


@dataclass
class Witness:
    """A witness in the case."""

    witness_id: str
    name: str
    role: str  # "eyewitness", "expert", "character", etc.
    testimony_summary: str
    credibility_issues: List[str]
    side: Literal["prosecution", "defense", "neutral"]
```

---

## The 11 Juror Archetypes

```yaml
jurors:
  - id: "juror_1"
    name: "Marcus Webb"
    archetype: "rationalist"
    emoji: "🧠"
    personality: |
      You are a retired engineer. You believe only in hard evidence and
      logical deduction. Emotional appeals annoy you. You often say
      "Show me the data." You change your mind only when presented with
      irrefutable logical arguments.
    stubbornness: 0.8
    volatility: 0.2
    influence: 0.7
    initial_lean: "neutral"

  - id: "juror_2"
    name: "Sarah Chen"
    archetype: "empath"
    emoji: "💗"
    personality: |
      You are a social worker. You always consider the human element -
      the defendant's background, circumstances, potential for redemption.
      You're easily moved by personal stories but skeptical of cold statistics.
    stubbornness: 0.4
    volatility: 0.7
    influence: 0.5
    initial_lean: "defense"

  - id: "juror_3"
    name: "Frank Russo"
    archetype: "cynic"
    emoji: "😤"
    personality: |
      You are a retired cop. You've "seen it all" and believe most
      defendants are guilty. You're impatient with naive arguments.
      You trust law enforcement evidence highly. Hard to convince toward
      not guilty.
    stubbornness: 0.9
    volatility: 0.1
    influence: 0.6
    initial_lean: "prosecution"

  - id: "juror_4"
    name: "Linda Park"
    archetype: "conformist"
    emoji: "😐"
    personality: |
      You are an accountant who avoids conflict. You tend to agree with
      whoever spoke last or with the majority. You rarely initiate
      arguments but will echo others.
      Easy to sway but also easy to sway back.
    stubbornness: 0.2
    volatility: 0.8
    influence: 0.2
    initial_lean: "majority"

  - id: "juror_5"
    name: "David Okonkwo"
    archetype: "contrarian"
    emoji: "🙄"
    personality: |
      You are a philosophy professor. You play devil's advocate constantly.
      If everyone says guilty, you argue not guilty. You value intellectual
      discourse over reaching conclusions. You ask probing questions.
    stubbornness: 0.6
    volatility: 0.5
    influence: 0.8
    initial_lean: "minority"

  - id: "juror_6"
    name: "Betty Morrison"
    archetype: "impatient"
    emoji: "⏰"
    personality: |
      You are a busy restaurant owner. You want this over quickly.
      You make snap judgments and get frustrated with long debates.
      You often say "Can we just vote already?" You're persuaded by
      confident, brief arguments.
    stubbornness: 0.5
    volatility: 0.6
    influence: 0.3
    initial_lean: "first_impression"

  - id: "juror_7"
    name: "[PLAYER]"
    archetype: "player"
    emoji: "👤"
    personality: "Human player"
    stubbornness: null
    volatility: null
    influence: 0.6
    initial_lean: "player_choice"

  - id: "juror_8"
    name: "Dr. James Wright"
    archetype: "detail_obsessed"
    emoji: "🔍"
    personality: |
      You are a forensic accountant. You focus on tiny inconsistencies
      in testimony and evidence. You often derail discussions with minutiae.
      A single contradiction can completely change your view.
    stubbornness: 0.7
    volatility: 0.4
    influence: 0.5
    initial_lean: "neutral"

  - id: "juror_9"
    name: "Pastor Williams"
    archetype: "moralist"
    emoji: "⚖️"
    personality: |
      You are a church leader. You see things in black and white -
      right and wrong. You believe in justice but also redemption.
      Moral arguments resonate with you more than technical ones.
    stubbornness: 0.7
    volatility: 0.3
    influence: 0.6
    initial_lean: "gut_feeling"

  - id: "juror_10"
    name: "Nancy Cooper"
    archetype: "pragmatist"
    emoji: "💼"
    personality: |
      You are a business consultant. You think about consequences -
      what happens if we convict an innocent person? What if we free
      a guilty one?
      You weigh costs and benefits.
      You're persuaded by outcome-focused arguments.
    stubbornness: 0.5
    volatility: 0.5
    influence: 0.6
    initial_lean: "calculated"

  - id: "juror_11"
    name: "Miguel Santos"
    archetype: "storyteller"
    emoji: "📖"
    personality: |
      You are a novelist. You think in narratives - does the prosecution's
      story make sense? Does the defense's? You're swayed by coherent
      narratives and suspicious of stories with plot holes.
    stubbornness: 0.4
    volatility: 0.6
    influence: 0.7
    initial_lean: "best_story"

  - id: "juror_12"
    name: "Robert Kim"
    archetype: "wildcard"
    emoji: "🎲"
    personality: |
      You are a retired jazz musician. Your logic is unpredictable -
      you might fixate on something no one else noticed, or suddenly
      change your mind for unclear reasons. You're creative but inconsistent.
    stubbornness: 0.3
    volatility: 0.9
    influence: 0.4
    initial_lean: "random"
```

---

## Conviction Score Mechanics

### How Conviction Changes

```python
import random


def calculate_conviction_change(
    juror: JurorConfig,
    juror_memory: JurorMemory,
    argument: DeliberationTurn,
    game_state: GameState
) -> float:
    """
    Calculate how much an argument shifts a juror's conviction.

    Returns: delta to add to conviction score (-0.3 to +0.3 typically)
    """
    # Base impact from argument strength (determined by LLM)
    base_impact = evaluate_argument_strength(argument)  # -1.0 to 1.0

    # Personality modifiers
    archetype_modifier = get_archetype_modifier(
        juror.archetype,
        argument.argument_type
    )  # e.g., "rationalist" gets 1.5x from "logical" arguments, 0.4x from "emotional"

    # Stubbornness reduces all changes
    stubbornness_modifier = 1.0 - (juror.stubbornness * 0.7)

    # Volatility adds randomness
    volatility_noise = random.gauss(0, juror.volatility * 0.1)

    # Relationship modifier - trust the speaker?
    trust = juror_memory.opinions_of_others.get(argument.speaker_id, 0.0)
    trust_modifier = 1.0 + (trust * 0.3)  # -30% to +30%

    # Conviction resistance - harder to move extremes
    current = juror_memory.current_conviction
    extreme_resistance = 1.0 - (abs(current - 0.5) * 0.5)

    # Calculate final delta
    delta = (
        base_impact
        * archetype_modifier
        * stubbornness_modifier
        * trust_modifier
        * extreme_resistance
        + volatility_noise
    )

    # Clamp to reasonable range
    return max(-0.3, min(0.3, delta))


def check_vote_flip(juror_memory: JurorMemory, current_vote: str | None = None) -> bool:
    """Check if conviction score warrants a vote change.

    Pass the juror's recorded vote as current_vote when it is available.
    Without it, the vote is inferred from the last conviction_history entry,
    which is only meaningful if that entry predates the latest update.
    """
    if current_vote is not None:
        current_vote_is_guilty = current_vote == "guilty"
    else:
        current_vote_is_guilty = juror_memory.conviction_history[-1] > 0.5
    new_conviction = juror_memory.current_conviction

    # Hysteresis - need to cross threshold by margin to flip
    if current_vote_is_guilty and new_conviction < 0.4:
        return True  # Flip to not guilty
    elif not current_vote_is_guilty and new_conviction > 0.6:
        return True  # Flip to guilty

    return False
```

### Archetype Argument Modifiers

```python
ARCHETYPE_MODIFIERS = {
    "rationalist": {
        "logical": 1.5,
        "evidence": 1.3,
        "emotional": 0.4,
        "moral": 0.6,
        "narrative": 0.7,
        "question": 1.2,
    },
    "empath": {
        "logical": 0.6,
        "evidence": 0.8,
        "emotional": 1.5,
        "moral": 1.3,
        "narrative": 1.2,
        "question": 0.9,
    },
    "cynic": {
        "logical": 0.8,
        "evidence": 1.4,  # Trusts evidence
        "emotional": 0.3,
        "moral": 0.5,
        "narrative": 0.6,
        "question": 0.7,
    },
    # ...
    # etc for all archetypes
}
```

---

## Agent Memory Architecture

### Memory Layers

```
┌─────────────────────────────────────────────────────────────┐
│                     JUROR MEMORY SYSTEM                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  LAYER 1: CASE KNOWLEDGE (LlamaIndex)               │   │
│   │  - Full case file indexed                           │   │
│   │  - Evidence details retrievable                     │   │
│   │  - Witness statements searchable                    │   │
│   │  - Persistent across session                        │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  LAYER 2: DELIBERATION MEMORY (Sliding Window)      │   │
│   │  - Last N turns in full detail                      │   │
│   │  - Summarized history beyond window                 │   │
│   │  - Key moments flagged for long-term                │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  LAYER 3: REASONING STATE (Agent Internal)          │   │
│   │  - Current conviction + reasoning chain             │   │
│   │  - Key doubts and certainties                       │   │
│   │  - Opinions of other jurors                         │   │
│   │  - Arguments to make / avoid                        │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  LAYER 4: PERSONA (Static)                          │   │
│   │  - Archetype definition                             │   │
│   │  - Personality prompt                               │   │
│   │  - Behavior modifiers                               │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### Memory Injection into Agent Prompt

```python
def build_juror_prompt(
    juror: JurorConfig,
    memory: JurorMemory,
    game_state: GameState,
    case: CriminalCase,
    task: str  # "speak" | "react" | "vote"
) -> str:
    """Build the full prompt for a juror agent."""

    prompt = f"""
# JUROR IDENTITY
You are {juror.name}, Juror #{juror.seat_number}.
{juror.personality_prompt}

# THE CASE: {case.title}
{case.summary}

# KEY EVIDENCE YOU REMEMBER
{format_evidence_memory(memory.key_evidence, memory.evidence_interpretations)}

# YOUR CURRENT POSITION
- Conviction: {conviction_to_text(memory.current_conviction)}
- Your reasoning: {' '.join(memory.reasoning_chain[-3:])}
- Your doubts: {', '.join(memory.doubts[:3]) if memory.doubts else 'None currently'}

# RECENT DELIBERATION (Last {len(memory.arguments_heard[-juror.memory_window:])} turns)
{format_recent_turns(memory.arguments_heard[-juror.memory_window:])}

# YOUR OPINIONS OF OTHER JURORS
{format_juror_opinions(memory.opinions_of_others)}

# CURRENT VOTE TALLY
Guilty: {list(game_state.votes.values()).count('guilty')}
Not Guilty: {list(game_state.votes.values()).count('not_guilty')}

# YOUR TASK
{get_task_prompt(task, juror.archetype)}
"""

    return prompt
```

---

## Orchestration Flow

### Smolagents Integration

```python
import asyncio
import random
from typing import List
from uuid import uuid4

from smolagents import CodeAgent, Tool, LiteLLMModel


class JurorAgent:
    """Wrapper around smolagents CodeAgent for a juror."""

    def __init__(self, config: JurorConfig, tools: List[Tool] | None = None):
        self.config = config
        self.memory = JurorMemory(juror_id=config.juror_id)

        # Model via LiteLLM for flexibility
        self.model = LiteLLMModel(
            model_id=f"{config.model_provider}/{config.model_id}",
            temperature=config.temperature
        )

        # Default tools (expandable)
        default_tools = [
            self.create_evidence_lookup_tool(),
            self.create_case_query_tool(),
        ]

        self.agent = CodeAgent(
            tools=default_tools + (tools or []),
            model=self.model,
            max_steps=3,  # Limit reasoning steps
        )

    def create_evidence_lookup_tool(self) -> Tool:
        """Tool to look up specific evidence."""
        # LlamaIndex query under the hood
        pass

    def create_case_query_tool(self) -> Tool:
        """Tool to query case details."""
        # LlamaIndex query under the hood
        pass

    async def generate_argument(
        self,
        game_state: GameState,
        case: CriminalCase
    ) -> DeliberationTurn:
        """Generate this juror's argument for their turn."""
        prompt = build_juror_prompt(
            self.config,
            self.memory,
            game_state,
            case,
            task="speak"
        )

        # CodeAgent.run() is synchronous, so run it in a worker thread
        response = await asyncio.to_thread(self.agent.run, prompt)

        return parse_argument_response(response, self.config, game_state)

    async def react_to_argument(
        self,
        argument: DeliberationTurn,
        game_state: GameState,
        case: CriminalCase
    ) -> float:
        """React to another juror's argument, update conviction."""
        # Update memory with new argument
        self.memory.arguments_heard.append(
            ArgumentMemory(
                speaker_id=argument.speaker_id,
                content_summary=summarize_argument(argument.content),
                argument_type=argument.argument_type,
                persuasiveness=0.0,  # Will be calculated
                counter_points=[],
                round_heard=game_state.round_number
            )
        )

        # Calculate conviction change
        delta = calculate_conviction_change(
            self.config,
            self.memory,
            argument,
            game_state
        )

        self.memory.current_conviction += delta
        self.memory.current_conviction = max(0.0, min(1.0, self.memory.current_conviction))
        self.memory.conviction_history.append(self.memory.current_conviction)

        return delta


class OrchestratorAgent:
    """Master agent that coordinates the deliberation."""

    def __init__(
        self,
        jurors: List[JurorAgent],
        judge: JudgeAgent,
        case: CriminalCase
    ):
        self.jurors = {j.config.juror_id: j for j in jurors}
        self.judge = judge
        self.case = case
        self.state = GameState(
            session_id=str(uuid4()),
            case_id=case.case_id,
            phase="setup"
        )

    async def run_deliberation_round(self) -> List[DeliberationTurn]:
        """Run a single round of deliberation."""
        self.state.round_number += 1
        turns = []
        votes_before = dict(self.state.votes)  # Snapshot to detect flips this round

        # Select 1-4 random speakers (not player unless it's their turn)
        num_speakers = random.randint(1, 4)
        available = [j for j in self.jurors if j != "juror_7"]  # Exclude player
        speakers = random.sample(available, min(num_speakers, len(available)))

        # Each speaker makes argument
        for speaker_id in speakers:
            juror = self.jurors[speaker_id]
            turn = await juror.generate_argument(self.state, self.case)
            turns.append(turn)

            # All other jurors react
            for other_id, other_juror in self.jurors.items():
                if other_id != speaker_id and other_id != "juror_7":
                    delta = await other_juror.react_to_argument(
                        turn, self.state, self.case
                    )
                    turn.impact[other_id] = delta

            # Log turn
            self.state.deliberation_log.append(turn)

        # Check for vote changes
        self._process_vote_changes()

        # Check stability
        if self.state.votes != votes_before:
            self.state.rounds_without_change = 0
        else:
            self.state.rounds_without_change += 1

        return turns

    def _process_vote_changes(self):
        """Check all jurors for vote flips."""
        for juror_id, juror in self.jurors.items():
            if juror_id == "juror_7":  # Player votes manually
                continue

            # Apply hysteresis against the recorded vote. conviction_history[-1]
            # already equals current_conviction after react_to_argument(), so it
            # cannot stand in for the vote held before this round.
            old_vote = self.state.votes[juror_id]
            conviction = juror.memory.current_conviction
            if old_vote == "guilty" and conviction < 0.4:
                self.state.votes[juror_id] = "not_guilty"
            elif old_vote == "not_guilty" and conviction > 0.6:
                self.state.votes[juror_id] = "guilty"
            # Could trigger announcement

    def check_should_end(self) -> bool:
        """Check if deliberation should end."""
        # Unanimous verdict
        votes = list(self.state.votes.values())
        if len(set(votes)) == 1:
            return True

        # Votes stabilized
        if self.state.rounds_without_change >= self.state.stability_threshold:
            return True

        # Max rounds reached
        if self.state.round_number >= self.state.max_rounds:
            return True

        return False
```

---

## ElevenLabs Integration

### Judge Narrator

```python
# Top-level generate/stream helpers as in the pre-1.0 elevenlabs SDK;
# newer releases expose these through a client object instead.
from typing import Dict

from elevenlabs import Voice, generate, stream


class JudgeAgent:
    """The judge/narrator - uses ElevenLabs for voice."""

    def __init__(self, voice_id: str | None = None):
        self.voice_id = voice_id or "judge_voice_id"  # Configure
        self.voice_settings = {
            "stability": 0.7,
            "similarity_boost": 0.8,
            "style": 0.5,  # Authoritative
        }

    async def narrate(self, text: str, stream_output: bool = True) -> bytes:
        """Generate narration audio."""
        audio = generate(
            text=text,
            voice=Voice(voice_id=self.voice_id),
            model="eleven_multilingual_v2",
            stream=stream_output
        )

        if stream_output:
            return stream(audio)
        return audio

    def get_case_presentation(self, case: CriminalCase) -> str:
        """Script for presenting the case."""
        return f"""
        Members of the jury.
        You are here today to determine the fate of {case.defendant.name},
        who stands accused of {', '.join(case.charges)}.

        {case.summary}

        You will hear the evidence. You will deliberate. And you will reach a verdict.

        The burden of proof lies with the prosecution, who must prove guilt
        beyond a reasonable doubt.

        Let us begin.
        """

    def get_vote_announcement(self, votes: Dict[str, str]) -> str:
        """Script for announcing vote."""
        guilty = sum(1 for v in votes.values() if v == "guilty")
        not_guilty = 12 - guilty

        return f"""
        The current vote stands at {guilty} for guilty, {not_guilty} for not guilty.

        {"The jury remains divided." if guilty not in [0, 12] else ""}
        {"A unanimous verdict has been reached." if guilty in [0, 12] else ""}
        """
```

---

## UI Components

### Kinetic Text Animation

```javascript
// For animated text display (like After Effects kinetic typography)
// Will sync with ElevenLabs audio or simulate typing

class KineticText {
  constructor(container, options = {}) {
    this.container = container;
    this.speed = options.speed || 50; // ms per character
    this.variance = options.variance || 20; // randomness
  }

  async display(text, audioUrl = null) {
    // If audio provided, sync with it
    // (displayWithAudio is not defined in this snippet)
    if (audioUrl) {
      return this.displayWithAudio(text, audioUrl);
    }

    // Otherwise, simulate speaking
    return this.displaySimulated(text);
  }

  async displaySimulated(text) {
    this.container.innerHTML = '';

    for (let i = 0; i < text.length; i++) {
      const char = text[i];
      const span = document.createElement('span');
      span.textContent = char;
      span.style.opacity = '0';
      span.style.animation = 'fadeInChar 0.1s forwards';
      this.container.appendChild(span);

      // Variable delay for natural feel
      const delay = this.speed + (Math.random() - 0.5) * this.variance;
      await this.sleep(delay);
    }
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
```

### Gradio UI Structure

```python
import gradio as gr


def create_ui():
    with gr.Blocks(css=CUSTOM_CSS, theme=gr.themes.Base()) as demo:
        # State
        game_state = gr.State(None)

        # Header (heading markup assumed; the original tags did not survive)
        gr.HTML("<h1>12 ANGRY AGENTS</h1>")

        with gr.Row():
            # Left: Jury Box
            with gr.Column(scale=1):
                gr.Markdown("### The Jury")
                jury_box = gr.HTML(render_jury_box)  # 12 seats with emojis/votes
                vote_tally = gr.HTML()  # "7-5 GUILTY"

            # Center: Deliberation
            with gr.Column(scale=2):
                gr.Markdown("### Deliberation Room")
                deliberation_chat = gr.Chatbot(
                    label="Deliberation",
                    height=400,
                    show_label=False
                )

                # Player input
                with gr.Row():
                    strategy_select = gr.Dropdown(
                        choices=[
                            "Challenge Evidence",
                            "Question Witness Credibility",
                            "Appeal to Reasonable Doubt",
                            "Present Alternative Theory",
                            "Address Specific Juror",
                            "Call for Vote"
                        ],
                        label="Your Strategy"
                    )
                    speak_btn = gr.Button("Speak", variant="primary")

                with gr.Row():
                    pass_btn = gr.Button("Pass Turn")
                    call_vote_btn = gr.Button("Call Final Vote")

            # Right: Case File
            with gr.Column(scale=1):
                gr.Markdown("### Case File")
                case_summary = gr.Markdown()
                with gr.Accordion("Evidence", open=False):
                    evidence_list = gr.HTML()
                with gr.Accordion("Witnesses", open=False):
                    witness_list = gr.HTML()

        # Audio player for Judge
        audio_output = gr.Audio(label="Judge", autoplay=True, visible=False)

    # MCP Server enabled
    demo.launch(mcp_server=True)
```

---

## LlamaIndex Case Database

### Index Structure

```python
import random
from typing import List

from llama_index.core import VectorStoreIndex, Document
from llama_index.core.node_parser import SentenceSplitter


class CaseDatabase:
    """LlamaIndex-powered case database."""

    def __init__(self, cases_dir: str):
        self.cases = self._load_cases(cases_dir)
        self.index = self._build_index()

    def _build_index(self) -> VectorStoreIndex:
        """Build searchable index of all cases."""
        documents = []

        for case in self.cases:
            # Index case summary
            documents.append(Document(
                text=case.summary,
                metadata={"case_id": case.case_id, "type": "summary"}
            ))

            # Index each piece of evidence
            for evidence in case.evidence:
                documents.append(Document(
                    text=f"{evidence.type}: {evidence.description}",
                    metadata={
                        "case_id": case.case_id,
                        "type": "evidence",
                        "evidence_id": evidence.evidence_id
                    }
                ))

            # Index witness
            # testimonies
            for witness in case.witnesses:
                documents.append(Document(
                    text=f"{witness.name} ({witness.role}): {witness.testimony_summary}",
                    metadata={
                        "case_id": case.case_id,
                        "type": "witness",
                        "witness_id": witness.witness_id
                    }
                ))

        parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
        nodes = parser.get_nodes_from_documents(documents)

        return VectorStoreIndex(nodes)

    def query_evidence(self, case_id: str, query: str) -> List[str]:
        """Query evidence for a specific case."""
        # Metadata filters must be passed as a MetadataFilters object, not a plain dict
        from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

        query_engine = self.index.as_query_engine(
            filters=MetadataFilters(
                filters=[ExactMatchFilter(key="case_id", value=case_id)]
            )
        )
        response = query_engine.query(query)
        # source_nodes holds scored nodes; return their text to match List[str]
        return [n.node.get_content() for n in response.source_nodes]

    def get_random_case(self, difficulty: str | None = None) -> CriminalCase:
        """Get a random case, optionally filtered by difficulty."""
        if difficulty:
            filtered = [c for c in self.cases if c.difficulty == difficulty]
            return random.choice(filtered)
        return random.choice(self.cases)
```

---

## Real Case Data Sources

### Primary: Old Bailey Online (Historical)

**Dataset**: 197,745 criminal trials from London's Central Criminal Court (1674-1913)

**Access**:
- Full XML download: https://orda.shef.ac.uk/articles/dataset/Old_Bailey_Online_XML_Data/4775434
- API: https://www.oldbaileyonline.org/static/API.jsp
- 2,163 XML files of the Proceedings + 475 Ordinary's Accounts

**Data Fields**:
- Trial ID, date, defendant name/gender
- Offence category: theft, kill, deception, violent theft, sexual, etc.
- Verdict, punishment
- Full trial transcript text

**Why This Works**:
- Historical cases avoid sensitivity around modern defendants
- Rich narrative transcripts perfect for agent reasoning
- 18th-century language adds unique flavor
- Verdicts are known (ground truth for comparison)

**Integration Example**:

```python
import xml.etree.ElementTree as ET


def load_old_bailey_case(xml_path: str) -> CriminalCase:
    """Parse Old Bailey XML into CriminalCase model."""
    # Element and attribute names below are illustrative; check them
    # against the markup of the downloaded Old Bailey XML files.
    tree = ET.parse(xml_path)
    root = tree.getroot()

    defendant_name = root.find(".//persName").text

    return CriminalCase(
        case_id=root.find(".//trialAccount").get("id"),
        title=f"The Crown v. {defendant_name}",
        summary=extract_trial_text(root),
        charges=[root.find(".//offence").get("category")],
        evidence=extract_evidence_from_transcript(root),
        # Remaining required CriminalCase fields, left empty in this sketch
        witnesses=[],
        prosecution_arguments=[],
        defense_arguments=[],
        # Defendant is not defined in this document; a `name` field is assumed
        defendant=Defendant(name=defendant_name),
        themes=[],
        difficulty=infer_difficulty_from_verdict(root),
        year=int(root.find(".//date").get("year")),
        jurisdiction="London, England"
    )
```

### Secondary: National Registry of Exonerations (Modern)

**Dataset**: All U.S. exonerations since 1989 (3,000+ cases)

**Access**: https://www.law.umich.edu/special/exoneration/Pages/about.aspx

**Data Fields**:
- Crime type, state, year of conviction/exoneration
- Contributing factors (eyewitness misID, false confession, etc.)
- DNA involvement, sentence served

**Why This Works**:
- Dramatic "wrongful conviction" cases
- Clear evidence of reasonable doubt
- Tests agents' ability to weigh conflicting evidence types

### Fallback: Curated YAML Cases

For demo stability, include 3-5 handcrafted cases in `cases/predefined/`:
- `case_001_robbery.yaml` - Clear guilty (baseline test)
- `case_002_murder.yaml` - Ambiguous (compelling demo)
- `case_003_exoneration.yaml` - DNA reversal scenario

This ensures the demo works even if external data sources are unavailable.

---

## File Structure

```
12_angry_agents/
├── app.py                    # Gradio entry point
├── PRD.md                    # This document
├── requirements.txt
├── .env.example
│
├── core/
│   ├── __init__.py
│   ├── game_state.py         # GameState, DeliberationTurn models
│   ├── orchestrator.py       # OrchestratorAgent
│   ├── conviction.py         # Conviction score mechanics
│   └── turn_manager.py       # Turn selection, stability check
│
├── agents/
│   ├── __init__.py
│   ├── base_juror.py         # JurorAgent base class
│   ├── judge.py              # JudgeAgent (ElevenLabs)
│   ├── player.py             # PlayerAgent (human interface)
│   └── configs/
│       └── jurors.yaml       # 11 juror configurations
│
├── case_db/
│   ├── __init__.py
│   ├── database.py           # CaseDatabase (LlamaIndex)
│   ├── models.py             # CriminalCase, Evidence, Witness
│   └── cases/
│       ├── case_001.yaml
│       ├── case_002.yaml
│       └── ...
│ ├── memory/ │ ├── __init__.py │ ├── juror_memory.py # JurorMemory management │ └── summarizer.py # Memory compression │ ├── ui/ │ ├── __init__.py │ ├── components.py # Gradio components │ ├── jury_box.py # Jury box renderer │ ├── chat.py # Deliberation chat │ └── static/ │ ├── styles.css │ └── kinetic.js # Text animations │ ├── mcp/ │ ├── __init__.py │ └── tools.py # MCP tool definitions │ └── tests/ ├── test_conviction.py ├── test_orchestrator.py └── test_memory.py ``` --- ## Development Phases ### Phase 1: Foundation (4-6 hours) - [ ] Project setup, dependencies - [ ] Data models (GameState, Case, Juror) - [ ] Basic Gradio UI skeleton - [ ] Single juror agent working ### Phase 2: Multi-Agent (4-6 hours) - [ ] All 11 juror configs - [ ] Orchestrator with turn management - [ ] Conviction score system - [ ] Memory system (basic) ### Phase 3: Integration (3-4 hours) - [ ] LlamaIndex case database - [ ] ElevenLabs judge narration - [ ] Player interaction flow - [ ] Vote tracking and stability ### Phase 4: Polish (2-3 hours) - [ ] UI animations (kinetic text) - [ ] Jury box visualization - [ ] MCP server tools - [ ] Demo video recording --- ## Success Metrics 1. **11 agents deliberating autonomously** - TRUE agent behavior 2. **Judge narrating with ElevenLabs** - Audio wow factor 3. **Conviction scores shifting** - Visible persuasion 4. **Player can influence outcome** - Agency 5. **MCP tools functional** - External AI can play 6. **Runs without crashes** - Stability --- --- ## CRITICAL: Performance Optimizations ### The Latency Trap - SOLVED **Problem**: If 1 speaker speaks and 11 agents react individually = 12 LLM calls per turn = SLOW **Solution**: Batch Jury State Update ```python class JuryStateManager: """ Single LLM call to update ALL silent jurors' conviction scores. Replaces 11 individual react_to_argument() calls. 
    """

    async def batch_update_convictions(
        self,
        argument: DeliberationTurn,
        silent_jurors: List[JurorConfig],
        juror_memories: Dict[str, JurorMemory],
        game_state: GameState
    ) -> Dict[str, ConvictionUpdate]:
        """
        ONE LLM call updates all 11 jurors' reactions.
        """
        # The field names listed in the prompt match the keys in the JSON example.
        prompt = f"""
        You are simulating how 11 different jurors would react to this argument.

        ARGUMENT BY {argument.speaker_name}:
        "{argument.content}"

        For each juror below, determine:
        1. delta: float (-0.3 to +0.3) - how much their guilt conviction changes
        2. reaction: str - brief internal thought (10 words max)
        3. persuaded: bool - did this significantly move them?

        JURORS:
        {self._format_juror_profiles_compact(silent_jurors, juror_memories)}

        Respond in JSON:
        {{
            "juror_1": {{"delta": 0.1, "reaction": "Good point about the timeline", "persuaded": false}},
            "juror_2": {{"delta": -0.2, "reaction": "Too emotional, but touching", "persuaded": true}},
            ...
        }}
        """

        response = await self.model.generate(prompt)
        return parse_batch_response(response)
```

**Result**: 1 speaker + 1 batch reaction = **2 LLM calls per turn** (not 12)

### Active vs Passive Jurors

```python
# Each turn, only 2-3 jurors are "active listeners" (full memory update)
# Others get simplified heuristic updates

def select_active_listeners(
    game_state: GameState,
    juror_memories: Dict[str, JurorMemory],
    recently_flipped: List[str],
    all_jurors: List[str],
    num: int = 3
) -> List[str]:
    """Select jurors who will fully process this turn."""
    # Prioritize: recently flipped jurors, then jurors on the fence, then random
    candidates = []

    # On the fence (conviction 0.35-0.65)
    for jid, memory in juror_memories.items():
        if 0.35 < memory.current_conviction < 0.65:
            candidates.append((jid, 2))  # Priority 2

    # Recently changed vote
    for jid in recently_flipped:
        candidates.append((jid, 3))  # Priority 3

    # Random others
    for jid in all_jurors:
        candidates.append((jid, 1))

    # Weight and select (weighted_sample is a helper not shown here;
    # a juror can appear more than once, so it should return unique IDs)
    return weighted_sample(candidates, num)
```

### Context Window Bloat - SOLVED

**Problem**: `deliberation_log` grows unbounded

**Solution**: Aggressive Rolling Summarization

```python
from dataclasses import dataclass, field


class MemorySummarizer:
    """Compresses old deliberation history."""

    SUMMARY_INTERVAL = 5  # Summarize every 5 rounds
    KEEP_RECENT = 3       # Keep last 3 turns in full detail

    async def maybe_summarize(self, memory: JurorMemory, round_num: int):
        """Compress old turns if needed."""
        if round_num % self.SUMMARY_INTERVAL != 0:
            return

        # Split: recent (keep full) vs old (summarize)
        old_turns = memory.arguments_heard[:-self.KEEP_RECENT]
        recent_turns = memory.arguments_heard[-self.KEEP_RECENT:]

        if not old_turns:
            return

        # Summarize old turns into compact form
        summary = await self._compress_turns(old_turns)

        # Append to the existing summary so earlier rounds are not lost,
        # then drop the old turns that the summary now covers
        memory.deliberation_summary = "\n".join(
            part for part in (memory.deliberation_summary, summary) if part
        )
        memory.arguments_heard = recent_turns

    async def _compress_turns(self, turns: List[ArgumentMemory]) -> str:
        """LLM call to compress multiple turns into summary."""
        prompt = f"""
        Summarize these {len(turns)} deliberation turns into 3-5 bullet points.
        Focus on: key arguments made, who was persuasive, major position shifts.

        TURNS:
        {self._format_turns(turns)}

        Respond with bullet points only.
        """
        return await self.model.generate(prompt)


# Memory structure with summary
@dataclass
class JurorMemory:
    # ... existing fields ...

    # Compressed history (replaces old arguments_heard entries)
    deliberation_summary: str = ""  # "• Juror 3 argued about timeline..."

    # Only recent turns in full detail (Max ~10 entries).
    # default_factory is required: a field without a default cannot follow
    # a field with one, and a mutable default must not be shared.
    arguments_heard: List[ArgumentMemory] = field(default_factory=list)
```

### LLM Call Budget Per Round

| Action | Calls | Notes |
|--------|-------|-------|
| 1-4 speakers generate arguments | 1-4 | Parallelizable |
| Batch conviction update | 1 | All 11 reactions |
| Memory summarization | 0-1 | Every 5 rounds |
| Judge narration (ElevenLabs) | 1 | Audio only |
| **TOTAL** | **3-7** | Down from 12-48 |

---

## External Participant System (MCP + Human)

### Architecture: Swappable Juror Seats

Each of the 11 non-player juror seats can be filled by one of:
1. **External AI Agent** (via MCP) - Another AI system joins as juror
2. **Human Player** (via UI) - Additional human joins
3. **Default AI** (Gemini) - Predefined personality

```python
from dataclasses import dataclass
from typing import Dict, Literal


@dataclass
class JurorSeat:
    """A seat in the jury that can be filled by different participant types."""
    seat_number: int
    participant_type: Literal["ai_default", "ai_external", "human"]
    participant_id: str | None = None

    # For AI default
    config: JurorConfig | None = None
    agent: JurorAgent | None = None

    # For external (MCP or human)
    external_connection: ExternalConnection | None = None


class JuryManager:
    """Manages the 12 jury seats with mixed participant types."""

    def __init__(self):
        self.seats: Dict[int, JurorSeat] = {}
        self._init_default_seats()

    def _init_default_seats(self):
        """Initialize seat 7 for the primary player and the other 11 with default AI jurors."""
        for i in range(1, 13):
            if i == 7:  # Reserved for primary player
                self.seats[i] = JurorSeat(
                    seat_number=i,
                    participant_type="human",
                    participant_id="player_1"
                )
            else:
                config = load_juror_config(i)
                self.seats[i] = JurorSeat(
                    seat_number=i,
                    participant_type="ai_default",
                    config=config,
                    agent=JurorAgent(config)
                )

    def replace_with_external(
        self,
        seat_number: int,
        participant_type: Literal["ai_external", "human"],
        participant_id: str
    ) -> bool:
        """Replace a default AI with external participant."""
        if seat_number == 7:
            return False  # Primary player seat protected

        if seat_number not in self.seats:
            return False

        self.seats[seat_number] = JurorSeat(
            seat_number=seat_number,
            participant_type=participant_type,
            participant_id=participant_id,
            external_connection=ExternalConnection(participant_id)
        )
        return True

    def get_participant_for_turn(self, seat_number: int) -> TurnHandler:
        """Get appropriate handler for a seat's turn."""
        seat = self.seats[seat_number]

        if seat.participant_type == "ai_default":
            return AITurnHandler(seat.agent)
        elif seat.participant_type == "ai_external":
            return MCPTurnHandler(seat.external_connection)
        else:  # human
            return HumanTurnHandler(seat.participant_id)
```

### MCP Tools for External Participants

```python
# MCP
# Server exposes these tools for external AI agents

def mcp_join_as_juror(
    case_id: str,
    preferred_seat: int | None = None
) -> Dict:
    """
    Join an active case as a juror.

    An external AI agent can take over any non-player seat.
    Returns seat assignment and case briefing.

    Args:
        case_id: The case to join
        preferred_seat: Preferred seat number (2-6, 8-12), or None for auto-assign

    Returns:
        seat_number: Your assigned seat
        case_briefing: Summary of the case
        your_persona: Suggested personality (can ignore)
        current_state: Vote tally, round number
    """
    pass


def mcp_get_deliberation_state(case_id: str, seat_number: int) -> Dict:
    """
    Get current state of deliberation.

    Returns:
        recent_arguments: Last 5 arguments made
        vote_tally: Current guilty/not-guilty count
        your_conviction: Your current conviction score
        pending_speakers: Who speaks next
        is_your_turn: Whether you should speak now
    """
    pass


def mcp_make_argument(
    case_id: str,
    seat_number: int,
    argument_type: str,  # "evidence", "emotional", "logical", "question"
    content: str,
    target_juror: int | None = None
) -> Dict:
    """
    Make an argument during your turn.

    Returns:
        accepted: Whether argument was processed
        reactions: Brief summary of jury reactions
        vote_changes: Any votes that flipped
    """
    pass


def mcp_cast_vote(
    case_id: str,
    seat_number: int,
    vote: Literal["guilty", "not_guilty"]
) -> Dict:
    """
    Cast or change your vote.

    Returns:
        recorded: Confirmation
        new_tally: Updated vote count
    """
    pass


def mcp_pass_turn(case_id: str, seat_number: int) -> Dict:
    """Pass your turn without speaking."""
    pass
```

### Human Join Flow (Additional Players)

```
1. Primary player starts game (seat 7)
2. Game generates shareable room code
3. Additional humans can join via:
   - URL with room code
   - Gradio UI "Join as Juror" button
4. They get assigned available seat (2-6, 8-12)
5. When it's their turn, UI prompts for input
6. They see the same case file and deliberation history
```

---

## Model Configuration

### Default: Gemini 2.5 Flash

```yaml
# config/models.yaml
default_model:
  provider: "gemini"
  model_id: "gemini-2.5-flash"
  temperature: 0.7
  max_tokens: 1024

# Easily swappable per-agent or globally
model_overrides:
  judge:
    provider: "gemini"
    model_id: "gemini-2.5-flash"  # Fast for narration scripts

  batch_updater:
    provider: "gemini"
    model_id: "gemini-2.5-flash"  # Handles all conviction updates

  # Individual juror overrides (optional)
  juror_5:  # The contrarian philosopher
    provider: "anthropic"
    model_id: "claude-sonnet-4-20250514"
    temperature: 0.9
```

### LiteLLM Integration

```python
# acompletion is LiteLLM's async entry point; the synchronous
# completion() cannot be awaited.
from litellm import acompletion


class ModelRouter:
    """Route to any model via LiteLLM."""

    def __init__(self, config_path: str = "config/models.yaml"):
        self.config = load_yaml(config_path)
        self.default = self.config["default_model"]

    def get_model_for(self, agent_id: str) -> Dict:
        """Get model config for specific agent."""
        overrides = self.config.get("model_overrides", {})
        return overrides.get(agent_id, self.default)

    async def generate(
        self,
        agent_id: str,
        prompt: str,
        **kwargs
    ) -> str:
        """Generate completion using appropriate model."""
        config = self.get_model_for(agent_id)

        response = await acompletion(
            model=f"{config['provider']}/{config['model_id']}",
            messages=[{"role": "user", "content": prompt}],
            temperature=config.get("temperature", 0.7),
            max_tokens=config.get("max_tokens", 1024),
            **kwargs
        )

        return response.choices[0].message.content
```

---

## Case Data Architecture

### Dual Source: Real + Fallback

```python
import random
from pathlib import Path

import yaml


class CaseLoader:
    """Load cases from real data or fallback to predefined."""

    def __init__(
        self,
        real_data_path: str | None = None,
        fallback_path: str = "cases/predefined/"
    ):
        self.real_data_path = real_data_path
        self.fallback_path = fallback_path

        # Try to load real data
        self.real_cases = self._load_real_cases() if real_data_path else []
        self.fallback_cases = self._load_fallback_cases()

    def get_case(self,
                 case_id: str | None = None, use_real: bool = True) -> CriminalCase:
        """Get a case, preferring real data if available."""
        if case_id:
            # Specific case requested
            return self._find_case(case_id)

        # Random case
        if use_real and self.real_cases:
            return random.choice(self.real_cases)
        return random.choice(self.fallback_cases)

    def _load_real_cases(self) -> List[CriminalCase]:
        """Load from real case database (future: LlamaIndex over court records)."""
        # TODO: Integrate with real case API/database
        # For now, returns empty - falls back to predefined
        return []

    def _load_fallback_cases(self) -> List[CriminalCase]:
        """Load predefined cases from YAML files."""
        cases = []
        for file in Path(self.fallback_path).glob("*.yaml"):
            case_data = yaml.safe_load(file.read_text())
            cases.append(CriminalCase(**case_data))
        return cases


# Future: Real case integration
class RealCaseConnector:
    """
    Connect to real case databases.
    Designed for easy integration later.
    """

    def __init__(self):
        self.sources = {
            "court_listener": CourtListenerAPI(),  # Future
            "justia": JustiaAPI(),                 # Future
            "local_files": LocalCaseFiles(),       # CSV/JSON dumps
        }

    async def search_cases(
        self,
        query: str,
        filters: Dict | None = None
    ) -> List[CriminalCase]:
        """Search across all connected sources."""
        pass

    async def get_case_details(
        self,
        source: str,
        case_id: str
    ) -> CriminalCase:
        """Get full case from specific source."""
        pass
```

---

## Execution Environment

### Local First, Blaxel Ready

```yaml
# config/execution.yaml
execution:
  mode: "local"  # "local" | "blaxel" | "docker"

  local:
    # No sandbox, runs in process
    timeout_seconds: 30

  blaxel:
    api_key: "${BLAXEL_API_KEY}"
    sandbox_id: "12-angry-agents"
    persistent: true  # Keep sandbox warm

  docker:
    image: "12-angry-agents:latest"
    memory_limit: "2g"
```

```python
# Usage in code
class ExecutionManager:
    """Swappable execution environment."""

    def __init__(self, config_path: str = "config/execution.yaml"):
        self.config = load_yaml(config_path)
        self.mode = self.config["execution"]["mode"]

    def get_executor(self) -> Executor:
        if self.mode == "local":
            return LocalExecutor()
        elif self.mode == "blaxel":
            return BlaxelExecutor(self.config["execution"]["blaxel"])
        elif self.mode == "docker":
            return DockerExecutor(self.config["execution"]["docker"])
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown execution mode: {self.mode}")

    async def run_agent_code(self, code: str, context: Dict) -> str:
        """Execute agent-generated code safely."""
        executor = self.get_executor()
        return await executor.run(code, context)
```

---

## Player Input: Strategy + Optional Free Text

```python
import gradio as gr

# Hybrid input: Low friction strategy selection + optional elaboration

ARGUMENT_STRATEGIES = [
    {
        "id": "challenge_evidence",
        "label": "Challenge Evidence",
        "prompt_hint": "Point out weaknesses in a specific piece of evidence",
        "allows_free_text": True,
    },
    {
        "id": "question_witness",
        "label": "Question Witness Credibility",
        "prompt_hint": "Raise doubts about a witness's reliability",
        "allows_free_text": True,
    },
    {
        "id": "reasonable_doubt",
        "label": "Appeal to Reasonable Doubt",
        "prompt_hint": "Emphasize the burden of proof",
        "allows_free_text": False,  # AI handles this
    },
    {
        "id": "alternative_theory",
        "label": "Present Alternative Theory",
        "prompt_hint": "Suggest what might have really happened",
        "allows_free_text": True,
    },
    {
        "id": "address_juror",
        "label": "Address Specific Juror",
        "prompt_hint": "Respond to or persuade a specific juror",
        "requires_target": True,
        "allows_free_text": True,
    },
    {
        "id": "free_argument",
        "label": "Make Custom Argument",
        "prompt_hint": "Say whatever you want",
        "allows_free_text": True,
        "requires_free_text": True,  # Named to match "requires_target"
    },
]


# UI Component
def player_input_ui():
    with gr.Row():
        strategy = gr.Dropdown(
            choices=[s["label"] for s in ARGUMENT_STRATEGIES],
            label="Your Strategy",
            value="Challenge Evidence"
        )

        target_juror = gr.Dropdown(
            choices=["None"] + [f"Juror {i}" for i in range(1, 13) if i != 7],
            label="Target (optional)",
            visible=False  # Show only for "address_juror"
        )

    free_text = gr.Textbox(
        label="Add details (optional)",
        placeholder="e.g., 'Focus on the timeline inconsistency'",
        max_lines=2,
        visible=True
    )

    return strategy, target_juror, free_text
```

---

## Open Questions

1. Exact ElevenLabs voice ID for judge?
2. Should external AI participants see other AI jurors' internal conviction scores? Yes, configurable in code.
3. Max simultaneous external participants (performance)? 12
4. Case difficulty selector in UI? No, cases are picked at random.
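
One way the answer to question 2 could be made configurable is a single visibility flag applied when building the state payload for an external participant. This is a minimal sketch, not the project's implementation: `VisibilityConfig`, `expose_conviction_scores`, and `build_state_for_seat` are hypothetical names that appear nowhere else in this document, and treating votes as always public is an assumption based on the vote tally shown in the UI.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VisibilityConfig:
    """Hypothetical config object; the field name is an assumption."""
    expose_conviction_scores: bool = False


def build_state_for_seat(
    seat_id: str,
    conviction_scores: Dict[str, float],
    votes: Dict[str, str],
    config: VisibilityConfig,
) -> Dict:
    """Build the state payload shown to one external participant."""
    if config.expose_conviction_scores:
        # Flag on: the participant sees every juror's internal score.
        visible_scores = dict(conviction_scores)
    else:
        # Flag off: the participant sees only their own score.
        visible_scores = {
            jid: score
            for jid, score in conviction_scores.items()
            if jid == seat_id
        }
    return {
        "votes": dict(votes),  # Assumed public, like the vote tally
        "conviction_scores": visible_scores,
    }


scores = {"juror_1": 0.8, "juror_2": 0.4}
votes = {"juror_1": "guilty", "juror_2": "not_guilty"}
hidden = build_state_for_seat("juror_2", scores, votes, VisibilityConfig())
shown = build_state_for_seat("juror_2", scores, votes, VisibilityConfig(True))
```

Defaulting the flag to off keeps other jurors' internal state private unless a deployment opts in, which seems the safer default for games that mix human and AI participants.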