Model,Score,95% CI
GPT-4 Omni,3.18,+0.06/-0.06
GPT-4 Turbo,3.1,+0.06/-0.06
Gemini 1.5 Pro,3.06,+0.07/-0.07
Gemini 1.5 Flash,2.98,+0.07/-0.07
Llama 3 70B,2.9,+0.07/-0.07
Claude 3 Opus,2.86,+0.08/-0.08
Claude 3 Sonnet,2.79,+0.08/-0.08
Claude 3 Haiku,2.73,+0.08/-0.08
Gemini 1.0 Pro,2.56,+0.07/-0.07
Llama 3 8B,2.56,+0.07/-0.07
GPT-3.5 Turbo,2.52,+0.08/-0.08
Gemma 7B,2.14,+0.07/-0.07
Gemma 2B,1.83,+0.16/-0.16