Run evaluations against Claude, ChatGPT, and Gemini. Compare how different AI models respond, side by side.