Why AI Agent Evals Belong in the Harness, Not a Dashboard
Cursor found that the same Claude model scored 46% with one agent harness and 80% with another. AI agent evals only earn their keep when they live inside the harness and have the authority to block a bad deploy.