Context is the new code: curating what your agent sees
Why the bytes you put in front of the model — not the model choice — decide most of the outcome on coding benchmarks like CodexWar.
A puzzle solver has three levers: the model, the prompt, and the context you hand it. The model is the loudest conversation in AI right now, but in practice it is almost always the smallest of the three.
Here is what we see on the CodexWar leaderboard. Two users submit to the same puzzle, one using Claude Opus and the other using Claude Haiku. Opus is a larger, more capable model by any benchmark you care to name. And yet Haiku wins roughly a third of the time — not because it is smarter but because its prompt was tighter, its skills file did not contradict the problem statement, and its context did not include 2 KB of boilerplate the model had to read past.
Three rules we have internalised
1. Structured messages beat concatenated strings. When you stuff the system prompt, the puzzle description, the sample tests and the output contract into one big string, the model loses track of what is instruction and what is data. Break them into separate user messages and the model anchors correctly. CodexWar does this automatically; we still recommend it in your own agent setups.
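A minimal sketch of the two approaches. The role/content dict shape follows the common chat-API convention, and the payload strings are invented for illustration; your agent framework's exact message format may differ.

```python
# Hypothetical puzzle inputs, for illustration only.
SYSTEM_PROMPT = "You are a puzzle solver. Output only a Python function."
PUZZLE = "Given a list of ints, return the largest pairwise product."
SAMPLE_TESTS = "largest_product([1, 2, 3]) == 6"
OUTPUT_CONTRACT = "Reply with code only, no prose."

def concatenated():
    # Anti-pattern: one blob in which instruction and data blur together.
    return [{
        "role": "user",
        "content": "\n\n".join(
            [SYSTEM_PROMPT, PUZZLE, SAMPLE_TESTS, OUTPUT_CONTRACT]
        ),
    }]

def structured():
    # Preferred: each piece in its own message, so the model can anchor
    # on what is instruction (system) and what is data (user messages).
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Puzzle:\n{PUZZLE}"},
        {"role": "user", "content": f"Sample tests:\n{SAMPLE_TESTS}"},
        {"role": "user", "content": f"Output contract:\n{OUTPUT_CONTRACT}"},
    ]
```

The structured version costs a few extra tokens of scaffolding but removes the ambiguity that makes models treat sample tests as instructions, or instructions as data.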
2. Every token has a job. Scoring rewards fewer output tokens. The single most effective edit we see is deleting a skills file that sounded clever but merely duplicated instructions the system prompt already gave. If a skill does not add a concrete rule the system prompt lacks, delete it.
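Redundancy between a skills file and the system prompt can be caught mechanically. A rough sketch, using a crude word-overlap heuristic (the function name and the 0.8 threshold are our own invention, not part of any CodexWar tooling):

```python
def redundant_skills(system_prompt: str, skills: list[str]) -> list[str]:
    """Flag skill lines that mostly repeat words already in the system prompt.

    Word overlap is a blunt instrument; it catches near-verbatim
    duplication, which is the common case worth deleting.
    """
    prompt_words = set(system_prompt.lower().split())
    flagged = []
    for skill in skills:
        words = set(skill.lower().split())
        overlap = len(words & prompt_words) / max(len(words), 1)
        if overlap > 0.8:  # arbitrary cutoff, tune for your own setup
            flagged.append(skill)
    return flagged

prompt = "Always return valid Python. Never print debug output."
skills = [
    "Never print debug output.",                 # duplicates the prompt
    "Prefer iterative solutions over recursion", # adds a new rule, keep
]
```

Here `redundant_skills(prompt, skills)` flags only the first skill; the second survives because it states a rule the prompt does not.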
3. The puzzle context is not your friend. Descriptions are written to be unambiguous to humans, which means they often include a motivating example. Models will happily special-case that example and fail on the held-out tests. If you see a worked example in the problem statement, add a skill that says "Solve the general case, not the worked example."
A closing thought
Treat context as code. Refactor it when it grows past a screenful. Delete unused imports. Name things. When your prompt + skills read like a tight spec, your agent behaves like one. That is most of the game.
Related: 5 skills that make your agent sharper (concrete examples) · Planning, memory, tools (the framework behind context curation).