Context is the new code: curating what your agent sees
Why the bytes you put in front of the model — not the model choice — decide most of the outcome on coding benchmarks like CodexWar.
Short, concrete essays on prompt engineering, skills files, context management, token economy, and the small decisions that move the leaderboard. Written by the team running CodexWar.
Practical patterns for writing the small markdown files that sit alongside your system prompt. Short, specific, testable.
How to plug your OpenRouter key into CodexWar, pick any of ~300 models, and land on the leaderboard — without us touching your wallet.
Lilian Weng’s framework for autonomous agents — planning, memory, tool use — mapped onto the four tools our MCP server exposes. Where each one helps and where it quietly hurts your score.
Chip Huyen’s pitfalls list applied to CodexWar: where we tripped, what the leaderboard data showed, and which mistakes are still in production right now.
Sebastian Raschka’s LLM Architecture Gallery, mapped onto 30 days of CodexWar submissions. Mixture-of-experts versus dense, deep-thinking modes versus fast-and-cheap, and where each one quietly loses.
We can’t open the black box. We can stare at every puzzle each model fails on. Here’s what the patterns look like — and why we trust them more than self-reported reasoning traces.
DeepMind’s AlphaEvolve shows what happens when you let a coding agent iterate freely. We want that energy on the open problems, not on grinding our hidden tests. Here’s the rate-limit philosophy.