
Do Repository-Level Skills Files Improve AI Coding Agents? Evidence Says Not Really

AI coding agents and AGENTS.md context files explained. Research shows repo skills files can hurt performance and raise costs. Learn what to include.

Aastha Mishra
February 25, 2026

There is a widespread belief in developer circles right now that goes roughly like this: if you want your AI coding agent to work well in your codebase, write it a detailed instruction file. Tell it about your architecture. List your directories. Explain your conventions. Run /init and let the agent generate one automatically if you can't be bothered.

Over 60,000 public repositories now contain some version of this file — called AGENTS.md, CLAUDE.md, .cursorrules, or copilot-instructions.md depending on the tool. Anthropic, OpenAI, and Google all recommend the practice in their documentation. Tutorials exist on how to write the perfect one. Entire tools have been built to evaluate and improve them.

The premise is intuitive. Context helps. More context helps more. Right?

Not quite.


The Assumption Nobody Tested

Here's what almost nobody did before those 60,000 repositories added their context files: run a controlled experiment.

Did anyone actually compare agent performance with a context file against without one? Did anyone measure whether the agent's behavior improved, stayed the same, or got worse? Did anyone count the cost?

For the most part, the answer is no. Adoption spread on vibes. Developers added an AGENTS.md, noticed the agent seemed to behave better, and reported their anecdotal experience online. Others read those reports and added their own files. The practice became a default assumption of agentic development without ever being rigorously validated.

When you actually run the numbers — testing the same tasks with and without context files, across multiple agents and models, with carefully designed benchmarks — the results are uncomfortable.

Context files, as most teams write them today, tend to make agents slightly worse at solving tasks and significantly more expensive to run.


What's Actually Happening When You Add a Context File

To understand why this happens, you need to understand how AI coding agents work at a basic level.

An agent doesn't read your codebase the way a human developer does. It doesn't develop a mental map over weeks of familiarity. Instead, at the start of every task, it gets a context window — a finite amount of text it can process at once. Everything it "knows" for that task lives in that window: the task description, the tool outputs from exploring files, the conversation history, and yes, your AGENTS.md.

The agent then makes decisions based on everything in that window simultaneously. And this is where the first problem appears.

As the context grows, language models often struggle to make good use of all the information they're given; accumulated context quickly turns into noise instead of signal. Researchers studying this phenomenon call it "context rot" — the observation that models do not use their context uniformly; their performance grows increasingly unreliable as input length grows.

Your AGENTS.md file is not free. It costs tokens, and those tokens compete with everything else the agent needs to think clearly about the actual problem in front of it.
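To make "not free" concrete, here is a rough back-of-the-envelope sketch. Every number in it — file size, steps per task, task volume, and price — is an illustrative assumption, not a measured figure, and prompt caching would lower the real bill:

```python
# Rough cost of prepending a context file to every agent step.
# All numbers below are illustrative assumptions.
WORDS_IN_AGENTS_MD = 700                # a "thorough" file
TOKENS_PER_WORD = 1.33                  # common rule-of-thumb ratio
STEPS_PER_TASK = 30                     # context is re-read each step
TASKS_PER_DAY = 50                      # across a team
PRICE_PER_MILLION_INPUT_TOKENS = 3.00   # dollars, hypothetical rate

tokens_per_step = WORDS_IN_AGENTS_MD * TOKENS_PER_WORD
tokens_per_task = tokens_per_step * STEPS_PER_TASK
daily_cost = tokens_per_task * TASKS_PER_DAY / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

print(f"{tokens_per_step:.0f} tokens per step, ${daily_cost:.2f} per day")
# -> 931 tokens per step, $4.19 per day
```

The dollar figure is small on its own; the more important cost is that those tokens occupy space in the window on every single step, relevant or not.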


The Distraction Problem

Here's a more specific version of what goes wrong.

Imagine you've written an AGENTS.md that does what most guides recommend: a codebase overview, a directory structure, your tech stack, your coding conventions, your testing approach, your linting rules, and your git workflow. A thorough, professional document. Maybe 600–800 words. Looks good.

Now the agent picks up a bug report. Something about a subtle off-by-one error in a parser module.

To fix that bug, the agent needs to: find the right file, read the relevant code, understand the logic, and write a patch. It does not need to know about your directory structure (it can ls and see). It does not need to know your git workflow (it's not committing yet). It does not need your full testing philosophy (it'll write the test once it knows the fix).

But all of that information is sitting in context. And the agent, instructed to be a conscientious rule-follower, tries to honor it all.

Even when models can perfectly retrieve relevant evidence, the sheer volume of distracting context degrades their ability to apply that evidence to solve problems. Research has given this a name: attention dilution. The signal — the actual bug fix — gets drowned in the noise of instructions that weren't needed for this task.

What tends to happen in practice is that the agent explores more broadly, runs more tests, writes more code, and takes more steps than it would have without the context file. On the surface, this looks like diligent behavior. In terms of outcomes, it often isn't. The agent solves fewer tasks correctly and costs more per task.


The Auto-Generated Context File Trap

It gets worse if you used your agent's built-in init command to generate the file in the first place.

Almost every major coding agent — Claude Code, Codex, Qwen Code — offers a command that analyzes your repository and writes an AGENTS.md or CLAUDE.md for you. It scans your directory structure, identifies your tech stack, reads your existing documentation, and produces a polished-looking file in seconds.

This creates an illusion of completeness. The file exists. It has content. It looks professional.

The problem is that an LLM generating a context file from your existing docs is mostly just... summarizing your existing docs. If you already have a README, existing documentation, and well-structured code, the auto-generated context file adds almost nothing that the agent couldn't discover on its own. It just adds it upfront, in the context window, for every single task, whether relevant or not.

This redundancy matters. When researchers stripped all documentation out of repositories and left only the LLM-generated context file, agents actually performed better with the file than without it. When the original documentation was present — as it is in most real projects — the context file added nothing. The agent already had access to better sources of the same information.

The auto-generated file isn't wrong. It's just unnecessary for well-documented codebases. And unnecessary context has a cost.


The Maintenance Trap

There's a second trap, and it's slower to reveal itself.

Bootstrapping context is not the challenge. Maintenance is.

An AI agent writes your AGENTS.md on Day 1. It accurately reflects your stack. You switch your test runner from Jest to Vitest on Day 47. You refactor your directory structure on Day 83. You add a new service on Day 112. Your AGENTS.md still says what it said on Day 1.

Now your agent is working from confidently stated, professionally formatted, completely wrong information. The agent trusts the file completely, and that is exactly the problem: the file is actively misleading it. A hallucination you wrote yourself and version-controlled.

Too much context degrades an agent's effectiveness, and it drives up cost. Models have also improved: what you needed to spell out in context six months ago may no longer be necessary at all. Stale context files combine the worst of both problems: they're large enough to dilute attention and inaccurate enough to mislead.


What Context Files Are Actually Good For

None of this means you should delete your AGENTS.md. It means you should understand what it's for.

The cases where a context file genuinely earns its place are narrower than most developers assume:

Specific commands the agent can't infer. If your test command is make test-unit rather than pytest, the agent will waste steps discovering this. Write it down. If your linting command is unusual, write it down. If you have a specific environment setup script, write it down. These are things the agent cannot easily discover by exploring the repository, and they're short enough that the token cost is justified.

Documentation gaps. If your repository has almost no existing documentation — no README, no docstrings, no inline comments — an AGENTS.md can be the only source of orientation the agent has. In this case, it genuinely helps. But note: the right fix is usually better documentation, not an agent-facing file that substitutes for it.

APIs not in training data. If you're using a very recent framework version, an internal SDK, or proprietary libraries that no model has ever seen, providing targeted documentation in context genuinely helps. This is the case Vercel demonstrated with Next.js 16 APIs — the agent had no other way to know those APIs existed.

What does not belong in a context file is everything the agent can discover on its own: directory structures, standard tooling configurations, widely-used frameworks, architectural patterns evident from the code itself, and anything already covered in your README or docs.

| Include in AGENTS.md | Leave it out |
| --- | --- |
| Build and test commands | Directory structure overviews |
| Unusual environment setup | Standard framework conventions |
| APIs not in model training data | Coding style (if you have a linter) |
| Hard project-specific constraints | Architecture already visible in code |
| Non-obvious tooling choices | Anything already in your README |
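Put together, a file that sticks to the left-hand column can be only a few lines long. The commands, script paths, and constraints below are hypothetical placeholders — substitute your own:

```markdown
# AGENTS.md

- Run unit tests with `make test-unit`, not `pytest` directly.
- Run `./scripts/dev-setup.sh` once before building.
- Unit tests must never call external APIs; use the fixtures in `tests/fixtures/`.
- We use Postgres, not MySQL, despite what some older comments say.
```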

The Right Mental Model

The most useful way to think about a context file is not as a README for the agent. It's more like a briefing for a highly competent new contractor who already knows how to code.

You wouldn't hand a senior developer a 20-page document about what a function is and why testing matters. You'd tell them: our CI runs with make test, we use Postgres not MySQL, and don't call external APIs from unit tests. Three lines. Everything else, they'll figure out.

The goal of context engineering is to balance the amount of context given — not too little, not too much. Build context files gradually, and don't pump too much in right from the start.

A two-sentence AGENTS.md that tells the agent your test command and one non-obvious constraint will outperform a 600-word one that tries to explain your entire codebase. Not because more information is bad in theory — but because irrelevant information in context is not neutral. It costs money, it competes with relevant information for the model's attention, and it goes stale.

Write less. Be specific. Measure the difference.


Before You Write Another Line

Here's a practical test before you add anything to your context file. Ask: can the agent discover this on its own within two or three tool calls?

If yes, leave it out. The agent doesn't need a head start on information it can find in seconds. You'll save tokens, reduce noise, and likely get better task completion.

If no — if the information is genuinely opaque to the agent without being told — then include it, as concisely as possible.
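As a toy illustration of that filter — the facts and tool-call estimates below are invented for the example, not drawn from a real repository:

```python
# Each candidate fact is paired with a rough guess at how many
# tool calls an agent would need to discover it unaided.
# Facts and estimates are illustrative assumptions.
candidates = [
    ("directory layout", 1),                 # one `ls` away
    ("we use React + TypeScript", 1),        # visible in package.json
    ("tests run via `make test-unit`", 10),  # non-obvious wrapper
    ("staging database is read-only", 10),   # pure tribal knowledge
]

BUDGET = 3  # "two or three tool calls"

agents_md_lines = [fact for fact, est_calls in candidates if est_calls > BUDGET]
print(agents_md_lines)
# -> ['tests run via `make test-unit`', 'staging database is read-only']
```

Only the two facts the agent cannot cheaply discover survive the filter; everything else stays out of the file.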

The developers who get the most out of AI coding agents right now are not the ones with the most detailed AGENTS.md files. They're the ones who treat context as a scarce resource and spend it carefully.