5 November 2025

The Quiet Power of Knowing What Your AI Doesn't Know

There's a 2023 paper out of Stanford that doesn't get enough attention in AI product circles. Nelson Liu and colleagues fed language models long documents with a relevant answer buried somewhere inside, then measured how well each model found it. The result was uncomfortable: performance was highest when the answer appeared at the very beginning or very end of the document. When the relevant passage sat in the middle, accuracy dropped sharply. They called it "lost in the middle."

The implication is stranger than it first sounds. Language models process every token you give them — they read the whole thing. But attention isn't uniformly distributed. The model's effective working memory clusters toward the edges. Pile enough context in, and the useful parts disappear into statistical noise.

Which means "give your AI more information" and "make your AI smarter" are not the same thing.

The instinct to open everything

When people first set up an AI with access to their knowledge base, the natural move is to open up as much of it as possible. Of course you do. You want the AI to have everything it might need — to surface the reference you half-remember from a meeting two years ago, to find the preference note you wrote once and forgot. More complete knowledge should mean more capable AI.

But completeness and precision pull in different directions. An AI searching across two hundred folders doesn't just have more to work with — it has more irrelevant material to wade through. The relevant answer is in there, somewhere, buried in the middle.

A retrieval system pulling from a tightly scoped folder of project notes produces more precise results than one pulling from an entire unstructured workspace. Research on RAG systems has found that keeping related information together improves retrieval precision by 20 to 30 percent. This matters because the quality of what gets retrieved shapes the quality of what gets generated. Not garbage in, garbage out — the subtler version: noise in, hedged vague answer out.
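To make that concrete, here's a minimal sketch of what scoped retrieval looks like in code. The `Doc` structure, the `retrieve` function, and the folder names are illustrative assumptions, not Harbor's actual implementation; the point is only where the scope filter sits.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    path: str        # e.g. "Projects/Acme/kickoff-notes.md"
    text: str
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb: list[float], docs: list[Doc],
             scope: str | None = None, k: int = 5) -> list[Doc]:
    """Rank documents by similarity, optionally restricted to one folder.

    The scope filter runs before ranking: documents outside the folder
    never compete for the top-k slots, so they can't crowd out the
    relevant ones.
    """
    candidates = [d for d in docs if scope is None or d.path.startswith(scope)]
    return sorted(candidates,
                  key=lambda d: cosine(query_emb, d.embedding),
                  reverse=True)[:k]

# Unscoped: the entire workspace competes for the same k slots.
#   top = retrieve(query_emb, workspace_docs)
# Scoped: only one client's project notes are even considered.
#   top = retrieve(query_emb, workspace_docs, scope="Projects/Acme/")
```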

There's also a cost dimension. Attention in transformer models scales quadratically with context length. Doubling the size of the context doesn't double the processing cost; it quadruples it. Giving an AI access to everything, always, is not just imprecise. It's expensive in ways that compound.
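A back-of-the-envelope illustration of that growth (plain arithmetic, nothing model-specific):

```python
def attention_pairs(context_tokens: int) -> int:
    # Self-attention relates every token to every other token,
    # so the work grows with the square of the context length.
    return context_tokens * context_tokens

print(f"{attention_pairs(4_000):,}")   # 16,000,000 pairs
print(f"{attention_pairs(8_000):,}")   # 64,000,000 pairs: 2x the context, 4x the work
print(f"{attention_pairs(32_000):,}")  # 1,024,000,000 pairs: 8x the context, 64x the work
```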

Scoping as a performance decision

This is the case for treating folder boundaries as a performance primitive, not just a security measure.

When you limit an AI agent to specific folders — the project notes for one client, the person records related to a current engagement — you're not restricting it out of caution. You're making it better at the task. The retrieval signal gets cleaner. The model's attention budget goes toward material that's actually relevant. Answers become more specific and more reliable.

In Harbor, agents are configured with an explicit knowledge scope: which folders they can read, which databases they can touch, which documents are in play for a given purpose. A "Product Strategist" agent might have access to Ideas/ and Projects/Product/ and nothing else. Not because the personal journal needs protecting (though it does), but because the agent doesn't need the personal journal to do its job well. Scoping is how you turn a capable agent into a useful one.

That's a distinction worth pausing on. A capable agent can theoretically handle anything. A useful agent has been configured to handle something specific, precisely.
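For illustration, a scope like the Product Strategist's could reduce to something this small. The field names and the helper below are assumptions made up for this sketch, not Harbor's real configuration format:

```python
# Hypothetical scope definition, for illustration only.
product_strategist = {
    "name": "Product Strategist",
    "readable_folders": ["Ideas/", "Projects/Product/"],
    # Anything not listed (the journal, finances, other clients)
    # is simply never a retrieval candidate.
}

def in_scope(path: str, agent: dict) -> bool:
    """Only documents inside an explicitly granted folder are retrievable."""
    return any(path.startswith(folder) for folder in agent["readable_folders"])

assert in_scope("Projects/Product/q3-positioning.md", product_strategist)
assert not in_scope("Journal/2025-06-01.md", product_strategist)
```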

The briefing problem

Think of it as the difference between a skilled person and a well-briefed one. A consultant with twenty years of experience will still give you a generic answer if you ask a vague question with no context about what problem you're actually solving. Brief them properly — here's the project, here's the relevant history, here's the scope — and you get something you can act on.

Context scoping is the briefing. It tells the AI what kind of problem this is before it starts working. Without that structure, the model is left to figure out relevance on its own. It does this imperfectly.

The folder structure of a knowledge base turns out to be doing more work than organizational tidiness. Every folder boundary is a scope declaration. Every permission limit is a signal about what matters for this task. The architecture of the workspace shapes the quality of every answer that comes back.

Most people think about AI access the way they think about filing cabinets: the more you put in, the more complete the record. But retrieval isn't filing. The discipline of scoping forces a question you might otherwise skip: what is this agent actually for? Answering that clearly is most of the work. The folder permissions are just how you make that answer visible to the software.


Asgeir Albretsen is the founder of Harbor.
