23 September 2025

The Document and the Database Were Always the Same Thing

By Asgeir Albretsen · 5 min read

knowledge-management · markdown · structured-data · ai

You write a note after a call with a colleague, Sarah. You mention she prefers async updates, hates back-to-back scheduling, and is working on a proposal that needs your sign-off by next week. A month later, your AI assistant is helping you prepare for another meeting with her. You paste the note. It reads the words fine. But it doesn't know that "prefers async" is a preference record, or that "Sarah" in your note is the same Sarah in your contacts, or whether the sign-off became a task or got dropped entirely.

The note was readable but not queryable. That's the crack in the floor.

Two storage systems that drifted apart

For most of computing history, documents and databases had genuinely different performance characteristics. A relational database was fast at structured queries and slow at storing prose. A text file was easy to write and impossible to query. Given those constraints, the separation made sense. You stored records in the database and notes in the file.

That tradeoff stopped being technically necessary sometime in the 1990s. But the tools kept the split anyway — not for engineering reasons but because patterns are sticky. Your notes app evolved from the file cabinet. Your CRM evolved from the Rolodex. The assumption that they're categorically different things became invisible, baked into the UI before anyone questioned it.

The person who questioned it seriously, as far as I can tell, was Carsten Dominik. In 2003 he wrote org-mode for Emacs: a plain text system that could hold todos, calendar entries, project notes, and structured tables, all in the same file, readable as prose and operable as data. You could write a paragraph about a project and embed a deadline on the same line. A tool could parse that deadline and build a schedule from it. Nothing in the file changed — the structure was visible to both humans and machines at once.

He called it "organized plain text." He was about twenty years early.

What the dual format actually gives you

The insight at the core of org-mode — and what YAML frontmatter captured partially, and what Obsidian's Dataview plugin now demonstrates at scale — is that the same document can serve two purposes without contradiction. The prose is for the human. The typed fields are for the machine. Neither cancels the other out.

Dataview, which lets users write SQL-like queries over their Markdown vaults, has become one of the most-installed plugins in the Obsidian ecosystem. People are essentially building a database query layer on top of their text files by hand: adding status: active in frontmatter, writing inline fields like [due:: 2025-10-01], then running queries across hundreds of notes. The plugin is a workaround for a missing primitive. People want their prose to be queryable. They've wanted this long enough that they built their own infrastructure to get it.
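To make the pattern concrete, here is a minimal sketch in plain Python of what such a query layer does under the hood. It is not Dataview's implementation or its query language; the function names (parse_note, active_with_due) and the two sample notes are made up for illustration, and the frontmatter parsing is deliberately naive (flat key: value lines only, no real YAML).

```python
import re

# Matches inline fields of the form [key:: value].
INLINE_FIELD = re.compile(r"\[(\w+)::\s*([^\]]+)\]")


def parse_note(text):
    """Collect fields from naive frontmatter and inline [key:: value] markers."""
    fields = {}
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    # Only treat the top of the file as frontmatter if it is properly closed.
    if stripped[:1] == ["---"] and "---" in stripped[1:]:
        end = stripped.index("---", 1)
        for line in lines[1:end]:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        lines = lines[end + 1:]
    body = "\n".join(lines)
    for key, value in INLINE_FIELD.findall(body):
        fields[key] = value.strip()
    return fields


def active_with_due(notes):
    """Return (note name, due date) for active notes, earliest due date first."""
    result = []
    for name, text in notes.items():
        fields = parse_note(text)
        if fields.get("status") == "active" and "due" in fields:
            result.append((name, fields["due"]))
    return sorted(result, key=lambda row: row[1])


# Hypothetical vault with two notes.
notes = {
    "proposal.md": "---\nstatus: active\n---\nSarah's proposal needs sign-off [due:: 2025-10-01].",
    "archive.md": "---\nstatus: done\n---\nOld notes, nothing pending.",
}

rows = active_with_due(notes)
print(rows)
```

Everything here is a string. The query works, but nothing stops one note from saying due:: next week, which is exactly the gap the next section is about.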

The limitation of Dataview, and of frontmatter in general, is that structure lives at the top of the file or in scattered inline conventions. There's no schema enforcement, no type checking, no guarantee that person:: Sarah Kim in one file refers to the same person as a person:: field holding her email address in another. It works until it doesn't, and when it breaks, it breaks silently.
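A rough sketch of what the missing enforcement could look like, assuming a hand-written schema and an alias table. The names ALIASES, SCHEMA, and validate, the canonical ID format, and the address sarah@example.com are all hypothetical; none of this is part of Dataview or any particular tool.

```python
from datetime import date

# Hypothetical alias table: every known spelling maps to one canonical ID.
ALIASES = {
    "sarah kim": "person:sarah-kim",
    "sarah@example.com": "person:sarah-kim",
}


def resolve_person(raw):
    """Map a name or email to a canonical person ID, or fail loudly."""
    key = raw.strip().lower()
    if key not in ALIASES:
        raise ValueError(f"unknown person: {raw!r}")
    return ALIASES[key]


def validate_status(raw):
    """Accept only the status values this sketch knows about."""
    if raw not in {"active", "done"}:
        raise ValueError(f"unknown status: {raw!r}")
    return raw


# Assumed schema: field name -> function that parses the value or raises ValueError.
SCHEMA = {
    "person": resolve_person,
    "due": date.fromisoformat,
    "status": validate_status,
}


def validate(fields):
    """Return (typed fields, list of error messages) instead of failing silently."""
    clean, errors = {}, []
    for key, raw in fields.items():
        check = SCHEMA.get(key)
        if check is None:
            errors.append(f"{key}: not in schema")
            continue
        try:
            clean[key] = check(raw)
        except ValueError as exc:
            errors.append(f"{key}: {exc}")
    return clean, errors


by_name, name_errors = validate({"person": "Sarah Kim", "due": "2025-10-01"})
by_email, email_errors = validate({"person": "sarah@example.com", "due": "next week"})
```

Both notes now resolve to the same person ID, and the unparseable due date surfaces as an error message rather than as a row that quietly never matches a query.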

The AI changes what's at stake

Before AI tools were involved, the document-database split was a nuisance. You kept things in two places, you searched twice, you occasionally forgot to update one of them. Annoying, not catastrophic.

With AI in the picture, the split becomes a structural failure. An AI that can read your notes but can't run a typed query across them is limited to pattern-matching on text. Ask it what you know about Sarah's communication preferences and it searches for the string "Sarah" and returns adjacent words. That's not retrieval. It's grep with extra steps. But an AI that only has access to structured records, without the surrounding prose, gets facts without nuance. It knows Sarah's job title and birthday. It doesn't know she's navigating a hard quarter or that you owe her a follow-up from July.

The dual representation solves this. Typed entities — people, tasks, preferences — give the AI something it can query with precision. The prose gives it context the structured fields can't hold. A note that contains both a typed person block and three paragraphs of observations is more useful to an AI than either alone.

This is what Harbor is built around. Documents stay readable and portable — plain Markdown on disk, openable in any text editor, exportable as a zip. Underneath, the structured blocks get parsed into SQLite rows that AI can actually query: give me all tasks linked to this person, find notes where this project is mentioned, list preferences with high confidence. The document is the user-facing surface. The database is the query surface. They hold the same data.
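To show the kind of query surface this describes, here is an illustrative sketch using Python's built-in sqlite3 module. The table layout, column names, confidence threshold, and sample rows are assumptions made for this example, not Harbor's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one table per entity type, each row pointing back to its source note.
conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    person_id TEXT REFERENCES person(id),
    title TEXT NOT NULL,
    due TEXT,
    source_note TEXT NOT NULL
);
CREATE TABLE preference (
    id INTEGER PRIMARY KEY,
    person_id TEXT REFERENCES person(id),
    statement TEXT NOT NULL,
    confidence REAL NOT NULL,
    source_note TEXT NOT NULL
);
""")

# Hypothetical rows, as if parsed from typed blocks in a single note.
conn.execute("INSERT INTO person VALUES (?, ?)", ("sarah-kim", "Sarah Kim"))
conn.execute(
    "INSERT INTO task (person_id, title, due, source_note) VALUES (?, ?, ?, ?)",
    ("sarah-kim", "Sign off on proposal", "2025-10-01", "sarah.md"),
)
conn.executemany(
    "INSERT INTO preference (person_id, statement, confidence, source_note) "
    "VALUES (?, ?, ?, ?)",
    [
        ("sarah-kim", "prefers async updates", 0.9, "sarah.md"),
        ("sarah-kim", "dislikes back-to-back scheduling", 0.5, "sarah.md"),
    ],
)


def tasks_for(conn, person_id):
    """All tasks linked to a person, earliest due date first."""
    return conn.execute(
        "SELECT title, due FROM task WHERE person_id = ? ORDER BY due",
        (person_id,),
    ).fetchall()


def confident_preferences(conn, person_id, threshold=0.8):
    """Preference statements at or above a confidence threshold."""
    rows = conn.execute(
        "SELECT statement FROM preference "
        "WHERE person_id = ? AND confidence >= ? ORDER BY confidence DESC",
        (person_id, threshold),
    ).fetchall()
    return [row[0] for row in rows]


print(tasks_for(conn, "sarah-kim"))
print(confident_preferences(conn, "sarah-kim"))
```

The source_note column is the design choice worth noticing: every row can lead the AI back to the prose it came from, which is where the nuance lives.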

The engineering challenge is keeping them in sync. Parse the Markdown when it changes. Validate the typed blocks against a schema. Write back to the document if the structured record is updated elsewhere. That's more plumbing than a pure text file and more flexible than a pure database. The complexity is real.
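A minimal sketch of that loop, again with Python's sqlite3 and under stated assumptions: change detection by content hash, a single generic field table, and inline [key:: value] markers as the only typed blocks. The function names (open_index, sync_note, write_back) are hypothetical, and schema validation is left out to keep the example short.

```python
import hashlib
import re
import sqlite3

FIELD = re.compile(r"\[(\w+)::\s*([^\]]+)\]")


def open_index():
    """Create an in-memory index: one digest per note, one row per typed field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE note (path TEXT PRIMARY KEY, digest TEXT NOT NULL);
        CREATE TABLE field (path TEXT, key TEXT, value TEXT, PRIMARY KEY (path, key));
    """)
    return conn


def sync_note(conn, path, text):
    """Re-parse a note only if its content changed. Returns True if re-indexed."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    row = conn.execute("SELECT digest FROM note WHERE path = ?", (path,)).fetchone()
    if row and row[0] == digest:
        return False  # unchanged since the last sync, nothing to do
    conn.execute("DELETE FROM field WHERE path = ?", (path,))
    conn.executemany(
        "INSERT OR REPLACE INTO field VALUES (?, ?, ?)",
        [(path, key, value.strip()) for key, value in FIELD.findall(text)],
    )
    conn.execute("INSERT OR REPLACE INTO note VALUES (?, ?)", (path, digest))
    return True


def write_back(text, key, new_value):
    """Rewrite one inline field in the prose, leaving everything else untouched."""
    pattern = re.compile(r"\[" + re.escape(key) + r"::\s*[^\]]+\]")
    return pattern.sub(lambda match: f"[{key}:: {new_value}]", text, count=1)


conn = open_index()
note = "Proposal needs sign-off [due:: 2025-10-01]."

first = sync_note(conn, "sarah.md", note)      # new note, gets indexed
second = sync_note(conn, "sarah.md", note)     # same content, skipped
updated = write_back(note, "due", "2025-10-08")
third = sync_note(conn, "sarah.md", updated)   # changed text, re-indexed

due = conn.execute(
    "SELECT value FROM field WHERE path = ? AND key = ?", ("sarah.md", "due")
).fetchone()[0]
```

Even this toy version shows where the plumbing goes: the document stays the source of truth, and the index is something you can throw away and rebuild from the files.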

So is the payoff.

Carsten Dominik's insight from 2003 was that the split between documents and records was a choice made under specific constraints, not a law of the medium. Those constraints are gone. Most software is still designed as though they aren't.

Asgeir Albretsen is the founder of Harbor.
