FIELD NOTE №01 · § 06
Field note · Essay

What we learned generating a 700-document synthetic legal corpus.

The first thing a Knowledge Director asks when I show them FirmMemory is whether the demo is running on their data. It is not. It is running on nobody's data: Holborn Carter LLP doesn't exist. This note is about why a fake law firm was the right way to build a real product.

Published: 21 May 2026
Reading time: 14 min
Author: Zubin Rajasekar
Filed under: Synthetic corpus · Trust
Fig 00 · The corpus, stacked · Holborn Carter LLP · May 2026
§ 01 — Opening

Why a fake law firm was the right way to build a real product.

The second thing a Knowledge Director asks, after I say no, is whose data it is running on. The 114 matters in the demo, the 739 documents behind them, the 15,084 paragraphs we index and retrieve from: all of it was generated. It is a synthetic UK law firm, built on purpose, designed to behave the way a real mid-market FS-led City firm would behave, so that I could test FirmMemory end-to-end without ever touching a real firm's privileged work.

This note is about that decision: the one I get the most pushback on, from the most well-intentioned places, and the one I would make again on the next build, only harder.

§ 02 — The question

The question I was actually trying to answer.

I started this project with a thesis: the unsolved problem in legal AI is not generation but trust under retrieval. The lawyer's question is rarely "what is the law?" It is "what has my firm said about this before, where is it, and is what I am about to advise consistent with it?" That problem is not solved by giving a model access to Westlaw. It is solved by giving a model access to the firm's own institutional memory, then constraining the model so heavily that it can only return claims anchored in that memory and must be honest about the parts of the question the memory cannot answer.

To prove that thesis, I needed a system testable under exactly the conditions a real firm would care about. Specifically:

  • Hero queries with citations to paragraphs in specific documents.
  • Graceful, visible refusal on questions the corpus cannot support.
  • Known-but-empty matters surfaced rather than confabulated around.
  • An adversarial evaluation including questions designed to make the system fail.

You cannot test any of those properties without a corpus that has a known shape. You cannot have a known shape if you are running on someone else's data.

I needed to know what the corpus contained, deliberately, in order to know whether the system was telling the truth about what the corpus contained.

A real firm's matter archive is opaque. You can run a question through it and get an answer back, but you have no way to grade the answer. You can only grade an answer if you know, in advance, what should have been retrieved and what shouldn't have been. So I built a firm where I knew.

§ 03 — The build

What I actually built.

Holborn Carter LLP is a mid-market UK law firm founded in 1998, headquartered in Holborn, with offices in Manchester and Bristol, ~155 fee-earners, ~32 partners, and ~£62M revenue. It is built around financial services regulatory work, with corporate/M&A as the engine, plus tax, real estate, and employment. It has twelve named partners and a corpus-wide house style: paragraphs of 60–140 words, no bullet points in standard memos. The partners write like senior solicitors at a serious City firm.

From the firm profile, I generated a matter ledger: 108 real matters plus 6 deliberate gap matters, distributed across five practices and eight years (2017–2024). Each matter has an ID, a codename, lead and secondary partners, regulatory themes, cross-references, and a one-paragraph note on what made it distinctive. The ledger was generated deterministically with a fixed random seed.
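
As a minimal sketch of what "generated deterministically with a fixed random seed" can look like, the snippet below builds a 114-row ledger from a private, seeded generator. The practice labels follow the five practices named above; the seed value, the `HC-0001` ID format, and the `Matter` fields are hypothetical and are not the ones used for Holborn Carter.

```python
import random
from dataclasses import dataclass

# The five practices and the 2017-2024 range come from the note.
# The label spellings, seed value and ID format are illustrative assumptions.
PRACTICES = ["fs_regulatory", "corporate_ma", "tax", "real_estate", "employment"]
YEARS = list(range(2017, 2025))
SEED = 1998


@dataclass(frozen=True)
class Matter:
    matter_id: str
    practice: str
    year: int
    status: str  # "real" or "gap"


def build_ledger(n_real: int = 108, n_gap: int = 6, seed: int = SEED) -> list[Matter]:
    """Return the same ledger every time for a given seed."""
    rng = random.Random(seed)  # private generator, independent of global random state
    ledger = []
    for i in range(n_real + n_gap):
        ledger.append(
            Matter(
                matter_id=f"HC-{i + 1:04d}",  # hypothetical ID format
                practice=rng.choice(PRACTICES),
                year=rng.choice(YEARS),
                status="real" if i < n_real else "gap",
            )
        )
    return ledger
```

Because the generator is seeded and private to the function, two runs produce identical ledgers, which is what lets every later grading step refer to a fixed, known set of matters.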

Then, from the ledger, I generated documents. There are twenty document types, with a per-matter menu defined in YAML. Each document was one API call to Claude Sonnet 4.6, with a system prompt encoding house style and retrieval-aware paragraph rules: every paragraph must be intelligible in isolation.
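
Some of the house-style rules can be checked mechanically after generation. The sketch below is one hypothetical way to do that, not the pipeline's actual validator: it checks the 60–140 word range and the absence of bullet markers. Whether a paragraph is "intelligible in isolation" is a judgement this code does not attempt.

```python
import re

MIN_WORDS, MAX_WORDS = 60, 140  # paragraph length range stated in the note
# Lines that open with a common bullet marker (-, * or •) followed by a space.
BULLET = re.compile(r"^\s*[-*\u2022]\s+")


def check_paragraph(text: str) -> list[str]:
    """Return the house-style violations found in one paragraph (empty if none)."""
    problems = []
    n_words = len(text.split())
    if not MIN_WORDS <= n_words <= MAX_WORDS:
        problems.append(f"word count {n_words} outside {MIN_WORDS}-{MAX_WORDS}")
    if any(BULLET.match(line) for line in text.splitlines()):
        problems.append("contains a bullet point")
    return problems
```

A check like this would only flag a paragraph for review or regeneration; it says nothing about whether the substance reads like a senior solicitor wrote it.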

  • Documents: 739 (~15,084 paragraphs indexed)
  • Matters: 114 (108 real · 6 gap)
  • Generation: 6h (sequential bulk run)
  • API cost: ~£30 (prompt caching on input)
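
Dividing those headline figures through gives a rough sense of scale per document. The inputs are the ones stated above; "6h" and "~£30" are rounded, so the per-document time and cost are approximations rather than measured values.

```python
documents = 739
paragraphs = 15_084
real_matters = 108                  # the 6 gap matters have no documents
generation_seconds = 6 * 60 * 60    # "6h", treated as exact for illustration
api_cost_gbp = 30.0                 # "~£30", approximate

paragraphs_per_document = paragraphs / documents
documents_per_real_matter = documents / real_matters
seconds_per_document = generation_seconds / documents
pence_per_document = 100 * api_cost_gbp / documents

print(f"{paragraphs_per_document:.1f} paragraphs per document")      # 20.4
print(f"{documents_per_real_matter:.1f} documents per real matter")  # 6.8
print(f"{seconds_per_document:.1f} seconds per document")            # 29.2
print(f"{pence_per_document:.1f} pence per document")                # 4.1
```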

We embedded every paragraph using BGE-large-en-v1.5, running locally, 1024 dimensions, stored in PostgreSQL with pgvector. The embedding step is local on purpose, because the entire trust architecture of FirmMemory rests on the firm's content never leaving the deployment perimeter. Building the synthetic corpus the same way taught us, in advance, every operational quirk that real firms will hit when they ingest theirs.
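
A minimal sketch of that storage layout, assuming the `sentence-transformers` package for the local model and a hypothetical `paragraphs` table. Only the model, the 1024 dimensions, and the PostgreSQL + pgvector store come from the note; the table, column names, and helper functions are illustrative.

```python
# Hypothetical schema; the real FirmMemory tables are not described in the note.
EMBEDDING_DIM = 1024
MODEL_NAME = "BAAI/bge-large-en-v1.5"

SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS paragraphs (
    id          bigserial PRIMARY KEY,
    matter_id   text NOT NULL,
    document_id text NOT NULL,
    body        text NOT NULL,
    embedding   vector({EMBEDDING_DIM}) NOT NULL
);
"""

# <=> is pgvector's cosine-distance operator; smaller means closer.
SEARCH_SQL = """
SELECT matter_id, document_id, body, embedding <=> %s::vector AS distance
FROM paragraphs
ORDER BY distance
LIMIT %s;
"""


def to_pgvector_literal(vector) -> str:
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def embed_locally(texts):
    """Embed paragraphs with a locally loaded model rather than a hosted API."""
    # Imported lazily so the rest of the sketch runs without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)  # loads weights from the local cache
    return model.encode(texts, normalize_embeddings=True)
```

One operational point this makes visible: the model weights have to be present inside the perimeter before the first run, since a fresh `SentenceTransformer` load would otherwise try to download them.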

§ 04 — Hero matters

The three planted matters.

A real firm doesn't have a uniform corpus. It has hero matters and a long tail of routine work. So I planted three. Project Meridian (2021–22) is a Saudi sovereign acquisition of a 340MW UK solar portfolio, £285M. Project Falcon (2023) is a UAE family office acquisition of a 600MW Celtic Sea offshore wind asset, £180M. Project Lyra (2024) is a Qatari acquisition of a UK battery storage portfolio, £210M.

Each carries a specific regulatory load (NSI Act 2021, FCA s.178 change of control, EU Foreign Subsidies Regulation, Crown Estate consents). Falcon cross-references Meridian; Lyra cross-references both. Lyra's novel-area research memo flags the EU Foreign Subsidies Regulation as the firm's first substantive engagement with that regime.

When you ask the system whether the firm has advised on cross-border M&A involving a Middle Eastern sovereign fund acquiring UK renewable energy assets, the synthesis layer returns paragraphs from all three matters, names them by codename, lists common hurdles, and flags the EU FSR as a novel point addressed only on the most recent matter. Every claim is cited. The bench block shows matters by deal value and lead partner. The gaps block names what the corpus does not cover.

I built the answer first, by hand, in the firm profile. Then I generated the corpus that produces the answer. Then I built the retrieval and synthesis pipeline that returns it. The work was forward-engineered from the outcome I needed to demonstrate.

§ 05 — Gap matters

The six gap matters, and why they exist.

A real firm has not done everything. A system that confabulates around those gaps is worse than useless. The whole product hinges on getting this right. So the corpus had to test it.

I named six gap matters in the ledger (Aurora, Bramble, Cassiopeia, Daffodil, Ember, Goldfinch), with real matter IDs and metadata. They appear as referenced but absent: other matters cross-reference them, but no documents are generated. The ledger marks them as status='gap'.

A demo question about Russian inbound investment returns Project Aurora by name, with its matter ID, and an explicit statement that the firm has a record of the matter but no documents in the system. A question that should match no matter returns a clean refusal without any matter dressed up to look like a precedent.
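
The three behaviours described above (answer with citations, known-but-empty, clean refusal) can be sketched as a small decision function. This is an illustration under stated assumptions, not FirmMemory's synthesis layer: the `Hit` type, the `respond` function, and the idea that gap matters are matched through ledger metadata are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    matter_id: str
    codename: str
    status: str        # "real" or "gap", as marked in the ledger
    paragraphs: tuple  # retrieved paragraph IDs; always empty for a gap matter


def respond(hits: list[Hit]) -> dict:
    """Pick one of three response modes from the matters a query matched."""
    cited = [h for h in hits if h.status == "real" and h.paragraphs]
    if cited:
        return {"mode": "answer", "matters": [h.codename for h in cited]}
    gaps = [h for h in hits if h.status == "gap"]
    if gaps:
        # Known-but-empty: name the matter and say plainly there are no documents.
        return {
            "mode": "known_but_empty",
            "matters": [h.codename for h in gaps],
            "message": "The firm has a record of this matter but no documents in the system.",
        }
    # Nothing matched: refuse rather than promote a weak neighbour to precedent.
    return {"mode": "refuse", "matters": []}
```

For example, `respond([Hit("HC-0109", "Aurora", "gap", ())])` takes the known-but-empty branch and names Aurora, while `respond([])` refuses. The matter ID here is made up for the example.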

The gap is the trust signal. Not the answer.

§ 06 — Evaluation

The thirty queries.

I built an evaluation kit of thirty queries across five categories: hero, mid, gap, out-of-scope, and adversarial. The kit is graded on six axes per query: right matter at rank one, synthesis cites real paragraphs, opening lands cleanly, bench block honest, gaps block correct, overall response credible.

On the canonical run that closed our retrieval-and-synthesis checkpoint, the system scored 29 out of 30. The one failure was on the Russian inbound investment query: the system correctly identified that the firm had no Russian work, but its opener named an adjacent UAE matter rather than naming Project Aurora by codename. Documented, scoped for the next pass, and shipped anyway.

I think that 29/30 number is more honest than 30/30 would have been. A system that passes every query I designed for it is a system I have over-fit.
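
A sketch of how a six-axis grade sheet can roll up into a score like 29/30. The axis names follow the six listed above. The rule that a query passes only when every axis passes is an assumption, since the note does not say how the headline number is computed, and the choice of failed axis in the test data is illustrative.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Grade:
    right_matter_at_rank_one: bool
    synthesis_cites_real_paragraphs: bool
    opening_lands_cleanly: bool
    bench_block_honest: bool
    gaps_block_correct: bool
    overall_response_credible: bool

    def failed_axes(self) -> list[str]:
        """Names of the axes this query failed, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passed(self) -> bool:
        # Assumption: one failed axis fails the whole query.
        return not self.failed_axes()


def score(grades: list[Grade]) -> tuple[int, int]:
    """Return (queries passed, queries graded)."""
    return sum(g.passed() for g in grades), len(grades)
```

Keeping `failed_axes` alongside the headline score means a run records which axis each failed query missed, not just that it failed.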

§ 07 — Limits

What this corpus is not.

  • Not training data. No FirmMemory deployment is trained on Holborn Carter. The corpus was built for evaluation and demonstration only.
  • Not legal advice. The documents read like internal work product; the substance is fiction.
  • Not a model of the legal industry. The methodology generalises; this specific corpus does not.
  • Not a substitute for ingesting a real firm. The demo is the demo. The real product is what we deploy into your perimeter, on your matters, behind your security wall.

§ 08 — What it taught us

What it taught me.

Three things, in order of how much they will shape the next eighteen months of work.

  • First: synthetic corpora are not a compromise. They are the only honest way to build and ship a knowledge system legible to a regulator's risk function.
  • Second: the corpus is the product as much as the code is. Every retrieval improvement came out of staring at the corpus and asking what kind of document this query needs to find.
  • Third: building a fake firm taught me what a real firm needs. The questions we ask in discovery are the questions I had to answer for myself when I built Holborn Carter.

A real firm has a real Holborn Carter buried in its document management system. It just hasn't been written down. Building one from scratch taught me what we are looking for when we go and find it.

Next field note: the gap between RAG demos and production legal knowledge systems.

If you want to see Holborn Carter in motion, the live demo is at neuralhue.com. To book a working session, use the contact page.

Bring three partners. Leave with a working diagnostic.

Forty-five minutes. NDA-first. We come prepared with public information about your firm.

You leave with our written read on where FirmMemory would pay back inside a quarter, and a candid answer on whether we're the right partner for that work.