Your hospital bought the wrong AI

In 1995 I became director of a small care home. The regulation was heavy and the paperwork relentless: countless separate records to fill in every morning for the inspectors. The only software on offer, an accounting package plus MS Access and Excel, was never built for the work nurses do on the floor. So I built our own. I sat with the head nurse, the care assistants and the physio to learn how each of them worked, then wrote the forms, the medication modules, the inspection reports we were graded on, and much more.
What made it work has stuck with me for thirty years. Everything sat in one database, and each job had its own purpose-built table: medication rounds, wound care, treatment plans, the daily records the inspectors checked. When the regulator wanted proof, I pulled it up in minutes instead of hiring administrators to chase paper, because the data was already shaped by the task. That kept the salaries at the bedside. (I wrote up that arc here, in Dutch.)
Now look at what hospitals buy. When Ontario audited the medical AI scribes it had approved, most got things wrong: about 60% of them recorded the wrong drug, and nearly half invented details. The models weren't broken. They were doing what one big model does when you ask it to be everything at once.
Most hospitals have spent the past two years buying the wrong shape of AI: one monolithic assistant, when what works is a team of small, specialised agents reading from a single shared (federated) database.
Why one big brain breaks
Ask one model to draft notes, suggest codes, check drug interactions, triage messages and answer patient questions, and its attention thins across a dozen unrelated jobs. Accuracy slips and hallucinations rise. Worse, when it gets something wrong there is no record of which step failed, which is the first thing a safety review asks for. A black-box AI that confidently writes down the wrong drug is worse than no tool at all, because you cannot even see where it went wrong.
The swarm that's already shipping
The alternative is less glamorous and much harder to fool: ten small agents per clinician, each tuned to one narrow job, with its own guardrails and audit log. One drafts the note, another suggests codes, others handle triage, prior authorisation, lab interpretation and drug-interaction checks. Each does one thing, and you can see exactly what it did.
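The swarm described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's design: each `NarrowAgent` owns exactly one task, wraps a hypothetical `run` function that stands in for a small model, applies its own guardrail before acting, and appends every call to its own log. The `Team` router refuses any task no agent owns. All agent names, tasks and guardrails here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AuditEntry:
    agent: str
    task: str
    payload: str
    output: str
    at: str  # UTC timestamp, ISO 8601


def accept_all(payload: str) -> bool:
    """Default guardrail: accept any input (each agent can supply its own)."""
    return True


@dataclass
class NarrowAgent:
    """One agent, one job. `run` is a hypothetical stand-in for a small model."""
    name: str
    task: str
    run: Callable[[str], str]
    guard: Callable[[str], bool] = accept_all
    log: list[AuditEntry] = field(default_factory=list)

    def handle(self, payload: str) -> str:
        if not self.guard(payload):
            output = "REFUSED: input outside this agent's remit"
        else:
            output = self.run(payload)
        # Every call, refused or not, lands in this agent's own log.
        self.log.append(AuditEntry(self.name, self.task, payload, output,
                                   datetime.now(timezone.utc).isoformat()))
        return output


class Team:
    """Routes each task to the single agent that owns it."""

    def __init__(self, agents: list[NarrowAgent]):
        self.by_task = {agent.task: agent for agent in agents}

    def dispatch(self, task: str, payload: str) -> str:
        if task not in self.by_task:
            raise ValueError(f"no agent owns task {task!r}")
        return self.by_task[task].handle(payload)


drafter = NarrowAgent("note-drafter", "draft_note", lambda text: f"Draft: {text}")
triage = NarrowAgent("triage", "triage_message",
                     run=lambda text: "route to nurse",
                     guard=lambda text: len(text) < 500)  # hypothetical guardrail
team = Team([drafter, triage])

team.dispatch("draft_note", "Patient reports mild headache since Monday.")
team.dispatch("triage_message", "Can I take my tablets with food?")
for entry in drafter.log + triage.log:
    print(entry.agent, "|", entry.task, "|", entry.output)
```

Keeping one log per agent is what makes "which step failed?" answerable: you look in one narrow place. A real deployment would presumably write these entries to the shared database rather than to in-memory lists.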
NVIDIA's own researchers argue that small language models are the future of agentic AI: capable enough for narrow, repetitive jobs, ten to thirty times cheaper to run, and small enough to sit on local hardware rather than in a data centre. A simple safeguard strengthens the case: add a second agent to check the first, and production hallucinations fall below 1%, roughly the human rate, because a model catches another model's mistakes far better than its own.
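The second-agent check can be illustrated with a deliberately simple stand-in. Here a rule-based checker, a hypothetical substitute for the second model described above, compares the drugs named in a drafted note against those in the source transcript and holds the note if it names a drug the patient never mentioned. The tiny `KNOWN_DRUGS` set and the function names are invented for illustration, and this sketch makes no claim about the hallucination rates cited above.

```python
import re
from dataclasses import dataclass

# Hypothetical, tiny drug list for illustration only; a real checker would use
# a proper drug vocabulary and, per the article, a second model.
KNOWN_DRUGS = {"metformin", "metoprolol", "amoxicillin", "warfarin"}


@dataclass
class CheckResult:
    approved: bool
    unsupported: list[str]  # drugs in the draft that the transcript never mentions


def drugs_mentioned(text: str) -> set[str]:
    """Return the known drug names that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & KNOWN_DRUGS


def check_note(transcript: str, draft_note: str) -> CheckResult:
    """Checker agent: flag any drug in the draft that has no support in the transcript."""
    unsupported = sorted(drugs_mentioned(draft_note) - drugs_mentioned(transcript))
    return CheckResult(approved=not unsupported, unsupported=unsupported)


transcript = "Patient takes metformin twice daily; no other medication."
draft = "Continue metoprolol twice daily."  # the drafter swapped the drug
print(check_note(transcript, draft))
# CheckResult(approved=False, unsupported=['metoprolol'])
```

This only catches invented drugs, not omitted ones. The design point is that the check runs separately from the drafter and returns a reason a reviewer can read, rather than a silent correction.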
My care home stumbled onto this in the 1990s: specialise the work, and give each task its own table on one shared database. That is the part worth copying. When every agent reads and writes its own narrow table, the schema itself becomes the audit trail: you can see which agent touched which record, and when, without reconstructing it afterwards. One shared database does not have to mean one physical monolith, though. It can be distributed across systems and still act as a single logical, auditable source of truth. Monolithic reasoning is what fails in a clinic; one well-structured place to store what every agent did is what makes the team safe to run. The same shape was on display at NVIDIA's 2026 healthcare event: IQVIA showed a platform already running more than 150 specialised agents, and the design everyone talked about was a hospital digital twin on shared data.
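The care-home principle can be sketched with Python's built-in `sqlite3` module. The table names, columns and agent names below are hypothetical. The idea is the one in the paragraph above: one narrow table per task, one shared database, and the same provenance columns (`agent`, `written_at`) on every row, so a single query shows which agent touched which record, and when. For brevity everything sits in one in-memory SQLite database rather than a federated setup.

```python
import sqlite3

# Hypothetical schema: one narrow table per task, all in one database.
# Every row carries the same provenance columns (agent, written_at).
SCHEMA = """
CREATE TABLE medication_round (
    id INTEGER PRIMARY KEY,
    patient_id TEXT NOT NULL,
    drug TEXT NOT NULL,
    dose TEXT NOT NULL,
    agent TEXT NOT NULL,
    written_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE wound_care (
    id INTEGER PRIMARY KEY,
    patient_id TEXT NOT NULL,
    observation TEXT NOT NULL,
    agent TEXT NOT NULL,
    written_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def trail_for_patient(conn: sqlite3.Connection, patient_id: str) -> list[tuple]:
    """Combine the task tables into one ordered list of who wrote which record."""
    rows = conn.execute(
        """
        SELECT 'medication_round' AS tbl, id, agent, written_at
          FROM medication_round WHERE patient_id = ?
        UNION ALL
        SELECT 'wound_care', id, agent, written_at
          FROM wound_care WHERE patient_id = ?
        ORDER BY written_at, tbl, id
        """,
        (patient_id, patient_id),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO medication_round (patient_id, drug, dose, agent) VALUES (?, ?, ?, ?)",
    ("p-001", "metformin", "500 mg", "med-round-agent"),
)
conn.execute(
    "INSERT INTO wound_care (patient_id, observation, agent) VALUES (?, ?, ?)",
    ("p-001", "sacral wound, healing", "wound-care-agent"),
)
for row in trail_for_patient(conn, "p-001"):
    print(row)  # (table, record id, agent, timestamp)
```

This records who wrote each row, not later edits. Keeping the task tables append-only, or adding an update log, would be needed before treating it as a full audit trail.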
Why narrow and auditable wins
A clinician will trust a handful of narrow tools, each of which has proved itself, long before trusting one chatbot that claims to do everything. A hospital tech committee should treat a multi-year, all-in-one contract as a risk, because it locks the building into one architecture just as the field moves past it. Vendors feel it too: the moat is moving from the biggest model to the most auditable team on the cleanest data. And a regulator, or a patient's family, will find a logged chain of small steps far easier to certify, and to explain when it goes wrong, than a black box.
| One big brain | Ten small agents, one shared database |
|---|---|
| One model juggles everything | Each agent owns one task |
| "The AI said so", no trail | Every step has its own audit log |
| Confidence you can't question | Reliability you can check |
More = better?
More agents is not automatically better. New benchmarks show many multi-agent health systems still choke on messy, real-world data, and the best system on one recent test solved only 28% of its tasks. The bottleneck there is rarely the model; it is the data underneath and the discipline to scope each agent to one job. (There is a second, equally hard problem about whether anyone on the floor has been trained to use these systems, but that one deserves its own piece.)
Intelligence was the easy part. The hard part is the plumbing: give each agent a narrow remit and keep every decision on a database you can audit. Get that right and you'll trust it at 3am on a short-staffed ward, which is the only test that counts.
Next experiment: if you sit on a hospital tech committee, before you renew or sign any "AI assistant" contract this year, ask the vendor one question. Can you show me the audit trail for one decision, step by step? If the answer is a shrug, you're being sold a system you can't interrogate. Walk away.
💥 May this leave you asking for AI you can question, and a vendor who can show you how it decided.