Ten People, Ten Different Decisions: The Hidden Cost of Manual Claims Processing
Written by Tanja Spasevska · 11 min read · Published June 8, 2026

For about a year I coordinated a team that built and ran a claims-processing engine for Medicare durable medical equipment. Continuous glucose monitors, CPAP machines, the supplies people genuinely depend on to get through their day. Over the life of that project we processed millions of claims. What follows is partly what I lived through and partly what I’d tell anyone weighing this up, because most of what gets written about “AI claims automation” is busy solving the wrong problem.
The error nobody puts in the budget
When people picture manual claims processing going wrong, they picture someone being slow, or fat-fingering a number. That happens. It was never our biggest cost.
Our biggest cost was that no two people processed a claim the same way.
Hand the same claim to ten experienced reviewers and you would get ten slightly different decisions. Not because anyone was careless. Because the policy left room to interpret, and every person filled that room with their own judgment and whatever habits they had picked up from whoever trained them. One reviewer reads a face-to-face note as enough. The next one flags that same note as missing a detail. Both are reasonable. Both can defend it. And they cannot both be right when an auditor shows up asking why.
There is a hard money number underneath all of this, even when nobody writes it down. Every manual claim carries a loaded cost once you add up the time to read the file, key the data, check eligibility, chase the missing document, and pass it along. Call it £30 to £60, or €25 to €50, depending on where you sit. In the Medicare world I came from it ran in dollars and it added up quickly. The uncomfortable part is that most of that money gets spent on claims that were always going to be approved. You are paying experienced people to do data entry on the easy ones.
Two things make it worse than the spreadsheet suggests. The cost climbs in a straight line with volume, because more claims just means more people; there is no leverage in a manual process. And the backlog arrives late. A surge in new members or new product lines doesn’t hit the queue the week you sign them. It lands six to twelve weeks later, once those policies start generating claims, by which point the team that was coping is suddenly underwater and hiring takes another quarter.
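To make the straight-line point concrete, here is a minimal sketch of the arithmetic. The €40 per-claim figure sits inside the range quoted above; the monthly volumes and the 75% share of clean claims are made-up illustration values, not numbers from the project.

```python
def monthly_manual_cost(claims: int, cost_per_claim: float) -> float:
    """Manual cost scales linearly: every extra claim costs the same to handle."""
    return claims * cost_per_claim


def spend_on_clean_claims(claims: int, cost_per_claim: float, clean_share: float) -> float:
    """Portion of the spend that goes to claims that were always going to be approved."""
    return claims * clean_share * cost_per_claim


# Hypothetical inputs: 10,000 claims a month at EUR 40 each, 75% of them clean.
baseline = monthly_manual_cost(10_000, 40.0)
doubled = monthly_manual_cost(20_000, 40.0)
clean_spend = spend_on_clean_claims(10_000, 40.0, 0.75)

print(baseline)     # 400000.0
print(doubled)      # 800000.0 (twice the volume, twice the cost)
print(clean_spend)  # 300000.0 (spent on the easy claims)
```

Swap in your own volume and loaded cost; the shape of the result is what matters, not these particular numbers.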
We spent hours just keeping everyone aligned
So that is where the time actually went. Every week we would sit down and reconcile decisions that should have matched and didn’t. Explain the same eligibility rule for the third time to the same group of smart people. Clean up a whole batch because two reviewers had read one requirement differently and now the numbers wouldn’t tie out.
Before anyone said the word “speed,” before a single claim moved faster, we were burning real hours just keeping the team’s understanding of the rules in sync with each other. That cost never shows up on a report as cost-per-claim. It shows up as a team that is busy from morning to night and a backlog that somehow never gets any smaller.
Automation didn’t make us faster. It made us consistent.
Here is the part that surprised me. When we automated, speed was the headline, but speed turned out to be a side effect. The thing that actually moved was consistency.
The interpretation became one interpretation. We sat in a room, argued out what a rule really meant, once, and wrote it down as a decision the system would apply the exact same way every single time. No drift. No “well, the way I read it.” We had the disagreement on purpose, one time, instead of having it by accident a hundred times a week, scattered across the team.
That is what finally channelled the work. All the energy we used to pour into re-aligning people got pointed at the one thing people are genuinely better at than any machine: the hard, novel cases that no rule covers yet. The system took the thousand routine claims that were all quietly the same claim. The team took the ten that were actually different. The error rate dropped because the variance dropped, and the variance dropped because there was finally one source of truth instead of one per person.
What the system actually did, layer by layer
People say “automate claims” as if it is one thing. It is really four, and skipping any one of them is where most of these projects quietly fall apart. Here is how it broke down for us.
First, it read whatever showed up
Claims arrive as a mess. Claim forms, hospital invoices, referral notes, loss notifications, photos, portal uploads, PDFs and scans of every quality you can imagine. The first job is to read all of it and sort it: what kind of claim is this, which line, which documents are here and which are missing. This is the part that genuinely suits a model. At the volume we ran, you cannot keep a template per form. You need something that can read a document it has never seen before and still pull the right fields off it.
Then it applied rules we owned, not a model’s hunch
This is the layer that decides whether the whole thing is trustworthy or a liability. The tempting shortcut is to let the model decide eligibility too. Don’t. Reading a document and deciding a claim are two different jobs, and blurring them is exactly what turns “AI” into something you cannot defend.
So the model read, and a deterministic rule decided. Rules we had written, that we owned, that our own people could edit without filing a ticket with engineering and waiting on a release. Eligibility, completeness, the coverage logic. The decision was explainable because a named rule fired, not because a model felt confident about it.
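A minimal sketch of what "a named rule fired" can look like in code. Everything here is an assumption for illustration: the `ExtractedClaim` fields, the `Rule` shape, and the rule IDs are hypothetical, and the three rules are placeholders rather than real eligibility or coverage policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ExtractedClaim:
    """Fields the reading layer pulled off the documents (hypothetical shape)."""
    claim_id: str
    member_active: bool
    has_prescription: bool
    has_face_to_face_note: bool


@dataclass(frozen=True)
class Rule:
    """A named, human-editable rule. `check` returns True when the claim passes."""
    rule_id: str
    description: str
    check: Callable[[ExtractedClaim], bool]


# Illustrative rules only; real eligibility and coverage logic would be far richer.
RULES = [
    Rule("ELIG-001", "Member must be active on the date of service", lambda c: c.member_active),
    Rule("DOC-001", "A prescription must be on file", lambda c: c.has_prescription),
    Rule("DOC-002", "A face-to-face note must be on file", lambda c: c.has_face_to_face_note),
]


def first_failed_rule(claim: ExtractedClaim, rules: list[Rule] = RULES) -> Optional[Rule]:
    """Apply rules in a fixed order and return the first one the claim fails, if any."""
    for rule in rules:
        if not rule.check(claim):
            return rule
    return None


claim = ExtractedClaim("C-1", member_active=True, has_prescription=True, has_face_to_face_note=False)
failed = first_failed_rule(claim)
print(failed.rule_id if failed else "all rules passed")  # DOC-002
```

The point of the shape is that the model only fills in `ExtractedClaim`. The decision comes from a rule with an ID and a plain-language description, so the same input always gets the same answer and the answer can be traced to a name.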
It cleared the easy claims and handed us the hard ones
Anything clean and in-policy resolved on its own. Everything else, the genuine exceptions and the cases nobody had seen before, came to a person with the file already pulled together and the missing pieces flagged. Our most experienced reviewers stopped doing data entry and started spending their day on the work only they could do.
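The routing step can be sketched in a few lines. The `ReviewedClaim` shape and the queue names `auto_resolve` and `human_review` are hypothetical, chosen only to illustrate the split between clean claims and exceptions.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewedClaim:
    """Outcome of the rule layer for one claim (hypothetical shape)."""
    claim_id: str
    failed_rules: list[str] = field(default_factory=list)       # IDs of rules that did not pass
    missing_documents: list[str] = field(default_factory=list)  # documents the reader could not find


def route(claim: ReviewedClaim) -> str:
    """Send clean claims straight through; everything else goes to a person."""
    if not claim.failed_rules and not claim.missing_documents:
        return "auto_resolve"
    return "human_review"


queue = [
    ReviewedClaim("C-1"),
    ReviewedClaim("C-2", missing_documents=["face_to_face_note"]),
    ReviewedClaim("C-3", failed_rules=["ELIG-001"]),
]

for c in queue:
    print(c.claim_id, route(c))
# C-1 auto_resolve
# C-2 human_review
# C-3 human_review
```

Because the failed rules and missing documents travel with the claim, the reviewer opens a file that already says what is wrong with it.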
And it logged every decision with a trail
The structured result went back into the core system, and this is the part that earns its keep the first time you get audited: every decision carried a record of which rule fired, what read the document, and what it had in front of it. When a regulator or an internal reviewer asked why a claim went the way it did, we didn’t reconstruct it from memory. We showed them. Whether it’s a UPIC auditor poking at Medicare documentation or an FCA review of claims handling in the UK, that trail is the difference between a quiet afternoon and a bad month.
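As a rough illustration of such a trail, one decision could be recorded like this. The field names, the `document-reader-v1` identifier, and the choice to hash the inputs are assumptions for the sketch, not a description of the production schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(claim_id: str, rule_id: str, outcome: str, reader: str, inputs: dict) -> dict:
    """Build one audit entry: which rule fired, what read the document, and what it saw."""
    # Serialize inputs with sorted keys so the same inputs always produce the same hash.
    canonical = json.dumps(inputs, sort_keys=True)
    return {
        "claim_id": claim_id,
        "rule_id": rule_id,
        "outcome": outcome,
        "reader": reader,
        "inputs": inputs,
        "inputs_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


record = audit_record(
    claim_id="C-2",
    rule_id="DOC-002",
    outcome="human_review",
    reader="document-reader-v1",  # hypothetical identifier for the reading model
    inputs={"has_prescription": True, "has_face_to_face_note": False},
)
print(json.dumps(record, indent=2))
```

Storing a hash next to the inputs is one simple way to show later that the recorded inputs have not been edited since the decision was made.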
It sat beside our core system, not on top of it
The objection I hear most is integration risk. Nobody wants to rip out the core they already run on, and you shouldn’t have to. We didn’t. The engine plugged into the systems we already used and read documents from where they already lived.
At DocuGenius the same principle holds. It connects to Salesforce through a security-reviewed AppExchange app that pushes the claim data and the audit trail straight into your instance. Guidewire, a legacy core, or a regional ERP connect through standard REST, JSON, and CSV, with file-based imports for the older systems that need it. And when data residency actually matters, it runs on-premise in the EU, with a DORA-aligned vendor profile for the procurement and security reviews that come with tier-one buyers. A pilot environment can be live in about a day, running your real claims, your rules, your data, next to the core you already trust.
Where this goes, stage by stage
You don’t need anyone’s case study to work out where you sit today. You need to be honest about where you are. This is roughly the curve, from fully manual to the point where the routine work just runs itself.
| Stage | What it looks like | What it costs you |
|---|---|---|
| 1. Ad hoc | Claims handled ad hoc, no standard process. Backlog managed by overtime. | Cost per claim unknown and rising. Turnaround unpredictable. |
| 2. Structured but manual | A defined process, but every step is human. Spreadsheets and email. | Linear cost. Quality depends on who is on shift that day. |
| 3. Partially automated | OCR or RPA bolted on. Breaks when a form changes. Automation rate stalls below 40%. | Maintenance burden. Brittle. The audit story is thin. |
| 4. Autonomous resolution | Clean claims resolve straight through; only exceptions reach a person. Editable rules, full audit trail. | Cost finally decouples from volume. Same team, far more throughput. |
| 5. Proactive | The system flags missing documents and policy gaps before the claim stalls. Friction is prevented, not processed. | The backlog stops forming. Capacity becomes something you can plan around. |
Most teams I meet are sitting at stage two, or a brittle stage three that breaks every time a form changes. The jump that actually pays for itself is the one from there to stage four. That is where the variance collapses and the hours stop disappearing into reconciliation.
How to tell if you are past the tipping point
I don’t have a glossy case study to hand you, and to be honest I am a little suspicious of the ones people hand me. What I have is a feel for where an operation tips over, the point where adding people stops helping and the inconsistency starts to compound. From what I lived through, you are probably there if a few of these ring true:
- Clean claims still take more than a few days to clear.
- Your reviewers regularly land on different answers for claims that should be straightforward.
- A meaningful share of claims gets escalated “just to be safe.”
- Fewer than half your claims get through without someone reworking them.
- You took on more volume this year without adding the people to match, and everyone can feel it.
Three or more, and you are not behind on technology. You are past the point where more headcount fixes it, because every new hire is one more interpretation of the same rule.
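If it helps to run the checklist across several teams or lines, the "three or more" test is easy to write down. The signal names below are my own shorthand for the five bullets, and the answers are placeholder values.

```python
# The five signals from the checklist above, answered for one operation.
# The answers here are placeholder values for illustration.
signals = {
    "clean_claims_take_more_than_a_few_days": True,
    "reviewers_disagree_on_straightforward_claims": True,
    "claims_escalated_just_to_be_safe": False,
    "fewer_than_half_clear_without_rework": True,
    "volume_grew_without_matching_headcount": False,
}


def past_tipping_point(answers: dict[str, bool], threshold: int = 3) -> bool:
    """True when at least `threshold` of the signals apply (three, per the text)."""
    return sum(answers.values()) >= threshold


print(sum(signals.values()), past_tipping_point(signals))  # 3 True
```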
Start small, then watch your own numbers
At DocuGenius we start small on purpose. One line of business, six to eight weeks, a pilot environment running your real claims within about a day. We measure three things against your own baseline: how many claims flow through without a human touching them, how long they take, and how often they come out wrong. It starts at €500 a month, not a six-figure quote before the first claim has even moved.
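Those three measurements are simple enough to compute yourself. This is a sketch under stated assumptions: the `ClaimOutcome` fields are hypothetical, "wrong" is taken to mean a decision later overturned on review or rework, and the sample data is made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ClaimOutcome:
    """One processed claim in the measurement window (hypothetical shape)."""
    touched_by_human: bool
    cycle_days: float
    decided_wrong: bool  # e.g. overturned on QA review or rework


def pilot_metrics(outcomes: list[ClaimOutcome]) -> dict[str, float]:
    """Straight-through rate, mean cycle time, and error rate for a set of claims."""
    n = len(outcomes)
    return {
        "straight_through_rate": sum(not o.touched_by_human for o in outcomes) / n,
        "mean_cycle_days": mean(o.cycle_days for o in outcomes),
        "error_rate": sum(o.decided_wrong for o in outcomes) / n,
    }


# Made-up sample data, four claims per period.
baseline = [
    ClaimOutcome(True, 6.0, False),
    ClaimOutcome(True, 4.0, True),
    ClaimOutcome(True, 8.0, False),
    ClaimOutcome(True, 6.0, False),
]
pilot = [
    ClaimOutcome(False, 1.0, False),
    ClaimOutcome(False, 1.0, False),
    ClaimOutcome(True, 5.0, False),
    ClaimOutcome(False, 1.0, False),
]

before, after = pilot_metrics(baseline), pilot_metrics(pilot)
for name in before:
    print(f"{name}: {before[name]} -> {after[name]}")
# straight_through_rate: 0.0 -> 0.75
# mean_cycle_days: 6.0 -> 2.0
# error_rate: 0.25 -> 0.0
```

Run the same function over a period before the pilot and a period during it, on the same line of business, so the comparison is against your own baseline rather than anyone else's.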
You don’t have to take my word for any of it. Run it on one line and watch your own variance drop. That is the number that convinced me, and I was the skeptic in the room.
Book a 15-minute call and I will walk through what this looks like for your claim types and your core system.
— Tanja, Growth & Operations, DocuGenius
A few questions I get asked
What does manual claims processing really cost?
Less in typos than you think, and far more in inconsistency. The expensive part is the time spent reconciling decisions that should have matched and re-aligning reviewers on the same rules, week after week. It rarely shows up as cost-per-claim, which is exactly why it goes unmanaged.
Is automated claims processing safe for a regulated, audited environment?
When it is built right, yes. The trick is keeping document-reading separate from decision-making. Let the model read; let a deterministic, auditable rule decide. Every decision keeps an evidence trail of the rule that fired and the inputs it saw, which is what you produce when a reviewer or an auditor asks why.
Does this replace the claims team?
No. It takes the routine claims that are all quietly the same claim, and it hands your people the ones that are genuinely different. The same team handles far more volume and spends its time on judgment instead of data entry and reconciliation.
How long before we see anything?
A pilot runs six to eight weeks on one line, with the environment live in about a day. You measure straight-through rate, cycle time, and error rate against your own baseline. No rip-and-replace of your core system.
What savings are realistic?
The gains came less from raw speed and more from variance collapsing: fewer reworks, fewer escalations, fewer batches to clean up. Rather than quote you someone else’s number, I would suggest running the pilot on one line and measuring the delta yourself.