Scaling a patient-management system without buying a bigger server.
By Scott Li
How we got a 500K-client patient system on AWS back into reasonable response times without upgrading the host — splitting tiers, distributing cold data, and the agent-loop reporting pipeline that replaced template code.
We got a Singapore medical group’s patient-management system back into reasonable response times without taking AWS’s recommendation to upgrade to a dedicated host and a high-performance database tier. The client’s binding constraint was cost. The architecture survived and the lessons compound.
The system is still in production, with modifications. Here’s the working version — same skeleton, three changes that matter.
What the original moves got right
The four moves we made:
- Split the public patient portal from the employee portal. Two separate compute tiers, each scaled and tuned independently.
- Distribute cold data across cheaper auxiliary servers. Logs older than a week, visit data older than six months, reports older than six months — each on its own modest VM.
- An archive automation service. A scheduled .NET service that swept the primary DB and migrated cold rows to the cheap servers, with metadata in the primary so queries could still find them.
- Web-service APIs for cross-server reads. When a primary-tier query needed a cold row, it called a web service against the appropriate auxiliary, returning real-time results.
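The metadata trick in the third and fourth moves is the load-bearing one: the primary keeps a stub per archived row saying which auxiliary holds it, so a read resolves transparently. A minimal sketch of that lookup pattern, with all names and data hypothetical:

```python
# Sketch of the archive-metadata lookup pattern (all names and rows hypothetical).
# The primary DB keeps a stub per archived record: record_id -> auxiliary name.
# A read checks the hot table first, then follows the stub to the right tier.

HOT_ROWS = {101: {"visit": "2024-11-02", "notes": "follow-up"}}
ARCHIVE_INDEX = {872: "visit-archive"}          # metadata kept on the primary
AUXILIARIES = {
    "visit-archive": {872: {"visit": "2023-01-15", "notes": "initial consult"}},
}

def fetch_visit(record_id):
    """Return a visit row, transparently resolving archived records."""
    if record_id in HOT_ROWS:
        return HOT_ROWS[record_id]
    aux = ARCHIVE_INDEX.get(record_id)
    if aux is None:
        raise KeyError(record_id)
    # In production this hop was a web-service call (later a linked-server read).
    return AUXILIARIES[aux][record_id]
```

Callers never learn whether a row was hot or archived, which is what let us swap the transport later without touching query sites.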
The architecture survived a near-tripling of the user base. Declining the dedicated-host upgrade saved the client roughly 60% on infrastructure cost over that window, relative to the upgrade path AWS had recommended, on the same workload.
What we changed
1. Replaced the bespoke web-service APIs with PolyBase / linked queries
We initially wrote a small .NET web-service layer because cross-server SQL Server queries (linked servers) had a reputation for being unstable. We became less ideological about this later and consolidated onto SQL Server PolyBase / linked-server reads for the cold-data tiers. The custom .NET layer became one fewer thing to deploy.
This is a case where the boring built-in tool was, in fact, the right tool.
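For readers unfamiliar with linked servers: the consolidation works because SQL Server's four-part naming lets the primary query an auxiliary's table directly. A hypothetical query builder (server, database, and table names are illustrative, not the client's):

```python
# Hypothetical helper: build a four-part-name SELECT against a linked server,
# so a primary-tier query can read a cold row on an auxiliary directly.
# SQL Server four-part naming is: [server].[database].[schema].[table].
def linked_query(linked_server, database, table, record_id):
    return (
        f"SELECT * FROM [{linked_server}].[{database}].[dbo].[{table}] "
        f"WHERE record_id = {int(record_id)}"  # int() as a crude injection guard
    )
```

In practice you would parameterise the query through your data-access layer rather than interpolate, but the four-part name is the whole trick: the auxiliary looks like just another schema.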
2. Moved the archive service to AWS Step Functions + Lambda
The archive service originally ran as a Windows scheduled task on the primary server. It worked. It also concentrated a fragile job on the primary host. We re-implemented it as a Step Functions workflow with Lambda steps and S3 staging. The primary host got slightly faster; the archive workflow became visibly retryable.
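The selection logic each sweep step runs is simple age partitioning against the cutoffs above (one week for logs, six months for visits and reports). A sketch of that step, with the function and field names hypothetical; a real Lambda step runs this as a WHERE clause and stages the cold rows to S3 before deleting them from the primary:

```python
from datetime import date, timedelta

# Sketch of the archive sweep's selection step (names hypothetical; cutoffs
# from the article: logs > 1 week, visits and reports > 6 months).
def partition_by_age(rows, cutoff_days, today):
    """Split rows into (hot, cold) by a created-date cutoff."""
    cutoff = today - timedelta(days=cutoff_days)
    hot = [r for r in rows if r["created"] >= cutoff]
    cold = [r for r in rows if r["created"] < cutoff]
    return hot, cold
```

Making the cutoff an input rather than a constant is what let one workflow serve all three archive tiers.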
3. Read replicas, finally
We initially didn’t add read replicas because the client wanted to avoid the licence cost of a second SQL Server. The client grew enough that the licence cost was a smaller proportion of the bill, and we added a read replica for the heavy-report queries. Reports moved off the primary entirely.
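Moving reports off the primary was a routing decision, not a query rewrite. A minimal sketch of the connection router, with hypothetical connection strings (`ApplicationIntent=ReadOnly` is the SQL Server keyword for read-only routing):

```python
# Hypothetical connection router: heavy report reads go to the replica,
# everything else stays on the primary. Connection strings are illustrative.
PRIMARY = "Server=sql-primary;Database=clinic"
REPLICA = "Server=sql-replica;Database=clinic;ApplicationIntent=ReadOnly"

def connection_for(workload):
    """Route 'report' workloads to the read replica; all else to the primary."""
    return REPLICA if workload == "report" else PRIMARY
```

Centralising the choice in one function meant the report pipeline switched over in a single deploy.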
What AI changed
Two changes worth calling out.
Reports moved to an agent loop
We ported the patient-summary report generation from a static-template pipeline to an agent loop. The agent reads the patient record, the latest assays, and recent visit notes, then writes a summary report for the attending clinician, with every fact in the summary linked back to its source row. Roughly:
- A clinician requests a report; the request goes onto a queue.
- An agent picks up the request, reads the relevant rows from the read replica, drafts the summary.
- A second agent fact-checks every claim against the underlying rows, flagging anything that doesn’t trace back.
- A clinician reviews and signs.
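The loop above can be sketched with the model calls stubbed out (all names hypothetical; the real drafter and fact-checker are inference calls). The property that matters is the one the second agent enforces: every claim must trace to a source row.

```python
# Two-agent report loop, model calls replaced with stand-ins (names hypothetical).
# Invariant enforced: every claim in the draft traces back to a source row.

def draft_report(rows):
    # Stand-in for the drafter agent: one claim per source row, tagged with its id.
    return [{"claim": r["fact"], "source_id": r["id"]} for r in rows]

def fact_check(claims, rows):
    # Stand-in for the fact-checker agent: flag any claim whose source id is
    # missing, or whose text does not match the underlying row.
    by_id = {r["id"]: r for r in rows}
    flagged = [
        c for c in claims
        if c["source_id"] not in by_id or by_id[c["source_id"]]["fact"] != c["claim"]
    ]
    return flagged  # empty list => cleared for clinician review and sign-off

rows = [{"id": 1, "fact": "HbA1c 6.1% on 2024-10-01"}]
claims = draft_report(rows) + [{"claim": "BP 120/80", "source_id": 99}]  # one untraceable claim
```

Anything the fact-checker flags blocks the report from reaching the clinician's review queue.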
We removed about 4,000 lines of template code and a few hundred lines of edge-case handling. We also added an eval harness for the agent loop and a dedicated incident-response runbook for it (a separate piece of work; IMDA’s Model Governance Framework for Agentic AI is worth a read for anyone building one of these).
Cost monitoring became a first-class signal
Originally we monitored CPU, memory, IOPS. The agent-loop reports run on inference, which is a new cost line with its own variance. We now monitor:
- Inference cost per report, per model version.
- Latency per report, with separate alerts for tail.
- Eval pass rate on the fact-checker, weekly.
A correct report generated for thirty dollars of inference is, in product terms, a wrong report. Cost is part of the eval.
Architecture today
public patient portal ──► public-tier ASP.NET ──┐
                                                │
employee portal ──────► internal-tier .NET ─────┤
                                                ▼
                                      primary MS SQL (hot)
                                                │
                                     read replica (reports)
                                                │
                ┌───────────────────────────────┼───────────────────────────────┐
                ▼                               ▼                               ▼
           log archive                   visit archive                   report archive
            (>1 week)                     (>6 months)                     (>6 months)
                ▲                               ▲                               ▲
                └─────────────── Step Functions archive workflow ───────────────┘
Report generation:
clinician request ──► queue ──► drafter.agent ──► fact-checker.agent ──► clinician sign-off
                                                          │
                                                    eval harness
The instinct: scale by separating, then by distributing, then by replicating, then by upgrading the host — in that order. Most teams reach for the upgrade first. It is rarely the cheapest move.
— Scott Li, wGrow