Scaling a patient-management system without buying a bigger server.
By Scott Li
How we got a 500K-client patient system on AWS back into reasonable response times without upgrading the host — splitting tiers, distributing cold data, and the agent-loop reporting pipeline that replaced template code.
We got a Singapore medical group’s patient-management system back into reasonable response times without taking AWS’s recommendation to upgrade to a dedicated host and a high-performance database tier. The client’s binding constraint was cost. The architecture survived and the lessons compound.
The system is still in production, with modifications. Here’s the working version — same skeleton, three changes that matter.
What the original moves got right
The four moves we made:
- Split the public patient portal from the employee portal. Two separate compute tiers, each scaled and tuned independently.
- Distribute cold data across cheaper auxiliary servers. Logs older than a week, visit data older than six months, reports older than six months — each on its own modest VM.
- An archive automation service. A scheduled .NET service that swept the primary DB and migrated cold rows to the cheap servers, with metadata in the primary so queries could still find them.
- Web-service APIs for cross-server reads. When a primary-tier query needed a cold row, it called a web service against the appropriate auxiliary, returning real-time results.
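The metadata trick in the third and fourth moves is the load-bearing one: the primary keeps a stub per archived row saying which auxiliary holds it, so a read resolves transparently. A minimal sketch of that lookup pattern, with all names and data hypothetical:

```python
# Sketch of the archive-metadata lookup pattern (all names and rows hypothetical).
# The primary DB keeps a stub per archived record: record_id -> auxiliary name.
# A read checks the hot table first, then follows the stub to the right tier.

HOT_ROWS = {101: {"visit": "2024-11-02", "notes": "follow-up"}}
ARCHIVE_INDEX = {872: "visit-archive"}          # metadata kept on the primary
AUXILIARIES = {
    "visit-archive": {872: {"visit": "2023-01-15", "notes": "initial consult"}},
}

def fetch_visit(record_id):
    """Return a visit row, transparently resolving archived records."""
    if record_id in HOT_ROWS:
        return HOT_ROWS[record_id]
    aux = ARCHIVE_INDEX.get(record_id)
    if aux is None:
        raise KeyError(record_id)
    # In production this hop was a web-service call (later a linked-server read).
    return AUXILIARIES[aux][record_id]
```

Callers never learn whether a row was hot or archived, which is what let us swap the transport later without touching query sites.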
The architecture survived a near-tripling of the user base. Declining the dedicated-host upgrade saved the client roughly 60% on infrastructure cost over that window, relative to the upgrade path AWS had recommended, on the same workload.
What we changed
1. Replaced the bespoke web-service APIs with PolyBase / linked queries
We initially wrote a small .NET web-service layer because cross-server SQL Server queries (linked servers) had a reputation for being unstable. We became less ideological about this later and consolidated onto SQL Server PolyBase / linked-server reads for the cold-data tiers. The custom .NET layer became one fewer thing to deploy.
This is a case where the boring built-in tool was, in fact, the right tool.
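For readers unfamiliar with linked servers: the consolidation works because SQL Server's four-part naming lets the primary query an auxiliary's table directly. A hypothetical query builder (server, database, and table names are illustrative, not the client's):

```python
# Hypothetical helper: build a four-part-name SELECT against a linked server,
# so a primary-tier query can read a cold row on an auxiliary directly.
# SQL Server four-part naming is: [server].[database].[schema].[table].
def linked_query(linked_server, database, table, record_id):
    return (
        f"SELECT * FROM [{linked_server}].[{database}].[dbo].[{table}] "
        f"WHERE record_id = {int(record_id)}"  # int() as a crude injection guard
    )
```

In practice you would parameterise the query through your data-access layer rather than interpolate, but the four-part name is the whole trick: the auxiliary looks like just another schema.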
2. Moved the archive service to AWS Step Functions + Lambda
The archive service originally ran as a Windows scheduled task on the primary server. It worked. It also concentrated a fragile job on the primary host. We re-implemented it as a Step Functions workflow with Lambda steps and S3 staging. The primary host got slightly faster; the archive workflow became visibly retryable.
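The selection logic each sweep step runs is simple age partitioning against the cutoffs above (one week for logs, six months for visits and reports). A sketch of that step, with the function and field names hypothetical; a real Lambda step runs this as a WHERE clause and stages the cold rows to S3 before deleting them from the primary:

```python
from datetime import date, timedelta

# Sketch of the archive sweep's selection step (names hypothetical; cutoffs
# from the article: logs > 1 week, visits and reports > 6 months).
def partition_by_age(rows, cutoff_days, today):
    """Split rows into (hot, cold) by a created-date cutoff."""
    cutoff = today - timedelta(days=cutoff_days)
    hot = [r for r in rows if r["created"] >= cutoff]
    cold = [r for r in rows if r["created"] < cutoff]
    return hot, cold
```

Making the cutoff an input rather than a constant is what let one workflow serve all three archive tiers.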
3. Read replicas, finally
We initially didn’t add read replicas because the client wanted to avoid the licence cost of a second SQL Server. The client grew enough that the licence cost was a smaller proportion of the bill, and we added a read replica for the heavy-report queries. Reports moved off the primary entirely.
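Moving reports off the primary was a routing decision, not a query rewrite. A minimal sketch of the connection router, with hypothetical connection strings (`ApplicationIntent=ReadOnly` is the SQL Server keyword for read-only routing):

```python
# Hypothetical connection router: heavy report reads go to the replica,
# everything else stays on the primary. Connection strings are illustrative.
PRIMARY = "Server=sql-primary;Database=clinic"
REPLICA = "Server=sql-replica;Database=clinic;ApplicationIntent=ReadOnly"

def connection_for(workload):
    """Route 'report' workloads to the read replica; all else to the primary."""
    return REPLICA if workload == "report" else PRIMARY
```

Centralising the choice in one function meant the report pipeline switched over in a single deploy.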
What AI changed
Two changes worth calling out.
Reports moved to an agent loop
We ported the patient-summary report generation from a static-template pipeline to an agent loop. The agent reads the patient record, the latest assays, and recent visit notes, then writes a summary report for the attending clinician, with every fact in the summary linked back to its source row. Roughly:
- A clinician requests a report; the request goes onto a queue.
- An agent picks up the request, reads the relevant rows from the read replica, drafts the summary.
- A second agent fact-checks every claim against the underlying rows, flagging anything that doesn’t trace back.
- A clinician reviews and signs.
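The loop above can be sketched with the model calls stubbed out (all names hypothetical; the real drafter and fact-checker are inference calls). The property that matters is the one the second agent enforces: every claim must trace to a source row.

```python
# Two-agent report loop, model calls replaced with stand-ins (names hypothetical).
# Invariant enforced: every claim in the draft traces back to a source row.

def draft_report(rows):
    # Stand-in for the drafter agent: one claim per source row, tagged with its id.
    return [{"claim": r["fact"], "source_id": r["id"]} for r in rows]

def fact_check(claims, rows):
    # Stand-in for the fact-checker agent: flag any claim whose source id is
    # missing, or whose text does not match the underlying row.
    by_id = {r["id"]: r for r in rows}
    flagged = [
        c for c in claims
        if c["source_id"] not in by_id or by_id[c["source_id"]]["fact"] != c["claim"]
    ]
    return flagged  # empty list => cleared for clinician review and sign-off

rows = [{"id": 1, "fact": "HbA1c 6.1% on 2024-10-01"}]
claims = draft_report(rows) + [{"claim": "BP 120/80", "source_id": 99}]  # one untraceable claim
```

Anything the fact-checker flags blocks the report from reaching the clinician's review queue.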
We removed about 4,000 lines of template code and a few hundred lines of edge-case handling. We also added an eval harness for the agent loop and a dedicated incident-response runbook for it (a separate piece of work; IMDA’s Model Governance Framework for Agentic AI is worth a read for anyone building one of these).
Cost monitoring became a first-class signal
Originally we monitored CPU, memory, IOPS. The agent-loop reports run on inference, which is a new cost line with its own variance. We now monitor:
- Inference cost per report, per model version.
- Latency per report, with separate alerts for tail.
- Eval pass rate on the fact-checker, weekly.
A correct report generated for thirty dollars of inference is, in product terms, a wrong report. Cost is part of the eval.
Architecture today
public patient portal ──► public-tier ASP.NET ──┐
                                                │
employee portal ──────► internal-tier .NET ─────┤
                                                ▼
                                      primary MS SQL (hot)
                                                │
                                     read replica (reports)
                                                │
                ┌───────────────────────────────┼───────────────────────────────┐
                ▼                               ▼                               ▼
           log archive                   visit archive                   report archive
            (>1 week)                     (>6 months)                     (>6 months)
                ▲                               ▲                               ▲
                └─────────────── Step Functions archive workflow ───────────────┘
Report generation:
clinician request ──► queue ──► drafter.agent ──► fact-checker.agent ──► clinician sign-off
                                                          │
                                                    eval harness
The instinct: scale by separating, then by distributing, then by replicating, then by upgrading the host — in that order. Most teams reach for the upgrade first. It is rarely the cheapest move.
— Scott Li, wGrow