The FHIR Server Benchmark You Have Probably Not Seen Yet

A new FHIR server performance report quietly went public this week and has not yet made the rounds the way bigger interoperability news usually does. Health Samurai put four FHIR servers on the same hardware, the same data, and the same load harness, then published a dashboard that reruns the comparison every day. For healthcare IT teams that have been waiting for something better than vendor brochures, this is the data point the conversation has been missing.

The wider site collects more FHIR primers for healthcare IT. The headline points of this report are worth a quick scan.

What Got Measured

The four servers under test are Aidbox, HAPI FHIR, Medplum, and the Microsoft FHIR Server. Each runs in a container with identical resource limits (8 vCPU and 24 GB of memory) on a bare-metal host with 64 cores and 500 GB of RAM. The dataset is generated with Synthea: 1,000 patients, around 2 million resources. The load generator is Grafana k6, scripted to push CRUD, bundle import, and search workloads against each server in turn.
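To make "CRUD workload" concrete, here is a minimal sketch of the four REST interactions one create/read/update/delete cycle sends to a FHIR server. It is not the report's k6 script: it is written in Python, it only builds the request descriptions without sending them, and the base URL, the resource id, the sample Patient, and the names FhirRequest and crud_cycle are all assumptions made for illustration. In a live run the id would come from the server's response to the create call.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Standard FHIR JSON media type, sent as the Content-Type on POST and PUT.
FHIR_JSON = "application/fhir+json"


@dataclass
class FhirRequest:
    """One HTTP request in the cycle (hypothetical helper type)."""
    method: str
    url: str
    body: Optional[str] = None


def crud_cycle(base_url: str, resource_id: str) -> List[FhirRequest]:
    """Describe one create/read/update/delete cycle for a Patient.

    resource_id stands in for the id the server would assign on create.
    """
    base = base_url.rstrip("/")
    # Illustrative Patient resource, not taken from the benchmark dataset.
    patient = {
        "resourceType": "Patient",
        "name": [{"family": "Example", "given": ["Test"]}],
    }
    # An update carries the id in the body as well as in the URL.
    updated = {**patient, "id": resource_id, "active": True}
    return [
        FhirRequest("POST", f"{base}/Patient", json.dumps(patient)),
        FhirRequest("GET", f"{base}/Patient/{resource_id}"),
        FhirRequest("PUT", f"{base}/Patient/{resource_id}", json.dumps(updated)),
        FhirRequest("DELETE", f"{base}/Patient/{resource_id}"),
    ]


if __name__ == "__main__":
    # Assumed local endpoint; replace with the server under test.
    for req in crud_cycle("http://localhost:8080/fhir", "example-id"):
        print(req.method, req.url)
```

A load harness repeats a cycle like this from many concurrent virtual users and counts completed requests per second.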

The Headline Number

CRUD throughput on the 2026-06-29 snapshot ranges from about 5,200 requests per second at the top of the table to about 440 at the bottom. In between, HAPI sits at roughly 3,058 and Medplum at roughly 1,420. The dashboard renders those numbers straight from the latest run; they are not curated for any particular framing.
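The spread is easier to judge as a ratio: 5,200 against 440 is close to a 12x gap between the fastest and slowest server. The short sketch below normalizes the four quoted figures against the lowest one. The labels "highest" and "lowest" are placeholders, since the snapshot figures above are not attributed to a named server at either end, and the function name is an assumption.

```python
# Approximate CRUD throughput (requests/second) from the 2026-06-29 snapshot.
# "highest" and "lowest" are placeholder labels, not server names.
crud_rps = {
    "highest": 5200,
    "HAPI FHIR": 3058,
    "Medplum": 1420,
    "lowest": 440,
}


def relative_to_slowest(rps_by_server):
    """Express each throughput as a multiple of the slowest entry."""
    floor = min(rps_by_server.values())
    return {name: rps / floor for name, rps in rps_by_server.items()}


ratios = relative_to_slowest(crud_rps)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the slowest")
```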

The numbers move from day to day because the harness reruns nightly. Snapshotting any single day is fine for a news read; for procurement, watching the trend over a few weeks is more honest.
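One simple way to watch the trend rather than a single day is a rolling median over the nightly results, which damps a one-off bad run. The sketch below is an illustration only: the daily values are made up, not taken from the dashboard, and the function name and window size are assumptions.

```python
from statistics import median
from typing import List


def rolling_median(values: List[float], window: int) -> List[float]:
    """Median of each run of `window` consecutive daily values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        median(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical nightly CRUD throughput for one server (requests/second).
# The fourth night simulates a noisy run.
daily_rps = [5200, 4900, 5300, 3000, 5250]

smoothed = rolling_median(daily_rps, window=3)
print(smoothed)
```

With a three-day window, the single 3,000 reading never appears in the smoothed series, which is the behavior a procurement read wants.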

Why This Setup Is Useful

Most public FHIR benchmarks change the hardware between rows, change the data between rows, or both. The result is rows that cannot be honestly compared. Forcing identical container limits and identical Synthea data across all four servers strips out the easy excuses. Healthcare IT teams reading the dashboard get a like-for-like read instead of a vendor-tailored slide.

What the Report Is Open About

The report is open about its limits. The dataset is small enough to fit in memory, so the working set never spills to disk the way a production load would. The note attached to the report says the next post in the series tests at scale. Treat the current snapshot as a baseline, not a production model.

It is also worth saying that Health Samurai both authored the benchmark and makes Aidbox. That does not invalidate the numbers, but it is the kind of context any reader should keep in mind. Because the methodology and the harness are open, anyone can re-run the test and publish their own snapshot. The Medplum CTO has already forked the repository, which is the healthy direction for this kind of work.

Where It Slots in for an IT Team

For most teams the right read is to use the dashboard as a sanity check, not a final answer. Production FHIR servers are picked on a much wider list of criteria, including auth, the form layer on top, and operational behavior at scale. "The top 5 FHIR form builders for API-first healthcare platforms" walks through the form layer that almost every healthcare stack puts on top of a FHIR server. "The complete guide to FHIR form builders for modern healthcare stacks" frames the rest of the picture.

A daily-updated, open, four-server FHIR benchmark did not exist a year ago. That on its own is news worth knowing.