Tuesday, March 4, 2008

Electronic Health Records for non-owned doctors -scalable infrastructure

This is my fifth entry about our Electronic Health Record project for non-owned doctors. As I've described, the scope of the project is to implement a highly reliable, secure, feature-rich, well-supported, but affordable electronic health record for private practices. Today's entry is about building the scalable, centralized Software as a Service hosting infrastructure to meet these goals.

A key design requirement for the project is scalability. Our projected customer base is 300 clinicians and we have a fixed start-up budget. However, we must design the infrastructure in a way that cost-effectively supports the smallest number of adopters yet scales to thousands if the project is wildly successful. We debated two possibilities (metaphorically speaking):

a. Build a hotel, not knowing if anyone will ever check-in
b. Build a housing development, where the limits of expansion are only defined by available land

We decided on choice "b", starting with a robust foundation and adding new equipment and storage as we add clinicians. We standardized our central site equipment on products from HP, EMC, and Cisco, with guidance from our infrastructure partner, Concordant, and our equipment supplier, CDW, ensuring it is easy to plug in additional hardware on demand. We invested a significant amount of time designing the central hosting facility, doing it right the first time. Over the years, I've seen CIOs rush through the design phase, only to rebuild the infrastructure later when application performance did not scale. We partnered with our vendors to build something special that, if successful, could be a model for other medical centers and communities.

Considerations in designing our hosting infrastructure included:
* Supporting a user base that is remote, unmanaged, and diverse; we need to identify any performance issues via end-to-end monitoring of all components (see the sketch after this list)
* Meeting important security and privacy restrictions, as well as addressing liability issues (who is responsible for what)
* Understanding infrastructure costs for a) start-up, b) additional capacity that arrives in bursts or steps, and c) variable requirements as practices go live
* Providing connectivity to external parties (labs, claims, etc.) through interfaces, which creates additional security and performance complexities
* Addressing the limitations and performance of "last mile" connectivity over publicly available internet access (DSL, cable, etc.)
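
To illustrate the kind of end-to-end monitoring we have in mind, here is a minimal probe sketch in Python. The hostnames, ports, and latency thresholds are hypothetical placeholders, not our production tooling:

import socket
import time

# Each tuple: (host, port, latency threshold in seconds). All values are
# hypothetical placeholders, not our actual endpoints or SLAs.
CHECKS = [
    ("vpn-gateway.example.org", 443, 0.20),   # ASA VPN termination
    ("web-tier.example.org", 443, 0.15),      # SSL accelerator / web servers
    ("app-tier.example.org", 8080, 0.10),     # application servers
    ("db-tier.example.org", 3306, 0.05),      # database listener (assumed port)
]

def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open (and close) a TCP connection."""
    start = time.time()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.time() - start

for host, port, threshold in CHECKS:
    try:
        elapsed = tcp_connect_time(host, port)
        status = "OK" if elapsed <= threshold else "SLOW"
        print(f"{status:4} {host}:{port} {elapsed * 1000:.0f} ms")
    except OSError as exc:
        print(f"FAIL {host}:{port} ({exc})")

Running a probe like this from a clinician's DSL connection as well as from inside the data center helps separate "last mile" problems from central site problems.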

The infrastructure choices we made are:

Virtualized servers - VMware was the natural choice given our scalability design goals. VMware and VMotion technologies also play an important part in redundancy, failure recovery, and disaster recovery.

Physical servers - We debated rack-mounted versus blade servers and elected to use powerful, small-footprint HP rack servers connected to fast multi-tiered storage. When we computed the economics of blade servers versus rack-mounted servers, the use of VMware made small, powerful rack servers the most cost-effective solution; a toy version of that calculation follows.
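
Here is a sketch of the shape of that comparison. Every price and consolidation ratio below is a hypothetical illustration, not a figure from our actual quotes:

# Toy blade-vs-rack cost comparison. Every number below is a hypothetical
# illustration, not a real quote from our evaluation.
VMS_NEEDED = 40      # virtual machines required at go-live (assumed)
VMS_PER_HOST = 10    # VMware consolidation ratio (assumed)

rack = {"unit_cost": 7000, "chassis_cost": 0}        # no shared enclosure
blade = {"unit_cost": 5000, "chassis_cost": 30000}   # enclosure paid up front

def total_cost(option, hosts):
    return option["chassis_cost"] + hosts * option["unit_cost"]

hosts = -(-VMS_NEEDED // VMS_PER_HOST)   # ceiling division -> 4 hosts
print("rack :", total_cost(rack, hosts))    # 28000
print("blade:", total_cost(blade, hosts))   # 50000

At small host counts, the blade enclosure cost dominates; only at much larger scale does the lower per-blade price win. That is the shape of the analysis that led us to virtualize onto a few powerful rack servers.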

Storage - We purchased an EMC CLARiiON CX3-20 series SAN. We will go live with 11.1TB of total storage (2.1TB of fast, Tier 1 storage for database transactions and 9TB of secondary, Tier 2 storage for files). A single CX3-20 will allow us to expand in a modular fashion to accommodate up to 1200 practices. We'll also be leveraging a disk-to-disk backup strategy, using tape only for disaster recovery.
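
A rough capacity model shows how that modular expansion math might work. The per-practice growth rates here are assumptions for illustration, not measured eClinicalWorks numbers:

# Back-of-the-envelope storage growth model. The per-practice growth
# rates are assumptions for illustration, not measured numbers.
TIER1_START_TB = 2.1        # fast storage for database transactions
TIER2_START_TB = 9.0        # secondary storage for files
GB_PER_PRACTICE_T1 = 2.0    # assumed database growth per practice
GB_PER_PRACTICE_T2 = 10.0   # assumed document/image growth per practice

for practices in (50, 300, 1200):
    t1 = practices * GB_PER_PRACTICE_T1 / 1024   # GB -> TB
    t2 = practices * GB_PER_PRACTICE_T2 / 1024
    extra = max(0.0, t1 - TIER1_START_TB) + max(0.0, t2 - TIER2_START_TB)
    print(f"{practices:4} practices: tier1 {t1:4.1f} TB, "
          f"tier2 {t2:4.1f} TB, extra trays needed: {extra:4.1f} TB")

Under assumptions like these, the go-live configuration covers our projected 300 clinicians with headroom, and growth toward 1200 practices is handled by adding disk trays rather than replacing the array.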

Network Infrastructure - We implemented a high-speed network backbone with multiple paths for redundancy using:
* Cisco Integrated Services Routers (ISR) 2811s for internet connectivity
* Adaptive Security Appliances (ASA) 5520s for firewalling, intrusion prevention, and IPSec VPN client termination
* Catalyst 4948 switches for server connectivity and Layer 3 routing
* MDS 9000 Series Multilayer SAN Switches for SAN connectivity

Security - We incorporated physical, technical, and administrative controls to protect confidentiality, integrity, and availability.

SSL Accelerators - We are using Array Networks TMX-2000, the hardware recommended by eClinicalWorks to optimize web server performance.

Redundancy & Disaster Recovery - One of the real challenges of this project is the price sensitivity of our private clinicians. We needed to build a world-class system at a price that all clinicians could afford. Redundancy and disaster recovery are like life insurance - a great investment only if you need it. We had to balance our infrastructure investment against total cost of ownership, given the fixed hospital contribution and physician frugality. In the end, we used the equation

Risk = likelihood of bad events * impact of bad events

We believe it is much more likely that a component will fail than that an entire data center will be destroyed, so we elected to build a highly redundant infrastructure in a single data center for now, expanding to a secondary data center once we have enough clinicians signed up to fund the new infrastructure. Networking gear, servers, power, and cabling are duplicated within a commercial co-location facility. Storage is disk-to-disk redundant. Tapes are moved offsite nightly. Once the hardware is up, we'll work with Concordant, Cisco, EMC, HP, Array Networks, and the co-location facility to test physical hardware and operating system/database software redundancy. Then we'll install eClinicalWorks and run the redundancy tests again. We'll also engage Third Brigade at that time for intrusion/security testing.
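
To make that reasoning concrete, here is a toy expected-downtime comparison using the equation above. Every likelihood and impact below is a hypothetical illustration value, not data from our risk assessment:

# Toy expected-downtime comparison behind the single-data-center choice.
# All likelihoods and impacts are hypothetical illustration values.
def risk(likelihood_per_year, impact_hours):
    """Risk = likelihood of bad events * impact of bad events."""
    return likelihood_per_year * impact_hours

component_no_redundancy = risk(2.0, 8)    # 16.0 expected hours/year
component_redundant = risk(2.0, 0.1)      #  0.2 expected hours/year
datacenter_loss = risk(0.01, 72)          #  0.72 expected hours/year

print(f"components, no redundancy: {component_no_redundancy:.2f} h/yr")
print(f"components, redundant:     {component_redundant:.2f} h/yr")
print(f"data center destroyed:     {datacenter_loss:.2f} h/yr")

Under assumptions like these, in-site redundancy eliminates the largest share of expected downtime, which is why the second data center can wait until the clinician base funds it.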

We've written a comprehensive disaster recovery plan: if we lost the co-location facility to a disaster, we would recover the tapes from offsite storage, build a replica of the hosting environment (VMware plays a key role here), and restore the data. The recovery point and recovery time objectives for this plan will be clearly communicated to all who sign up for the service. The customer base for our Software as a Service solution is mainly small practices that operate Monday through Friday, 8am-6pm. Our disaster recovery plan includes a solid practice/workflow-specific contingency/downtime plan. We will also perform a mock downtime as part of each implementation.
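
As a sketch of how those objectives follow from the backup schedule (the interval and rebuild estimates here are assumptions, not our published numbers):

# Worst-case recovery objectives derived from the backup schedule.
# The interval and rebuild estimates are assumptions, not a published SLA.
TAPE_INTERVAL_HOURS = 24   # nightly tapes moved offsite
REBUILD_HOURS = 36         # assumed time to stand up a VMware replica
RESTORE_HOURS = 12         # assumed time to restore data from tape

rpo = TAPE_INTERVAL_HOURS            # up to one day of data could be lost
rto = REBUILD_HOURS + RESTORE_HOURS  # time until practices are back online
print(f"worst-case RPO: {rpo} hours, worst-case RTO: {rto} hours")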

By creating a highly redundant single data center with rack-mounted servers, two-tiered storage, virtualization, and offsite tape backup, we believe we've balanced scalability, affordability, and maintainability. We go live this summer and I'll let you know if we were right!

1 comment:

Unknown said...

John - not sure if you've been keeping up with Amazon's recent cloud computing developments (EC2/S3/SQS) ... certainly not (yet) appropriate for EMR use but interesting nonetheless.

Companies like ELASTRA are building value-added products on top of Amazon's stack. It's only a matter of time before someone offers a similar service with a strong SLA and HIPAA compliance.

I'm not sure if you've spoken to this before, but how do you generally evaluate hosted/managed software and services within your organizations?