Wednesday, June 3, 2009

Service Level Agreements

I was recently asked about our approach to Service Level Agreements (SLAs) at BIDMC.

We develop customer-facing SLAs for every new infrastructure component and application as part of our standard project management methodology. We work collaboratively with the application owner and subject matter experts to develop a mutually acceptable process for support escalation, with defined availability and response times.

The end result is a series of documents which outline customer and IS responsibilities, as well as provide enough detail about the application to understand its scope and uses.

Customer Facing Documents:
1. Customer Project and Post Project Responsibilities - This document serves as a foundation for each project and sets customer expectations for support roles and responsibilities.
2. Service Level Agreement - I've attached an SLA for a live application to illustrate the types of service level documentation we provide.

Internal IT Documents:
1. Business Impact Analysis - a worksheet used by our managers to facilitate discussions with application owners and document service level objectives based on business requirements.
2. Service Level Objectives - availability and disaster recovery service levels by class of application

A few general observations about our SLAs.

1. Much of our planned maintenance is now done as a background task thanks to improvements in our configurations. For example, we have clustered servers, redundant network components and Internet connections, mirrored storage devices, shadowed or mirrored databases, and other improvements that have remarkably decreased the need for disruptive, planned outages.

2. Escalation processes differ slightly for our mission-critical clinical applications: downtime of more than two hours triggers implementation of paper-based downtime procedures.

3. In addition to our own hosted applications, we have a few Software as a Service applications. Our SLAs with hosting vendors include:

a. Expected uptime. In some cases this is backed by a well-defined formula that states the goal, e.g. 99.9%, and any other qualifiers, such as excluding planned downtime that is done at mutually agreed-upon times. Whatever is set as an uptime goal usually drives the high availability and disaster recovery configurations.

b. Transaction performance. This has traditionally not been a problem for us, but for applications that may not have been engineered well, it's an important component of an SLA.

c. Escalation. A key component is defining the event levels (priority one, priority two, etc.), the contacts, and the response time (phone vs. on-site) and repair time that can be expected. Time to repair is usually a tough negotiation in hosted application SLAs.

d. Remedies. This is not usually defined in internal agreements, but is for vendor agreements. The typical remedy is a credit on future maintenance payments, which is not always satisfying if you lose an application for a prolonged period.
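To make the uptime formula in item (a) concrete, here is a minimal sketch in Python. It assumes one common formulation, in which agreed planned downtime is removed from the measurement window before the percentage is computed. The function names and the sample figures are hypothetical and are not taken from any of our actual agreements.

```python
def availability(total_minutes, unplanned_down_minutes, planned_down_minutes=0.0):
    """Percent uptime, with agreed planned downtime excluded from the window."""
    scheduled = total_minutes - planned_down_minutes
    if scheduled <= 0:
        raise ValueError("planned downtime must be shorter than the measurement window")
    return 100.0 * (scheduled - unplanned_down_minutes) / scheduled


def downtime_budget(total_minutes, target_percent, planned_down_minutes=0.0):
    """Minutes of unplanned downtime allowed while still meeting the target."""
    scheduled = total_minutes - planned_down_minutes
    return scheduled * (1.0 - target_percent / 100.0)


# Hypothetical month: 30 days, a 4-hour agreed maintenance window,
# and 30 minutes of unplanned outage, measured against a 99.9% goal.
month = 30 * 24 * 60
uptime = availability(month, unplanned_down_minutes=30, planned_down_minutes=240)
budget = downtime_budget(month, 99.9, planned_down_minutes=240)
print(f"uptime: {uptime:.3f}%  budget: {budget:.1f} min  goal met: {uptime >= 99.9}")
```

Under these assumptions a 99.9% goal leaves roughly 43 minutes of unplanned downtime per month, which is why the uptime goal tends to drive the high availability and disaster recovery design.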

Feel free to use my SLA documents as templates in support of your own service level documentation needs.
