Tuesday, February 19, 2008

Cool Technology of the Week

Storage backup and data recovery is at the top of my list of things that will keep me awake in 2008. In healthcare IT, we need short recovery times with minimal or no data loss. Accomplishing this with once-a-day tape backups is not possible. The Cool Technology of the Week, Data Domain de-duplication storage, solves this problem.

At BIDMC, we generate 28 terabytes of new file and email storage each year. Basic file stores have grown so large that we struggle to copy them within our 24-hour backup window. Additionally, our disaster recovery efforts now require us to replicate our data across two geographic locations.

Tape backup, which has been in use at BIDMC for decades, suffers from a variety of problems. Tape backups are time-consuming. Tapes are fragile and require physical security when transported. The time required to retrieve and recover from tape stresses our service availability objectives. In years past, we considered backup to disk, but the economics did not work. Data Domain de-duplication now makes disk an economical backup medium. Here's how.

Instead of making full tape backups, we can back up changes at the sub-block level to disk, then compress the result. There is an important distinction between a tape-based, incremental backup and de-duplication. With incremental backups, files that changed since the last backup are copied. A major problem with this approach is that each incremental backup must be recalled in sequence to recover files. This is a slow and complex process that does not detect whether the same file was stored in several locations.

De-duplication, on the other hand, has sophisticated methods for identifying changes at the sub-block level. For example, if a spreadsheet has '&date' in the heading, each time you save it, the date in the title will change. An incremental backup will copy the whole document again. De-duplication at the sub-block level will only copy the date change. If multiple copies of the file are sent in email, de-duplication will save only one copy.
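To make the idea concrete, here is a minimal sketch of block-level de-duplication in Python. It is not Data Domain's algorithm, which this post does not describe; it assumes fixed-size blocks and a SHA-256 fingerprint per block, and the `DedupStore` class, its method names, and the 4096-byte block size are all hypothetical choices made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class DedupStore:
    """Toy block store: each unique block is kept once, keyed by its hash."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Back up `data` under `name`; return how many new blocks were stored."""
        new_blocks = 0
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:
                # Only blocks never seen before consume storage.
                self.blocks[fingerprint] = block
                new_blocks += 1
            recipe.append(fingerprint)
        self.files[name] = recipe
        return new_blocks

    def read(self, name):
        """Rebuild a file by concatenating its blocks in order."""
        return b"".join(self.blocks[fp] for fp in self.files[name])
```

In this sketch, saving a second version of a file in which only the date in the heading changed stores just the block containing that date, and backing up an identical copy under another name stores nothing new. A fixed block size only behaves this way when the edit does not shift the bytes that follow it, so treat this as an illustration of the principle rather than a model of a production system.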

Over a two year period we examined many products from many companies. Most of them required proprietary hardware, specialized software, new management tools, training, and multiple staff to support the technology 24x7x365. We believe in information life cycle management/hierarchical storage management, but want one set of tools and compliance with the technology standards already in use in our data center. We chose Data Domain because:
  • The product de-duplicates at the sub-block level yielding better reduction ratios
  • The product looks like regular storage supporting NFS and CIFS file mounts
  • The product requires little training since it's completely managed by Data Domain
  • The product is an in-line appliance and does not require installation of server agents
  • The product works with all our existing backup software
  • The product is highly reliable, using RAID 6 SATA drives and built-in hardware redundancy

In our pilot, Data Domain averaged 20:1 compression. It worked well with our existing backup software, which made the transition from tape to disk easy. We're deploying 15 terabytes of Data Domain disk storage in our production and backup data centers, replicating data between the two over a private Ethernet link. Through the use of de-duplication and compression, 15 terabytes will hold the equivalent of 300 terabytes of tape storage. Our goal over the next two years is to completely eliminate the need for tape in the data center.
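The 300 terabyte figure is simple arithmetic: physical capacity multiplied by the reduction factor seen in the pilot. A small sketch of that calculation follows; the function names are hypothetical, and the twenty-fold factor is the pilot average, so actual results will vary with the data being backed up.

```python
def logical_capacity_tb(physical_tb, reduction_factor):
    """Logical (pre-reduction) data that fits on the given physical disk."""
    return physical_tb * reduction_factor


def physical_needed_tb(logical_tb, reduction_factor):
    """Physical disk needed to hold the given amount of logical data."""
    return logical_tb / reduction_factor
```

With 15 terabytes of disk and a twenty-fold reduction, `logical_capacity_tb(15, 20)` gives 300, matching the figure above.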

We're so impressed with Data Domain's performance as a backup infrastructure that we're also planning to use it as an archival tool for less frequently accessed files. To do so, we'll first implement file virtualization technology such as Acopia or Rainfinity. This will enable us to move content from one storage medium to another without impacting file shares, our web-based file access tools, or our SSL VPN remote file access applications. The combination of file virtualization and Data Domain will enable us to support three tiers of storage.

Tier 1 - SAN storage with lower density, high performance drives.
Tier 2 - SAN or NAS storage with high density, low performance drives.
Tier 3 - NAS-based, archival storage with high density drives coupled with Data Domain de-duplication and compression.

With these three tiers of storage, we'll reduce our cost of information life cycle management while reducing complexity.
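One way to picture a tiering policy is as a rule that maps how recently a file was accessed to one of the three tiers. The sketch below is purely illustrative: the 90-day and 365-day thresholds and the `choose_tier` function are assumptions, not the policy in use at BIDMC or a feature of any product named here.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
TIER2_AFTER = timedelta(days=90)   # untouched this long: move to tier 2
TIER3_AFTER = timedelta(days=365)  # untouched this long: move to tier 3


def choose_tier(last_access, now):
    """Return 1, 2, or 3 based on the time since the file was last accessed."""
    age = now - last_access
    if age >= TIER3_AFTER:
        return 3  # archival storage with de-duplication and compression
    if age >= TIER2_AFTER:
        return 2  # high density, low performance drives
    return 1      # high performance drives
```

Under these assumed thresholds, a file opened last week stays on tier 1, while one untouched for two years would be a candidate for tier 3.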


After a two-year journey exploring backup, recovery, and archiving solutions, I feel we've finally found the answer that will let me sleep at night in 2008.
