Yes, it does happen. It did so this week, and as sure as the universe started with a big bang, sooner or later your server is going to die. OK, so that’s not 100% true. But then it’s also not 100% true that you will have a car accident tonight on the way home. Yet you will put on your seat belt, and you most likely checked the number and location of the air bags in your car before deciding to buy it.
This week a routine lunchtime job on a server went wrong, and the result was that at 1 PM the system was down. A corrupt RAID container meant that all info on all hard drives was gone and it was not coming back. The good news is that at around 7 PM that night it was back up and running. 6 hours may sound like a long time, and in business hours it is. But to reinstall Windows, reinstall the backup software, catalogue the tape, restore Windows and Active Directory, restore data and rights, and restore Exchange, that is very quick. It was quick, and it worked, because of some very simple things:
- The Engineer doing the job had done "bare metal" restores before.
- He had another Engineer onsite with him to help with checking his logic.
- The tape drive was a fast tape drive. (If your backup takes 8 hours, then a bare metal restore will need a minimum of double that, spent just on reading the tape and writing the data back.)
- The backups were full backups of everything on the disks and the backups had all worked.
- We were restoring to the same hardware.
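To put rough numbers on the tape drive point above, here is a minimal sketch in Python. It only encodes the rule of thumb from the list (restore time is at least double the backup time); the function names and the `setup_hours` allowance for reinstalling Windows and the backup software are my own assumptions for illustration, not measured figures.

```python
def min_tape_restore_hours(backup_hours: float, restore_factor: float = 2.0) -> float:
    """Lower bound on time spent reading the tape and writing data back.

    restore_factor=2.0 is the rule of thumb from the list above: a bare
    metal restore needs at least double the time the backup took.
    """
    if backup_hours < 0:
        raise ValueError("backup_hours must not be negative")
    return backup_hours * restore_factor


def estimated_outage_hours(backup_hours: float, setup_hours: float = 0.0) -> float:
    """Tape time plus a hypothetical allowance for the steps before the restore.

    setup_hours is an assumed figure for reinstalling the operating system
    and the backup software; replace it with your own estimate.
    """
    if setup_hours < 0:
        raise ValueError("setup_hours must not be negative")
    return setup_hours + min_tape_restore_hours(backup_hours)


if __name__ == "__main__":
    # An 8 hour backup means at least 16 hours of tape time alone.
    print(min_tape_restore_hours(8))
    # With an assumed 2 hours of reinstalling before the restore can start.
    print(estimated_outage_hours(8, setup_hours=2))
```

Run it with your own backup window to see what a slow tape drive really means on the day everything is down.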
Over the years I have heard of, seen and been hands-on involved with dozens of full recoveries. I know that it happens and I know that a complete system down costs clients a lot of money. I also know that for the engineers undertaking the recovery, it is a very stressful experience. Yet almost weekly I find myself somewhere dealing with someone who does not want to buy the faster or larger tape drive, or does not want to buy the more expensive (recommended) backup software. The most frustrating part of this is that typically that person will be home in bed whilst I’m pulling an all-nighter, stressing buckets over recovering their system using slow equipment and unreliable backups.
(Having said that, I also notice that after 15 years I’m still in this industry, so maybe there is also an element of egomaniac that somehow enjoys saving someone else’s butt!) However, it does not need to be that way. 6 hours is a quick recovery time, but how much value would there have been to that company if, instead of a 1 PM till 7 PM outage, they had been up and running as soon as 3 PM?
Kinetics has just been testing various solutions around continuous online backups and quick data recovery. One that looks very promising is a hardware device by Sonicwall called CDP. Such devices are not replacements for tape backups - that is not their purpose. Tape backups can be taken offsite and, because you have multiples of them, allow you to travel back through time to find missing or corrupt data. Devices like the Sonicwall are designed for a quick recovery - be it of a single document or a whole server. Check out http://www.sonicwall.com/us/backup_and_recovery.html for more info.
Read the Kinetics product review, by Kris Unkovich.