RAID is not a backup, RAID is not a backup
Did I mention that RAID is not a backup? Or that RAID is not a backup?
My posts lately have tended toward the computer-ish side of things, and this one will be no different. It will be brief and concise, to wit:
RAID is not a backup
I realize that is the third time you have read this in the last 14 seconds, but here is the story behind it:
Most of us photographer types have a lot of data that needs to be stored digitally. The days of ring binders filled with plastic negative and slide sleeves are coming to a close. I still shoot a lot of film, but in just my casual, non-professional digital shooting over the last seven years, I have accumulated about 1 terabyte of digital images on my computer! Naturally, I don’t want to lose any of these photographs. So what is the best strategy for protecting them?
I’ll start by admitting that my job involves using seismic data for oil and gas exploration. We have HUGE amounts of data and interpretation that we need to access on high-powered graphical workstations. Over the last thirty years, I have learned a few things the hard way:
- Hard drives will fail. Always. It is not if, it is when.
- A copy is not necessarily a backup.
- Your strategy is only as good as the weakest link.
- Belts and suspenders are fine, but it doesn’t hurt to use a little super glue too.
It seems like every day I see some offhand comment on a photography forum from someone who has just gotten a new RAID device (often a Drobo or Buffalo-type device) and believes they can now breathe easy. Well, this is a misplaced sense of security. Here is the deal: A RAID device is only as good as the piece of hardware or software that controls it. A RAID uses multiple drives and distributes the data among them in a systematic way. But the hardware or software controller is like the map. You lose the map, and your data is gone, and likely can only be recovered by some very pricey specialists who make quite a good living by being able to reconstruct your map. It is hugely complicated, and is akin to reconstructing a wine glass that has had an encounter with a concrete floor.
“Oh,” you say, “but I have RAID-5, and if one disk goes bad it will rebuild the data.” Yep, it could work that way. Or not. That assumes the failure point is a bad disk. What if something else fails? Say, for instance, that you have a hardware-based RAID-5 standalone box like the Mercury Elite QX-2, and have it configured in 3+1 mode, which means that it is RAID-5 and can reconstruct the data with the ‘hot spare’ if one of the primary disks goes down. What if the hardware/firmware inside the box goes bad? Where does that leave you? Up a foul-smelling estuary without means of locomotion. This isn’t a disk going bad, this is the hardware controller having a brain aneurysm and not having any recollection of where it left the car keys.
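For the curious, here is a toy sketch in Python of the parity trick that lets RAID-5 survive one lost disk. It is an illustration under simplifying assumptions, not how any particular box is implemented: the three "disks" are just short byte strings, the names `xor_blocks`, `data`, `parity`, and `rebuilt` are made up for this example, and a real RAID-5 rotates parity across all the drives in stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per "disk" (toy values, four bytes each).
data = [b"phot", b"ogra", b"phs!"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data)

# Simulate losing disk 1: XOR the surviving blocks with the parity block
# to get the missing block back.
rebuilt = xor_blocks([data[0], data[2], parity])

print(rebuilt == data[1])
```

Notice that the rebuild only works because the code knows which block is which and where the parity lives. That bookkeeping is the "map" the controller keeps, and parity does nothing for you when the map itself is what gets lost.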
This happened to me about three weeks ago. No disk failures. Just a hardware failure at a level above the disks. Did I freak out? No, because now I don’t trust anything, and I had two spare copies of that data off-site and one live copy on-site that is cloned once a day. I just packed up the box and sent it back for a replacement. But the moral is: Don’t trust a RAID as a backup. The only safety you have is many copies of your data in many different places. And for backup purposes, I would prefer two or three copies on high-capacity single drives over a RAID any day.
Another lesson I learned is not to rely blindly on a piece of software to automatically back up your data. You need to have a method of confirming that the backup took place. I like to use Carbon Copy Cloner for my Mac, and I have it send me an email when a backup starts and when it finishes. I also have it use the Growl notification windows to tell me the same thing. If I get up in the morning and I don’t have matching pairs of Growl notifiers on my screen when I log in, I know there is a problem.
Trust, but verify, in short. I learned this the hard way last spring when my backup software failed to run for two weeks and then my system disk went bad. I was all smug that I had a live clone, and then horrified to find out that it was two weeks out of date because SuperDuper hiccuped and stopped backing up. I hadn’t set all the notification routines in place that I have currently, and I had no clue there was a problem. I switched to Carbon Copy Cloner, and so far (fingers crossed) I have not had a problem.
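Notifications tell you the job ran. An independent check can tell you whether the copy is actually current. Here is a minimal Python sketch of that idea, not a feature of Carbon Copy Cloner or any other tool: the function name `missing_or_outdated` and the two-second timestamp tolerance are assumptions made for this example, and it compares only file sizes and modification times, not file contents.

```python
import os

def missing_or_outdated(source, backup):
    """List files under source that the backup lacks or holds an old copy of.

    A file is reported if it is absent from backup, differs in size, or has a
    newer modification time in source than in backup.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(backup, rel)
            if not os.path.isfile(dst):
                problems.append(rel)
                continue
            s, d = os.stat(src), os.stat(dst)
            # Assumed tolerance: allow 2 seconds of slack, since some
            # filesystems store timestamps with coarse resolution.
            if s.st_size != d.st_size or s.st_mtime > d.st_mtime + 2:
                problems.append(rel)
    return sorted(problems)

# Example call with hypothetical paths:
#   missing_or_outdated("/Users/me/Pictures", "/Volumes/Clone/Users/me/Pictures")
```

An empty list means every file in the source has a same-size copy in the backup that is at least as recent. A clone that quietly stopped updating two weeks ago would show up as a long list of recently added or edited files.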
And while I am on the subject of Carbon Copy Cloner, I want to emphasize that this is a replication method, not a true backup. All the software does is ensure that the files on one piece of hardware are duplicated onto another. If you have a major screwup on the parent copy, all the cloning does is to ensure that you have the same screwup written to your child copy. Backup software like Time Machine allows you to store incremental copies of your data, and if you decide you want to go back to the work you had a week ago, it will make that possible. A replication approach will merely allow you to go back to the most recent copied version, and that is it.
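The difference between replication and versioned backup can be shown with a toy model in Python. This is only a sketch: the "disks" are plain dictionaries, the names `replicate`, `snapshot`, `clone`, and `snapshots` are invented for the example, and real backup tools store changes far more efficiently than the full copies used here.

```python
import copy

# The working disk: file name -> contents (toy values).
source = {"portrait.jpg": "good edit"}

clone = {}      # target of a replication (cloning) tool
snapshots = []  # versioned backup: list of (label, files), oldest first

def replicate(src, dst):
    """Make dst an exact mirror of src, discarding whatever dst held before."""
    dst.clear()
    dst.update(src)

def snapshot(src, history, label):
    """Append an independent, labeled copy of src, keeping all earlier ones."""
    history.append((label, copy.deepcopy(src)))

# Monday: both methods capture the good version.
replicate(source, clone)
snapshot(source, snapshots, "monday")

# Tuesday: a major screwup on the parent copy, then the nightly job runs.
source["portrait.jpg"] = "accidentally overwritten"
replicate(source, clone)
snapshot(source, snapshots, "tuesday")

print(clone["portrait.jpg"])            # the screwup, faithfully mirrored
print(snapshots[0][1]["portrait.jpg"])  # Monday's version is still there
```

After Tuesday's run the clone holds only the damaged file, while the snapshot history still contains Monday's good one.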
So here are a few pieces of advice:
- Backups to hard disks are fine, but use more than one, and DO NOT use a RAID as a backup.
- Keep a copy of your data somewhere other than where your computer lives. A fire or flood will destroy both the original and any backup sitting next to it.
- Analyze your system. There is always a weak link.
- Redundancy is your friend.