Tuesday 11 December 2012

Digital Storage: Backup vs. Redundancy

By Dong Ngo

One of a storage device's most important roles, if not the most important, is to keep the information stored on it safe, especially from hardware failure. Redundancy and backup are the two popular types of data protection. They are not the same, however, and it's important to understand the differences between the two.

Redundancy

In a nutshell, redundancy in consumer-grade digital storage means using more internal drives than necessary to store the information, or in other words, storing the same data in more than one place. There are many ways to do this, but the most popular is the use of a type of RAID (check Part 2 of this series for more information on different types of RAID), which can be set up on storage devices with two internal drives or more. That said, the first and foremost thing you should remember is that redundancy is not a form of backup, but just a fail-safe measure. The most popular RAID configurations that offer redundancy are RAID 1 and RAID 5.

Again: Redundancy is not a form of backup, but just a fail-safe measure in case of failure of the storage device's internal drive or drives.

RAID 1 (which requires at least two drives) uses double the number of drives necessary to store the information. The two drives mirror each other. Thus, only half of the total storage space is available to the user, while the other half is used for redundancy. RAID 5 (which requires at least three drives) sets aside the equivalent of one drive for redundancy, which is at most a third of the array's total storage space. In RAID 5, what's available to the user is the combined storage space of all drives used in the array minus one drive's worth. This way, if one drive dies, the remaining drives keep serving the data and nothing is lost.
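To make the arithmetic concrete, here is a minimal Python sketch of the usable space in each setup. It assumes all drives in the array are the same size and that RAID 1 is a plain two-drive mirror; the function name and its parameters are made up for illustration and don't come from any RAID tool.

```python
def usable_capacity_tb(level: int, drive_count: int, drive_size_tb: float) -> float:
    """Return the usable space, in TB, of an array of identical drives.

    Hypothetical helper for illustration only. Covers the two setups
    discussed here: a two-drive RAID 1 mirror and RAID 5.
    """
    if level == 1:
        if drive_count != 2:
            raise ValueError("this sketch only models a two-drive RAID 1 mirror")
        # Both drives hold the same data, so half of the total space is usable.
        return drive_count * drive_size_tb / 2
    if level == 5:
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        # One drive's worth of space goes to redundancy; the rest is usable.
        return (drive_count - 1) * drive_size_tb
    raise ValueError("only RAID 1 and RAID 5 are modeled in this sketch")


print(usable_capacity_tb(1, 2, 4.0))  # two 4TB drives in RAID 1
print(usable_capacity_tb(5, 3, 4.0))  # three 4TB drives in RAID 5
print(usable_capacity_tb(5, 4, 4.0))  # four 4TB drives in RAID 5
```

With 4TB drives, the mirror gives you 4TB out of 8TB, a three-drive RAID 5 gives you 8TB out of 12TB, and a four-drive RAID 5 gives you 12TB out of 16TB. The more drives in a RAID 5, the smaller the share of space spent on redundancy.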

Note: While RAID generally is available in storage devices with more than one internal drive, for Thunderbolt storage you can daisy-chain multiple single-volume storage devices, such as the LaCie Little Big Disk Thunderbolt, and create a RAID that way.

The storage devices involved need to have two Thunderbolt ports, and once a RAID is created, they need to be used at the same time with the same computer. Most of the time, however, it's more economical to buy a RAID-capable multiple-volume storage device, known as a RAID system, a RAID box, or a RAID array.

You can think of redundancy as using two plastic bags, one inside the other, to carry groceries home from the market. This way, if one of the bags tears or is punctured along the way, the food (broken eggs especially) won't spill out.

Redundancy is not perfect, however, and here are its pros and cons.

Pros of redundancy: The biggest and most obvious virtue of redundancy is that it protects data against drive failure in real time. This means if you are working on a file and one of the internal drives in a RAID fails, the storage device can continue to work normally. (Some RAID setups can survive even when two drives fail.) It will just indicate that one of the internal drives has failed, offering you the chance to back up important data and replace the failed drive with another. After that, the device itself will integrate the replacement drive into the RAID, taking over the role of the drive it replaces, in a process called RAID rebuild. During this time, the storage device is still available to use.
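For the curious, RAID 5 manages this with parity: extra information computed from the data so that the contents of any one missing drive can be worked out from the others. The Python sketch below is a heavily simplified illustration of that idea, not how a real RAID controller is written. It treats three short byte strings as the contents of three data drives and keeps the parity on a fourth, whereas real RAID 5 spreads the parity across all of the drives. The names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Pretend each byte string is what one data drive holds.
data_blocks = [b"eggs", b"milk", b"rice"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_blocks)

# Simulate the second drive dying: only the other two and the parity remain.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR-ing everything that survived reproduces the missing block,
# which is what a rebuild writes onto the replacement drive.
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

This is also why the array stays usable while a drive is missing: whatever was on the dead drive can be recomputed on the fly from the drives that are left.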

In short, redundancy offers an immediate type of data protection. And since internal drive failures can happen at any time, it's important to have redundancy for storage devices that host critical information or that provide a service that mustn't be interrupted.

Cons of redundancy: The first drawback of redundancy is cost; you have to spend money on multiple drives and this could be expensive. A RAID 1 setup, for example, basically requires double the spending on internal drives.

The second downside is that redundancy doesn't provide protection against physical disaster, such as fire or flood, or against the failure of the storage device itself. Redundancy also doesn't offer versioning, in which data is saved in different versions (see the discussion of backup below).

And lastly, a RAID rebuild can be a very long process that could take days, depending on the amount of existing data stored on the storage device. During the rebuild, the RAID is generally vulnerable, and if a second drive fails before the process is finished, the entire array will crash and you will lose all of the data. In fact, during a rebuild, a RAID storage system is more vulnerable than a single-volume storage device, since rebuilding an array puts a lot of stress on all drives involved, especially when the array still has to provide data to users.
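To get a rough feel for how long a rebuild might take, you can divide the amount of data to rebuild by the speed at which the device rebuilds it. The Python sketch below does just that. The function name is made up, and the speeds used are assumptions picked for illustration, not measurements from any particular device; real rebuild speeds vary widely and drop when the array is busy serving users.

```python
def estimated_rebuild_hours(data_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate rebuild time in hours from data size and sustained rebuild speed.

    Hypothetical helper. Uses decimal units (1TB = 1,000,000MB), the way
    drive makers count capacity.
    """
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild speed must be positive")
    seconds = data_tb * 1_000_000 / rebuild_mb_per_s
    return seconds / 3600


# Assumed speeds: 50MB/s for an idle array, 10MB/s for a busy one.
print(round(estimated_rebuild_hours(4, 50), 1))
print(round(estimated_rebuild_hours(4, 10), 1))
```

Under these assumed speeds, rebuilding 4TB takes about 22 hours on an idle array and about 111 hours, or more than four days, on a busy one. That is a long window in which a second drive failure would be fatal.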

Note: In addition to standard RAID (such as RAID 1 or RAID 5), there are also proprietary RAID setups that, apart from offering redundancy, also permit scaling up data. A typical example of this is the HybridRAID provided as an option in Synology NAS servers. HybridRAID automatically configures the type of redundancy based on the number of internal drives being used. On top of that you can also replace existing drives, one at a time, with drives of larger capacities to increase the total capacity without having to rebuild the RAID from scratch, or even turn the storage device off.

Last thoughts on redundancy: No matter what type of redundancy RAID you use, remember that it's just like insurance, something you need to have just in case, and hope that you will never have to resort to. The option to hot-replace a drive should be used only when absolutely necessary and not viewed as a "fun" or "cool" feature. The more often you use this feature, the more likely you are to lose all of your data stored in the storage device. For this reason, when you get a RAID-capable storage device, it's best to get one that offers lots of storage space to avoid having to replace its internal drives to increase the storage space.

Finally, let me say this one more time: redundancy is not backup. And you should never put all of your data on a single storage device, even one that offers redundancy.

Remember: A redundancy RAID's option to hot-replace a drive should be used only when absolutely necessary and not just for fun. Due to the stress of the RAID rebuild process, the more often you use this feature, the more likely it becomes that you could lose all of the data stored in the storage device. You should never put all of your data on a single storage device, even one that offers redundancy.

Backup

Home users might not need redundancy but they definitely need backup, which basically means keeping separate copies of data in multiple places so that if something happens to one place you can turn to another. The more copies of data you have, the safer it is.
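At its simplest, that idea can be sketched in a few lines of Python: copy a file to several places, then check that each copy matches the original. This is only a minimal illustration, not a replacement for real backup software. The function names and the folder names in the usage example are made up, and the destination folders stand in for whatever separate places you use, such as an external drive or a NAS share.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy one file into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps the file's timestamps
        if sha256_of(target) != expected:
            raise OSError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Calling back_up(Path("report.docx"), [Path("/Volumes/External/backup"), Path("/Volumes/NAS/backup")]) would leave you with three copies of the file: the original and one in each of the two (hypothetical) backup folders. The checksum comparison is there because a copy you can't read back is no backup at all.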

In the same grocery shopping analogy, backup is like getting two (or more) separate bags of exactly the same groceries. In this case, if the eggs stored in one bag got broken, or even in the rare case that you dropped one of the bags and it got run over by a car, you would still be able to make breakfast the next morning thanks to the content of the other bag.

Backing up is much easier, and happens more often, than you might think. For example, e-mailing a Word document to someone (or to yourself) is in itself a form of backup, because now there are at least two copies of the file, one on your computer and one on the recipient's. If you use a Web-based e-mail service, such as Gmail, a copy is stored on one of Google's servers, too. This goes for photos and other types of lightweight (in terms of storage size) data as well.

Obviously, you can't use e-mailing as the main backup method; that'd take too long and you would run out of creative energy very fast. Ideally, you should use more robust approaches. Following are the most popular ways to back up your data, and who they are good for.

Online backup (also known as cloud backup)

An online backup service allows you to store your data at an off-site location by uploading it via the Internet to a remote computer or computers. Generally, you don't know where the computer that hosts your backup is. In reality, your data is likely hosted by multiple servers in multiple data centers around the world. There are many online backup services, such as Dropbox, Google Drive, and SkyDrive, and all of them automatically sync local content with the remote server in real time, or based on a schedule that you set. Most of them offer a few gigabytes of online storage for free, and you can purchase more if need be. Google also offers Google Docs, a Web-based alternative to Microsoft Office that hosts the documents in the cloud (meaning on Google servers) at all times.
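The syncing these services do can be pictured with a short Python sketch: walk the local folder and send over only the files that are new or have changed since the last run. This is a simplified stand-in, not how Dropbox, Google Drive, or SkyDrive actually work. The "remote" here is just another folder, the function name is made up, and a real service would upload over the Internet, handle deleted files, and cope with conflicts.

```python
import filecmp
import shutil
from pathlib import Path


def sync_folder(local: Path, remote: Path) -> list[str]:
    """Copy new or changed files from local to remote; return what was copied.

    One-way only: nothing is ever deleted from, or pulled back from, remote.
    """
    uploaded = []
    for item in sorted(local.rglob("*")):
        if not item.is_file():
            continue
        relative = item.relative_to(local)
        target = remote / relative
        # Skip files whose contents are already identical on the remote side.
        if target.exists() and filecmp.cmp(item, target, shallow=False):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, target)
        uploaded.append(relative.as_posix())
    return uploaded
```

Run on a schedule, a function like this sends everything the first time and only the changed files after that, which is what keeps regular syncing quick.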

1 comment:

  1. Cloud backup is now the smartest way of backing up the important files you have on your computer or mobile phone. You just have to synchronize your computer and phone with your chosen backup system and it'll automatically save all your files. On some online backup systems you have to do the backing up manually, but it is better to choose automatic backup so it won't take up too much of your time.
    -Williams Data Management
