More Reliability with RAID

A widely used technology in servers, RAID combines multiple hard drives into a single logical drive. It is used to increase reliability and uptime in servers and to give you faster data access. Here, we talk about the concepts involved in RAID and the different ways of implementing it.

An acronym for Redundant Array of Independent (or Inexpensive) Disks, RAID is a widely used, though old, technology that combines multiple hard drives into a single logical hard drive. The benefits of this range from increased reliability to faster data access, depending on the way RAID is implemented. RAID is mostly used in servers to increase their reliability and uptime.

Let’s first go through some concepts related to RAID. A physical drive array is a collection of physical hard drives. Physical arrays can be divided or grouped together to form one or more logical arrays, and these logical arrays can in turn be divided into the logical drives that the OS sees. The OS is oblivious to the presence of any physical or logical arrays, courtesy of the RAID controller, which manages how the data is stored and accessed across them. A RAID controller can be implemented in hardware or in software. Hardware is more suitable for the higher RAID levels, which need more processing power.

There are two kinds of hardware RAID controllers: internal and external. An internal RAID controller is like any ordinary card that fits into an expansion slot, such as a PCI slot. Some motherboards even have built-in RAID controllers. Depending on the RAID controller and the levels available, a certain amount of cache memory is also present. External RAID controllers come in their own case along with the hard drives. High-end servers often have a separate enclosure for the RAID controller and hard drives. An external RAID controller is usually more complex and has more memory than an internal one, because of the large number of hard drives and the complex RAID levels it needs to work with. It generally uses a SCSI interface, which makes it easier to have large numbers of (hot-swappable) hard drives. Cost-wise, internal controllers are cheaper than external controllers. Software RAID needs no dedicated controller, but it uses processor time, so the more complex RAID levels can slow down a system considerably. Hardware RAID is the better option in that case.

Techniques used in RAID
Central to RAID are the concepts of Mirroring, Parity and Striping. 

Mirroring
Figure: Mirroring offers good fault tolerance and reliability because two copies of the same data are stored on separate hard disks or disk arrays
Mirroring involves having two copies of the same data on separate hard drives or disk arrays. Since the data has to be written to both disks, writes take a toll on performance. Read performance improves, however, as information can be read from both drives, reducing wait times. Mirroring can also be performed across more than two drives; the more drives, the better the read performance. Mirroring offers maximum data security and ease of recovery, as two distinct copies of the data are maintained. In practice, though, mirroring can be an expensive affair, as twice as much storage space is needed. Parity is a cheaper option.
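
The behaviour described above can be sketched as a toy model. This is only an illustration, not how a real controller works: the class name `MirroredVolume`, its methods and the round-robin read policy are all assumptions made for the example.

```python
class MirroredVolume:
    """Toy model of mirroring: every block is written to all drives."""

    def __init__(self, num_drives=2):
        # Each "drive" is a dict mapping block number -> bytes.
        self.drives = [dict() for _ in range(num_drives)]
        self.failed = set()
        self._next = 0  # round-robin pointer used to spread reads

    def write(self, block_no, data):
        # A write has to reach every working drive, hence the write cost.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[block_no] = data

    def read(self, block_no):
        # Reads can be served by any working drive, so they are shared out.
        alive = [i for i in range(len(self.drives)) if i not in self.failed]
        if not alive:
            raise IOError("all mirrors have failed")
        chosen = alive[self._next % len(alive)]
        self._next += 1
        return self.drives[chosen][block_no]

    def fail(self, drive_no):
        # Simulate the loss of one drive.
        self.failed.add(drive_no)


volume = MirroredVolume(num_drives=2)
volume.write(0, b"payroll")
volume.fail(0)                # one drive dies
survivor = volume.read(0)     # the data is still served by the other drive
```

Note that the sketch stores every block twice, which is exactly the storage cost mentioned above.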

Parity
Figure: Parity requires less storage space than mirroring
Suppose you have N data elements. You use these N elements to create a parity element and end up with N+1 elements. If any one of these N+1 elements is lost, it can be recovered as long as the other N elements remain. The RAID controller creates this extra parity element by (generally) using the XOR operation. The method reduces the amount of additional storage space needed as compared to mirroring, but it is not as fault tolerant. Parity algorithms usually need large amounts of computing power, as the parity data has to be recomputed every time data is written, and lost data has to be reconstructed on reads if a drive has failed. This makes a hardware RAID controller the better choice, as a software controller will tie down the CPU.
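
To see why XOR makes recovery possible, here is a minimal sketch. It assumes all elements are byte blocks of equal length; the function names `xor_blocks`, `make_parity` and `recover` are made up for the illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    # The parity element is simply the XOR of the N data elements.
    return xor_blocks(data_blocks)


def recover(surviving_blocks):
    # XORing the N surviving elements (data and parity alike)
    # yields the one element that is missing.
    return xor_blocks(surviving_blocks)


data = [b"RAID", b"disk", b"data"]   # N = 3 data elements
parity = make_parity(data)           # the extra (N+1)th element

# Pretend the second element is lost and rebuild it from the rest.
rebuilt = recover([data[0], data[2], parity])
```

The sketch works because XORing a value with itself gives zero, so every surviving data element cancels out of the parity and only the lost one is left. Three data elements needed just one extra element here, whereas mirroring would have needed three.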

Striping
Figure: Striping reduces the time taken to access your data by breaking your (large) file into multiple pieces and storing each piece on a separate hard disk, so that the whole file can be accessed at the same time
Striping is aimed at increasing performance. Suppose you have a large file on a single hard drive. To read the file, you’ll have to wait till it is read from beginning to end. Now, if you break this file into multiple pieces and store them on separate hard disks, all of which can be accessed simultaneously, the total time taken is more or less equal to the time taken to read a single piece (similar to downloading a file in segments). Ideally, with N hard drives, the file is transferred in about 1/Nth the time it takes to transfer from one hard drive. Clearly, the more hard drives there are, the greater the increase in performance.

There are two levels of striping that can be used: byte level and block level. In byte-level striping, each byte of data is written onto a different hard disk. That is, if you have four hard disks, the first byte is written onto the first hard disk, the second onto the second and so on, and the fifth again onto the first hard disk. Block-level striping involves breaking up the data into blocks of a given size. These are then distributed the same way as in byte-level striping. The size of these blocks is called the stripe size.
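
The round-robin distribution described above can be sketched as follows. The functions `stripe` and `unstripe` are hypothetical helpers written for this example; each drive is modelled as a plain list of pieces. A stripe size of 1 gives byte-level striping, while a larger value gives block-level striping.

```python
def stripe(data, num_drives, stripe_size):
    """Cut data into pieces of stripe_size bytes and deal them out in turn."""
    drives = [[] for _ in range(num_drives)]
    for n, start in enumerate(range(0, len(data), stripe_size)):
        # Piece 0 goes to drive 0, piece 1 to drive 1, and so on,
        # wrapping around to drive 0 after the last drive.
        drives[n % num_drives].append(data[start:start + stripe_size])
    return drives


def unstripe(drives):
    """Reassemble the data by reading one piece from each drive in turn."""
    pieces = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                pieces.append(drive[row])
    return b"".join(pieces)


# Byte-level striping over four drives: the fifth byte ("E")
# lands on the first drive again.
byte_layout = stripe(b"ABCDEFGHIJ", num_drives=4, stripe_size=1)

# Block-level striping over two drives with a stripe size of 4 bytes.
block_layout = stripe(b"ABCDEFGHIJ", num_drives=2, stripe_size=4)
```

In a real array the pieces on different drives would be read in parallel, which is where the speed gain comes from. The sketch only shows where each piece ends up.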

RAID levels

Since individual levels may not be suitable for everyone’s needs, a combination of RAID levels can also be used. The most popular combinations are RAID 0+1 and 1+0. These two are often thought to be the same, though there is a subtle difference. RAID 0+1 is striping and then mirroring. Let’s say you have eight drives. You split them into two arrays of four drives each and apply RAID 0 to each array individually. Then you apply RAID 1 and have one array act as a mirror of the other. Now, if one of the disks in an array fails, that entire array goes down, though the other one keeps working (now without any fault tolerance).

RAID 1+0 applies mirroring first and striping later. That is, the eight drives are divided into four sets of two drives each. Each set now holds duplicate information. Striping is then applied across these mirrored sets. This technique has better fault tolerance, as it keeps working as long as at least one drive in every mirrored set is active. Theoretically, the system can keep working even with half your drives failing, provided no two failed drives belong to the same mirrored set. These two techniques are very popular as they are easy to implement and combine the benefits of RAID levels 0 and 1.
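
The difference between the two layouts shows up when you check which drive failures each one survives. The sketch below assumes the eight-drive example, with drives numbered 0 to 7. The numbering scheme and the function names are assumptions made for the illustration.

```python
def raid01_survives(failed, drives_per_array=4):
    """RAID 0+1: drives 0..3 form striped array A, drives 4..7 form array B.

    A striped array is lost as soon as any one of its drives fails,
    so the data survives only if at least one array is fully intact.
    """
    k = drives_per_array
    array_a_ok = not any(d < k for d in failed)
    array_b_ok = not any(d >= k for d in failed)
    return array_a_ok or array_b_ok


def raid10_survives(failed, num_pairs=4):
    """RAID 1+0: drives (0,1), (2,3), (4,5), (6,7) are mirrored pairs.

    The data survives as long as no pair has lost both of its drives.
    """
    failed = set(failed)
    return all(not ({2 * p, 2 * p + 1} <= failed) for p in range(num_pairs))


# One failure in each half: RAID 0+1 is down, RAID 1+0 keeps running.
two_failures = {0, 4}
result_01 = raid01_survives(two_failures)
result_10 = raid10_survives(two_failures)

# Half the drives gone, one from each mirrored pair: RAID 1+0 still works.
half_gone = raid10_survives({0, 2, 4, 6})
```

Both layouts survive any single drive failure. The gap appears with the second failure, where RAID 1+0 goes down only if the failed drive's own mirror partner is also lost.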

Kunal Dua