RAID Level 0
Common Name(s): RAID 0. (Note that the term "RAID 0" is sometimes used to mean not only the conventional striping technique described here but also other "non-redundant" ways of setting up disk arrays. Sometimes it is (probably incorrectly) used just to describe a collection of disks that doesn't use redundancy.)
Technique(s) Used: Striping (without parity)
Description: The simplest RAID level, RAID 0 should really be called "AID", since it involves no redundancy. Files are broken into stripes of a size dictated by the user-defined stripe size of the array, and successive stripes are sent to each disk in the array in turn. Giving up redundancy gives this RAID level the best overall performance characteristics of the single RAID levels, especially for its cost. For this reason, it is becoming increasingly popular with performance-seekers, especially at the lower end of the marketplace.
This illustration shows how files of different sizes are distributed between the
drives on a four-disk, 16 kiB stripe size RAID 0 array. The red file is 4 kiB in
size; the blue is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB.
They are shown drawn to scale to illustrate how much space they take
up in relative terms in the array--one vertical pixel represents 1 kiB.
(To see the impact that increasing or decreasing stripe size has on the
way the data is stored in the array, see the 4 kiB and 64 kiB stripe size
versions of this illustration on the page discussing stripe size issues.)
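The striping described above can be sketched in a few lines of Python. This is only an illustration, not how any particular controller works: the function names are made up for this example, and it assumes stripes are handed to the disks in simple round-robin order and that a file starts at the offset you give it (a real file system may place files anywhere).

```python
def locate(offset, stripe_size=16 * 1024, num_disks=4):
    """Map a logical byte offset in the array to (disk index, byte offset on that disk)."""
    stripe_index = offset // stripe_size   # which stripe, counted across the whole array
    disk = stripe_index % num_disks        # stripes rotate round-robin over the disks
    row = stripe_index // num_disks        # number of full rows that precede this stripe
    return disk, row * stripe_size + offset % stripe_size


def disks_touched(start, length, stripe_size=16 * 1024, num_disks=4):
    """Return the sorted list of disks that hold at least one byte of a file."""
    first = start // stripe_size
    last = (start + length - 1) // stripe_size
    return sorted({stripe % num_disks for stripe in range(first, last + 1)})
```

With the four-disk, 16 kiB example, a 4 kiB file starting at offset 0 sits entirely on one disk, a 20 kiB file spills onto a second disk, and a 100 kiB file touches all four.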
Controller Requirements: Supported by all hardware controllers, both SCSI and IDE/ATA, and also most software RAID solutions.
Hard Disk Requirements: Minimum of two hard disks (some may support one drive, the point of which escapes me); maximum set by controller. Any type may be used, but they should be of identical type and size for best performance and to eliminate "waste".
Array Capacity: (Size of Smallest Drive * Number of Drives).
Storage Efficiency: 100% if identical drives are used.
Fault Tolerance: None. Failure of any drive results in loss of all data, short of specialized data recovery.
Availability: Lowest of any RAID level. Lack of fault tolerance means no rapid recovery from failures. Failure of any drive results in array being lost and immediate downtime until array can be rebuilt and data restored from backup.
Degradation and Rebuilding: Not applicable.
Random Read Performance: Very good; better with larger stripe sizes, provided the controller supports independent reads to different disks in the array.
Random Write Performance: Very good; again, best if using a larger stripe size and a controller supporting independent writes.
Sequential Read Performance: Very good to excellent.
Sequential Write Performance: Very good.
Cost: Lowest of all RAID levels.
Special Considerations: Using a RAID 0 array without backing up any changes made to its data at least daily is a loud statement that that data is not important to you.
Recommended Uses: Non-critical data (or data that changes infrequently and is backed up regularly) requiring high speed, particularly write speed, and low cost of implementation. Audio and video streaming and editing; web servers; graphic design; high-end gaming or hobbyist systems; temporary or "scratch" disks on larger machines.
RAID Level 1
Common Name(s): RAID 1; RAID 1 with Duplexing.
Technique(s) Used: Mirroring or Duplexing
Description: RAID 1 is usually implemented as mirroring; the same data is written to two different drives using either a hardware RAID controller or software (generally via the operating system). If either drive fails, the other continues to function as a single drive until the failed drive is replaced. Conceptually simple, RAID 1 is popular with those who require fault tolerance and don't need top-notch read performance. A variant of RAID 1 is duplexing, which duplicates the controller card as well as the drive, providing tolerance against failures of either a drive or a controller. It is much less commonly seen than straight mirroring.
Illustration of a pair of mirrored hard disks, showing how the
files are duplicated on both drives. (The files are the same as
those in the RAID 0 illustration, except that to save space I have
reduced the scale here so one vertical pixel represents 2 kiB.)
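The behavior of a mirrored pair can be modeled with a toy class. This is a sketch under loud assumptions: the `Mirror` class and its methods are hypothetical, each "drive" is just a Python dictionary, and nothing here reflects how a real controller or operating system schedules its I/O.

```python
class Mirror:
    """Toy two-drive mirror. Each drive is a dict mapping block number to bytes."""

    def __init__(self):
        self.drives = [{}, {}]
        self.working = [True, True]

    def write(self, block, data):
        # Every write goes to every working drive.
        for drive, ok in zip(self.drives, self.working):
            if ok:
                drive[block] = data

    def read(self, block):
        # Either working drive can satisfy a read on its own.
        for drive, ok in zip(self.drives, self.working):
            if ok:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, index):
        # Simulate losing a drive and everything stored on it.
        self.working[index] = False
        self.drives[index] = {}

    def replace(self, index):
        # Rebuild: copy every block from the surviving drive to the new one.
        other = 1 - index
        if not self.working[other]:
            raise IOError("no surviving drive to rebuild from")
        self.drives[index] = dict(self.drives[other])
        self.working[index] = True
```

After `fail(0)`, reads and writes carry on using the second drive alone; `replace(0)` then copies the survivor's contents back, which is all a RAID 1 rebuild amounts to.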
Controller Requirements: Supported by all hardware controllers, both SCSI and IDE/ATA, and also most software RAID solutions.
Hard Disk Requirements: Exactly two hard disks. Any type may be used but they should ideally be identical.
Array Capacity: Size of Smaller Drive.
Storage Efficiency: 50% if drives of the same size are used, otherwise (Size of Smaller Drive / (Size of Smaller Drive + Size of Larger Drive) )
Fault Tolerance: Very good; duplexing even better.
Availability: Very good. Most RAID controllers, even low-end ones, will support hot sparing and automatic rebuilding of RAID 1 arrays.
Degradation and Rebuilding: Slight degradation of read performance; write performance will actually improve. Rebuilding is relatively fast.
Random Read Performance: Good. Better than a single drive but worse than many other RAID levels.
Random Write Performance: Good. Worse than a single drive, but better than many other RAID levels. :^)
Sequential Read Performance: Fair; about the same as a single drive.
Sequential Write Performance: Good; again, better than many other RAID levels.
Cost: Relatively high due to redundant drives; lowest storage efficiency of the single RAID levels. Duplexing is still more expensive due to redundant controllers. On the other hand, no expensive controller is required, and large consumer-grade drives are rather inexpensive these days, making RAID 1 a viable choice for an individual system.
Special Considerations: RAID 1 arrays are limited to the size of the drives used in the array. Multiple RAID 1 arrays can be set up if additional storage is required, but RAID 1+0 begins to look more attractive in that circumstance. Performance may be reduced if implemented using software instead of a hardware controller; duplexing may require software RAID and thus may show lower performance than mirroring.
Recommended Uses: Applications requiring high fault tolerance at a low cost, without heavy emphasis on large amounts of storage capacity or top performance. Especially useful in situations where the perception is that having a duplicated set of data is more secure than using parity. For this reason, RAID 1 is popular for accounting and other financial data. It is also commonly used for small database systems, enterprise servers, and for individual users requiring fault tolerance with a minimum of hassle and cost (since redundancy using parity generally requires more expensive hardware.)
RAID Level 2
Common Name(s): RAID 2.
Technique(s) Used: Bit-level striping with Hamming code ECC.
Description: Level 2 is the "black sheep" of the RAID family, because it is the only RAID level that does not use one or more of the "standard" techniques of mirroring, striping and/or parity. RAID 2 uses something similar to striping with parity, but not the same as what is used by RAID levels 3 to 7. It is implemented by splitting data at the bit level and spreading it over a number of data disks and a number of redundancy disks. The redundant bits are calculated using Hamming codes, a form of error correcting code (ECC). Each time something is to be written to the array these codes are calculated and written alongside the data to dedicated ECC disks; when the data is read back these codes are read as well to confirm that no errors have occurred since the data was written. If a single-bit error occurs, it can be corrected "on the fly". If this sounds similar to the way that ECC is used within hard disks today, that's for a good reason: it's pretty much exactly the same. It's also the same concept used for ECC protection of system memory.
Level 2 is the only RAID level of the ones defined by the original Berkeley document that is not used today, for a variety of reasons. It is expensive and often requires many drives--see below for some surprisingly large numbers. The controller required was complex, specialized and expensive. The performance of RAID 2 is also rather substandard in transactional environments due to the bit-level striping. But most of all, level 2 was obviated by the use of ECC within a hard disk; essentially, much of what RAID 2 provides you now get for "free" within each hard disk, with other RAID levels providing protection above and beyond ECC.
Due to its cost and complexity, level 2 never really "caught on".
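To make the Hamming code idea concrete, here is a minimal sketch of the textbook Hamming(7,4) code, which protects four data bits with three check bits. The choice of this particular code and the function names are assumptions made for illustration; the text does not say which code sizes real RAID 2 arrays used. If you picture each of the seven bits living on its own disk, that would be four data disks and three ECC disks.

```python
def encode(data_bits):
    """Turn 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data bits, position of the corrected bit, or 0 if none was wrong)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # the syndrome names the bad position
    if position:
        c[position - 1] ^= 1          # flip the single bad bit back
    return [c[2], c[4], c[5], c[6]], position
```

Flipping any one of the seven bits in an encoded word and passing it to `decode` returns the original four data bits, which is the "on the fly" single-bit correction described above.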
RAID Level 3
Common Name(s): RAID 3. (Watch out for some companies that say their products implement RAID 3 when they are really RAID 4.)
Technique(s) Used: Byte-level striping with dedicated parity.
Description: Under RAID 3, data is striped across multiple disks at a byte level; the exact number of bytes sent in each stripe varies but is typically under 1024. The parity information is sent to a dedicated parity disk, but the failure of any disk in the array can be tolerated (i.e., the dedicated parity disk doesn't represent a single point of failure in the array.) The dedicated parity disk does generally serve as a performance bottleneck, especially for random writes, because it must be accessed any time anything is sent to the array; this contrasts with levels such as RAID 5, which improve write performance by distributing the parity across all the drives (though they still suffer from large overheads on writes). RAID 3 differs from RAID 4 only in the size of the stripes sent to the various disks.
This illustration shows how files of different sizes are distributed
between the drives on a four-disk, byte-striped RAID 3 array. As with
the RAID 0 illustration, the red file is 4 kiB in size; the blue is 20 kiB;
the green is 100 kiB; and the magenta is 500 kiB, with each vertical
pixel representing 1 kiB of space. Notice that the files are evenly
spread between three drives, with the fourth containing parity
information (shown in dark gray). Since the blocks are so tiny in
RAID 3, the individual boundaries between stripes can't be seen.
You may want to compare this illustration to the one for RAID 4.
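The layout in the illustration, and the reason any single disk can be lost, can be sketched as follows. This assumes the simplest possible case: a stripe of one byte per disk and parity computed as the XOR of the data bytes. The function names are invented for this example, and real RAID 3 controllers may use larger stripes.

```python
from functools import reduce
from operator import xor


def stripe_with_parity(data, data_disks=3):
    """Spread bytes over the data disks and compute a dedicated parity disk."""
    data += bytes((-len(data)) % data_disks)               # pad to a whole number of rows
    disks = [data[i::data_disks] for i in range(data_disks)]  # byte i goes to disk i mod n
    parity = bytes(reduce(xor, column) for column in zip(*disks))
    return disks, parity


def rebuild(disks, parity, lost):
    """Recreate the contents of one lost data disk from the survivors plus parity."""
    survivors = [d for i, d in enumerate(disks) if i != lost] + [parity]
    return bytes(reduce(xor, column) for column in zip(*survivors))
```

Because XOR is its own inverse, XORing the parity with the surviving data disks yields exactly the missing disk; if the parity disk itself is lost, it is simply recomputed from the data.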
Controller Requirements: Generally requires a medium-to-high-end hardware RAID card.
Hard Disk Requirements: Minimum of three standard hard disks; maximum set by controller. Should be of identical size and type.
Array Capacity: (Size of Smallest Drive) * (Number of Drives - 1)
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 1) / Number of Drives).
Fault Tolerance: Good. Can tolerate loss of one drive.
Availability: Very good. Hot sparing and automatic rebuild are usually supported by controllers that implement RAID 3.
Degradation and Rebuilding: Relatively little degrading of performance if a drive fails. Rebuilds can take many hours.
Random Read Performance: Good, but not great, due to byte-level striping.
Random Write Performance: Poor, due to byte-level striping, parity calculation overhead, and the bottleneck of the dedicated parity drive.
Sequential Read Performance: Very good.
Sequential Write Performance: Fair to good.
Cost: Moderate. A hardware controller is usually required, as well as at least three drives.
Special Considerations: Not as popular as many of the other commonly-implemented RAID levels. For transactional environments, RAID 5 is usually a better choice.
Recommended Uses: Applications that require high transfer performance with redundancy, especially serving or editing large files: multimedia, publishing and so on. RAID 3 is often used for the same sorts of applications that would typically see the use of RAID 0, where the lack of fault tolerance of RAID 0 makes it unacceptable.
RAID Level 4
Common Name(s): RAID 4 (sometimes called RAID 3 by the confused).
Technique(s) Used: Block-level striping with dedicated parity.
Description: RAID 4 improves performance by striping data across many disks in blocks, and provides fault tolerance through a dedicated parity disk. This makes it in some ways the "middle sibling" in a family of close relatives, RAID levels 3, 4 and 5. It is like RAID 3 except that it uses blocks instead of bytes for striping, and like RAID 5 except that it uses dedicated parity instead of distributed parity. Going from byte to block striping improves random access performance compared to RAID 3, but the dedicated parity disk remains a bottleneck, especially for random write performance. Fault tolerance, format efficiency and many other attributes are the same as for RAID 3 and RAID 5.
This illustration shows how files of different sizes are distributed between
the drives on a four-disk RAID 4 array using a 16 kiB stripe size. As with the
RAID 0 illustration, the red file is 4 kiB in size; the blue is 20 kiB; the green
is 100 kiB; and the magenta is 500 kiB, with each vertical pixel representing
1 kiB of space. Notice that as with RAID 3, the files are evenly spread between
three drives, with the fourth containing parity information (shown in gray).
You may want to contrast this illustration to the one for RAID 3 (which is very
similar except that the blocks are so tiny you can't see them) and the one
for RAID 5 (which distributes the parity blocks across all four drives.)
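One way to see why the dedicated parity disk is a bottleneck for random writes is to look at what a small write involves. The sketch below uses the common read-modify-write approach with XOR parity; the text does not say that every RAID 4 controller works this way, so treat it as an assumption. The function names and the list-of-blocks data layout are hypothetical.

```python
def xor_blocks(a, b):
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity_disk, disk, row, new_block):
    """Replace one data block and bring the parity block up to date."""
    old_block = data_disks[disk][row]     # read 1: old data
    old_parity = parity_disk[row]         # read 2: old parity
    # Remove the old data's contribution to the parity, then add the new data's.
    new_parity = xor_blocks(xor_blocks(old_parity, old_block), new_block)
    data_disks[disk][row] = new_block     # write 1: new data
    parity_disk[row] = new_parity         # write 2: always lands on the parity disk
    return new_parity
```

Every small write costs two reads and two writes, and one read and one write of those always hit the same parity disk no matter which data disk is being changed.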
Controller Requirements: Generally requires a medium-to-high-end hardware RAID card.
Hard Disk Requirements: Minimum of three standard hard disks; maximum set by controller. Should be of identical size and type.
Array Capacity: (Size of Smallest Drive) * (Number of Drives - 1).
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 1) / Number of Drives).
Fault Tolerance: Good. Can tolerate loss of one drive.
Availability: Very good. Hot sparing and automatic rebuild are usually supported.
Degradation and Rebuilding: Moderate degrading if a drive fails; potentially lengthy rebuilds.
Random Read Performance: Very good.
Random Write Performance: Poor to fair, due to parity calculation overhead and the bottleneck of the dedicated parity drive.
Sequential Read Performance: Good to very good.
Sequential Write Performance: Fair to good.
Cost: Moderate. A hardware controller is usually required, as well as at least three drives.
Special Considerations: Performance will depend to some extent upon the stripe size chosen.
Recommended Uses: Jack of all trades and master of none, RAID 4 is not as commonly used as RAID 3 and RAID 5, because it is in some ways a "compromise" between them that doesn't have a target market as well defined as either of those two levels. It is sometimes used by applications commonly seen using RAID 3 or RAID 5, running the gamut from databases and enterprise planning systems to serving large multimedia files.
RAID Level 5
Common Name(s): RAID 5.
Technique(s) Used: Block-level striping with distributed parity.
Description: One of the most popular RAID levels, RAID 5 stripes both data and parity information across three or more drives. It is similar to RAID 4 except that it exchanges the dedicated parity drive for a distributed parity algorithm, writing data and parity blocks across all the drives in the array. This removes the "bottleneck" that the dedicated parity drive represents, improving write performance slightly and allowing somewhat better parallelism in a multiple-transaction environment, though the overhead necessary in dealing with the parity continues to bog down writes. Fault tolerance is maintained by ensuring that the parity information for any given block of data is placed on a drive separate from those used to store the data itself. The performance of a RAID 5 array can be "adjusted" by trying different stripe sizes until one is found that is well-matched to the application being used.
This illustration shows how files of different sizes are distributed
between the drives on a four-disk RAID 5 array using a 16 kiB stripe
size. As with the RAID 0 illustration, the red file is 4 kiB in size; the blue
is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB, with each
vertical pixel representing 1 kiB of space. Contrast this diagram to the
one for RAID 4, which is identical except that the data is only on three
drives and the parity (shown in gray) is exclusively on the fourth drive.
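Distributed parity just means that the disk holding the parity block changes from one row of stripes to the next. The sketch below assumes one simple rotation, starting on the last disk and moving back one disk per row; this ordering is an assumption for illustration, since the rotation actually used varies between implementations. The function names are made up.

```python
def parity_disk(row, num_disks=4):
    """Disk index holding the parity block for a given row (assumed rotation)."""
    return (num_disks - 1 - row) % num_disks


def row_layout(row, num_disks=4):
    """Label each disk in a row as holding data ("D") or parity ("P")."""
    p = parity_disk(row, num_disks)
    return ["P" if disk == p else "D" for disk in range(num_disks)]


for row in range(4):
    print(row, " ".join(row_layout(row)))
# 0 D D D P
# 1 D D P D
# 2 D P D D
# 3 P D D D
```

Over any four consecutive rows each disk holds parity exactly once, so no single drive has to absorb every parity write the way the dedicated drive does in RAID 4.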
Controller Requirements: Requires a moderately high-end card for hardware RAID; supported by some operating systems for software RAID, but at a substantial performance penalty.
Hard Disk Requirements: Minimum of three standard hard disks; maximum set by controller. Should be of identical size and type.
Array Capacity: (Size of Smallest Drive) * (Number of Drives - 1).
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 1) / Number of Drives).
Fault Tolerance: Good. Can tolerate loss of one drive.
Availability: Good to very good. Hot sparing and automatic rebuild are usually featured on hardware RAID controllers supporting RAID 5 (software RAID 5 will require down-time).
Degradation and Rebuilding: Due to distributed parity, degradation can be substantial after a failure and during rebuilding.
Random Read Performance: Very good to excellent; generally better for larger stripe sizes. Can be better than RAID 0 since the data is distributed over one additional drive, and the parity information is not required during normal reads.
Random Write Performance: Only fair, due to parity overhead; this is improved over RAID 3 and RAID 4 due to eliminating the dedicated parity drive, but the overhead is still substantial.
Sequential Read Performance: Good to very good; generally better for smaller stripe sizes.
Sequential Write Performance: Fair to good.
Cost: Moderate, but often less than that of RAID 3 or RAID 4 due to its greater popularity, and especially if software RAID is used.
Special Considerations: Due to the amount of parity calculating required, software RAID 5 can seriously slow down a system. Performance will depend to some extent upon the stripe size chosen.
Recommended Uses: RAID 5 is seen by many as the ideal combination of good performance, good fault tolerance and high capacity and storage efficiency. It is best suited for transaction processing and is often used for "general purpose" service, as well as for relational database applications, enterprise resource planning and other business systems. For write-intensive applications, RAID 1 or RAID 1+0 are probably better choices (albeit higher in terms of hardware cost), as the performance of RAID 5 will begin to substantially decrease in a write-heavy environment.
RAID Level 6
Common Name(s): RAID 6. Some companies use the term "RAID 6" to refer to proprietary extensions of RAID 5; these are not discussed here.
Technique(s) Used: Block-level striping with dual distributed parity.
Description: RAID 6 can be thought of as "RAID 5, but more". It stripes blocks of data and parity across an array of drives like RAID 5, except that it calculates two sets of parity information for each parcel of data. The goal of this duplication is solely to improve fault tolerance; RAID 6 can handle the failure of any two drives in the array while other single RAID levels can handle at most one fault. Performance-wise, RAID 6 is generally slightly worse than RAID 5 in terms of writes due to the added overhead of more parity calculations, but may be slightly faster in random reads due to spreading of data over one more disk. As with RAID levels 4 and 5, performance can be adjusted by experimenting with different stripe sizes.
This illustration shows how files of different sizes are distributed
between the drives on a four-disk RAID 6 array using a 16 kiB stripe
size. As with the RAID 0 illustration, the red file is 4 kiB in size; the blue
is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB, with each
vertical pixel representing 1 kiB of space. This diagram is the same as the
RAID 5 one, except that you'll notice that there is now twice as much
gray parity information, and as a result, more space taken up on the
four drives to contain the same data than the other levels that use striping.
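The space cost of the extra parity is easy to check against the Array Capacity and Storage Efficiency formulas given for each striped level. The helper below is a small sketch; the function name and the drive sizes are hypothetical, and sizes are in whatever unit you like.

```python
def usable_capacity(level, drive_sizes):
    """Return (usable capacity, storage efficiency) for a striped RAID level."""
    # Number of drives' worth of space given up to parity at each level.
    parity_drives = {0: 0, 3: 1, 4: 1, 5: 1, 6: 2}[level]
    capacity = min(drive_sizes) * (len(drive_sizes) - parity_drives)
    return capacity, capacity / sum(drive_sizes)


four_drives = [100, 100, 100, 100]
for level in (0, 5, 6):
    print(level, usable_capacity(level, four_drives))
# 0 (400, 1.0)
# 5 (300, 0.75)
# 6 (200, 0.5)
```

On a four-disk array, RAID 6 leaves only half the raw space for data, which is why the illustration needs more room than the RAID 0 and RAID 5 versions to hold the same files.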
Controller Requirements: Requires a specialized (usually meaning expensive) hardware controller.
Hard Disk Requirements: Minimum of four hard disks; maximum set by controller. Should be of identical size and type.
Array Capacity: (Size of Smallest Drive) * (Number of Drives - 2).
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 2) / Number of Drives).
Fault Tolerance: Very good to excellent. Can tolerate the simultaneous loss of any two drives in the array.
Availability: Excellent.
Degradation and Rebuilding: Due to the complexity of dual distributed parity, degradation can be substantial after a failure and during rebuilding. Dual redundancy may allow rebuilding to be delayed to avoid performance hit.
Random Read Performance: Very good to excellent; generally better for larger stripe sizes.
Random Write Performance: Poor, due to dual parity overhead and complexity.
Sequential Read Performance: Good to very good; generally better for smaller stripe sizes.
Sequential Write Performance: Fair.
Cost: High.
Special Considerations: Requires special implementation; not widely available.
Recommended Uses: In theory, RAID 6 is ideally suited to the same sorts of applications as RAID 5, but in situations where additional fault tolerance is required. In practice, RAID 6 has never really caught on because few companies are willing to pay the extra cost to insure against a relatively rare event--it's unusual for two drives to fail simultaneously (unless something happens that takes out the entire array, in which case RAID 6 won't help anyway). On the lower end of the RAID 5 market, the rise of hot swapping and automatic rebuild features for RAID 5 has made RAID 6 even less desirable, since with these advanced features a RAID 5 array can recover from a single drive failure in a matter of hours (where without them, RAID 5 would require downtime for rebuilding, giving RAID 6 a substantial advantage.) On the higher end of the RAID 5 market, RAID 6 usually loses out to multiple RAID solutions such as RAID 10 that provide some degree of multiple-drive fault tolerance while offering improved performance as well.
RAID Level 7
Common Name(s): RAID 7.
Technique(s) Used: Asynchronous, cached striping with dedicated parity.
Description: Unlike the other RAID levels, RAID 7 isn't an open industry standard; it is really a trademarked marketing term of Storage Computer Corporation, used to describe their proprietary RAID design. (I debated giving it a page alongside the other RAID levels, but since it is used in the market, it deserves to be explained; that said, information about it appears to be limited.) RAID 7 is based on concepts used in RAID levels 3 and 4, but greatly enhanced to address some of the limitations of those levels. Of particular note is the inclusion of a great deal of cache arranged into multiple levels, and a specialized real-time processor for managing the array asynchronously. This hardware support--especially the cache--allows the array to handle many simultaneous operations, greatly improving performance of all sorts while maintaining fault tolerance. In particular, RAID 7 offers much improved random read and write performance over RAID 3 or RAID 4 because the dependence on the dedicated parity disk is greatly reduced through the added hardware. The increased performance of RAID 7 of course comes at a cost. This is an expensive solution, made and supported by only one company.
Controller Requirements: Requires a specialized, expensive, proprietary controller.
Hard Disk Requirements: Depends on implementation.
Array Capacity: Depends on implementation.
Storage Efficiency: Depends on implementation.
Fault Tolerance: Very good.
Availability: Excellent, due to use of multiple hot spares.
Degradation and Rebuilding: Better than many RAID levels due to hardware support for parity calculation operations and multiple cache levels.
Random Read Performance: Very good to excellent. The extra cache can often supply the results of the read without needing to access the array drives.
Random Write Performance: Very good; substantially better than other single RAID levels doing striping with parity.
Sequential Read Performance: Very good to excellent.
Sequential Write Performance: Very good.
Cost: Very high.
Special Considerations: RAID 7 is a proprietary product of a single company; if it is of interest then you should contact Storage Computer Corporation for more details on the specifics of implementing it. All the caching creates potential vulnerabilities in the event of power failure, making the use of one or more UPS units mandatory.
Recommended Uses: Specialized high-end applications requiring absolutely top performance and willing to live with the limitations of a proprietary, expensive solution. For most users, a multiple RAID level solution like RAID 1+0 will probably yield comparable performance improvements over single RAID levels, at lower cost.
Multiple (Nested) RAID Levels
The single RAID levels have distinct advantages and disadvantages, which is why most of them are used in various parts of the market to address different application requirements. It wasn't long after RAID began to be implemented that engineers looked at these RAID levels and began to wonder if it might be possible to get some of the advantages of more than one RAID level by designing arrays that use a combination of techniques. These RAID levels are called variously multiple, nested, or multi-RAID levels. They are also sometimes called two-dimensional, in reference to the two-dimensional schematics that are used to represent the application of two RAID levels to a set of disks, as you shall see.
Multiple RAID levels are most commonly used to improve performance, and they do this well. Nested RAID levels typically provide better performance characteristics than either of the single RAID levels that comprise them. The most commonly combined level is RAID 0, which is often mixed with redundant RAID levels such as 1, 3 or 5 to provide fault tolerance while exploiting the performance advantages of RAID 0. There is never a "free lunch", and so with multiple RAID levels what you pay is a cost in complexity: many drives are required, management and maintenance are more involved, and for some implementations a high-end RAID controller is required.
Not all combinations of RAID levels exist (which is good, because I'd get really bored of describing them all! :^) ) Typically, the most popular multiple RAID levels are those that combine single RAID levels that complement each other with different strengths and weaknesses. Making a multiple RAID array marrying RAID 4 to RAID 5 wouldn't be the best idea, since they are so similar to begin with.
In this section I take a look at some of the more common multiple RAID levels. Note that some of the multiple RAID levels discussed here are frequently used, but others are rarely implemented. In particular, for completeness I describe both the "X+Y" and "Y+X" configurations of each multiple level, when in some cases only one or the other is commonly made into product. For example, I know that RAID 50 (5+0) is an option in commercial RAID controllers, but I am not sure if anyone makes a RAID 05 solution. There may also be other combinations of RAID levels that I am not aware of.