
Introduction to RAID, Part 1

Posted August 26, 2013 in General

Introduction

 

One of the most common techniques for improving data reliability, data
performance, or both is called RAID (Redundant Array of Inexpensive Disks). The
concept was developed in 1987 by David Patterson, Garth Gibson, and Randy Katz
as a way to combine several inexpensive disks into what looks like a single disk
from the perspective of the OS, while also achieving enhanced reliability,
performance, or both.

 

Before anyone erupts and says that RAID does not stand for “Redundant Array of
Inexpensive Disks”, let me start by stating that
was the original definition. Over time, the definition has become more commonly
known as “Redundant Array of Independent Disks” perhaps so the word
“inexpensive” isn’t associated with RAID controllers or disks. Personally I use
the original definition but regardless, either definition means that the disks
are independent of one another. Feel free to use either definition since it
won’t change the content of this article. Now, back to our discussion of
RAID.

When the original paper was issued, five different RAID levels or
configurations were defined. Since that time other RAID configurations have been
developed including what are referred to as “hybrid” RAID configurations.

The RAID Advisory Board (RAB) was created to help advise the IT community on
the defined RAID configurations and to help with the creation of new RAID
configuration definitions. While it is not an organization that creates legally
binding standards and labeling, it does help clarify what the RAID levels
mean and what is commonly accepted in the community. There was a time when
companies were creating very strange RAID configurations and using strange
labels, causing great confusion. The RAB has helped to reduce the proliferation
of “weird” RAID configurations and labeling and to standardize the meaning of
the various RAID levels.

In this article I want to review the seven most common standard RAID
configurations. But I will also very briefly touch on some of the hybrid RAID
configurations. For each RAID level, I will describe how it works as well as the
configuration’s particular pros and cons. However, before starting I want to
clarify one thing: RAID is not meant as a replacement for backups. RAID can
help improve data reliability, which really means data availability (improving
uptime for data), and/or data performance (I/O performance). It is not intended
as a replacement for backups or for keeping multiple independent copies of your
data.

 

 

 

RAID Configurations

 

 

As mentioned above, five RAID levels or configurations were defined in the
original paper, and others have been developed since. In RAID terminology each
distinct RAID configuration is given a number, which is also called a RAID
“level”. The core RAID configurations are RAID-0, RAID-1, RAID-2, RAID-3,
RAID-4, RAID-5, and RAID-6.

 

 

RAID-0

This RAID configuration is really focused on performance since the blocks are
basically striped across multiple disks. Figure 1 from Wikipedia (image by
Cburnett) illustrates how the data is written to two disks.

Figure 1: RAID-0 layout (from Cburnett at Wikipedia under the GFDL license)


In this illustration, the first block of data, A0, is written to the first
disk, the second block of data, A1, is written to the second disk, the third
block of data, A2, is written to the first disk, and so on. If the I/O is
happening fast enough, data blocks can be written almost simultaneously (i.e. A0
and A1 are written at just about the same time). Since the data is broken up
into block-sized units that alternate between the disks, it is commonly said
that the data is striped across the disks. As you can see, striping data across
the disks means that the overall write performance of the disk set is very fast,
usually much faster than a single disk.
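
The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (one block per stripe unit, blocks labeled A0, A1, ... as in the text); the function names stripe_location and layout are made up for this example and do not correspond to any real RAID controller or tool.

```python
def stripe_location(block_index: int, num_disks: int) -> tuple[int, int]:
    """Return (disk, row) for a logical block in a round-robin RAID-0 layout."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if block_index < 0:
        raise ValueError("block_index must be non-negative")
    # Consecutive blocks go to consecutive disks; once every disk has one
    # block, the next block starts a new row back on the first disk.
    return block_index % num_disks, block_index // num_disks


def layout(num_blocks: int, num_disks: int) -> list[list[str]]:
    """Build, per disk, the list of block labels (A0, A1, ...) it would hold."""
    disks: list[list[str]] = [[] for _ in range(num_disks)]
    for block in range(num_blocks):
        disk, _row = stripe_location(block, num_disks)
        disks[disk].append(f"A{block}")
    return disks


if __name__ == "__main__":
    for i, blocks in enumerate(layout(8, 2)):
        print(f"disk {i}: {' '.join(blocks)}")
    # disk 0: A0 A2 A4 A6
    # disk 1: A1 A3 A5 A7
```

Because neighboring blocks always land on different disks, a large sequential write or read can keep every disk in the array busy at once.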

 

Reading from a RAID-0 group is also very fast. A read request comes in and
the RAID controller, which controls the placement of data, knows that it can
read A0 and A1 at the same time since they are on separate disks, basically
doubling the potential read performance relative to a single disk.

 

You can have as many disks as you want in a RAID-0 array (a group of disks in
a RAID-0 configuration). However, one of the downsides to RAID-0 is that it
provides no additional data redundancy (it is all focused on performance). No
data parity is computed or stored, meaning that if you lose a disk in a RAID-0
array, you will lose access to all of the data in the array. If you can bring
the lost disk back into the array without losing any data on it, then you can
recover the RAID-0 array, but this is a fairly rare occurrence.
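
To see why a single lost disk is so damaging, here is a small sketch that reuses the same round-robin assumption (block b lives on disk b mod n). The helper names blocks_lost and file_is_readable are hypothetical and exist only for this illustration. A real array is usually worse off than this toy model suggests, because file system metadata is striped across the disks too.

```python
def blocks_lost(num_blocks: int, num_disks: int, failed_disk: int) -> list[int]:
    """Logical blocks that lived on the failed disk in a round-robin layout."""
    if not 0 <= failed_disk < num_disks:
        raise ValueError("failed_disk must be a valid disk index")
    return [b for b in range(num_blocks) if b % num_disks == failed_disk]


def file_is_readable(first_block: int, length: int,
                     num_disks: int, failed_disk: int) -> bool:
    """A file survives only if none of its blocks sat on the failed disk."""
    return all(b % num_disks != failed_disk
               for b in range(first_block, first_block + length))


if __name__ == "__main__":
    # With two disks, losing disk 0 removes every other block.
    print(blocks_lost(8, 2, failed_disk=0))            # [0, 2, 4, 6]
    # Any file spanning at least num_disks blocks has a hole in it.
    print(file_is_readable(0, 2, 2, failed_disk=1))    # False
```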

 

Consequently, we can see that RAID-0 is focused solely on performance, with no
additional data redundancy beyond the redundancy in a single disk. This affects
how RAID-0 is used. For example, it can be used in situations where performance
is paramount and you have a copy of your data elsewhere, or the data is not
important. A classic use case is scratch space, where data is written while an
application is running but is not needed once the application is done and the
final output has been copied to a more resilient storage device. If a scratch
space disk is lost while the application is running, you can rebuild the RAID-0
array with one fewer drive and rerun the application.

 

The capacity and failure rate of a RAID-0 array are fairly simple to compute.
The capacity is computed as,

Capacity = n * min(disk sizes)

where n is the number of disks in the array and min(disk sizes) is the capacity
of the smallest drive in the array (this indicates that you can use drives of
different sizes, but each drive contributes only as much space as the smallest
one). This equation also means that RAID-0 is very capacity efficient since it
doesn’t use any space for parity or any other error correction. It uses all of
the space for data, focusing on performance.
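
As a quick worked example of the capacity formula, the following sketch computes the usable capacity and the storage efficiency for a list of drive sizes. The function names and the drive sizes are assumptions chosen for illustration only.

```python
def raid0_capacity(disk_sizes_gb: list[float]) -> float:
    """Usable capacity: n * min(disk sizes)."""
    if len(disk_sizes_gb) < 2:
        raise ValueError("a RAID-0 array needs at least 2 disks")
    return len(disk_sizes_gb) * min(disk_sizes_gb)


def raid0_efficiency(disk_sizes_gb: list[float]) -> float:
    """Fraction of the raw capacity that is usable for data."""
    return raid0_capacity(disk_sizes_gb) / sum(disk_sizes_gb)


if __name__ == "__main__":
    # Three equal 500 GB drives: all of the raw space is usable.
    print(raid0_capacity([500, 500, 500]))   # 1500
    # Mixed sizes: the extra 250 GB on the larger drive goes unused.
    print(raid0_capacity([500, 750]))        # 1000
    print(raid0_efficiency([500, 750]))      # 0.8
```

Efficiency is 100% only when all drives are the same size; with mixed sizes, space beyond the smallest drive's capacity is left unused.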

The failure rate is a little more involved but can also be estimated.

MTTF_group = MTTF_disk / n

where MTTF is the Mean Time To Failure, “group” refers to the RAID-0 array, and
“disk” refers to a single disk. So as you add disks, you greatly reduce the MTTF
of the RAID-0 array. Having two disks cuts the MTTF in half. Three disks reduce
the MTTF by a factor of 3, and so on. So you can tell why people are reluctant
to use RAID-0 for file systems where data availability and reliability are
important. But RAID-0 is the fastest RAID configuration and has the best
capacity utilization of any RAID configuration discussed in this article.
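
The MTTF estimate can be tried out with a short sketch. It assumes identical disks that fail independently, and the 1,000,000-hour single-disk MTTF is a made-up figure for illustration, not a number from this article or from any vendor.

```python
def raid0_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """Estimated array MTTF: MTTF_disk / n (identical, independent disks)."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


if __name__ == "__main__":
    disk_mttf = 1_000_000  # hypothetical single-disk MTTF in hours
    for n in (1, 2, 3, 4, 8):
        print(f"{n} disk(s): {raid0_mttf(disk_mttf, n):,.0f} hours")
```

With these assumptions a four-disk array comes out at 250,000 hours, one quarter of the single-disk figure.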

 

Table 1 below is a quick summary of RAID-0 with a few highlights.

 

Table 1 - RAID-0 Highlights


RAID Level: RAID-0

Pros:
  • Performance (great read and write performance)
  • Great capacity utilization (the best of any standard RAID
    configuration)

Cons:
  • No data redundancy
  • Poor MTTF

Storage Efficiency: 100% assuming the drives are the same size

Minimum Number of Disks: 2

 

 

RAID-1

RAID-1 is almost the exact opposite of RAID-0 because it uses multiple drives
that are mirrors of one another. Typically two drives are used in RAID-1, but
three-drive RAID-1 configurations are becoming more common. RAID-1 takes an
incoming block of data written to one drive and creates a mirror image (copy) of
it on a second drive. So RAID-1 doesn’t compute any parity of the block; it just
copies the entire block to a second drive. Figure 2 from Wikipedia (image by
Cburnett) illustrates how the data is written to two disks in RAID-1.

Figure 2: RAID-1 layout (from Cburnett at Wikipedia under the GFDL license)
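
The mirroring idea can be sketched as a toy in-memory model. The MirroredArray class below is purely hypothetical: each "disk" is a Python dictionary, and the read path that skips failed disks is an assumption added to show why a full copy is useful, not a description of any particular controller.

```python
class MirroredArray:
    """Toy RAID-1 model: every write is copied in full to every disk."""

    def __init__(self, num_disks: int = 2) -> None:
        if num_disks < 2:
            raise ValueError("mirroring needs at least 2 disks")
        # Each disk maps a block index to that block's contents.
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]

    def write(self, block_index: int, data: bytes) -> None:
        # No parity is computed; the whole block is simply duplicated.
        for disk in self.disks:
            disk[block_index] = data

    def read(self, block_index: int, failed: frozenset = frozenset()) -> bytes:
        # Any surviving disk holds a complete copy of the block.
        for i, disk in enumerate(self.disks):
            if i not in failed and block_index in disk:
                return disk[block_index]
        raise IOError(f"block {block_index} is not available on any disk")


if __name__ == "__main__":
    array = MirroredArray(num_disks=2)
    array.write(0, b"A0")
    print(array.read(0, failed=frozenset({0})))   # b'A0'
```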
