Sunday, March 20, 2011

Understanding RAID

RAID: A Comparison of Different Modes

RAID 0: Striping
Technically speaking, mode 0 doesn't adhere to the principles of a RAID, given the fact
that an important factor, data redundancy, does not exist. Hence RAID 0 offers no
advantages in terms of security - in fact, on the contrary. All of the data are evenly
distributed to all of the existing drives; this array is called a stripe set. This process can
best be described with the "zipper method." The benefits are clear: because the data
stream can be allocated to all the different drives, the data transfer rate is multiplied by
the number of drives. Here the upper limits are the maximum transfer rate per channel
(max. 100 MB/s for UltraATA/100), or the maximum bandwidth of the controller on the
PCI bus (266 MB/s at 66 MHz / 32-bit PCI). However, in reality this very drastic
performance boost comes at the price of higher fault vulnerability. Instead of one, now all
of the RAID drives must work error-free. If even one of those drives crashes, all of the
stored data will be lost.
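
The "zipper method" can be sketched in a few lines of Python. This is only an illustration: the function names `stripe` and `unstripe` and the tiny 4-byte block size are assumptions chosen for readability, not the way any particular controller works.

```python
def stripe(data: bytes, num_drives: int, block_size: int = 4) -> list[list[bytes]]:
    """Deal fixed-size blocks out to the drives in round-robin order (RAID 0)."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        drives[index % num_drives].append(block)
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the drives row by row, in the same order."""
    out = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)


drives = stripe(b"ABCDEFGHIJKLMNOPQRSTUVWX", num_drives=3)
print(drives[0])         # [b'ABCD', b'MNOP']
print(unstripe(drives))  # b'ABCDEFGHIJKLMNOPQRSTUVWX'
```

Note that `unstripe` needs every drive: drop one list from `drives` and every third block is gone, which is exactly the fault vulnerability described above.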

RAID 1: Mirroring
Mode 1 is basically the complete opposite of RAID 0. The goal here is not to boost
performance, but to ensure data security. When reading or writing data, all drives of the
array are used simultaneously. Hence, data is written synchronously to two or more
drives, which is equivalent to a perfect backup copy - perfect because the data is always
100% up-to-date.

RAID 2: Striping
Striping is based on the same principle as RAID 0: the stripe set distributes the data to all
drives, though not in block form, but, rather, on a bit level. This is necessary because an
Error Correcting Code (ECC) is implemented in all transaction data. Additional hard
drives are necessary to store the resulting additional volume. If you wanted to guarantee
complete data security, you would have to deploy at least ten data disks and four ECC
disks. The next level would entail 32 data disks and seven ECC disks. This explains why
RAID 2 never caught on.
On top of that, performance is only mediocre as multiple access is not possible in bit
stripe sets. The higher the number of accesses, and the shorter they are, the more lethargic
RAID 2 gets.

RAID 3: Data Striping, Dedicated Parity
Level 3 incorporates prudent error correction. Data is allocated byte by byte to several
hard drives, while the parity data is stored in a separate drive. This is exactly the
disadvantage of RAID 3, as the parity drive has to be accessed with every access. So the
advantage of RAID, bundling the disk performance by distributing access, is partially
offset. RAID 3 needs a minimum of three drives.

This mode requires quite a complex controller, which is why RAID 3, similar to levels 4
and 5, never caught on in the mass market.

RAID 4: Data Striping, Dedicated Parity
The technology of RAID Level 4 is similar to that of level 3, except that the individual
stripes are not written in bytes, but in blocks. In theory, this should speed things up, but
the parity drive still remains the bottleneck.
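
To see why a dedicated parity drive is enough to survive one crash, here is a minimal sketch. It assumes the usual XOR parity; the helper name `xor_blocks` and the sample data are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives, each holding one 4-byte block of the same stripe.
data_drives = [b"RAID", b"3and", b"4use"]
parity = xor_blocks(data_drives)  # contents of the dedicated parity drive

# Simulate the crash of the second data drive and rebuild it
# from the surviving data drives plus the parity drive.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'3and'
```

The sketch also shows the bottleneck: every write to any data drive changes `parity`, so the parity drive is involved in every write operation.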

RAID 5: Distributed Data, Distributed Parity
RAID Level 5 is generally considered the best compromise between data security and
performance. Not only the data, but also the parity information, is distributed to all the
existing drives. The resulting advantage is that RAID 5 is really only a bit slower than
RAID 3. However, failure safety is limited, as only one hard drive can safely crash. At
least three hard drives are required in each case.
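
The following sketch prints one possible way of rotating the parity block across the drives. The rotation order is an assumption made for illustration (real controllers use various schemes), and `raid5_layout` is a hypothetical helper, not a controller API.

```python
def raid5_layout(num_drives: int, num_stripes: int) -> list[list[str]]:
    """Return one row per stripe; 'P' marks parity, 'Dn' marks data block n."""
    rows = []
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last drive and moves left.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        block = stripe * (num_drives - 1)
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_drives=4, num_stripes=4):
    print(" ".join(f"{cell:>3}" for cell in row))
```

With four drives and four stripes, every drive ends up holding exactly one parity block, so no single drive has to be accessed for every write.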

RAID 6: Distributed Data, Distributed Parity
With RAID 6, you're really only talking about RAID 5 - except that twice the amount of
parity information is stored. Though this cuts down on performance a bit, it allows up to
two hard drives to crash. It does require, however, a minimum of four drives.

RAID: Who with Whom?
So far, we've talked about several hard drives, but haven't gone into detail. You should
principally set up all RAID modes with similar hard drives, as only then will you get
maximum performance.

However, you can also combine different drives, with the smaller or slower drive being
the determinant drive for the entire array. For example, one 30 GB and two 40 GB hard
drives in RAID 0 will give you a total capacity of 90 GB, which is three times the
capacity of the smallest hard drive.
The same applies to the combination of an old 40 GB hard drive with 5,400 rpm with a
new model with 7,200 rpm. If you were to use two of the slower drives, the performance
level would be the same. Replacing the older disk with a second and faster one would
increase the performance.

If you want to use several different hard drives together, you have the option of creating
what is known as a span array. Another term is JBOD (just a bunch of drives). Here, the
hard drives are simply hooked up in series, which results in a useable total capacity, but
without, however, an increase in performance or data security.
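
The capacity rules from this section can be put into a short calculation. This is a sketch under simple assumptions: the function name `usable_capacity_gb` and the level labels are invented for the example, and real controllers may reserve a little space for their own metadata.

```python
def usable_capacity_gb(level: str, sizes_gb: list[int]) -> int:
    """Usable capacity of an array built from drives of the given sizes."""
    count = len(sizes_gb)
    smallest = min(sizes_gb)
    if level == "jbod":
        return sum(sizes_gb)           # span: drives are simply chained
    if level == "0":
        return count * smallest        # smallest drive sets the stripe depth
    if level == "1":
        return smallest                # every drive holds the same data
    if level in ("3", "4", "5"):
        if count < 3:
            raise ValueError("parity modes need at least three drives")
        return (count - 1) * smallest  # one drive's worth goes to parity
    raise ValueError(f"unsupported level: {level}")


drives = [30, 40, 40]
print(usable_capacity_gb("0", drives))     # 90
print(usable_capacity_gb("jbod", drives))  # 110
print(usable_capacity_gb("5", drives))     # 60
```
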
Another unsettled point is the question of which drives should be hooked up to which
IDE channel. If possible, every drive should be connected to its own channel as a master.

On a dual channel controller card, you'd be able to hook up a mere two hard drives.
Though using four hard drives (master and slave per channel) drastically increases
performance, you'd get even more out of a four channel controller with four master drives.
Another important fact is that only a few of the currently available IDE RAID controllers
support the ATAPI protocol. CD-ROM or DVD-ROM drives will not necessarily work
with a RAID controller (don't even bother trying in RAID mode).

Disk Drive Crash! Now What?!
If you had opted for a RAID level before a disk crash, assigning highest priority to
security, then you're in good shape. If you're using RAID Level 1, 3, 4, 5 or 6, the crash
of a single hard drive will not affect your existing data. Depending on the controller
you're using, the procedure will vary.
Most RAID controllers today notify the user of a crash with a beep and by e-mail (of
course, this does not help if the system partition sits on a RAID level that does not
offer any crash protection).

Older or very simple RAID controllers require the computer to be shut down and the
defective drive to be replaced. After restarting the system, the user has to go into the
BIOS of the RAID controller to initiate the rebuild process.
Practically all of the RAID controllers on the market today - including the simple models
- now master the exchange of defective hard drives without a need to shut down the
system, a process called hot swapping. Rebuild takes place automatically, too - you really
don't have to do anything yourself anymore.
A clever feature is the hot-spare function. Many RAID controllers support an additional
drive, which is labeled as a hot spare. If one of the array drives refuses to work, it will be
removed from operation and the hot-spare drive will be connected automatically.
In the event that you were using RAID 0 or JBOD and have lost important data, you'll
probably never want to use this mode again. Though there's almost always a way to
restore your data, it is horrendously expensive. Companies dedicated to data recovery
(such as, for example, OnTrack) are able to take hard drives apart and restore most of the
data, even after a head crash, fire damage or other catastrophic events. But be warned:
restoring RAID arrays is disproportionately harder than the effort that goes into restoring
one single hard drive, which is enormous enough.

Conclusion: Only a Backup is Truly Safe!

Chart-Topping Capacity for a Song
It's not just aspects like performance and data security that should be considered; in many
cases, enormous amounts of data must be managed and stored - the right approach to
tackling this problem is a RAID array with large hard drives. Because expensive SCSI
RAID adapters and SCSI hard drives were the only available options just a few years ago,
high-capacity arrays were feasible only for very few individual users or companies.
Today, the prices for IDE hard drives with a capacity of 100 GB have dropped to a few
hundred dollars - a downright bargain. It takes only $500 to set up arrays with a capacity
of 300 to 400 gigabytes. The introduction of new hard drives with up to 200 GB will
make RAID arrays with up to 1 terabyte (5x 200GB) affordable for the first time ever.

Muddle Makes Trouble

No matter which RAID array you're using - for the operating system, it's ultimately a
drive just like any other, and therefore it needs to be maintained accordingly.
You should defragment it at least a few times a year; for more heavily frequented drives,
once a month. Ideally, you'll enter the defragmentation program in your task scheduler and
have this pesky operation performed during off-peak times.
If one of your drives ever begins to act up (louder operating noise, reduced performance or
other anomalies), don't hesitate. You should back up all of your important data,
especially if you're using RAID 0. If the operating system is on the RAID array as well,
you might want to try and mirror the drive in question on another computer with an
identical hard drive. Otherwise, you'll have no choice but to reinstall everything.

RAID Controllers: A Large Selection
When purchasing a RAID controller, you need to differentiate between two types. Simple
consumer models can be found everywhere, and they're often also integrated on
motherboards. They offer two channels and support RAID modes 0, 1 and 10 (striping
and mirroring), but most of the time they can also be used as simple ATA controllers.
More sophisticated models have their own RISC processors (e.g., i960) and can
additionally be outfitted with extra cache. Thanks to the processor, you can also run more
lavish RAID modes like Level 3 or 5 - assuming that you have enough hard drives.

Adaptec
Adaptec has the reputation of manufacturing high-quality SCSI hardware. But it has been
offering some products in the IDE sector for quite some time, too.

Two channel UltraATA/100: ATA RAID 1200A
RAID 0, 1, 10, JBOD, hot swap, e-mail notification

Four channel UltraATA/100: ATA RAID 2400A
RAID 0, 1, 3, 10, JBOD, hot swap, e-mail notification

HighPoint
You'll find HighPoint controllers less often in computer stores than on numerous
motherboards:
Two channel UltraATA/133: RocketRAID133
RAID 0, 1, 10, JBOD, hot swap, e-mail notification
Four channel UltraATA/133: RocketRAID404
RAID 0, 1, 10, JBOD, hot swap, e-mail notification

LSI Logic
Two channel UltraATA/100: Mega RAID IDE 100 (formerly AMI)
RAID 0, 1, 10, JBOD, hot swap, e-mail notification
Four channel UltraATA/100: Mega RAID i4
RAID 0, 1, 5, 10, JBOD, hot swap, e-mail notification
Hardly for home use: the SuperTrak SX6000 from Promise masters RAID 5 and supports
up to 128 MB cache.

Promise
Promise places equal focus on integration and retail sales:
Two channel UltraATA/133: FastTrak TX2000
RAID 0, 1, 10, JBOD, hot swap, e-mail notification
Five channel UltraATA/133: FastTrak TX2000
RAID 0, 1, 3, 5, 10, JBOD, hot swap, e-mail notification

RAID Without RAID
RAID modes 2 to 6 can be implemented only if the appropriate hardware RAID
controller is available. On the other hand, RAID 0 and 1 are offered directly by Windows
2000 or Windows XP - as long as there are several hard drives.
Under disk administration of the computer administration console, you can, among other
things, change partitions and drive letters. You can also connect two or more hard drives
to form a software RAID.

There Are Limits
RAID arrays are certainly an excellent approach to solving chronic performance deficits
and improving your sense of security. Let us mention, though, that they are not able to
perform miracles, and do not absolve the user or the administrator from backing up his or
her data periodically.
For example, a RAID controller cannot withstand short circuits or lightning, meaning that,
in the worst case scenario, your data could be toast. Therefore, an uninterruptible power
supply (UPS) is part of the required equipment in productive or otherwise critical systems.
Furthermore, a RAID array only offers protection from technical errors - the human
factor, however, should not be underestimated. Most users have had to live with lost data
because they carelessly deleted or hastily clobbered them - the same holds true with the
RAID.

The category of human-related causes also includes malicious attacks on the existing data,
or force majeure. These involve attacks on software (deleting, formatting,
renaming, software bugs), as well as physical threats (theft, vandalism, arson, floods,
etc.).

Don't forget - only a backup is truly safe.

Setting Up a RAID Array
Setting up a RAID array usually doesn't take much time. Especially for modes 0 and 1, it
will suffice to select the drives to be included in the array in BIOS of the controller.
Lastly, after the system has rebooted, the new drive must be formatted (it may be
necessary to activate the RAID controller with its driver under Windows).
In RAID 3 or RAID 5 arrays, the controllers often run an initialization process that can
take up to several hours.
Though RAID Level 0 is the fastest option of all, it is also the most precarious by far. For
example, using four hard drives will push the data transfer rate far beyond 100 MB/s, but
fault tolerance will be virtually non-existent. A hard drive is a mechanical component that
will age and wear out after a while. Mechanically-induced defects are therefore really
only a question of time. But even an electronic error or a minor production error may
result in a catastrophe.
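
How quickly the risk grows with the number of drives can be estimated with a short calculation. This is a rough sketch: it assumes that drives fail independently of one another, and the 3% failure chance per drive is a made-up figure for illustration, not a measured value.

```python
def raid0_failure_probability(per_drive_failure: float, num_drives: int) -> float:
    """Chance that at least one drive fails, which destroys a RAID 0 array."""
    return 1 - (1 - per_drive_failure) ** num_drives


# Hypothetical 3% chance that a single drive fails within a given period.
for num_drives in (1, 2, 4):
    risk = raid0_failure_probability(0.03, num_drives)
    print(f"{num_drives} drive(s): {risk:.1%} chance of losing the array")
```

With these assumed numbers, four drives come out at roughly 11.5%, almost four times the risk of a single drive.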

For this reason, RAID 0 is not recommended for long-term storage data, but primarily for
setting up fast drives with temporary data, such as file or database servers. And if the
system has to be mirrored on short notice, RAID 1 is your best choice. If you have a hot
swap cabinet, you can remove the hard drive while the system is running in order to
mirror it to a different drive of the same size on a different computer. Then the drive is
reinstalled in the computer, while the copy can be saved once again with a RAID 1 when
connecting it to a RAID controller.

RAID Level 1 does nothing other than mirror a hard drive's data (in special cases, also
those of any array) to another hard drive in real-time. People are inclined to believe the
misleading proposition that RAID 1 does not offer any improvement in performance.
Though write operations really aren't any faster than with only one hard drive, when
reading data, it is possible, in theory, to have a data transfer rate equivalent to that of an
comparable RAID 0. This is only logical, since data can be read simultaneously from all the
drives in the array. In practice, however, there are differences, as the data to be read are
not available in cleanly split stripes as they are with RAID 0; instead, the controller has to
perform this division itself based on specific patterns.

Using RAID 1 makes sense if your main focus is on maximum data security and
minimum recovery efforts (e.g., simple servers). Most RAID controllers are able to
perform the recovery procedure independently after a hard drive has been exchanged.
You can do this on the fly only if the hard drives are housed in hot swap cabinets.
RAID Level 3 is losing more and more of its popularity because RAID 5 offers the same
advantages with fewer disadvantages. With RAID 3, parity data is written to one or
maybe even several hard drives. The big advantage is in the distribution of the actual data
stock to several drives, in the form of stripe sets, actually allowing a significantly higher
data rate - and at the same time protecting against a hard drive crash. Its disadvantage,
however, is the fact that the parity data is written to only one drive. This cuts down on
write performance considerably.

RAID 3 is usually deployed in servers with mostly static data or servers that require
better performance than RAID 1 can provide, without forgoing data security. This is a
simple way to keep the low write performance from carrying too much weight.
RAID Level 5 dominates in today's high-end server segment. If you're using four to
seven drives, such an array is a real performer and, if the drives are large, allows
accordingly large partitions. Unlike RAID 3, the parity data are integrated in the stripes on all drives and are distributed in a way that will have a positive impact on performance.
Consequently, RAID 5 offers a high level of performance for all kinds of applications.

The Sky's the Limit: Nested RAID
If the data transfer rate of an array with several drives is still not enough, you can
combine and nest RAID arrays any way you like. These configurations are called Nested
RAID (multiple RAID levels), but you'll encounter them very rarely - and no wonder,
because "conventional" arrays are generally fast enough.
As far as we know, controllers supporting Nested RAID are not yet available in the IDE
sector (with the exception of RAID 10). For SCSI, be prepared to quickly dish out several
hundred to some thousand dollars if you want to set up an elaborate RAID solution.

RAID Level 0+1
The most popular Nested RAID is probably 0+1. You'll need an even number of hard
drives for this, but at least four. Use half of the hard drives to create a stripe set (RAID 0),
while the resulting construct is simply mirrored (with RAID 1). You will then get almost
four times the read performance and about twice the write performance relative to a
single hard drive.

RAID Level 50 (5+0)
The performance of a RAID 5 with several drives is not good enough for you? Then
simply create a stripe set consisting of two identical RAID 5 arrays. Though data security
is reduced (the stripe set treats each array as a single drive, so losing one whole array,
that is, two drives of the same RAID 5, destroys all of the data), performance
can theoretically be doubled once more. In reality, you'll now be faced with the limits of
what PCI and network connections will allow.
Naming is an important factor in Multiple or Nested RAID configurations. While RAID
0+1 works on the lower level with stripe sets and mirrors only on the upper level, with
RAID 10 it's exactly the opposite. As the latter does not really make sense, the wrong
nomenclature would be less grave in this case.

Nested RAID and Security: It's All or Nothing
Now, a few words on the cascaded application of RAID arrays, even though most of you
will probably never be in a situation to have to worry about linked drives like this.
Combining several RAID arrays is efficient and prudent, but perfect data security can be
achieved only if each array is just as safe in and of itself. A RAID 5 consisting of
multiple RAID 0 arrays is not secure, because if one of the drives of a secondary array
crashes, its data cannot be recovered.

RAID Levels: Security and Performance at a Glance
RAID   Number of drives   Data security    Availability   Capacity    Performance    Cost
0      1+                 unsatisfactory   bad            100%        very good      very low
1      2                  good             good           50%         satisfactory   low
3      3+                 satisfactory     good           (x-1) / x   satisfactory   medium
5      3+                 satisfactory     good           (x-1) / x   good           medium
0+1    4,6,8...           good             good           50%         good           medium

The Key to Success: Block Size

In RAID arrays, the block size generally also determines the stripe size (not in RAID 1).
The principles concerning block size and wasted memory space apply equally to RAID
configurations: if, for example, the blocks have a size of 64 KB, then at least 64 KB are
written at all times - even if there's only a text file with 2 KB. So, the smaller the average
file size, the smaller the block size should be.
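
The wasted space is easy to work out. The sketch below assumes, as described above, that only whole blocks can be written; the function name `allocated_kb` is chosen just for this example.

```python
import math


def allocated_kb(file_kb: float, block_kb: int) -> int:
    """Space a file really occupies when only whole blocks can be written."""
    return math.ceil(file_kb / block_kb) * block_kb


# The 2 KB text file from the example, stored with different block sizes.
for block_kb in (4, 16, 64):
    used = allocated_kb(2, block_kb)
    print(f"{block_kb:>2} KB blocks: 2 KB file occupies {used} KB, {used - 2} KB wasted")
```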

But block size is also significant in terms of the performance to be expected, as the
smallest unit also determines when a file can be distributed to two or more drives. This would mean that with a block size of 64 KB, files with less than 64 KB would be written to only one hard drive. This does not happen any faster in a RAID array than on a single hard drive.

On the other hand, a file with 150 KB would be distributed to three hard drives (if available): 64 + 64 + 22 KB. The controller is now able to read from all three drives simultaneously, which shortens the read operation considerably.
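
The split can be reproduced with a few lines of Python. This is a simplified sketch: `split_into_blocks` is a hypothetical helper that only looks at sizes, and it assumes the file starts at the beginning of a stripe.

```python
def split_into_blocks(file_kb: int, block_kb: int = 64) -> list[int]:
    """Sizes of the pieces a file is cut into; each piece goes to the next drive."""
    full_blocks, remainder = divmod(file_kb, block_kb)
    pieces = [block_kb] * full_blocks
    if remainder:
        pieces.append(remainder)
    return pieces


print(split_into_blocks(150))  # [64, 64, 22]
print(split_into_blocks(40))   # [40]
```

A list with a single entry means the file sits on one drive only, so the array cannot read it any faster than a single hard drive could.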
