Software RAID is compatible with a dual-boot environment involving Windows, but Windows will not be able to mount or read any partition that is part of the pure software RAID, and all pseudo-hardware RAID controllers must be turned off.
This HOWTO assumes you are using SATA drives but it should work equally well with IDE drives. If you are using IDE drives, for maximum performance make sure that each drive is a master on its own separate channel.
To partition the drives similarly to what the Gentoo install docs suggest:
device     mount   size
/dev/sda1  /boot   32MB
/dev/sda2  swap    >=2*RAM
/dev/sda3  /       10GB
/dev/sda4  /home   180GB (optional but recommended)
Note: You may want to consider a larger / partition, depending on the applications you are planning on using.
When you partition your disks, make sure that your partitions use fd (Linux RAID autodetect) as Partition Type instead of the default 83 (Linux native) or 82 (swap).
/boot is best set up as RAID1. Recall that in RAID1 data is mirrored on multiple disks, so if there is somehow a problem with your RAID, GRUB/LILO can point to any of the copies of the kernel on any of the partitions in the /boot RAID1 and a normal boot will still occur.
In this HOWTO, /, /boot, /home and the swap partition will be RAID1 (mirror). For better performance you could use RAID10 in the far layout (raid10,f2) instead; it is a direct replacement using the enhanced raid10 driver and gives roughly double the sequential read speed of raid1. Pass "--level=10 -p f2" as additional parameters when creating arrays with mdadm, as in the sketch below.
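For example, a two-disk raid10,f2 array over the same root partitions used later in this HOWTO could be created like this (a minimal sketch; adjust the device names to your layout, and note that --layout=f2 is simply the long form of "-p f2"):
mdadm --create --verbose /dev/md3 --level=10 --layout=f2 --raid-devices=2 /dev/sda3 /dev/sdb3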
If you do not place your swap partition on RAID and a drive containing your swap partition dies, your system will likely die when your system tries to access the swap partition.
Note: A swap partition on RAID must be mounted at boot time by an entry in /etc/fstab.
Note: A swap partition is faster than a swap file but requires more complex partitioning of your disk(s). Changing the size of a swap file does not require repartitioning. Volume managers such as LVM(2) or EVMS work with volumes, which provide sophisticated and more flexible alternatives to partitions; they often let you change, for example, the size of a volume on the fly. Sometimes swap partition(s) can even be shared between certain operating systems in dual/multiple boot setups.
Load the appropriate RAID module.
modprobe raid1
modprobe raid0
modprobe raid5
For RAID-1, RAID-0 and RAID-5 respectively.
You can partition your drives with tools such as fdisk or cfdisk. There is nothing different here except to make sure:
Your partitions are the same size on each drive. See below for instructions on copying a partition map.
Your partitions to be included in the RAID are set to partition type fd, Linux RAID auto-detect. If not set to fd, the partitions will fail to be added to the RAID on reboot.
Note: When using GNU parted the Linux RAID auto-detect flag may be represented as "raid"
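As an illustration, with GNU parted the flag can be set on an existing partition like this (assuming the RAID member is partition 1 on /dev/sda; adjust to your layout):
parted /dev/sda set 1 raid on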
This might be a good time to play with the hdparm tool. It allows you to change hard drive access parameters, which might speed up disk access. Another use: if you are using a whole disk as a hot spare, you may wish to change its spin-down time so that it spends most of its time in standby, thus extending its life.
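For instance, to spin down a hypothetical hot-spare disk /dev/sdc after 10 minutes of inactivity (hdparm's -S value is measured in units of 5 seconds, so 120 means 600 seconds):
hdparm -S 120 /dev/sdc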
You can also setup the first disk partitions and then copy the entire partition table to the second disk with the following command:
sfdisk -d /dev/sda | sfdisk /dev/sdb
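To double-check that both disks now carry the same layout, you can list the copied partition table afterwards:
sfdisk -l /dev/sdb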
Now before we start creating the RAID arrays, we need to create the metadevice nodes:
cd /dev && MAKEDEV md
After partitioning, create the /etc/mdadm.conf file (yes, indeed, on the Installation CD environment) using mdadm, an advanced tool for RAID management. For instance, to have the boot, swap and root partition mirrored (RAID-1) covering /dev/sda and /dev/sdb, the following commands can be used:
mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create --verbose /dev/md2 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create --verbose /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mdadm --create --verbose /dev/md4 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
Or if you are lazy:
for i in `seq 1 4`; do mknod /dev/md$i b 9 $i; mdadm --create /dev/md$i --level=1 --raid-devices=2 /dev/sda$i /dev/sdb$i; done
On the other hand if you want to put 4 partitions (sdc1, sdd1, sde1, sdf1) into a single RAID-5 then try this command:
mdadm --create /dev/md0 --level=raid5 --raid-devices=4 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
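If you also have a spare partition available (for example a hypothetical /dev/sdg1, perhaps on the whole-disk hot spare mentioned earlier), you can attach it at creation time so mdadm automatically rebuilds onto it when a member fails:
mdadm --create /dev/md0 --level=raid5 --raid-devices=4 --spare-devices=1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1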
Note: It looks like the latest version of mdadm uses version 1.0 superblocks by default. These will not be autodetected on startup, as the maintainer has altered them specifically not to be! A workaround that's known to work is adding the '-e 0.90' switch to these lines, thus creating a superblock that will be detected on boot (the official line is that an initrd should be created).
Note: mdadm v2.6.2 uses version 0.9 superblocks by default. Following this guide should be enough to get it working.
Note: If for some crazy reason (like if you're migrating from a single disk to two...) you only have one of the hard drives on hand and want to set up a mirror (RAID-1), you can specify 'missing' instead of the second device (see the sketch after the next note). When you later install the second device you can add it to the array and it will sync automatically. There may be weirdness when adding another hard drive to the system when it comes to configuring the boot loader (due to drive order/numbering).
Note: In RAID1 or RAID10, if you want to avoid the initial resync with new and clean hard disks, add --assume-clean.
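A minimal sketch combining the two notes above (device names follow the earlier examples and are only illustrative):
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda3 missing
mdadm --create /dev/md4 --level=1 --raid-devices=2 --assume-clean /dev/sda4 /dev/sdb4
Once the second disk is installed, the missing half can be attached and will resync automatically:
mdadm /dev/md3 --add /dev/sdb3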
You may check /proc/mdstat to see if the RAID devices are done syncing:
cat /proc/mdstat
You can also use:
watch -n1 'cat /proc/mdstat'
which refreshes the output of /proc/mdstat every second (the -n option sets the interval in seconds). You can cancel the output with CTRL+C.
It should look something like this (showing one array syncing and the other one already completed):
Personalities : [raid1]
md2 : active raid1 sdb3[1] sda3[0]
      184859840 blocks [2/2] [UU]
      [======>..............]  resync = 33.1% (61296896/184859840) finish=34.3min speed=59895K/sec
md1 : active raid1 sdb1[1] sda1[0]
      10000320 blocks [2/2] [UU]

unused devices: <none>
If an array is still syncing, you may still proceed to creating filesystems, because the sync operation is completely transparent to the file system. (Note: if a drive happens to fail before the RAID sync finishes, then you're in trouble.)
Note: You can install grub before the sync is complete using the commands shown below in this article; however, the drives need to have settled (finished syncing) before you restart the machine.
Create the filesystems on the disk.
mke2fs -j /dev/md1
mke2fs -j /dev/md3
or
mkfs.ext3 -j -O dir_index,resize_inode /dev/md4
Warning: There are reports that Journaled File Systems and >=2.2 Kernels are problematic on all Software-RAID levels. A number of the reports involve using crypto loops which are known to cause corruption. See http://forums.gentoo.org/viewtopic-t-412467.html for more information.
Note: Stride calculation: mke2fs and tune2fs have a parameter -E (extended-options) which accepts the following options:
stride=stride-size
Configure the filesystem for a RAID array with stride-size filesystem blocks. This is the number of blocks read or written to disk before moving to the next disk. It mostly affects placement of filesystem metadata like bitmaps at mke2fs time, to avoid placing them all on a single disk, which can hurt performance. It may also be used by the block allocator.
stripe_width=stripe-width
Configure the filesystem for a RAID array with stripe-width filesystem blocks per stripe. This is typically stride-size * N, where N is the number of data disks in the RAID (e.g. N+1 total disks for RAID 5, N+2 for RAID 6). This allows the block allocator to prevent read-modify-write of the parity in a RAID stripe, if possible, when the data is written.
Note: Stride calculation example: http://busybox.net/~aldot/mkfs_stride.html
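A worked example under assumed values: for a RAID-5 array of 4 disks (3 data disks) with a 64 KiB chunk size and 4 KiB filesystem blocks, stride = 64/4 = 16 and stripe width = 16 * 3 = 48, so the filesystem could be created like this (depending on your e2fsprogs version the second option may be spelled stripe-width or stripe_width):
mke2fs -j -E stride=16,stripe-width=48 /dev/md0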
As described above, placing swap on RAID-0 is risky: if one of your discs dies, the system will most likely crash, since in a RAID-0 the swap data is split over all discs. So here we use a mirrored array type instead:
Your fstab could look like:
File: /etc/fstab
/dev/md2 swap swap defaults 0 0
There is no performance reason to use RAID for swap: the kernel itself can stripe swapping over several devices if you give them the same priority in the /etc/fstab file. Using a mirrored RAID type such as raid1 or raid10,f2 halves the write speed in the swap area, as data is written twice.
A striped /etc/fstab looks like:
File: /etc/fstab
/dev/sda2 swap swap defaults,pri=1 0 0
/dev/sdb2 swap swap defaults,pri=1 0 0
For reliability reasons, you may nevertheless choose to use RAID for swap. With the non-RAID configuration shown above, a drive failure on any of the swap devices can crash your system. Also, while that configuration may be faster than using a single drive for swap, it is twice as likely that a drive failure will take your system down with it.
Turn the swap on:
mkswap /dev/md2
swapon /dev/md2
Mount the /, /boot and /home RAIDs:
mount /dev/md3 /mnt/gentoo
mkdir /mnt/gentoo/boot
mount /dev/md1 /mnt/gentoo/boot
mkdir /mnt/gentoo/home
mount /dev/md4 /mnt/gentoo/home
Copy RAID configuration
mdadm --detail --scan >> /etc/mdadm.conf
mkdir /mnt/gentoo/etc
cp /etc/mdadm.conf /mnt/gentoo/etc/mdadm.conf
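For reference, the entries appended by mdadm --detail --scan look roughly like the following (the UUIDs are placeholders and the exact fields vary with the mdadm version):
File: /etc/mdadm.conf
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=<uuid-of-md1>
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=<uuid-of-md2>
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=<uuid-of-md3>
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=<uuid-of-md4>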
Make the chrooted environment behave like the real thing ;-)
mount -t proc none /mnt/gentoo/proc
mount -o bind /dev /mnt/gentoo/dev
Continue with the Gentoo Handbook starting with the section entitled "Installing the Gentoo Installation Files". Use /dev/md1 for the boot partition, /dev/md3 for the root partition and /dev/md4 for the home partition.
When you're configuring your kernel, make sure the appropriate RAID support is built into your kernel and not as a module.
Linux Kernel Configuration: Raid configuration
Device Drivers --->
Multi-device support (RAID and LVM) --->
[*] Multiple devices driver support (RAID and LVM)
<*> RAID support
<*> RAID-0 (striping) mode
<*> RAID-1 (mirroring) mode
<*> Device mapper support
When installing extra tools, emerge mdadm as well.
emerge mdadm
rc-update add mdadm boot
otherwise mdadm will not be loaded at boot time.
When configuring your bootloader, make sure it gets installed in the MBR of both disks if you use mirroring (RAID 1).
Note: Grub may fail with 'Error 28' when using the hardened profile. The solution is to use a non-hardened gcc to compile grub. Use gcc-config to switch to the -vanilla version of gcc, compile grub, then switch back.
gcc-config -l
[1] x86_64-pc-linux-gnu-3.4.6 *
[2] x86_64-pc-linux-gnu-3.4.6-hardenednopie
[3] x86_64-pc-linux-gnu-3.4.6-hardenednopiessp
[4] x86_64-pc-linux-gnu-3.4.6-hardenednossp
[5] x86_64-pc-linux-gnu-3.4.6-vanilla
gcc-config 5
emerge grub -av
...
gcc-config 1
Since the /boot partition lives on a RAID array, grub cannot address the array itself to read the bootloader; it can only access the physical drives. Thus, you still use (hd0,0) in this step.
Run grub:
grub --no-floppy
You should see the GRUB prompt:
grub>
If you are using a RAID 1 mirror disk system, you will want to install grub on all the disks in the system, so that when one disk fails, you are still able to boot. The find command below will list the disks, e.g.
grub> find /boot/grub/stage1
(hd0,0)
(hd1,0)
grub>
Now, if your disks are /dev/sda and /dev/sdb, do the following to install GRUB on /dev/sda MBR:
device (hd0) /dev/sda
root (hd0,0)
setup (hd0)
This will install grub into the /dev/sdb MBR:
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
The device command tells grub to assume the drive is (hd0), i.e. the first disk in the system, even when that is not actually the case. If your first disk fails, however, your second disk will become the first disk in the system, and the MBR will then be correct.
The grub.conf changes from the normal install: the root device passed to the kernel is now a RAID device and no longer a physical partition. For example it would look like:
File: /boot/grub/grub.conf
default 0
timeout 30
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
title=My example Gentoo Linux
root (hd0,0)
kernel /boot/bzImage root=/dev/md3 md=3,/dev/sda3,/dev/sdb3
To see if RAID is functioning properly after reboot do:
cat /proc/mdstat
There should be one entry per RAID drive. The RAID 1 drives should have a "[UU]" in the entry, letting you know that the two hard drives are "up, up". If one goes down you will see "[U_]". If this ever happens your system will still run fine, but you should replace that hard drive as soon as possible.
To rebuild a RAID 1:
Power down the system
Replace the failed disk
Power up the system once again
Create identical partitions on the new disk - i.e. the same as on the one good remaining disk
Remove the old partition from the array and add the new partition back
You can copy a partition map from one disk to another with dd. Additionally, since the target drive is not in use, we can rewrite the partition map with fdisk to force the kernel to re-read it:
dd if=/dev/sdX of=/dev/sdY count=1
fdisk /dev/sdY
Command (m for help): w
To remove the failed partition and add the new partition:
mdadm /dev/mdX -r /dev/sdYZ -a /dev/sdYZ
Note: Do this for each of your arrays, i.e. md0, md1, md2, etc.
Watch the automatic reconstruction run with:
watch -n .1 cat /proc/mdstat
If you want to receive e-mail alerts about your RAID system, mdadm must be configured with your e-mail address.
Make sure you can send mail from your machine. If all you need is basic SMTP support, you may wish to consider installing nail. This is a version of mail that can be compiled with SMTP support.
Make sure that the next line is in the /etc/mdadm.conf with the correct To e-mail address:
File: /etc/mdadm.conf
MAILADDR root@example.com
Fix me: An explanation of how to get mail-client/nail to work is required. I couldn't get it going. mail-mta/ssmtp was easy.
To verify that e-mail notification works, use this test command:
mdadm -Fs1t
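The same one-shot test can also be written with mdadm's long options, which spell out what the short flags do:
mdadm --monitor --scan --oneshot --test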
Finally add the mdadm script to your default RC, and start it to begin monitoring:
rc-update add mdadm default
/etc/init.d/mdadm start
Fix me: RC default and boot?
Now if one of your disks fails you will be notified at the address supplied.
A write-intent bitmap is used to record which areas of a RAID component have been modified since the RAID array was last in sync. Basically, the RAID driver periodically writes out a small table recording which portions of a RAID component have changed. Therefore, if you lose power before all drives are in sync, when the array starts up a full re-sync is not needed. Only the changed portions need to be re-synced.
Install a modern mdadm: >=sys-fs/mdadm-2.4.1
Install a modern kernel: >=2.6.16
Your RAID volume must be configured with a persistent superblock and must be fully synchronized. Use the following command to verify that these conditions have been met:
mdadm -D /dev/mdX
Make sure it says:
State : active
Persistence : Superblock is persistent
Add a bitmap with the following command:
mdadm /dev/mdX -Gb internal
You can monitor the status of the bitmap as you write to your array with:
watch -n .1 cat /proc/mdstat
Remove the bitmap with the following command:
mdadm /dev/mdX -Gb none
In short: especially if you run a RAID5 array, trigger an active bad-block check on a regular basis; otherwise there is a high chance that hidden bad blocks will make your RAID unusable during reconstruction.
Normally, RAID passively detects bad blocks. If a read error occurs, the data is reconstructed from the rest of the array, and the bad block is rewritten. If the block can not be rewritten, the defective disk is kicked out of the active array.
Once the defective drive is replaced, reconstruction will cause all blocks of the remaining drives to be read. If this process runs across a previously undetected bad block on the remaining drives, another drive will be marked as failed, making the RAID5 unusable. The larger the disks, the higher the odds that passive bad block detection will be inadequate. Therefore, with today's large disks it is important to actively perform data scrubbing on your array.
With a modern (>=2.6.16) kernel, this command will initiate a data consistency and bad block check, reading all blocks, checking them for consistency, and attempting to rewrite inconsistent blocks and bad blocks.
echo check >> /sys/block/mdX/md/sync_action
You can monitor the progress of the check with:
watch -n .1 cat /proc/mdstat
You should have your array checked daily or weekly by adding the appropriate command to /etc/crontab.
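A hedged sketch of a weekly check in /etc/crontab, assuming your arrays are md1 through md4 as in this HOWTO (adjust the device list and schedule to taste):
File: /etc/crontab
# run a RAID consistency check every Sunday at 03:30
30 3 * * 0  root  for md in md1 md2 md3 md4; do echo check > /sys/block/$md/md/sync_action; done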
If you find yourself needlessly checking your array (like I was) and want to stop it safely, you can either stop the entire array, or:
echo idle >> /sys/block/mdX/md/sync_action