ZPOOL and IOPs notes

Notes on IOPs (Input/Output Operations per second) and ZPOOLS (example zpool name: abyss)

Choose any 2: speed | reliability | cost

If you pick speed and reliability, it will not be cheap

If you pick reliability and cost effectiveness, it will not be fast

If you pick speed and cost effectiveness, it will not be reliable

IOPs computation/approximation via SEEK time

IOPS = 1 / ( average latency + average seek time )                 lower bound calculated value
IOPS = 1 / ( average seek time )                                   upper bound calculated value

Using specifications of a Samsung Spinpoint F3 HD103SJ 7200RPM, avg seek time 8.9ms, avg latency 4.17ms

IOPS = 1/ (0.00417 + 0.0089) =  1000/(4.17 + 8.9) = ~76 IOPs       lower bound calculated value
IOPS = 1/ (0.0089) = 1000/(8.9) = ~112 IOPs                        upper bound calculated value
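
A quick sketch of both bounds in Python (the drive figures are just the example spec-sheet values above):

# Per-disk IOPS bounds from average seek time and rotational latency (seconds)
avg_seek_s    = 0.0089    # 8.9 ms average seek
avg_latency_s = 0.00417   # 4.17 ms average rotational latency

iops_lower = 1 / (avg_latency_s + avg_seek_s)   # ~76
iops_upper = 1 / avg_seek_s                     # ~112
print(f"lower ~{iops_lower:.0f} IOPS, upper ~{iops_upper:.0f} IOPS")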

Alternate, via pure rotational speed

IOPS = RPM / 60
IOPS = 7200 / 60 = ~120 IOPs

One argument says it is actually 2 x 120 = 240, because on average you wait only half of a revolution, or 180 degrees (i.e. IOPS ~= RPM / 30), rather than a full revolution.

However, if the seek cannot finish before a sector only 10 degrees away passes under the head, you end up waiting a full revolution plus that original offset, or 360 + 10 = 370 degrees.

For 7200 RPM disk (rotational method):   120 <= IOPS <= 240
For 7200 RPM disk (seek method):          76 <= IOPS <= 112
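
The rotational-only bounds as a minimal sketch (nothing here beyond the RPM arithmetic above):

# Rotational-only IOPS bounds: full revolution vs an average half revolution
rpm = 7200
iops_full_rev = rpm / 60   # ~120, assume a full revolution of rotational delay
iops_half_rev = rpm / 30   # ~240, assume half a revolution on average
print(f"{iops_full_rev:.0f} <= IOPS <= {iops_half_rev:.0f}")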

What does this all mean for my POOL?

RAID level    Read IO penalty    Write IO penalty

RAID 0               1                  1
RAID 1, 10           1                  2
RAID-Z               1                  4
RAID-Z2              1                  6

IOPs needed = (Total IOPs x % read) + (Total IOPs x % write x write IO penalty)

So, if we needed 300 IOPs with a 50% read and 50% write workload from a RAID-Z (write penalty = 4) pool

IOPs needed = (300 x 0.5) + (300 x 0.5 x 4) = 150 + 600 = 750 IOPs

For a similar 300 IOPs with a 50% read and 50% write workload on a RAID 1, 10 (write penalty = 2) pool

IOPs needed = (300 x 0.5) + (300 x 0.5 x 2) = 150 + 300 = 450 IOPs

This says the underlying disks must deliver 750 IOPs to support 300 IOPs of application workload on a RAID-Z pool with a 50/50 read/write mix.

Note that only a 450 IOPs pool is needed to support the same 300 IOPs on a RAID 1, 10 pool with the same 50/50 read/write workload.
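
The same arithmetic as a small helper function (a sketch; the penalty values come from the table above):

# Back-end IOPS needed for a front-end workload, given the RAID write penalty
def required_iops(total_iops, read_fraction, write_penalty):
    write_fraction = 1.0 - read_fraction
    return total_iops * read_fraction + total_iops * write_fraction * write_penalty

print(required_iops(300, 0.5, 4))   # RAID-Z    -> 750.0
print(required_iops(300, 0.5, 2))   # RAID 1,10 -> 450.0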

See zpool IOPs read/write CLI metrics (operations column)

zpool iostat -v 1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
abyss       6.78G  67.2G      0      0  4.84K  5.57K
  c5d0      6.78G  67.2G      0      0  4.84K  5.57K
----------  -----  -----  -----  -----  -----  -----

Optimal RAID-Zx disks-per-vdev rule: 2^n + p

Where n is 1, 2, 3, 4, …

And p is the parity: p=1 for raid-z1, p=2 for raid-z2 and p=3 for raid-z3

RAID-Z  = (2^1 + 1) … (2^n + 1) = 3, 5, 9,  17, …
RAID-Z2 = (2^1 + 2) … (2^n + 2) = 4, 6, 10, 18, …
RAID-Z3 = (2^1 + 3) … (2^n + 3) = 5, 7, 11, 19, …
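
A throwaway sketch that generates the same series (just the 2^n + p arithmetic, nothing ZFS-specific):

# Candidate disks-per-vdev counts: 2^n + p for n = 1..4
for name, p in (("RAID-Z", 1), ("RAID-Z2", 2), ("RAID-Z3", 3)):
    print(name, [2**n + p for n in range(1, 5)])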

RAID1 aka mirror

This example creates a pool with 1 mirror vdev holding 2 disks (1 data disk plus its mirror).

zpool create abyss mirror disk1 disk2

Adding a second mirror vdev stripes data across the mirrors. Growth becomes similar to RAID 10, with each add contributing another mirrored vdev.

zpool add    abyss mirror disk3 disk4

3 way mirror

zpool create abyss mirror disk1 disk2 disk3
zpool add    abyss mirror disk4 disk5 disk6

RAID-Z

Similar to RAID5, but uses a variable-width stripe for parity, which allows better performance than RAID5. RAID-Z tolerates a single disk failure. This example creates a pool with 1 vdev: 2 data disks and 1 parity disk.

zpool create abyss raidz disk1 disk2 disk3

Adding a vdev to grow the RAID-Z pool (striping across raidz vdevs)

zpool add    abyss raidz disk4 disk5 disk6

RAID-Z2

Similar to RAID6; tolerates 2 drive failures before being vulnerable to data loss. Here we have 1 vdev with 2 data and 2 parity disks.

zpool create abyss raidz2 disk1 disk2 disk3 disk4

Adding a vdev to grow the RAID-Z2 pool

zpool add    abyss raidz2 disk5 disk6 disk7 disk8

RAID-Z3

Tolerates 3 drive failures before being vulnerable to data loss. Here we have 1 vdev with 2 data and 3 parity disks.

zpool create abyss raidz3 disk1 disk2 disk3 disk4 disk5

Adding a vdev to grow the RAID-Z3 pool

zpool add    abyss raidz3 disk6 disk7 disk8 disk9 disk10

Adding a spare

Having a spare ready minimizes the time your pool is unprotected. You can begin replacement as soon as a failure occurs.

zpool add    abyss spare disk1

Adding a log (ZIL/Write cache)

zpool add    abyss log disk1                           ok
zpool add    abyss log mirror disk1 disk2              better

Adding a cache (L2ARC/Read cache)

zpool add    abyss cache disk1