Linux filesystems. XFS

Last updated: May 2013

Experience has taught me that if you want a file system that can handle thousands of files, some of them very large, XFS is a very good choice.

I have found that the xfs_check command does not work well on large file systems (it fails with an "Out of memory" error). In those cases, use the xfs_repair command instead.
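A minimal sketch of that fallback, assuming a hypothetical device /dev/sdb1 (xfs_check was in fact later deprecated in favour of xfs_repair):

```shell
# The file system must be unmounted before repair.
umount /dev/sdb1

# Dry run first: -n reports problems without modifying anything.
xfs_repair -n /dev/sdb1

# Actual repair. If xfs_repair itself runs short of memory on a very
# large FS, recent xfsprogs accept -m to cap memory usage (in MB).
xfs_repair /dev/sdb1
```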

--

Here are some characteristics of different file systems. These are notes taken from "Guide to Linux File Systems" by Val Henson:

Choosing and tuning the right file system for your workload:

XFS only Linux FS to support more than 1TB reliably

'iostat' -- useful tool

No single best file system, workload-dependent

Factor of 10^6 in time between CPU/memory ops and I/O ops -- ns versus ms

How FS like to be treated:

-Mostly reads

-Large, contiguous IO

-Medium-sized files -- 4K-128K

-Medium-sized dirs -- 10-1000 entries

-Most IO near beginning of file

-Few metadata ops

-Clean unmount

How to abuse your FS:

-Fill one dir with a million files

-Simultaneously create one huge file with remaining space

-Randomly create and delete small files in same dir

-Randomly read and write single bytes of the large file

-Add and remove ACL/extended attribs

-Slowly yank the power plug

Diffs betw FS's:

-File system and file size

-Number of inodes

-Dir size and lookup algorithm

-File data R/W performance

-File create/delete performance

-Space efficiency

-Special features -- direct IO, execute in place, etc

Crash recovery method:

-Ease of repair

-Stability

-Support

ext2:

simple, fast, stable, slow recovery, easy to repair

ext3:

rock stable, fast recovery, slow metadata ops

reiser3:

lots of small files, big dirs, less stable, poor repair, less support

xfs:

large files, big dirs, big FS's, slow repair

jfs:

end-of-life'd by IBM

others less well tested, poor support

Common workloads:

embedded

avoid writing flash unless necessary

ext2 (for read-only) / ext3, minix for ramdisks

jffs2 for flash without write-balancing (modern flash _has_ write-bal)

laptop

withstanding frequent crashes

low performance demands

ext3 is best

eliminate writes as much as possible

mount -o noatime,nodiratime

group writes with laptop mode, read Documentation/laptop-mode.txt
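The atime options above can be applied on a live system or made permanent in /etc/fstab; the device and mount point here are hypothetical:

```shell
# One-off remount with access-time updates disabled
# (eliminates a large class of small background writes):
mount -o remount,noatime,nodiratime /home

# Permanent version, as an /etc/fstab line (hypothetical device):
# /dev/sda3  /home  ext3  defaults,noatime,nodiratime  0  2
```

On newer kernels noatime already implies nodiratime, but listing both is harmless.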

desktop

sweet spot of most FS's

ext3 or reiser

reiser notail option improves performance at cost of efficient storage

large file working set? increase # of inodes cached in memory

Documentation/sysctl/fs.txt
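The talk points at Documentation/sysctl/fs.txt; on 2.6+ kernels the knob that actually controls how eagerly cached inodes and dentries are reclaimed is vm.vfs_cache_pressure, documented under sysctl/vm.txt. A hedged sketch:

```shell
# Current value (the kernel default is 100):
sysctl vm.vfs_cache_pressure

# Values below 100 make the kernel favour keeping inode/dentry
# caches in memory over reclaiming them:
sysctl -w vm.vfs_cache_pressure=50

# Persist across reboots:
echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf
```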

file server

ext3 for few metadata ops

reiser for more metadata ops, small files

xfs for large streaming reads/writes, large dirs

ext3: data=writeback is faster, but trades away data integrity after a crash

ext3: data=journal reduces latency of sync NFS writes

ext3: default is data=ordered
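The journalling mode is chosen at mount time (ext3 refuses to change data= on a plain remount); device and mount point below are hypothetical:

```shell
# Fastest, weakest post-crash guarantees:
mount -o data=writeback /dev/sdb1 /srv/export

# All file data goes through the journal; can lower latency for
# synchronous (e.g. NFS) writes at the cost of raw throughput:
umount /srv/export
mount -o data=journal /dev/sdb1 /srv/export

# data=ordered is the default and needs no option.
```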

can tweak block size

for v high perf, consider ext2

some cluster FS's use ext2 as the per-node base

mail server

mbox format (one big file) -- ext3

maildir format (lots of small files) -- reiser

ext3 w small blocks, high inode-to-file ratio can be good for maildir
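A sketch of such an mkfs invocation, on a hypothetical device: -b sets the block size and -i the bytes-per-inode ratio, so millions of tiny maildir files cannot exhaust the inode table.

```shell
# 1 KB blocks, one inode per 2 KB of space:
mkfs.ext3 -b 1024 -i 2048 /dev/sdc1

# Verify the resulting inode count:
tune2fs -l /dev/sdc1 | grep -i 'inode count'
```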

don't cut any corners on your mail server -- reliability is key

database server

ocfs2 for cluster oracle databases

direct IO often important

tuning FS's for DB's is an arcane art

video server

large files, write-once, read-many

streaming access

XFS clear winner

ext3 with larger reservation could work
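For the XFS case, a minimal setup sketch (device, mount point, and file name are hypothetical); xfs_io's resvsp reserves space up front so big write-once files stay contiguous:

```shell
# Defaults are usually fine on a single disk; stripe hints only
# matter on RAID:
mkfs.xfs /dev/sdd1
mount -o noatime /dev/sdd1 /srv/video

# Preallocate 4 GB for a file that will be written once and
# streamed many times:
xfs_io -f -c 'resvsp 0 4g' /srv/video/movie.ts
```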

NFS tuning tips

raise r/w size to ~8192 (8K)

use NFSv3 and TCP (not UDP)

async raises write perf but could cause probs in crash --

no longer the default

Can't recommend NFSv4 yet
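The client-side tips above translate into mount options roughly like this (server name and paths are hypothetical):

```shell
# NFSv3 over TCP with 8K read/write sizes:
mount -t nfs -o vers=3,proto=tcp,rsize=8192,wsize=8192 \
    fileserver:/export/home /mnt/home

# Server side, in /etc/exports: 'sync' is the safe default;
# 'async' raises write performance but risks loss on a crash.
# /export/home  192.168.1.0/24(rw,sync,no_subtree_check)
```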

distributed FS's

tradeoff of latency vs consistency

most are buggy and slow

use optimized for one case

NFS - multi read, single write

OCFS2 - DB's