Here is the scenario:
The output of `df` shows a filesystem at 100% capacity.
The total from `du -sk *` does not match the usage reported by `df`.
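To put a number on the mismatch, a minimal sketch like the one below compares the two figures for one mount point. The helper name `usage_gap_kb` is made up for this post, and the sketch assumes the argument really is a mount point and that the device name contains no spaces. A small gap is normal (filesystem metadata, reserved blocks); a gap of gigabytes is the symptom described here.

```shell
# Hypothetical helper: print (df "used" minus du total) in KiB for a mount point.
usage_gap_kb() {
  mount_point=$1
  # -P forces the one-line-per-filesystem POSIX format; column 3 is "Used".
  df_used=$(df -Pk "$mount_point" | awk 'NR == 2 { print $3 }')
  # -x keeps du on this filesystem; permission errors are silenced, so run
  # this as root for a meaningful total.
  du_used=$(du -skx "$mount_point" 2>/dev/null | awk '{ print $1 }')
  echo $((df_used - du_used))
}

# Example usage:
#   usage_gap_kb /var
```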
What is happening is that someone launched a program or script that opened a file, perhaps for logging, and then the file was deleted from the filesystem. Because the process still holds the file open, deleting it only removes the directory entry (unlink); the inode and its data blocks stay allocated until the last open handle is closed. The process continues to write through the open handle, filling up your filesystem in a way that is hard to detect, because `du` and `ls` only see files that still have a directory entry.
I have run into this a number of times with application developers. You can recreate the scenario artificially by starting syslogd and then deleting /var/log/messages. Syslogd will continue to write to the data blocks that the inode points to.
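If you would rather not touch syslogd, the same effect can be shown in a throwaway directory. This is only a sketch: the file name `demo.log` and the use of file descriptor 8 are arbitrary choices, and the shell itself plays the part of the logging process.

```shell
# Sketch only: reproduce an unlinked-but-open file in a scratch directory.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/demo.log"   # hypothetical file name

exec 8>"$demo_file"              # open fd 8 for writing, as a logger would
rm "$demo_file"                  # unlink: only the directory entry goes away

# The write still succeeds, because fd 8 still refers to the inode.
if echo "still being written" >&8; then write_ok=yes; else write_ok=no; fi

visible=$(ls -A "$demo_dir")     # empty: ls can no longer see the file
echo "write succeeded: $write_ok, directory listing: '$visible'"

exec 8>&-                        # closing the descriptor releases the space
rmdir "$demo_dir"
```

While the descriptor is open, the data is on disk and counted by `df`, but nothing under the directory accounts for it.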
Question: So how do you detect this?
Answer: The technique is described in the "Finding an Unlinked Open File" section of A Quick Start for Lsof, quoted here:
a. Finding an Unlinked Open File
=================================
A pesky variant of a file that is filling a file system is an
unlinked file to which some process is still writing. When a
process opens a file and then unlinks it, the file’s resources
remain in use by the process, but the file’s directory entries
are removed. Hence, even when you know the directory where the
file once resided, you can’t detect it with ls.
This can be an administrative problem when the unlinked file is
large, and the process that holds it open continues to write to
it. Only when the process closes the file will its resources,
particularly disk space, be released.
Lsof can help you find unlinked files on local disks. It has an
option, +L, that will list the link counts of open files. That
helps because an unlinked file on a local disk has a zero link
count. Note: this is NOT true for NFS files, accessed from a
remote server.
You could use the option to list all files and look for a zero
link count in the NLINK column — e.g.,
  $ lsof +L
  COMMAND   PID USER   FD  TYPE DEVICE SIZE/OFF NLINK  NODE NAME
  …
  less    25366  abe  txt  VREG    6,0    40960     1 76319 /usr/…
  …
> less    25366  abe   3r  VREG    6,0    17360     0 98768 / (/dev/sd0a)
Better yet, you can specify an upper bound to the +L option, and
lsof will select only files that have a link count less than the
upper bound. For example:
$ lsof +L1
COMMAND   PID USER   FD  TYPE DEVICE SIZE/OFF NLINK  NODE NAME
less    25366  abe   3r  VREG    6,0    17360     0 98768 / (/dev/sd0a)
You can use lsof’s -a (AND) option to narrow the link count search
to a particular file system. For example, to look for zero link
counts on the /home file system, use:
$ lsof -a +L1 /home
CAUTION: lsof can’t always report link counts for all file types
– e.g., it may not report them for FIFOs, pipes, or sockets.
Remember also that link counts for NFS files on an NFS client
host don’t behave as do link counts for files on local disks.
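
If you want to script the check, one option is to filter the lsof listing with awk. The helper name `zero_link_rows` is my own invention, not part of lsof. It assumes the ten-column layout shown in the excerpt with no blank fields. Lsof can leave a column empty for some file types, which would shift the fields, so treat this as a sketch rather than a robust parser.

```shell
# Hypothetical helper (not part of lsof): reads `lsof +L` or `lsof +L1`
# output on stdin and prints "PID COMMAND SIZE/OFF" for rows whose NLINK is 0.
# Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
zero_link_rows() {
  # NR > 1 skips the header line; field 8 is NLINK under the assumed layout.
  awk 'NR > 1 && $8 == "0" { print $2, $1, $7 }'
}

# Example usage (requires lsof to be installed):
#   lsof +L1 | zero_link_rows
#   lsof -a +L1 /home | zero_link_rows
```

The PID in the first column tells you which process to restart or stop so that the space is released.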