DISCREMENTAL-LITE (forever incremental hard disk based backups)

LICENSE

GPL version 2

INTRODUCTION

This program (a set of scripts) is yet another rsync-based backup solution. It uses rsync with the --link-dest option to create hard links to files that have not changed rather than creating another copy of the same file. Using this technique, each snapshot is a full backup even though the backup process is incremental (only the files that changed since the last backup are copied). The snapshots are arranged by date and time. This paradigm is sometimes called forever incremental backups.

This program is a quick hack to get something that works. The full version of DISCREMENTAL has regression tests and is more robust. Unfortunately, that also means it takes longer to develop initially. If it is ever finished, DISCREMENTAL should be used instead of DISCREMENTAL-LITE.

SYNTHETIC BACKUP

Using rsync in this way is similar to a synthetic backup, except that in our case each snapshot is a full backup. A synthetic backup is a view of multiple incremental backups that appears as a full backup.

PHILOSOPHY

Tapes suck. If you are looking into this type of program you already know this. I don't need to defend the rationale for minimizing the use of tapes. That said, you may still use tape and other media with this solution. Once the files are in the archive you can choose to copy them to any other medium.

Like many other disk-based backup programs, this program is a wrapper around rsync. Unlike many of those other programs, it keeps that as the central goal: to make it easy and efficient to use the features of rsync and manage disk-based backups.

This philosophy also makes the program easy to maintain. It does not use a compiled language. Why should a wrapper around rsync need to be compiled? It also minimizes the amount of code that must be maintained by using proven and common subsystems such as cron and ssh.
Each backup is atomic in that it is run directly from cron and configured by the user. This allows maximum flexibility when running pre and post actions, without needing yet another markup language and/or configuration subsystem.

TERMINOLOGY

This is loosely based on the terminology that dirvish uses.

A box is an individual snapshot made by rsync that shares files with other snapshots through hard links. Boxes are named by date.

A vault is a collection of boxes that share files. The vault also holds metadata and logs.

A bank is a directory that holds vaults and metadata. In addition to vaults, the bank should contain a template vault and a bank log.

Review

BANK  - a directory that holds vaults
VAULT - a collection of backups that have hard links to some of the same files
BOX   - a snapshot/backup

CONFIGURATION

I   Tweak defaults in /etc/discremental-lite.conf
    Read the examples and notes in discremental-lite.conf for more information.

II  Create the BANK directory specified in the config

III If using a network, set up connections from server to client(s)
    a. if using ssh, set up passwordless public/private key authentication
    b. if using the rsync daemon, set up rsyncd.conf on clients

IV  Create a vault configuration file in /etc/discremental-lite.d/

    printf "U_SERVER=root@server1\nSOURCEFILES=\"/etc/ /root /usr/local\"\nDEST=server1\n" > /etc/discremental-lite.d/server1

V   Schedule cron jobs

    echo "30 1 * * * root /usr/local/sbin/discremental-lite /etc/discremental-lite.d/server1" >> /etc/cron.d/discremental-lite

MULTIPLE SERVERS

The program accepts either a configuration file or a directory as its argument. If a directory is given, it processes each configuration found in that directory, one at a time.

PARALLEL BACKUPS

If you wish to run multiple sets of backups at the same time: First, group the configurations into directories as described in MULTIPLE SERVERS. Second, add a separate cron entry for each set so that they run at the same time.
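As a rough illustration of the PARALLEL BACKUPS steps, the sketch below prints one cron.d line per group directory, all sharing the same schedule. The group directory names (group-a, group-b) and the cron_entries helper are hypothetical; only the discremental-lite path and the cron line format are taken from the CONFIGURATION example above.

```shell
#!/bin/sh
# Sketch only: group-a and group-b are hypothetical directory names.
# Print one cron.d line per configuration directory, all with the same
# schedule, so that each group is started as its own process.
cron_entries() {
    schedule=$1
    shift
    for dir in "$@"; do
        printf '%s root /usr/local/sbin/discremental-lite %s\n' "$schedule" "$dir"
    done
}

entries=$(cron_entries "30 1 * * *" \
    /etc/discremental-lite.d/group-a \
    /etc/discremental-lite.d/group-b)

# Review the lines before appending them to /etc/cron.d/discremental-lite.
printf '%s\n' "$entries"
```

Because cron starts each entry independently, the two groups run side by side, while the configurations inside each group are still processed one at a time.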

CHECKSUMS

The default checksum program is discremental-checksums. It is implemented as a separate script so that it can be replaced with faster programs written in other languages. The program takes two arguments: the full path to a box and, optionally, the full path to the md5 hash file of the last box that the new box was linked against. The script may take md5 hashes from the last box rather than calculating new ones when the files were hardlinked. The only requirement is that two files are created, named after the box that the script works on.

MULTI-THREADED CHECKSUMS

The discremental-checksums script will use multiple threads if the THREAD parameter is set in discremental-lite.conf or if the environment variable DLTHREADS= is set.

FREEDUPS - hard link duplicate files within backup

More information on freedups can be found at http://www.stearns.org/freedups/index.html

The upstream version has some limitations that I disagree with, such as quitting at the first sign of a problem. It also has a few bugs that I have fixed, such as failing on filenames that contain spaces. Check http://pcxperience.org/freedups for details.

The duplicate removal can be done within a box, a vault, a bank, or the entire filesystem. If configured, discremental-lite freedups the boxes. It is possible to freedup the whole bank, but you must do that manually. Please check freedups.pl -h for details. Since the data is structured so that duplicate files between boxes are linked as the box is created, it should not be necessary to freedup the whole vault.

EXAMPLE

If two files are the same in one backup (box), freedups will hardlink them. If the files have NOT changed when the next backup is run, rsync will link them to the same inode. If the files HAVE changed when the next backup runs, they will be freeduped again. If any are duplicates, the process repeats.
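To make the first step of the EXAMPLE concrete, here is a small hand-rolled demonstration of two identical files ending up on one inode. It does not call freedups.pl and is not how freedups is implemented; it only uses cmp, ln and ls to show the end result. The file names and the inode helper are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: reproduces by hand the effect of hard-linking two duplicate
# files. File names and the inode() helper are hypothetical.
inode() {
    # Print the inode number of a file.
    ls -i "$1" | awk '{print $1}'
}

box=$(mktemp -d)
printf 'same content\n' > "$box/a.txt"
printf 'same content\n' > "$box/b.txt"

# Before linking: identical content, but two separate inodes.
if [ "$(inode "$box/a.txt")" = "$(inode "$box/b.txt")" ]; then
    before=linked
else
    before=separate
fi

# Replace b.txt with a hard link to a.txt only if the contents match.
if cmp -s "$box/a.txt" "$box/b.txt"; then
    ln -f "$box/a.txt" "$box/b.txt"
fi

if [ "$(inode "$box/a.txt")" = "$(inode "$box/b.txt")" ]; then
    after=linked
else
    after=separate
fi

echo "before: $before, after: $after"
rm -rf "$box"
```

After the link is made, both names refer to the same data on disk, so the duplicate costs no extra space within the box.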

CACHE

The md5 cache file is stored in the vault and passed to freedups:

    freedups.pl --cachefile=/path/bank/vault/md5sum-v1.cache

PATHS

The boxes need to be passed with full paths:

    freedups.pl --cachefile=/path/bank/vault/md5sum-v1.cache \
        /path/to/bank/vault/box /path/to/bank/vault/box2

REPORTS VS DOING THE WORK

If configured, freedups will be called to create hardlinks from duplicates. If you do not want to remove duplicates, or wish to use some other program, freedups can be turned off or called in a report-only mode. Set the FREEDUPS variable in /etc/discremental-lite.conf to change the behaviour.
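When running freedups by hand, the full-path requirement from PATHS can be met by resolving box names before building the command line. The sketch below is one possible way to do that; it only prints the command rather than running it. The abs_dir helper and the temporary bank layout are hypothetical stand-ins, and the only freedups.pl option used is the --cachefile shown above.

```shell
#!/bin/sh
# Sketch only: prints a freedups.pl command line, it does not run it.
# abs_dir and the temporary directory layout are hypothetical.
abs_dir() {
    # Resolve an existing directory to its absolute physical path.
    (cd "$1" && pwd -P)
}

bank=$(mktemp -d)                  # stand-in for the real bank directory
mkdir -p "$bank/vault/box" "$bank/vault/box2"

cd "$bank/vault" || exit 1
vault=$(abs_dir .)
box1=$(abs_dir box)                # relative name in, full path out
box2=$(abs_dir box2)

# Note: a plain string is fine for display, but paths containing spaces
# would need proper quoting before the command is actually executed.
cmd="freedups.pl --cachefile=$vault/md5sum-v1.cache $box1 $box2"
echo "$cmd"

cd / && rm -rf "$bank"
```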