Clones for Backups
==================

Short summary: file modification times are tracked independently of mtime; a
hierarchy of files modified after a certain date can be isolated in a snapshot
with copy-on-write semantics; an easily parsable or JSON report can feed a
backup script. Two Python scripts (eos-backup, eos-backup-browser) illustrate
how to use the functionality and can serve as backup/restore utilities.

1. syncTime

   A "server-store-timestamp" is maintained for all directories and files.
   This is conceptually different from mtime (which can be set by the user).
   For containers the TMTime field is used. For files a 'syncTime' field has
   been added, but it only differs from the default (and hence is mostly not
   stored) if mtime is set explicitly, which should be rare. Thus, in most
   practical cases syncTime equals mtime, minimising the increase in the
   QuarkDB footprint. While a clone exists, transient "cloneId" and possibly
   "cloneFST" attributes are maintained in the MGM.

2. the clone

   The clone is a "sparse" or "lazy" clone, starting empty. Upon cloning, all
   files are tagged with the cloneId field. When files are modified, their
   original contents are "reflinked" on the FSTs (see "cp --reflink", which
   implements copy-on-write on the original FST) into a new file, which is
   placed in a container modelled after their parent containers under
   /eos//proc/clone/cloneId. For each file the clone process returns the
   foreseen clone path, for use in backup scripts.

3. cloning in hierarchies

   Cloning starts at directory level. All files at directory level are tagged
   with the cloneId. The cloning process then recursively descends into
   subdirectories. To prevent this, a directory can be flagged with a
   different cloneId (i.e. cloning a parent directory will not re-clone a
   child directory previously cloned), or with a special "dummy" cloneId used
   as a marker on a directory to prevent descent.

4. clone process details

   A cloneId is a [unique] second-resolution timestamp generated automatically
   at the beginning of the cloning process. The clone process is triggered by
   a find command with the '-x sys.clone=+...' option (specifying sys.clone
   requires privileges):

   - "-x sys.clone=? ..." displays all files and clone attributes under ...
   - "-x sys.clone=-1556195651 ..." deletes all traces of the clone 1556195651
   - "-x sys.clone=+10 ..." makes a full backup clone (numbers < 10 are
     markers; from 10 onwards they are second-precision timestamps)
   - "-x sys.clone=+1 ..." sets a marker not to descend into ... while cloning
   - "-x sys.clone=+1556195651 ..." makes a clone of all files altered after
     1556195651.0

   Cloning tags every file and directory in the hierarchy with the (generated)
   cloneId.

5. copy-on-write

   Whenever a file is deleted, or rewritten using OTRUNCATE, its underlying
   data are transferred (using a low-cost "mv") to a clone file under a
   predefined name in the clone hierarchy. When it is rewritten with
   OTRUNCATE, a new file is then automatically created, tagged as "cloned".
   When a file is updated (modified) without OTRUNCATE, a copy-on-write clone
   is created using reflink (which is low-cost in recent file systems, but a
   real copy elsewhere).

6. the backup process

   "find -j -x sys.clone=+nnnnnnnn /path" clones a directory tree and creates
   JSON output for reference, which must be kept safe. The output indicates
   which files should be backed up and where their copy-on-write clones are
   located. As soon as the clone has been created, write operations to files
   may result in copy-on-write clones being created. Hence, during backup a
   decision has to be made whether to back up a possible clone file or, for a
   full backup, the live file. For a live file, the decision may even have to
   be revised after the copy if a copy-on-write clone has been created in the
   meantime.

7. cleanup

   After data have been backed up, 'find -x sys.clone=-nnnnnnnn' deletes all
   the cloneId and clone-tag attributes and cleans the clone directory under
   /proc/clone/cloneId. The temporary clones gradually disappear from the
   FSTs. The backup now resides on backup media only.

A typical backup-restore cycle
------------------------------

Here's a complete backup and restore recipe, including incremental backups (as
implemented by test/eos-backup-test):

a. backup triggers a full backup issuing 'find -j -x sys.clone=+10 ' and
   stores the returned JSON output for reference in a safe place. For each
   output line, the JSON dictionary's key 'n' contains the original path name,
   't' the server-store-time, 'c' the cloneId (if this is different from the
   backup's cloneId, the file belongs to another backup!), and 'p' the
   clonePath suffix. The eos-backup-test program illustrates how those can be
   used for managing backups and restores. The backupId is the cloneId
   returned for the top-level directory (the first line). For each file,
   backup copies the version under the clone path to backup media if it
   exists, or the live file otherwise. If the live file has been copied, a
   check is performed again after the copy to verify that no clone file has
   been created in the meantime; if one has, the clone file replaces the live
   file on backup media. The directory is backed up recursively; descent into
   subdirectories can be stopped with a directory marker (see above). A
   filename ending in a '/' signals a directory. The clonePath is relative to
   /eos//proc/clone and structured backupId/cloneDir/clonePath for files, or
   simply backupId for directories in the report;

b. backup then cleans the backup clone: find -f -x sys.clone=-nnnnnnnn . The
   directory returns to its normal EOS state (without copy-on-write overhead);

c. incremental backups may be triggered using 'find -j -x sys.clone=+ '. The
   previousBackupId would either be the backupId of a previous backup or the
   backupId of the full backup, depending on the type of increment desired.
   Again, the backupId of the incremental backup is the cloneId of the
   top-level (first) directory. An incremental backup reports *all* current
   files below the directory. Those not modified since the previous backup
   can be distinguished in the output by a backupId (often '0') that differs
   from that of the top-level directory; there is no need to back them up
   again;

d. restore collects all stored backup reports (full, incremental 1,
   incremental 2, ... up to the desired restore date) for the directory into
   one list sorted by filename and backupId. The last (!) incremental report
   contains the *final* list of files to be restored. For every file in the
   sorted list, only the most recent version (highest backupId) is restored
   from backup media (and only if it appears in the final list). At this point
   a filter could be applied to the list to restore only selected files.

Timing
------

- metadata operations under (ideally) the biggest lock possible for consistency
- the standard write may be delayed for copy-on-write - slow with ancient kernels
- copy-on-write clones are short-lived, and only needed for partially modified files
- the backup will be "largely" consistent, even without a BIG lock

Timing in the current implementation (in seconds), 100 files in each of 100
directories (= 10000 files), 30 bytes each (`date`), dockertest-instance, bash
script, using fusex::

   creation:               105   /eos/rtb/tobbicke/backuptest/t??/tt??
   append `date`:          115
   clone creation:           1
   append `date`:          443
   append `date`:          147
   "wc -c" on clones:       45   /eos/dockertest/proc/clone/1558357820/Dxxx/Fxxx
   "wc -c" on live files:   49
   remove all clones:        2

The cloning time alone is not visible here, since the measured time includes
formatting and displaying the result. Tests would be needed on bigger instances
to evaluate the impact of MGM locking. The "append date" processes naturally
increase in duration, as there are more data to ship to the FSTs. The first run
after the clone actually includes creating the clones, hence it is more
"expensive".

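
The restore selection described in step d of the backup-restore cycle can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions, not the logic of eos-backup itself: each report is assumed to be
the saved 'find -j' output with one JSON dictionary per line, only the keys 'n'
and 'c' are used, and 'c' is coerced with int() because the report may carry it
as a number or as a string. The function names are hypothetical.

```python
import json


def load_report(lines):
    """Parse one clone report: one JSON dictionary per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def backup_id(report):
    """The backupId is the cloneId ('c') of the top-level directory (first line)."""
    return int(report[0]["c"])


def select_for_restore(reports):
    """Map each path to the backupId whose media should supply it.

    `reports` holds the full backup first, then the incrementals in
    chronological order (assumption: the caller already sorted them).
    """
    # The last report is the *final* list of files to be restored.
    final_names = {entry["n"] for entry in reports[-1]}
    chosen = {}
    for report in reports:
        bid = backup_id(report)
        for entry in report:
            name = entry["n"]
            if name not in final_names:
                continue  # deleted before the restore point
            if int(entry["c"]) != bid:
                continue  # unmodified here, or belongs to another backup
            if bid > chosen.get(name, -1):
                chosen[name] = bid  # keep the most recent (highest) backupId
    return dict(sorted(chosen.items()))
```

A filter on the returned paths (for example, a prefix test) could be applied
before copying anything back from backup media.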
Backup / Restore Utilities
==========================

Short summary: eos-backup can be used to clone, back up and restore EOS
directories. Data are stored in "blobs" in a filesystem, or in EOS or CTA.
eos-backup-browser can "mount" an existing backup hierarchy (i.e., including
incremental or differential backups) using FUSE for easy consultation or
recovery.

eos-backup
----------

eos-backup clone -B /BackupPrefix [-P ] /eos/pathName
   clones a directory tree, fully or based on parentId, and creates a
   cloneFile. Outputs "cloneId catalog "

eos-backup backup -B /BackupPrefix [-F catalogFile] [-P ] /eos/pathName
   clones a directory tree unless a cloneFile is passed with -F, then backs up
   files into /BackupPrefix.cloneId and deletes the clone. Outputs "cloneId
   backup media " followed by whatever the deletion of the clone says.

eos-backup restore -F /inputCatalog1[,/inputCatalog2[,...]] /outputDirectory
   performs a restore into outputDirectory based on one or more catalogFiles

/BackupPrefix can be a path on a mounted file system, or root://instance//path
to select an EOS/CTA store. /eos pathnames are as seen by the MGM, not as seen
in the local file system!

environment (can also be specified through -U root://eos-instance):
   EOS_MGM_URL=root://eos-instance must point to the MGM serving /eos/pathName

Example
-------

A primitive full_backup/make_changes/incremental_backup/restore example in bash:

.. code-block:: bash

   # full clone and backup in one go
   read xx cloneId1 xxx media1 <<<$(eos-backup backup -B /tmp/Backup/ /eos/dockertest/backuptest)

   : make some changes, delete/add/modify files

   # an incremental clone based on first backup
   read xx cloneId2 xxx cloneFile2 <<<$(eos-backup clone -B /tmp/Backup/ -P $cloneId1 /eos/dockertest/backuptest)

   # back those files up
   read xx cloneId2 xxx media2 <<<$(eos-backup backup -B /tmp/Backup/ -F $cloneFile2 /eos/dockertest/backuptest)

   # restore the lot
   eos-backup restore -F ${media1}catalog,/tmp/Backup.$cloneId2/catalog -B /tmp/Backup /tmp/Restore2

eos-backup-browser
------------------

eos-backup-browser -F $media1/catalog[,$media2/catalog[,...]] [-L regexp1[,regexp2[,...]]] /mountPoint
   - mounts the eos-backup tree designated by the catalog files under
     /mountPoint, read-only. Files can be copied somewhere else from there,
     provided the backup media-files are accessible.
   - if '-L' is specified, it designates a list of regular expressions. All
     media-files containing data for files matching any of the regular
     expressions are listed. This can be used to establish which files would
     have to be restored from e.g. tape.

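
The '-L' behaviour described above amounts to a regular-expression filter over
the catalogued files. The sketch below illustrates the idea only: the mapping
from file path to media-file is a hypothetical stand-in for whatever
eos-backup-browser derives from the catalogs, the function name is invented,
and matching anywhere in the path (re.search) is an assumption rather than the
tool's documented semantics.

```python
import re


def media_files_for(patterns, file_to_media):
    """List the media-files holding data for files matching any pattern.

    patterns      -- regular expressions, as they would be passed to -L
    file_to_media -- hypothetical mapping: backed-up path -> media-file
    """
    compiled = [re.compile(p) for p in patterns]
    wanted = {
        media
        for path, media in file_to_media.items()
        if any(rx.search(path) for rx in compiled)
    }
    return sorted(wanted)
```

The result is the set of media-files that would have to be made accessible
(e.g. recalled from tape) before the matching files can be copied out of the
mount point.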