Background:

The Enstore volume import system allows tapes generated outside of Fermilab to be added to the D0 tape inventory managed by Enstore.

This process comprises two stages: creating the tape volumes off-site, and importing these into our tape library.

Tapes that are to be added to Enstore must be properly labelled and written in a compatible format. In addition, metadata about the tape volumes and the files they contain (such as checksums) must be collected at the time the tapes are written, and submitted to the Enstore administrators along with the tape volumes themselves.

A simple standalone program has been developed to facilitate the process of creating the Enstore volumes and associated metadata files. This has been developed in ANSI C for portability and to eliminate the dependence on system utility programs (e.g. tar, mt, cpio), the behavior of which can vary from system to system. It is available as a binary for Fermi-supported UNIX systems and as C source code for other systems.

Prerequisites:

You must have a compatible tape drive [XXX] which operates in non-rewinding mode, and you must have sufficient permissions to write to this device.

You must use approved tapes with barcode labels assigned by Fermilab.

The volume import software needs a directory to store its tape database. This database accumulates information about files and volumes, and persists until the volumes are shipped to us. Because the metadata is stored persistently, files can be added to a tape that was started at an earlier date - it is not necessary to write all the files to the volume at one time.

The tape device and tape database directory can be specified on the command line or through the environment variables TAPE_DEVICE and TAPE_DB. Setting the environment variables is more convenient, since the values then do not need to be typed on every command line.

If the specified tape database directory is not present, it will be created when the software is run for the first time.
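As an illustration of the lookup described above, here is a minimal Python sketch (not the actual ANSI C implementation). It assumes that a command-line value takes precedence over the environment variable; the function names and the device and directory paths in the usage note are hypothetical.

```python
import os


def resolve_setting(cli_value, env_name, environ=os.environ):
    """Return the command-line value if given, else the environment value.

    Assumption: the command line wins over the environment variable.
    """
    if cli_value:
        return cli_value
    value = environ.get(env_name)
    if not value:
        raise ValueError(
            f"{env_name} is not set and no command-line value was given"
        )
    return value


def ensure_db_dir(path):
    """Create the tape database directory if it is not already present."""
    os.makedirs(path, exist_ok=True)
    return path
```

For example, resolve_setting(None, "TAPE_DB", {"TAPE_DB": "/home/user/tapedb"}) returns "/home/user/tapedb", while passing an explicit first argument overrides the environment.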

The enstore_tape program:

The program used to write (and later, read) Enstore tapes is called enstore_tape. It has four main modes of operation, selected by the first command-line argument, which is required and must be one of --init, --write, --dump-db, or --read. Each mode is explained in the sections below. Note that --read is planned but not yet implemented.

enstore_tape --init

This must be run to label a new tape and initialize a database entry for this tape, prior to writing any files to the tape.

Usage:

enstore_tape --init [--tape-device=devname] [--tape-db=dbdir] 
  [--verbose] [--erase] --volume-label=label

If TAPE_DEVICE or TAPE_DB is set in the environment, the corresponding command-line argument is not necessary. $TAPE_DEVICE must be a non-rewinding device, and $TAPE_DB must be a path to a directory (which will be created if needed) where the user has write permission.

The volume label must be a legal volume label, matching the external barcode label on the tape.

If the tape is already labelled with a VOL1 header, or if the local tape database already has an entry for the given volume label, then enstore_tape --init will refuse to relabel the tape. In order to override this, use the --erase option, which erases both the existing tape label and the local database entries. Use this option with caution.
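The refusal rule above can be summarized as a small decision function. This is only a sketch of the logic as described in this document, written in Python for brevity; the function and parameter names are hypothetical and do not correspond to anything in the enstore_tape source.

```python
def init_allowed(tape_has_vol1, db_has_entry, erase=False):
    """Return True if --init may label the tape.

    tape_has_vol1: the tape already carries a VOL1 header
    db_has_entry:  the local tape database already knows this volume label
    erase:         the --erase option was given
    """
    if erase:
        # --erase overrides both checks (and discards the old label and
        # the old database entries, so it should be used with caution).
        return True
    return not (tape_has_vol1 or db_has_entry)
```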

enstore_tape --write

Once a tape is labelled, you can begin adding files to it. To do this, use the --write mode of the enstore_tape program.

Usage:

enstore_tape --write [--tape-device=devname] [--tape-db=dbdir]
  [--verbose] --volume-label=label file_list [file_list...]

tape-device, tape-db, and volume-label are as described above.

volume-label must match a label already existing in the local database (i.e. the tape must have been labelled by enstore_tape --init).

Each file_list takes the form:

  --pnfs-dir=path [--strip-path=path] filename [filename...]

--pnfs-dir specifies the directory in the PNFS file space (i.e. the file namespace within Enstore itself) where the files are to appear once the tape is added to the Enstore library. These paths must start with /pnfs. The --pnfs-dir argument is "sticky": it applies to all subsequent files until another --pnfs-dir argument is specified.

--strip-path specifies a leading pathname component to be stripped from the filenames when they are imported into Enstore. This argument may be omitted.

Finally, one or more filenames are specified.

A few examples may clarify this usage:

To specify all local files in the directory /tmp/sim/data starting with "MC", and cause them to be imported into the PNFS filesystem in the directory /pnfs/test/data, use

  --pnfs-dir=/pnfs/test --strip-path=/tmp/sim/ /tmp/sim/data/MC*

To specify all files in the current directory, and insert them into the PNFS file system in /pnfs/test, use


  --pnfs-dir=/pnfs/test *

Multiple file_lists may be specified.
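To make the mapping from local filenames to PNFS paths concrete, here is a hedged Python sketch of how the file_list arguments could be interpreted. It is not taken from the enstore_tape source. It assumes that --strip-path applies only until the next --pnfs-dir, and that a leading "/" left on a filename is dropped before joining; both function names are hypothetical.

```python
import posixpath


def pnfs_destination(local_path, pnfs_dir, strip_path=None):
    """Compute the PNFS path for one local file."""
    if pnfs_dir != "/pnfs" and not pnfs_dir.startswith("/pnfs/"):
        raise ValueError("--pnfs-dir must start with /pnfs")
    relative = local_path
    if strip_path and relative.startswith(strip_path):
        relative = relative[len(strip_path):]
    # Assumption: any remaining leading "/" is dropped before joining.
    return posixpath.join(pnfs_dir, relative.lstrip("/"))


def expand_file_lists(args):
    """Turn file_list arguments into (local_path, pnfs_path) pairs."""
    pnfs_dir = None
    strip_path = None
    pairs = []
    for arg in args:
        if arg.startswith("--pnfs-dir="):
            pnfs_dir = arg.split("=", 1)[1]   # sticky until replaced
            strip_path = None                 # assumption: reset per file_list
        elif arg.startswith("--strip-path="):
            strip_path = arg.split("=", 1)[1]
        else:
            if pnfs_dir is None:
                raise ValueError("a filename appeared before any --pnfs-dir")
            pairs.append((arg, pnfs_destination(arg, pnfs_dir, strip_path)))
    return pairs
```

With the arguments from the first example above, a local file /tmp/sim/data/MC001 (a hypothetical filename matching MC*) would map to /pnfs/test/data/MC001.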

Tapes need not be rewound after writing. This is convenient when further files are to be appended to the tape.

Note that enstore_tape --write does not currently descend into subdirectories. All of the filenames specified on the command line must be files rather than directories. If any of the filename arguments is a directory, it will not be written to tape and an error message will be printed. This may be changed in a future version of the program.
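Because directories are rejected, it can be useful to check a list of candidate paths before running enstore_tape --write. The following Python helper is a hypothetical pre-check and is not part of the volume import software.

```python
import os


def split_files_and_directories(paths):
    """Separate paths that can be written from directories that would be rejected."""
    files = [p for p in paths if not os.path.isdir(p)]
    directories = [p for p in paths if os.path.isdir(p)]
    return files, directories
```

Only the first returned list should be passed on the command line; the second list shows which arguments would have produced an error message.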

enstore_tape --dump-db

The final step (prior to shipping the tapes) is to turn the local database directory into a flat file so that it can be easily submitted via FTP or electronic mail. The --dump-db option of enstore_tape accomplishes this task.

Usage:

enstore_tape --dump-db [--tape-db=dbdir] > output_file

If tape-db is not specified, the value of the environment variable TAPE_DB is used.

Reading Enstore Tapes:

Since the format used for writing Enstore tapes is based on Unix standards, Enstore tapes can be read without special software. You can use the Unix commands mt, dd, and cpio to read Enstore tapes. The GNU version of cpio is suggested, although other versions will probably work (the cpio flags in the example below will need to be changed if you use a non-GNU cpio).

Assuming that the tape device is /dev/tape, to read back the third file from a tape, you would use the following commands:

# rewind the tape
mt -f /dev/tape rewind
# skip the VOL1 header and the first two files
mt -f /dev/tape fsf 3
# extract the cpio archive contents
dd bs=32768 if=/dev/tape | cpio -idv --no-absolute-filenames

After performing these steps, the tape will be positioned and ready for extracting the fourth file. To reposition the tape to read the nth file, repeat the mt rewind and mt fsf commands, passing n as the count to fsf (the VOL1 header plus the n-1 preceding files).
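The same sequence can be generated for any file number. The Python sketch below only builds the command strings shown above (it does not run them); the function name is hypothetical, and the default device and blocksize are the ones used in the example.

```python
import shlex


def read_commands(n, device="/dev/tape", blocksize=32768):
    """Return the shell commands that extract the n-th file (1-based) from a tape."""
    if n < 1:
        raise ValueError("file numbers start at 1")
    dev = shlex.quote(device)
    return [
        f"mt -f {dev} rewind",
        # Skip the VOL1 header plus the n-1 preceding files: n tape files in total.
        f"mt -f {dev} fsf {n}",
        f"dd bs={blocksize} if={dev} | cpio -idv --no-absolute-filenames",
    ]
```

Calling read_commands(3) reproduces the three commands listed above for the third file.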

For simplicity, and to reduce the dependence on external utility programs, a --read option to enstore_tape, similar to the --write option, is planned. This option is not yet implemented.

Implementation Details:

The local TAPE_DB database simply uses the Unix directory structure to arrange keys, subkeys, and values as directories, subdirectories, and files. This allows simple shell scripts to be written to query the local database.
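Because the database is just a directory tree, it can be loaded with a few lines of code in any language. The Python sketch below reads such a tree into nested dictionaries. It assumes that each value file holds a short text value; the actual key names and file contents used by enstore_tape are not documented here, so none are assumed.

```python
import os


def load_tape_db(root):
    """Read a directory tree into nested dicts.

    Directories become keys holding sub-dictionaries; files become keys
    holding the file contents (assumed here to be short text values).
    """
    result = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            result[name] = load_tape_db(path)
        else:
            with open(path, "r") as handle:
                result[name] = handle.read().strip()
    return result
```

Calling load_tape_db(os.environ["TAPE_DB"]) would then give a nested dictionary mirroring the on-disk layout.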

Tape volumes begin with a modified ANSI VOL1 header: 80 bytes of data, starting with "VOL1", followed by the volume label, padded with space characters up to a final character of ASCII "0" (not NUL). Files are written in variable-blocksize mode, with a default blocksize of 32768 bytes, in POSIX cpio-odc format. Files are separated by a standard EOF marker. Two EOF markers mark the end of the data on the tape.
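The VOL1 header layout described above can be sketched in a few lines of Python. This follows only the description in this document (it is not derived from the C source), and the label used in the usage note is a made-up example.

```python
HEADER_SIZE = 80


def build_vol1_header(label):
    """Build an 80-byte header: "VOL1", the label, space padding, final "0"."""
    body = b"VOL1" + label.encode("ascii")
    if len(body) > HEADER_SIZE - 1:
        raise ValueError("volume label too long for an 80-byte header")
    # Pad with spaces to 79 bytes, then append the ASCII character "0".
    return body.ljust(HEADER_SIZE - 1, b" ") + b"0"
```

For a hypothetical label "ABC123", the result is the bytes "VOL1ABC123", 69 spaces, and a trailing "0".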

In order to make it easy to add files to an existing tape without rewinding and seeking to the end of data, an EOT header is written *after* the 2 EOF markers. The EOT header is similar to the VOL1 header, except that it starts with "EOT" followed by 7 ASCII digits giving the number of files already written to this tape. Following this is the volume label, as in the VOL1 header.
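A possible encoding and decoding of the EOT header is sketched below in Python. Several details are assumptions, not statements about the real format: that the header is 80 bytes long, that the file count is zero-padded, and that the label is followed by space padding and a final "0" exactly as in the VOL1 header. The function names are hypothetical.

```python
HEADER_SIZE = 80


def build_eot_header(file_count, label):
    """Build "EOT" + 7-digit file count + label, padded like the VOL1 header."""
    if not 0 <= file_count <= 9999999:
        raise ValueError("file count must fit in 7 digits")
    body = b"EOT" + b"%07d" % file_count + label.encode("ascii")
    if len(body) > HEADER_SIZE - 1:
        raise ValueError("volume label too long for an 80-byte header")
    return body.ljust(HEADER_SIZE - 1, b" ") + b"0"


def parse_eot_header(block):
    """Return (file_count, label), or None if the block is not an EOT header."""
    if len(block) != HEADER_SIZE or not block.startswith(b"EOT"):
        return None
    digits = block[3:10]
    if not digits.isdigit():
        return None
    label = block[10:HEADER_SIZE - 1].rstrip(b" ").decode("ascii")
    return int(digits), label
```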

After files are written to tape, the EOT header is written, and the tape drive backspaces to the beginning of the EOT header. On subsequent writes, the enstore_tape program checks whether such an EOT header is present at the current tape location. If it is, and the volume names and file counts match, it is safe to continue writing to this tape without rewinding and seeking to end of tape: the program simply skips backwards over one of the EOF markers preceding the EOT header. If the EOT header is not found, the tape is rewound and the VOL1 header is sought.
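The positioning decision described above can be expressed as a small function. This Python sketch assumes an 80-byte EOT header with a zero-padded 7-digit count followed by a space-padded label; the function name and the two returned action names are purely illustrative and do not correspond to real commands or to identifiers in the C source.

```python
def plan_append(block, label, files_written):
    """Decide how to position the tape before appending more files.

    block:         the data read at the current tape position
    label:         the volume label expected from the local database
    files_written: the file count recorded in the local database
    """
    header_matches = (
        len(block) == 80
        and block[:3] == b"EOT"
        and block[3:10].isdigit()
        and int(block[3:10]) == files_written
        and block[10:79].rstrip(b" ").decode("ascii", "replace") == label
    )
    if header_matches:
        # Safe to continue in place: step back over one EOF marker.
        return "skip_back_one_eof"
    # Otherwise fall back to rewinding and looking for the VOL1 header.
    return "rewind_and_find_vol1"
```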

Checksums are generated using the Adler32 algorithm. In addition to a checksum of the entire file, a "sanity" checksum (for early error-detection) is generated using the first 65536 bytes of the file (if the length of the file exceeds 65536 bytes).
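For readers who want to verify checksums independently, the following Python sketch computes both values with the standard zlib module. It assumes that the "sanity" checksum is a plain Adler-32 over the first 65536 bytes with the usual initial value; the function name is hypothetical, and the exact representation stored in the metadata is not specified here.

```python
import zlib

SANITY_SIZE = 65536


def file_checksums(path):
    """Return (full_checksum, sanity_checksum) for a file.

    sanity_checksum is None unless the file is longer than 65536 bytes.
    """
    full = 1  # Adler-32 initial value
    head = b""
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(SANITY_SIZE), b""):
            if size < SANITY_SIZE:
                # Keep only the first 65536 bytes for the sanity checksum.
                head += chunk[:SANITY_SIZE - size]
            full = zlib.adler32(chunk, full)
            size += len(chunk)
    sanity = zlib.adler32(head) if size > SANITY_SIZE else None
    return full, sanity
```

The file is read in chunks so that large files do not need to fit in memory.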


Charles G Waldman
Last modified: Fri Aug 12 16:19:27 CDT 2005