If you are used to locally attached commodity tape drives, you will have to appreciate the differences between locally attached tape and a pool of network-attached tape, such as that implemented in Enstore.
Therefore the main new computer systems task is to allow data to flow between your computer and the enstore system at a good rate. Enstore is deployed in a scalable fashion: data are transferred between tape and network at full network rate. The Enstore system is capable of immense throughput -- each tape stream will move data at about 10 MB/sec. The steps that are needed to make data flow well to any particular computer are:
$ setup -q stken encp
$ enstore monitor
Visit http://stkensrv2.fnal.gov/enstore/ to see transfer rates for your transfers.
To use enstore, obtain the "encp" product from the Computing Division. Contact the ISD department via the Computing Division office for detailed help and to arrange for resources and other tangible items.
As you provide for an adequate computer installation, you should begin to plan how you will use your tapes.
The Computing Division office will provide you with a quota on robot slots and tapes. Unlike the HPSS system, you can have more tapes than slots. When this is the case, your experiment needs to specify which tapes should be readily available within the tape library, and which tapes will be in the FCC vault.
In late FY2000, the need to actually do this is moot, because we are enjoying a surplus of tape slots. However, if the system is successful, this may not be the case in the future. Chances are you will become slot-limited, and will need to remove some of the datasets you are making today from the library. Therefore, it is important that you begin to use enstore by planning how files should fit onto tape, allowing for their transfer to the FCC tape vault as their use becomes less likely.
In addition to the quota issues discussed above, it is often desirable to concentrate related data on the same tapes. Factors to consider are:
File family width limits the number of tape drives which will be brought to bear to write files into a file family. (Note that enstore does not "stripe" files across tapes. To exploit the bandwidth advantage of two tape drives, you have to transfer two files into enstore simultaneously.)
The rule of thumb for administering file family width is "no wider than needed to sustain throughput". This requires that you know the underlying bandwidth for a tape drive in the stken system and eagle library. Take this as 10 MB/sec.
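The rule of thumb can be turned into simple arithmetic. The following is a minimal sketch, assuming the 10 MB/sec per-drive figure quoted above; the function name family_width and the example rates are hypothetical and are not part of enstore.

```shell
# Hypothetical helper: smallest file family width that sustains a target rate.
# Assumes about 10 MB/sec per tape drive unless a second argument says otherwise.
family_width() {
    target_mb_per_sec=$1
    drive_mb_per_sec=${2:-10}
    # Integer ceiling division: round up so the drives can keep pace.
    echo $(( (target_mb_per_sec + drive_mb_per_sec - 1) / drive_mb_per_sec ))
}

family_width 25   # prints 3: a 25 MB/sec stream needs three drives
family_width 8    # prints 1: anything up to 10 MB/sec needs one drive
```

A width greater than 1 only pays off if you actually run that many transfers at once, since a single file is never spread across drives.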
Setting the file family width to a higher value than this requires some thought and consideration. Among items to consider are:
The PNFS name space provides a mechanism called "tags". Tags are data buckets bound to directories in the PNFS name space. New directories (which are made by the unix "mkdir" command) inherit the tags of the parent directory.
Enstore picks up a default file family and a default file family
width from the directory in the name space as the file is being written
to tape. The "enstore pnfs --tags" command displays the tags of a directory.
The following illustrates how to administer file families:
A side effect of this implementation is that enstore will not seek to
put a tiny file at the end of a tape volume it has finished filling.
This means the volumes in a file family will be time-ordered. This
will be useful to you the day your slot quota forces you to remove, say,
older data from a file family, but not all the data you put into that
file family.
encp is meant to be similar to the unix "cp" or copy utility.
Its basic syntax is "encp inpath outpath".
One of inpath or outpath must be an absolute pathname beginning with "/pnfs".
The other path must refer to a local disk. If outpath is in the PNFS
name space then you are writing to tape. If inpath is in the PNFS name
space then you are reading from tape.
If outpath is a directory, then inpath may be a list of files. If inpath
is in the PNFS space, then the request is optimized in the following way:
the names are sorted by file family and a traversal through the tapes
which minimizes mounting and spacing is arranged.
If outpath is not in the PNFS name space, it must be a disk directory or file;
however, the special path "/dev/null" is allowed as well. In that case the
file is read from tape and the data are discarded.
encp takes a variety of options; the ones you should concern yourself with are:
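The path rules above can be checked before a transfer is submitted. This is only a sketch of those rules as stated in the text, not of what encp itself does internally; the function name encp_direction and the example paths are hypothetical.

```shell
# Hypothetical helper: report what an encp transfer between two paths would do,
# following the rule that exactly one of the two paths lies under /pnfs.
encp_direction() {
    inpath=$1
    outpath=$2
    case "$inpath" in /pnfs/*) in_pnfs=1 ;; *) in_pnfs=0 ;; esac
    case "$outpath" in /pnfs/*) out_pnfs=1 ;; *) out_pnfs=0 ;; esac
    if [ "$in_pnfs" -eq 1 ] && [ "$out_pnfs" -eq 0 ]; then
        echo "read from tape"
    elif [ "$in_pnfs" -eq 0 ] && [ "$out_pnfs" -eq 1 ]; then
        echo "write to tape"
    else
        echo "invalid: exactly one path must be under /pnfs"
    fi
}

# Example paths are made up for illustration.
encp_direction /scratch/run1.dat /pnfs/theory/run1.dat
encp_direction /pnfs/theory/run1.dat /dev/null
```

The first call reports a write to tape and the second a read from tape, since /dev/null counts as a non-PNFS destination.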
If you have to read many files, staging some at a time, it is good to think a bit about
what will make the staging efficient.
For a simple case, where the files were written to a file family with width
of 1, traversing the files in the order they were created should prove
efficient, since enstore by default does not go back to pack little files at
the end of tapes. If you did not keep your own time-ordered record of
your files, you can use the UNIX "ls -t" command to place your files in time
order.
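As a sketch of the simple case, the helper below prints one encp command per file, oldest file first. Note that "ls -t" alone lists the newest file first, so "-r" is added to get creation order. The function name stage_commands is hypothetical, and the sketch assumes the directory holds only data files whose modification times still reflect when they were written.

```shell
# Hypothetical helper: print one encp command per file, oldest file first.
# It only prints the commands (a dry run); nothing is transferred.
stage_commands() {
    srcdir=$1    # directory under /pnfs holding the files
    destdir=$2   # local disk directory to stage into
    # ls -t sorts newest first; -r reverses that, giving creation order.
    ls -tr "$srcdir" | while IFS= read -r name; do
        printf 'encp %s %s\n' "$srcdir/$name" "$destdir"
    done
}
```

Printing the commands first lets you review the order, or split the list into batches that fit your staging disk, before anything is read from tape.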
If your situation is more complex, then you can use enstore's pnfs command
to minimize tape mounts. This is more complicated, but if the file family
width was ever greater than one, it may prove to be worth the effort.
The "enstore pnfs --info" command yields a lot of information about a
specific file; in particular, it will tell you the name of the volume the
file was written on.
$ setup encp
$ cd /pnfs/.......
$ enstore pnfs --file_family newfilefamily
$ enstore pnfs --file_family_width 2
Time Ordering of Volumes
Transfers
encp
encp inpath outpath
--verbose=n where n is between 1 and 9 makes encp chatty
--crc computes a CRC on your computer
--data_access_layer prints diagnostic output on stdout
Notes on Reading your tapes
enstore pnfs --info filename
Example:
$ enstore pnfs --info /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000
bfid="96204504900000L";
bfid2="96204504900000L";
volume="VO0126";
location_cookie="0000_000000000_0000031";
size="764411940L";
file_family="theory-serial-D-0048-d";
filename="/pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000";
orig_name="/pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000";
map_file="/pnfs/theory/volmap/theory-serial-D-0048-d/VO0126/0000_000000000_0000031";
pnfsid_file="000300000000000000007040";
pnfsid_map="0004000000000000000149F0";
If you eval the output of this command in a POSIX shell, you will get local
shell variables you can use to drive scripts:
$ eval `enstore pnfs --info /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000`
$ echo $volume
VO0126
You can use the volume name to find all other files on that tape:
$ enstore pnfs --volume $volume
/pnfs/theory/volmap/theory-serial-D-0048-d/VO0126
$ enstore pnfs --files /pnfs/theory/volmap/theory-serial-D-0048-d/VO0126
-rw-r--r-- 1 2937 g038 764411940 Jun 25 20:43 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_740000
-rw-r--r-- 1 2937 g038 320 Jun 25 20:44 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_740000.info
-rw-r--r-- 1 2937 g038 764411940 Jun 25 21:47 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_744000
-rw-r--r-- 1 2937 g038 320 Jun 25 21:48 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_744000.info
-rw-r--r-- 1 2937 g038 764411940 Jun 25 22:53 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_748000
-rw-r--r-- 1 2937 g038 320 Jun 25 22:54 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_748000.info
-rw-r--r-- 1 2937 g038 764411940 Jun 25 23:59 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_752000
.....
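The --info output can also be used to order a list of files so that files on the same volume are read together. The following is a minimal sketch, not an enstore feature: the function name tape_order and the ENSTORE_INFO override are hypothetical, and it assumes that sorting the zero-padded location_cookie strings as text matches the order of the files on tape.

```shell
# Hypothetical helper: read file names on stdin (one per line) and print them
# grouped by volume, in location_cookie order within each volume.
# ENSTORE_INFO may name a stand-in command for testing; by default the
# "enstore pnfs --info" command shown above is used.
tape_order() {
    while IFS= read -r path; do
        (
            # Subshell: variables set by eval do not leak from file to file.
            eval "$(${ENSTORE_INFO:-enstore pnfs --info} "$path")"
            printf '%s %s %s\n' "$volume" "$location_cookie" "$path"
        )
    done | LC_ALL=C sort | cut -d' ' -f3-
}
```

Feeding it the names of the files you want to stage yields a list in which each volume needs to be mounted only once, provided the files are then read in that order.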
Notes on Writing your tapes