Enstore Users Guide

System Engineering

Enstore v.s. Locally attached tape

If you are used to using locally attached commodity tape drives, you will have to appreciate the differences between locally attached tape and a pool of network attached tape, such as implemented in Enstore.

  • + You will see higher availability since the failure of any one tape device will not significantly reduce your ability to analyze your data.
  • + You will see much lower mount latencies since tape mounts are automated.
  • - You will burn more CPU moving the data to and from tape because IP data transfers are inherently less efficient than direct SCSI transfers.
  • - Expansion of the Tape I/O capacity of large servers requires relatively more capacity planning and system engineering compared to locally attached tape.
  • + For most applications, expansion of Tape I/O to farms of P.C.'s requires less planning, since the network I/O, when implemented by a switched Ethernet scales naturally by just expanding the system.
  • + You get, for free, a distributed catalog of your files, usable from any computer you use for analysis. You can manipulate this catalog with UNIX commands, like "ls" and"find".
  • - The utility of the catalog is limited by the UNIX commands; You had better not do silly, unscalable things like put alot of files into a single directory.
  • + Since Enstore decouples tape library slots from the number of tape volumes you have. You are still limited only by the number of tapes you can afford, not by the number of tape slots the Computing Division can afford to buy.
  • + You are able to control which files are place on which tapes, and able to explicitly iterate though tapes while reading files, if you are used to working in this style. However the mechanisms used to do this need to be learned.
  • Therefore the main new computer systems task is to allow for data to flow between your computer and the enstore system as good rate. Enstore is deployed in a scalable fashion data are transferred between tape and network at full network rate. The Enstore system is capable of immense throughput -- each tape stream will move data at about 10 MB/sec. The steps that are needed to make data flow well to any particular computer are:

  • Specify the number and type of Network Interface cards for your computer to allow good throughput, keeping in mind that a unit of throughput is 10 MB/sec.
  • Have the network group analyze the network between your system and the Enstore servers, the goal being to guarantee throughput and eliminate the any substantial chance of packet loss.
  • Unit test the network. looking for throughput of about 10 MB/sec. Enstore provides the command "Enstore Monitor" to do this:
    
               $ setup -q stken  encp
    	   $ enstore monitor
    
    
  • Check that the file systems you will use to hold data have good throughput, for example by using the "bonnie" benchmark. Make sure they will transfer data at rates greater than 10 MB/sec.
  • examine ans regularise the UIDs and GIDs on all computert systems used by your experiment.
  • Use the encp command to test actual data transfer rates of large files (bigger than 200 MB) between your system and tape. You can use the web pages at:
    
         http://stkensrv2.fnal.gov/enstore/
    
    
    to see transfer rates for your transfers.
  • Resolve any problems.
  • Software

    To use enstore obtain the "encp" product from the computing division. Contact the ISD department via the Computing Division office for detailed help and to arrange for resources and other tangible items.

    Tape Planning

    As you provide for an adequate computer installation, you should begin to plan how you will use your tapes.

    Volume and Slot Quotas

    The computing division office will provide you with a quota on robot slots and tapes. Unlike the HPSS system you can have more tapes than slots. When this is the case, your experiment needs to specify which tapes should be readily available within the tape library, and which tapes will be in the FCC vault.

    In late FY2000, the need to actually do this is moot, because we are enjoying a surplus of tape slots. However, if the system is successful, this may not be the case in the future. Chances are you will become slot-limited, and will need to remove some of the datasets you are make today from the library. Therefore, it is important that you begin to use enstore by planning how files should fit onto tape, allowing for their transfer to the FCC tape vault as their use becomes less likely.

    Putting data on tape

    In addition to the quota issues discussed above it is often desirable to concentrate related data on the same tapes. Factors to consider are:

  • The need to minimize tape mounts (which cause wear on the tape and delays in your processing).
  • Limited and controllable impact to your statistics should a tape be damaged and files become unrecoverable.
  • Allow you to conserver your tape quote by scratching and re-using tape volumes when the data on them are no longer meaningful.
  • Allow you to manage your tape library slot quota by removing tapes unlikely to be accessed to the Fermilab Tape vault.
  • File families

    The mechanism provided by enstore is called "file families". It has two components, a "name" and a "width". These are completely administered by the experiment.

    File family names

  • Each non-empty tape has a string that the experiment makes up called a "file family" associated with it. The name should consist of alphanumeric characters and the special characters "_" and "-", except ".".
  • Any given file being written to tape has a file family name associated with it (How this is done is discussed below). The file will either be written to a tape with other files in the same family or to a new,blank volume.
  • A tape is closed for writing when Enstore estimates that the next file to be put on it will not fit.
  • A tape is closed for writing when current file being written to it does not fit. (n.b. the write is re-tried on a new volume, transparently to you when this happens)
  • File Family Width

    File family width limits the number of tape drives which will be brought to bear to write files into a file family. (note that enstore does not "Stripe" files across tape, to exploit the bandwidth advantage of two tape drives you have to transfer two files into enstore simultaneously.)

    The rule of thumb for administering file family width is "no wider that needed to sustain throughput". This requires that you know the underlying bandwidth for a tape drive in the stken system and eagle library. Take this as 10 MB/sec.

    Setting the file family width to a higher value than this requires some thought and consideration. Among items to consider are:

  • If I am rate-limited writing to enstore, is the limitation really the number of tape drives? Or is there a network or local bottleneck?
  • If I raise the width, will my files be less concentrated on the minimal number of tape volumes? Will this enhance or confound the subsequent reading of the data?
  • Administering File families

    The PNFS name space provides a mechanism called "tags". "tags" are data buckets bound to directories in the PNFS name space. New directories (which are make by the unix "mkdir" command) inherit the tags of the parent directory.

    Enstore picks up a default file family and a default file family width from the directory in the name space as the file is being written to tape. The "enstore pnfs -tags " command is used by the experiment to administer these default file families and file family widths.

    The following illustrates how to administer file families:

    
    $ setup encp
    $ cd /pnfs/.......
    $ enstore pnfs --file_family newfilefamily
    $ enstore pnfs --file_family_width 2
    

    Time Ordering of Volumes

    A side effect of this implementation is that enstore will not seek to put a tiny file at the end of a tape volume it has finished filling. This means the volumes in a file family will be time-ordered. This will be useful to you the day your slot quota forces you remove, say, older data from a file family , but not all data you put into a file file family.

    Transfers

    encp

    encp is meant to be similar to the unix "cp" or copy utility. its basic syntax is

    
        encp inpath outpath
    
    

    One of inpath or outpath must be absolute pathnames beginning with "/pnfs". The other path must refer to a local disk. If outpath is in the PNFS name space then you are writing to tape. If inpath is in the PNFS name space then you are reading from tape.

    If outpath is a directory, then inpath may be list of files. If inpath is in the PNFS space, then the request is optimized in the following way: the names are sorted by file family and an traversal through the tapes which minimizes mounting and spacing is arranged for.

    If outpath is not in the PNFS name space it must be a disk directory or file however the special path "/dev/null" is allowed as well. In that case the output is written to /dev/null.

    encp takes a variety of options, the ones you should concern yourself with are:

    
    
    --verbose=n           where n is between 1 and 9       makes encp chatty
    --crc                                                  computes a CRC on the your computer
    --data_access_layer                                    prints diagnostic output on stdout
    
    
    

    Notes on Reading your tapes

    If you have to read many files, staging some at a time, it is good to think a bit about what will make the staging efficient.

    For a Simple case, where the files were written to a file family with width of 1, traversing the files in the order they were created should prove efficient, since enstore by default does not go back to pack little files at the end of tapes. If you did not keep your own time ordered record of the your files, you can use the UNIX "ls -t" command to place your files in time order.

    If your situation is more complex, then you can use enstore's pnfs command to minimize tape mounts. This is more complicated, but if the file family width was ever greater than one, may prove to be worth the effort. The command yields alot of information about a specific file, in particular it will tell you the name of the volume it was written on.

    
     enstore pnfs --info filename
    
    
     Example:
    
    
    
    $ enstore pnfs --info  /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000
    bfid="96204504900000L";
    bfid2="96204504900000L";
    volume="VO0126";
    location_cookie="0000_000000000_0000031";
    size="764411940L";
    file_family="theory-serial-D-0048-d";
    filename="/pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000";
    orig_name="/pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000";
    map_file="/pnfs/theory/volmap/theory-serial-D-0048-d/VO0126/0000_000000000_0000031";
    pnfsid_file="000300000000000000007040";
    pnfsid_map="0004000000000000000149F0";
    
    
    If you eval this command in a posix shell, you will get local shell variables you can use to drive scripts:
    
    $ eval `enstore pnfs --info  /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_800000`
    $ echo $volume
    VO0126
    
    
    You can use the volume name to find all other files on that tape:
    
    $ enstore pnfs --volume $volume
    /pnfs/theory/volmap/theory-serial-D-0048-d/VO0126
    $ enstore pnfs --files /pnfs/theory/volmap/theory-serial-D-0048-d/VO0126
    -rw-r--r--   1 2937     g038     764411940 Jun 25 20:43 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_740000
    -rw-r--r--   1 2937     g038          320 Jun 25 20:44 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_740000.info
    -rw-r--r--   1 2937     g038     764411940 Jun 25 21:47 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_744000
    -rw-r--r--   1 2937     g038          320 Jun 25 21:48 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_744000.info
    -rw-r--r--   1 2937     g038     764411940 Jun 25 22:53 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_748000
    -rw-r--r--   1 2937     g038          320 Jun 25 22:54 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_748000.info
    -rw-r--r--   1 2937     g038     764411940 Jun 25 23:59 /pnfs/theory/serial/D/0048/d/ser_D48_qf_d_d_0.1373_1.46_752000
    .....
    
    

    Notes on Writing your tapes

  • If you have many files for different file families, consider putting the files for a given family all at once.