D0EnSrv1
$Revision$
$Date$ GMT
D0EnSrv1 is the node that contains all of the information that needs to be
backed up and preserved: the Enstore file and volume databases, and the
PNFS databases.
File Clerk
- Detailed information on the Enstore File Clerk is available from the
Enstore Technical Document. Command details are covered there.
- The file clerk's database is /diskb/enstore-database/file. This
file is on a raid level-1 (software mirroring) disk.
- A backup is made once an hour and copied to d0ensrv3.
- Rsync is run once every 5 minutes and a complete copy of the
directory tree is made to a 2nd (independent) raid disk on
d0ensrv1.
- The file clerk's database is journaled to
/diska/enstore-journal/file.jou. This is an ASCII-readable file and is
the final authority in cases of conflicting data (resulting from
corruption). This file is also on a mirrored disk and rsynced to
another mirrored disk.
- Complete information on a particular bfid can be seen by using
the Enstore User command web page. Alternatively:
  - Log in as enstore to one of the console servers.
  - Type "setup enstore".
  - Type "enstore file --bfid=<bfid>".
Volume Clerk
- Detailed information on the Enstore Volume Clerk is available from the
Enstore Technical Document. Command details are covered there.
- The volume clerk's database is /diskb/enstore-database/volume. This
file is on a raid level-1 (software mirroring) disk.
- A backup is made once an hour and copied to d0ensrv3.
- Rsync is run once every 5 minutes and a complete copy of the
directory tree is made to a 2nd (independent) raid disk on
d0ensrv1.
- The volume clerk's database is journaled to
/diska/enstore-journal/volume.jou. This is an ASCII-readable file and
is the final authority in cases of conflicting data (resulting from
corruption). This file is also on a mirrored disk and rsynced to
another mirrored disk.
- Complete information on a particular volume can be seen by using
the Enstore User command web page. Alternatively:
  - Log in as enstore to one of the console servers.
  - Type "setup enstore".
  - Type "enstore volume --vol=<label>".
Databases
- One copy of db_deadlock, shared between the file and volume clerks,
must be running to prevent database deadlocks.
- One copy of db_checkpoint, shared between the file and volume clerks,
must be running to provide frequent flushing/checkpointing of data to
the database files. (A sketch of typical invocations appears after this
list.)
- In the case of database corruption:
  - Log in to d0ensrv1 as enstore.
  - Stop all Enstore processes on the node: "setup enstore", then
"enstore stop". Make sure they are stopped; "lsof | egrep
'/diska|/diskb'" shows you who is still accessing the databases.
Stop any remaining accessors. This information is also available on
a web page.
  - Make sure the inquisitor doesn't start Enstore up while you
are working.
  - Type "cd /diskb/enstore-database".
  - Type "db_recover -h . -v" and then try to restart the Enstore
servers.
  - If that fails (it almost never does), type "rm __db* *.index log.*"
and then try to restart the Enstore servers.
  - If that fails, it is time to go to the database backups!
Contact an Enstore developer for help.
  - Restart the Enstore servers.
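The two daemons above are the standard Berkeley DB utilities; a minimal
sketch of typical invocations against the database directory, with
illustrative (not production) intervals:

    db_deadlock   -h /diskb/enstore-database -t 5    # scan for deadlocks every 5 seconds
    db_checkpoint -h /diskb/enstore-database -p 1    # checkpoint the environment every minute

The corruption-recovery procedure condenses to the following session
(run as enstore on d0ensrv1):

    setup enstore
    enstore stop
    lsof | egrep '/diska|/diskb'     # verify nothing still has the databases open
    cd /diskb/enstore-database
    db_recover -h . -v               # normal recovery; almost always sufficient
    # only if that fails: rm __db* *.index log.*  and retry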
PNFS
- PNFS (Perfectly Normal File System), written and maintained by DESY,
provides the namespace for the files that D0 transfers in and out of
the robot.
- Detailed information on how Enstore uses DESY's PNFS is available
from the Enstore Technical Document. Installation procedures and
command details are covered there.
- The root of the PNFS directory tree is /diska/pnfs. This
directory is on a raid level-1 (software mirroring) disk.
- A backup is made once an hour and copied to d0ensrv3.
- Rsync is run once every 5 minutes and a complete copy of the
directory tree is made to a 2nd (independent) raid disk on
d0ensrv1.
- Two files, pnfsObserver and pnfsSetup, owned and readable by root,
must be in /usr/etc for the automated pnfs scripts to work.
- You can gain access to the PNFS admin commands by logging into
d0ensrv1 as root and typing "setup pnfs" and "mount /pnfs/fs".
Typical commands are:
  - Stopping manually: pnfs.server stop
  - Starting manually: pnfs.server start
  - Checking the export list: pmount show host <nodename>
  - Adding a host, and ONLY AFTER PERMISSION FROM LEE LUEKING
or his superiors: pmount add <nodename> <mountpoint>
- A remote node needs an entry in its fstab similar to
"d0ensrv1:/sam-ait /pnfs/sam/ait nfs user,intr,bg,hard,rw,noac 0 0"
in order to be able to mount pnfs (see the mount sketch at the end
of this section).
- It is also possible for nodes to automount pnfs. We do not
provide any support whatsoever for automount problems.
- The PNFS wormhole is in /pnfs/fs/admin/etc/config/flags/
- The PNFS exports list is in /pnfs/fs/admin/etc/exports/
- The PNFS user tree starts at /pnfs/fs/usr/.
- PNFS metadata can be examined by logging into d0ensrv1 and typing
"setup enstore" and then "enstore pnfs cat filename 4" (the 4 refers
to the PNFS layer in which Enstore keeps its metadata).
- If there is a crash and the raid disks are not shut down properly,
it can take an extended time to fix them on the next reboot. If this
happens, PNFS may not start correctly and may have to be stopped and
restarted. Sometimes it is better to reboot and start again after the
disk fixes have completed.
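The fstab line above translates into the following manual mount on a
remote node, shown here as a sketch (the "user" option is fstab-only
and is dropped; the mount point must already exist):

    mkdir -p /pnfs/sam/ait
    mount -t nfs -o intr,bg,hard,rw,noac d0ensrv1:/sam-ait /pnfs/sam/ait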
Cron jobs executing on d0ensrv4:
- user root - plog. If this fails, either you are out of disk space
on /diska for the pnfs logs or something is not configured correctly.
(This cron job runs in the pnfs framework, not the Enstore one.)
- user root - pnfsFastBackup. If this fails, either you are out of
disk space on /diska for the backups, the backup node d0ensrv3 is
down or out of disk space, or the network connection to the backup
node is down. (This cron job runs in the pnfs framework, not the
Enstore one.)
- user root - delfile. If this fails, there are most likely
problems with the file clerk.
- user root - rsync. If this fails, there are problems with the
raid disks, /diska or /diskb.
- user enstore - backup. If this fails, either you are out of disk
space on /diskb for the backups, the backup node d0ensrv3 is down
or out of disk space, or the network connection to the backup node
is down.
- user enstore - checkfile. This doesn't work right now.
- user enstore - checkvolume. This doesn't work right now.
- user root - tarit. If this fails, the most likely reason is that
rip8 (the backup node for the IDE system disk) or the network
connection to rip8 is down.
Raid Software
- Mirroring has been disabled in the standard Linux kernel. To
re-enable it, a patch from
ftp://ftp.kernel.org/pub/linux/daemons/raid/alpha/raid0145-19990824-2.2.11.patch
was applied to the 2.2.13 Linux kernel. This went smoothly.
- Up-to-date raid tools, not available via the standard
distribution, also had to be installed to make mirroring work
properly. These raidtools were obtained from the same place as the
raid patch:
ftp://ftp.kernel.org/pub/linux/daemons/raid/alpha/raidtools-19990824-0.90.tar.gz
- The raid devices are declared in /etc/raidtab (not the older
/etc/mdtab). There are 2 raid disks on d0ensrv1; each one is composed
of two 9 GB SCSI disks mirrored to form one 9 GB raid disk. There
are no spare disks in the configuration. (A sketch of a raidtab
stanza appears at the end of this section.)
- You can check the Raid Status Page to look for errors. This page
is updated every few minutes. You can also log into d0ensrv1 and
type "raidinfo" to get current values. Pay particular attention to
the first few lines, which show each disk's current state.
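For reference, a sketch of what one mirrored device's stanza in
/etc/raidtab might look like under raidtools 0.90 (the md and component
device names are illustrative, not taken from d0ensrv1):

    raiddev /dev/md0
        raid-level            1
        nr-raid-disks         2
        nr-spare-disks        0      # no spare disks in the configuration
        chunk-size            4      # unused for mirroring, but expected by the tools
        persistent-superblock 1
        device                /dev/sda1
        raid-disk             0
        device                /dev/sdb1
        raid-disk             1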
Bootup
- Raidstart is invoked.
- Each raid disk has e2fsck run on it to correct any errors.
- The pnfs servers are started.
- The Enstore servers are started.
- FTT makes the /dev/rmt SCSI devices.
- The BMC watchdog is armed and deadman is started at real-time
priority.
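A condensed sketch of that sequence as shell steps (device names are
illustrative, and "enstore start" is assumed as the counterpart of the
"enstore stop" used earlier; in practice the init scripts do this work):

    raidstart -a                     # assemble all devices declared in /etc/raidtab
    e2fsck -p /dev/md0               # correct any errors on each raid disk
    e2fsck -p /dev/md1
    pnfs.server start                # start the pnfs servers
    enstore start                    # start the Enstore servers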
Some Details: