D0EnSrv2
$Revision: 1.6 $
$Date: 1999/11/19 16:32:37 $GMT
D0EnSrv2 is the node that runs the "base" Enstore servers, namely, the
Configuration Server, the Logger, the Alarm Server, and the
Inquisitor. It is also the central D0En Patrol monitoring node as
well as the primary web server.
Configuration Server
- Detailed Information on the Enstore Configuration Server is
available from the Enstore Technical Document. Command details are
covered there.
- Complete Configuration Information is available on the web. You
can also see it directly by
- Logging in as enstore to one of the console servers
- Typing "setup enstore"
- Typing "enstore config --show"
- New configurations (Be careful! You need 20 years' experience
before you do this.) can be loaded by
- Logging in as enstore to one of the console servers
- Typing "setup enstore"
- Typing "python <new_config_file>" and verifying that there are
no errors
- Typing "enstore config --load --config_file=<new_config_file>"
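The verify-before-load step above can be approximated with a small pre-flight check. This is only a sketch under stated assumptions: it assumes the configuration file is Python source (as the "python <new_config_file>" step suggests), and it checks syntax only, without executing the file. The names check_config_source and check_config_file are hypothetical helpers, not part of Enstore.

```python
def check_config_source(source, filename="<new_config_file>"):
    """Return None if the text parses as Python, else a short error message.

    Hypothetical helper, not part of Enstore. The text is compiled but
    never executed, so errors that only show up at run time are missed.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s: line %s: %s" % (filename, err.lineno, err.msg)
    return None


def check_config_file(path):
    """Read a candidate configuration file and syntax-check its contents."""
    with open(path, "r") as handle:
        return check_config_source(handle.read(), path)
```

A result of None means only that the file parses. It says nothing about whether the configuration is correct, so the caution above still applies before running "enstore config --load".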
Logger
- Detailed Information on the Enstore Logger is available from the
Enstore Technical Document. Command details are covered there.
- The log information ultimately goes into a file named
LOG-YYYY-MM-DD. A new log file is opened every day at midnight. The
logs are backed up to d0ensrv3 every 15 minutes.
- The Enstore Logs are available on the web. They are current when you
load the page, but you need to hit the "reload" button to keep them
current.
- One nice feature about the web interface is that you can
search a subset of log files.
- You can also check the logs directly by looking at the log
files in d0ensrv2:/diska/enstore-log. An easy way to see what
is currently going on in the Enstore system is to run
"tail -f <log_file>".
- Logs are backed up to d0ensrv3 every 15 minutes.
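The naming rule above makes it easy to work out which file to follow. The sketch below builds the log path from a date, using the LOG-YYYY-MM-DD pattern and the /diska/enstore-log directory given in the text. The function names are hypothetical, and it assumes the midnight rollover follows the node's local date, which the text does not state.

```python
import datetime
import posixpath

LOG_DIR = "/diska/enstore-log"  # log directory on d0ensrv2, from the text


def log_file_path(day, log_dir=LOG_DIR):
    """Return the path of the log file for the given date (LOG-YYYY-MM-DD)."""
    return posixpath.join(log_dir, "LOG-" + day.strftime("%Y-%m-%d"))


def todays_log_path(log_dir=LOG_DIR):
    """Path of the log being written now (assumes local-date rollover)."""
    return log_file_path(datetime.date.today(), log_dir)
```

For example, log_file_path(datetime.date(1999, 11, 19)) gives /diska/enstore-log/LOG-1999-11-19, which can then be passed to "tail -f".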
Alarm Server
- Detailed Information on the Alarm Server is available from the
Enstore Technical Document. Command details are covered there.
- Current Alarms need to be checked and corrected by an
administrator. Enstore attempts error recovery whenever possible, so
any alarm that is raised needs human intervention to correct the
problem.
- There is more Information available on watching alarms.
Inquisitor
- Detailed Information on the Inquisitor server is available from
the Enstore Technical Document. Command details are covered there.
- The Inquisitor is the source for Information about what is
happening in Enstore. It queries and checks everything possible.
There is no reason for users to perform any other checks themselves.
- To reduce load and traffic, the Inquisitor performs its checks
periodically. The check intervals are tunable. Information on the web
pages may be several minutes old and may not represent exactly what is
happening.
- If the Inquisitor discovers that an Enstore component is not
responding to pings, it will try to restart it. If it can't, it will
generate an alarm.
- Several actions are performed on d0ensrv2 because the Inquisitor
runs there and needs local files to present the data:
- A Tape Inventory is made every night and stored in
/diska/tape-inventory.
- The AML/2 Log files are fetched every 15 minutes. The Inquisitor
organizes these and other log files for the user.
- A whole series of Miscellaneous Status commands are executed to
give the user more information about the d0en cluster.
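The ping, restart, then alarm behavior described for the Inquisitor can be illustrated with a short sketch. It is not the Inquisitor's actual code: ping, restart and raise_alarm are hypothetical caller-supplied functions standing in for interfaces this page does not describe. The second ping after a restart is also an assumption about what a successful restart means.

```python
def check_component(name, ping, restart, raise_alarm):
    """Run one monitoring pass for a single component.

    ping(name) and restart(name) return True on success;
    raise_alarm(name, message) records an alarm. All three are
    hypothetical stand-ins, not real Enstore calls.
    Returns "alive", "restarted" or "alarmed".
    """
    if ping(name):
        return "alive"
    # Component is silent: try a restart, then confirm it now answers.
    if restart(name) and ping(name):
        return "restarted"
    raise_alarm(name, "not responding and could not be restarted")
    return "alarmed"
```

Taking the three actions as arguments keeps the decision logic separate from how a component is actually pinged or restarted, which makes it easy to exercise with stubs.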
Patrol
Web Server
- The standard Apache web server, from KITS, is used to provide
Enstore's web pages.
- The web server must run on the same node as the Inquisitor.
- The web pages should be accessed by using the alias www-d0en and
not d0ensrv2. This allows us to change the web server node without
changing the web address.
- The Enstore pages are rooted at /local/ups/prd/www_pages/enstore,
the Patrol pages at /local/ups/prd/www_pages/patrol, and Apache's own
logs at /local/ups/prd/www_pages/logs and /var/adm/www/d0en.
Cron jobs
Executing on d0ensrv1:
- user enstore - log-stash. If this fails, ?
- user enstore - tape_inventory. If this fails, most likely the
file clerk and/or the volume clerk is not working.
- user enstore - inqPlotUpdate. If this fails, most likely the
Inquisitor is not working.
- user enstore - aml2logs. If this fails, most likely the AML/2
OS/2 node, adic2, or its network connection is not working.
- user enstore - rdist-log. If this fails, most likely the backup
node, d0ensrv3, or its network connection is not working.
- user patrol - patrol.job. If this fails, ?
- user root - monthlystats. This always fails and needs to be
deleted or fixed!
- user root - tarit. If this fails, the most likely reason is
that either rip8 (the backup node for the IDE system disk) or the
network connection to rip8 is down.
- user root - chkcrons. If this fails, the most likely reason is
that one of the d0en nodes, or its network connection, is down.
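The failure notes in the list above can be kept as a small lookup table for quick triage. The sketch below restates only what the list says. Jobs whose likely cause is left open in the list ("?") are stored as None, and the function name likely_cause is hypothetical.

```python
# Likely cause of failure per cron job, restated from the list above.
# None means the list gives no likely cause for that job.
LIKELY_CAUSE = {
    "log-stash": None,
    "tape_inventory": "file clerk and/or volume clerk not working",
    "inqPlotUpdate": "Inquisitor not working",
    "aml2logs": "AML/2 OS/2 node adic2, or its network connection, not working",
    "rdist-log": "backup node d0ensrv3, or its network connection, not working",
    "patrol.job": None,
    "monthlystats": "always fails; needs to be deleted or fixed",
    "tarit": "rip8, or the network connection to rip8, is down",
    "chkcrons": "one of the d0en nodes, or its network connection, is down",
}


def likely_cause(job):
    """Return the documented likely cause for a failed cron job."""
    if job not in LIKELY_CAUSE:
        return "unknown job"
    return LIKELY_CAUSE[job] or "no likely cause documented"
```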
Bootup
- The Enstore servers are started.
- FTT makes the /dev/rmt SCSI devices.
- The Apache web server is started.
- The BMC watchdog is armed and deadman is started at real-time
priority.
Some Details:
- Two Apache-related accounts, wsrvd0en and wadmd0en, are needed.