For each CE to monitor, run
check_arcce -H <HOST> submit
This should be run at a relatively low frequency in order to let one job finish before the next is submitted. The probe keeps track of submitted jobs, and will hold the next submission if necessary. Subsequent sections describe additional options for testing data-staging, running custom scripts, etc.
On a more regular basis, every 5 minutes or so, run
check_arcce -H <NAGIOS-HOST> monitor
which checks the status of all submitted jobs for every host and submits each result passively to the service matching the host name and the service description “ARCCE Job Termination”. The passive service name can be configured.
Finally, a probe is provided to tidy the ARC job list after unsuccessful attempts by check_arcce monitor to clean jobs. This is also set up as a single service, and only needs to run occasionally, for example once a day:
check_arcce -H <NAGIOS-HOST> clean
For additional options, see
check_arcce --help
check_arcce submit --help
check_arcce monitor --help
check_arcce clean --help
The main configuration section for this probe is arcce; see Configuration Files. This probe requires an X509 proxy; see Proxy Certificate.
Connection URLs for job submission (the --ce option) may be specified in the section arcce.connection_urls.
Example:
[arcce]
voms = ops
user_cert = /etc/nagios/globus/robot-cert.pem
user_key = /etc/nagios/globus/robot-key.pem
loglevel = DEBUG
[arcce.connection_urls]
arc1.example.org = ARC1:https://arc1.example.org:443/ce-service
arc0.example.org = ARC0:arc0.example.org:2135/nordugrid-cluster-name=arc0.example.org,Mds-Vo-name=local,o=grid
The user_key and user_cert options may be better placed in the common gridproxy section.
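The layout of the example above can be checked with a short script before deploying it. The following is a minimal sketch, assuming the plugin configuration follows standard INI conventions as read by Python's configparser; the helper name connection_urls is hypothetical and not part of check_arcce. Only "=" is treated as a delimiter so that the colons inside the URLs are left alone.

```python
import configparser

# Example configuration copied from the text above.
EXAMPLE = """\
[arcce]
voms = ops
user_cert = /etc/nagios/globus/robot-cert.pem
user_key = /etc/nagios/globus/robot-key.pem
loglevel = DEBUG
[arcce.connection_urls]
arc1.example.org = ARC1:https://arc1.example.org:443/ce-service
arc0.example.org = ARC0:arc0.example.org:2135/nordugrid-cluster-name=arc0.example.org,Mds-Vo-name=local,o=grid
"""


def connection_urls(text):
    """Return a {host: url} mapping from the arcce.connection_urls section.

    Hypothetical helper for illustration; interpolation is disabled so that
    "%" characters in other sections would not be interpreted.
    """
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    if not parser.has_section("arcce.connection_urls"):
        return {}
    return dict(parser.items("arcce.connection_urls"))


urls = connection_urls(EXAMPLE)
for host, url in sorted(urls.items()):
    print(host, "->", url)
```

Each key is a host name as passed to -H, and each value is the connection URL that would otherwise be given with --ce.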
You will need command definitions for monitoring, cleaning, and submission:
define command {
command_name check_arcce_monitor
command_line $USER1$/check_arcce -H $HOSTNAME$ monitor
}
define command {
command_name check_arcce_clean
command_line $USER1$/check_arcce -H $HOSTNAME$ clean
}
define command {
command_name check_arcce_submit
command_line $USER1$/check_arcce -H $HOSTNAME$ submit \
[--test <test_name> ...]
}
For monitoring and cleaning, add one service each, like
define service {
use monitoring-service
host_name localhost
service_description ARCCE Monitoring
check_command check_arcce_monitor
}
define service {
use monitoring-service
host_name localhost
service_description ARCCE Cleaner
check_command check_arcce_clean
normal_check_interval 1440
retry_check_interval 120
}
For each host, add something like
define service {
use submission-service
host_name arc0.example.org
service_description ARCCE Job Submission
check_command check_arcce_submit
}
define service {
use passive-service
host_name arc0.example.org
service_description ARCCE Job Termination
check_command check_passive
}
The --test <test_name> option enables tests to run in addition to a plain job submission. It can be given several times. The tests are specified in individual sections of the configuration files as described below. Such a test may optionally submit its results to a named passive service instead of the above termination service. To do so, add the Nagios configuration for the service and duplicate the “service_description” in the section defining the test.
See the arcce-example.cfg for a more complete Nagios configuration.
By default, running jobs are tracked on a per-host basis. To define multiple job submission services for the same host, pass to --job-tag a tag which identifies the service uniquely on this host. Remember to also add a passive service and pass the corresponding --termination-service option.
The scheme for configuring an auxiliary submission/termination service is:
define command {
command_name check_arcce_submit_<test_name>
command_line $USER1$/check_arcce -H $HOSTNAME$ submit \
--job-tag <test_name> \
--termination-service 'ARCCE Job Termination for <Test-Description>' \
[--test <test1> ...]
}
define service {
use submission-service
host_name arc0.example.org
service_description ARCCE Job Submission for <Test-Description>
check_command check_arcce_submit_<test_name>
}
define service {
use passive-service
host_name arc0.example.org
service_description ARCCE Job Termination for <Test-Description>
check_command check_passive
}
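When several auxiliary services are needed, the scheme above can be filled in mechanically. The following is a minimal sketch of that idea; the function auxiliary_service and its parameters are hypothetical and not shipped with the probe. It emits the command on a single line rather than using backslash continuations, and it reuses one string for both --termination-service and the passive service description, since the two must match.

```python
def auxiliary_service(host, tag, description, tests=()):
    """Render Nagios definitions for one auxiliary submission/termination pair.

    Hypothetical helper: `tag` goes to --job-tag, `description` replaces
    <Test-Description>, and `tests` become --test options.
    """
    termination = "ARCCE Job Termination for " + description
    args = ["--job-tag " + tag, "--termination-service '%s'" % termination]
    args += ["--test " + name for name in tests]
    command_line = " ".join(["$USER1$/check_arcce -H $HOSTNAME$ submit"] + args)
    return "\n".join([
        "define command {",
        "    command_name check_arcce_submit_" + tag,
        "    command_line " + command_line,
        "}",
        "define service {",
        "    use submission-service",
        "    host_name " + host,
        "    service_description ARCCE Job Submission for " + description,
        "    check_command check_arcce_submit_" + tag,
        "}",
        "define service {",
        "    use passive-service",
        "    host_name " + host,
        # Must equal the value passed to --termination-service above.
        "    service_description " + termination,
        "    check_command check_passive",
        "}",
    ])


rendered = auxiliary_service("arc0.example.org", "staging", "Staging", ["staging"])
print(rendered)
```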
It is possible to add custom commands to the job scripts and do a regular expression match on the output. E.g. to test that Python is installed and report the version, add the following section to the plugin configuration file:
[arcce.python]
jobplugin = scripted
required_programs = python
script_line = python -V >python.out 2>&1
output_file = python.out
output_pattern = Python\s+(?P<version>\S+)
status_ok = Found Python version %(version)s.
status_critical = Python version not found in output.
service_description = ARCCE Python version
The options are
See arcnagios.ini for more examples.
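To see how output_pattern and the status messages in the Python example fit together, the matching can be tried outside the probe. This is a minimal sketch, assuming the pattern is applied with Python's re.search and that named groups are substituted into the status message with %-formatting; the function evaluate is hypothetical and the probe's actual matching logic may differ in detail.

```python
import re

# Values copied from the [arcce.python] section above.
OUTPUT_PATTERN = r"Python\s+(?P<version>\S+)"
STATUS_OK = "Found Python version %(version)s."
STATUS_CRITICAL = "Python version not found in output."


def evaluate(output):
    """Return (ok, message) for the captured contents of python.out.

    Assumption: a search anywhere in the output counts as a match.
    """
    match = re.search(OUTPUT_PATTERN, output)
    if match is None:
        return False, STATUS_CRITICAL
    # Named groups from the pattern fill the %(name)s placeholders.
    return True, STATUS_OK % match.groupdict()


print(evaluate("Python 3.9.2\n"))
print(evaluate("sh: python: command not found\n"))
```

The named group version in the pattern is what makes %(version)s available in status_ok; a group name used in the message but absent from the pattern would fail to substitute.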
The “staging” job plug-in checks that file staging works in connection with job submission. It is enabled with --test <test-name> where the plugin configuration file contains a corresponding section:
[arcce.<test-name>]
jobplugin = staging
staged_inputs = <URL> ... <URL>
staged_outputs = <URL> ... <URL>
service_description = <TARGET-FOR-PASSIVE-RESULT>
Note that the URLs are space-separated. They can be placed on separate indented lines. Within the URLs, the following substitutions may be useful:
If a staging check fails, the whole job will fail, so its status cannot be submitted to an individual passive service as with scripted checks. For this reason, it may be preferable to create one or more individual submission services dedicated to file staging. Remember to pass unique names to --job-tag to isolate them.
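As noted for the staging section, the staged_inputs and staged_outputs values are space-separated and may continue on indented lines. The following is a minimal sketch of how such a value splits into individual URLs, assuming standard INI continuation rules as implemented by Python's configparser. The section name stage-demo, the URLs, and the service description are made up for illustration.

```python
import configparser

# Hypothetical staging section; names and URLs are placeholders.
EXAMPLE = """\
[arcce.stage-demo]
jobplugin = staging
staged_inputs = https://data.example.org/in/a.dat
    https://data.example.org/in/b.dat
staged_outputs = gsiftp://se.example.org/out/result.dat
service_description = ARCCE Staging Demo
"""

# Only "=" is a delimiter, so colons inside URLs are not misread.
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.read_string(EXAMPLE)
section = parser["arcce.stage-demo"]

# Indented continuation lines are joined with newlines; split() treats
# spaces and newlines alike, yielding one entry per URL.
inputs = section["staged_inputs"].split()
outputs = section["staged_outputs"].split()

print(inputs)
print(outputs)
```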
If the generated job scripts and job descriptions are not sufficient, you can provide hand-written ones by passing the --job-description option to the submit subcommand. This option is incompatible with --test.
Currently no substitutions are done in the job description file, other than what may be provided by ARC.