NAME
sge_pe - Sun Grid Engine parallel environment configuration
file format
DESCRIPTION
Parallel environments are parallel programming and runtime
environments allowing for the execution of shared memory or
distributed memory parallelized applications. Parallel
environments usually require some kind of setup to be
operational before starting parallel applications. Examples of
common parallel environments are shared memory parallel
operating systems and the distributed memory environments
Parallel Virtual Machine (PVM) or Message Passing Interface
(MPI).
sge_pe allows for the definition of interfaces to arbitrary
parallel environments. Once a parallel environment is
defined or modified with the -ap or -mp options to qconf(1)
and linked with one or more queues via pe_list in
queue_conf(5), the environment can be requested for a job via
the -pe switch to qsub(1), together with a range for the
number of parallel processes to be allocated to the job.
Additional -l options may be used to specify the job
requirements in further detail.
Note that Sun Grid Engine allows backslashes (\) to be used to
escape newline (\newline) characters. The backslash and the
newline are replaced with a space (" ") character before any
interpretation.
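The continuation rule above can be illustrated with a minimal Python sketch. It is not part of Sun Grid Engine; the function names and the sample parameter values are assumptions made up for this example, and the parser only splits each logical line into a parameter name and an uninterpreted value string.

```python
def join_continuations(text):
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


def parse_pe_config(text):
    """Split each logical line into a parameter name and its value string."""
    config = {}
    for line in join_continuations(text).splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        config[name] = value
    return config


# Hypothetical file contents; the names and values are made up.
SAMPLE = (
    "pe_name      mpi_example\n"
    "slots        64\n"
    "user_lists   dev_users, \\\n"
    "             qa_users\n"
    "xuser_lists  NONE\n"
)

config = parse_pe_config(SAMPLE)
# The continued user_lists entry is now one logical value.
print([name.strip() for name in config["user_lists"].split(",")])
```

Joining the lines before any other processing mirrors the order stated above: the replacement happens before any interpretation of the parameters.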
FORMAT
The format of a sge_pe file is defined as follows:
pe_name
The name of the parallel environment as defined for pe_name
in sge_types(1). To be used in the qsub(1) -pe switch.
slots
The total number of parallel processes allowed to run
concurrently under the parallel environment. Type is
number; valid values are 0 to 9999999.
user_lists
A comma-separated list of user access list names (see
access_list(5)). Each user contained in at least one of the
listed access lists has access to the parallel environment.
If the user_lists parameter is set to NONE (the default), any
user who is not explicitly excluded via the xuser_lists
parameter described below has access. If a user is contained
both in an access list named in xuser_lists and in one named
in user_lists, the user is denied access to the parallel
environment.
xuser_lists
The xuser_lists parameter contains a comma-separated list of
user access lists as described in access_list(5). Each user
contained in at least one of the listed access lists is not
allowed to access the parallel environment. If the
xuser_lists parameter is set to NONE (the default), any user
has access, subject to the user_lists parameter described
above. If a user is contained both in an access list named in
xuser_lists and in one named in user_lists, the user is
denied access to the parallel environment.
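The combined effect of user_lists and xuser_lists can be summarized in a short Python sketch. This is only a model of the rules described above, not Sun Grid Engine code; the function name, the use of an empty list to represent NONE, and the dictionary of access lists are assumptions for illustration.

```python
def pe_access_allowed(user, user_lists, xuser_lists, acls):
    """Return True if `user` may use the parallel environment.

    user_lists, xuser_lists: lists of access list names; an empty
        list stands for the value NONE (an assumption of this sketch).
    acls: dict mapping an access list name to a set of user names
        (a hypothetical stand-in for access_list(5) objects).
    """
    def in_any(list_names):
        return any(user in acls.get(name, set()) for name in list_names)

    # Exclusion wins, even if the user also appears in user_lists.
    if in_any(xuser_lists):
        return False
    # user_lists set to NONE: everyone not excluded has access.
    if not user_lists:
        return True
    return in_any(user_lists)


# Made-up access lists for the example.
acls = {"dev_users": {"alice", "bob"}, "blocked": {"bob"}}
print(pe_access_allowed("alice", ["dev_users"], ["blocked"], acls))
print(pe_access_allowed("bob", ["dev_users"], ["blocked"], acls))
```

Checking xuser_lists first keeps the "denied if in both" rule explicit.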
start_proc_args
The invocation command line of a start-up procedure for the
parallel environment. The start-up procedure is invoked by
sge_shepherd(8) prior to executing the job script. Its
purpose is to set up the parallel environment according to
its needs. An optional prefix "user@" specifies the user
under which this procedure is to be started. The standard
output of the start-up procedure is redirected to the file
REQNAME.poJID in the job's working directory (see qsub(1)),
with REQNAME being the name of the job as displayed by
qstat(1) and JID being the job's identification number.
Likewise, the standard error output is redirected to
REQNAME.peJID.
The following special variables, which are expanded at
runtime, can be used (besides any other strings which have to
be interpreted by the start and stop procedures) to constitute
a command line:
$pe_hostfile
The pathname of a file containing a detailed description
of the layout of the parallel environment to be set up
by the start-up procedure. Each line of the file
refers to a host on which parallel processes are to be
run. The first entry of each line denotes the hostname,
the second entry the number of parallel processes to be
run on the host, the third entry the name of the queue,
and the fourth entry a processor range to be used in
case of a multiprocessor machine.
$host
The name of the host on which the start-up or stop
procedures are started.
$job_owner
The user name of the job owner.
$job_id
Sun Grid Engine's unique job identification number.
$job_name
The name of the job.
$pe
The name of the parallel environment in use.
$pe_slots
Number of slots granted for the job.
$processors
The processors string as contained in the queue
configuration (see queue_conf(5)) of the master queue (the
queue in which the start-up and stop procedures are
started).
$queue
The cluster queue of the master queue instance.
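A start-up procedure typically reads the file named by $pe_hostfile. The following Python sketch parses the four entries per line described above. It assumes the entries are separated by whitespace; the host names, queue names and processor range values in the sample are made up, and the function and type names are hypothetical.

```python
from typing import NamedTuple


class HostEntry(NamedTuple):
    hostname: str          # first entry
    slots: int             # second entry: processes to run on the host
    queue: str             # third entry
    processor_range: str   # fourth entry, kept as an uninterpreted string


def parse_pe_hostfile(text):
    """Parse hostfile contents into a list of HostEntry tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError("expected 4 entries, got: %r" % line)
        hostname, slots, queue, processor_range = fields
        entries.append(HostEntry(hostname, int(slots), queue, processor_range))
    return entries


# Made-up sample contents for illustration only.
sample = "node01 4 example.q 0-3\nnode02 2 example.q 0-1\n"
hosts = parse_pe_hostfile(sample)
print(sum(entry.slots for entry in hosts))
```

In a real start-up procedure the text would be read from the path passed on the command line in place of $pe_hostfile.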
stop_proc_args
The invocation command line of a shutdown procedure for the
parallel environment. The shutdown procedure is invoked by
sge_shepherd(8) after the job script has finished. Its
purpose is to stop the parallel environment and to remove it
from all participating systems. An optional prefix "user@"
specifies the user under which this procedure is to be
started. The standard output of the stop procedure is also
redirected to the file REQNAME.poJID in the job's working
directory (see qsub(1)), with REQNAME being the name of the
job as displayed by qstat(1) and JID being the job's
identification number. Likewise, the standard error output is
redirected to REQNAME.peJID.
The same special variables as for start_proc_args can be
used to constitute a command line.
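The effect of the variable expansion on a start_proc_args or stop_proc_args command line can be pictured with Python's string.Template, which uses the same $name notation. This is only an illustration of the substitution, not the mechanism used by sge_shepherd(8); the script path and all values below are hypothetical.

```python
from string import Template


def expand_proc_args(command_line, values):
    """Substitute $name placeholders; unknown placeholders are left as-is."""
    return Template(command_line).safe_substitute(values)


# Hypothetical values for one job.
values = {
    "pe_hostfile": "/tmp/example/pe_hostfile",
    "host": "node01",
    "job_owner": "alice",
    "job_id": 4711,
    "job_name": "example_job",
    "pe": "mpi_example",
    "pe_slots": 8,
}

command = "/opt/example/startpe.sh $pe_hostfile $pe $pe_slots"
print(expand_proc_args(command, values))
```

Note that $pe, $pe_slots and $pe_hostfile are distinct variables; the sketch relies on Template matching the longest identifier after the dollar sign.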
allocation_rule
The allocation rule is interpreted by the scheduler thread
and helps the scheduler to decide how to distribute parallel
processes among the available machines. If, for instance, a
parallel environment is built for shared memory applications
only, all parallel processes have to be assigned to a single
machine, no matter how many suitable machines are available.
If, however, the parallel environment follows the distributed
memory paradigm, an even distribution of processes
among machines may be favorable.
The current version of the scheduler only understands the
following allocation rules:
<int>: An integer number fixing the number of processes
per host. If the number is 1, all processes have
to reside on different hosts. If the special
denominator $pe_slots is used, the full range of
processes as specified with the qsub(1) -pe switch
has to be allocated on a single host (no matter
which value belonging to the range is finally
chosen for the job to be allocated).
$fill_up: Starting from the best suitable host/queue, all
available slots are allocated. Further hosts and
queues are "filled up" as long as a job still
requires slots for parallel tasks.
$round_robin:
A single slot is allocated on each suitable host in turn
until all tasks requested by the parallel job are
dispatched. If more tasks are requested than suitable
hosts are found, allocation starts again from
the first host. The allocation scheme walks
through suitable hosts in a best-suitable-first
order.
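The allocation rules above can be compared with a simplified Python model. It is not the scheduler's implementation. It assumes that the host list is already sorted best-suitable-first, that each host is given with its number of free slots, that the slot count has already been picked from the requested range, and that a fixed per-host count requires the request to be a multiple of that count. All names are hypothetical.

```python
def allocate(rule, requested, hosts):
    """Return {hostname: slots} or None if the request cannot be placed.

    rule: "$pe_slots", "$fill_up", "$round_robin" or an integer string.
    hosts: list of (hostname, free_slots), best suitable host first.
    """
    if requested <= 0:
        raise ValueError("requested must be positive")
    alloc = {}
    remaining = requested
    if rule == "$pe_slots":
        # Everything on one host.
        for name, free in hosts:
            if free >= requested:
                return {name: requested}
        return None
    if rule == "$fill_up":
        # Take all free slots of a host before moving to the next one.
        for name, free in hosts:
            take = min(free, remaining)
            if take > 0:
                alloc[name] = take
                remaining -= take
            if remaining == 0:
                return alloc
        return None
    if rule == "$round_robin":
        # One slot per host per pass, restarting from the first host.
        free = dict(hosts)
        while remaining > 0:
            progressed = False
            for name, _ in hosts:
                if remaining == 0:
                    break
                if free[name] > 0:
                    free[name] -= 1
                    alloc[name] = alloc.get(name, 0) + 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                return None  # no free slots left anywhere
        return alloc
    # <int>: a fixed number of processes per host.
    per_host = int(rule)
    if per_host <= 0 or requested % per_host != 0:
        return None
    for name, free in hosts:
        if remaining == 0:
            break
        if free >= per_host:
            alloc[name] = per_host
            remaining -= per_host
    return alloc if remaining == 0 else None


hosts = [("a", 4), ("b", 2), ("c", 4)]
print(allocate("$fill_up", 6, hosts))
print(allocate("$round_robin", 6, hosts))
```

With the made-up host list shown, $fill_up concentrates the job on the first hosts while $round_robin spreads it evenly.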
control_slaves
This parameter can be set to TRUE or FALSE (the default). It
indicates whether Sun Grid Engine is the creator of the
slave tasks of a parallel application via sge_execd(8) and
sge_shepherd(8) and thus has full control over all processes
in a parallel application, which enables capabilities such
as resource limitation and correct accounting. However, to
gain control over the slave tasks of a parallel application,
a sophisticated PE interface is required, which works
closely together with Sun Grid Engine facilities. Such PE
interfaces are available through your local Sun Grid Engine
support office.
Set the control_slaves parameter to FALSE for all
other PE interfaces.
job_is_first_task
The job_is_first_task parameter can be set to TRUE or FALSE.
A value of TRUE indicates that the Sun Grid Engine job
script already contains one of the tasks of the parallel
application (the number of slots reserved for the job is the
number of slots requested with the -pe switch), while a
value of FALSE indicates that the job script (and its child
processes) is not part of the parallel program (the number
of slots reserved for the job is the number of slots
requested with the -pe switch + 1).
If wallclock accounting is used (execd_params
ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE set to
TRUE) and control_slaves is set to FALSE, the
job_is_first_task parameter influences the accounting for
the job: a value of TRUE means that accounting for cpu and
requested memory is multiplied by the number of slots
requested with the -pe switch. If job_is_first_task is set
to FALSE, the accounting information is multiplied by the
number of slots + 1.
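The slot arithmetic described for job_is_first_task amounts to the following small Python sketch. The function name is hypothetical, and the sketch only restates the rule given above.

```python
def slot_factor(requested_slots, job_is_first_task):
    """Number of slots reserved for the job.

    Per the description above, the same number is the factor applied
    to reserved usage when wallclock accounting is used and
    control_slaves is FALSE.
    """
    if job_is_first_task:
        # The job script itself is one of the parallel tasks.
        return requested_slots
    # The job script is not part of the parallel program.
    return requested_slots + 1


print(slot_factor(8, True))
print(slot_factor(8, False))
```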
urgency_slots
For pending jobs with a slot range PE request, the number of
slots is not yet determined. This setting specifies the method
to be used by Sun Grid Engine to assess the number of slots
such jobs might finally get.
The assumed slot allocation has a meaning when determining
the resource-request-based priority contribution for numeric
resources as described in sge_priority(5) and is displayed
when qstat(1) is run without the -g t option.
The following methods are supported:
<int>: The specified integer number is directly used as
prospective slot amount.
min: The slot range minimum is used as prospective slot
amount. If no lower bound is specified with the
range, 1 is assumed.
max: The slot range maximum is used as prospective
slot amount. If no upper bound is specified
with the range, the absolute maximum possible due
to the PE's slots setting is assumed.
avg: The average of all numbers occurring within the
job's PE range request is assumed.
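The four methods can be sketched in Python as follows. The representation of the range request as a list of (low, high) pairs is an assumption of this example, as is the treatment of missing bounds for avg, which the text above does not specify; here they are filled in the same way as for min and max. No rounding is applied to the average.

```python
def urgency_slot_estimate(method, ranges, pe_slots):
    """Estimate the slot amount of a pending job with a range request.

    method: "min", "max", "avg" or an integer string.
    ranges: list of (low, high) pairs; None marks a missing bound.
    pe_slots: the slots setting of the parallel environment.
    """
    # Missing lower bound -> 1, missing upper bound -> PE slots.
    bounded = [
        (1 if low is None else low, pe_slots if high is None else high)
        for low, high in ranges
    ]
    if method == "min":
        return min(low for low, _ in bounded)
    if method == "max":
        return max(high for _, high in bounded)
    if method == "avg":
        # Average of all numbers occurring within the ranges.
        numbers = [n for low, high in bounded for n in range(low, high + 1)]
        return sum(numbers) / len(numbers)
    return int(method)  # <int>: used directly


print(urgency_slot_estimate("min", [(4, 8)], 64))
print(urgency_slot_estimate("avg", [(4, 8)], 64))
```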
accounting_summary
This parameter is only checked if control_slaves (see above)
is set to TRUE and thus Sun Grid Engine is the creator of
the slave tasks of a parallel application via sge_execd(8)
and sge_shepherd(8). In this case, accounting information
is available for every single slave task started by Sun Grid
Engine.
The accounting_summary parameter can be set to TRUE or
FALSE. A value of TRUE indicates that only a single
accounting record is written to the accounting(5) file,
containing the accounting summary of the whole job including
all slave tasks, while a value of FALSE indicates that an
individual accounting(5) record is written for every slave
task, as well as for the master task.
Note: When running tightly integrated jobs with
SHARETREE_RESERVED_USAGE set and with
accounting_summary enabled in the parallel environment,
reserved usage will only be reported by the master task of
the parallel job. No per parallel task usage records will
be sent from execd to qmaster, which can significantly
reduce load on qmaster when running large tightly integrated
parallel jobs.
RESTRICTIONS
Note that the functionality of the start-up, shutdown and
signaling procedures remains the full responsibility of the
administrator configuring the parallel environment. Sun
Grid Engine will just invoke these procedures and evaluate
their exit status. If the procedures do not perform their
tasks properly or if the parallel environment or the
parallel application behave unexpectedly, Sun Grid Engine has
no means to detect this.
SEE ALSO
sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1),
qsub(1), access_list(5), sge_qmaster(8), sge_shepherd(8).
COPYRIGHT
See sge_intro(1) for a full statement of rights and
permissions.