Roland Dittel
20 November 2006
In large enterprise clusters it is necessary to prevent users from consuming all available resources. In order to achieve this, N1GE6 supports complex attributes which can be configured on a global, queue or host layer. This feature is sufficient in certain cases, especially in small clusters, but has shortcomings and drawbacks for enterprise usage.
Customers have asked for a feature to enhance resource limits so that they apply to several kinds of resources, several kinds of resource consumers, to all jobs in the cluster and to combinations of consumers. In this context, "resources" are any defined complex attribute (see complex(5)) known by the Grid Engine configuration. For example this can be slots, arch, mem_total, num_proc, swap_total or any custom-defined resource like compiler_license. Resource consumers are (per) users, (per) queues, (per) hosts, (per) projects, (per) parallel environments. This specification describes a user interface to define such flexible resource limits.
This feature provides a way for administrators to limit the resources used at a single time by a consumer. However, it is not a way to define priorities by which users obtain a resource. Priorities can be defined with the Share Tree feature released with N1GE6.
The aim of this project is a solution that allows the utilization of built-in and user-defined resources to be managed in a more flexible manner. In particular, it is a means to limit resources on a per-user and a per-project basis. Similarly, resource limitations on the basis of user groups and project groups are also required.
The issues targeted by this project are:
Issue | Description |
---|---|
74 | Support maxujobs on a per host level |
1532 | Max jobs per user on a queue basis |
1644 | Per-user slot limits for limiting PE usage |
CR 6298406 | Hostgroups should be added as another configuration layer b/w global and host |
CR 6289250 | Request for Job limit per User of Queue |
The expectation is that the management of N1GE cluster resources will be possible in a much more targeted manner. The enhancement must make it easy to freely manage limits for arbitrary resources in relation to existing N1GE objects, such as project/user/host/queue/pe, without the burden of doing micro-management with countless projects/users/hosts/queues.
Suggestions for future enhancements are:
switch | Description |
---|---|
-aattr obj_nm attr_nm val obj_id_lst | add to a list attribute of an object |
-Aattr obj_nm fname obj_id_lst | add to a list attribute of an object |
-dattr obj_nm attr_nm val obj_id_lst | delete from a list attribute of an object |
-Dattr obj_nm fname obj_id_lst | delete from a list attribute of an object |
-mattr obj_nm attr_nm val obj_id_lst | modify an attribute (or element in a sublist) of an object |
-Mattr obj_nm fname obj_id_lst | modify an attribute (or element in a sublist) of an object |
-rattr obj_nm attr_nm val obj_id_lst | replace an attribute (or element in a sublist) of an object |
-Rattr obj_nm fname obj_id_lst | replace an attribute (or element in a sublist) of an object |

parameter | Description |
---|---|
obj_nm | rqs - resource quota set |
attr_nm | name, enabled, description or limit |
val | new value of attr_nm |
obj_id_lst | rule set, or rule within a rule set, to which the change applies |
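The obj_id_lst used with these switches addresses either a whole rule set or a single rule inside it, written as set/rule, where the rule part is either the rule's position or its name (compare ruleset_1/1 and the rule named arch_rule in the qconf examples). The following C sketch splits such an identifier. The type and function names are hypothetical, and treating an all-digit selector as a 1-based position is an assumption that relies on rule names having to start with a letter (NAME in the rule set grammar).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of "set", "set/1" or "set/rule_name". */
typedef struct {
   char set_name[64];   /* resource quota set name                  */
   char rule[64];       /* rule selector, empty for the whole set    */
   bool rule_is_index;  /* true if rule is a 1-based rule position   */
} rqs_obj_id;

/* Split "set[/rule]" into its parts. Returns false on malformed input. */
static bool parse_rqs_obj_id(const char *id, rqs_obj_id *out)
{
   assert(id != NULL && out != NULL);
   memset(out, 0, sizeof(*out));

   const char *slash = strchr(id, '/');
   size_t set_len = slash ? (size_t)(slash - id) : strlen(id);
   if (set_len == 0 || set_len >= sizeof(out->set_name)) {
      return false;
   }
   memcpy(out->set_name, id, set_len);

   if (slash == NULL) {
      return true;                       /* whole rule set selected */
   }
   const char *rule = slash + 1;
   size_t rule_len = strlen(rule);
   if (rule_len == 0 || rule_len >= sizeof(out->rule) || strchr(rule, '/') != NULL) {
      return false;
   }
   memcpy(out->rule, rule, rule_len);

   /* Rule names must start with a letter, so digits only means a position. */
   out->rule_is_index = true;
   for (size_t i = 0; i < rule_len; i++) {
      if (!isdigit((unsigned char)rule[i])) {
         out->rule_is_index = false;
         break;
      }
   }
   return true;
}
```

For example, "ruleset_1/1" yields the set ruleset_1 and rule position 1, while "ruleset_1/arch_rule" selects the rule by name.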
switch | description |
---|---|
-j job_identifier_list | show scheduler job information |
-u user_list | view only jobs of this user |
switch | Description |
---|---|
-arqs [name] | add resource quota set(s) |
-Arqs fname | add resource quota set(s) from file |
-mrqs [name] | modify resource quota set(s) |
-Mrqs fname [name] | modify resource quota set(s) from file |
-srqs [name_list] | show resource quota set(s) |
-srqsl | show resource quota set list |
-drqs [name_list] | delete resource quota set(s) |
switch | description |
---|---|
-help | print this help |
-h host_list | display only selected hosts |
-l resource_attributes | request the given resources |
-u user_list | display only selected users |
-pe pe_list | display only selected parallel environments |
-P project_list | display only selected projects |
-q wc_queue_list | display only selected queues |
-xml | display the information in XML format |
Qmon will be enhanced to allow the configuration of resource quota sets. The configuration will be edited in the same format as on the command line with an editor.
No diagnostic support will be provided by qmon.
Not affected
This specification is used by the documentation writers.
Installation will not change. In future releases the installation may change if the complex configuration for global/queue/host becomes obsolete.
At installation time, no default rule sets are created.
Does not change
According to customers and the filed RFEs, it must be possible to define a limit only for specific consumers such as users or projects, and only for specific providers such as hosts or queues. To achieve this, administrators must be able to define rules that consist of the limited resource and the limit value, plus the consumers or providers to which the rule applies. Because every rule can be expressed as a tuple of filter specifiers, we decided to implement the rule sets in the style of firewall rules.
In practice a rule is defined by:
The Resource Quota Rules are separate configuration objects and only used for scheduling decisions. They don't affect the overall cluster configuration like cluster queues, hosts or projects.
Deliberate restrictions in the first implementation step:
Resource quotas are an addition to the current global, host and queue instance based scheduling order. The old implementation remains valid and can be used without the new rules. The rules extend the old implementation by adding a new layer on top of the global layer, which allows more precise limits to be defined.
The implications of the layer order on resources are described in complex(5) under "Overriding attributes". In general, the layers are AND-combined: if one layer denies the job, the following layers are not evaluated. For example, a limit value of "slots=4" can be overridden in the global, host or queue layer if the layer value is more restrictive, e.g. "slots=2". The exception (see complex(5)) is boolean values: for example, "is_linux=true" defined in the tree cannot be overridden to "is_linux=false" in the global, host or queue definition.
```
resource quotas  -DENIED-> break
      | OK
      v
global           -DENIED-> break
      | OK
      v
host             -DENIED-> break
      | OK
      v
queue            -DENIED-> break
      | OK
      v
      OK
```
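The layer chain behaves like a short-circuit AND over the layers. The C sketch below is illustrative only: it reduces each layer to the free amount of a single resource, and the names layer_t and check_layers are hypothetical, not part of the Grid Engine sources.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluation order of the layers, as in the diagram above. */
typedef enum {
   LAYER_RESOURCE_QUOTAS = 0,
   LAYER_GLOBAL,
   LAYER_HOST,
   LAYER_QUEUE,
   LAYER_COUNT            /* also returned as "no layer denied" */
} layer_t;

/* free_amount[l] is the amount of one resource (for example slots) still
 * available in layer l for the queue instance being checked; a negative
 * value means that the layer does not limit the resource.
 * The first layer that cannot satisfy the request denies it, and the
 * remaining layers are not evaluated.
 * Returns the denying layer, or LAYER_COUNT if every layer agrees. */
static layer_t check_layers(const double free_amount[LAYER_COUNT], double request)
{
   assert(free_amount != NULL);
   for (int l = LAYER_RESOURCE_QUOTAS; l < LAYER_COUNT; l++) {
      if (free_amount[l] >= 0.0 && request > free_amount[l]) {
         return (layer_t)l;     /* DENIED -> break */
      }
   }
   return LAYER_COUNT;          /* OK */
}
```

Because every layer must agree, the effective limit is the most restrictive value across the layers, which matches the "slots=4" versus "slots=2" example.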
Resource reservation will work for resource quotas analogously to the current global/host/queue resource configuration. No changes are necessary on the client side.
```
ALL:                '*'
SEPARATOR:          ','
STRING:             [^\n]*
QUOTE:              '\"'
S_EXPANDER:         '{'
E_EXPANDER:         '}'
NOT:                '!'
BOOL:               [tT][rR][uU][eE] | 1 | [fF][aA][lL][sS][eE] | 0
NAME:               [a-zA-Z][a-zA-Z0-9_-]*
LISTVALUE:          ALL | [NOT]STRING
LIST:               LISTVALUE [SEPARATOR LISTVALUE]*
NOTSCOPE:           LIST | S_EXPANDER LIST E_EXPANDER
SCOPE:              ALL | STRING [SEPARATOR STRING]*
RESOURCEPAIR:       STRING=STRING
RESOURCE:           RESOURCEPAIR [SEPARATOR RESOURCEPAIR]*

rule:               "limit" ["name" NAME] ["users" NOTSCOPE] ["projects" SCOPE] ["pes" SCOPE] \
                    ["queues" SCOPE] ["hosts" NOTSCOPE] "to" RESOURCE NL
ruleset_attributes: ("name" NAME NL)
                    ("enabled" BOOL NL)?
                    ("description" QUOTE STRING QUOTE NL)?
ruleset:            "{" (ruleset_attributes) (rule)+ "}" NL
rulesets:           (ruleset)*
```
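A NOTSCOPE value carries three pieces of information: whether the list is expanded ({...}), the included names, and the names excluded with '!'. These correspond to the RQRF_expand, RQRF_scope and RQRF_xscope fields of the RQRF_Type list in the header at the end of this document. The C sketch below parses a NOTSCOPE string into a plain struct that mirrors those fields; the fixed array sizes and all type and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 16
#define MAX_NAME    64

/* Plain-C mirror of RQRF_expand / RQRF_scope / RQRF_xscope (sizes illustrative). */
typedef struct {
   bool expand;                         /* list was written as {...}   */
   int  n_scope, n_xscope;
   char scope[MAX_ENTRIES][MAX_NAME];   /* included names              */
   char xscope[MAX_ENTRIES][MAX_NAME];  /* names prefixed with '!'     */
} rqr_filter;

/* Parse a NOTSCOPE value such as "{@staff,!roland}" or "roland, andre".
 * Spaces around entries are skipped. Returns false on syntax errors. */
static bool parse_notscope(const char *text, rqr_filter *f)
{
   assert(text != NULL && f != NULL);
   memset(f, 0, sizeof(*f));

   size_t len = strlen(text);
   if (len > 0 && text[0] == '{') {
      if (len < 2 || text[len - 1] != '}') {
         return false;                  /* unbalanced expander */
      }
      f->expand = true;
      text++;                           /* skip '{'             */
      len -= 2;                         /* drop '{' and '}'     */
   }

   size_t pos = 0;
   while (pos <= len) {
      size_t end = pos;                 /* find end of this entry */
      while (end < len && text[end] != ',') {
         end++;
      }
      size_t s = pos, e = end;          /* trim surrounding spaces */
      while (s < e && text[s] == ' ') s++;
      while (e > s && text[e - 1] == ' ') e--;

      bool negated = (s < e && text[s] == '!');
      if (negated) {
         s++;
      }
      if (s == e || e - s >= MAX_NAME) {
         return false;                  /* empty or too long entry */
      }
      char (*list)[MAX_NAME] = negated ? f->xscope : f->scope;
      int *count = negated ? &f->n_xscope : &f->n_scope;
      if (*count == MAX_ENTRIES) {
         return false;
      }
      memcpy(list[*count], text + s, e - s);
      list[*count][e - s] = '\0';
      (*count)++;

      pos = end + 1;                    /* skip the ',' */
   }
   return true;
}
```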
Resource Quota rules specify the filter criteria that a job must match and the resulting limit that is taken when a match is found.
A rule must always begin with the keyword "limit". The order of the filter criteria within a rule does not matter when the rule is entered. After the new rule set has been sent to qmaster, each rule is automatically reordered into a human-readable form.
To define a rule for more than one filter scope, scopes can be grouped into a list. The resource limit then applies to the sum of usage over all objects listed in the scope.
For example, consider a consumable virtual_free defined as:
```
#name         shortcut  type    relop  requestable  consumable  default  urgency
#----------------------------------------------------------------------------------------
virtual_free  vf        MEMORY  <=     YES          YES         1g       0
```
In the rule below, the two users together can use at most 5g of virtual_free:
```
limit users roland, andre to virtual_free=5g
```
If the administrator wants to limit each of the two users to 5g virtual_free he could define two rules:
```
limit users roland to virtual_free=5g
limit users andre to virtual_free=5g
```
This is very cumbersome for large numbers of users or user groups. For this case a rule can be defined with an expanded list. This would look like:
```
limit users {roland, andre} to virtual_free=5g
```
If an expanded scope contains a user group, the group is expanded as well, and the limit applies to each member of that group individually.
For example, if the host group @lx_hosts contains the hosts durin and carc, the following two definitions are equivalent:
```
1) limit users * hosts durin to virtual_free=10g
   limit users * hosts carc to virtual_free=10g

2) limit users * hosts {@lx_hosts} to virtual_free=10g
```
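The difference between a grouped scope (limit counted in sum) and an expanded scope (limit counted per member) comes down to how many usage counters a rule keeps. The C sketch below assumes that user groups and host groups have already been resolved to their members; rule_usage and book() are hypothetical names, and virtual_free is kept as a plain number of gigabytes for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_MEMBERS 8

/* Hypothetical usage bookkeeping for one rule. */
typedef struct {
   bool        expand;                /* {a,b}: one counter per member   */
   int         n_members;
   const char *members[MAX_MEMBERS];  /* scope, already resolved          */
   double      limit;
   double      used[MAX_MEMBERS];     /* expanded: per member, else [0]  */
} rule_usage;

static int member_index(const rule_usage *r, const char *user)
{
   for (int i = 0; i < r->n_members; i++) {
      if (strcmp(r->members[i], user) == 0) {
         return i;
      }
   }
   return -1;
}

/* Try to book `amount` for `user`. Returns false if the rule's limit
 * would be exceeded; users outside the scope are not limited by it. */
static bool book(rule_usage *r, const char *user, double amount)
{
   assert(r != NULL && r->n_members <= MAX_MEMBERS);
   int idx = member_index(r, user);
   if (idx < 0) {
      return true;                    /* rule does not match this user */
   }
   double *counter = r->expand ? &r->used[idx] : &r->used[0];
   if (*counter + amount > r->limit) {
      return false;                   /* limit would be exceeded */
   }
   *counter += amount;
   return true;
}
```

With the users roland and andre and a 5g limit, two 3g requests fail in the grouped form (3g + 3g exceeds the shared 5g) but succeed in the expanded form, where each user has a separate 5g counter.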
Sometimes it is necessary to define a rule for a userset but exclude some users of that set. This can be defined by using the NOT operator ('!' sign) in front of the user name. A rule so defined will not affect the excluded user, even if the user is explicitly added to the rule.
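For example, user "roland" is also a member of the user group "staff". If a resource quota rule looks like this:

```
limit users @staff,!roland to slots=10
```

then the rule does not apply to roland. The following rule has the same effect, because the exclusion takes precedence even when roland is listed explicitly:

```
limit users @staff,!roland,roland to slots=10
```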
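These exclusion semantics can be expressed by checking the excluded entries before the included ones, so that a '!' entry always wins. This is a sketch under the assumption that '@name' refers to a user group whose members are known to the caller; usergroup, entry_matches and filter_matches are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_GROUP_MEMBERS 8

/* Hypothetical user group: name without '@' and a NULL-terminated member list. */
typedef struct {
   const char *group;
   const char *members[MAX_GROUP_MEMBERS];
} usergroup;

/* An entry matches "*", the user name itself, or "@group" containing the user. */
static bool entry_matches(const char *entry, const char *user,
                          const usergroup *groups, int n_groups)
{
   if (strcmp(entry, "*") == 0 || strcmp(entry, user) == 0) {
      return true;
   }
   if (entry[0] == '@') {
      for (int g = 0; g < n_groups; g++) {
         if (strcmp(groups[g].group, entry + 1) != 0) {
            continue;
         }
         for (int m = 0; m < MAX_GROUP_MEMBERS && groups[g].members[m] != NULL; m++) {
            if (strcmp(groups[g].members[m], user) == 0) {
               return true;
            }
         }
      }
   }
   return false;
}

/* Exclusions are checked first, so "!roland" wins even if roland is
 * also listed explicitly in the included scope. */
static bool filter_matches(const char *const *scope, int n_scope,
                           const char *const *xscope, int n_xscope,
                           const char *user,
                           const usergroup *groups, int n_groups)
{
   assert(user != NULL);
   for (int i = 0; i < n_xscope; i++) {
      if (entry_matches(xscope[i], user, groups, n_groups)) {
         return false;
      }
   }
   for (int i = 0; i < n_scope; i++) {
      if (entry_matches(scope[i], user, groups, n_groups)) {
         return true;
      }
   }
   return false;
}
```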
Resource quota rules always define the maximum amount of a resource that can be used. In most cases these values are static and equal for all matching filter scopes. If administrators want different limits for different scopes, they have to define multiple rules, which leads to a duplication of nearly identical rules. The concept of dynamical limits avoids this kind of duplication.
A dynamical limit is a simple algebraic expression used to derive the rule limit value. To make it dynamical, the formula can reference a complex attribute whose value is used to calculate the resulting limit. The limit formula syntax is that of a weighted sum of complex values, that is:
```
{w1|$complex1[*w1]}[{+|-}{w2|$complex2[*w2]}[{+|-}...]]
```
The following example clarifies the use of dynamical limits: Users are allowed to use 5 slots per CPU on all linux hosts.
```
limit hosts {@linux_hosts} to slots=$num_proc*5
```
The complex attribute num_proc is defined on all hosts, and its value is the processor count of each host. The limit is calculated with the formula "$num_proc*5" and therefore differs between hosts with different processor counts: on a 2-CPU host users can run 10 slots, whereas on a 1-CPU host they can run only 5 slots.
To be able to set the limit to a well-defined value, some prerequisites must be fulfilled:
In principle, any complex value of type INT or DOUBLE could be referenced, but due to time constraints the first implementation only allows $num_proc in combination with an expanded host list.
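The weighted-sum syntax can be evaluated with a small left-to-right parser. The C sketch below is more general than the first implementation: it accepts any referenced complex whose value the caller supplies, not only $num_proc. Complex names are restricted to letters, digits and '_' so that '-' can act as the minus operator. complex_value, parse_number and eval_dynamic_limit are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-host complex values, e.g. { "num_proc", 2 }. */
typedef struct {
   const char *name;
   double      value;
} complex_value;

static bool lookup_complex(const complex_value *vals, int n,
                           const char *name, size_t len, double *out)
{
   for (int i = 0; i < n; i++) {
      if (strlen(vals[i].name) == len && strncmp(vals[i].name, name, len) == 0) {
         *out = vals[i].value;
         return true;
      }
   }
   return false;
}

/* Parse an unsigned number at *p (must start with a digit or '.'). */
static bool parse_number(const char **p, double *out)
{
   if (!isdigit((unsigned char)**p) && **p != '.') {
      return false;
   }
   char *end;
   *out = strtod(*p, &end);
   if (end == *p) {
      return false;
   }
   *p = end;
   return true;
}

/* Evaluate  term {(+|-) term}  where a term is w, $complex or $complex*w. */
static bool eval_dynamic_limit(const char *expr, const complex_value *vals,
                               int n, double *result)
{
   assert(expr != NULL && result != NULL);
   const char *p = expr;
   double sum = 0.0, sign = 1.0;

   for (;;) {
      double term;
      if (*p == '$') {
         const char *name = ++p;
         while (isalnum((unsigned char)*p) || *p == '_') {
            p++;
         }
         if (p == name || !lookup_complex(vals, n, name, (size_t)(p - name), &term)) {
            return false;            /* empty or unknown complex name */
         }
         if (*p == '*') {
            double weight;
            p++;
            if (!parse_number(&p, &weight)) {
               return false;
            }
            term *= weight;
         }
      } else if (!parse_number(&p, &term)) {
         return false;
      }

      sum += sign * term;
      if (*p == '\0') {
         break;
      }
      if (*p != '+' && *p != '-') {
         return false;               /* unexpected character */
      }
      sign = (*p == '+') ? 1.0 : -1.0;
      p++;
   }
   *result = sum;
   return true;
}
```

For a host with num_proc=2, "$num_proc*5" evaluates to 10, matching the 2-CPU example above.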
In practice, administrators define some global limits and some limits that apply only to certain resource consumers. These resource quota rules are of equal rank. In some cases, however, it is necessary to define exceptions for certain resource consumers; such rules are not of equal rank but take precedence over others. Consequently, it must be possible to define both a prioritized rule list and rule lists that apply all of the time. This is done by grouping one or more single rules into a number of rule sets.
Inside a rule set the rules are ordered, and the first matching rule is used. This is analogous to firewall rules, is generally understood by administrators, and allows some rules to be prioritized over others. A rule set therefore results in at most one effective resource quota for a specific request.
All configured rule sets apply all of the time. If multiple rule sets are defined, the most restrictive one determines the outcome, which makes it possible to define limits of equal rank side by side.
The following example clarifies the combination of rules and rule sets. We have a consumable defined as:
```
#name         shortcut  type  relop  requestable  consumable  default  urgency
#----------------------------------------------------------------------------------------
compiler_lic  cl        INT   <=     YES          YES         0        0
```
The resource quota sets are defined as:
```
{
   name    ruleset1
   limit   users roland to compiler_lic=3
   limit   projects * to compiler_lic=2
   limit   users * to compiler_lic=1
}
{
   name    ruleset2
   limit   users * to compiler_lic=20
}
```
The first rule set, ruleset1, expresses:
Inside ruleset1 the priority is clearly defined: user roland is always limited to 3 compiler_lic resources, even though he also matches "users *" in the last rule of the set, and even if he submits his request in a project. The interaction between ruleset1 and ruleset2 is also clearly defined: a request is rejected if 20 compiler_lic resources are already in use, even if user roland has not used all of his 3 compiler_lic resources.
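The two evaluation principles, first match inside a rule set and all rule sets applying together, can be sketched in C using the compiler_lic example above. Assumptions: every rule keeps a single usage counter, which is correct here because none of the rules uses an expanded list; a "projects *" filter is assumed to match only jobs submitted with a project (otherwise the last rule of ruleset1 could never be reached); all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
   const char *user;     /* NULL: no user filter, "*": any user          */
   const char *project;  /* NULL: no project filter, "*": any project    */
   int limit;            /* compiler_lic limit of this rule              */
   int used;             /* summed usage of all jobs counted by the rule */
} rule;

typedef struct {
   int  n_rules;
   rule rules[4];
} rule_set;

static bool field_matches(const char *filter, const char *value)
{
   if (filter == NULL) {
      return true;                    /* filter not given */
   }
   if (value == NULL) {
      return false;                   /* job has no such attribute */
   }
   return strcmp(filter, "*") == 0 || strcmp(filter, value) == 0;
}

/* Inside one set only the first matching rule counts. */
static rule *first_match(rule_set *set, const char *user, const char *project)
{
   for (int i = 0; i < set->n_rules; i++) {
      rule *r = &set->rules[i];
      if (field_matches(r->user, user) && field_matches(r->project, project)) {
         return r;
      }
   }
   return NULL;
}

/* All sets apply: the request is granted only if the matched rule of
 * every set still has room for it, so the most restrictive set wins. */
static bool request_ok(rule_set *sets, int n_sets, const char *user,
                       const char *project, int amount)
{
   assert(sets != NULL && user != NULL);
   for (int s = 0; s < n_sets; s++) {
      rule *r = first_match(&sets[s], user, project);
      if (r != NULL && r->used + amount > r->limit) {
         return false;
      }
   }
   return true;
}
```

With ruleset2 already at 20 licenses in use, a request from roland is rejected even though his own ruleset1 rule still has room.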
With qconf, the rule sets can be edited in an editor session, as with most qconf switches. To reduce the amount of data presented to the administrator, it is possible to select a single rule set for editing.
It's not possible to edit single rules. Because the rules inside the rule set are ordered, the meaning of a single rule depends on the context of all other rules. Therefore it doesn't make sense to edit a single rule without presenting the context of the rule.
Switch Descriptions:
```
$ more rule_set.txt
{
   name         rule_set_2
   enabled      true
   description  "rule set 2"
}
$ qconf -Arqs rule_set.txt
rd141302@es-ergb01-01 added "rule_set_2" to resource quota set list
$ qconf -Arqs rule_set.txt
resource quota set "rule_set_2" already exists
```
```
$ more rule_set.txt
{
   name         rule_set_3
   enabled      true
   description  "rule set 2"
}
$ qconf -Mrqs rule_set.txt rule_set_3
resource quota set "rule_set_3" does not exist
$ qconf -Mrqs rule_set.txt rule_set_4
resource quota set "rule_set_4" does not match rule set definition
$ qconf -Mrqs rule_set.txt
rd141302@es-ergb01-01 modified resource quota set list
```
```
$ qconf -arqs
<- {
<-    name         template
<-    enabled      true
<-    description  ""
<- }
-> :q
resource quota set name "template" is not valid
$ qconf -arqs rule_set_1
<- {
<-    name         rule_set_1
<-    enabled      true
<-    description  ""
<- }
-> :wq
rd141302@es-ergb01-01 added "rule_set_1" to resource quota set list
$ qconf -arqs rule_set_1
resource quota set "rule_set_1" already exists
```
```
$ qconf -mrqs rule_set_1
<- {
<-    name         rule_set_1
<-    enabled      true
<-    description  ""
<- }
-> :wq
rd141302@es-ergb01-01 modified "rule_set_1" in resource quota set list
$ qconf -mrqs unknown_set
resource quota set "unknown_set" does not exist
$ qconf -mrqs
<- ...
<-    name         rule_set_1
<- ...
<-    name         rule_set_2
<- ...
```
```
$ qconf -srqs
...
   name         rule_set_1
...
   name         rule_set_2
...
$ qconf -srqs rule_set_1
...
   name         rule_set_1
...
```
```
$ qconf -drqs rule_set_1
rd141302@es-ergb01-01 removed "rule_set_1" from resource quota set list
$ qconf -drqs unknown_rule_set
denied: resource quota set "unknown_rule_set" does not exist
$ qconf -drqs
rd141302@es-ergb01-01 removed resource quota set list
```
```
$ qconf -srqs ruleset_1
{
   name     ruleset_1
   enabled  true
   limit    users @eng to slots=10
   limit    name arch_rule users @eng to arch=lx24-amd64
}
$ qconf -aattr resource_quota limit slots=20 ruleset_1/1
No modification because "slots" already exists in "limit" of "ruleset_1/1"
$ qconf -aattr resource_quota limit compiler_lic=5 ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in rqs list
$ qconf -aattr resource_quota limit arch=sol-sparc64 ruleset_1/arch_rule
No modification because "arch" already exists in "limit" of "ruleset_1/1"
```
```
$ more resource.txt
limit slots=20
$ qconf -Aattr resource_quota resource.txt ruleset_1/1
No modification because "slots" already exists in "limit" of "ruleset_1/1"
$ more resource2.txt
limit compiler_lic=5
$ qconf -Aattr resource_quota resource2.txt ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
```
```
$ qconf -dattr resource_quota limit compiler_lic=5 ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in rqs list
$ qconf -dattr resource_quota limit compiler_lic=5 ruleset_1/1
"compiler_lic" does not exist in "limit" of "resource_quota"
```
```
$ more resource.txt
limit compiler_lic=20
$ qconf -Dattr resource_quota resource.txt ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
$ qconf -Dattr resource_quota resource.txt ruleset_1/1
"compiler_lic" does not exist in "limit" of "resource_quota"
```
```
$ qconf -mattr resource_quota limit slots=5 ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
$ qconf -mattr resource_quota limit new_resource=5 ruleset_1/1
Unable to find "new_resource" in "limit" of "resource_quota" - Adding new element.
$ qconf -mattr resource_quota enabled false ruleset_1
rd141302@es-ergb01-01 modified "ruleset_1" in resource_quota list
```
```
$ more resource.txt
limit slots=20
$ qconf -Mattr resource_quota resource.txt ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
$ more resource2.txt
limit new_resource=5
$ qconf -Mattr resource_quota resource2.txt ruleset_1/1
Unable to find "new_resource" in "limit" of "resource_quota" - Adding new element.
```
Switch Descriptions:
Additional Output
Example:

```
cannot run on cluster because exceeds limit in rule_set_1
cannot run on host "bla" because exceeds limit in rule_set_1
cannot run on queue instance "all.q@host" because exceeds limit in rule_set_1
```
To be consistent with qquota the default value of user_list changes from * (all users) to the calling user.
The qquota command is a diagnostic tool for resource quotas. Its output is a table with the following columns:

resource quota rule | limit | filter |
---|---|---|
For each matching rule of each rule set, one line is printed if the usage count of that rule is not 0. If a rule contains more than one resource attribute, one line is printed per resource attribute. By default, qquota shows the effective limits for the calling user; for all other filter criteria, such as project or pe, the wildcard "*" is used, meaning that no explicit filter value is applied.
The limit column has one of the following forms:

form | example |
---|---|
complex=used/limit | slots=2/20 |
complex=value | arch=lx24-amd64 |
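Formatting the limit column can be sketched as follows. format_limit is a hypothetical helper. It uses the used/limit form for consumable resources, which have a usage count, and the value form for other attributes. Printing such non-consumable attributes unconditionally, since they have no usage count to test, is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write the limit column for one resource attribute into buf.
 * Returns false if no line should be printed (consumable with zero usage). */
static bool format_limit(char *buf, size_t size, const char *name,
                         const char *limit, bool consumable, double used)
{
   assert(buf != NULL && size > 0 && name != NULL && limit != NULL);
   if (consumable) {
      if (used == 0.0) {
         return false;                 /* unused rules are not shown */
      }
      snprintf(buf, size, "%s=%g/%s", name, used, limit);   /* slots=2/20 */
   } else {
      snprintf(buf, size, "%s=%s", name, limit);            /* arch=lx24-amd64 */
   }
   return true;
}
```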
The administrator and the user may define files (analogous to sge_qstat(5)) which can contain any of the options described below. A cluster-wide sge_qquota file may be placed at $SGE_ROOT/$SGE_CELL/common/sge_qquota. The user-private file is searched for at $HOME/.sge_qquota. The file in the home directory takes precedence over the cluster-global file, and options given on the command line override the flags contained in either file.
Example:
Rule Set:
qstat Output:
qquota Output:
qquota XML Schema:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   <xsd:element name="qquota_result">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="qquota_rule" type="QQuotaRuleType" minOccurs="0" maxOccurs="unbounded"/>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
   <xsd:complexType name="QQuotaRuleType">
      <xsd:sequence>
         <xsd:element name="user" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="xuser" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="project" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="xproject" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="pe" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="xpe" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="queue" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="xqueue" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="host" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="xhost" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="limit" type="ResourceLimitType" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="name" type="xsd:string" use="required"/>
   </xsd:complexType>
   <xsd:complexType name="ResourceLimitType">
      <xsd:attribute name="resource" type="xsd:string" use="required"/>
      <xsd:attribute name="limit" type="xsd:string" use="required"/>
      <xsd:attribute name="value" type="xsd:string" use="optional"/>
   </xsd:complexType>
</xsd:schema>
```
All lists are used by qmaster and scheduler
File sge_resource_quotaL.h
```c
#ifndef __SGE_RESOURCE_QUOTAL_H
#define __SGE_RESOURCE_QUOTAL_H

#include "sge_boundaries.h"
#include "cull.h"

#ifdef __cplusplus
extern "C" {
#endif

/* *INDENT-OFF* */

/* Resource Quota Set */
enum {
   RQS_name = RQS_LOWERBOUND,
   RQS_description,
   RQS_enabled,
   RQS_rule
};

LISTDEF(RQS_Type)
   JGDI_ROOT_OBJ(ResourceQuotaSet, SGE_RQS_LIST, ADD | MODIFY | DELETE | GET | GET_LIST)
   JGDI_EVENT_OBJ(ADD(sgeE_RQS_ADD) | MODIFY(sgeE_RQS_MOD) | DELETE(sgeE_RQS_DEL) | GET_LIST(sgeE_RQS_LIST))
   SGE_STRING(RQS_name, CULL_PRIMARY_KEY | CULL_HASH | CULL_UNIQUE | CULL_SPOOL)
   SGE_STRING(RQS_description, CULL_DEFAULT | CULL_SPOOL)
   SGE_BOOL(RQS_enabled, CULL_DEFAULT | CULL_SPOOL)
   SGE_LIST(RQS_rule, RQR_Type, CULL_DEFAULT | CULL_SPOOL)
LISTEND

NAMEDEF(RQSN)
   NAME("RQS_name")
   NAME("RQS_description")
   NAME("RQS_enabled")
   NAME("RQS_rule")
NAMEEND

#define RQSS sizeof(RQSN)/sizeof(char*)

/* Resource Quota Rule */
enum {
   RQR_name = RQR_LOWERBOUND,
   RQR_filter_users,
   RQR_filter_projects,
   RQR_filter_pes,
   RQR_filter_queues,
   RQR_filter_hosts,
   RQR_limit,
   RQR_level
};

LISTDEF(RQR_Type)
   JGDI_OBJ(ResourceQuotaRule)
   SGE_STRING(RQR_name, CULL_PRIMARY_KEY | CULL_HASH | CULL_UNIQUE | CULL_SPOOL)
   SGE_OBJECT(RQR_filter_users, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
   SGE_OBJECT(RQR_filter_projects, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
   SGE_OBJECT(RQR_filter_pes, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
   SGE_OBJECT(RQR_filter_queues, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
   SGE_OBJECT(RQR_filter_hosts, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
   SGE_LIST(RQR_limit, RQRL_Type, CULL_DEFAULT | CULL_SPOOL)
   SGE_ULONG(RQR_level, CULL_DEFAULT | CULL_JGDI_RO)
LISTEND

NAMEDEF(RQRN)
   NAME("RQR_name")
   NAME("RQR_filter_users")
   NAME("RQR_filter_projects")
   NAME("RQR_filter_pes")
   NAME("RQR_filter_queues")
   NAME("RQR_filter_hosts")
   NAME("RQR_limit")
   NAME("RQR_level")
NAMEEND

#define RQRS sizeof(RQRN)/sizeof(char*)

enum {
   FILTER_USERS = 0,
   FILTER_PROJECTS,
   FILTER_PES,
   FILTER_QUEUES,
   FILTER_HOSTS
};

enum {
   RQR_ALL = 0,
   RQR_GLOBAL,
   RQR_CQUEUE,
   RQR_HOST,
   RQR_QUEUEI
};

/* Resource Quota Rule Filter */
enum {
   RQRF_expand = RQRF_LOWERBOUND,
   RQRF_scope,
   RQRF_xscope
};

LISTDEF(RQRF_Type)
   JGDI_OBJ(ResourceQuotaRuleFilter)
   SGE_BOOL(RQRF_expand, CULL_DEFAULT | CULL_SPOOL)
   SGE_LIST(RQRF_scope, ST_Type, CULL_DEFAULT | CULL_SPOOL)
   SGE_LIST(RQRF_xscope, ST_Type, CULL_DEFAULT | CULL_SPOOL)
LISTEND

NAMEDEF(RQRFN)
   NAME("RQRF_expand")
   NAME("RQRF_scope")
   NAME("RQRF_xscope")
NAMEEND

#define RQRFS sizeof(RQRFN)/sizeof(char*)

/* Resource Quota Rule Limit */
enum {
   RQRL_name = RQRL_LOWERBOUND,
   RQRL_value,
   RQRL_type,
   RQRL_dvalue,
   RQRL_usage,
   RQRL_dynamic
};

LISTDEF(RQRL_Type)
   JGDI_OBJ(ResourceQuotaRuleLimit)
   SGE_STRING(RQRL_name, CULL_PRIMARY_KEY | CULL_HASH | CULL_UNIQUE | CULL_SPOOL)
   SGE_STRING(RQRL_value, CULL_DEFAULT | CULL_SPOOL)
   SGE_ULONG(RQRL_type, CULL_DEFAULT | CULL_SPOOL | CULL_JGDI_RO)
   SGE_DOUBLE(RQRL_dvalue, CULL_DEFAULT | CULL_SPOOL | CULL_JGDI_RO)
   SGE_LIST(RQRL_usage, RUE_Type, CULL_DEFAULT | CULL_JGDI_RO)
   SGE_BOOL(RQRL_dynamic, CULL_DEFAULT | CULL_JGDI_RO)
LISTEND

NAMEDEF(RQRLN)
   NAME("RQRL_name")
   NAME("RQRL_value")
   NAME("RQRL_type")
   NAME("RQRL_dvalue")
   NAME("RQRL_usage")
   NAME("RQRL_dynamic")
NAMEEND

#define RQRLS sizeof(RQRLN)/sizeof(char*)

/* *INDENT-ON* */

#ifdef __cplusplus
}
#endif

#endif /* __SGE_RESOURCE_QUOTAL_H */
```