Qmaster [1]


Qmaster is the central application within a Grid Engine installation. A running instance of qmaster controls the overall behavior of the entire Grid Engine cluster.

This process holds the most current data about the states of jobs, queues and other necessary objects. The qmaster daemon is also responsible for data persistence: objects are spooled to the filesystem before the user receives confirmation of a requested change. Every daemon and client application that needs information about a specific object has to request it from qmaster. Communication with qmaster takes place via the Grid Engine Database Interface (GDI).

Qmaster interoperates with the scheduler daemon schedd for the task of assigning suitable resources (hosts, queues) to jobs waiting for execution.
Schedd gets all its information from qmaster via an event-driven update protocol built on the GDI. Once schedd has determined an assignment of jobs to queues, it notifies qmaster with a list of so-called "orders", again communicated via the GDI.

Such a job execution order causes qmaster to send the corresponding job to the execution daemon execd on the selected host, which then starts executing the job. Execd in turn reports the current status of the job and of its host back to qmaster, again using the GDI.

Interoperating with Clients - Implementing the Grid Engine Database Interface

GDI stands for Grid Engine Database Interface. All functions that a client application or a daemon needs in order to use this interface are packaged in the library libgdi.a. Documentation for this library is provided separately.

One of the core tasks of qmaster is to implement the framework that handles incoming GDI requests. The most important part of this framework is the array gdi_object[]. Each of its elements contains one of the constants used to identify GDI requests, together with function pointers that specify which functions serve the corresponding request.

When a client request arrives at qmaster, it passes through several steps, using information from the gdi_object[] array where needed. The following steps illustrate this for a request that modifies a checkpointing object. All functions and constants mentioned below are defined in source/libs/gdi/sge_gdi.h and source/daemons/qmaster/sge_c_gdi.c.

    sge_c_gdi() is called with a new request structure. The function recognizes that it received an SGE_GDI_MOD request and calls sge_c_gdi_mod().

    sge_c_gdi_mod() identifies the type of object to be modified (SGE_CKPT_LIST) and calls sge_gdi_add_mod_generic().

    sge_gdi_add_mod_generic() looks up the following three function pointers in gdi_object[] and calls them in this order:
     

      a function that modifies a copy of the object according to the information received from the client,

      a function that spools the modified object to the corresponding file, and

      a function that is called after the object has been modified successfully.


    Throughout these steps, answer messages are collected in lists; these are sent back to the client as the final step of the request.

Interoperating with the Scheduler - Events and Orders

Communicating and interoperating with schedd is another core task of qmaster. The two exchange data via an event and order protocol based on the GDI. An introduction to this protocol is provided separately.

Interoperating with the Execution Daemon - Job and Load Protocols

Qmaster dispatches jobs to execd, and execd reports job status, host status and load information back to qmaster. A protocol between qmaster and execd handles these and similar tasks; it is described in a separate document.

Process Flow

When qmaster starts up successfully, it takes the following actions:

    Start a commd process if none is running.

    Connect to commd.

    Prevent other qmaster processes from starting up successfully.

    Read the qmaster configuration.

    Read all spooled GDI objects from the filesystem.

After startup, qmaster enters its main loop, in which it

    accepts GDI requests,

    receives reports from execds, and

    answers acknowledge requests.

The main loop is implemented in the main() function in source/daemons/qmaster/qmaster.c.
 
 
[1] Copyright 2001 Sun Microsystems, Inc. All rights reserved.