Standard Compatibility Test

Compatibility is an important aspect of the License under which the Grid Engine source code is made available. In this context, you may have to certify cross compatibility between a version of the Grid Engine software that you have enhanced and one of Grid Engine's Reference Builds from which your modification deviates. Note that over time there can be multiple Reference Builds representing different stable software release levels. You may want to test compatibility with one or with several of those Reference Builds. A list of the currently available Reference Builds with all pertinent information can be found here.

The following describes how to test compatibility between two builds. Before you start the compatibility check, create a binary distribution package for both builds and make all preparations needed to run the Grid Engine Testsuite. Use the Testsuite level defined in the Reference Build definition.

The compatibility test consists of a preparation step, a validation run of the Standard Version and multiple compatibility checks. To be considered compatible, your changed version has to pass all tests without error and has to deliver the same results as the validation run.

Release Notes:

Version 5.3

Preparation:

Set up a testsuite cluster with more than one host

The Testsuite documentation describes how this can be done. Use the Reference Build against which you want to test compatibility and run:

expect check.exp install

The testsuite generates a default setup file "defaults.sav" in the testsuite directory and then starts vi so that you can edit the testsuite settings. You will be asked on which hosts you want to install the testsuite cluster; use at least 2 hosts for the compatibility test. Enable error mails by providing your e-mail address when setting up the testsuite. The testsuite will then report errors by e-mail.

Validation:

Run testsuite on Reference Build

Run the testsuite with the following command (see Testsuite start output):

expect check.exp all 2 category COMPATIBILITY

Do not remove the test results of the validation run. Every testsuite run modifies the results directory, so copy your validation results before running another test; you will need to compare them with the results of the subsequent compatibility runs. The validation run must complete without errors. If you encounter errors, they may be caused by network setup problems in your cluster or similar issues. Fix those before you proceed, and report the problem if it persists. You cannot test compatibility against a validation run that contains errors.
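Copying the validation results can be scripted so that the copy is never overwritten by accident. The following is a minimal sketch, not part of the testsuite; the function name save_results and the example paths are hypothetical, and the real location of the results directory depends on your testsuite setup.

```shell
# Sketch: preserve the validation results before the next testsuite run.
# Assumption: the caller passes the testsuite's results directory; its real
# name and location depend on your testsuite setup.

save_results() {
    src=$1      # results directory written by the testsuite (assumed path)
    dest=$2     # where to keep the untouched copy

    if [ ! -d "$src" ]; then
        echo "results directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing copy: $dest" >&2
        return 1
    fi
    # -R copies recursively, -p preserves timestamps and permissions.
    cp -Rp "$src" "$dest"
}

# Example call (hypothetical paths):
# save_results ./results ./results.validation
```

Refusing to overwrite an existing copy is a deliberate choice here: the validation results are the baseline for every later check.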

Check 1: Qmaster compatibility

Shut down the system

Use the testsuite to shut down the cluster:

expect check.exp kill

Exchange the sge_qmaster binary with the one from your modified build

Run the testsuite again:

expect check.exp all 2 category COMPATIBILITY

Compare results with validation run

Stop if the results are not equal.
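The comparison can be done with a recursive diff. The sketch below assumes that both result sets are directory trees of plain files; the function name results_equal is hypothetical. This document does not define what counts as "equal", so if the result files contain timestamps or host-specific data, a literal diff may be stricter than intended and the differences need to be reviewed by hand.

```shell
# Sketch: compare a compatibility run against the saved validation run.
# Assumption: both arguments are directory trees of plain result files.

results_equal() {
    validation=$1   # saved copy of the validation results
    current=$2      # results of the compatibility run just finished

    # diff -r walks both trees; exit status 0 means no difference was found.
    if diff -r "$validation" "$current"; then
        echo "results match the validation run"
    else
        echo "results differ from the validation run, stop here" >&2
        return 1
    fi
}
```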

Shut down the system

Use the testsuite to shut down the cluster:

expect check.exp kill

Re-Exchange the sge_qmaster binary with the one from the Reference Build
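Exchanging a binary and putting the Reference Build binary back afterwards can be sketched as two small helpers. The function names and the ".reference" backup suffix are illustrative assumptions, and the path of the installed binary depends on your installation. Run them only while the cluster is shut down.

```shell
# Sketch: exchange one installed binary and restore it later.
# Assumption: the ".reference" backup suffix is an illustrative convention.

swap_in() {
    installed=$1    # binary inside the test cluster installation
    modified=$2     # same binary taken from your modified build

    if [ ! -f "$installed" ] || [ ! -f "$modified" ]; then
        echo "missing binary: $installed or $modified" >&2
        return 1
    fi
    if [ -e "$installed.reference" ]; then
        echo "backup already exists: $installed.reference" >&2
        return 1
    fi
    # Keep the Reference Build binary, then put the modified one in place.
    cp -p "$installed" "$installed.reference" && cp -p "$modified" "$installed"
}

restore_reference() {
    installed=$1
    if [ ! -f "$installed.reference" ]; then
        echo "no backup found for: $installed" >&2
        return 1
    fi
    # Move the saved Reference Build binary back into place.
    mv "$installed.reference" "$installed"
}
```

Keeping the backup next to the installed binary makes it hard to restore the wrong file when several binaries are exchanged in later checks.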

Check 2: Scheduler compatibility

(If you are absolutely sure that your modification did not change sge_schedd, you may skip this step. Be aware, however, that changes in some libraries, for instance the GDI library, may also modify sge_schedd. Carry out the test if you are not 100% sure.)

Exchange the sge_schedd binary with the one from your modified build

Run the testsuite again:

expect check.exp all 2 category COMPATIBILITY

Compare results with validation run

Stop if the results are not equal.

Shut down the system

Use the testsuite to shut down the cluster:

expect check.exp kill

Re-Exchange the sge_schedd binary with the one from the Reference Build

Check 3: Commd compatibility

(If you are absolutely sure that your modification did not change sge_commd, you may skip this step. Be aware, however, that changes in some libraries, for instance zlib, may also modify sge_commd. Carry out the test if you are not 100% sure.)

Exchange all sge_commd binaries with the ones from your modified build

Run the testsuite again:

expect check.exp all 2 category COMPATIBILITY

Compare results with validation run

Shut down the system

Use the testsuite to shut down the cluster:

expect check.exp kill

Re-Exchange the sge_commd binaries with the ones from the Reference Build
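Checks 1 to 3 follow the same cycle once the cluster is stopped: exchange, run, compare, shut down, restore. The sketch below prints that cycle as a dry run. Only the two "expect check.exp" commands are taken from this document; the function names run and check_cycle and the DRY_RUN variable are hypothetical, and the exchange, compare and restore steps are printed as reminders rather than executed.

```shell
# Sketch of the cycle shared by Checks 1 to 3, starting from a stopped cluster.
# Only the two "expect check.exp" commands come from this document; the
# exchange, compare and restore steps are reminders you must carry out yourself.

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_cycle() {
    component=$1    # for example sge_qmaster, sge_schedd or sge_commd
    echo "step: exchange $component with the modified build"
    run expect check.exp all 2 category COMPATIBILITY || return 1
    echo "step: compare results with the validation run, stop if not equal"
    run expect check.exp kill || return 1
    echo "step: restore $component from the Reference Build"
}

# Example call: check_cycle sge_schedd
```

Printing by default is a cautious choice, since a full run is long and needs root access; set DRY_RUN=0 only after the manual steps are in place.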

Check 4: Client compatibility

(If you are absolutely sure that your modification did not change any Grid Engine client binary you may skip this step, but be aware that changes in some libraries, like the GDI library, may also modify the client binaries. Carry out the test if you are not 100% sure.)

Exchange all client binaries with the new ones

Run the testsuite again:

expect check.exp all 2 category COMPATIBILITY

Compare results with validation run

Check 5: General compatibility

Shut down the system

Use the testsuite to shut down the cluster:

expect check.exp kill

Set up a testsuite cluster with more than one host using the modified build

The Testsuite documentation describes how this can be done. Use the modified build and run:

expect check.exp install

The testsuite generates a default setup file "defaults.sav" in the testsuite directory and then starts vi so that you can edit the testsuite settings. You will be asked on which hosts you want to install the testsuite cluster; use at least 2 hosts for the compatibility test.

Run the testsuite with the following command on your modified build:

expect check.exp all 2 category COMPATIBILITY

Compare results with validation run

Testsuite start output

After starting the testsuite with the command

expect check.exp all 2 category COMPATIBILITY

the testsuite should produce the following output:

===============================================================================
system version     : SGE 5.3 (1) / feature: none
current dir        : [testsuite_root_directory]/checktree
max. runlevel      : day long medium short week
selected runlevels : long medium short
categories         : COMPATIBILITY PERFORMANCE SYSTEM
selected categories: COMPATIBILITY
est. run time      : 6 h 40 m
===============================================================================
2 test(s) available in subdir: functional
1 test(s) available in subdir: install_core_system
0 test(s) available in subdir: performance
19 test(s) available in subdir: system_tests
===============================================================================
run all tests ...
you have no ssh access and no root password
test in directory [testsuite_root_directory]/checktree/functional/access_lists
needs root access ...
root access needed, please enter root password:

After you enter the root password, the testsuite starts the compatibility tests.
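If you keep a copy of the start output, you can check that only the COMPATIBILITY category was selected. This is a minimal sketch that relies only on the "selected categories:" line shown in the output above; the function name selected_categories and the file name in the example call are hypothetical.

```shell
# Sketch: extract the selected categories from the testsuite start output.
selected_categories() {
    # Reads the start output on stdin and prints the value of the
    # "selected categories:" line, without the label.
    sed -n 's/^selected categories: *//p'
}

# Example call (hypothetical file holding a saved copy of the output):
# selected_categories < start_output.txt
```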

Notes for Grid Engine release 5.3

The following tests may cause trouble. If one of the check functions reports an error described in this table, the error can be ignored:

Check name	Check function	Remarks
submit_del	submit_del_test	A job deleted immediately after submission may stay in the delete state. A second qdel call will delete the job. This is a known problem. The testsuite provokes this behaviour and reports errors. The error message is "timeout waiting for end of all jobs".
qdel	qdel_submit_delete_when_transfered	See remarks for submit_del. The error message is `timeout waiting for job "X" "leeper""`
qrsh	qrsh_trap	This test notifies the user with "test not completely implemented"; this is only a warning. The result is listed as "unsupported tests". No other error should occur.