= DTrace-based Master monitoring under Solaris 10 = == Copyright statement == '''Note:''' This document is under Grid Engine project [http://gridengine.sunsource.net/license.html SISSL license]. == Introduction == Trouble-shooting in a distributed system spanning multiple thousands active components like Grid Engine can become a real challenge. In practice Grid Engine administrators have no apriori certainty about reasons for performance degradation degregation in their production environments and reproducing such problems is not easy for those who wish to help. For that reason it is important to have a utility for on-site performance monitoring of the Grid Engine master component. Thanks to [http://docs.sun.com/app/docs/doc/817-6223 DTrace(1) support in Solaris 10] a platform exists that is suited even for monitoring of multi-threaded/multi-processing components like Grid Engine master (=qmaster+scheduler) :) == Project Overview == === Project Aim === Aim is to provide an easy-to-use DTrace-based bottleneck analysis first-aid kit to Grid Engine administrators. Though with this utility it will not be possible to entirely understand any single case where performance degregation occurs. However in a majority of cases it will provide sufficient insight required by administrator to independently find an appropriate remedy. In other cases it shall deliver a base that facilitates more experienced people to help in a determined and time-saving way. === Project Benefit === Grid Engine administrators will be able to independently tune their Grid Engine clusters and can get faster and better help from those who made comparable experiences in other challenging cases. == Block diagram == Imagine one of these nice block diagrams pictures. Components apparently are * Grid Engine master * DTrace master monitor (D scripts) * dtrace(1M) command * Solaris 10 == Performance Impact == There is no performance impact to Grid Engine when DTrace is not enabled. When DTrace is enabled, the overhead is negligible because DTrace does not stop and restart the qmaster for each traced function. Instead, DTrace collects data in per-CPU buffers which dtrace(1M) asynchronously reads. (ref 1) == User Documentation == That is kept under [http://gridengine.sunsource.net/source/browse/*checkout*/gridengine/source/scripts/dtrace/README-dtrace.txt?rev=HEAD README-dtrace.txt]. == Packaging == Master monitor will be part of the 'common' package in $SGE_ROOT/dtrace directory. == Implementation == The master monitor soley depends on DTrace pid provider probes. These probes are used to track calls to throughput-sensitive functions in qmaster/scheduler. Note, for the DTrace pid provider based monitor to function it is required these functions are not inlined as explained in DTrace documentation [http://bugs.opensolaris.org/view_bug.do?bug_id=6480235 CR #6480235]. The list of the functions tracked by the monitor is: * cl_message_list_append_receive() * cl_message_list_append_send() * cl_message_list_remove_receive() * cl_message_list_remove_send() * dispatch_jobs() * do_c_ack() * do_gdi_packet() * report_list_send() * select_assign_debit() * sge_c_gdi_add() * sge_c_gdi_copy() * sge_c_gdi_del() * sge_c_gdi_get() * sge_c_gdi_mod() * sge_c_gdi_permcheck() * sge_c_gdi_trigger() * sge_c_report() * sge_lock() * sge_log() * sge_mirror_process_events() * sge_unlock() * spool_delete_object() * spool_write_object() * spool_read_object() * spool_delete_script() * spool_write_script() * spool_read_script() == Appendix == === Example #1 === The following monitoring output sample illustrates a case where a Grid Engine master bottleneck could be detected. With the at hand sample performance degregation happend between 17:40:32 and 17:41:05.
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN                 Time |   #wrt  wrt/ms |#rep #gdi #ack|   #dsp  dsp/ms    #sad|   #snd    #rcv|  #in++   #in--  #out++  #out--|  #lck0  #ulck0   #lck1  #ulck1
  0  36909                       :tick-3sec 2006 Nov 24 17:39:23 |      43       3|   0    8    4|      3     691     121|      4       4|     11      11      15      15|     68      68     289     288
  0  36909                       :tick-3sec 2006 Nov 24 17:39:26 |      83      16|   0   10    3|      3     699     122|      3       3|     14      13      17      17|     90      90     681     681
  0  36909                       :tick-3sec 2006 Nov 24 17:39:29 |     117      24|   0    9    4|      4    1092     198|      4       4|     13      13      17      17|     71      71     591     591
  0  36909                       :tick-3sec 2006 Nov 24 17:39:32 |      19       4|   0    9    3|      3     591     147|      3       3|     12      12      15      15|     44      43     249     249
  0  36909                       :tick-3sec 2006 Nov 24 17:39:35 |     144      28|   0    9    4|      4    1012     173|      4       4|     13      13      17      17|     61      62    1246    1247
  0  36909                       :tick-3sec 2006 Nov 24 17:39:38 |      46       5|   0    8    3|      3     705     122|      3       3|     11      11      14      14|     67      67     293     293
  0  36909                       :tick-3sec 2006 Nov 24 17:39:41 |     154      31|   0    9    3|      4     894     198|      3       3|     13      13      16      16|     73      72     968     969
  0  36909                       :tick-3sec 2006 Nov 24 17:39:44 |      46       5|   0   10    4|      4     971     162|      4       4|     13      13      17      17|     71      72     304     304
  0  36909                       :tick-3sec 2006 Nov 24 17:39:47 |     154      29|   0    8    3|      3     739     158|      3       3|     11      11      14      14|     67      67     990     990
  0  36909                       :tick-3sec 2006 Nov 24 17:39:50 |      46       5|   0   10    4|      4     815     162|      4       4|     14      14      18      18|     76      76     692     693
  0  36909                       :tick-3sec 2006 Nov 24 17:39:53 |      74      15|   0    8    3|      3     746     136|      3       3|     12      12      15      15|     54      53     571     571
  0  36909                       :tick-3sec 2006 Nov 24 17:39:56 |     116      20|   0   11    4|      4     992     184|      4       4|     14      14      18      18|     80      81     669     669
  0  36909                       :tick-3sec 2006 Nov 24 17:39:59 |      87      18|   0   11    4|      4     851     176|      5       4|     15      15      21      21|     77      76     670     670
  0  36909                       :tick-3sec 2006 Nov 24 17:40:02 |     109      20|   0   12    5|      4     930     184|      4       5|     17      17      20      20|     77      78     624     624
  0  36909                       :tick-3sec 2006 Nov 24 17:40:05 |      88      15|   0    9    3|      4     995     176|      3       3|     12      12      15      15|     71      71    1026    1026
  0  36909                       :tick-3sec 2006 Nov 24 17:40:08 |     112      20|   0   12    4|      4     927     184|      5       4|     16      16      22      22|     81      81     652     652
  0  36909                       :tick-3sec 2006 Nov 24 17:40:11 |      32       6|   0    7    4|      3     618     121|      3       4|     11      11      13      13|     54      53     336     336
  0  36909                       :tick-3sec 2006 Nov 24 17:40:14 |     145      30|   0   11    4|      4     988     199|      4       4|     15      15      19      19|     64      65     827     827
  0  36909                       :tick-3sec 2006 Nov 24 17:40:17 |      43       3|   0    7    3|      3     618     121|      3       3|     10      10      13      13|     64      64     286     286
  0  36909                       :tick-3sec 2006 Nov 24 17:40:20 |     157      31|   0   11    4|      4     977     199|      4       4|     15      15      19      19|     80      80    1406    1408
  0  36909                       :tick-3sec 2006 Nov 24 17:40:23 |      43       4|   0    7    3|      3     701     121|      3       3|     10      10      13      13|     64      64     285     285
  0  36909                       :tick-3sec 2006 Nov 24 17:40:26 |      73      18|   0   11    4|      4     948     171|      4       4|     15      15      19      19|     77      77     700     700
  0  36909                       :tick-3sec 2006 Nov 24 17:40:29 |     127      31|   0   10    4|      4     968     189|      4       4|     14      14      18      18|     74      74     584     584
  0  36909                       :tick-3sec 2006 Nov 24 17:40:32 |      10       3|   0    6    0|      1     203      41|      0       0|     58       8      62      62|     23      22     106     106
  0  36909                       :tick-3sec 2006 Nov 24 17:40:35 |      19       5|   0    5    0|      0       0       0|      0       0|      8       5      13      13|     30      30     200     200
  0  36909                       :tick-3sec 2006 Nov 24 17:40:38 |      16       5|   0    5    1|      0       0       0|      0       0|      5       6      10      10|     27      26     558     559
  0  36909                       :tick-3sec 2006 Nov 24 17:40:41 |       1       0|   0    4    0|      0       0       0|      0       0|      7       4      11      11|      9       9      34      34
  0  36909                       :tick-3sec 2006 Nov 24 17:40:44 |       0       0|   0    4    0|      0       0       0|      0       0|      7       4      11      11|      8       8      28      28
  0  36909                       :tick-3sec 2006 Nov 24 17:40:47 |       0       0|   0    6    0|      1     744      81|      1       1|     10       6      15      15|     14      14      33      33
  0  36909                       :tick-3sec 2006 Nov 24 17:40:50 |       1       0|   0    5    1|      0       0       0|      0       0|      8       6      14      14|     11      11      49      49
  0  36909                       :tick-3sec 2006 Nov 24 17:40:53 |       0       0|   0    4    0|      0       0       0|      0       0|      9       4      12      12|      6       7      28      28
  0  36909                       :tick-3sec 2006 Nov 24 17:40:56 |       0       0|   0    5    0|      0       0       0|      0       0|      8       5      13      13|     12      12     420     420
  0  36909                       :tick-3sec 2006 Nov 24 17:40:59 |       0       0|   0    4    0|      0       0       0|      0       0|      8       4      12      12|      9       8      30      30
  0  36909                       :tick-3sec 2006 Nov 24 17:41:02 |       0       0|   0    4    1|      0       0       0|      0       0|     12       5      16      16|      7       8      25      25
  0  36909                       :tick-3sec 2006 Nov 24 17:41:05 |     165      41|   0   48   60|      0       0       0|      1       1|     23     106      71      71|     96      97    1236    1236
  0  36909                       :tick-3sec 2006 Nov 24 17:41:08 |     178      28|   0   15   53|      4     965     206|      4       4|     68      68      75      75|    130     130    1336    1336
  0  36909                       :tick-3sec 2006 Nov 24 17:41:11 |     106      23|   0   27   35|      4     855     166|      4       4|     82      82      91      91|    115     114    1040    1040
  0  36909                       :tick-3sec 2006 Nov 24 17:41:14 |     198      37|   0   41   70|      4    1189     196|      4       4|    185     185     185     185|    134     135    1327    1327
  0  36909                       :tick-3sec 2006 Nov 24 17:41:17 |      16       5|   0    9    5|      4     940     161|      3       3|     17      17      20      20|     43      42     234     234
  0  36909                       :tick-3sec 2006 Nov 24 17:41:20 |     162      35|   0   13    8|      4     958     200|      4       4|     23      23      28      28|     80      81    1018    1018
  0  36909                       :tick-3sec 2006 Nov 24 17:41:23 |      44       6|   0    6    3|      2     544      81|      3       3|      8       8      11      11|     63      63     747     747
  0  36909                       :tick-3sec 2006 Nov 24 17:41:26 |     150      34|   0   13    6|      4     921     199|      4       4|     21      21      25      25|     73      72     923     923
  0  36909                       :tick-3sec 2006 Nov 24 17:41:29 |      43       3|   0    5    2|      2     506      81|      2       2|      7       7       9       9|     57      57     260     260
  0  36909                       :tick-3sec 2006 Nov 24 17:41:32 |     157      37|   0    9    3|      4     978     199|      3       3|     13      13      16      16|     73      72     970     970
  0  36909                       :tick-3sec 2006 Nov 24 17:41:35 |      43       3|   0    7    3|      2     512      85|      3       3|      9       9      12      12|     61      62     274     274
  0  36909                       :tick-3sec 2006 Nov 24 17:41:38 |     127      29|   0    8    3|      4     994     185|      3       3|     11      11      14      14|     68      68    1265    1265
  0  36909                       :tick-3sec 2006 Nov 24 17:41:41 |      66      11|   0   10    4|      4     973     171|      4       4|     14      14      18      18|     67      67     354     354
  0  36909                       :tick-3sec 2006 Nov 24 17:41:44 |      48      10|   0    8    3|      3     785     128|      3       3|     11      11      14      14|     52      51     399     399
  0  36909                       :tick-3sec 2006 Nov 24 17:41:47 |     142      31|   0   12    4|      4     913     192|      5       4|     17      17      23      23|     89      90     830     830
  0  36909                       :tick-3sec 2006 Nov 24 17:41:50 |      64      13|   0   11    5|      4     853     168|      4       5|     15      15      18      18|     75      75     542     542
cause for the performance degregation were a couple of dozens 'qsub -sync y' as explained in [http://gridengine.sunsource.net/issues/show_bug.cgi?id=2125 qmaster performance gets throttled if qsub -sync y is used when many jobs are in the system] === Example #2 === This case shows how the 6.0u8+ based DTrace script ([http://gridengine.sunsource.net/files/documents/7/97/SGE60-dtrace.tar.gz SGE60-dtrace.tar.gz]) helped to understand the cause of sporadic 'qmaster alive timeout expiring' loggings
                Time |    #wrt  wrt/ms|#rep #gdi #ack|   #dsp  dsp/ms    #sad|   #snd    #rcv|  #lck0  #ulck0   #lck1  #ulck1
2006 Oct 21 04:04:52 |      80    2020|  77   30   15|     15     810      30|     15      15|    199     199    1660    1660
2006 Oct 21 04:05:07 |     101    2563|  76   30   15|     15     808      36|     15      15|    197     197    1701    1702
2006 Oct 21 04:05:22 |     130    3502| 100   31   15|     15     818      40|     15      15|    232     232    2028    2027
2006 Oct 21 04:05:37 |     161    4798| 103   30   15|     15     804      47|     15      15|    224     224    2050    2050
2006 Oct 21 04:05:52 |     169    4926| 108   30   15|     15     818      50|     15      15|    230     230    2133    2136
2006 Oct 21 04:06:07 |     134    3809|  98   30   15|     15     801      41|     15      15|    219     219    1942    1942
2006 Oct 21 04:06:22 |     192    5838| 113   31   15|     15     836      53|     15      15|    245     245    2172    2170
2006 Oct 21 04:06:37 |     346   13390| 185   27   13|     14     828      89|     13      13|    296     295    3064    3063
2006 Oct 21 04:06:52 |     364   13975| 178   13    6|      7     449      80|      7       7|    231     231    2879    2879
2006 Oct 21 04:07:07 |     355   14263| 182    7    4|      3     243      72|      3       3|    212     212    2991    2992
2006 Oct 21 04:07:22 |     335   14190| 179    5    2|      2     187      70|      2       2|    207     207    2857    2858
2006 Oct 21 04:07:37 |     297   14203| 178    3    1|      2     206      90|      2       2|    190     190    2639    2641
2006 Oct 21 04:07:52 |     383   13728| 133    4    2|      3     419     219|      2       3|    153     153    2442    2442
2006 Oct 21 04:08:07 |     432   13680| 115    3    1|      2     328     190|      1       2|    126     126    1930    1931
2006 Oct 21 04:08:22 |     498   13451| 226    6    2|      2     281     142|      3       2|    251     251    3086    3085
2006 Oct 21 04:08:37 |     411   13331|  89    4    2|      2     422     245|      2       2|    114     114    1938    1939
2006 Oct 21 04:08:52 |     473   13595| 136    3    1|      2     420     264|      1       2|    147     147    2194    2193
2006 Oct 21 04:08:57 | sge_log(4, qmaster alive timeout expired)
2006 Oct 21 04:09:07 |     322   11950|  85    3    1|      0       0       0|      2       1|    100     100    1561    1561
2006 Oct 21 04:09:22 |     512    9825| 291    9    1|      1     356     215|      2       1|    325     325    3850    3849
2006 Oct 21 04:09:37 |     226   14398|   0    0    0|      1     353     215|      0       1|      0       0     843     843
2006 Oct 21 04:09:40 | sge_log(4, qmaster alive timeout expired)
2006 Oct 21 04:09:52 |     200   13072|  30    1    0|      0       0       0|      1       1|     35      35    1301    1301
2006 Oct 21 04:10:07 |     516   11657| 139    2    0|      0       0       0|      1       0|    148     148    1900    1900
2006 Oct 21 04:10:22 |     302    9965| 434    3    0|      0       0       0|      0       0|    449     450    4439    4441
2006 Oct 21 04:10:37 |     404   12147|  10    5    1|      2     759     456|      1       2|     26      25    1214    1214
2006 Oct 21 04:10:44 | sge_log(4, qmaster alive timeout expired)
2006 Oct 21 04:10:52 |     225   14003|   0    0    0|      0       0       0|      0       1|      0       0     834     834
2006 Oct 21 04:11:07 |     381   12033| 120    1    0|      0       0       0|      1       0|    127     127    1733    1733
2006 Oct 21 04:11:22 |     502   12540| 338    2    0|      0       0       0|      1       0|    346     346    3492    3493
2006 Oct 21 04:11:37 |     434   12547| 151    7    1|      2     708     422|      1       2|    179     179    2964    2963
2006 Oct 21 04:11:47 | sge_log(4, qmaster alive timeout expired)
2006 Oct 21 04:11:52 |     187   14300|   0    0    0|      0       0       0|      0       1|      0       0     706     706
2006 Oct 21 04:12:07 |     378   12619|  93    1    0|      0       0       0|      1       0|    100     100    1483    1483
2006 Oct 21 04:12:22 |     474   13377| 357    2    0|      0       0       0|      1       0|    365     365    3897    3897
2006 Oct 21 04:12:37 |     384    8363| 154    8    1|      1     371     220|      1       1|    184     184    2809    2809
2006 Oct 21 04:12:52 |     203   14475|   0    0    0|      1     376     220|      0       1|      0       0     770     770
2006 Oct 21 04:12:52 | sge_log(4, qmaster alive timeout expired)
2006 Oct 21 04:13:07 |     234   13167|  98    1    0|      0       0       0|      1       1|    105     105    1556    1556
2006 Oct 21 04:13:22 |     430   11001|  21    1    0|      0       0       0|      0       0|     24      24    1080    1080
2006 Oct 21 04:13:37 |     332   14139| 387    1    0|      0       0       0|      1       0|    394     394    4101    4101
2006 Oct 21 04:13:52 |     420   12256| 141    8    1|      2     820     488|      1       2|    169     169    2227    2227
2006 Oct 21 04:14:03 | sge_log(4, qmaster alive timeout expired)
2006 Oct 21 04:14:07 |     212   14377|   0    0    0|      0       0       0|      0       1|      0       0     788     788
2006 Oct 21 04:14:22 |     274   12997|  86    1    0|      0       0       0|      1       0|     93      93    1537    1537
2006 Oct 21 04:14:37 |     376   11234|  23    1    0|      0       0       0|      0       0|     26      26    1020    1020
2006 Oct 21 04:14:52 |     273   14371| 325    1    0|      0       0       0|      1       0|    332     332    3274    3274
here inadequately set-up spooling (classic over NFS) caused that qmaster was almost entirely busy with IO. This can be concluded from the interval being 15 seconds with 'wrt/ms' being up to 14.475 seconds. The qmaster alive timeout was a side implication. The bug is described under [http://gridengine.sunsource.net/issues/show_bug.cgi?id=2108 qmaster logging "scheduler tried to remove a incomplete"] = Reference = * [http://www.sun.com/books/catalog/solaris_perf_tools.xml McDougall, Richard; Mauro, Jim; and Gregg, Brendan. Solaris Performance and Tools, page 57. Prentice Hall PTR, 2007.] = See also = * [https://dtrace.developer.network.com/nonav/using-dtrace-in-grid.html Sun Grid Application Observability Using DTrace]