Tdb

Monitoring our daemons

A good part of Enstore is a system of daemons which do not run with a terminal. There is a need for Python-level (as opposed to "C-language"-level) access to the daemons in order to understand their behavior better and to help debug them.

tdb is a package which allows one or more people to monitor a running program, inspecting it at the Python level. The monitor also allows one person to run the Python debugger, pdb, if needed.

Access to the monitor is by telnet. The necessary setup is done automatically in the Enstore Dispatching Worker object, if threading is available. Therefore, all servers which inherit from dispatching worker can be debugged and monitored using this tool. The dispatching worker implementation uses the same number for its "well known" TCP port as for the server's well-known UDP service port. The host name used right now is localhost, as there is no security (yet) in the package.
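For example, if a server's well-known UDP port were 7500 (a made-up number; substitute the port of the server you care about), you would connect from that server's host with:

	telnet localhost 7500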

Monitors

There are three kinds of monitoring supported by the package. The most lightweight is non-blocking, with no use of the Python language's tracing facilities. Do not confuse the native Python trace with Ron Rechenmacher's Trace module; the native Python trace facility calls a Python subroutine at least at each subroutine call in the main application, and perhaps for every Python source line in the main application.

More heavyweight is the non-blocking, tracing monitor. This records the Python stack as the main thread runs. Given the stack, a user can print a snapshot of the stack and print the local variables at each level, while the program continues to run. Right now, the snapshot is taken each time the main thread executes a line of Python. In the future, I think I can modify this code to remember the Python stack only at each subroutine call. If this is done, the overhead of this kind of monitoring could be acceptable for use in production.

Most heavyweight is stopping the main thread to run pdb in it. Right now this is possible only if the program is not blocked in, say, an I/O call or a select.

Non-blocking, non-tracing monitor

The package has a non-interfering monitor. This neither blocks the server's main thread nor intrinsically degrades the performance of the server. The following commands are of this type:

list

List lines from a Python source file. Specify the full path name, omitting the .py suffix.
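For example (the path is hypothetical; give the full path of the source file you want to see):

	tdb>>list /full/path/to/some_module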

modules

List modules imported into this program.

who "module"

List the variables at global scope in "module". You can look at these variables in a detailed fashion using the eval command, described below.

whoall "module"

List all variables at global scope in "module", along with their values. You can look at these variables in a detailed fashion using the eval command, described below.
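For example, one might inspect the global scope of the main program, assuming the module name is given as a quoted string as in the headings above:

	tdb>>who "__main__"
	tdb>>whoall "__main__"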

help

Print a help message.

quit

Quit the monitor and close the telnet session.

eval

Calls eval with an arbitrary expression. The remainder of the line is passed to the eval function in the scope of the tdb monitor. It may be necessary to extend the scope of the tdb monitor with import statements if your command does not execute as expected.

The module used as the main program will not have the name you expect. For example, if you telnet-ed to the configuration_server, you would expect that you could access configuration_server.__name__. This is not so. In Python, the "main" module has the name __main__. To see variables in the main program you may have to import __main__ in the monitor and then qualify the variable names with the module name __main__.


	tdb>>import __main__

	tdb>>eval __main__.__name__
	'__main__'

	tdb>>

exec

Issue an exec statement with arbitrary Python statements. The remainder of the line is attached to an "exec" statement. The statement is exec-ed in the context of the tdb monitor. It may be necessary to extend the scope of the tdb monitor with import statements if your command does not execute as expected. See the discussion of eval.
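For example, one might set a variable in the monitor's scope and then look at it with eval. This is an illustrative session, with the echoed value following the pattern of the eval example above:

	tdb>>exec x = 42
	tdb>>eval x
	42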

import

Import a module into the name space seen by the monitor's exec and eval commands. It has one parameter, the name of the module to import.
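For example, to make the standard os module available and then use it in an eval, which would print the server's process id:

	tdb>>import os
	tdb>>eval os.getpid()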

Non-blocking, tracing monitor

If you are willing to accept the overhead of using the Python tracing facility, which runs Python code in between the main program's statements, the system can record the current Python stack, which can, in turn, be displayed by the tdb monitor. The overhead is making a two-element dictionary and saving a reference to it in a variable in the tdb global name space.

This feature may be most useful when a process "hangs" and you want to know where it is. Unfortunately, once the trace is enabled, the interpreter must execute at least one instruction before you can see the stack. This is due to the mechanics available to me through Python.
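The idea is roughly the following sketch. This is not the tdb package's actual code; the names last_seen, start_monitoring, and dump_stack are invented for illustration.

	import sys

	last_seen = {'frame': None, 'event': None}   # the two-element record

	def _trace(frame, event, arg):
	    # The interpreter calls this for each traced event; remember the
	    # newest frame so another thread can inspect it later.
	    last_seen['frame'] = frame
	    last_seen['event'] = event
	    return _trace        # keep receiving 'line' events in this frame

	def start_monitoring():
	    # sys.settrace installs the trace function for the calling thread only.
	    sys.settrace(_trace)

	def dump_stack():
	    # Walk from the most recently recorded frame back toward the main
	    # module, printing file, line number and function name, in the
	    # spirit of pdb's "where".
	    frame = last_seen['frame']
	    while frame is not None:
	        code = frame.f_code
	        print('%s(%d) %s' % (code.co_filename, frame.f_lineno, code.co_name))
	        frame = frame.f_back

Because the trace function only fires when the monitored thread executes another event, nothing is recorded until that thread runs at least one more instruction, which is the limitation mentioned above.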

Two additional commands will dump the saved stack information:

main

Dumps a printout of the stack, very much like the pdb command "where". For each stack frame, the file, the line number within the file, and the program text for that line are dumped. This does not stop the main thread. Issuing this command repeatedly gives a kind of crude trace of the main thread.
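A session might look something like this; the path, line number, and source text are invented, and the exact layout of the real output may differ:

	tdb>>main
	/full/path/to/some_server.py(123)
	        status = self.handle_request()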

mainwhoall

Prints the stack plus the local variables and their values for each stack frame. This does not stop the main thread. Issuing this command repeatedly gives a kind of crude trace of the main thread.

pdb

If your program is not blocked, for example in a read or select call, then you can invoke and use pdb on the main thread, and you can execute all normal pdb commands. A side effect of this is that sys.stdin and sys.stdout are directed to your telnet session.
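For example, after entering the debugger you can use ordinary pdb commands such as where and list (their output is omitted here):

	tdb>>pdb
	(Pdb) where
	(Pdb) list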

It is a bug, but also true, that you cannot properly quit the debugger and let your program run. Right now, you have to tear the program down when you quit pdb. As best I can tell, this is not a consequence of the mechanisms in Python; I just need to write the code to support this, and have not yet done so.