Tdb
Monitoring our daemons
Enstore is in good part a system of daemons, which do not run with a
terminal. There is a need for Python-level (as opposed to "C-language"-level)
access to the daemons in order to understand their behavior better and
to help debug them.
tdb is a package which allows one or more people to monitor a running
program, inspecting it at the Python level. The monitor also allows
one person to run the python debugger, pdb, if needed.
Access to the monitor is by telnet. The necessary setup is done
automatically in the Enstore Dispatching Worker object, if threading is
available. Therefore, all servers which inherit from dispatching worker can
be debugged and monitored using this tool. The dispatching worker
implementation uses the same number for its "well known" TCP port as the
well-known UDP port for the server's services. The host name used right now is
localhost, as there is no security (yet) in the package.
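As a rough illustration of this shape of monitor, here is a minimal sketch
(not the actual Enstore code): a listening socket on localhost served by a
daemon thread, feeding each telnet line to a command handler. The names
start_monitor and handle_command are hypothetical.

```python
import socket
import threading

def start_monitor(handle_command, port=0):
    """Listen on localhost and feed each received line to handle_command.

    Returns the listening socket so the caller can discover the port.
    Illustrative sketch only; not Enstore's implementation.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))  # localhost only: no security yet
    listener.listen(1)

    def serve():
        conn, _ = listener.accept()
        stream = conn.makefile("rw")
        stream.write("tdb>>")          # prompt, as in the telnet session below
        stream.flush()
        for line in stream:            # one monitor command per line
            stream.write(handle_command(line.strip()) + "\ntdb>>")
            stream.flush()
        conn.close()

    worker = threading.Thread(target=serve)
    worker.daemon = True  # the monitor must not keep the server alive
    worker.start()
    return listener
```

In the real package the TCP port number is fixed (the same number as the
server's well-known UDP port) rather than ephemeral.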
Monitors
There are three kinds of monitoring supported by the package.
The most lightweight is non-blocking, with no use of the Python language's
tracing facilities. (Do not confuse the native Python trace with Ron
Rechenmacher's Trace module.) The native Python tracing facility calls a
Python subroutine at least at each subroutine call in the main application,
and perhaps for every Python source line in the main application.
More heavyweight is the non-blocking, tracing monitor. This records the
python stack as the main thread runs. Given the stack, a user can print a
snapshot of the stack and print the local variables at each level, while the
program continues to run. Right now, the snapshot is taken each time the main
thread executes a line of Python. In the future, I think I can modify this
code to remember the python stack at each subroutine call. If this is done,
the overhead of this kind of monitoring could be acceptable for use in
production.
Most heavyweight is stopping the main thread to run pdb in it. Right now
this is possible only if the program is not blocked in, say, an I/O call or
a select.
Non blocking, non tracing monitor
The package has a non-interfering monitor. This neither blocks
the server's main thread nor intrinsically degrades the performance
of the server. The following functions are of this type:
list
List lines from a python source file. Specify the full path
name, omitting the .py suffix.
modules
List modules imported into this program.
who "module"
List the variables at global scope in "module". You can look at these
variables in a detailed fashion using the eval command, described below.
whoall "module"
List the variables and values of all variables at global scope in "module".
You can look at these variables in a detailed fashion using the eval command,
described below.
help
Print a help message.
quit
Quit the monitor, close the telnet session.
eval
Calls eval with an arbitrary expression. The remainder of the line is passed
to the eval function in the scope of the tdb monitor. It may be necessary to
extend the scope of the tdb monitor with import statements if your
command does not execute as expected.
The module used as the main program will not have the name you expect.
For example, if you telnet-ed to the configuration_server, you would expect
that you could access configuration_server.__name__. This is not so. In
python, the "main" module has the name __main__. To see variables in the main
program you may have to import __main__ in the monitor and then qualify
the variable names with the module name __main__.
tdb>>import __main__
tdb>>eval __main__.__name__
'__main__'
tdb>>
exec
Issue an exec statement with arbitrary python statements. The remainder
of the line is attached to an "exec" statement. The statement is exec-ed
in the context of the tdb monitor. It may be necessary to
extend the scope of the tdb monitor with import statements if your
command does not execute as expected. See the discussion of eval.
import
Import a module into the name space seen by the monitor's exec and eval
commands. Has one parameter, the module to load.
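The eval, exec and import commands can all work against one shared
namespace dictionary, which is why an import issued in the monitor is later
visible to eval and exec. A minimal sketch in modern Python (the names
handle_command and monitor_namespace are illustrative, not Enstore's):

```python
# Illustrative dispatcher for the monitor's eval/exec/import commands.
# All three commands share one namespace, so "import __main__" followed
# by "eval __main__.__name__" works as shown in the session above.
monitor_namespace = {}

def handle_command(line):
    """Dispatch one monitor command line; return the text to show the user."""
    verb, _, rest = line.partition(" ")
    if verb == "import":
        monitor_namespace[rest] = __import__(rest)
        return ""
    if verb == "eval":
        return repr(eval(rest, monitor_namespace))
    if verb == "exec":
        exec(rest, monitor_namespace)
        return ""
    return "unknown command: " + verb
```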
Non blocking, tracing monitor
If you are willing to accept the overhead of using the python tracing
facility, which runs python commands in between the main program's commands,
the system can record the current python stack, which can, in turn, be
displayed by the tdb monitor. The overhead is making a two-element
dictionary and saving a reference to it in a variable in the tdb global name
space.
This feature may be most useful when a process "hangs" and you want to
know where it is. Unfortunately, once enabled, the interpreter must
execute at least one instruction before you can see the stack. This is due to
the mechanics available to me through python.
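A sketch of how such recording can be done with sys.settrace: the trace
function only stores a reference to the most recent frame in a small
dictionary, which is the overhead described above. The names here
(recorder, last_frame, watched) are illustrative, not the package's own.

```python
import sys

# The two-element record the trace function updates; a monitor thread
# could read it at any time to see where the main thread is.
last_frame = {"frame": None, "event": None}

def recorder(frame, event, arg):
    """Trace function: remember the current frame, then keep tracing."""
    last_frame["frame"] = frame   # saving a reference is the only real cost
    last_frame["event"] = event
    return recorder               # returned so line/return events are seen too

def watched():
    a = 1
    b = a + 1
    return b

sys.settrace(recorder)   # from here on, every call/line/return is recorded
watched()
sys.settrace(None)       # stop tracing
```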
Two additional commands will dump the saved stack information:
main
Dumps a printout of the stack, very much like the pdb command "where". For
each stack frame, the file, line number within file, and program text for that
line is dumped. This does not stop the main thread. Issuing this command
repeatedly gives a kind of crude trace of the main thread.
mainwhoall
Prints the stack plus the local variables and their values for
each stack frame. This does not stop the main thread. Issuing this command
repeatedly gives a kind of crude trace of the main thread.
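A "main"-style dump can be produced from a saved frame with the standard
traceback module, formatting each level much like pdb's "where" does.
dump_stack is an illustrative name, not the monitor's actual code.

```python
import traceback

def dump_stack(frame):
    """Return a pdb-"where"-like report: file, line number and source text
    for each level of the stack leading to `frame` (illustrative sketch)."""
    lines = []
    for filename, lineno, func, text in traceback.extract_stack(frame):
        lines.append("%s(%d)%s()" % (filename, lineno, func))
        if text:
            lines.append("-> " + text)   # program text for that line
    return "\n".join(lines)
```

A mainwhoall-style dump would additionally walk each frame's f_locals
dictionary to print the local variables and their values.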
pdb
If your program is not blocked, for example in a read or select call, then
you can invoke and use pdb on the main thread, and you can execute all normal
pdb commands. A side effect of this is that sys.stdin and sys.stdout are
directed to your telnet session.
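A sketch of that side effect: pdb.Pdb accepts alternative stdin/stdout
streams, and the monitor can also repoint sys.stdin and sys.stdout at the
telnet connection's file object while the debugger runs. run_pdb_on is a
hypothetical name, and a StringIO stands in for the socket file here.

```python
import io
import pdb
import sys

def run_pdb_on(frame, session_in, session_out):
    """Run pdb over the given file-like session streams (illustrative).

    Stops the thread that owns `frame` at its next Python line; while the
    debugger runs, the process's standard streams point at the session.
    """
    saved_in, saved_out = sys.stdin, sys.stdout
    sys.stdin, sys.stdout = session_in, session_out   # the side effect
    try:
        debugger = pdb.Pdb(stdin=session_in, stdout=session_out)
        debugger.set_trace(frame)   # debugging begins at the next line event
    finally:
        sys.stdin, sys.stdout = saved_in, saved_out
```

Feeding the session a "c" (continue) command lets the traced thread resume,
since pdb removes its trace function when no breakpoints are set.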
It is a bug, but also true, that you cannot properly quit the debugger and
let your program continue to run. Right now, you have to tear the program
down when you quit pdb. As best I can tell, this is not a consequence of the
mechanisms in python; I just need to write the code to support this, and
have not done so yet.