$Id: CHANGES,v 1.2.2.4 2005/01/09 03:27:55 trockij Exp $ Changes between mon-1.0.0pre4 and mon-1.0.0pre5 Tue Dec 14 17:22:52 EST 2004 ----------------------------------------------- (some of this may have been omitted in previous changelog entries) -call_alert should not call waitpid, it only needs to call close, since perl's close will call waitpid for us. calling waitpid after closing a filehandle opened with open(FH, "|-") will always return an error, which is what it was doing. -don't quote output of "list descriptions" -carp about duplicate services and periods -added some comments to better explain the alert suppression in do_alert, added ignore_summary from head -updated explanation of alert decision logic in mon.8 man page, i.e. observe_detail and ignore_summary and strict options to alertevery -added -tcp flag to dns.monitor -fix to fping.monitor to better understand output from fping by Daniel Wallace -added http_tppnp.monitor by Jon Meek -updated ntpdate.monitor by Jon Meek -added radius.monitor -added snmpvar.monitor -added "excludewatch" to monshow -made netappfree.monitor use MON_CFBASEDIR env var -made snmpvar.monitor use MON_CFBASEDIR env var Changes between mon-1.0.0pre3 and mon-1.0.0pre4 Tue Aug 3 08:02:35 EDT 2004 ----------------------------------------------- -when allow_empty_group is not set and no host arguments to pass to a monitor, the interval wasn't being reset so it would spam the syslog with lots of "no host arguments" messages. this is fixed. -in reset_timer, there was a chance that _timer could get set to a negative value, which is not right. fixed it. -fixed the bug where lots of mon processes could accumulate if the exec of an alert failed. also fixed error handling of failed alerts. -added "show failures only" button to mon.cgi to speed it up. by Ed Ravin -small permissions fix to rpm spec file -added MON_CFBASEDIR variable to monitor and alert environment, which is set to the value of "cfbasedir" in the config file. 
-removed unfinished snmp trap handling stuff. it doesn't work at all, and it's misleading to people even though the man page says it doesn't work. -added monitor_duration and monitor_running output to opstatus detail in monshow Changes between mon-1.0.0pre1 and mon-1.0.0pre3 Mon Jul 12 09:12:29 EDT 2004 ----------------------------------------------- -changed README to refer to the new, more sensible name for the perl module client, which is mon-client -applied eric's updates to INSTALL and added a mention of monshow and mon.cgi as the web interfaces -added eric's rpm spec file (i removed the patches because they are no longer needed) -added lmb's syslog.monitor (a nifty hack) -added 'alertevery strict' code and docs, updated the README and INSTALL to mention CVS, updated CREDITS -incorporated mon.cgi 1.52 -minor addition to alert behavior explanation in mon.8 -in dialin.monitor.wrap.c, return the exit status of execv (if it fails, that is) -fixed path to perl in file_change.monitor and smtp3.monitor -added some rcs tags to identify the file versions -handle_trap_timeout now calls process_event, and it works fine with alert/upalert/alertevery/etc. as shown by my testing -received traps now reset the trap timeout counter, and fixed some other stuff wrt trap timeouts -added sub process_event and made proc_cleanup and handle_trap use it so that the alert mgmt code is shared rather than in two places. i tested as much of it as i could and all seems to work well now, especially upalert, alertafter, alertevery with traps. -added per-service "_monitor_duration" variable which records how many seconds the previous monitor took to execute. this is available via "list opstatus". if no monitor has executed yet then the value is -1. -added per-service "_monitor_running" variable whose value is 0 or 1 depending on whether the monitor is currently running for that service. -removed gunk from handle_trap regarding the various TRAP_COLDSTART, etc. 
processing, since most of it was a bad idea anyway, or at least as far as i could tell. traps and their exit values are now processed exactly as monitors are, which simplifies things greatly and adds to more intuitive functionality. this means the "spc" value in a trap is now ignored. -fixed some args processing in call_alert -fixed a bug which would prevent alerts or upalerts from being sent when call alerts is passed the "output" argument whose value is undef -remove usage of parse_line in trap processing (backported from mon 1.1 code) -make esc_str escape spaces in order to be compatible with monperl-1-0-0pre1 -added list of all possible client commands to moncmd -added --community to set the snmp community in reboot.monitor -patch to traceroute.monitor from meekj added StateDir, TracerouteOptions, StopAt config options some bugfixes to config file parsing reap children to avoid defunct processes added timeout alarm -up_rtt.monitor added -r to log individual rtts, better error reporting for tcp and udp check Changes between mon-0.99.3-47 and mon-1.0.0pre1 ----------------------------------------------- Fri Jun 18 10:35:18 EDT 2004 -removed nonsensical unless statement which would conditionally set the op status to STAT_OK. 
it should be set unconditionally
-added "strict" option to alertevery
-changed protocol to escape spaces to coincide with the change in Mon::Client

Changes between mon-0.99.2 and mon-0.99.3
Fri Jun 11 10:55:27 EDT 2004
--------------------------------------------
-updated lpd.monitor
-added "watch" parameter to monshow, submitted by Joe Rhett
-xedia-ipsec-tunnel.monitor now understands the new OIDs for sysObjectID.0 in the newer versions of the software
-fixed exclude_period parsing problem reported by Konstantin 'Kastus' Shchuka and Jeroen Moors
-fixed a setlogsock problem reported by Gilles Lamiral; added AIX to the systems which require setlogsock
-added "clientallow" restriction (trockij renamed it that from serverallow) by Ed Ravin
-added monfailures to clients directory, contributed by Ed Ravin
-patch to fping.monitor which catches more error messages from fping by Ed Ravin
-patch for minor *bsd startup nits by Ed Ravin
-patch to msql-mysql.monitor to support the more typical summary/detail output format. by Ed Ravin
-patch to phttp.monitor which corrects the uninitialized variable error by Ed Ravin
-patch to phttp.monitor to show more detail in regexp failures by Erik Inge Bolsø
-patch to imap.monitor to report the usual summary followed by details, and clarify some error messages for a couple of situations by Ed Ravin
-adjust for some current fping output (ICMP host unreachable), correct the docs for failure_interval (which is currently listed as a period def rather than a service def) from Debian users, submitted by Roderick Schertler
-another patch to fping.monitor to catch ICMP Time Exceeded failure, submitted by John Nelson
-MON_DESCRIPTION now supplied to monitors
-added "-f" to etc/S99mon
-taint fix for perl 5.8 in monshow from Roderick Schertler
-added trace.monitor, an alternate route path monitor
-changed ftp.monitor to detect no ftp server when socket opened okay.
submitted by Dan Kendall
-updates to ftp.monitor to show detail of ftp conversation
-added irc.alert
-added dns-query.monitor
-mysql.monitor - fix for deprecation of _ListTables by Aled Treharne
-updated smtp.monitor to output detail
-added "version => 2" to monitors which use the net-snmp module so that they work with net-snmp 5.0.6
-minor documentation updates
-fixed a bug with the CGI invocation of monshow which would yield the error message "premature end of script headers" when you "drilled down". bug reported by Hugh Caley
-mail.alert includes the service description in the body
-fix for alertafter timer, fix for upalertafter feature sent by Adrian Chung
-fix phttp.monitor for RFC compliance, uses \r\n everywhere in its requests. Just \n leads to a "400 Bad Request" on IIS 6.0 in native mode. sent by Erik Inge Bolsø
-fixed reboot.monitor and asyncreboot.monitor to handle counter roll-overs
-_upalertafterinterval typo fix from Michael Rademacher
-fix to phttp.monitor for EINPROGRESS from Erik Inge Bolsø
-updated file_change.monitor from Jon Meek
-dtlog: fixed a bug where blank lines from the dtlog were being output to the client, and the client was interpreting the timestamp as zero. fixed by David Nolan
-fixed qpage.alert: Only the first pager was notified when more than one was listed for a qpage.alert. The problem is that qpage returns 0 for failure and 1 for success, which is backwards from what the alert routine expects. submitted by
-updated nntp.monitor to support authentication, submitted by Kai Schaetzl/conactive.com
-unbuffered monerrfile, maybe it'll work
-fixed trap auth problem, auth.cf parsing bug submitted by danb@level7.ro
-updated mon.8 to explain how to set environment variables for each service to be passed to monitors and alerts. also removed the wording that the client handling is iterative (it is not).
-updated netappfree.monitor, submitted by Ed Ravin
-patch to fix broken upalerts submitted by Daniel Fenert
-patch to dns.monitor for added functionality

 Added -serial_threshold command line argument to allow the zone serials on the master and the slaves to differ by that much, at most. Necessary to avoid spurious errors during zone propagation. High thresholds are typically unnecessary, but when using Dynamic DNS, with zones that update hundreds if not thousands of times an hour, they can be off by quite a bit but still be OK. If propagation completely fails, eventually we'll exceed the threshold.

 Added a mode for monitoring caching only name servers. Give the -caching_only argument, and then instead of -zone and -master arguments, you specify -query arguments, which are of the form record[:type]. (With A records being the default type) So you might specify '-query myzone.com:MX -query myzone.com:A -query _servicename._udp.myzone.com:SRV' Every server will be queried for each request, and must return a valid response. But the records will NOT be cross checked against each other, as various round-robin DNS situations may cause the different servers to have different data.

 Fixed some error reporting code to format the output better

 Changed the script exit value to be the highest count of how many servers failed on a single query. (I.e. if three servers are queried, for 20 records, the highest error code possible is 3, not 20 as it was before)

 I found all of these changes to be necessary in our environment, and none of them greatly change the original behavior, so I figured they were worth submitting. I would just submit a diff, but a context diff was actually BIGGER than just sending the whole file...

 submitted by David Nolan
-fixed tiny bug in the cmdline operation of monshow which was causing the unexpected "No non-switch args expected" which was reported by
-connect STDIN to /dev/null upon daemonization even if monerrfile is specified.
-added the "monerrfile" documentation to mon.8 and explained the "all" directive in auth.cf Changes between mon-0.99.1 and mon-0.99.2 Sat Sep 8 10:06:01 EDT 2001 -------------------------------------------- -fping.monitor reports the error when it gets a return value from fping which it doesn't recognize. this could have been the cause of some phantom alerts reported w/empty summary lines. -fixed comments in CHANGELOG -andrew ryan patch to fix checkauth and some monerrfile fixes, theo's fix for alertevery. this fixes the "cannot connect to mon server" problem with mon.cgi. -andrew ryan patch to open/close dtlog for each entry, renamed open_dtlog to init_dtlog -updated KNOWN-PROBLEMS Changes between mon-0.38.21 and mon-0.99.1 Sun Aug 19 15:18:55 EDT 2001 -------------------------------------------- ******DEFAULT BEHAVIOR HAS CHANGED FOR THE FOLLOWING FEATURES************ the following two defaults were changed, since they seem to be unintuitive to most people, based on feedback given on the mailing list. -the old "comp_alerts" is now the default. to get the old behavior, specify "no_comp_alerts" in the period section. -the default is now the old "summary" behavior for alertevery. that means that for successive failures with "alertevery" used to suppress multiple alerts, only the summary line will be used to short-circuit the alert suppression. to get the old behavior, append "no_summary" to the alertevery line. the old "summary" syntax is still permitted to help w/backwards compatibility. ************************************************************************* -cleaned up config parsing a bit -updates to up_rtt.monitor, added traceroute.monitor, smtp3.monitor, http_tpp.monitor, file_change.monitor -fixed problem where upalerts were not sent for ack'd failures -updated the sample etc/auth.cf to give examples of trap authentication -updated man page for mon to include better explanation of auth.cf syntax. 
-formatting updates to monshow, added "summary-len" option, html fixes
-fixed problem where server responded twice with an extra "220 ok" after doing a reload
-rewrote fping.monitor to return more verbose output, and to sort the failed hosts on the summary line. this was wreaking havoc with "alertevery", since the order of the failed hosts in the summary might change, even though the same hosts were failing on successive tests. added "-s ms" option which will consider hosts with a response time greater than "ms" milliseconds as failures. added "-a" option to fail only when all hosts fail, and "-T" to call traceroute on each failed host. "-h" lists options.
-made nearly all monitors output their summary line (if it is a list of hosts) in sorted order.
-updated man page for mon with more detail on the behavior of "alertevery" and "alertevery ... summary"
-added xedia-ipsec-tunnel.monitor to monitor site-to-site ipsec tunnels on a Xedia AP450 router.
-silkworm.monitor recognizes different brocade OEM'd fcal switches, ignores "absent" sensors, and has a work-around for the braindead behavior of swFCPortAdmStatus to detect offline ports.
-fix to msql-mysql.monitor to allow --port to override default port. submitted by Adrian Phillips
-stdout and stderr now can be sent to a file by specifying a filename in the variable "monerrfile". submitted by Ed Ravin
-updated dns.monitor to output only the failed hosts on the summary line.
"test config" fix, new authentication directives "!" and "AUTH_ANY". "AUTH_ANY", check and warnings for hostgroups which are defined but never used more descriptive error when m4 is not found removed second definition of disen_host and load_stat "alertafter timeval" patch, alerts for period will only be called if the service has been in a failure state for more than the length of time described by the interval, regardless of the number of failures noticed within that interval.
submitted by Andrew Ryan -more verbose error when bind(2) failure tyop fixes to mon.1 updated COPYRIGHT mon.1 is now mon.8, and references to mon.1 changed accordingly update to mon.d/Makefile to use $CFLAGS and $LDFLAGS silence some warnings in rpc.monitor.c add /usr/local/lib to standard search paths for alert.d and mon.d, and updated mon.8 make monshow run under taint mode, fixes view directory to match the docs default server for moncmd and monshow is now localhost http.monitor accepts a 302 status (moved temporarily) fixed --auth in monshow reboot.monitor now uses $MON_STATEDIR as the default state directory, and "reboot.monitor" (not "state") as default state file. FD_CLOEXEC fix update to monshow.1 submitted by Roderick Schertler -fix to pop3.monitor to produce more verbose errors fix to reboot.monitor to add --verbose option submitted by Ed Ravin -qpage.alert accepts "-v" option for verbose smtp.monitor has increased verbosity of failure details submitted by Steve Siirila -re-wrote Steve Siirila's mon.monitor to use Mon::Client and put it in mon.d -patch to do proper syslog handling on openbsd, MON_DEPEND_STATUS env variable passed to monitors submitted by Mark D. Nagel -added "failure_interval" functionality. i actually re-wrote the patch to make it a bit more proper, and renamed the parameter from "alertintervalcheck" to "failure_interval" for clarity. submitted by CHASSERIAU JeanLuc -netappfree.monitor changes Allows the monitor to give more verbose error messages which will handle multiple volumes. Instead of reporting: "1.0GB free on " it will now say: "1.0GB free on :/vol/" Fixes a bug where multiple alerts from a single filer would cause multiple entries in the summary line. Allows the monitor to handle the case where the NetApp MIB isn't available to the script. added na_quota.monitor. trockij made some small changes to it so that it will allow disable and enable to work. 
submitted by Theo Van Dinter

Changes between mon-0.38.20 and mon-0.38.21
Sun Jan 14 11:55:06 PST 2001
--------------------------------------------
-merged in Andrew Ryan's mon.testconfig.patch to enhance error detection and reporting of config file errors. a new client command "test config" loads and parses a new config file w/o committing it, and returns error conditions found.
-added foundry-chassis.monitor, detects PSU failures on Foundry chassis devices.
-update for up_rtt.monitor and added http_tp.monitor from Jon Meek.
-fixed OS detection, patch supplied by Roderick Schertler
-tiny patch to freespace.monitor which lets the user specify a min % free, submitted by Christian Lademann
-http.monitor now accepts 401 responses as success, a tweak from Tim Small
-documentation correction from Chris Snell
-added cpqhealth.monitor which detects PSU/fan/temp problems by querying the Compaq Insight manager agent on Presario systems
-save sum and dtl into last_summary and last_detail from traps, bug reported by Jan Krivonoska
-patch to correct trap decoding problem, submitted by Ramon Buckland
-a trap timeout now clears the value of last_detail
-dtlog is now written to upon reception of an "ok" trap
-patch from Gilles Lamiral which adds accuracy to scheduler's synchronous operation. this should help keep rrdmon happy.
-added silkworm.monitor to test the operational status of Brocade Silkworm FCAL switches. it should detect port, fan, psu, and temperature failures.
-fix to http.monitor from Andrew Ryan which prints the HTTP response header even if a timeout was encountered. also fixed another bug w/regards to timeout handling. i applied this fix to the following monitors: up_rtt.monitor http.monitor ftp.monitor http_t.monitor smtp.monitor pop3.monitor nntp.monitor imap.monitor
-http.monitor will allow you to supply a user agent string of your own via "-a useragent".
also "-o" will omit HTTP headers from properly working hosts (Andrew Ryan ) Changes between mon-0.38.19 and mon-0.38.20 Sat Aug 26 13:29:45 PDT 2000 -------------------------------------------- -updated some docs -http.monitor checks for 401 status code -fixed the buggered 0.38.19 release. damn you, cvs, damn you. Changes between mon-0.38.18 and mon-0.38.19 Sun Aug 20 14:28:23 PDT 2000 ---------------------------------------------- -fixed exclude_hosts (again) and tested and tested and tested and it works -patch from andrew ryan to add checkauth command -included phttp.monitor from Gilles Lamiral -changed some wording in INSTALL -first stage of new config buffering -readhistoricfile now clears out last_alerts before reading it in -added -t TRAPPORT cmdline arg -merged patch from Andrew Ryan to support multiple authtypes, including PAM support. Also fixed a bug when the user is listed in auth.cf but not in the userfile. -updated documentation of mon.1 to include PAM authentication. -removed non-portable sockaddr pack statements from monitors. -CVS has pissed me off to no end with its anomalies, so I did a sensible thing and converted the repository to prcs. prcs seems to be simple, easy to understand, not quirky, and good enough. So, if you notice that the ID version numbers in the sources have changed, this is why. -removed mon.cgi, and replaced it with a README Changes between mon-0.38.17 and mon-0.38.18 Sat Mar 4 11:24:34 PST 2000 ---------------------------------------------- -http.monitor accepts 200 and 302 -monshow changes, mostly detail output -"list opstatus" command shows more data: first_failure failure_duration exclude_hosts interval exclude_period randskew last_alert -fixed exclude_hosts Changes between mon-0.38.16 and mon-0.38.17 Sun Feb 27 20:18:46 PST 2000 ---------------------------------------------- -added "SELF:" for "depend" variable. When the config file is parsed, SELF: expands into "currentwatch:". 
-fixed some errors in mon.1
-added exclude_hosts
-added exclude_period
-removed duplicate parsing in read_cf
-"list opstatus" will now accept a list of "group,service" pairs if you don't want to list every single group and service.
-documented MON_LOGDIR and MON_STATEDIR in mon.1
-changed how args are split in client_command
-more enhancements to monshow, esp. config options and "view" support. read the man page for the details. "views" are meant to show a subset of the mon opstatus, and be configurable by the clients. for example, each department can get their own view of the systems and services which they care to monitor instead of seeing the entire list of services monitored by the server.
-added protid client command, and store PROT_VERSION as an integer for simple comparison.

Changes between mon-0.38.15 and mon-0.38.16
Sun Feb 6 16:45:55 PST 2000
----------------------------------------------
-monshow now properly displays the "last check" column in seconds, and it also displays the description, and you can click on services to get details. acknowledged failures are indicated.
-rewrote cf-to-hosts to support continuation lines
-fixed some documentation
-upalerts work with traps now, thanks to Jim Farrell
-savestate now produces an error if called w/o arguments
-a patch set submitted by Andreas J. Koenig that helps with some of the documentation
-silly "list pids" output fixed so that the output doesn't have lines beginning with numbers, which confuses Mon::Client. submitted by David Waitzman
-fixed problem with acking non-failed services
-config var that allows specification of syslog facility to use
-detail about how "use snmp" is parsed. it's now a variable in the config file, and it still doesn't really do anything.
-historicfile is re-read upon server reset.
-catching a HUP in the I/O event loop should no longer produce the "error trying to recv a trap" message in syslog.
-new config option "startupalerts_on_reset"
-new client command "list dtlog" submitted by Martha H Greenberg

Changes between mon-0.38.14 and mon-0.38.15
Sun Nov 14 11:20:23 PST 1999
----------------------------------------------
-Re-wrote dependency code, and fixed the "no upalerts with dependencies" bug.
-list opstatus output now includes a new variable called "depstatus"
-Documented the "alertafter" behavior if only 1 argument is supplied.
-Fixed a bug in the arg processing of tcp.monitor, submitted by Phillip Pollard.
-Disabling hosts which do not exist now produces an error
-Giving an invalid disable command now produces an error
-Added "list deps" command.
-If config file ends in .m4, process it with m4.
-monshow now shows --deps
-trap.alert uses opstatus found in MON_OPSTATUS or -o, and correctly reports it using "spc=".
-fixed problem where ack'ing a non-existent service is not complained about, reported by bem@cmc.net.
-"use strict"-ified the server.
-monshow now does CGI && command-line, opstatus.cgi is deprecated. see etc/example.monshowrc
-ldap.monitor now uses Net::LDAP
-summary output of successes is saved in _last_summary
-client output is hex-escaped, and received traps are un-escaped. install the Mon-0.6 perl client for this to work properly, since it includes the appropriate changes.
-renamed "reset" function. This was a BIG booboo and it was causing a core dump once in a while. "reset" is a perl built-in, which I didn't realize :(
-tags in traps are unquoted and unescaped in handle_trap. Mon::Client was changed to quote and escape all of them.
-added "numalerts" per-period variable and documented it. it controls the number of alerts sent for a failure
-added "comp_alerts" per-period variable and documented it. this var stops upalerts from being sent w/o a complementary "down" alert
-it is now possible to specify the binding address for server and trap ports. see the man page for details.
-fixed some signal handling and terminal input in moncmd.
patch provided by djw@metatel.com -patch from doke@conectiv-comm.com to correct error reporting in msql-mysql.monitor -long lines may be continued by trailing them with a backslash. read the man page for more info. -added "alerts_sent" to opstatus output Changes between mon-0.38.13 and mon-0.38.14 Mon Aug 23 10:48:42 PDT 1999 ---------------------------------------------- -Some clarification in INSTALL procedure. -Removed old patch that attempted to fix the "no upalerts with deps" problem. -Added recursion limit for deps, and the "dep_recur_limit" config parameter in the config file. -Some changes to "monitor .* ;;" parsing behavior. -telnet.monitor now uses Net::Telnet, which is more efficient than forking a copy of tcp_scan. -freespace.monitor uses the newly renamed Filesys::DiskSpace, which used to be File::Df. -added asyncreboot.monitor, which uses the UCD SNMP asynchronous API to get the uptime of a bunch of devices in parallel, similar to fping. This requires ucd-snmp-3.6.3 or greater and SNMP-1.8 or greater. -Ditch stderr in fping.monitor, submitted by felicity@kluge.net -ftp.monitor now sends "quit\r\n" -Dependency bug fixed re: $dlastChecked, reported by felicity@kluge.net -Commented out some spurious output in dns.monitor, as submitted by brad@shub-internet.org -Tiny fix to mon.cgi from Matthew Price -Fix to trap.alert to make it actually work w/o complaining about "undefined type". -Fix to opstatus.cgi for refresh, submitted by howie@thingy.com, bug ID 16. -Patch from Petter Reinholdtsen to add debug output to nntp.monitor, and -g to specify the newsgroup to test. -Re-wrote tcp.monitor to not require tcp_scan. No more dependencies on the "Satan" software, since fping is available separately. -Virtual host support in http.monitor, submitted by Neale Pickett Changes between mon-0.38.12 and mon-0.38.13 Sun Jun 13 11:18:16 PDT 1999 --------------------------------------------- -Monitors and alerts are now passed ENV variables MON_STATEDIR and MON_LOGDIR. 
-Fixes and tuning to opstatus.cgi. -monstatus has been removed. Replacement is monshow. -util/cf-to-hosts accepts -M flag to pre-process with m4. -Fixed some monshow output when service has not yet been tested. -Some adjustments to the monshow man page. -Forked monitors now close server sockets before execing the monitor. Bug ID 16 submitted by james9394@hotmail.com. -Bug re: "time" file in output of monshow. -Some minor code cleanups. -ping.monitor now recognizes netbsd. -mon.cgi uses Mon::Client, but not all the functionality has been converted to this interface, namely the "disable" and "reset" features. Changes between mon-0.38.11 and mon-0.38.12 --------------------------------------------- -Fixed "list descriptions" bug submitted by Vad Adamluk -Added "last_check" and "monitor" output to client list opstatus. opstatus.cgi uses this. Only works for 0.38.* protocol. -opstatus.cgi now uses Mon::Client, and some bug fixes and enhancements. -Removed "bind" from ftp.monitor http.monitor http_t.monitor imap.monitor nntp.monitor pop3.monitor smtp.monitor. It was unnecessary. Changes between mon-0.38.10 and mon-0.38.11 --------------------------------------------- Another small (but substantial) bug fix in call_alert which would prevent alerts from being called if $args{"args"} was passed as an undefined value. Changes between mon-0.38.9 and mon-0.38.10 --------------------------------------------- -Fixed a bug where call_alert didn't set _last_alert correctly, thus causing things like alertevery to not work properly. -Small bug fix in handle_trap_timeout -Removed some debugging junk for dtlogging -A few code cleanups here and there -Fixed @groupargs problem in call_alert Changes between mon-0.38.8 and mon-0.38.9 --------------------------------------------- -Removed %var% substitution in favor of -M, which pre-processes the config file with m4. Macro expansion should be handled by software whose sole purpose is to perform macro expansion, hence m4. 
-Added an "example.m4" in the etc/ directory. -Added "fail" trap. -Pass _op_status value to alerts via env variable MON_OPSTATUS. -Updated file.alert to log MON_OPSTATUS. -Fixed bug in client buffer handling where a blank line submitted by the client would prevent all future commands from being processed. -The server no longer disconnects the client on an invalid command. -Added "--disabled" and "--state" commands to monshow. Showing disabled hosts is no longer the default. The defaults can be set in ~/.monshowrc. This requires the latest Perl module (Mon-0.4). Also added "--old" option. -Added man page for monshow. -Updated some docs in mon.1 -Don't complain if userfile does not exist and the authtype is not userfile. -Patched in Gilles' historicfile stuff, and documented it in mon.1, and fixed some bugs. -Alerts are no longer called with -l parameter. It's never been documented, and no alerts use it, so I'm ditching it. -version command returns a value like "0.38.9" rather than a float. -Separated alert calling function from the function which determines if an alert should be called. -Alerts are now forked with a separate environment than the parent. -"test alert|upalert|startupalert" client command added, which will immediately call an alert for testing purposes. Updated the docs for moncmd to reflect this command. Changes between mon-0.38pre7 and mon-0.38.8 --------------------------------------------- -mon is now kept under CVS control (exclusively to maintain my own personal sanity). The Perl module is distributed as a separate file now, so that it can find its home in the CPAN module directory. -Documented "traptimeout" and "trapduration", and cleaned up some docs in mon.1. -Included upalerts and startupalerts in gen_scriptdir_hash() -Lots of code cleanups in read_cf. -alertafter now has two forms, one just like before, and one with a single integer argument which alerts after some number of consecutive failures. -I should have done this long ago. 
%watch now looks like this: $watch{$group}->{$service} instead of $watch{$group}[$service] and $service is the text of the service, not an integer. -Lots of code cleanups regarding global variables which are altered by read_cf. -Fixed "list successes" and "list failures" command. -Added "clear timers" command which clears the timers for things like alertafter and alertevery and such. -netappfree.monitor has some MIB reading changes which fixes the core dumping problem. -Added set_op_status. -Removed some debug cruft from check_depend. -Fix to $fhandles{"$group/$service"}. -Updated "-h" output to be accurate. -Test -f to see if an alert or monitor exists before trying to exec it. -gilles reported a problem with the servertime output, which was fixed. -"interval" initialization was supplying a default interval, which isn't cool because it didn't allow you to have a service w/o an interval for use as a trap sink. The new default is undef. -I started work on muxpect, which is sort of a combination of the mux capabilities of fping and doing Expect-style chat sequences over TCP sockets. It is meant to replace those millions of TCP-based monitors in the mon.d directory with a less CPU-intensive version. -Some alert decision code moved from proc_cleanup to do_alert where it belongs. -Changed some trap code. Changes between mon-0.38pre6 and mon-0.38pre7 --------------------------------------------- -Added "basedir=" and -b, and "cfbasedir=" and -B -use usleep. -Added startupalerts which are called upon startup. -alerts called with env variable MON_ALERTTYPE -logdir, added downtime logging via dtlogging/dtlogfile -Periods can now be specified using a LABEL: tag (similar to labeling blocks and loops in Perl). This allows multiple periods with the same period value. This feature is useful because the "alertafter" and "alertevery" counters are kept on a per-period basis. -Fixed process.monitor to use the new values for the process table in the UCD MIB. 
-Fixed a problem with reload and path/file expansion.
-Alerts are now called with MON_RETVAL set to the exit value of the monitor.
-Added trap.alert. Not quite documented.
-Added version command to Mon::Client, thanks to nagel@intelenet.net.

Changes between mon-0.38pre5 and mon-0.38pre6
---------------------------------------------

-Some small adjustments to fping.monitor.
-SNMP trap reception is now part of the I/O loop.
-Began work on handle_snmp_trap, and got rid of SNMP-related junk in handle_trap.
-Fixed problem with whitespace and monitors ending in ";;" reported by llee@stevens-tech.edu.
-mon now has an officially assigned port number from the IANA. It is 2583, and the appropriate adjustments have been made to the clients.
-Fixed sock_write in server to handle EAGAIN condition when kernel socket buffers fill up.
-Added dialin.monitor which checks to see if dial-in modem lines are operational. It requires the Perl Expect module. Documentation is in doc/README.monitors.
-Added an incomplete na_quota.monitor which is meant to monitor Network Appliance quota trees.

Changes between mon-0.38pre4 and mon-0.38pre5
---------------------------------------------

-Fixed bug #3, problem with %alias
-Fixed bug #4, problem with unpacking a socket which wasn't really a socket yet (out of order assignments)
-Renamed Client to Mon-0.01 to follow the Perl module naming convention better, and to make room for things like logging modules and such.
-Implemented more protocol commands to Mon::Client. Only 4 left...
-Adjusted nntp.monitor to allow for some protocol / implementation inconsistencies. The commands now strictly follow RFC977.
-Fixed problem with 0.38 protocol and Mon::Client.
-Added multiple authentication types, including getpwnam, shadow, and userfile. Read the man page for details.
-Added "version" client command to identify the protocol version.
-Added host && user authentication to traps. Configuration is done in auth.cf. No documentation yet.
-Added simple downtime logging, and documented it in mon.1.
-Tiny change to reboot.monitor.
-Added Mon::SNMP module to decode SNMP traps.
-Added pod to Mon::Client. I think it took as long to code it as it did to document it.

Changes between mon-0.38pre3 and mon-0.38pre4
---------------------------------------------

-Added fixes from Chris Adams that correct some $ALERTDIR and monitor argument problems.
-Fixes to monstatus from brian moore.
-Another fix to get the "exit=n" stuff working with alerts again, broken because of ALERTHASH code.
-Wrote "monshow" in the clients directory, which is a per-user configurable command-line client.
-Mon::Client perl module included to help simplify writing clients. It doesn't implement a number of commands yet. Look at the end of Client.pm to see which commands have been implemented and which have not been. "monshow" is in the clients directory, and it is an example of how to use the Mon::Client module. Mon::Client also needs POD documentation.

Changes between mon-0.38pre2 and mon-0.38pre3
---------------------------------------------

-Added "ack" client command, which will acknowledge a service failure and suppress all further alerts for that service while it continues to fail. See the moncmd man page for details. You can "ack" with a string of text.
-alertdir and scriptdir can now contain multiple colon-separated paths. This feature is useful for keeping site-specific monitors and alerts in their own directory which is separate from the monitors which are distributed with mon itself. Updated the docs for this. Each time the configuration is read, a hash is generated which holds the location where each monitor and alert script can be found. Errors are reported via syslog, so pay attention to them.
-Some "alias" code tweaks. Gilles, does it work??? If no, send the patch.
-Poked a little with the trap code.
The trap format now contains a "spc" tag which specifies the specific type of trap, like maybe SNMPv1 or SNMPv2 or "mon 0.38".
-An update to rpc.monitor to let it build under Solaris. It can now also check to see if an arbitrary RPC program number is registered. Documentation updates.

Changes between mon-0.38pre1 and mon-0.38pre2
---------------------------------------------

-Some fixes from brian moore to correct client hangups
-netappfree.monitor changes, including --list option to list the filesystems on the filers for help in building a config file.
-Trap handling changes, including packet format. More provisions for direct SNMP handling. I might add direct provisions for mon to take SNMP traps directly. UCD SNMP trap handling callback mechanism doesn't fit into mon very well.
-"list opstatus" output is now different
-Time::HiRes is now required. The trick is that handle_io() wants to spend $SLEEPINT handling I/O from clients. Some OSs allow select(2) to return the time remaining, which we want because if select returned in, say, 0.2 seconds, then we want to call select with a timeout of 0.8 seconds so that we get the full second of waiting for I/O. Some OSs do *not* return the time remaining from a select call, and time(2) doesn't return sub-second resolution, so we need gettimeofday(2) to figure out how long select spent waiting. I guess the whole point here is to try to handle traps as soon as they come in.
-Fixed @last_failures discrepancy with traps.
-Added Gilles' alias record stuff to config file
-Included Jon Meek's up_rtt.monitor which checks the availability of hosts and logs some statistics, like min/mean/max round trip times. Requires Time::HiRes and Statistics::Descriptive.

Changes between mon-0.37l and mon-0.38
---------------------------------------

-Asynchronous trap handling. A remote agent may deliver a trap to trigger some action to be performed by a centralized mon server.
-Client I/O entirely re-written to support multiple simultaneous non-blocking clients.
-New client commands: test, set opstatus, list descriptions
-Descriptions are now allowed in service definitions
-Added Gilles' my-mon.cgi web interface.
-Added Jing Tan's dependency code
-When a service comes back up, it resets _first_failure so that alertevery does the right thing.
-When handling a "term" from the client, kill -15 children instead of -9.
-A fix from brian moore which corrected the client timeouts.
-Added "servertime" client command.
-Fixed moncmd to be more batch-friendly.
-Some security patches to mon.cgi from Roderick Schertler, plus changes to mon and the documentation for Debian compatibility.
-Added "reload auth" command, which reloads the authentication table.
-Added per-service environment variable passing to monitor and alert scripts.
-Fixed '"no summary" with upalerts' problem reported by Eric Buda. The output of successful monitors could be lost under certain circumstances.
-Fixed a small problem with upalerts reported by Josh Wilmes. Upalerts would be triggered for everything the first time mon is started.
-process.monitor may optionally not load the MIBs upon startup.
-"-A" option would not make itself relative to the directory that mon was started from.
-netpage.alert now calls sendmail rather than "mail -s". Another fix from Josh Wilmes.
-A trivial tweak to nntp.monitor.
-Fixed problem with hostgroups named with periods reported by several people. This would cause a monitor process to not ever get cleaned up.
-Changed how load_auth handles errors
-fping.monitor adds a newline (right after removing it with tr :)
-changed the debug behavior to allow multiple debug levels

Changes between mon-0.37k and mon-0.37l
---------------------------------------

-Config parser change from Michael Griffith that complains when "alertafter" will never trigger an alert.
-Added "savestate" and "loadstate".
Currently these only save and load the state of things disabled.
-The server now can authenticate clients using a simple configuration file which can restrict certain users to using only some (or all) commands. "moncmd" was updated to support this feature.
-Addition of "upalerts" which may be called when a service changes state from failure to success. "upalerts" can be controlled by the "upalertafter" parameter.
-"alertevery" now ignores detailed output when it decides whether or not to send an alert. Patch submitted by brian moore.
-"hostgroup and hyphen" patch. This simple patch will allow hyphens and periods in hostgroup tags.
-Multiline output fixes in smtp.monitor
-Now monitors are not called when no host arguments are supplied. This can be overridden with the per-service "allow_empty_group" option.
-A fix to ftp.monitor by Tiago Severina which allows for multiple 220 lines in the greeting from the FTP server.
-Added snpp.alert, contributed by Mike Dorman. This requires the SNPP Perl module.
-Added ldap.monitor, contributed by David Eckelkamp. This requires the Net::LDAPapi module.
-Added dns.monitor, contributed by David Eckelkamp. This requires the Net::DNS module.
-Monitor definitions can now include shell-like quoted words, as defined by the Text::ParseWords module (included with perl5). e.g.:
    monitor something.monitor -f "this is an argument" -a arg
-"allow_empty_group" is a new per-service option. If set, monitors will still be run even if all hosts in the applicable hostgroup have been disabled. The default is that allow_empty_group is not set.
-Monitors are now forked with stdin connected to /dev/null.
-Added "stop" and "start" commands which make the server cease or resume scheduling monitors. While stopped, clients can still be handled. The server may be started[sic] in "stopped" mode with -S. There is now a "reset stopped", which is an atomic version of "reset" and "stop".
This is useful if you want to re-disable things immediately after a reset, so there will be no race conditions after the reset and before you disable things. opstatus.cgi now also reports the state of the scheduler.
-Updated documentation for monitors, the main "mon" manual, and the "moncmd" manual.
-Fixed a few problems in handle_client that had to do with shutting the server down.

Changes between mon-0.37j and mon-0.37k
---------------------------------------

-Fixed ftp.monitor, which defaulted to the SMTP port instead of FTP! Thanks to ryde@tripnet.se for pointing this out :)
-alanr@bell-labs.com added "-u" flag to http.monitor so that you can specify the URL to get.
-Added hpnp monitor, which uses SNMP to query the HP JetDirect boards in your printers, and warns you when things go awry. For example, if there is a paper jam, mon can send out email telling you exactly that, and it includes in the mail the current readout on the printer's LCD.
-Added netappfree.monitor, which uses SNMP to get the free space from Network Appliance filers. Uses a configuration file to set low-watermarks for each filer.
-Added process.monitor (thanks to Brian Moore), which queries the UCD SNMP agents to determine if there are errors with particular processes on a machine. This is very useful for monitoring those processes which seem to die off on occasion :)

Changes between mon-0.37i and mon-0.37j
---------------------------------------
Tue Apr 14 19:22:13 PDT 1998

-Configuration parser now dies when a watch is "accidentally" re-defined.
-Added process throttle to prevent the number of forked processes from going beyond a given value. This is a paranoia "safety net" setting.

Changes between mon-0.37h and mon-0.37i
---------------------------------------
Sun Apr 5 13:59:07 PDT 1998

-Added "randstart" and "randskew" parameters that can help balance out the load from services which are scheduled at the same interval.
-Added "exit=range" argument to "alert", which allows triggering alerts based on the exit status of a monitor script.
-Added an IMAP monitor, and an SNMP "reboot" monitor
-Added http_t.monitor, which times HTTP transactions
-Merged in patches supplied by Roderick Schertler
  - Changes to mon:
    - Support a pid file. This is necessary for the system's daemon control script (which stops and starts the daemon, plus tells it to reload its configuration) to work.
    - Treat SIGINT like SIGTERM (for interactive debugging).
    - Allow a `hostgroup' line in mon.cf which doesn't have any host names (useful for putting each host name on a line by itself).
    - Add `d' (meaning `days') to the list of suffixes accepted by the interval and alertevery keywords.
    - Squelch extra blank line output by alerthist and failurehist commands if there are no corresponding history entries.
    - Bug fix: fork() returns undef, not -1, on error.
    - Set umask 022, not 0.
  - Changes to mon.cgi:
    - Set -T mode.
    - Allow all local info fields to be blank, and set them that way by default.
    - Use the same default mon host as the other interfaces.
    - Use $ENV{SCRIPT_NAME} as the default $url.
    - Don't hardcode the path to mon, assume it is in the path.
    - Vet the name passed to the `list group' command. The old code would allow remote users to run arbitrary local commands.
  - Changes to opstatus.cgi:
    - Set -T mode.
    - Correct port, was 32768 should be 32777.
    - Add missing Content-Type to html_die().
  - In monstatus correct the my() line in populate_group(), and add missing $group initialization.
  - Tweak typesetting in the mon.1 and moncmd.1 man pages.

Changes between mon-0.37g and mon-0.37h
---------------------------------------
Mon Jan 19 07:22:14 PST 1998

-I didn't merge back in a change to fping.monitor which sorts the output of fping, thus causing alerts to go off unnecessarily when fping would return hosts in a different order each time it is run.
An alert is sent once every "alertevery" interval, unless the output changes. This is where it messed things up.
-added GPL header to all source files.

Changes between mon-0.37f and mon-0.37g
----------------------------------------
Sat Jan 10 10:40:26 PST 1998

-Fixed memory leak, with the help of Martin Laubach and Ulrich Pfeifer. The Perl 5.004_04 IPC::Open2 routine has a leak in it.
-Now includes the SkyTel 2-way pager interface for mon! What a hack, but it works pretty well!
-Also includes Art Chan's interactive web interface. It has buttons and graphics and all that other stuff that everyone wants!
-Removed the Perl 5.003 Sys::Syslog patches. I don't want to encourage anyone to use an outdated version of Perl, especially since there have been plenty of bug fixes since then.
-Server now handles multiple commands per client connection, and opstatus.cgi has been changed to take advantage of this. It's much faster now.

Changes between mon-0.37e and mon-0.37f
----------------------------------------
Fri Oct 3 06:14:50 PDT 1997

-Fixed a small typo in "mon.d/freespace.monitor": it would correctly detect a failure condition for low disk space, but the text that it reported was incorrect.
-As per Sean Robinson's suggestions, renamed the syslog patches to Perl 5.004 to accurately reflect what versions of Perl they patch.
-In "mon.d/http.monitor", fixed problem with what matches as a valid HTTP response. "200 OK" is incorrect, because the text that follows the 200 is undefined in the specs.