SysManager

A system for monitoring many networked machines from a single interface.

Authors: Jason Carlyle and Phil White

Introduction

The SysManager system monitors several aspects of networked machines. The user interface provides at-a-glance information and warns the user when conditions on any machine change in a way that requires attention.

The SysManager system is composed of three main parts: Sysmanagerd, Collectord, and the SysManager interface.

Sysmanagerd

Sysmanagerd is the daemon process that runs on each computer you want to collect information from. It gathers the following information about the machine it runs on:

General Requirements to run Sysmanagerd:

Machines running Sysmanagerd should be on the same LAN as the machine running Collectord. SysManager still works if they are not, but Collectord may see additional latency when polling those machines.

Linux Requirements:

The Linux version of Sysmanagerd requires /proc filesystem support in order to gather the load average.
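On Linux, the kernel exposes the load averages in /proc/loadavg, whose first three whitespace-separated fields are the 1-, 5- and 15-minute load averages. The following Python sketch shows how such a value can be read. It only illustrates the idea; Sysmanagerd's own source is not shown in this document, and the function names `parse_loadavg` and `read_loadavg` are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    # Only the first three fields are load averages; the rest
    # (runnable/total tasks and the most recent PID) are ignored.
    one, five, fifteen = (float(f) for f in fields[:3])
    return one, five, fifteen


def read_loadavg(path: str = "/proc/loadavg") -> Tuple[float, float, float]:
    """Read and parse the load averages (requires /proc support, i.e. Linux)."""
    return parse_loadavg(Path(path).read_text())
```

For example, `parse_loadavg("0.42 0.35 0.30 1/123 4567")` returns `(0.42, 0.35, 0.3)`.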

Solaris Requirements:

The Solaris version of Sysmanagerd requires the uptime command to be installed in /usr/bin in order to gather the load average properly.
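Without /proc, the load average has to be taken from the text that uptime prints, which ends with something like "load average: 0.04, 0.05, 0.06". The Python sketch below runs /usr/bin/uptime and extracts those three numbers. It assumes that output format and is not Sysmanagerd's actual implementation; `parse_uptime` and `read_uptime` are hypothetical names.

```python
import subprocess
from typing import Tuple


def parse_uptime(output: str) -> Tuple[float, float, float]:
    """Extract the 1-, 5- and 15-minute load averages from uptime's output."""
    marker = "load average:"
    idx = output.find(marker)
    if idx < 0:
        raise ValueError("no load average found in uptime output")
    # Text after the marker looks like " 0.04, 0.05, 0.06\n";
    # float() ignores the surrounding whitespace.
    parts = output[idx + len(marker):].split(",")
    one, five, fifteen = (float(p) for p in parts[:3])
    return one, five, fifteen


def read_uptime(cmd: str = "/usr/bin/uptime") -> Tuple[float, float, float]:
    """Run uptime from /usr/bin and return its load averages."""
    result = subprocess.run([cmd], capture_output=True, text=True, check=True)
    return parse_uptime(result.stdout)
```

Splitting only the text after the marker matters because the uptime portion of the line (for example "up 3 day(s),  2:10,") also contains commas.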

Installation of Sysmanagerd:

From Source:

Upon untarring the SysManager package, cd to the sysmanager1.0/sysmanagerd directory. Depending on your operating system, type one of the following:

The binary will be named sysmanagerd, and it will be placed in the sysmanager1.0/sysmanagerd directory.

Instructions on Running Sysmanagerd:

Sysmanagerd runs as a daemon process. When starting Sysmanagerd, you must specify the DNS hostname or IP address of the machine running Collectord with the '-s' option.

Here is an example of running Sysmanagerd using the machine "server.sysmanager.org" as the server:

sysmanagerd -s server.sysmanager.org

If the administrator so desires, Sysmanagerd can be started from within the init scripts of each client.

Collectord

Collectord is the daemon process that runs on the server that collects the information. Collectord does not collect information about the machine it is running on unless Sysmanagerd is also running on that machine.

Installation of Collectord:

From Source:

Upon untarring the SysManager package, cd to the sysmanager1.0/collectord directory. Depending on your operating system, type one of the following:

The binary will be named collectord, and it will be placed in the sysmanager1.0/collectord directory. It is up to the administrator to decide where to install collectord, but /usr/sbin is a good place for it.

Instructions on running Collectord:

Collectord runs as a daemon process. To start Collectord, type:

collectord

This command can also be added to the init scripts of the server if desired.

SysManager User Interface

Requirements to run the SysManager User Interface:

The user interface is a Tcl/Tk application, so it will run on any machine with the Tcl/Tk interpreter 'wish' installed. The 'xterm' program should also be installed in order to open terminals to client machines.

Installation of the Sysmanager User Interface:

Upon untarring the SysManager package, cd to the sysmanager1.0/sysmanager directory. Then, execute the following command:

make

This attempts to locate wish4.1, wish4.2, or wishx on your system and to create the appropriate sysmanager executable. If this command fails, make sure that wish4.1, wish4.2, or wishx is installed on your system and is in your path.

Instructions on running the Sysmanager User Interface:

The Sysmanager user interface can be started from within the 'sysmanager' directory by typing the command:

sysmanager server [-p port]

'server' should be replaced with the hostname or IP address of the machine running Collectord. The port can optionally be specified with '-p'.

Upon starting up, SysManager will attempt to contact the server and print a message indicating whether the attempt succeeded. If it did not, SysManager will retry the connection once every polling interval (five seconds by default). Once the connection is made, SysManager receives information about all client machines, displays them in the machine display, and logs the new machines to the message display.
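The retry behavior described above amounts to a simple loop: try to connect, and on failure wait one polling interval before trying again. The Python sketch below shows that loop under the assumption that a connection attempt either returns a connection object or None. The user interface itself is written in Tcl/Tk, so this is only an illustration; `connect_with_retry` and its parameters are hypothetical, and the `sleep` and `log` hooks exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], Optional[T]],
    interval: float = 5.0,               # default polling interval, in seconds
    max_attempts: Optional[int] = None,  # None: keep retrying forever
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> Optional[T]:
    """Call connect() until it returns a connection, pausing between attempts."""
    attempt = 0
    while True:
        attempt += 1
        conn = connect()
        if conn is not None:
            log(f"Connected to server on attempt {attempt}")
            return conn
        log(f"Connection attempt {attempt} failed")
        if max_attempts is not None and attempt >= max_attempts:
            return None
        sleep(interval)  # wait one polling interval before the next try
```

In a real client, `connect` might wrap `socket.create_connection((server, port))` and return None when it raises `OSError`.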

Using the SysManager User Interface:

The user interface consists of three parts: the machine display, the information display, and the message display.

The Machine Display



The machine display contains icons representing the machines that are currently connected to the server daemon. Clicking on an icon in the machine display will cause that machine's information to be displayed in the information display. Double clicking on an icon will open a terminal to the respective machine. The icons themselves can be moved in the display by simply dragging them.

As new machines become known to Collectord, they appear in the machine display, and they disappear from it if Collectord loses contact with them.
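Keeping the machine display in step with Collectord boils down to comparing the machines currently shown with the machines Collectord reports. The Python sketch below assumes the interface receives a complete list of known machines on each update; the actual protocol between the interface and Collectord is not described here, and `diff_machines` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple


def diff_machines(displayed: Set[str], reported: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) machine names, each sorted.

    added:   reported by Collectord but not yet shown (new icons to draw)
    removed: shown but no longer reported (icons to take away)
    """
    reported_set = set(reported)
    added = sorted(reported_set - displayed)
    removed = sorted(displayed - reported_set)
    return added, removed
```

For example, if "alpha" and "beta" are displayed and Collectord now reports "beta" and "gamma", the result is `(["gamma"], ["alpha"])`.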

Auto-cycling The Machines

The machine display will automatically cycle between machines by default. Selecting a machine will deactivate this feature. Selecting "Cycle Machines" from the "Settings" menu will toggle it.
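The cycling rules above (on by default, turned off by selecting a machine, toggled from the "Settings" menu) can be captured in a small state object. In this Python sketch, `tick()` stands for whatever fires once per display cycling interval; the class and its method names are hypothetical and do not come from the SysManager source.

```python
from typing import List, Optional


class MachineCycler:
    """Tracks which machine is shown and whether auto-cycling is active."""

    def __init__(self, machines: List[str]) -> None:
        self.machines = list(machines)
        self.index = 0
        self.cycling = True  # cycling is on by default

    def current(self) -> Optional[str]:
        return self.machines[self.index] if self.machines else None

    def tick(self) -> Optional[str]:
        """Called once per display cycling interval; advances only while cycling."""
        if self.cycling and self.machines:
            self.index = (self.index + 1) % len(self.machines)  # wrap around
        return self.current()

    def select(self, name: str) -> None:
        """Clicking a machine shows it and turns cycling off."""
        self.index = self.machines.index(name)
        self.cycling = False

    def toggle_cycling(self) -> None:
        """Same effect as choosing "Cycle Machines" from the "Settings" menu."""
        self.cycling = not self.cycling
```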

The Information Display



The information display shows statistics about the currently selected machine. The following statistics are displayed:

The Message Display



The message display prints information about the client machines as configured in the preferences. It warns about the following events on any machine:

It also reports when the display is cleared, when a terminal is opened, and when communication between the interface and Collectord is lost.

The Clear Messages button, which appears below the message window, will clear the messages currently in the display. This action will also be logged in the display.

The Show All Warnings button will cause SysManager to examine the data it is currently holding about all the machines, and display any applicable warnings. This is useful if the messages in the window are old, and the user would like to see if they still pertain to current conditions.

The Preferences Panel



The preferences panel allows you to change many variables the SysManager system uses to operate.

The preferences panel can be activated by choosing "Preferences..." under the "Settings" menu. The variables that can be set are described below, along with how each one affects the behavior of the system. Settings take effect as soon as the bars are moved; the 'OK' button merely closes the window.

Polling Interval

The polling interval is the amount of time Collectord waits between successive polls of each machine. It applies to every machine individually, not to the group as a whole: with a 10-second polling interval, a single client is polled every 10 seconds, and with two clients, machine A and machine B, each of them is still polled every 10 seconds.

Collectord will not use a polling interval, in seconds, shorter than the number of machines it is monitoring. Setting a polling interval lower than the number of machines therefore has the same effect as setting it equal to the number of machines.
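The floor on the polling interval is a single `max`. The Python sketch below also builds a poll schedule, but the schedule rests on an assumption this document does not confirm: that Collectord spreads its polls evenly across the interval, which together with the floor would mean at most one machine is polled in any given second. Both function names are hypothetical.

```python
from typing import Dict, List


def effective_polling_interval(requested: int, num_machines: int) -> int:
    """Interval (seconds) actually used: never shorter than the machine count."""
    return max(requested, num_machines)


def poll_schedule(machines: List[str], requested: int, horizon: int) -> Dict[str, List[int]]:
    """Seconds in [0, horizon) at which each machine would be polled.

    ASSUMPTION: polls are staggered evenly within each interval. Because the
    interval is at least len(machines), the offset step is at least 1 second,
    so no two machines share a polling second.
    """
    interval = effective_polling_interval(requested, len(machines))
    step = interval // max(len(machines), 1)
    return {
        name: list(range(i * step, horizon, interval))
        for i, name in enumerate(machines)
    }
```

With machines A and B and a requested interval of 10 seconds, `poll_schedule(["A", "B"], 10, 30)` gives A at seconds 0, 10 and 20 and B at 5, 15 and 25: each machine is polled every 10 seconds, as the text describes.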

Display Cycling Interval

The display cycling interval is the amount of time the machine display will wait before cycling the display to the next machine. (This is only applicable if "Cycle Machines" is turned on in the "Settings" menu.)

Low Disk Space Warning

This setting specifies the percentage of free disk space remaining at which the message display will print a warning.
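The check behind this warning compares free space, as a percentage of the total, against the setting. The Python sketch below reads "at which" as "at or below"; that reading is an assumption, as are the function names. The same check applies unchanged to the Low Swap Space Warning with swap figures in place of disk figures.

```python
def free_percent(free: int, total: int) -> float:
    """Percentage of the total space that is still free."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * free / total


def low_space_warning(free: int, total: int, threshold_percent: float) -> bool:
    """True when free space has fallen to or below the warning percentage.

    ASSUMPTION: the warning fires at the threshold as well as below it.
    """
    return free_percent(free, total) <= threshold_percent
```

With a 10% threshold, a 100 GB disk with 5 GB free triggers the warning and one with 50 GB free does not.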

Rapid Disk Space Threshold

This setting specifies the percent change in free disk space during one polling interval required to cause the message display to print a warning.
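"Percent change" can be read in more than one way. The Python sketch below assumes it means the change in free space between two consecutive polls, measured in percentage points of total capacity, and that a sudden rise counts as well as a sudden drop. Those readings and the function name are assumptions, not documented SysManager behavior. The Rapid Swap Space Threshold would use the same check with swap figures.

```python
def rapid_change_warning(prev_free: int, curr_free: int, total: int,
                         threshold_percent: float) -> bool:
    """True when free space moved by at least threshold_percent of the total
    between two consecutive polls.

    ASSUMPTION: the change is measured in percentage points of total
    capacity, and either direction (drop or rise) counts.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    change = 100.0 * abs(curr_free - prev_free) / total
    return change >= threshold_percent
```

For example, on a 1000 MB disk with a 10% threshold, free space falling from 500 MB to 300 MB in one polling interval (20 points) triggers the warning, while a fall to 480 MB (2 points) does not.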

Low Swap Space Warning

This setting specifies the percentage of free swap space remaining at which the message display will print a warning.

Rapid Swap Space Threshold

This setting specifies the percent change in free swap space during one polling interval required to cause the message display to print a warning.

High Load Average Warning

This setting specifies the load average at which a warning will be printed to the message display. It must be equal to or lower than the Critical Load Average Warning; if the two are equal, only the critical message is displayed. An attempt to set this value above the critical setting will raise the critical setting to match.

Critical Load Average Warning

This setting specifies the load average at which a critical warning will be printed to the message display. It must be equal to or greater than the High Load Average Warning; if the two are equal, only the critical message is printed. An attempt to set this value below the high setting will lower the high setting to match.
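The two load-average settings are linked: moving one past the other drags the other along, and a load that reaches both produces only the critical message. The Python sketch below models that behavior. It assumes a warning fires when the load reaches (is greater than or equal to) a setting, and the class and method names are hypothetical.

```python
from typing import Optional


class LoadThresholds:
    """High and critical load-average settings, kept so that high <= critical."""

    def __init__(self, high: float, critical: float) -> None:
        self.high = min(high, critical)
        self.critical = critical

    def set_high(self, value: float) -> None:
        self.high = value
        if self.critical < value:  # raising High past Critical raises Critical too
            self.critical = value

    def set_critical(self, value: float) -> None:
        self.critical = value
        if self.high > value:  # lowering Critical below High lowers High too
            self.high = value

    def classify(self, load: float) -> Optional[str]:
        """Return the single message level for a load average, or None."""
        if load >= self.critical:
            return "critical"  # only the critical message, even if high == critical
        if load >= self.high:
            return "high"
        return None
```

With High at 2.0 and Critical at 4.0, a load of 2.5 produces the high warning and 4.0 produces only the critical one. Calling `set_high(5.0)` then moves both settings to 5.0.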

High Process Quantity Threshold

This setting specifies the number of processes at which a warning will be printed.