sysmon overview


SYSMON (C) 2001 by Oliver Seidel, MATA (IHK) MA PhD (Cantab)
============================================================

The SYSMON system was originally designed in the context of my
workplace: we needed a monitoring capability to supervise the
operation of various systems. The design goals were:

- robustness
- simplicity
- portability
- distribution

Since I was keen to have the system while other urgent tasks
occupied our time at work, I decided to do the implementation work
myself. My implementation has been done completely from scratch and
I hold all the rights to it. Since the design has changed
significantly during that time, I believe that most of the design is
my own work, too. I am greatly indebted to my friend and colleague,
Andreas Kirschbaum (of WINE fame), who did the better half of the
original design work and who has always been a source of insight.

With respect to adaptability and configuration, I have added the
following options:

- parser debugging
- 2x memory debugging (own tracing, Electric Fence)
- support for the old/new bzip library
- support for either libpq (the PostgreSQL C library) or
shell scripts
- enable/disable the passing of data to shell scripts
via env vars

It is possible to simply comment the respective options in or out of
the "Makefile".

Now, let me summarise the design decisions and implementation details:

- all components seek the configuration file "<name>.conf".

- the syntax of the configuration file is most concisely contained
in the file "config.y". Issuing the command "make" produces the
file "syntax.txt" which is a stripped and formatted version of
"config.y". The semantics are implicit and hopefully self-
explanatory after reading this description.

- all components accept the argument "-h" to display a help message.

- all components accept the argument "-c <filename>" to specify
the configuration file.

- all components except the receiver regularly "stat()"
(and if necessary re-read) their config file.

- all components write a logfile with a name of the form
"<name>.log" into their current working directory.
Upon failure to write to that file, an attempt is made to
write to "/tmp/<name>_help_me_.log". As soon as
the configuration file has been read successfully, the
component uses the logfile name provided there.

- all configuration files contain the directive
"log_level=<n>;" which is interpreted as follows
(higher numbers always include the information of lower ones):

- n=0: error messages (loss of continuous I/O,
configuration parsing errors) are logged
- n=1: protocol, memory, data & fopen errors are
logged once daily
- n=2: normal events (start/shutdown, reconfiguration,
start/shutdown of connections) are logged
- n=3: messages describing configuration data are emitted
- n=4: protocol, memory, data & fopen errors are
logged once hourly
- n=5: error messages are logged unconditionally
- n=6: input/output meta-data is logged (length, names)
- n=7: lines, source_names, internal states and
actions are logged (alloc list, non-loop
execution points)
- n=8: all inputs and outputs are logged
- n=9: internal data is logged also (no, IBM don't
pay me commission on hdd sales!)

The default log level can be set in the Makefile. This may
cause some messages to be emitted, or suppress them, before the
first successful interpretation of the configuration file, at
which point each particular sysmon component can be configured
individually.

Should you have a more elegant solution to the problem of
determining the log-level, or indeed to the problem of
determining a log-file name before it can be read from a
configuration file, please feel free to write to me.
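The level test itself is simple. Here is a minimal sketch in C; the
function name, the return value, and the stderr target are my own
choices for illustration, not taken from the sysmon sources (the real
components write to their configured logfile):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int log_level = 2;          /* as set by "log_level=<n>;" */

/* returns 1 if the message was emitted, 0 if it was filtered out */
static int log_msg(int level, const char *fmt, ...)
{
    va_list ap;
    if (level > log_level)
        return 0;                  /* higher n means more verbose output */
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

With log_level=2, a level-0 error message passes the gate while a
level-5 message is filtered out.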

- the data is processed along the following chain:

- n sensors : 1 watcher, all on one machine
- n watchers : 1 collector, all at one site
- 1 collector : 1 sender, both at the same site
- n senders : n receivers, each at differing sites
- 1 receiver : 1 database, both at the same site
- 1 database : 1 refinery, both at the same site
- 1 refinery : 1 server, both at the same site
- 1 server : n viewers, all at the same site

- sensors are invoked with a single integer argument s and
henceforth deliver one record every s seconds.

- records produced by the sensor are of the format:

{type<tab>title<tab>units<tab>value<newline>}+
- the 'type' may be one of { int, float, timestamp, text }.
- the 'title' may only contain alphanumerics.
- the 'units' may only contain alphabetics.
- the 'title' and 'units' may not be longer than 10 characters.
- the 'value' may not contain <tab> or <newline>.
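A record emitter obeying these rules might look as follows. The
tab-separated field order is reconstructed from the format line above,
and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats one sensor record as type TAB title TAB units TAB value NL.
   Returns the number of characters written, or -1 if a rule from the
   description above is violated or the buffer is too small. */
static int format_record(char *buf, size_t len,
                         const char *type, const char *title,
                         const char *units, const char *value)
{
    int n;
    if (strlen(title) > 10 || strlen(units) > 10)
        return -1;                 /* 10-character limit from the spec */
    if (strchr(value, '\t') || strchr(value, '\n'))
        return -1;                 /* value may not contain TAB or NL  */
    n = snprintf(buf, len, "%s\t%s\t%s\t%s\n", type, title, units, value);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```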

- the watcher is a program which reads a configuration file
and simultaneously starts several sensors. If a sensor
dies, the watcher starts another copy. Sensor output is
syntax-checked, correct lines are prepended with the following
information and handed on:

<"text"><"machine">machine_name<"name">
<"text"><"sensor">sensor_name<"name">
<"timestamp"><"timestamp">timestamp<"epoch">

- the watcher establishes a tcp connection to its collector and
the two parties perform a challenge-response protocol for
authentication:

a) upon accepting a new connection, the collector
generates a random piece of data and sends it
to the new watcher.
b) the watcher constructs a buffer of the form
"<challenge><secret>".
c) the watcher computes a mac of this buffer and sends
it to the collector.
d) the collector constructs the same buffer, computes
a mac and compares the local and remote values.
e) upon success, the collector changes the state of
that particular source from "unauthenticated" to
"authenticated" and begins using its data.
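Steps (b)-(d) can be sketched as below. The actual MAC algorithm is
not named in this document, so a toy 32-bit FNV-1a hash stands in for
it, and the buffer layout of challenge followed by shared secret is an
assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the unnamed MAC: 32-bit FNV-1a over the buffer.
   A real deployment would use a proper keyed MAC. */
static uint32_t toy_mac(const char *buf, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)buf[i];
        h *= 16777619u;
    }
    return h;
}

/* Watcher side, steps (b) and (c): build <challenge><secret>, MAC it. */
static uint32_t answer_challenge(const char *challenge, const char *secret)
{
    char buf[256];
    int n = snprintf(buf, sizeof buf, "%s%s", challenge, secret);
    return toy_mac(buf, (size_t)n);
}

/* Collector side, step (d): recompute locally, compare with remote. */
static int verify(const char *challenge, const char *secret,
                  uint32_t remote_mac)
{
    return answer_challenge(challenge, secret) == remote_mac;
}
```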

- after the authentication procedure, the watcher maintains the
tcp connection to its collector and passes the constructed
lines of text. In case of collector failure, it buffers the
input in a cache and regularly tries to re-establish the
connection.

- it is intended to install no more than one watcher on a
particular machine and to install no more than one collector
on each site.

- the collector passes its input in complete lines to the sender.

- the sender performs a syntax-check on the data.

- the sender prepends the following to each line:

<"text"><"site">site_name<"name">

- the sender filters the data according to the 'machine' and
'site' fields and inserts it into multiple output buffers.
Each destination has its own output buffer.

- the sender accumulates data for each receiver until the
configured chunk size is reached or the time limit has been
exceeded (to avoid 'starvation': a crash has occurred, no
more data is flowing, but the data just before the crash
is vital). It then compresses, encrypts and base64-encodes
the data, and sends it off with an HTTP POST request.

- the encryption algorithm is hardened TEA, used in CBC mode.

- to avoid a known-plaintext attack at the beginning of the
stream, the first 8 bytes are stuffed with random data.
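For illustration, here is plain TEA (Wheeler/Needham) in CBC mode over
8-byte blocks. The "hardened" modifications are not specified in this
document, so the textbook cipher stands in; the zero IV is likewise my
own simplification, and the first block plays the role of the random
stuffing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain TEA: one block is two 32-bit words, the key is four words. */
static void tea_encrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0, delta = 0x9e3779b9u;
    for (int i = 0; i < 32; i++) {
        sum += delta;
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    v[0] = v0; v[1] = v1;
}

static void tea_decrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], delta = 0x9e3779b9u, sum = delta << 5;
    for (int i = 0; i < 32; i++) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= delta;
    }
    v[0] = v0; v[1] = v1;
}

/* CBC chaining with a zero IV; words[0..1] would be the random
   stuffing block that defeats the known-plaintext attack at the
   head of the stream. */
static void cbc_encrypt(uint32_t *words, size_t nblocks,
                        const uint32_t k[4])
{
    uint32_t prev[2] = {0, 0};
    for (size_t b = 0; b < nblocks; b++) {
        words[2*b]   ^= prev[0];
        words[2*b+1] ^= prev[1];
        tea_encrypt(&words[2*b], k);
        prev[0] = words[2*b];
        prev[1] = words[2*b+1];
    }
}

static void cbc_decrypt(uint32_t *words, size_t nblocks,
                        const uint32_t k[4])
{
    uint32_t prev[2] = {0, 0};
    for (size_t b = 0; b < nblocks; b++) {
        uint32_t c0 = words[2*b], c1 = words[2*b+1];
        tea_decrypt(&words[2*b], k);
        words[2*b]   ^= prev[0];
        words[2*b+1] ^= prev[1];
        prev[0] = c0; prev[1] = c1;
    }
}
```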

- the receiver reverses the process by decoding the
base64-data, skipping the random header, performing the
decryption and uncompressing the chunk.

- both the sender and the receiver perform record validation,
where a record is one line.

- the receiver reads the list of available database tables and
then processes each record in turn.

- when a record for an unknown table is encountered, the table
is created.

- when a record for a known table with a different type is
encountered, the old table is renamed and a new table
is created, into which the record is subsequently inserted.

- the receiver not only inserts the records into their respective
tables, it also maintains a directory (relation 'tables') which
lists the available data sources, their types, and quite
importantly, the maximum of the timestamps inside each
particular table.

- all data insertion commands are bracketed by "begin;" and
"commit;"
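Composing such a bracketed insertion might look as follows; the table
and column names are hypothetical, and the helper only builds the SQL
text (the real receiver sends it through libpq or the shell scripts):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Wraps a data insert and the matching 'tables' directory update in
   one begin;/commit; bracket.  Returns the length, or -1 on overflow. */
static int bracketed_insert(char *out, size_t len,
                            const char *insert_sql, const char *update_sql)
{
    int n = snprintf(out, len, "begin;\n%s\n%s\ncommit;\n",
                     insert_sql, update_sql);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```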

- communication with the database backend is performed through
the postgres library or a set of shell scripts (tables.sh
nextval.sh update.sh). As mentioned above, a compile time
option in the "Makefile" controls which interface is used.
In case the shell scripts are used, the values of database
host/name/user/password are passed in environment variables.
I know that this is a security risk (try "ps axuewww" on
Linux) and thus I have added a further compile-time option
to disable this behaviour. External programs invoked via
"popen" have the same access permissions as the receiver
itself and thus may themselves read the receiver configuration
file. My examples don't, but then, I recommend the usage of
libpq anyway.

- this is where the source-driven chain ends.

- the next module in the sysmon architecture is the refinery:
this program reads its configuration file, connects to the
database and offers a socket.

- only one connection is accepted on the socket at any one
time: this connection is to be used by the module "server".

- the protocol accepted by the refinery is a typical digest
authentication, the same as between the watcher and the
collector (please refer to the description above).

- after authentication, the refinery sends the list of
available outputs.

- the refinery has two types of answers: the list of
available outputs and a particular output. Both lists
are provided in the same format:

<"data"|"directory"><tab>n<newline>
data line 1
:
data line n
- for the list of available outputs, each data line has the
following format:
name<tab>role<tab>available<tab>min_timestamp<tab>
type<tab>recommended_delay<tab>heading
- the "name" is the label from the configuration file,
"available" is either the literal character 't' or the
literal character 'f'; the "min_timestamp" is the minimum
of the epoch values from all the tables that this output
depends on; the "type" is one of the following: "scalar",
"histogram", or "table"; the "recommended_delay" is the
recommended minimum number of seconds between accesses
to this output. The "heading" is a sequence of elements
<":">heading<"/">units.
- the availability of an output depends on the contents of
the database. The configuration file of the refinery
lists the dependencies for a particular output.
- each of the three output types is constructed in the same
syntax: columns start with a <tab> and finish on the next
<tab> or <newline>. The first column is a description and the
following columns contain the data values.
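A scanner for this column syntax could look like the following sketch;
the helper name and buffer handling are my own:

```c
#include <assert.h>
#include <string.h>

/* Copies column 'idx' of a data line into 'out'.  Column 0 is the
   description; columns 1.. are the data values, each delimited by a
   TAB and ending at the next TAB or NEWLINE.  Returns 0 on success,
   -1 if the column does not exist or does not fit. */
static int get_column(const char *line, int idx, char *out, size_t len)
{
    const char *start = line, *end;
    for (int i = 0; i < idx; i++) {
        start = strchr(start, '\t');
        if (!start)
            return -1;             /* fewer columns than requested */
        start++;
    }
    end = start + strcspn(start, "\t\n");
    if ((size_t)(end - start) >= len)
        return -1;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 0;
}
```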

- an output of type "scalar" contains exactly two columns:
the description and the data value. It also will not
contain more than one line. Naming this type of output
a "scalar" is a hint to the viewer further down the line,
so that a mismatch between the data and the viewer can be
avoided (a speed dial, or a thermometer style display
would be useless for table data).

- an output of type "histogram" will always contain two
columns: the first column will contain descriptions and
the second column will contain data values. A histogram
looks like a list of scalars.

- the final data type is that of a table. A table has at
least two columns: namely the first column with the
abscissa values and one or more data columns. The exact
number of columns can be learned from the data type. A
table will almost always have many rows.

- the protocol between the server and the refinery works as
follows:

a) the refinery sends a challenge
b) the server answers the challenge
c) the refinery either closes the stream or sends the
"directory"
d) the server sends requests whenever it wishes,
and receives either of the two answer types in
return. The server thus MUST NOT expect to
receive an "output" matching its question; it
may be sent the "directory" instead.

- the refinery reads its configuration file on startup
and "stat()"s it regularly afterwards. Whenever the
config file is changed, the refinery re-reads it and
answers the next request with the "directory", instead
of sending an "output".

- the refinery expects requests in the following syntax:

name<tab>linecount<newline>
- the refinery then computes the availability of each
output type and attempts to produce the desired output.
Should the name not exist in the directory, or should a
lookup in the relation "tables" show that not all
dependencies are satisfied, then the refinery returns
not "output", but "directory".
- The data lines for output of type "data" are the output
from the SQL command with columns prepended by <tab> and
rows terminated by <newline>. Before the command is executed,
an attempt is made to insert an integer using "sprintf"
into the command. This is done so that only the needed
amount of data is produced. It is thus recommended to
terminate all "select" statements as follows:

" where (... ,)*
and ts>(now()-'30 hours'::timespan)
order by ts desc (, ...)* limit %d;"

This ensures a deterministic ordering and an upper limit
on the rows.
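The "%d" substitution amounts to printing the requested linecount into
the configured SQL template before execution. A sketch, with a
hypothetical query text:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prints 'linecount' into the "%d" of the configured SQL template.
   Returns the query length, or -1 on overflow. */
static int fill_limit(char *out, size_t len,
                      const char *template_sql, int linecount)
{
    int n = snprintf(out, len, template_sql, linecount);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```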

- With this much input, the server is equipped to perform
its duty. The duty of the server consists in fetching
data from the refinery, caching it, and performing access
control for individual clients. The refinery marks each
output type with a specific role and the server manages
a list of users that are members of each role. Only if
the user connecting to the server is a member of that
role list does the user see the output provided by the
refinery. The server also maintains a list of users and
their passwords.

- The protocol between the server and a viewer is the same
protocol as spoken between refinery and server. This time
the viewer initiates the connection, the server sends a
challenge and the viewer answers it. If the connection
persists (meaning: the server doesn't terminate the
conversation), then the viewer may send requests and get
either a directory or the requested data in return.

- One client to the server is called "viewer-1". This
utility reads its config file, connects to the server,
provides authentication credentials and then repeatedly
requests a particular output source. It strips that
data of the first line and the leading tab in each of
the successive lines and writes it to a file. It is thus
able to monitor several sources.
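The stripping step can be illustrated as follows; the in-place helper
is hypothetical and assumes exactly one leading tab per data line, as
described above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Drops the first line of an answer, then removes the single leading
   TAB of every following line.  Works in place on a NUL-terminated
   buffer and returns a pointer to the stripped body. */
static char *strip_answer(char *data)
{
    char *p = strchr(data, '\n');
    char *out, *w;
    if (!p)
        return data + strlen(data);   /* heading only, nothing left */
    out = w = ++p;                    /* body starts after the heading */
    while (*p) {
        if (*p == '\t' && (p == out || p[-1] == '\n'))
            p++;                      /* drop the leading TAB */
        else
            *w++ = *p++;
    }
    *w = '\0';
    return out;
}
```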