sysmon overview


SYSMON (C) 2001 by Oliver Seidel, MATA (IHK) MA PhD (Cantab)
============================================================

The SYSMON system was originally designed in the context of my
workplace: we needed a monitoring capability to supervise the
operation of various systems. The design goals were:

- robustness
- simplicity
- portability
- distribution

Since I was keen to have the system while other urgent tasks
occupied our time at work, I decided to do the implementation work
myself. My implementation has been done completely from scratch and
I hold all the rights to it. Since the design has changed
significantly during that time, I believe that most of the design is
my own work, too. I am greatly indebted to my friend and colleague,
Andreas Kirschbaum (of WINE fame), who did the better half of the
original design work and who has always been a source of insight.

With respect to adaptability and configuration, I have added the
following options:

- parser debugging
- 2x memory debugging (own tracing, Electric Fence)
- support for the old/new bzip library
- support for either libpq (the PostgreSQL C library) or
shell scripts
- enable/disable the passing of data to shell scripts
via env vars

It is possible to simply comment the respective options in or out of
the "Makefile".

Now, let me summarise the design decisions and implementation details:

- all components seek the configuration file "<name>.conf".

- the syntax of the configuration file is most concisely contained
in the file "config.y". Issuing the command "make" produces the
file "syntax.txt" which is a stripped and formatted version of
"config.y". The semantics are implicit and hopefully self-
explanatory after reading this description.

- all components accept the argument "-h" to display a help message.

- all components accept the argument "-c <filename>" to specify
the configuration file.

- all components except the receiver regularly "stat()"
(and if necessary re-read) their config file.

- all components write a logfile with a name of the form
"<name>.log" into their current working directory.
Upon failure to write to that file, an attempt is made to
write to "/tmp/<name>_help_me_.log". As soon as
the configuration file has been read successfully, the
component uses the logfile name provided there.

- all configuration files contain the directive
"log_level=<n>;" which is interpreted as follows
(higher numbers always include the information of lower ones):

- n=0: error messages (loss of continuous I/O,
configuration parsing errors) are logged
- n=1: protocol, memory, data & fopen errors are
logged once daily
- n=2: normal events (start/shutdown, reconfiguration,
start/shutdown of connections) are logged
- n=3: messages describing configuration data are emitted
- n=4: protocol, memory, data & fopen errors are
logged once hourly
- n=5: error messages are logged unconditionally
- n=6: input/output meta-data is logged (length, names)
- n=7: lines, source_names, internal states and
actions are logged (alloc list, non-loop
execution points)
- n=8: all inputs and outputs are logged
- n=9: internal data is logged also (no, IBM don't
pay me commission on hdd sales!)

The default log level can be set in the Makefile. This may
cause some messages to be emitted, or suppress them, before the
first successful interpretation of the configuration file, at
which point each particular sysmon component can be configured
individually.

Should you have a more elegant solution to the problem of
determining the log-level, or indeed to the problem of
determining a log-file name before it can be read from a
configuration file, please feel free to write to me.
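The level test itself is simple. Here is a minimal sketch in C; the
function name, the return value, and the stderr target are my own
choices for illustration, not taken from the sysmon sources (the real
components write to their configured logfile):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int log_level = 2;          /* as set by "log_level=<n>;" */

/* returns 1 if the message was emitted, 0 if it was filtered out */
static int log_msg(int level, const char *fmt, ...)
{
    va_list ap;
    if (level > log_level)
        return 0;                  /* higher n means more verbose output */
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

With log_level=2, a level-0 error message passes the gate while a
level-5 message is filtered out.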

- the data is processed along the following chain:

- n sensors : 1 watcher, all on one machine
- n watchers : 1 collector, all at one site
- 1 collector : 1 sender, both at the same site
- n senders : n receivers, each at differing sites
- 1 receiver : 1 database, both at the same site
- 1 database : 1 refinery, both at the same site
- 1 refinery : 1 server, both at the same site
- 1 server : n viewers, all at the same site

- sensors are invoked with a single integer argument s and
henceforth deliver one record every s seconds.

- records produced by the sensor are of the format:

{type<tab>title<tab>units<tab>value<newline>}+
- the 'type' may be one of { int, float, timestamp, text }.
- the 'title' may only contain alphanumerics.
- the 'units' may only contain alphabetics.
- the 'title' and 'units' may not be longer than 10 characters.
- the 'value' may not contain <tab> or <newline>.
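A record emitter obeying these rules might look as follows. The
tab-separated field order is reconstructed from the format line above,
and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats one sensor record as type TAB title TAB units TAB value NL.
   Returns the number of characters written, or -1 if a rule from the
   description above is violated or the buffer is too small. */
static int format_record(char *buf, size_t len,
                         const char *type, const char *title,
                         const char *units, const char *value)
{
    int n;
    if (strlen(title) > 10 || strlen(units) > 10)
        return -1;                 /* 10-character limit from the spec */
    if (strchr(value, '\t') || strchr(value, '\n'))
        return -1;                 /* value may not contain TAB or NL  */
    n = snprintf(buf, len, "%s\t%s\t%s\t%s\n", type, title, units, value);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```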

- the watcher is a program which reads a configuration file
and simultaneously starts several sensors. If a sensor
dies, the watcher starts another copy. Sensor output is
syntax-checked, correct lines are prepended with the following
information and handed on:

<"text"><"machine">machine_name<"name">
<"text"><"sensor">sensor_name<"name">
<"timestamp"><"timestamp">timestamp<"epoch">

- the watcher establishes a tcp connection to its collector and
the two parties perform a challenge-response protocol for
authentication:

a) upon accepting a new connection, the collector
generates a random piece of data and sends it
to the new watcher.
b) the watcher constructs a buffer of the form
"<challenge><secret>".
c) the watcher computes a mac of this buffer and sends
it to the collector.
d) the collector constructs the same buffer, computes
a mac and compares the local and remote values.
e) upon success, the collector changes the state of
that particular source from "unauthenticated" to
"authenticated" and begins using its data.
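Steps (b)-(d) can be sketched as below. The actual MAC algorithm is
not named in this document, so a toy 32-bit FNV-1a hash stands in for
it, and the buffer layout of challenge followed by shared secret is an
assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the unnamed MAC: 32-bit FNV-1a over the buffer.
   A real deployment would use a proper keyed MAC. */
static uint32_t toy_mac(const char *buf, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)buf[i];
        h *= 16777619u;
    }
    return h;
}

/* Watcher side, steps (b) and (c): build <challenge><secret>, MAC it. */
static uint32_t answer_challenge(const char *challenge, const char *secret)
{
    char buf[256];
    int n = snprintf(buf, sizeof buf, "%s%s", challenge, secret);
    return toy_mac(buf, (size_t)n);
}

/* Collector side, step (d): recompute locally, compare with remote. */
static int verify(const char *challenge, const char *secret,
                  uint32_t remote_mac)
{
    return answer_challenge(challenge, secret) == remote_mac;
}
```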

- after the authentication procedure, the watcher maintains the
tcp connection to its collector and passes the constructed
lines of text. In case of collector failure, it buffers the
input in a cache and regularly tries to re-establish the
connection.

- it is intended to install no more than one watcher on a
particular machine and to install no more than one collector
on each site.

- the collector passes its input in complete lines to the sender.

- the sender performs a syntax-check on the data.

- the sender prepends the following to each line:

<"text"><"site">site_name<"name">

- the sender filters the data according to the 'machine' and
'site' fields and inserts it into multiple output buffers.
Each destination has its own output buffer.

- the sender accumulates data for each receiver until the
configured chunk size is reached or the time limit has been
exceeded (to avoid 'starvation': a crash has occurred, no
more data is flowing, but the data just before the crash
is vital). It then compresses, encrypts and base64-encodes
the data, and sends it off with an HTTP POST request.

- the encryption algorithm is hardened TEA, used in CBC mode.

- to avoid a known-plaintext attack at the beginning of the
stream, the first 8 bytes are stuffed with random data.
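For illustration, here is plain TEA (Wheeler/Needham) in CBC mode over
8-byte blocks. The "hardened" modifications are not specified in this
document, so the textbook cipher stands in; the zero IV is likewise my
own simplification, and the first block plays the role of the random
stuffing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain TEA: one block is two 32-bit words, the key is four words. */
static void tea_encrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0, delta = 0x9e3779b9u;
    for (int i = 0; i < 32; i++) {
        sum += delta;
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    v[0] = v0; v[1] = v1;
}

static void tea_decrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], delta = 0x9e3779b9u, sum = delta << 5;
    for (int i = 0; i < 32; i++) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= delta;
    }
    v[0] = v0; v[1] = v1;
}

/* CBC chaining with a zero IV; words[0..1] would be the random
   stuffing block that defeats the known-plaintext attack at the
   head of the stream. */
static void cbc_encrypt(uint32_t *words, size_t nblocks,
                        const uint32_t k[4])
{
    uint32_t prev[2] = {0, 0};
    for (size_t b = 0; b < nblocks; b++) {
        words[2*b]   ^= prev[0];
        words[2*b+1] ^= prev[1];
        tea_encrypt(&words[2*b], k);
        prev[0] = words[2*b];
        prev[1] = words[2*b+1];
    }
}

static void cbc_decrypt(uint32_t *words, size_t nblocks,
                        const uint32_t k[4])
{
    uint32_t prev[2] = {0, 0};
    for (size_t b = 0; b < nblocks; b++) {
        uint32_t c0 = words[2*b], c1 = words[2*b+1];
        tea_decrypt(&words[2*b], k);
        words[2*b]   ^= prev[0];
        words[2*b+1] ^= prev[1];
        prev[0] = c0; prev[1] = c1;
    }
}
```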

- the receiver reverses the process by decoding the
base64-data, skipping the random header, performing the
decryption and uncompressing the chunk.

- both the sender and the receiver perform record validation,
where a record is one line.

- the receiver reads the list of available database tables and
then processes each record in turn.

- when a record for an unknown table is encountered, the table
is created.

- when a record for a known table with a different type is
encountered, the old table is renamed and a new table
is created, into which the record is subsequently inserted.

- the receiver not only inserts the records into their respective
tables, it also maintains a directory (relation 'tables') which
lists the available data sources, their types, and quite
importantly, the maximum of the timestamps inside each
particular table.

- all data insertion commands are bracketed by "begin;" and
"commit;"
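Composing such a bracketed insertion might look as follows; the table
and column names are hypothetical, and the helper only builds the SQL
text (the real receiver sends it through libpq or the shell scripts):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Wraps a data insert and the matching 'tables' directory update in
   one begin;/commit; bracket.  Returns the length, or -1 on overflow. */
static int bracketed_insert(char *out, size_t len,
                            const char *insert_sql, const char *update_sql)
{
    int n = snprintf(out, len, "begin;\n%s\n%s\ncommit;\n",
                     insert_sql, update_sql);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```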

- communication with the database backend is performed through
the postgres library or a set of shell scripts (tables.sh
nextval.sh update.sh). As mentioned above, a compile time
option in the "Makefile" controls which interface is used.
In case the shell scripts are used, the values of database
host/name/user/password are passed in environment variables.
I know that this is a security risk (try "ps axuewww" on
Linux) and thus I have added a further compile-time option
to disable this behaviour. External programs invoked via
"popen" have the same access permissions as the receiver
itself and thus may themselves read the receiver configuration
file. My examples don't, but then, I recommend the usage of
libpq anyway.

- this is where the source-driven chain ends.

- the next module in the sysmon architecture is the refinery:
this program reads its configuration file, connects to the
database and offers a socket.

- only one connection is accepted on the socket at any one
time: this connection is to be used by the module "server".

- the protocol accepted by the refinery is a typical digest
authentication, the same as between the watcher and the
collector (please refer to the description above).

- after authentication, the refinery sends the list of
available outputs.

- the refinery has two types of answers: the list of
available outputs and a particular output. Both lists
are provided in the same format:

<"data"|"directory"><tab>n<newline>
data line 1
:
data line n
- for the list of available outputs, each data line has the
following format:
name<tab>role<tab>available<tab>min_timestamp<tab>
type<tab>recommended_delay<tab>heading
- the "name" is the label from the configuration file,
"available" is either the literal character 't' or the
literal character 'f'; the "min_timestamp" is the minimum
of the epoch values from all the tables that this output
depends on; the "type" is one of the following: "scalar",
"histogram", or "table"; the "recommended_delay" is the
recommended minimum number of seconds between accesses
to this output. The "heading" is a sequence of elements
<":">heading<"/">units.
- the availability of an output depends on the contents of
the database. The configuration file of the refinery
lists the dependencies for a particular output.
- each of the three output types is constructed in the same
syntax: columns start with a <tab> and finish on the next
<tab> or <newline>. The first column is a description and the
following columns contain the data values.
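A scanner for this column syntax could look like the following sketch;
the helper name and buffer handling are my own:

```c
#include <assert.h>
#include <string.h>

/* Copies column 'idx' of a data line into 'out'.  Column 0 is the
   description; columns 1.. are the data values, each delimited by a
   TAB and ending at the next TAB or NEWLINE.  Returns 0 on success,
   -1 if the column does not exist or does not fit. */
static int get_column(const char *line, int idx, char *out, size_t len)
{
    const char *start = line, *end;
    for (int i = 0; i < idx; i++) {
        start = strchr(start, '\t');
        if (!start)
            return -1;             /* fewer columns than requested */
        start++;
    }
    end = start + strcspn(start, "\t\n");
    if ((size_t)(end - start) >= len)
        return -1;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 0;
}
```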

- an output of type "scalar" contains exactly two columns:
the description and the data value. It also will not
contain more than one line. Naming this type of output
a "scalar" is a hint to the viewer further down the line,
so that a mismatch between the data and the viewer can be
avoided (a speed dial, or a thermometer style display
would be useless for table data).

- an output of type "histogram" will always contain two
columns: the first column will contain descriptions and
the second column will contain data values. A histogram
looks like a list of scalars.

- the final data type is that of a table. A table has at
least two columns: namely the first column with the
abscissa values and one or more data columns. The exact
number of columns can be learned from the data type. A
table will almost always have many rows.

- the protocol between the server and the refinery works as
follows:

a) the refinery sends a challenge
b) the server answers the challenge
c) the refinery either closes the stream or sends the
"directory"
d) the server sends requests whenever it wishes,
and receives either of the two answer types in
return. The server thus MUST NOT expect to
receive an "output" matching its question; it
may be sent the "directory" instead.

- the refinery reads its configuration file on startup
and "stat()"s it regularly afterwards. Whenever the
config file is changed, the refinery re-reads it and
answers the next request with the "directory", instead
of sending an "output".

- the refinery expects requests in the following syntax:

name<tab>linecount<newline>
- the refinery then computes the availability of each
output type and attempts to produce the desired output.
Should the name not exist in the directory, or should a
lookup in the relation "tables" show that not all
dependencies are satisfied, then the refinery returns
not "output", but "directory".
- The data lines for output of type "data" are the output
from the SQL command with columns prepended by <tab> and
rows terminated by <newline>. Before the command is executed,
an attempt is made to insert an integer using "sprintf"
into the command. This is done so that only the needed
amount of data is produced. It is thus recommended to
terminate all "select" statements as follows:

" where (... ,)*
and ts>(now()-'30 hours'::timespan)
order by ts desc (, ...)* limit %d;"

This ensures a deterministic ordering and an upper limit
on the rows.
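The "%d" substitution amounts to printing the requested linecount into
the configured SQL template before execution. A sketch, with a
hypothetical query text:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prints 'linecount' into the "%d" of the configured SQL template.
   Returns the query length, or -1 on overflow. */
static int fill_limit(char *out, size_t len,
                      const char *template_sql, int linecount)
{
    int n = snprintf(out, len, template_sql, linecount);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```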

- With this much input, the server is equipped to perform
its duty. The duty of the server consists in fetching
data from the refinery, caching it, and performing access
control for individual clients. The refinery marks each
output type with a specific role and the server manages
a list of users that are members of each role. Only if
the user connecting to the server is a member of that
role list does the user see the output provided by the
refinery. The server also maintains a list of users and
their passwords.

- The protocol between the server and a viewer is the same
protocol as spoken between refinery and server. This time
the viewer initiates the connection, the server sends a
challenge and the viewer answers it. If the connection
persists (meaning: the server doesn't terminate the
conversation), then the viewer may send requests and get
either a directory or the requested data in return.

- One client to the server is called "viewer-1". This
utility reads its config file, connects to the server,
provides authentication credentials and then repeatedly
requests a particular output source. It strips that
data of the first line and the leading tab in each of
the successive lines and writes it to a file. It is thus
able to monitor several sources.
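The stripping step can be illustrated as follows; the in-place helper
is hypothetical and assumes exactly one leading tab per data line, as
described above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Drops the first line of an answer, then removes the single leading
   TAB of every following line.  Works in place on a NUL-terminated
   buffer and returns a pointer to the stripped body. */
static char *strip_answer(char *data)
{
    char *p = strchr(data, '\n');
    char *out, *w;
    if (!p)
        return data + strlen(data);   /* heading only, nothing left */
    out = w = ++p;                    /* body starts after the heading */
    while (*p) {
        if (*p == '\t' && (p == out || p[-1] == '\n'))
            p++;                      /* drop the leading TAB */
        else
            *w++ = *p++;
    }
    *w = '\0';
    return out;
}
```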