Transaction

Note

The transaction module can consume a large amount of memory. Use caution when running this module on memory-constrained systems.

The transaction module transforms and groups entries in the pipeline into single-entry “transactions” (groupings of entries) based on any number of keys. It is a powerful tool for capturing the activity of a given user, IP address, or other entity across multiple entries in a data stream.

Supported Options

-e: The -e option names an enumerated value to include in the transaction, instead of grouping every enumerated value that is not part of the key. Multiple EVs are supported by providing additional -e flags.
-rsep: The -rsep option sets the string to insert between transaction records. The default is “\n”.
-fsep: The -fsep option sets the string to insert between enumerated values within a given record. The default is “ “.
-o: The -o option sets the output EV to produce. The default is “transaction”.
-c: The -c option enables a count of the number of entries that make up each transaction, stored in an EV with the provided name. The default name is “count”.
-maxsize: The -maxsize flag sets the maximum size, in kilobytes, of a given transaction before it is evicted from the tracking table (see “Memory considerations” below). The default is 500 KB.
-maxstate: The -maxstate flag sets the maximum number of transactions to track. Once exceeded, the oldest (least recently updated) transaction is evicted (see “Memory considerations” below). The default is 200.
-maxcount: The -maxcount flag sets the maximum number of entries allowed per transaction. By default this value is unlimited. If set, once an individual transaction reaches this value, the transaction will be evicted.

All flags are optional.

Overview

The transaction module groups entries into single entries based on a provided set of keys. For example, given a dataset with enumerated values “host”, “message”, and “action”, the query:

tag=data kv host action message | transaction -fsep " -- " host | table

Will collapse all entries with the same value for the EV “host” into a single entry. By default, transaction groups all EVs that are not part of the key into the output. In the example above, the EVs “action” and “message” are grouped, using -fsep as a separator, and the grouped values from all entries that match this key are then joined using -rsep. To illustrate the example above, given the following entries:

Entry 1: host="foo" message="Host foo login" action="login"
Entry 2: host="foo" message="Host foo delete file X" action="delete"
Entry 3: host="bar" message="Host bar login" action="login"
Entry 4: host="foo" message="Host foo logout" action="logout"

Will be collapsed into two entries, one for “foo”, and another for “bar”:

Entry 1: transaction="login -- Host foo login
                      delete -- Host foo delete file X
                      logout -- Host foo logout"
Entry 2: transaction="login -- Host bar login"
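The collapsing behavior above can be modeled in a few lines of Python. This is a hypothetical sketch, not Gravwell's implementation: the `transaction` function, its parameters, and the dict-based entries are illustrative only. It assumes that non-key EVs are joined in the order they were extracted (here action, then message, matching `kv host action message`) and that output entries appear in the order their key was first seen.

```python
from typing import Dict, List, Sequence, Tuple

Entry = Dict[str, str]


def transaction(entries: Sequence[Entry], keys: Sequence[str],
                fsep: str = " ", rsep: str = "\n",
                out: str = "transaction") -> List[Entry]:
    """Hypothetical model of the default grouping; not Gravwell source code."""
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for entry in entries:
        key = tuple(entry[k] for k in keys)
        # Assumption: every non-key EV is included, in extraction order (-fsep).
        record = fsep.join(v for name, v in entry.items() if name not in keys)
        groups.setdefault(key, []).append(record)
    # One output entry per unique key; its records are joined with -rsep.
    return [{out: rsep.join(records)} for records in groups.values()]


# The four example entries, with EVs in the order kv extracted them.
entries = [
    {"host": "foo", "action": "login", "message": "Host foo login"},
    {"host": "foo", "action": "delete", "message": "Host foo delete file X"},
    {"host": "bar", "action": "login", "message": "Host bar login"},
    {"host": "foo", "action": "logout", "message": "Host foo logout"},
]

for result in transaction(entries, ["host"], fsep=" -- "):
    print(result["transaction"])
```

The `out` parameter stands in for -o, and `fsep`/`rsep` for the flags of the same names.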

To specify exactly which EVs to group, you can use one or more -e flags in the query. EVs will be grouped in the order provided. For example:

tag=data kv host action message user group | transaction -e action -e message host | table

Will only group EVs “action” and “message”, ignoring “user” and “group”.
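The effect of -e can be sketched by replacing the "every non-key EV" rule with an explicit, ordered list. As before, this is a hypothetical Python model rather than the module's code. `transaction_with_e` and its `evs` parameter are illustrative names, and skipping an EV that is missing from an entry is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def transaction_with_e(entries: Sequence[Dict[str, str]], keys: Sequence[str],
                       evs: Sequence[str], fsep: str = " ",
                       rsep: str = "\n") -> Dict[Tuple[str, ...], str]:
    """Hypothetical model of grouping only the EVs named with -e."""
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for entry in entries:
        key = tuple(entry[k] for k in keys)
        # Only the -e EVs, in the order the flags were given.
        # Assumption: an EV absent from an entry is simply skipped.
        record = fsep.join(entry[ev] for ev in evs if ev in entry)
        groups.setdefault(key, []).append(record)
    return {key: rsep.join(records) for key, records in groups.items()}


entries = [
    {"host": "foo", "action": "login", "message": "Host foo login",
     "user": "alice", "group": "admins"},
    {"host": "foo", "action": "logout", "message": "Host foo logout",
     "user": "alice", "group": "admins"},
]

# "user" and "group" are ignored; swapping the -e order swaps the fields.
print(transaction_with_e(entries, ["host"], ["action", "message"]))
print(transaction_with_e(entries, ["host"], ["message", "action"]))
```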

Multiple keys can be provided, in which case a separate record is created for each unique combination of key values. For example:

tag=data kv host action message user group | transaction host action user | table

Will group records with the same host, action, and user.
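With several keys, the grouping key is the tuple of all key values, so entries that differ in any one key land in different records. The hypothetical Python sketch below shows this and also tallies the entries per record. That tally is the kind of number -c reports, assuming -c counts the entries folded into each record. `group_counts` and the sample entries are illustrative only.

```python
from typing import Dict, Sequence, Tuple


def group_counts(entries: Sequence[Dict[str, str]],
                 keys: Sequence[str]) -> Dict[Tuple[str, ...], int]:
    """Hypothetical model: number of entries per unique combination of keys."""
    counts: Dict[Tuple[str, ...], int] = {}
    for entry in entries:
        key = tuple(entry[k] for k in keys)  # composite key over all keys
        counts[key] = counts.get(key, 0) + 1
    return counts


entries = [
    {"host": "foo", "action": "login", "user": "alice"},
    {"host": "foo", "action": "login", "user": "bob"},
    {"host": "foo", "action": "login", "user": "alice"},
    {"host": "foo", "action": "logout", "user": "alice"},
]

# Keyed on host alone, all four entries share one record; keyed on
# host, action, and user, they split into three records.
for key, n in group_counts(entries, ["host", "action", "user"]).items():
    print(key, "count =", n)
```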

Memory considerations

The transaction module must buffer all entries in the data stream in order to create transactions. For queries that produce large amounts of data, this can quickly exhaust the available memory on a system. To prevent this, the transaction module provides two flags, -maxsize and -maxstate, to control how much data is retained before it is passed downstream in the pipeline.

When running, the transaction module keeps a table of records, with one record for every unique set of provided keys. When an entry matches the provided keys, it is added to the record holding other entries with the same key values (or a new record is created if it is the first such entry). Two checks are asserted every time an entry is added to the table:

If the size of a given record exceeds the -maxsize argument, the record is immediately “evicted”, meaning it is sent down the query pipeline and removed from the table.
If the number of records exceeds the -maxstate argument, the least recently updated record is evicted.

If a record is evicted, and later an entry with a key matching that of the evicted record is encountered, a new record is created. If you notice “fragmentation” in your output, check the -maxsize and -maxstate flags.
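The two checks and the resulting fragmentation can be sketched as a small Python class. This is a hypothetical model under stated assumptions, not Gravwell's implementation. `TransactionTable`, `add`, and `flush` are illustrative names. Record size is measured here in bytes of the joined text, whereas the real -maxsize is in kilobytes. The -maxcount rule from the options list is folded into the first check. Remaining records are assumed to be emitted when the input ends.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

Key = Tuple[str, ...]


class TransactionTable:
    """Hypothetical model of the tracking table; not Gravwell's implementation."""

    def __init__(self, maxsize: int, maxstate: int,
                 maxcount: Optional[int] = None, rsep: str = "\n") -> None:
        self.maxsize = maxsize    # bytes here; the real -maxsize is in kilobytes
        self.maxstate = maxstate
        self.maxcount = maxcount  # None means unlimited
        self.rsep = rsep
        self.table: "OrderedDict[Key, List[str]]" = OrderedDict()
        self.emitted: List[Tuple[Key, str]] = []  # records sent downstream

    def _evict(self, key: Key) -> None:
        self.emitted.append((key, self.rsep.join(self.table.pop(key))))

    def add(self, key: Key, record: str) -> None:
        records = self.table.setdefault(key, [])
        records.append(record)
        self.table.move_to_end(key)  # now the most recently updated record
        # Check 1: an oversized (or, with -maxcount, full) record is evicted at once.
        too_big = len(self.rsep.join(records).encode()) > self.maxsize
        too_many = self.maxcount is not None and len(records) >= self.maxcount
        if too_big or too_many:
            self._evict(key)
        # Check 2: too many records, so evict the least recently updated one.
        if len(self.table) > self.maxstate:
            self._evict(next(iter(self.table)))

    def flush(self) -> List[Tuple[Key, str]]:
        # Assumption: whatever is still tracked is emitted when the input ends.
        for key in list(self.table):
            self._evict(key)
        return self.emitted


# With a maxstate of 1, alternating hosts fragment the "foo" transaction.
table = TransactionTable(maxsize=1024, maxstate=1)
table.add(("foo",), "login")
table.add(("bar",), "login")
table.add(("foo",), "logout")
for key, text in table.flush():
    print(key, repr(text))
```

The "foo" activity comes out as two separate records because "bar" pushed it out of the one-slot table in between. This is the fragmentation described above.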

Because the transaction module can easily exhaust all available memory on your Gravwell system, follow these general guidelines when writing queries with transaction:

Put the transaction module as late in the query as possible.
Work on the smallest time window possible for your query.
Start with small -maxsize and -maxstate values, and increase only if needed.
Instead of grouping all enumerated values, only group those of concern for your query by explicitly naming them with -e.
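As a back-of-the-envelope check on the advice to start with small -maxsize and -maxstate values, consider the two checks described under “Memory considerations”. After every insert, no retained record is larger than -maxsize and there are at most -maxstate records. The table's record data is therefore bounded by roughly their product. The helper below is a hypothetical illustration of that arithmetic, not a guarantee about the real module's memory use, which also includes per-entry overhead.

```python
def retained_upper_bound_kb(maxsize_kb: int, maxstate: int) -> int:
    """Rough ceiling on record data held in the table, in kilobytes.

    The size check keeps each retained record at or below maxsize_kb,
    and the state check keeps at most maxstate records. Per-entry
    bookkeeping overhead is ignored.
    """
    return maxsize_kb * maxstate


print(retained_upper_bound_kb(500, 200))  # documented defaults: 100000 KB, about 100 MB
print(retained_upper_bound_kb(50, 20))    # a smaller starting point: 1000 KB
```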