JSON Field Filtering Preprocessor#
This preprocessor will parse entry data as a JSON object, then extract specified fields and compare them against lists of acceptable values. The lists of acceptable values are specified in files on the disk, one value per line.
It can be configured either to pass only those entries whose fields match the lists (whitelisting) or to drop those entries which match the lists (blacklisting). It can also filter against multiple fields, requiring either that all fields match (logical AND) or that at least one field matches (logical OR).
This preprocessor is particularly useful to narrow down a firehose of general data before sending it across a slow network link.
The JSON Field Filtering preprocessor Type is jsonfilter.
Supported Options#
Field-Filter (string, required): This specifies two things: the name of the JSON field of interest, and the path to a file which contains values to match against. For example, one might specify Field-Filter=ComputerName,/opt/gravwell/etc/computernames.txt in order to extract a field named “ComputerName” and compare it against the values in /opt/gravwell/etc/computernames.txt. The Field-Filter option may be specified multiple times in order to filter against multiple fields.
Match-Logic (string, optional): This parameter specifies the logic operation to use when filtering against multiple fields. If set to “and”, an entry is only considered a match when all specified fields match against the given lists. If set to “or”, an entry is considered a match when any field matches.
Match-Action (string, optional): This specifies the action which should be taken for entries whose fields match the provided lists. It may be set to “pass” or “drop”; if omitted, the default is “pass”. If set to “pass”, entries which match will be allowed to pass to the indexer (whitelisting). If set to “drop”, entries which match will be dropped (blacklisting).
The Match-Logic parameter is only necessary when more than one Field-Filter has been specified.
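To make the shape of a Field-Filter value concrete, here is a minimal Python sketch of how such a value could be split and its list file read. It is an illustration only, not the ingester's code: the helper names parse_field_filter, load_values, and demo are hypothetical, and splitting at the first comma, trimming whitespace, and skipping blank lines are all assumptions.

```python
import os
import tempfile


def parse_field_filter(value):
    """Split 'FieldName,/path/to/list' into (field, path) at the first comma."""
    field, sep, path = value.partition(",")
    field, path = field.strip(), path.strip()
    if not sep or not field or not path:
        raise ValueError("expected '<field>,<path>', got %r" % value)
    return field, path


def load_values(path):
    """Read one acceptable value per line (assumption: blank lines are skipped)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def demo():
    # Write a throwaway list file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        list_path = os.path.join(tmp, "computernames.txt")
        with open(list_path, "w", encoding="utf-8") as fh:
            fh.write("alpha\nbravo\n")
        field, path = parse_field_filter("ComputerName," + list_path)
        return field, load_values(path)
```

Calling demo() returns the field name together with the set of values read from the temporary file.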
Note
If a field is specified in the configuration but is not present on an entry, the preprocessor will treat the entry as if the field existed but did not match anything. Thus, if you have configured the preprocessor to only pass those entries whose fields match your whitelist, an entry which lacks one of the fields will be dropped.
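The interaction between Match-Logic, Match-Action, and missing fields can be sketched in a few lines of Python. This is a model of the behavior described above, not the preprocessor's implementation: field_matches and keep_entry are hypothetical names, and comparing the extracted value as text against the list entries is an assumption.

```python
import json


def field_matches(obj, field, allowed):
    """A missing field counts as 'present but not matching'."""
    if field not in obj:
        return False
    # Assumption: values are compared as text against the list entries.
    return str(obj[field]) in allowed


def keep_entry(raw, filters, logic, action="pass"):
    """Return True if the entry would continue on to the indexer.

    filters maps a field name to its set of acceptable values;
    logic is "and" or "or"; action is "pass" or "drop".
    """
    obj = json.loads(raw)
    results = [field_matches(obj, f, allowed) for f, allowed in filters.items()]
    matched = all(results) if logic == "and" else any(results)
    # "pass" keeps matches (whitelist); "drop" keeps non-matches (blacklist).
    return matched if action == "pass" else not matched


whitelist = {"Severity": {"6", "7", "8", "9"}}
with_field = '{"EventID": 1337, "Severity": 8}'
without_field = '{"EventID": 1337}'
```

Under these assumptions, keep_entry(with_field, whitelist, "and") is True, while keep_entry(without_field, whitelist, "and") is False because the missing Severity field counts as a non-match. With action="drop" the same entry is kept, since an entry that does not match a blacklist is not dropped.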
Common Use Cases#
The JSON Field Filtering preprocessor can down-select entries based on fields within the entries. This can be used to build blacklists and whitelists on data flows to ensure that specific data either does or does not make it to storage.
Example: Simple Whitelisting#
Suppose we have an endpoint monitoring solution which is sending thousands of events per second detailing things which are occurring across the enterprise. Due to the high event volume, we may decide we only want to index events with a certain severity. Luckily, the events include a Severity field:
{ "EventID": 1337, "Severity": 8, "System": "email-server-01.example.org", [...] }
We know the Severity field goes from 0 to 9, and we decide we want to only pass events with a severity of 6 or higher. We would therefore add the following to our ingester configuration file:
[preprocessor "severity"]
Type=jsonfilter
Match-Action=pass
Field-Filter=Severity,/opt/gravwell/etc/severity-list.txt
and set Preprocessor=severity on the appropriate data input, for instance if we were using Simple Relay:
[Listener "endpoint_monitoring"]
Bind-String="0.0.0.0:7700"
Tag-Name=endpoint
Preprocessor=severity
Finally, we create /opt/gravwell/etc/severity-list.txt and populate it with a list of acceptable Severity values, one per line:
6
7
8
9
After restarting the ingester, it will extract the Severity field from each entry and compare the resulting value against those listed in the file. If the value matches a line in the file, the entry will be sent to the indexer. Otherwise, it will be dropped.
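Because the file lists individual values rather than a range, a rule such as "6 or higher" has to be written out value by value. For longer lists it may be convenient to generate the file; the Python sketch below shows one way to do so. The helper names write_value_list, severity_values, and demo are hypothetical, and a temporary directory stands in for /opt/gravwell/etc so the example runs without special permissions.

```python
import os
import tempfile


def write_value_list(path, values):
    """Write one value per line, the layout the list file uses."""
    with open(path, "w", encoding="utf-8") as fh:
        for v in values:
            fh.write("%s\n" % v)


def severity_values(minimum, maximum=9):
    """Enumerate every severity from minimum through maximum, inclusive."""
    return [str(n) for n in range(minimum, maximum + 1)]


def demo():
    # A temporary directory stands in for /opt/gravwell/etc in this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "severity-list.txt")
        write_value_list(path, severity_values(6))
        with open(path, encoding="utf-8") as fh:
            return fh.read()
```

Here severity_values(6) yields the four values shown in the list above, and demo() returns the file contents with one value per line.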
Example: Blacklisting#
Building on the previous example, we may find that our endpoint monitoring system is generating a lot of high-severity false positives from certain systems. We may determine that events with the EventID field set to 219, 220, or 1338 and the System field set to either “webserver-prod.example.org” or “webserver-dev.example.org” are always false positives. We can define another preprocessor to get rid of these entries before they are sent to the indexer:
[preprocessor "falsepositives"]
Type=jsonfilter
Match-Action=drop
Match-Logic=and
Field-Filter=EventID,/opt/gravwell/etc/eventID-blacklist.txt
Field-Filter=System,/opt/gravwell/etc/system-blacklist.txt
If we now add this preprocessor to the data input configuration after the existing one, the ingester will apply the two filters in order:
[Listener "endpoint_monitoring"]
Bind-String="0.0.0.0:7700"
Tag-Name=endpoint
Preprocessor=severity
Preprocessor=falsepositives
Last, we create /opt/gravwell/etc/eventID-blacklist.txt:
219
220
1338
and /opt/gravwell/etc/system-blacklist.txt:
webserver-prod.example.org
webserver-dev.example.org
This new preprocessor extracts the EventID and System fields from every entry which makes it past the first filter. It then compares them against the values in the files. Because we set Match-Logic=and, it considers an entry a match if both field values are found in the files. Because we set Match-Action=drop, any entry which matches on both fields will be dropped. Thus, an entry with EventID=220 and System=webserver-dev.example.org is dropped, while one with EventID=220 and System=email-server-01.example.org will not be dropped.
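The combined effect of the two preprocessors can be checked with a small Python model. This is a sketch of the behavior described in this example, not the ingester's code: matches_all, survives, and entry are hypothetical helpers, the value sets mirror the three list files, and comparing values as text is an assumption.

```python
import json

# Contents of the three list files from the examples, held in memory.
SEVERITY = {"Severity": {"6", "7", "8", "9"}}
FALSE_POSITIVES = {
    "EventID": {"219", "220", "1338"},
    "System": {"webserver-prod.example.org", "webserver-dev.example.org"},
}


def matches_all(obj, filters):
    """AND logic: every listed field must be present and on its list."""
    return all(f in obj and str(obj[f]) in allowed for f, allowed in filters.items())


def survives(raw):
    """Apply the 'severity' filter (pass), then 'falsepositives' (drop)."""
    obj = json.loads(raw)
    if not matches_all(obj, SEVERITY):      # whitelist: non-matches are dropped
        return False
    if matches_all(obj, FALSE_POSITIVES):   # blacklist: matches are dropped
        return False
    return True


def entry(event_id, severity, system):
    """Build a JSON entry shaped like the sample event."""
    return json.dumps({"EventID": event_id, "Severity": severity, "System": system})
```

In this model, survives(entry(220, 8, "webserver-dev.example.org")) is False because both blacklist fields match, while survives(entry(220, 8, "email-server-01.example.org")) is True because only EventID matches. An entry with Severity 3 never reaches the second filter at all.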