Search Processing Modules
Search modules are modules that operate on data in a pass-through mode, meaning that they perform some action (filter, modify, sort, etc.) and pass the entries down the pipeline. There can be many search modules and each operates in its own lightweight thread. This means that if there are 10 modules in a search, the pipeline will spread out and use 10 threads. Documentation for each module will indicate if the module causes distributed searches to collapse and/or sort. Modules that collapse force the distributed pipelines to collapse, meaning that the module as well as all downstream modules execute on the frontend. When starting a search it’s best to put as many parallel modules as possible upstream of the first collapsing module, decreasing pressure on the communication pipe and allowing for greater parallelism.
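The pass-through model can be pictured with a small sketch. This is not Gravwell's implementation; it is a minimal Python illustration that assumes each module is a plain function running in its own thread, connected to its neighbors by queues. The names `stage`, `run_pipeline`, `keep_errors`, and `to_upper` are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the entry stream


def stage(func, inbox, outbox):
    """Run one hypothetical module: read entries, apply func, pass results on."""
    while True:
        entry = inbox.get()
        if entry is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next module
            return
        result = func(entry)
        if result is not None:  # None means the module dropped the entry
            outbox.put(result)


def run_pipeline(entries, funcs):
    """Chain one thread per module, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for e in entries:
        queues[0].put(e)
    queues[0].put(SENTINEL)

    out = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out


def keep_errors(entry):
    """Filter module: pass only entries that mention an error."""
    return entry if "error" in entry else None


def to_upper(entry):
    """Modify module: rewrite the entry before passing it on."""
    return entry.upper()


results = run_pipeline(["ok", "disk error", "net error"], [keep_errors, to_upper])
print(results)
```

Two modules produce two worker threads here, mirroring the one-thread-per-module behavior described above. Placing the filter first means the downstream module sees fewer entries, which is the same reasoning behind putting parallel modules ahead of the first collapsing module.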
Some modules significantly transform or collapse data, such as count. Pipeline modules following these collapsing modules may not be dealing with raw data or previously created enumerated values. In short, things like count by Src turn data into collapsed results with entries such as 10.0.0.1 3. To illustrate, run the search
tag=* limit 10 | count by TAG | raw
to see the raw output from the count module, or
tag=* limit 10 | count by TAG | table TAG count DATA
to observe that the raw data has been condensed, as seen by the table module.
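The effect of a collapsing module can be sketched in a few lines of Python. This is an illustration only: entries are modeled as dictionaries, `count_by` is a hypothetical helper, and dropping the original DATA field is an assumption made to show why downstream modules no longer see the raw entries.

```python
from collections import Counter

# Four hypothetical entries, each with a Src value and raw DATA.
entries = [
    {"Src": "10.0.0.1", "DATA": "GET /index.html"},
    {"Src": "10.0.0.2", "DATA": "GET /login"},
    {"Src": "10.0.0.1", "DATA": "POST /login"},
    {"Src": "10.0.0.1", "DATA": "GET /logout"},
]


def count_by(entries, key):
    """Collapse entries into one result per distinct key value."""
    counts = Counter(e[key] for e in entries if key in e)
    # Only the key and its count survive; the original DATA is gone.
    return [{key: value, "count": n} for value, n in counts.items()]


collapsed = count_by(entries, "Src")
for row in collapsed:
    print(row["Src"], row["count"])
```

After the collapse there is one result per source address, so a later module that expects the original payload has nothing to work with.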
Some flags appear in several different search modules and have the same meaning throughout:
-e <source name> specifies that the module should attempt to read its input data from the given enumerated value rather than from the entry’s data field. This is useful for modules like json, where the JSON-encoded data may have been extracted from a larger data record. For example, the following search will attempt to read JSON fields from the payloads of HTTP packets:
tag=pcap packet tcp.Payload | json -e Payload user.email
-v indicates that the normal pass/drop logic should be inverted. For example, the grep module normally passes entries which match a given pattern and drops those which do not match; specifying the -v flag will cause it to drop entries which match and pass those which do not.
-s indicates a “strict” mode. If a module normally allows an entry to proceed down the pipeline if any one of several conditions is met, setting the strict flag means an entry will proceed only if all conditions are met. For example, the require module will normally pass an entry if it contains any one of the required enumerated values, but when the -s flag is used, it will only pass entries which contain all specified enumerated values.
-p indicates “permissive” mode. If a module normally drops entries when patterns and filters do not match, the permissive flag tells the module to let the entry go through. The regex and grok modules are good examples where the permissive flag can be valuable.
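The flag semantics above can be modeled with a short Python sketch. It does not reproduce any real module; the functions `grep`, `require`, and `extract`, their keyword arguments, and the dictionary layout of an entry are all assumptions chosen to mirror the -e, -v, -s, and -p descriptions.

```python
def grep(entries, pattern, invert=False, source=None):
    """Pass entries whose text contains pattern.

    invert=True mimics -v; source mimics -e by reading an enumerated value
    instead of the entry's DATA field.
    """
    out = []
    for e in entries:
        text = e["evs"].get(source, "") if source else e["DATA"]
        if (pattern in text) != invert:
            out.append(e)
    return out


def require(entries, names, strict=False):
    """Pass entries holding any named value; strict=True mimics -s (all)."""
    check = all if strict else any
    return [e for e in entries if check(n in e["evs"] for n in names)]


def extract(entries, name, parse, permissive=False):
    """Attach a parsed value; permissive=True mimics -p for failed matches."""
    out = []
    for e in entries:
        value = parse(e["DATA"])
        if value is not None:
            out.append({**e, "evs": {**e["evs"], name: value}})
        elif permissive:
            out.append(e)  # no match, but the entry still proceeds
    return out


def parse_user(text):
    """Return the token after 'user=', or None when it is absent."""
    _, sep, rest = text.partition("user=")
    parts = rest.split()
    return parts[0] if sep and parts else None


entries = [
    {"DATA": "login ok user=alice", "evs": {"user": "alice", "ip": "10.0.0.1"}},
    {"DATA": "login failed", "evs": {"ip": "10.0.0.2"}},
]

print(len(grep(entries, "failed")))                          # default pass/drop
print(len(grep(entries, "failed", invert=True)))             # inverted
print(len(grep(entries, "alice", source="user")))            # read from a value
print(len(require(entries, ["user", "ip"])))                 # any one is enough
print(len(require(entries, ["user", "ip"], strict=True)))    # all are needed
print(len(extract(entries, "name", parse_user)))             # drop on no match
print(len(extract(entries, "name", parse_user, permissive=True)))
```

In this model the strict and permissive options move in opposite directions: strict narrows what passes, while permissive keeps entries that a failed match would otherwise drop.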
Universal Enumerated Values
The following enumerated values are available for every entry. They’re actually convenient names for properties of the raw entries themselves, but can be treated as enumerated value names.
SRC – the source of the entry data.
TAG – the tag attached to the entry.
TIMESTAMP – the timestamp of the entry.
DATA – the actual entry data.
NOW – the current time.
These can be used just like user-defined enumerated values, thus table foo bar DATA NOW is valid. They do not need to be explicitly extracted anywhere; they are always available.
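One way to picture these names is as a lookup that checks the entry's own properties before consulting extracted values. The sketch below is a hypothetical Python model, not Gravwell's data structure: the `Entry` class, its `get` method, and the `table` helper are assumptions, as is the choice to let the universal names take precedence over a user-defined value of the same name.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entry:
    src: str
    tag: str
    timestamp: datetime
    data: bytes
    evs: dict = field(default_factory=dict)  # user-extracted enumerated values

    def get(self, name):
        """Resolve a name against entry properties first, then extracted values."""
        intrinsic = {
            "SRC": lambda: self.src,
            "TAG": lambda: self.tag,
            "TIMESTAMP": lambda: self.timestamp,
            "DATA": lambda: self.data,
            "NOW": lambda: datetime.now(timezone.utc),  # evaluated on demand
        }
        if name in intrinsic:
            return intrinsic[name]()
        return self.evs.get(name)  # None when the value was never extracted


def table(entries, *columns):
    """Build one row per entry, mixing extracted and universal names."""
    return [[e.get(c) for c in columns] for e in entries]


entry = Entry(
    src="10.0.0.1",
    tag="syslog",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
    data=b"hello",
    evs={"foo": "1"},
)
rows = table([entry], "foo", "bar", "DATA")
print(rows)
```

Here `foo` comes from the extracted values, `bar` is missing, and `DATA` resolves without any extraction step, which matches the statement that the universal names are always available.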
Search module documentation
abs - calculate the absolute value of an enumerated value.
alias - create copies of enumerated values with new names.
anko - run arbitrary code in the pipeline.
anonymize - anonymize IP addresses.
awk - execute AWK code.
base64 - encodes or decodes base64 strings.
communityid - calculate Zeek community ID values.
diff - compare fields between entries.
dns - do DNS and reverse DNS lookups.
enrich - manually attach enumerated values to entries.
entropy - calculate entropy of enumerated values.
eval - evaluate arbitrary logic expressions.
filetype - detect filetypes of binary data.
first/last - take the first or last entry.
fuse - join data from disparate data sources.
geodist - compute distance between locations.
geoip - look up GeoIP locations.
grep - search for strings in entries.
hexlify - encode data into ASCII hex representation, or vice versa.
ip - convert & filter IP addresses.
ipexist - check if IP address exists in a lookup table.
iplookup - enrich entries by looking up IP addresses in a table which can contain CIDR subnets rather than individual IPs.
join - join two or more enumerated values into a single enumerated value.
langfind - classify the language of text.
length - compute the length of entries or enumerated values.
limit - limit the number of entries which will pass further down the pipeline.
location - convert individual lat/lon enumerated values into a single Gravwell Location enumerated value.
lookup - enrich entries by looking up keys in a table.
lower - convert text to lower-case.
maclookup - look up manufacturer, address, and country information based on a MAC address.
Math Modules (list) - perform math operations.
count - count entries.
max - find a maximum value.
mean - find a mean value.
min - find a minimum value.
stddev - calculate standard deviation.
sum - sum up enumerated values.
unique - eliminate duplicate entries.
variance - find variance of enumerated values.
nosort - disable sorting in the pipeline.
packetlayer - parse portions of a packet.
printf - format text in the pipeline.
regex - match and extract data using regular expressions.
require - drop any entries which lack a given enumerated value.
slice - low-level binary parsing & extraction.
sort - sort entries by a given key.
split - split a single entry into multiple entries.
src - filter based on the SRC field of entries.
stats - perform math operations.
strings - find strings from binary data.
subnet - extract & filter based on IP subnets.
taint - taint tracking.
time - convert strings to time enumerated values, and vice versa.
transaction - group multiple entries into single-entry “transactions” based on keys.
truncate - truncate entries or enumerated values to a specified number of characters.
unescape - convert escaped text into an unescaped representation.
upper - convert text to upper-case.
words - highly optimized search for individual words.