Words#
Words is a simple text matching engine that searches for a word delimited by split characters. The words module is functionally equivalent to grep -s -w and is designed to interact with the fulltext accelerator. Words supports UTF-8 character encoding and normally behaves well with binary data, so it is possible to look for the word “foo” in a pcap stream. However, it is important to understand how the words module breaks on word boundaries: if the word foo is adjacent to the byte 0x44 (ASCII “D”) in a binary data stream, the words module will identify the word as Dfoo and will not match the query term foo. Words is a great first-level filter when operating on unknown data.
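The boundary behavior on binary data can be illustrated with a minimal Python sketch. It is not the module's implementation: the set of word bytes used here (ASCII letters, digits, and “.”) is an assumption made for illustration, and split_words and contains_word are hypothetical helper names.

```python
import re
from typing import List

# Assumption: ASCII letters, digits, and "." count as word bytes and every
# other byte is a boundary. The real module's split set is not listed here.
WORD_BYTES = re.compile(rb"[A-Za-z0-9.]+")


def split_words(data: bytes) -> List[bytes]:
    """Split a byte stream into candidate words on the assumed boundaries."""
    return WORD_BYTES.findall(data)


def contains_word(data: bytes, word: bytes) -> bool:
    """Return True only if `word` appears as a complete word in `data`."""
    return word in split_words(data)


clean = b"\x00\x01foo\x00\x02"      # "foo" surrounded by non-word bytes
glued = b"\x00\x01\x44foo\x00\x02"  # 0x44 is ASCII "D", directly before "foo"

print(split_words(clean), contains_word(clean, b"foo"))
print(split_words(glued), contains_word(glued, b"foo"))
```

In the second stream the byte 0x44 is itself a word byte, so the candidate word becomes Dfoo and a search for foo no longer matches.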
The words module does not support wildcards. If you need word matching with wildcard support, use the grep module with the -w flag.
The words module allows multiple patterns to be specified and defaults to a strict mode: every pattern must match for the entry to be passed down the pipeline. If you need “any” matching behavior, the -or flag specifies that the entry is passed down the pipeline if any word matches. The words module also supports inverted logic, so you can search for entries that do not contain the given words.
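The three matching behaviors can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the module's code: the function name passes and the mode names are hypothetical, and the entry is assumed to have already been split into a set of words. The combination of -or with -v is deliberately left out of the sketch.

```python
def passes(entry_words, patterns, mode="strict"):
    """Decide whether an entry continues down the pipeline.

    entry_words: set of complete words found in the entry
    patterns:    list of words given to the module
    mode:        "strict" (default), "or" (-or), or "inverse" (-v)
    """
    hits = [p in entry_words for p in patterns]
    if mode == "strict":
        # Default: every pattern must be present.
        return all(hits)
    if mode == "or":
        # -or: a single present pattern is enough.
        return any(hits)
    if mode == "inverse":
        # -v: none of the patterns may be present.
        return not any(hits)
    raise ValueError("unknown mode: " + mode)


entry = {"Mozilla", "Windows", "NT"}
print(passes(entry, ["Mozilla", "Firefox"]))             # strict
print(passes(entry, ["Mozilla", "Firefox"], "or"))       # -or
print(passes(entry, ["Mozilla", "Firefox"], "inverse"))  # -v
print(passes(entry, ["Opera"], "inverse"))               # -v
```

With this entry, the strict call fails because Firefox is absent, the -or call succeeds because Mozilla is present, and the inverse calls succeed only when none of the listed words appear.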
Supported options#
-v: “Inverse” match. For instance, words -v bar would drop any records containing the word “bar” and pass on any records that do not contain it. If multiple words are specified, none of them may exist in the entry.
-e <arg>: Operate on an enumerated value instead of on the entire record. For example, a pipeline that shows packets that contain HTTP text but aren’t destined for port 80 would be tag=pcap packet tcp.DstPort!=80 tcp.Payload | words -e Payload GET HTTP 1.1
-or: Any match. If any pattern matches, pass the entry on. When combined with the inverse flag (-v), drop any entry that has a missing word.
Parameter Structure#
words <argument list> <search parameter>
Example Search#
To find any Apache logs containing the words Mozilla and Firefox:
tag=apache words Mozilla Firefox
To find packets destined for port 80 whose payloads DO NOT contain the words HTTP or 1.1:
tag=pcap packet tcp.Port==80 tcp.Payload | words -e Payload -v HTTP 1.1
To match any Reddit post that contains the word Gravwell or gravwell:
tag=reddit json Body | words -e Body -or Gravwell gravwell
To grab only user agents that contain Mozilla and Windows:
tag=apache words Mozilla Windows
Working With Word Matches#
The word match system is designed to match complete words, which adds specificity when selecting values. Let’s look at some example data to see what will and will not match.
16.246.30.72 - - [08/May/2017:15:20:35 -0600] "DELETE /search/tag/list HTTP/1.0" 200 5032 "http://nguyen.biz/category/tags/tag/home.htm" "Opera/8.74.(Windows 98; Win 9x 4.90; it-IT) Presto/2.9.173 Version/11.00"
Let’s look at a few invocations of words to see what would and would not match:
Words Invocation | MATCHES | Explanation |
---|---|---|
words Ver | NO | The words module will NOT match because “Ver” is not a complete word in the entry. |
words 16.246.30.72 | YES | The words module will match IPs, because the “.” character is not a boundary delimiter. |
words 8.74 | YES | The words module will match the “8.74” value even though it has a trailing “.” character in the entry. The “.” character is considered a trim character and is removed from matches, so that you can match natural language words when punctuation is used (like “.” and “,” and “;”). |
words Version | YES | The words module WILL match because Version is a full word; the “/” character is a word boundary. |
words 11.00 | YES | The word will match; the “/” character before it is a word boundary. |
words "Version/11.00" | ERROR | The words module will throw an error; you cannot have word boundary characters in a match. |
words "Ver*" | NO | The words module will not match because it does not treat the “*” character as a wildcard; it is looking for the complete word “Ver*”. |
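The outcomes in the table can be reproduced with a rough Python model. This is a sketch under stated assumptions rather than the module's actual tokenizer: the BOUNDARIES set is hypothetical and chosen only so that the sample log line behaves as described above, the TRIM set comes from the punctuation named in the table, and tokenize and check are illustrative names.

```python
# Assumption: these characters (plus whitespace) act as word boundaries.
# The real module's boundary set is not listed in this document.
BOUNDARIES = '/()[]";:'
# Trim characters named in the table: stripped from the ends of each word.
TRIM = ".,;"

_TO_SPACE = str.maketrans({c: " " for c in BOUNDARIES})

LINE = ('16.246.30.72 - - [08/May/2017:15:20:35 -0600] '
        '"DELETE /search/tag/list HTTP/1.0" 200 5032 '
        '"http://nguyen.biz/category/tags/tag/home.htm" '
        '"Opera/8.74.(Windows 98; Win 9x 4.90; it-IT) '
        'Presto/2.9.173 Version/11.00"')


def tokenize(entry):
    """Split on the assumed boundaries, then trim punctuation from each word."""
    words = set()
    for tok in entry.translate(_TO_SPACE).split():
        tok = tok.strip(TRIM)
        if tok:
            words.add(tok)
    return words


def check(entry, word):
    """Return "YES", "NO", or "ERROR" for a single query word."""
    if any(c in BOUNDARIES or c.isspace() for c in word):
        return "ERROR"  # boundary characters are not allowed in a query word
    return "YES" if word in tokenize(entry) else "NO"


for query in ["Ver", "16.246.30.72", "8.74", "Version", "11.00",
              "Version/11.00", "Ver*"]:
    print(query, check(LINE, query))
```

Under these assumptions the loop reports NO, YES, YES, YES, YES, ERROR, NO, in the same order as the table rows.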