Words#

Words is a simple text matching engine that searches for whole words delimited by split characters. The words module is functionally equivalent to grep -s -w and is designed to interact with the fulltext accelerator. Words supports UTF-8 character encoding and normally behaves well with binary data, which means it is possible to look for the word “foo” in a pcap stream. However, it is important to understand how the words module breaks on word boundaries: if the word foo is adjacent to the byte 0x44 (ASCII D) in a binary data stream, the words module will identify the word as Dfoo and will not match the query term foo. Words is a great first-level filter when operating on unknown data.
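The Dfoo behavior can be illustrated with a minimal Python sketch. This is not the module's implementation: the helper name words_in is hypothetical, and the sketch assumes that ASCII letters, digits, and the characters . : - _ @ form words while every other byte acts as a split character. It ignores multi-byte UTF-8 sequences for brevity.

```python
import re

# Assumption: these ASCII bytes form words; any other byte splits them.
WORD_BYTES = re.compile(rb"[A-Za-z0-9_.:@-]+")

def words_in(data: bytes) -> list:
    """Return the runs of word bytes found in a binary buffer."""
    return [m.group() for m in WORD_BYTES.finditer(data)]

clean = b"\x00\x01foo\x02\xff"   # foo surrounded by non-word bytes
adjacent = b"\x00\x44foo\x02"    # 0x44 is ASCII "D", so the run is "Dfoo"

print(b"foo" in words_in(clean))     # True
print(b"foo" in words_in(adjacent))  # False
print(words_in(adjacent))            # [b'Dfoo']
```

In the second buffer the byte 0x44 happens to be a printable letter, so it becomes part of the word and the query term foo no longer matches.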

The words module does not support wildcards. If you need word matching with wildcard support, use the grep module with the -w flag.

The words module allows multiple patterns to be specified and defaults to strict mode, meaning that every pattern must match for the entry to be passed down the pipeline. If you need any-match behavior, the -or flag specifies that the entry is passed down the pipeline if any word matches. The words module also supports inverted logic, so you can search for entries that do not contain the given words.
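The three behaviors described above (strict, any-match, and inverted) can be sketched as a small Python function. The function name passes and its parameters are hypothetical and only model the logic as described here. The combination of the any-match and inverted modes is deliberately left unmodeled.

```python
def passes(entry_words, patterns, any_match=False, invert=False):
    """Decide whether an entry continues down the pipeline.

    entry_words: set of words already extracted from the entry.
    patterns:    the query words.
    any_match:   models the -or flag.
    invert:      models the inverted (-v) logic.
    """
    if any_match and invert:
        # Assumption: the combined behavior is not modeled in this sketch.
        raise ValueError("combined any-match and inverted mode is not modeled")
    hits = [p in entry_words for p in patterns]
    if invert:
        return not any(hits)   # every query word must be absent
    if any_match:
        return any(hits)       # one present word is enough
    return all(hits)           # strict mode: every word must be present

entry = {"GET", "HTTP", "1.1"}
print(passes(entry, ["GET", "HTTP"]))                  # True
print(passes(entry, ["GET", "POST"]))                  # False
print(passes(entry, ["GET", "POST"], any_match=True))  # True
print(passes(entry, ["POST", "PUT"], invert=True))     # True
```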

Supported options#

  • -v: “Inverse” match. For instance, words -v bar would drop any records containing the word “bar” and pass on any records that do not contain the word “bar”. If multiple words are specified, all of them must be absent from the entry for it to pass.

  • -e <arg>: Operate on an enumerated value instead of on the entire record. For example, a pipeline that shows packets that contain HTTP text but aren’t destined for port 80 would be tag=pcap packet ipv4.DstPort!=80 tcp.Payload | words -e Payload GET HTTP 1.1

  • -or: Any match. If any pattern matches, the entry is passed on. When combined with the negate flag (-v), any entry that has a missing word is dropped.

Parameter Structure#

words <argument list> <search parameter>

Working With Word Matches#

The word match system is designed to match complete words, using the exact same word-breaking rules as the fulltext accelerator. A query word matches only when it appears in the data bounded on both sides by a split character (or the start/end of the data). Most punctuation is a split character, but a handful of characters — . (period), : (colon), - (hyphen), _ (underscore), and @ (at sign) — are treated as part of a word. This is what allows IP addresses, decimals, hostnames, and email addresses to be matched as single words. Leading and trailing ., :, ;, and - are trimmed from a word, so a sentence-ending period does not prevent a match.

The words module cannot match a substring of a word. Because . and : are not split characters, 192.168.1.100:8080 is a single word; a search for 192.168.1.100 will NOT match it. For substring matching, see the grep module. For a complete list of split and trim characters, see the fulltext word extraction documentation.
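The word-breaking rules above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the accelerator's code: the names extract_words and has_word are hypothetical, and the character sets are taken only from the description above (letters, digits, and . : - _ @ stay inside a word; leading and trailing . : ; - are trimmed). The fulltext word extraction documentation remains the authoritative list.

```python
import re

# Assumption: characters that stay inside a word. \w covers letters,
# digits, and underscore; everything outside this class is a split character.
WORD_RUN = re.compile(r"[\w.:@-]+")
TRIM_CHARS = ".:;-"  # trimmed from both ends of each word

def extract_words(text):
    """Split text into words, then trim leading/trailing trim characters."""
    words = []
    for run in WORD_RUN.findall(text):
        word = run.strip(TRIM_CHARS)
        if word:  # a lone "-" trims down to nothing and is discarded
            words.append(word)
    return words

def has_word(text, term):
    """True only if term equals one complete extracted word."""
    return term in extract_words(text)

print(extract_words("dst=192.168.1.100:8080"))            # ['dst', '192.168.1.100:8080']
print(has_word("dst=192.168.1.100:8080", "192.168.1.100"))  # False
print(has_word("The service is foo.", "foo"))             # True
```

The IP-and-port value stays one word because neither . nor : splits it, while the sentence-ending period after foo is trimmed away.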

Words is designed to add specificity when selecting values. Let's look at some example data to see what will and will not match.

16.246.30.72 - - [08/May/2017:15:20:35 -0600] "DELETE /search/tag/list HTTP/1.0" 200 5032 "http://nguyen.biz/category/tags/tag/home.htm" "Opera/8.74.(Windows 98; Win 9x 4.90; it-IT) Presto/2.9.173 Version/11.00"

Let's look at a few invocations of words to see what would and would not match:

| Words Invocation | Matches | Explanation |
|---|---|---|
| words Ver | NO | The words module will NOT match the Version/11.00 text because Ver is not a complete word; the words module cannot match a subset of a word. |
| words 16.246.30.72 | YES | The words module will match IPs, because the . character is not a split character. |
| words 8.74 | YES | The words module will match the 8.74 value in Opera/8.74.(Windows even though it is followed by a period. The / and ( are split characters and the trailing . is a trim character, so the word 8.74 is isolated. |
| words Version | YES | The words module WILL match because Version is a full word; the / character is a split character. |
| words 11.00 | YES | The word will match. The . character is not a split character, so 11.00 is a complete word bounded by the / before it and the " after it. |
| words Windows | YES | Windows is a complete word; the ( before it and the space after it are split characters. |
| words "8.74.(Windows" | ERROR | The words module will throw an error because the search term contains the split character (; you cannot have a split character inside a single match word. |
| words "Version/11.00" | ERROR | The words module will throw an error; you cannot have a split character (here, /) in a match word. |
| words "Ver*" | ERROR | The words module does not support wildcards, and * is itself a split character, so the search term is rejected. |
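The ERROR cases all come from one rule: a search term may not contain a split character. A hedged Python sketch of such a check follows. The name check_term is hypothetical, and the split set is an assumption (all ASCII punctuation except . : - _ @, plus whitespace) rather than the module's actual list.

```python
import string

# Characters documented as being part of a word.
KEPT_CHARS = set(".:-_@")
# Assumption: every other ASCII punctuation character, and all whitespace,
# is a split character.
SPLIT_CHARS = (set(string.punctuation) - KEPT_CHARS) | set(string.whitespace)

def check_term(term):
    """Return term unchanged, or raise ValueError if it contains a split character."""
    bad = sorted(set(term) & SPLIT_CHARS)
    if bad:
        raise ValueError(
            f"search term {term!r} contains split character(s): {' '.join(bad)}"
        )
    return term

for term in ["16.246.30.72", "8.74.(Windows", "Version/11.00", "Ver*"]:
    try:
        check_term(term)
        print(term, "accepted")
    except ValueError as err:
        print("rejected:", err)
```

Only the first term is accepted; the other three are rejected because of (, /, and * respectively.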