Regex
Regex is a pipeline module that uses regular expressions to match text data. It is an extremely powerful way of matching complex patterns and extracting enumerable fields from text. For those unfamiliar with regular expressions, a decent starting point is the Wikipedia article; you can also use a regular expression playground to experiment.
Think of the regex module in similar terms to the grep command in Linux: any entries which do not match the regular expression will be dropped (unless the -p flag is set). Building regular expressions is well outside the scope of this document, but one extremely important feature is the (?P<foo>\S+) style syntax, which will assign any matched group into an enumerated value; in this case, it will capture and extract a sequence of non-space characters into an enumerated value named “foo”.
For example, the following search will enumerate the method, user, and ip from an sshd Accepted log entry.
".*sshd.*Accepted (?P<method>\S*) for (?P<user>\S*) from (?P<ip>[0-9]+.[0-9]+.[0-9]+.[0-9]+)"
Because regular expressions can get very long, the regex module takes the -r flag, which specifies a resource containing a regular expression. When populating the resource, do not include “wrapping quotes” around the whole expression as you would when typing directly into a search: e.g. ".*ssh.*Accepted" becomes .*ssh.*Accepted. This is because the quotes are normally stripped out by the search parser prior to being handed to the regex module.
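As a rough illustration of what the sshd expression shown earlier extracts, here is a minimal Python sketch. It uses Python's re module, which is not the engine behind the regex module but accepts the same (?P<name>...) syntax for this pattern. The `extract` helper and both sample log lines are made up for the example.

```python
import re

# The sshd expression from the example above, written as a Python raw string.
SSH_ACCEPTED = re.compile(
    r".*sshd.*Accepted (?P<method>\S*) for (?P<user>\S*) "
    r"from (?P<ip>[0-9]+.[0-9]+.[0-9]+.[0-9]+)"
)

def extract(entry):
    """Return the named groups as a dict, or None if the entry does not match.

    Returning None stands in for the regex module dropping a non-matching entry.
    """
    m = SSH_ACCEPTED.search(entry)
    return m.groupdict() if m else None

# Hypothetical sample entries, not taken from real logs.
accepted = (
    "Jan 10 12:00:01 host sshd[1234]: Accepted publickey for alice "
    "from 192.168.1.20 port 50022 ssh2"
)
other = "Jan 10 12:00:05 host cron[99]: job started"

print(extract(accepted))
print(extract(other))
```

Each named group becomes one key in the returned dict, which plays the role of an enumerated value here.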
Syntax
An invocation of the regex module consists of the module name, any arguments which are being used, the regular expression, and then optionally any filters to be applied.
regex <argument list> <regular expression> [filter arguments]
Supported Options
-e <arg>: The “-e” option operates on an enumerated value instead of on the entire record. For example, a pipeline that shows packets not headed for port 80 but containing HTTP text would be:
tag=pcap packet ipv4.DstPort!=80 tcp.Payload | regex -e Payload ".*GET \/ HTTP\/1.1.*"
-r <arg>: The “-r” option specifies that the regular expression statement is located in a resource file.
-v: The “-v” option tells regex to operate in inverse mode, dropping any entries which match the regex and passing entries which do not match.
-p: The “-p” option tells regex to allow entries through if the regular expression does not match at all. The permissive flag does not change the operation of filters.
Note
Storing especially large regular expressions in resource files can clean up queries, and allows for easy reuse. If -r is specified, do not specify a regular expression in the query; the contents of the resource will be used instead. Handy!
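The default, inverse (-v), and permissive (-p) behaviors described above can be sketched in a few lines of Python. This is a hypothetical model, not Gravwell's implementation: the `regex_stage` function, its parameters, and the sample entries are assumptions. The source does not say how -v and -p interact, so the sketch simply lets inverse mode take priority.

```python
import re

def regex_stage(entries, pattern, invert=False, permissive=False):
    """Yield (entry, fields) pairs, sketching the -v and -p behavior.

    invert=True     -> like -v: pass only entries that do NOT match.
    permissive=True -> like -p: also pass non-matching entries, with no fields.
    """
    compiled = re.compile(pattern)
    for entry in entries:
        m = compiled.search(entry)
        if invert:
            if m is None:
                yield entry, {}
        elif m is not None:
            yield entry, m.groupdict()
        elif permissive:
            yield entry, {}

# Hypothetical entries for illustration.
entries = ["GET / HTTP/1.1", "POST /login HTTP/1.1", "hello"]
pattern = r"(?P<verb>GET|POST) "

default = list(regex_stage(entries, pattern))
inverse = list(regex_stage(entries, pattern, invert=True))
permissive = list(regex_stage(entries, pattern, permissive=True))
```

In the default run the entry "hello" is dropped, in the inverse run it is the only entry that survives, and in the permissive run all three entries pass but "hello" carries no extracted fields.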
Raw strings
To facilitate using escape sequences in regular expressions, you can use backticks to prevent Gravwell from unescaping your input. For example:
tag=syslog grep sshd | regex `sshd.*Accepted (?P<method>\S*) for (?P<user>\S*) from (?P<ip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)`
If you used double quotes instead of backticks, \S would have to be double-escaped as \\S.
This feature is convenient in that it allows you to copy regular expressions directly from regular expression playgrounds and other sources.
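Python draws the same distinction between escaped and raw string literals, which may help as an analogy. The sketch below only shows Python's own string handling; it does not call Gravwell, and the sample text is made up.

```python
import re

# Escaped form: each backslash is doubled so the regex engine still sees \S.
escaped = "Accepted (?P<method>\\S*) for"
# Raw form: backslashes pass through untouched, much like a backtick string.
raw = r"Accepted (?P<method>\S*) for"

# Both literals produce exactly the same pattern text.
same_text = escaped == raw
method = re.search(raw, "Accepted password for bob").group("method")
print(same_text, method)
```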
Inline Filtering
The regex module supports inline filtering for down-selecting data directly within the regex module. The inline filtering also enables regex to engage accelerators to dramatically reduce the amount of data that needs to be processed. Inline filtering is achieved in the same manner as other modules by using comparison operators. If a filter is enabled that specifies equality (“equal”, “not equal”, “contains”, “not contains”) any entry that fails the filter specification will be dropped entirely. If a field is specified as not equal “!=” and the field does not exist, the field is not extracted but the entry won’t be dropped entirely.
Operator | Name | Description |
---|---|---|
== | Equal | Field must be equal |
!= | Not equal | Field must not be equal |
~ | Subset | Field contains the value |
!~ | Not Subset | Field does NOT contain the value |
Filtering Examples
The following query extracts auth methods, usernames, and IP addresses from SSH logs and filters down to only those entries where the username is “root” and the IP is in the 192.168.0.0/16 subnet.
tag=syslog regex `sshd.*Accepted (?P<method>\S*) for (?P<user>\S*) from (?P<ip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)` user==root ip ~ "192.168"
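The effect of the two filters in this query can be modeled with a short Python sketch. The `OPERATORS` table, the `regex_with_filters` helper, and the sample entries are all hypothetical. The pattern is a Python rendering of the expression in the query with the literal dots escaped, and the sketch does not model accelerators or the missing-field case for "!=".

```python
import re

# Hypothetical mapping of the four filter operators to Python predicates.
OPERATORS = {
    "==": lambda field, value: field == value,
    "!=": lambda field, value: field != value,
    "~": lambda field, value: value in field,
    "!~": lambda field, value: value not in field,
}

PATTERN = re.compile(
    r"sshd.*Accepted (?P<method>\S*) for (?P<user>\S*) "
    r"from (?P<ip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)"
)

def regex_with_filters(entries, filters):
    """Extract fields, then keep entries passing every (name, op, value) filter."""
    kept = []
    for entry in entries:
        m = PATTERN.search(entry)
        if m is None:
            continue  # no match: the entry is dropped
        fields = m.groupdict()
        if all(OPERATORS[op](fields[name], value) for name, op, value in filters):
            kept.append(fields)
    return kept

# Made-up sample entries.
entries = [
    "sshd[1]: Accepted password for root from 192.168.1.5 port 22",
    "sshd[2]: Accepted password for root from 10.0.0.7 port 22",
    "sshd[3]: Accepted publickey for alice from 192.168.1.9 port 22",
]
result = regex_with_filters(entries, [("user", "==", "root"), ("ip", "~", "192.168")])
```

Because "~" is a substring test, a filter value of "192.168" would also pass an address such as 10.192.168.1. If that matters, a tighter expression or an additional filter is worth considering.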
Example Search
The following query extracts the authentication method, username, and source IP address from SSH logs as enumerated values named method, user, and ip, which are then displayed in a table.
tag=syslog grep sshd | regex `sshd.*Accepted (?P<method>\S*) for (?P<user>\S*) from (?P<ip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)` | table method user ip
Full regular expression syntax
The following is copied from the re2 documentation (see their license)
Regular expressions are a notation for describing sets of character strings. When a particular string is in the set described by a regular expression, we often say that the regular expression matches the string.
The simplest regular expression is a single literal character. Except for the metacharacters like *+?()|, characters match themselves. To match a metacharacter, escape it with a backslash: \+ matches a literal plus character.
Two regular expressions can be alternated or concatenated to form a new regular expression: if e1 matches s and e2 matches t, then e1|e2 matches s or t, and e1e2 matches st.
The metacharacters *, +, and ? are repetition operators: e1* matches a sequence of zero or more (possibly different) strings, each of which matches e1; e1+ matches one or more; e1? matches zero or one.
The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators. Explicit parentheses can be used to force different meanings, just as in arithmetic expressions. Some examples: ab|cd is equivalent to (ab)|(cd); ab* is equivalent to a(b*).
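The two equivalences above can be checked mechanically. The following Python sketch compares patterns over a small, hand-picked set of sample strings; the `same_matches` helper and the samples are assumptions made for illustration, and Python's re module stands in for RE2 since both agree on these basic operators.

```python
import re

def same_matches(a, b, samples):
    """True if patterns a and b accept exactly the same strings from samples."""
    return all(
        bool(re.fullmatch(a, s)) == bool(re.fullmatch(b, s)) for s in samples
    )

samples = ["", "a", "ab", "abb", "abab", "cd", "abd", "acd", "abcd"]

# Alternation binds weakest: ab|cd groups as (ab)|(cd), not a(b|c)d.
alt_equal = same_matches(r"ab|cd", r"(ab)|(cd)", samples)
alt_differs = not same_matches(r"ab|cd", r"a(b|c)d", samples)

# Repetition binds strongest: ab* groups as a(b*), not (ab)*.
rep_equal = same_matches(r"ab*", r"a(b*)", samples)
rep_differs = not same_matches(r"ab*", r"(ab)*", samples)
```

Agreement on a finite sample is not a proof of equivalence, but the differing cases are enough to show which grouping the precedence rules do not produce.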
The syntax described so far is most of the traditional Unix egrep regular expression syntax. This subset suffices to describe all regular languages: loosely speaking, a regular language is a set of strings that can be matched in a single pass through the text using only a fixed amount of memory. Newer regular expression facilities (notably Perl and those that have copied it) have added many new operators and escape sequences, which make the regular expressions more concise, and sometimes more cryptic, but usually not more powerful.
This page lists the regular expression syntax accepted by RE2. Note that this syntax is a subset of that accepted by PCRE, roughly speaking, and with various caveats.
It also lists some syntax accepted by PCRE, PERL, and VIM.
Kinds of single-character expressions | examples |
---|---|
any character, possibly including newline (s=true) | `.` |
character class | `[xyz]` |
negated character class | `[^xyz]` |
Perl character class | `\d` |
negated Perl character class | `\D` |
ASCII character class | `[[:alpha:]]` |
negated ASCII character class | `[[:^alpha:]]` |
Unicode character class (one-letter name) | `\pN` |
Unicode character class | `\p{Greek}` |
negated Unicode character class (one-letter name) | `\PN` |
negated Unicode character class | `\P{Greek}` |
Composites | |
---|---|
`xy` | `x` followed by `y` |
`x\|y` | `x` or `y` (prefer `x`) |
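Repetitions | |
---|---|
`x*` | zero or more `x`, prefer more |
`x+` | one or more `x`, prefer more |
`x?` | zero or one `x`, prefer one |
`x{n,m}` | `n` or `n`+1 or ... or `m` `x`, prefer more |
`x{n,}` | `n` or more `x`, prefer more |
`x{n}` | exactly `n` `x` |
`x*?` | zero or more `x`, prefer fewer |
`x+?` | one or more `x`, prefer fewer |
`x??` | zero or one `x`, prefer zero |
`x{n,m}?` | `n` or `n`+1 or ... or `m` `x`, prefer fewer |
`x{n,}?` | `n` or more `x`, prefer fewer |
`x{n}?` | exactly `n` `x` |
`x{}` | (≡ `x*`) [(NOT SUPPORTED)] [VIM] |
`x{-}` | (≡ `x*?`) [(NOT SUPPORTED)] [VIM] |
`x{-n}` | (≡ `x{n}?`) [(NOT SUPPORTED)] [VIM] |
`x=` | (≡ `x?`) [(NOT SUPPORTED)] [VIM] |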
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
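Greedy repetitions such as x* and x{n,m} take as much text as they can, while the same forms followed by ? take as little as they can. A small Python sketch makes the difference visible; Python's re module is used only as a stand-in that shares this behavior with RE2, and the sample strings are made up. The 1000-count restriction above is specific to RE2 and is not demonstrated here.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# x* prefers more: the match runs to the last '>' in the text.
greedy = re.search(r"<.*>", text).group(0)
# x*? prefers fewer: the match stops at the first '>'.
lazy = re.search(r"<.*?>", text).group(0)

# x{n,m} prefers more, x{n,m}? prefers fewer.
counted_greedy = re.search(r"a{2,4}", "aaaaa").group(0)
counted_lazy = re.search(r"a{2,4}?", "aaaaa").group(0)
```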
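Possessive repetitions | |
---|---|
`x*+` | zero or more `x`, possessive [(NOT SUPPORTED)] |
`x++` | one or more `x`, possessive [(NOT SUPPORTED)] |
`x?+` | zero or one `x`, possessive [(NOT SUPPORTED)] |
`x{n,m}+` | `n` or ... or `m` `x`, possessive [(NOT SUPPORTED)] |
`x{n,}+` | `n` or more `x`, possessive [(NOT SUPPORTED)] |
`x{n}+` | exactly `n` `x`, possessive [(NOT SUPPORTED)] |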
Grouping | |
---|---|
`(re)` | numbered capturing group (submatch) |
`(?P<name>re)` | named & numbered capturing group (submatch) |
`(?<name>re)` | named & numbered capturing group (submatch) |
`(?'name're)` | named & numbered capturing group (submatch) [(NOT SUPPORTED)] |
`(?:re)` | non-capturing group |
`(?flags)` | set flags within current group; non-capturing |
`(?flags:re)` | set flags during re; non-capturing |
`(?#text)` | comment [(NOT SUPPORTED)] |
`(?\|x\|y\|z)` | branch numbering reset [(NOT SUPPORTED)] |
`(?>re)` | possessive match of `re` [(NOT SUPPORTED)] |
`re@>` | possessive match of `re` [(NOT SUPPORTED)] [VIM] |
`%(re)` | non-capturing group [(NOT SUPPORTED)] [VIM] |
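Flags | |
---|---|
`i` | case-insensitive (default false) |
`m` | multi-line mode: `^` and `$` match begin/end line in addition to begin/end text (default false) |
`s` | let `.` match `\n` (default false) |
`U` | ungreedy: swap meaning of `x*` and `x*?`, `x+` and `x+?`, etc (default false) |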
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z).
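As an illustration of how inline flags change matching, here is a hedged Python sketch covering the i, s, and m flags and the scoped (?flags:re) form. Python's re module accepts these with the same meaning, but it is not RE2 and has no equivalent of the U flag, so that flag is left out. All sample strings are invented.

```python
import re

# (?i) sets case-insensitive matching for the whole expression.
ci = re.search(r"(?i)accepted", "sshd: ACCEPTED password") is not None

# (?s) lets . match a newline; without it, . stops at line ends.
no_s = re.search(r"a.b", "a\nb") is None
with_s = re.search(r"(?s)a.b", "a\nb") is not None

# (?m) makes ^ and $ match at line boundaries as well as text boundaries.
lines = re.findall(r"(?m)^\w+$", "one\ntwo\nthree")

# (?i:re) scopes the flag to a single group; the rest stays case-sensitive.
scoped = re.fullmatch(r"(?i:get) /index", "GET /index") is not None
scoped_rest = re.fullmatch(r"(?i:get) /index", "GET /INDEX") is None
```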
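Empty strings | |
---|---|
`^` | at beginning of text or line (`m`=true) |
`$` | at end of text (like `\z` not `\Z`) or line (`m`=true) |
`\A` | at beginning of text |
`\b` | at ASCII word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other) |
`\B` | not at ASCII word boundary |
`\G` | at beginning of subtext being searched [(NOT SUPPORTED)] [PCRE] |
`\G` | at end of last match [(NOT SUPPORTED)] [PERL] |
`\Z` | at end of text, or before newline at end of text [(NOT SUPPORTED)] |
`\z` | at end of text |
`(?=re)` | before text matching `re` [(NOT SUPPORTED)] |
`(?!re)` | before text not matching `re` [(NOT SUPPORTED)] |
`(?<=re)` | after text matching `re` [(NOT SUPPORTED)] |
`(?<!re)` | after text not matching `re` [(NOT SUPPORTED)] |
`re&` | before text matching `re` [(NOT SUPPORTED)] [VIM] |
`re@=` | before text matching `re` [(NOT SUPPORTED)] [VIM] |
`re@!` | before text not matching `re` [(NOT SUPPORTED)] [VIM] |
`re@<=` | after text matching `re` [(NOT SUPPORTED)] [VIM] |
`re@<!` | after text not matching `re` [(NOT SUPPORTED)] [VIM] |
`\zs` | sets start of match (= \K) [(NOT SUPPORTED)] [VIM] |
`\ze` | sets end of match [(NOT SUPPORTED)] [VIM] |
`\%^` | beginning of file [(NOT SUPPORTED)] [VIM] |
`\%$` | end of file [(NOT SUPPORTED)] [VIM] |
`\%V` | on screen [(NOT SUPPORTED)] [VIM] |
`\%#` | cursor position [(NOT SUPPORTED)] [VIM] |
`\%'m` | mark `m` position [(NOT SUPPORTED)] [VIM] |
`\%23l` | in line 23 [(NOT SUPPORTED)] [VIM] |
`\%23c` | in column 23 [(NOT SUPPORTED)] [VIM] |
`\%23v` | in virtual column 23 [(NOT SUPPORTED)] [VIM] |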
Escape sequences | |
---|---|
`\a` | bell (≡ `\007`) |
`\f` | form feed (≡ `\014`) |
`\t` | horizontal tab (≡ `\011`) |
`\n` | newline (≡ `\012`) |
`\r` | carriage return (≡ `\015`) |
`\v` | vertical tab character (≡ `\013`) |
`\*` | literal `*`, for any punctuation character `*` |
`\123` | octal character code (up to three digits) |
`\x7F` | hex character code (exactly two digits) |
`\x{10FFFF}` | hex character code |
`\C` | match a single byte even in UTF-8 mode |
`\Q...\E` | literal text `...` even if `...` has punctuation |
`\1` | backreference [(NOT SUPPORTED)] |
`\b` | backspace [(NOT SUPPORTED)] (use `\010`) |
`\cK` | control char ^K [(NOT SUPPORTED)] (use `\001` etc) |
`\e` | escape [(NOT SUPPORTED)] (use `\033`) |
`\g1` | backreference [(NOT SUPPORTED)] |
`\g{1}` | backreference [(NOT SUPPORTED)] |
`\g{+1}` | backreference [(NOT SUPPORTED)] |
`\g{-1}` | backreference [(NOT SUPPORTED)] |
`\g{name}` | named backreference [(NOT SUPPORTED)] |
`\g<name>` | subroutine call [(NOT SUPPORTED)] |
`\g'name'` | subroutine call [(NOT SUPPORTED)] |
`\k<name>` | named backreference [(NOT SUPPORTED)] |
`\k'name'` | named backreference [(NOT SUPPORTED)] |
`\lX` | lowercase `X` [(NOT SUPPORTED)] |
`\ux` | uppercase `x` [(NOT SUPPORTED)] |
`\L...\E` | lowercase text `...` [(NOT SUPPORTED)] |
`\K` | reset beginning of `$0` [(NOT SUPPORTED)] |
`\N{name}` | named Unicode character [(NOT SUPPORTED)] |
`\R` | line break [(NOT SUPPORTED)] |
`\U...\E` | upper case text `...` [(NOT SUPPORTED)] |
`\X` | extended Unicode sequence [(NOT SUPPORTED)] |
`\%d123` | decimal character 123 [(NOT SUPPORTED)] [VIM] |
`\%xFF` | hex character FF [(NOT SUPPORTED)] [VIM] |
`\%o123` | octal character 123 [(NOT SUPPORTED)] [VIM] |
`\%u1234` | Unicode character 0x1234 [(NOT SUPPORTED)] [VIM] |
`\%U12345678` | Unicode character 0x12345678 [(NOT SUPPORTED)] [VIM] |
Character class elements | |
---|---|
`x` | single character |
`A-Z` | character range (inclusive) |
`\d` | Perl character class |
`[:foo:]` | ASCII character class `foo` |
`\p{Foo}` | Unicode character class `Foo` |
`\pF` | Unicode character class `F` (one-letter name) |
Named character classes as character class elements | |
---|---|
`[\d]` | digits (≡ `\d`) |
`[^\d]` | not digits (≡ `\D`) |
`[\D]` | not digits (≡ `\D`) |
`[^\D]` | not not digits (≡ `\d`) |
`[[:name:]]` | named ASCII class inside character class (≡ `[:name:]`) |
`[^[:name:]]` | named ASCII class inside negated character class (≡ `[:^name:]`) |
`[\p{Name}]` | named Unicode property inside character class (≡ `\p{Name}`) |
`[^\p{Name}]` | named Unicode property inside negated character class (≡ `\P{Name}`) |
Perl character classes (all ASCII-only) | |
---|---|
`\d` | digits (≡ `[0-9]`) |
`\D` | not digits (≡ `[^0-9]`) |
`\s` | whitespace (≡ `[\t\n\f\r ]`) |
`\S` | not whitespace (≡ `[^\t\n\f\r ]`) |
`\w` | word characters (≡ `[0-9A-Za-z_]`) |
`\W` | not word characters (≡ `[^0-9A-Za-z_]`) |
`\h` | horizontal space [(NOT SUPPORTED)] |
`\H` | not horizontal space [(NOT SUPPORTED)] |
`\v` | vertical space [(NOT SUPPORTED)] |
`\V` | not vertical space [(NOT SUPPORTED)] |
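Since the Perl character classes are ASCII-only in this syntax, \d behaves like [0-9] and does not match digits from other scripts. The Python sketch below reproduces that behavior by passing re.ASCII, because Python's \d is Unicode-aware by default. This is a stand-in for illustration rather than the RE2 engine, and the sample strings are made up.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a decimal digit outside the ASCII range

# With ASCII-only classes, \d accepts 0-9 and nothing else.
ascii_digit = re.fullmatch(r"\d", "7", re.ASCII) is not None
non_ascii_digit = re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII) is None

# \d+ and [0-9]+ find the same runs of digits.
same_as_class = re.findall(r"\d+", "a1b22", re.ASCII) == re.findall(r"[0-9]+", "a1b22")

# \S+ splits text into runs of non-whitespace characters.
not_space = re.findall(r"\S+", "for root from", re.ASCII)
```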
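ASCII character classes | |
---|---|
`[[:alnum:]]` | alphanumeric (≡ `[0-9A-Za-z]`) |
`[[:alpha:]]` | alphabetic (≡ `[A-Za-z]`) |
`[[:ascii:]]` | ASCII (≡ `[\x00-\x7F]`) |
`[[:blank:]]` | blank (≡ `[\t ]`) |
`[[:cntrl:]]` | control (≡ `[\x00-\x1F\x7F]`) |
`[[:digit:]]` | digits (≡ `[0-9]`) |
`[[:graph:]]` | graphical (≡ `[!-~]`) |
`[[:lower:]]` | lower case (≡ `[a-z]`) |
`[[:print:]]` | printable (≡ `[ -~]` ≡ `[ [:graph:]]`) |
`[[:punct:]]` | punctuation (≡ ``[!-/:-@[-`{-~]``) |
`[[:space:]]` | whitespace (≡ `[\t\n\v\f\r ]`) |
`[[:upper:]]` | upper case (≡ `[A-Z]`) |
`[[:word:]]` | word characters (≡ `[0-9A-Za-z_]`) |
`[[:xdigit:]]` | hex digit (≡ `[0-9A-Fa-f]`) |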
Unicode character class names–general category | |
---|---|
`C` | other |
`Cc` | control |
`Cf` | format |
`Cn` | unassigned code points [(NOT SUPPORTED)] |
`Co` | private use |
`Cs` | surrogate |
`L` | letter |
`LC` | cased letter [(NOT SUPPORTED)] |
`L&` | cased letter [(NOT SUPPORTED)] |
`Ll` | lowercase letter |
`Lm` | modifier letter |
`Lo` | other letter |
`Lt` | titlecase letter |
`Lu` | uppercase letter |
`M` | mark |
`Mc` | spacing mark |
`Me` | enclosing mark |
`Mn` | non-spacing mark |
`N` | number |
`Nd` | decimal number |
`Nl` | letter number |
`No` | other number |
`P` | punctuation |
`Pc` | connector punctuation |
`Pd` | dash punctuation |
`Pe` | close punctuation |
`Pf` | final punctuation |
`Pi` | initial punctuation |
`Po` | other punctuation |
`Ps` | open punctuation |
`S` | symbol |
`Sc` | currency symbol |
`Sk` | modifier symbol |
`Sm` | math symbol |
`So` | other symbol |
`Z` | separator |
`Zl` | line separator |
`Zp` | paragraph separator |
`Zs` | space separator |
Unicode character class names–scripts
Script names such as Arabic, Cyrillic, Greek, Han, Hebrew, and Latin can be used with \p{...} and \P{...}, for example \p{Greek}. The complete list of script names is given in the RE2 syntax documentation.
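Vim character classes | |
---|---|
`\i` | identifier character [(NOT SUPPORTED)] [VIM] |
`\I` | `\i` except digits [(NOT SUPPORTED)] [VIM] |
`\k` | keyword character [(NOT SUPPORTED)] [VIM] |
`\K` | `\k` except digits [(NOT SUPPORTED)] [VIM] |
`\f` | file name character [(NOT SUPPORTED)] [VIM] |
`\F` | `\f` except digits [(NOT SUPPORTED)] [VIM] |
`\p` | printable character [(NOT SUPPORTED)] [VIM] |
`\P` | `\p` except digits [(NOT SUPPORTED)] [VIM] |
`\s` | whitespace character (≡ `[ \t]`) [(NOT SUPPORTED)] [VIM] |
`\S` | non-white space character (≡ `[^ \t]`) [(NOT SUPPORTED)] [VIM] |
`\d` | digits (≡ `[0-9]`) [VIM] |
`\D` | not `\d` [VIM] |
`\x` | hex digits (≡ `[0-9A-Fa-f]`) [(NOT SUPPORTED)] [VIM] |
`\X` | not `\x` [(NOT SUPPORTED)] [VIM] |
`\o` | octal digits (≡ `[0-7]`) [(NOT SUPPORTED)] [VIM] |
`\O` | not `\o` [(NOT SUPPORTED)] [VIM] |
`\w` | word character [VIM] |
`\W` | not `\w` [VIM] |
`\h` | head of word character [(NOT SUPPORTED)] [VIM] |
`\H` | not `\h` [(NOT SUPPORTED)] [VIM] |
`\a` | alphabetic [(NOT SUPPORTED)] [VIM] |
`\A` | not `\a` [(NOT SUPPORTED)] [VIM] |
`\l` | lowercase [(NOT SUPPORTED)] [VIM] |
`\L` | not lowercase [(NOT SUPPORTED)] [VIM] |
`\u` | uppercase [(NOT SUPPORTED)] [VIM] |
`\U` | not uppercase [(NOT SUPPORTED)] [VIM] |
`\_x` | `\x` plus newline, for any `x` [(NOT SUPPORTED)] [VIM] |
`\c` | ignore case [(NOT SUPPORTED)] [VIM] |
`\C` | match case [(NOT SUPPORTED)] [VIM] |
`\m` | magic [(NOT SUPPORTED)] [VIM] |
`\M` | nomagic [(NOT SUPPORTED)] [VIM] |
`\v` | verymagic [(NOT SUPPORTED)] [VIM] |
`\V` | verynomagic [(NOT SUPPORTED)] [VIM] |
`\Z` | ignore differences in Unicode combining characters [(NOT SUPPORTED)] [VIM] |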
Magic | |
---|---|
`(?{code})` | arbitrary Perl code [(NOT SUPPORTED)] [PERL] |
`(??{code})` | postponed arbitrary Perl code [(NOT SUPPORTED)] [PERL] |
`(?n)` | recursive call to regexp capturing group `n` [(NOT SUPPORTED)] |
`(?+n)` | recursive call to relative group `+n` [(NOT SUPPORTED)] |
`(?-n)` | recursive call to relative group `-n` [(NOT SUPPORTED)] |
`(?C)` | PCRE callout [(NOT SUPPORTED)] [PCRE] |
`(?R)` | recursive call to entire regexp (≡ `(?0)`) [(NOT SUPPORTED)] |
`(?&name)` | recursive call to named group [(NOT SUPPORTED)] |
`(?P=name)` | named backreference [(NOT SUPPORTED)] |
`(?P>name)` | recursive call to named group [(NOT SUPPORTED)] |
`(?(cond)true\|false)` | conditional branch [(NOT SUPPORTED)] |
`(?(cond)true)` | conditional branch [(NOT SUPPORTED)] |
`(*ACCEPT)` | make regexps more like Prolog [(NOT SUPPORTED)] |
`(*COMMIT)` | [(NOT SUPPORTED)] |
`(*F)` | [(NOT SUPPORTED)] |
`(*FAIL)` | [(NOT SUPPORTED)] |
`(*MARK)` | [(NOT SUPPORTED)] |
`(*PRUNE)` | [(NOT SUPPORTED)] |
`(*SKIP)` | [(NOT SUPPORTED)] |
`(*THEN)` | [(NOT SUPPORTED)] |
`(*ANY)` | set newline convention [(NOT SUPPORTED)] |
`(*ANYCRLF)` | [(NOT SUPPORTED)] |
`(*CR)` | [(NOT SUPPORTED)] |
`(*CRLF)` | [(NOT SUPPORTED)] |
`(*LF)` | [(NOT SUPPORTED)] |
`(*BSR_ANYCRLF)` | set \R convention [(NOT SUPPORTED)] [PCRE] |
`(*BSR_UNICODE)` | [(NOT SUPPORTED)] [PCRE] |