This section contains more detailed instructions for configuring and running Gravwell ingesters, which gather incoming data, package it into Gravwell entries, and ship it to Gravwell indexers for storage. The ingesters described in these pages are primarily designed to capture live data as it is generated; if you have existing data you want to import, check out the migration documents.
The Gravwell-created ingesters are released under the BSD open source license and can be found on GitHub. The ingest API is also open source, so you can create your own ingesters for unique data sources, perform additional normalization or pre-processing, or handle anything else you need. The ingest API code is located here.
In general, for an ingester to send data to Gravwell, the ingester will need to know the “Ingest Secret” of the Gravwell instance, for authentication. This can be found by viewing the /opt/gravwell/etc/gravwell.conf file on the Gravwell server and finding the entry for Ingest-Auth. If the ingester is running on the same system as Gravwell itself, the installer will usually be able to detect this value and set it automatically.
The Gravwell GUI has an Ingesters page (under the System menu category) which can be used to easily identify which remote ingesters are actively connected, for how long they have been connected, and how much data they have pushed.
The replication system does not replicate entries larger than 999MB. Larger entries can still be ingested and searched as usual, but they are omitted from replication. This is not a concern for 99.9% of use cases, as all the ingesters detailed in this page tend to create entries no larger than a few kilobytes.
Subscribe and ingest from Amazon SQS queues.
Consume from Azure Event Hubs.
Ingest collectd samples.
Watch and ingest files on disk, such as logs.
Watch and ingest files on Windows, such as logs and EVTX files.
Fetch and ingest entries from Google Cloud Platform Pub/Sub streams.
Create HTTP listeners on multiple URL paths.
Periodically collect SDR and SEL records from IPMI devices.
Create a Kafka Consumer that ingests into Gravwell. Can be paired with the Gravwell Kafka Federator.
Ingest from Amazon’s Kinesis Data Streams service.
Ingest from Microsoft’s Graph API.
Collect Netflow and IPFIX records.
Ingest PCAP on the wire.
Ingest Microsoft Office 365 logs.
Issue queries and ingest data from Google Stenographer.
Ingest data directly from Amazon S3 buckets, including CloudTrail logs.
Ingest large records into a single entry.
Ingest Shodan streaming API events.
Ingest any text over TCP/UDP, syslog, and more.
Receive and ingest SNMP trap messages.
Collect Windows events.
Global Configuration Parameters
Most of the core ingesters support a common set of global configuration parameters. The shared Global configuration parameters are implemented using the ingest config package. Global configuration parameters should be specified in the Global section of each Gravwell ingester config file. The following Global ingester parameters are available:
The Ingest-Secret parameter specifies the token to be used for ingest authentication. The token specified here MUST match the Ingest-Auth parameter for Gravwell indexers.
The Connection-Timeout parameter specifies how long the ingester will wait to connect to an indexer before giving up. An empty timeout means that the ingester will wait forever to start. Timeouts should be specified as durations in seconds, minutes, or hours.
Connection-Timeout=30s
Connection-Timeout=5m
Connection-Timeout=1h
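As a rough illustration of how the three example values map to wait times, the following Python sketch converts a Connection-Timeout string to seconds. The function name and the single-unit grammar are assumptions made for this example; the ingester itself may accept more duration forms than the ones shown above.

```python
import re
from typing import Optional

# Seconds per unit for the three suffixes used in the examples above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_DURATION_RE = re.compile(r"^(\d+)([smh])$")


def connection_timeout_seconds(value: str) -> Optional[int]:
    """Return the timeout in seconds, or None when the ingester should wait forever."""
    value = value.strip()
    if value == "":
        return None  # empty timeout: keep trying to connect indefinitely
    match = _DURATION_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```

Under these assumptions, 30s, 5m, and 1h correspond to 30, 300, and 3600 seconds respectively.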
The Insecure-Skip-TLS-Verify parameter tells the ingester to ignore bad certificates when connecting over encrypted TLS tunnels. As the name suggests, any and all authentication provided by TLS is thrown out the window and attackers can easily Man-in-the-Middle TLS connections. The ingest connections will still be encrypted, but the connection is by no means secure. By default, TLS certificates are validated and the connection will fail if certificate validation fails.
The Rate-Limit parameter sets a maximum bandwidth which the ingester can consume. This can be useful when configuring a “bursty” ingester that talks to the indexer over a slow connection, so the ingester doesn’t hog all the available bandwidth when it is trying to send a lot of data.
The argument should be a number followed by an optional rate suffix, e.g. 10Mbit. The following suffixes exist:
kbit, kbps, Kbit, Kbps: “kilobits per second”
KBps: “kilobytes per second”
mbit, mbps, Mbit, Mbps: “megabits per second”
MBps: “megabytes per second”
gbit, gbps, Gbit, Gbps: “gigabits per second”
GBps: “gigabytes per second”
Rate-Limit=1Mbit
Rate-Limit=2048Kbps
Rate-Limit=3MBps
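To make the difference between the bit and byte suffixes concrete, here is a small Python sketch that normalizes a Rate-Limit value to bits per second. It is only an illustration: the function name is hypothetical, and both the decimal (1000-based) multipliers and the treatment of a bare number as bits per second are assumptions, since the text above does not specify them.

```python
import re

# Build the suffix table from the list above. Suffixes are case-sensitive:
# "Kbps" means kilobits per second while "KBps" means kilobytes per second.
# ASSUMPTION: decimal multipliers (1 kilobit = 1000 bits).
_SUFFIXES = {}
for prefix, scale in (("k", 10**3), ("m", 10**6), ("g", 10**9)):
    for p in (prefix, prefix.upper()):
        _SUFFIXES[p + "bit"] = scale  # bits per second
        _SUFFIXES[p + "bps"] = scale  # bits per second
    _SUFFIXES[prefix.upper() + "Bps"] = scale * 8  # bytes per second

_RATE_RE = re.compile(r"^(\d+)([A-Za-z]*)$")


def rate_limit_bits_per_second(value: str) -> int:
    """Convert a Rate-Limit string such as '10Mbit' to bits per second."""
    match = _RATE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed rate: {value!r}")
    number, suffix = match.groups()
    if suffix == "":
        return int(number)  # ASSUMPTION: a bare number is bits per second
    if suffix not in _SUFFIXES:
        raise ValueError(f"unknown rate suffix: {suffix!r}")
    return int(number) * _SUFFIXES[suffix]
```

With these assumptions, Rate-Limit=3MBps allows eight times the bandwidth of Rate-Limit=3Mbps, which is easy to overlook when the two differ only by the case of one letter.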
The ingest system supports a transparent compression system that will compress data as it flows between ingesters and indexers. This transparent compression is extremely fast and can help reduce load on slower links. Each ingester can request a compressed uplink for all connections by setting the Enable-Compression parameter to true in the global configuration block.
The compression system is opportunistic in that the ingester requests compression but the upstream link gets the final say on whether compression is enabled; if the upstream endpoint does not support compression or has been configured to disallow it, the link will not be compressed.
Compression will increase the CPU and memory requirements of an ingester. If the ingester is running on an endpoint with minimal CPU and/or memory, compression may reduce throughput. Compression is best suited for WAN connections; enabling compression on a Unix named pipe just incurs CPU and memory overhead with no added benefit.
Cleartext-Backend-Target specifies the host and port of a Gravwell indexer. The ingester will connect to the indexer using a cleartext TCP connection. If no port is specified the default port 4023 is used. Cleartext connections support both IPv6 and IPv4 destinations. Multiple Cleartext-Backend-Targets can be specified to load balance an ingester across multiple indexers.
Cleartext-Backend-Target=192.168.1.1
Cleartext-Backend-Target=192.168.1.1:4023
Cleartext-Backend-Target=DEAD::BEEF
Cleartext-Backend-Target=[DEAD::BEEF]:4023
Encrypted-Backend-Target specifies the host and port of a Gravwell indexer. The ingester will connect to the indexer via TCP and perform a full TLS handshake/certificate validation. If no port is specified the default port of 4024 is used. Encrypted connections support both IPv6 and IPv4 destinations. Multiple Encrypted-Backend-Targets can be specified to load balance an ingester across multiple indexers.
Encrypted-Backend-Target=192.168.1.1
Encrypted-Backend-Target=192.168.1.1:4024
Encrypted-Backend-Target=DEAD::BEEF
Encrypted-Backend-Target=[DEAD::BEEF]:4024
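The four target forms shown above (bare IPv4, IPv4 with port, bare IPv6, bracketed IPv6 with port) can be told apart with a few string checks. The Python sketch below is one possible way to do so and is not taken from the ingester source; the function name is hypothetical, and only the default ports 4023 and 4024 come from the text.

```python
from typing import Tuple

CLEARTEXT_DEFAULT_PORT = 4023  # default for Cleartext-Backend-Target
ENCRYPTED_DEFAULT_PORT = 4024  # default for Encrypted-Backend-Target


def split_target(target: str, default_port: int) -> Tuple[str, int]:
    """Split a backend target into (host, port), applying the default port."""
    target = target.strip()
    if target.startswith("["):
        # Bracketed IPv6, optionally followed by ":port".
        host, bracket, rest = target[1:].partition("]")
        if not bracket:
            raise ValueError(f"missing closing bracket: {target!r}")
        if rest == "":
            return host, default_port
        if not rest.startswith(":"):
            raise ValueError(f"unexpected text after bracket: {target!r}")
        return host, int(rest[1:])
    if target.count(":") == 1:
        # Exactly one colon: hostname or IPv4 address followed by a port.
        host, port = target.split(":")
        return host, int(port)
    # No colon (host only) or several colons (bare IPv6 address).
    return target, default_port
```

The brackets matter for IPv6: without them there is no unambiguous way to tell where the address ends and the port begins, which is why the bare DEAD::BEEF form always falls back to the default port in this sketch.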
Pipe-Backend-Target specifies a Unix named socket via a full path. Unix named sockets are ideal for ingesters that are co-resident with indexers as they are extremely fast and incur little overhead. Only a single Pipe-Backend-Target is supported per ingester, but pipes can be multiplexed alongside cleartext and encrypted connections.
The Ingest-Cache-Path enables a local cache for ingested data. When enabled, ingesters can cache locally when they cannot forward entries to indexers. The ingest cache can help ensure you don’t lose data when links go down or if you need to take a Gravwell cluster offline momentarily. Be sure to specify a Max-Ingest-Cache value so that a long-term network failure won’t cause an ingester to fill the host disk. The local ingest cache is not as fast as ingesting directly to indexers, so don’t expect the ingest cache to handle 2 million entries per second the way the indexers can.
The ingest cache should not be enabled for the File Follower ingester. Because this ingester reads directly from files on the disk and tracks its position within each file, it does not need a cache.
Max-Ingest-Cache limits the amount of storage space an ingester will consume when the cache is engaged. The maximum cache value is specified in megabytes; a value of 1024 means that the ingester can consume 1GB of storage before it will stop accepting new entries. The cache system will NOT overwrite old entries when the cache fills up. This is by design, so that an attacker can’t disrupt a network connection and cause an ingester to overwrite potentially critical data at the point the disruption happened.
Max-Ingest-Cache=32
Max-Ingest-Cache=1024
Max-Ingest-Cache=10240
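The "refuse rather than overwrite" policy described above can be modeled in a few lines. The following Python sketch is a toy in-memory stand-in for the on-disk cache, written only to illustrate the policy; the class and method names are hypothetical, and counting one megabyte as 1024 * 1024 bytes is an assumption.

```python
class BoundedCache:
    """Toy model of a cache that stops accepting entries when it is full."""

    def __init__(self, max_ingest_cache_mb: int) -> None:
        # ASSUMPTION: one megabyte is 1024 * 1024 bytes.
        self.limit_bytes = max_ingest_cache_mb * 1024 * 1024
        self.used_bytes = 0
        self.entries = []

    def offer(self, entry: bytes) -> bool:
        """Store the entry if it fits; never evict older entries to make room."""
        if self.used_bytes + len(entry) > self.limit_bytes:
            return False  # full: the new entry is refused, old data is kept
        self.entries.append(entry)
        self.used_bytes += len(entry)
        return True
```

A ring buffer would instead drop the oldest entries, which are exactly the ones recorded at the moment the link went down; refusing new entries keeps that evidence intact.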
Cache-Depth sets the number of entries to keep in an in-memory buffer. The default value is 128, and the in-memory buffer is always enabled, even if Ingest-Cache-Path is disabled. Setting Cache-Depth to a large value allows the ingester to absorb bursts at the expense of more memory consumption.
Cache-Mode sets the behavior of the backing cache (enabled by setting Ingest-Cache-Path) at runtime. Available modes are “always” and “fail”. In “always” mode, the cache is always enabled, allowing the ingester to write entries to disk any time the in-memory buffer (set with Cache-Depth) is full. This can occur on a dead or slow indexer connection, or when the ingester is attempting to push more data than is possible over the connection it has to the indexer. By using “always” mode, you ensure the ingester will not drop entries or block data ingest at any time. Setting Cache-Mode to “fail” changes the behavior so that the cache is engaged only when all indexer connections are down.
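One way to read the two modes is as a single decision made whenever the in-memory buffer fills up. The Python sketch below encodes that reading; it is an interpretation of the paragraph above rather than the ingester's actual logic, and the function and argument names are hypothetical.

```python
def may_spill_to_disk(cache_mode: str, buffer_full: bool, live_connections: int) -> bool:
    """Decide whether entries may be written to the on-disk cache right now."""
    if cache_mode == "always":
        # Any time the in-memory buffer is full, regardless of link state.
        return buffer_full
    if cache_mode == "fail":
        # Only when every indexer connection is down.
        # ASSUMPTION: the in-memory buffer is still filled first.
        return buffer_full and live_connections == 0
    raise ValueError(f"unknown Cache-Mode: {cache_mode!r}")
```

The practical difference shows up on a slow but live link: in "always" mode the overflow goes to disk, while in "fail" mode the disk cache stays idle because at least one connection is still up.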
Ingesters can log errors and debug information to log files to assist in debugging installation and configuration problems. An empty Log-File parameter disables file logging.
The Log-Level parameter controls the logging system in each ingester for both log files and metadata that is sent to indexers under the “gravwell” tag. Setting the log level to INFO will tell the ingester to log in great detail, such as when the File Follower follows a new file or Simple Relay receives a new TCP connection. On the other end of the spectrum, setting the level to ERROR means only the most critical errors will be logged. The WARN level is appropriate in most cases. The following levels are supported:
Log-Level=Off
Log-Level=INFO
Log-Level=WARN
Log-Level=ERROR
The Source-Override parameter will override the SRC data item that is attached to each entry. The SRC item is either an IPv6 or IPv4 address and is normally the external IP address of the machine on which the ingester is running.
Source-Override=10.0.0.1
Source-Override=0.0.0.0
Source-Override=DEAD:BEEF::FEED:FEBE
Many ingesters can emit entries on the gravwell tag for the purposes of auditing, health and status, and general ingest infrastructure logging. Typically, these entries will use the source IP address of the ingester as seen from the indexer for the SRC field. However, it can be useful to override the source IP field for only the entries that are actually generated by the ingester. A good example would be using the Log-Source-Override on the Gravwell Federator to change the SRC field for health and status entries, but not every entry that transits the Federator.
The Log-Source-Override configuration parameter requires an IPv4 or IPv6 address as its value.
Log-Source-Override=10.0.0.1
Log-Source-Override=0.0.0.0
Log-Source-Override=DEAD:BEEF::FEED:FEBE
Log-Source-Override=::1
Data Consumer Configuration
Besides the global configuration options, each ingester which uses a config file will need to define at least one data consumer. A data consumer is a config definition which tells the ingester:
Where to get data
What tag to use on the data
Any special timestamp processing rules
Overrides for fields such as the SRC field
The Simple Relay ingester and the HTTP ingester define “Listeners”; the File Follower ingester uses “Followers”; the Netflow ingester defines “Collectors”. The individual ingester sections below describe the ingester’s particular data consumer types and any unique configurations they may require. The following example shows how the File Follower ingester defines a “Follower” data consumer to read data from a particular directory:
[Follower "syslog"]
    Base-Directory="/var/log/"
    File-Filter="syslog,syslog.[0-9]" #we are looking for all syslog files
    Tag-Name=syslog
    Assume-Local-Timezone=true #Default for assume localtime is false
Note how it specifies the data source (via the Base-Directory and File-Filter rules), which tag to use (via Tag-Name), and an additional rule for parsing timestamps in the incoming data (Assume-Local-Timezone).
All ingesters attach a timestamp to each entry sent to an indexer. Most ingesters extract timestamps from the data being ingested, such as the timestamp field in Syslog. When the input data does not have a timestamp at a known position, or the ingester cannot extract it, the ingester will search the entry for a timestamp using a number of known formats (see the list of timestamp formats).
If the ingester still cannot find a valid timestamp, the current time will be applied to the entry.
When an ingester attempts to find a timestamp based on the list of timestamp formats, it will always try the last successful format first. For example, if an entry has a timestamp 02 Jan 06 15:04 MST, the ingester will attempt to parse the next entry with the same timestamp format. If it does not match, then the ingester will attempt all other timestamp formats.
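The "last successful format first" strategy can be sketched as follows. This Python example is a simplified illustration, not the ingester's implementation: the class name and the format list are hypothetical, the formats use Python strptime syntax, and the input string must consist of the timestamp alone, whereas the ingester searches for a timestamp inside a larger entry.

```python
from datetime import datetime
from typing import List, Optional


class FormatGuesser:
    """Try the most recently successful format before all the others."""

    def __init__(self, formats: List[str]) -> None:
        self.formats = list(formats)
        self.last_hit: Optional[str] = None

    def parse(self, text: str) -> Optional[datetime]:
        candidates = self.formats
        if self.last_hit is not None:
            # Move the last successful format to the front of the list.
            others = [f for f in self.formats if f != self.last_hit]
            candidates = [self.last_hit] + others
        for fmt in candidates:
            try:
                parsed = datetime.strptime(text, fmt)
            except ValueError:
                continue  # this format does not match; try the next one
            self.last_hit = fmt
            return parsed
        return None  # no format matched; a caller would fall back to the current time
```

Because consecutive entries from one source almost always share a format, the first attempt usually succeeds and the full list is only walked when the format changes.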
There are several ways to change the behavior of how timestamps are parsed, detailed in the next section. Additionally, fully custom timestamp formats can be provided in some ingesters.
Dealing with time zones can be one of the most challenging and frustrating aspects of ingestion. If a log’s timestamp includes an explicit UTC offset (e.g. “-0700”), things are relatively easy, but many log formats do not include any time zone information at all! Sometimes, the system generating the log entry is in a local time zone, while the Gravwell ingester’s system is set to UTC, or vice versa.
If you believe you have configured your ingester properly, but you’re not seeing any data in a query, try expanding your query timeframe to include the future using the “Date Range” timeframe selection: just set the End Date to some time tomorrow. If the Gravwell ingest system is set to a US time zone, but the logs are in UTC time with no offset included, the incoming data will be ingested in the “future”.
The Timezone-Override parameter (described below) is the surest way to fix time zone problems. If your data has a UTC timestamp but the system clock is set to another time zone, set Timezone-Override="Etc/UTC". If your data is in US Eastern time, but the system clock is set to UTC, set Timezone-Override="America/New_York", and so on.
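To see why a wrong time zone assumption pushes entries into the "future", the Python sketch below interprets the same zone-less timestamp twice: once as UTC and once as US Eastern winter time. The helper name and the sample timestamp are made up for illustration, and a fixed UTC-5 offset stands in for America/New_York so the example does not depend on a time zone database being installed.

```python
from datetime import datetime, timedelta, timezone

# Fixed stand-in for America/New_York in January (UTC-5, no daylight saving).
EASTERN_WINTER = timezone(timedelta(hours=-5))


def interpret(naive_text: str, assumed_zone: timezone) -> datetime:
    """Attach the assumed zone to a zone-less timestamp and convert it to UTC."""
    naive = datetime.strptime(naive_text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=assumed_zone).astimezone(timezone.utc)


# A log line written by a system whose clock is in UTC, with no offset included.
stamp = "2024-01-15 14:00:00"
correct = interpret(stamp, timezone.utc)       # 14:00 UTC
mistaken = interpret(stamp, EASTERN_WINTER)    # 19:00 UTC
skew = mistaken - correct                      # five hours into the future
```

An entry ingested with the mistaken interpretation lands five hours ahead of the real event, so a query over "the last hour" will not show it until the End Date is extended.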
Time Parsing Overrides
Most ingesters attempt to apply a timestamp to each entry by extracting a timestamp from the data. There are several options which can be applied to each data consumer for fine-tuning of this timestamp extraction:
Ignore-Timestamps=true will make the ingester apply the current time to each entry rather than attempting to extract a timestamp. This may be the only workable option when the incoming data is extremely inconsistent.
Assume-Local-Timezone (boolean): By default, if a timestamp does not include a time zone the ingester will assume it is a UTC timestamp. Setting Assume-Local-Timezone=true will make the ingester instead assume the local computer’s time zone. This is mutually exclusive with the Timezone-Override option.
Timezone-Override tells the ingester that timestamps which don’t include a time zone should be parsed in the specified time zone. Thus Timezone-Override=US/Pacific would tell the ingester to treat incoming timestamps as if they were in US Pacific time. See this page for a complete list of acceptable time zone names (in the ‘TZ database name’ column). Mutually exclusive with Assume-Local-Timezone.
Timestamp-Format-Override (string): This parameter tells the ingester to look for a specific timestamp format in the data, e.g. Timestamp-Format-Override="RFC822". Refer to the timegrinder documentation for a full list of possible overrides, with examples.
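The interaction between Assume-Local-Timezone and Timezone-Override can be summarized as a small lookup. The Python sketch below is a hedged illustration of the rules listed above; the function name, the dictionary representation of a data consumer, and the "local" and "UTC" return labels are all assumptions made for this example.

```python
def zone_for_bare_timestamps(options: dict) -> str:
    """Return the zone applied to timestamps that carry no zone of their own."""
    assume_local = options.get("Assume-Local-Timezone", False)
    override = options.get("Timezone-Override")
    if assume_local and override:
        # The two options are documented as mutually exclusive.
        raise ValueError(
            "Assume-Local-Timezone and Timezone-Override cannot both be set"
        )
    if override:
        return override  # e.g. "US/Pacific"
    return "local" if assume_local else "UTC"  # UTC is the documented default
```

Checking for the conflicting combination up front makes a misconfigured data consumer fail loudly instead of silently picking one of the two options.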
The Kinesis and Google Pub/Sub ingesters do not provide the Ignore-Timestamps option. Kinesis and Pub/Sub include an arrival timestamp with every entry; by default, the ingesters will use that as the Gravwell timestamp. If Parse-Time=true is specified in the data consumer definition, the ingester will instead attempt to extract a timestamp from the message body. See these ingesters’ respective sections for additional information.
Custom timestamp formats are supported on many ingesters; see Custom Time Formats for more information.
The “Source-Override” parameter instructs the consumer to ignore the source of the data and apply a hard coded value. It may be desirable to hard code source values for incoming data as a method to organize and/or group data sources. “Source-Override” values can be IPv4 or IPv6 values.
Source-Override=192.168.1.1
Source-Override=127.0.0.1
Source-Override=[fe80::899:b3ff:feb7:2dc6]
The Gravwell ingest API and core ingesters are fully open source under the BSD 2-Clause license. This means that you can write your own ingesters and integrate Gravwell entry generation into your own products and services. The core ingest API is written in Go, but the list of available API languages is under active expansion.
A very basic ingester example (less than 100 lines of code) that watches a file and sends any lines written to it up to a Gravwell cluster can be seen here.
Keep checking back with the Gravwell GitHub page, as the team is continually improving the ingest API and porting it to additional languages. Community development is fully supported, so if you have a merge request, language port, or a great new ingester that you have open sourced, let Gravwell know! The Gravwell team would love to feature your hard work in the ingester highlight series.