Data Ageout
By default, Gravwell will never delete data. Instead, it requires the user to specify policies for each storage well describing how data may be migrated and/or deleted, a process we refer to as “ageout”. The ageout policies control data retention, storage well utilization, and compression. Each well supports a hot location and an optional cold storage location, with a set of parameters which determine how data is moved from one storage system to the other. An ideal Gravwell storage architecture is comprised of relatively small pools of high-speed storage for “hot” stores and a high-volume, low-cost storage pool for longer-term “cold” storage. NVMe-based flash and/or XPoint drives make a great hot well, while magnetic RAID arrays, NAS, or SAN pools work well for cold pools.
The unit of storage within a well is the shard, representing approximately 1.5 days (2^17 seconds, or about 36.4 hours) of data. The active shard is the shard which is receiving entries for the current time. Every 2^17 seconds, a new active shard is created and the previous active shard becomes eligible for ageout. Ageout always affects an entire shard. Because of this, even with the strictest ageout policies each well must have enough space for about 1.5 days of data; keep this in mind when planning! If you specify a storage limit of 1GB but then ingest 10GB of data into one shard in an hour, your well will occupy 10GB until the active shard can be aged out! Searching is not impeded during ageout, nor is ingestion.
Ageout policies can be defined via three parameters:
Time
Total Storage
Storage Availability
The time-based parameter allows you to specify data retention limits that adhere to corporate policies, contractual agreements, or legal requirements. For instance, corporate policy may state that web proxy logs should be stored for no more than 30 days.
The total storage parameter specifies an upper storage bound for a well, instructing Gravwell to age out or discard data only when the amount of stored data exceeds that bound.
The storage availability parameter instructs Gravwell to maintain at least a minimum amount of available storage, aging out data as needed to free up space. Storage availability constraints are useful when you want to allow Gravwell to use free storage on a device, but discard data if the device ever drops below some availability threshold.
Multiple constraints can be added to a single well. For instance, you might specify that data should be kept for up to 90 days, unless the total exceeds 100GB.
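A minimal sketch of that combination for a hot-only well, using directives covered in the sections below (the well name, tag, and path here are illustrative):
[Storage-Well "proxylogs"]
Location=/mnt/xpoint/gravwell/proxylogs
Tags=proxy
Hot-Duration=90D
Max-Hot-Storage-GB=100
Delete-Cold-Data=true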
The ageout system is designed to optimize data storage as it transfers entries from hot to cold pools. The optimization localizes entries of the same time range and tag, reducing head movement on traditional spinning disks. Combined with compression, the optimization phase can significantly improve storage utilization and search performance on cold data.
The ageout system can be configured to delete old data, so it is critically important that the configuration be vetted prior to engaging well ageout.
Basic Configuration
Each well will always have a hot storage location defined via the Location directive. It may optionally have a cold storage location defined via Cold-Location. Ageout is configured by setting constraints and defining any desired rules for moving or deleting data from hot to cold storage, and from cold storage to either archives or deletion. Be aware that unless deletion rules are explicitly provided, Gravwell will never delete data. The default configuration does include ageout rules, so be sure to review the config file (and maybe set up some custom wells) if you’ve got particular retention requirements.
Attention
Ageout configurations are on a per-well basis. Each well operates independently and asynchronously from all others. If two wells share the same volume, enabling ageout directives based on storage reserve can cause one well to aggressively migrate and/or delete data due to space consumed by the other.
Note
If data is actively coming into a storage shard that is marked for ageout or is actively being queried, the ageout system will defer aging out the shard to a later time.
Hot/Cold Deletion Rules
When data ages out from the hot well, it is moved to the cold well if a cold well is defined. If no cold well is defined, data will be left in the hot well unless deletion is explicitly authorized via the Delete-Cold-Data option.
Similarly, when data ages out of the cold well (becoming “frozen”), it will not be deleted unless the Delete-Frozen-Data parameter is specified. If you need long-term archives, frozen data can be sent to the Gravwell archiving service before deletion.
Examples of these rules can be seen in the configuration snippets in the following sections.
Warning
You must specify either Delete-Cold-Data (if using only a hot well) or Delete-Frozen-Data (if using both hot & cold wells) if you want data to actually be removed from the disk. These settings require explicit authorization from the user, because our general policy is to never delete anything unless asked. If you do not include the appropriate deletion parameter in your well config, data will never be deleted and your disk will eventually fill up!
Time-Based Ageout Rules
Time-based ageout manages data based on time retention requirements. For example, an organization may have requirements that all logs be kept for 90 days. The time-based ageout constraint is best used on data pools where policy and/or legal requirements dictate log retention times. Time-based ageout durations can be specified in days and weeks using the following case-insensitive abbreviations, illustrated in the snippet after this list:
“d” - days
“days” - days
“w” - weeks
“weeks” - weeks
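Because the abbreviations are case-insensitive, the following settings should all be equivalent ways of expressing a two-week duration (shown here with Hot-Duration, though the other duration directives below accept the same syntax):
Hot-Duration=2w
Hot-Duration=2W
Hot-Duration=14d
Hot-Duration=14days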
Note
A shard will only be aged out when its most recent entries exceed the duration.
An example well configuration using only a hot pool of data and deleting data with a timestamp that is more than 30 days old:
[Storage-Well "syslog"]
Location=/mnt/xpoint/gravwell/syslog
Tags=syslog
Hot-Duration=30D
Delete-Cold-Data=true
An example configuration in which data is moved from the hot pool to the cold pool after 7 days, and deleted from the cold pool after 90 days:
[Storage-Well "syslog"]
Location=/mnt/xpoint/gravwell/syslog
Cold-Location=/mnt/storage/gravwell_cold/syslog
Tags=syslog
Hot-Duration=7D
Cold-Duration=90D
Delete-Frozen-Data=true
Note
In the above configuration, data will be deleted permanently when the entry timestamps are 90 days old, having spent 7 days in the hot pool and 83 days in the cold pool. The Hot-Duration and Cold-Duration values are not cumulative; they specify the maximum age of entries.
Time-based ageout is invoked once per day, sweeping each pool for shards that can be aged out. By default the sweep happens at midnight UTC, but the execution time can be overridden in the well configuration with the Ageout-Time-Override directive. The override is specified in 24-hour UTC time.
An example configuration that overrides the ageout time checks to occur at 7PM UTC:
[Storage-Well "syslog"]
Location=/mnt/xpoint/gravwell/syslog
Cold-Location=/mnt/storage/gravwell_cold/syslog
Tags=syslog
Tags=switchlogs
Hot-Duration=7D
Cold-Duration=90D
Delete-Frozen-Data=true
Ageout-Time-Override=19:00
Warning
Data age is calculated as the time difference between now and the timestamp on the entry; if an entry is ingested with a timestamp 90 days in the past, its age is considered to be 90 days. Ingesting historical data into a well with time-based ageout may cause the data to be aged out immediately.
Total Storage-Based Ageout Rules
A total storage constraint allots a specific amount of storage in a volume regardless of time spans. Storage constraints allow a Gravwell indexer to make aggressive, full use of high-speed storage pools which may be of limited size (such as NVMe flash). The indexer will keep entries in the storage pool as long as the well isn’t consuming more than allowed. Storage constraints also allow for unexpected bursts of ingest without disrupting data storage. For example, if an indexer has 1TB of high-speed flash storage which typically handles 7 days of hot storage, but an unexpected data event causes 600GB of ingest in a single day, the indexer can age out the older data to the cold pool without disrupting the hot pool’s ability to take in new data. Shards are prioritized by time; the oldest shards are aged out first in both hot and cold pools.
An example well configuration that keeps up to 500GB of data in a hot pool, deleting old data when the 500GB limit is exceeded:
[Storage-Well "windows"]
Location=/mnt/xpoint/gravwell/windows
Tags=windows
Tags=sysmon
Max-Hot-Storage-GB=500
Delete-Cold-Data=true
An example well configuration where the hot pool keeps approximately 50GB and then ages out into a cold pool which keeps up to 10TB:
[Storage-Well "windows"]
Location=/mnt/xpoint/gravwell/windows
Cold-Location=/mnt/storage/gravwell_cold/windows
Tags=windows
Tags=sysmon
Max-Hot-Storage-GB=50
Max-Cold-Storage-GB=10000
Delete-Frozen-Data=true
Attention
Storage-based constraints are not an instant hard limit. Be sure to leave a little room so that when a storage allotment is exceeded, the indexer can age out data while still ingesting. For example, if a hot storage device can hold 512GB and the system typically ingests 100GB per day, setting the storage limit to 490GB should provide enough headroom so that the hot pool won’t completely fill up while the indexer is migrating data.
Storage Available-Based Ageout Rules
Attention
Be careful when configuring storage available-based ageout rules. These rules apply to the available storage of the disk partition the well is configured on. If disk space is used by another data source, or even a process outside of Gravwell, the rules will still be enforced. This means that external data can cause data deletion when using this option.
Attention
Be careful when configuring storage available-based ageout rules with multiple wells. If multiple wells are configured on the same disk partition, it’s possible to have well configurations “compete” for available disk space. This is especially true if two wells have different storage available-based ageout rules. This means that wells on the same disk that use this option can cause each other to delete data.
Well storage constraints can also be applied based on the availability of storage. Some wells may be low priority, consuming storage only when it is available. The storage reserve directives allow a well to consume as much space as it wants, so long as some percentage of available storage is maintained on the volume. A storage availability rule essentially makes the well a “second class” well, consuming disk space only when nothing else needs it. Specifying Hot-Storage-Reserve=5 ensures that should the hot storage volume drop below 5% free space, the well will begin migrating or deleting its oldest shards. The reserve directives apply to the underlying volume hosting the storage location, meaning that if the volume is also hosting other wells or other arbitrary file storage, the well can pull back its storage usage as needed.
An example well configuration which will use the hot location as long as there is 30% free space on the volume, and will use the cold volume as long as there is 10% free space:
[Storage-Well "doorlogs"]
Location=/mnt/xpoint/gravwell/doorlogs
Cold-Location=/mnt/storage/gravwell_cold/doorlogs
Tags=badgeaccess
Hot-Storage-Reserve=30
Cold-Storage-Reserve=10
Delete-Frozen-Data=true
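To illustrate the competition warning above, consider a hypothetical pair of wells (names, tags, and paths are illustrative) whose hot locations share the same volume. The well with the larger reserve begins migrating or deleting its oldest shards first, while the well with the smaller reserve continues consuming space, so the first well effectively yields storage to the second:
[Storage-Well "netflow"]
Location=/mnt/storage/gravwell/netflow
Tags=netflow
Hot-Storage-Reserve=10
Delete-Cold-Data=true
[Storage-Well "syslog"]
Location=/mnt/storage/gravwell/syslog
Tags=syslog
Hot-Storage-Reserve=5
Delete-Cold-Data=true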
Attention
The Gravwell ageout system that operates on storage reserves is entirely orthogonal to outside influences: if a well is configured to maintain a 50% storage reserve and an outside application fills the volume to 60% utilization, Gravwell will delete all entries outside the active shard in an attempt to restore the reserve. Wells configured with storage reserves should be treated as expendable.
Forcing a Required Retention Period
Controlling ageout using storage constraints allows Gravwell to manage storage by moving and/or deleting data only when needed. However, it may be desirable to enforce a line in the sand in addition to the standard storage-based controls. The Required-Retention directive specifies a period during which the configured well will refuse to delete data, across hot and cold storage, so that the specified retention period is always maintained.
For example, the following configuration defines that the well should keep up to 100GB in hot storage and maintain at least 10% spare storage in cold. Required-Retention=30d also informs the well that it is not allowed to delete shards that are less than 30 days old, regardless of storage availability.
[Storage-Well "doorlogs"]
Location=/mnt/xpoint/gravwell/doorlogs
Cold-Location=/mnt/storage/gravwell_cold/doorlogs
Tags=badgeaccess
Max-Hot-Storage-GB=100
Cold-Storage-Reserve=10
Required-Retention=30d
Delete-Frozen-Data=true
Be aware that ingest may stop if you do not have enough storage to cover the specified retention period. If there is no room for data and Gravwell is not allowed to delete, the only option is to stop taking on more data. Required-Retention is designed to provide a safety mechanism for compliance or regulatory requirements while also letting Gravwell keep as much data as it can.
Attention
The Required-Retention flag is checked against any time-based ageout directives. Setting Required-Retention=90d and Cold-Duration=80d will result in an error, as the required retention directly conflicts with the maximum duration specified for the cold storage well.
Caveats and Important Notes
Ageout constraints are applied to entire shards, so if a single shard grows beyond the size of a data constraint the shard will age out in its entirety once the shard is idle.
Time-based constraints require that the entire shard fall outside the specified time window. As such, time constraints of less than 1 day have no meaning, and hot pools must be able to hold at least 2 days’ worth of data.
Take care when combining time-based constraints with total storage constraints. If Hot-Duration=7D and Cold-Duration=90D are specified, data will move to the cold well after 7 days and be deleted after 90 days. However, if Max-Hot-Storage-GB=2 and Cold-Duration=90D are specified, data will move from the hot well to the cold well when the hot well exceeds 2GB, and data will be deleted from the cold well when it is 90 days old.
Transparent Compression + Docker Caveats
Transparent compression has important implications for ageout rules involving total storage. A file with transparent compression enabled takes up less space on disk than the uncompressed version, but it is difficult to find out precisely how much disk space it is actually using. Consider a well configured with Max-Hot-Storage-GB=5; if transparent compression is enabled, we would prefer to age out data only when 5GB of actual disk space is used, rather than when the size of the uncompressed data reaches 5GB.
Gravwell attempts to query the actual on-disk size of shards on BTRFS systems with compression enabled. On a typical Linux system, this will work fine. However, when using Docker containers, extra care must be taken. If Docker’s runtime directory (default /var/lib/docker) is on a BTRFS filesystem, wells can use transparent compression, but the on-disk size query will fail because certain IOCTL calls are disabled by default. You can enable the appropriate IOCTL calls for your Docker container by passing the --cap-add=SYS_ADMIN flag to the command you use to instantiate the indexer container.
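For example, a hypothetical invocation might look like the following (the image and container names are placeholders; only the --cap-add=SYS_ADMIN flag is the relevant piece):
docker run -d --name gravwell-indexer --cap-add=SYS_ADMIN gravwell/indexer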
Deletion Safety Nets
Ageout policies can and will delete data. It is critically important that administrators vet configurations and hardware resources to ensure that both hot and cold storage pools can accommodate the requested ageout constraints. Well configurations require that deletion be explicitly authorized via a “Delete-X-Data” directive (Delete-Cold-Data for hot pools and Delete-Frozen-Data for cold pools).
The Delete-X-Data directives are used as safety nets to ensure administrators think about data deletion during configuration. For example, if a well is configured to have a hot pool, no cold pool, and a hot retention of 7 days, Gravwell will attempt to delete data from the hot pool that is older than 7 days. However, if the well does not include the “Delete-Cold-Data=true” directive, the ageout system WILL NOT delete the data; instead it will complain in the logs and effectively disable ageout.
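A sketch of such a misconfiguration (the well name, tag, and path are illustrative); because no cold location is defined and Delete-Cold-Data is absent, the 7 day rule can never be carried out:
[Storage-Well "syslog"]
Location=/mnt/xpoint/gravwell/syslog
Tags=syslog
Hot-Duration=7D
# Missing: Delete-Cold-Data=true (or a Cold-Location to migrate into)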
Cold pools carry the same directive requirement, meaning that Gravwell will not delete data as it ages out of the cold pool unless the “Delete-Frozen-Data=true” directive is set. Note that if data that is 120 days old is ingested into the hot pool, the cold pool retention is configured at 90 days, and the Delete-Frozen-Data directive is set to true, the ageout system will bypass the cold pool entirely and directly delete the data.
Time-based ageout relies on the system clock to determine when data should be aged out, meaning that an incorrect system clock could cause data to be aged out prematurely. Administrators must ensure that system clocks are well maintained and, if NTP is in use, that it is a trusted source. If an attacker can gain control of the system clock on a Gravwell indexer for a reasonable period of time, they can cause premature ageout. Time is the anchor in Gravwell and it must be protected with the same vigor as the storage pools.
Ageout Rule Interactions
Ageout rules can be stacked and combined to provide robust control over storage resources. For example, Gravwell highly recommends a small pool of high speed storage for use in the hot pool and a large low cost array of disks for the cold pool. Storage constraints can be combined to utilize the small high speed pool at will, while still adhering to data retention policies.
Data ageout constraints operate independently of one another on a first-come, first-served basis. If a time constraint of 7 days is applied to a well in addition to a storage constraint of 100GB, shards will be aged out at 7 days or when the pool exceeds 100GB, whichever comes first. While it can be useful to combine multiple constraints on a single pool in a well, be aware that an overly aggressive ageout configuration can cause unexpected behavior. For example, if a well has a 7 day retention but also a Hot-Storage-Reserve of 5%, Gravwell will attempt to maintain the 5% reserve independently of the 7 day retention. Each ageout directive acts entirely independently; setting a retention size or storage reserve can and will override any time-based directive, and vice versa.
Example well configuration which uses up to 100GB of hot storage and retains data for 90 days:
[Storage-Well "weblogs"]
Location=/mnt/xpoint/gravwell/web
Cold-Location=/mnt/storage/gravwell_cold/web
Tags=apache
Tags=nginx
Tags=iis
Max-Hot-Storage-GB=100
Cold-Duration=90D
Delete-Frozen-Data=true
Example well configuration which tries to keep 10% free on the hot storage volume while retaining data for 30 days:
[Storage-Well "weblogs"]
Location=/mnt/xpoint/gravwell/web
Cold-Location=/mnt/storage/gravwell_cold/web
Tags=apache
Tags=nginx
Tags=iis
Hot-Storage-Reserve=10
Cold-Duration=30D
Delete-Frozen-Data=true
Example well configuration which keeps 7 days or 100GB in the hot pool and keeps all data in the cold pool as long as 20% is available:
[Storage-Well "weblogs"]
Location=/mnt/xpoint/gravwell/web
Cold-Location=/mnt/storage/gravwell_cold/web
Tags=apache
Tags=nginx
Tags=iis
Max-Hot-Storage-GB=100
Hot-Duration=7D
Cold-Storage-Reserve=20
Delete-Frozen-Data=true