Configure Optimal Utilization of the Available Disk Space | Troubleshooting UI Alerts

Summary

Full UI Alert: "Sophie is running low on disk, data ingestion will soon be stopped"

This alert means the available disk space is not sufficient for your current usage. Administrators need to configure Sophie to make optimal use of the available disk space.

The methods below describe what to do when disk space is not sufficient for your current usage.

Release

Sophie standalone versions 3.4.x - 3.7.x

Instructions

Sophie alerts you when the allocated disk space is nearly full. If the issue is not resolved, data ingestion stops, which can cause downtime or otherwise negatively impact the system.

Find the root cause of the disk space issue to resolve this error

If there is not enough space and all the data currently in the system is essential (there is no redundant data), allocating more disk space will resolve this issue. See below for how to check what is consuming disk space.

If enough space is allocated but the disk is still nearly (or completely) full, you will need to free up space by removing unnecessary data. In this case, it is important to understand which redundant data is taking up space, to prevent this error from recurring.

Investigate

Check Disk Space Statistics

Run the following command to see how much space is used on each mounted filesystem and locate the one that is running out:

df -h
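On a server with several mounts, it can help to filter the df output down to the filesystems that are nearly full. A minimal sketch, assuming `df -P` (POSIX fixed column layout) and awk; `full_filesystems` is a hypothetical helper name, and the 85% threshold in the usage line is an arbitrary example, not a Sophie setting:

```shell
# full_filesystems THRESHOLD
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# filesystem whose usage is at or above THRESHOLD percent.
full_filesystems() {
  awk -v t="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)            # "91%" -> "91"
    if (use + 0 >= t + 0) print $6, $5
  }'
}

# Usage (85 is an example threshold):
#   df -P | full_filesystems 85
```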

To Clean Up Unnecessary Data That May Be Taking Up Space

To list directories by size, with the largest last (to see what is taking up the most space), run the following command:

sudo du -h /* --max-depth=1 | sort -h

Review whether the content is necessary or redundant, and remove any files you do not need.

Remove old Docker images using the following commands:

Warning: removing images that are still needed will corrupt the application. Only remove unused images.

Check what images exist

docker images

Delete old images (replace each <imageID> with the ID of an image to remove):

docker image rm <imageID> <imageID>
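Before deleting anything, it helps to see which images no container references. A minimal sketch, assuming the Docker CLI is available and that comparing full image IDs is sufficient for your setup; `filter_unused` and `list_unused_images` are hypothetical helper names. An image that no container uses right now may still be needed by Sophie, so treat the output as candidates to review, not as a delete list. For images that are merely dangling (untagged), the built-in `docker image prune` is a simpler option.

```shell
# filter_unused ALL_FILE USED_FILE
# Prints the lines of ALL_FILE that do not exactly match any line of USED_FILE.
filter_unused() {
  grep -vxFf "$2" "$1" || true   # grep exits 1 when nothing is left; not an error here
}

# list_unused_images
# Prints full IDs (sha256:...) of images not referenced by any container,
# running or stopped.
list_unused_images() {
  local all used
  all=$(mktemp)
  used=$(mktemp)
  docker images -q --no-trunc | sort -u > "$all"
  docker ps -aq | xargs -r docker inspect --format '{{.Image}}' | sort -u > "$used"
  filter_unused "$all" "$used"
  rm -f "$all" "$used"
}

# Usage: review the output, then remove only the images you are sure are unused:
#   list_unused_images
#   docker image rm <imageID> <imageID>
```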

Delete archived logs from the systemd journal, until the archived journal files use no more than 1 GB:

sudo journalctl --vacuum-size=1G
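Vacuuming frees space once, but the journal can grow back. journald supports a persistent size cap through its SystemMaxUse= option, which can be set in a drop-in file under /etc/systemd/journald.conf.d/. A minimal sketch that reuses the 1G limit above; `journald_cap_conf` is a hypothetical helper and the drop-in file name size.conf is an arbitrary choice:

```shell
# journald_cap_conf SIZE
# Prints a journald drop-in that caps persistent journal storage at SIZE.
journald_cap_conf() {
  printf '[Journal]\nSystemMaxUse=%s\n' "$1"
}

# Usage:
#   sudo journalctl --disk-usage            # current journal size
#   sudo mkdir -p /etc/systemd/journald.conf.d
#   journald_cap_conf 1G | sudo tee /etc/systemd/journald.conf.d/size.conf
#   sudo systemctl restart systemd-journald
```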

To Configure Optimal Utilisation of the Available Disk Space

If Sophie is running low on disk space, perform the following steps:

Verify Direct-Attached Storage

First, make sure the Elastic data is stored on a locally attached disk. Running on spinning disks, or even on a SAN, has a severe impact on performance.

To verify whether the Elastic data is on a spinning disk, run the following command from the terminal:

curl 127.0.0.1:9200/_nodes/stats/fs?pretty

If fs.data.spins is true, the data is stored on a spinning disk.

Alternatively, run the following command (replace sde with the disk holding the Elastic data):

cat /sys/block/sde/queue/rotational

1 = spinning (bad)

0 = SSD (good)
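To check every disk at once instead of one device at a time, loop over the same sysfs files. A minimal sketch; `rotational_report` is a hypothetical helper, and you still need to match the device to the Elastic data path, for example with `df <elastic-data-dir>` (the directory depends on your installation; a partition such as /dev/sde1 belongs to disk sde). `lsblk -d -o NAME,ROTA` reports the same flag. Virtual disks may not reflect the underlying hardware accurately.

```shell
# rotational_report [SYSFS_BLOCK_DIR]
# For each block device under SYSFS_BLOCK_DIR (default /sys/block), prints the
# device name and whether the kernel marks it as spinning (1) or not (0).
rotational_report() {
  local dir dev
  dir=${1:-/sys/block}
  for dev in "$dir"/*; do
    [ -r "$dev/queue/rotational" ] || continue   # skip entries without the flag
    if [ "$(cat "$dev/queue/rotational")" = 1 ]; then
      echo "$(basename "$dev") spinning"
    else
      echo "$(basename "$dev") non-rotational"
    fi
  done
}

# Usage:
#   rotational_report
```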

Steps to Improve Throughput and Disk Usage

Remove data you don't need:

In your Sophie instance, go to Settings->General->Storage and identify the "heaviest" sources. For each heavy source:

- Review its structure: are there any properties you can remove? Decreasing the number of properties in the stored events has the biggest impact.
- Do you need the rawMessage property? It is used in notifications and for free-text correlations, and it is somewhat helpful in Kibana, but it doubles the size of a document. If the events are well structured, remove this field: go to the source-settings and set elastic.store_raw_event to false.

Sub-sample

- First, be selective about what you drop. For example, you might prefer to drop low-severity events at the data input.
- Consider sub-sampling (i.e., keeping one out of every X events). This can be controlled per source via the elasticsearch.subsampling_ratio setting.
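To illustrate what "one out of every X events" means for the stored volume, the sketch below keeps one line in every X from a line-per-event stream, so roughly 1/X of the events (and of the disk they use) remain. It only illustrates the idea; it does not reproduce how Sophie interprets elasticsearch.subsampling_ratio, and `subsample` is a hypothetical name.

```shell
# subsample X
# Keeps the 1st, (X+1)th, (2X+1)th, ... line from stdin, i.e. one of every X.
subsample() {
  awk -v x="$1" '(NR - 1) % x == 0'
}

# Example: 10 events with X=3 keep events 1, 4, 7 and 10.
#   seq 1 10 | subsample 3
```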

Steps to Improve the Throughput

Optimize the bulk-indexing interval

Adjust the following general-settings:

- elasticsearch.bulk_actions
- elasticsearch.bulk_concurrent_requests

Use the operational-dashboard to measure the effect. The objective is to get the documents.write metric as high as possible.

Note that at some point you might start seeing errors in the Elastic logs, which means you are sending Elastic more than it can handle.

Keep each index no larger than the RAM allocated to Elastic

For example, if Elastic has 30 GB of RAM, keep each index smaller than 30 GB.

If some of your indices grow larger, consider either:

- Increasing the number of shards (even if working with a single instance).
- Changing the index rotation from daily to hourly.

Both of these settings can be found under the source-settings (elasticsearch.number_of_shards and elasticsearch.index_time_interval, respectively).

Increasing the number of shards is almost always better, but if the daily volume of a source is more than 20 times the memory allocated to Elastic, switch to hourly indices.
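Index sizes can be listed from Elastic with the _cat/indices API (sorted, sizes in whole GB) and compared against the RAM allocated to Elastic. A minimal sketch, assuming Elastic listens on 127.0.0.1:9200 as in the earlier curl command; `oversized_indices` and `rotation_advice` are hypothetical helpers, and the 30 GB and 20x figures come from the text above. Note that store.size includes replica shards, if any.

```shell
# oversized_indices RAM_GB
# Reads "<index> <size_gb>" lines on stdin and prints indices larger than RAM_GB.
oversized_indices() {
  awk -v ram="$1" '$2 + 0 > ram + 0 { print $1, $2 "gb" }'
}

# rotation_advice DAILY_GB RAM_GB
# Applies the rule of thumb above: switch to hourly indices only when a
# source's daily volume exceeds 20x the RAM allocated to Elastic.
rotation_advice() {
  awk -v d="$1" -v r="$2" 'BEGIN {
    if (d + 0 > 20 * r) print "switch to hourly indices"
    else print "increase the number of shards"
  }'
}

# Usage (30 = GB of RAM allocated to Elastic):
#   curl -s '127.0.0.1:9200/_cat/indices?h=index,store.size&bytes=gb&s=store.size:desc' \
#     | oversized_indices 30
#   rotation_advice 700 30
```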

Remove/disable read-heavy modules

The heavy readers are:

- Custom Alerts (especially ones querying event-* or covering long time ranges)
- ARCA modules (entity analysis, highlight analysis)

Also, each alert creation involves querying Elastic, so make sure you are not generating too many alerts. A small number of incidents might be misleading, so check the number of alerts per incident. If there are many hundreds of alerts a day, consider tweaking the anomaly-detection engine.

Steps to Reduce Disk Usage

Compress large indices

Under source-settings, set elasticsearch.index_codec to best_compression. Note that this only takes effect for new indices; existing indices are not recompressed.

Assessing disk performance

There are several ways to do this, but the recommended one is to run iostat -xd while the system is running.

Check the disk that holds the Elastic data and look at the r_await and w_await columns, which show the average read and write wait times in milliseconds. Good values are no more than a few milliseconds.

Ten milliseconds or more means the disk is too slow.
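To apply the 10 ms rule to every disk at once, the iostat output can be filtered on those two columns. A minimal sketch, assuming sysstat's iostat; because the column order differs between sysstat versions, the script locates r_await and w_await from the header line. Running iostat under LC_ALL=C avoids locales that print decimal commas. `slow_disks` is a hypothetical helper name.

```shell
# slow_disks [LIMIT_MS]
# Reads `iostat -xd` output on stdin and prints every device whose r_await or
# w_await is at or above LIMIT_MS (default 10). Column positions are taken
# from each "Device" header line, so older and newer sysstat layouts both work.
slow_disks() {
  awk -v limit="${1:-10}" '
    /^Device/ {
      r = w = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "r_await") r = i
        if ($i == "w_await") w = i
      }
      next
    }
    r && w && NF >= r && NF >= w {
      if ($r + 0 >= limit || $w + 0 >= limit)
        print $1, "r_await=" $r, "w_await=" $w
    }'
}

# Usage: the first iostat report covers the time since boot; later ones
# reflect the current load.
#   LC_ALL=C iostat -xd 5 3 | slow_disks 10
```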