[READMORE] You can split a single InfluxDB measurement into smaller measurements for performance tuning or for application aspects you may not have planned while initially storing data.
influx shell, run the following statement:
SELECT * INTO <destination_measurement> FROM <source_measurement> [WHERE <criteria>] GROUP BY *
Don't forget to use the
GROUP BY * at the end, or tags from the source measurement won't be copied over.
We use InfluxDB as the backend for a customer-facing component that summarizes a large dataset updated on a daily basis. The use-case is read-intensive by nature and uses statistical features in influx to provide a high-level view of underlying data. Writes occur daily in large batches. The collected raw data in first recorded in RDS/PostgreSQL due to the mission-critical nature of the dataset and later shipped to Influx by a separate process. While the uptime of Influx is important, it's also critical for us to reload Influx from scratch in case something untoward occurs. We periodically drain the Postgres data into flat files and archive them in S3 so that our storage costs are balanced.
The InfluxDB instance is hosted on a single node AWS
m5.large machine (2 vCPU, 8GB RAM) for it's memory, compute, and network throughput features. CPU does hit 100% frequently during queries but response time is good for some ugly queries. We had issues managing write-throughput on a smaller instance and writes erroring out with
failed to write point batch to database /
cache maximum memory size exceeded. Tweaking
cache-max-memory-size and reducing our fields and tags helped, but as the dataset grew, query performance wasn't good enough. Moving to a large instance was a helpful quick fix.
The critical fix that halved our query time however was splitting our measurements. Among some of InfluxDB's most interesting features is the ability to query multiple measurements with a single query using a regex in the
So instead of:
SELECT <fields> FROM <measurement> WHERE type =~ /^(1|2)$/ GROUP BY __
you can run:
SELECT <fields> FROM /^measurement_(1|2)$/ GROUP BY type, __
This may or may not make sense depending on how/what data you're storing, but it's something to consider when deciding your schema. The good news is that you can experiment with the split at any time using
SELECT * INTO as shown above and update your queries to run on multiple measurements.