Working with Prometheus queries can sometimes feel like navigating a maze. You’ve likely found yourself wrestling with complex PromQL, trying to extract the exact metrics you need. Prometheus Recording Rules offer a way to simplify this by letting you work with precomputed data. These rules store the results of frequently used queries as new time series, making your queries simpler and faster.
What are Prometheus Recording Rules?
Think of Prometheus Recording Rules as a way to create custom metrics based on existing ones. They take the result of a PromQL query and store it as a new time series, which can then be queried with much less effort. This lets you compute expensive, complex expressions ahead of time and query the result as a simple, precomputed value. Queries run faster, and dashboard and alert definitions become simpler.
The new metric behaves like any other metric in Prometheus: it can be queried, graphed, and used for alerts. The only difference is that its data is derived from a query rather than from a direct scrape. This also lets you create metrics that aren’t directly exposed by your targets, or that require a fair amount of calculation to be useful.
Why are Prometheus Recording Rules Useful?
The main advantage is simplicity and speed. You can take a complex query, have Prometheus evaluate it once per evaluation interval, and store the result. You can then query that result as a simple metric. This makes dashboard creation and alert definition more straightforward and less resource-intensive.
Faster Queries
Without Recording Rules, your dashboards may run complex PromQL queries every few seconds, once per panel and per viewer. This can tax your Prometheus server and slow down your dashboards. With Recording Rules, the complex query is computed once per evaluation interval, and your dashboards read a much lighter, precomputed metric. The result is a smoother, faster dashboard.
Simplified Dashboards
Recording Rules help you simplify your dashboards by moving complex PromQL out of the panels. Panels can reference precomputed metrics instead, making dashboards easier to read, share, and maintain. This is especially handy when working in teams, because everybody uses the same metric definitions.
Better Alerting
Complex alert rules can be a challenge to debug. With Recording Rules, you can break complex alert logic into simpler pieces. This makes your alerts easier to understand and maintain, and you don’t have to rewrite the same logic over and over again.
Conserving Resources
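Recording Rules are not just about ease and speed. They are also about efficiency. An expensive query is evaluated once per interval instead of every time a dashboard or alert needs it, which can save a significant amount of processing power on your Prometheus server. Keep in mind that each rule also adds new time series to store, so the savings come from rules whose results are queried often.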
How do Prometheus Recording Rules Work?
Recording Rules live in rule files, which are separate from the main Prometheus configuration. You tell Prometheus where to find them with the rule_files setting. Here’s a quick look at how they work.
Configuration
First, you define your recording rules in a YAML file. Each rule defines:
– a new metric name
– the PromQL query that produces that metric
– optionally, a set of extra labels that help you identify the metric
Evaluation
Prometheus periodically runs the queries defined in your recording rules and stores the results as new time series. The evaluation frequency is set by the global evaluation_interval setting, which defaults to one minute, and can be overridden per rule group.
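As a minimal sketch of the per-group override, a rule group can carry its own interval field. The group name, metric names, and the 30s value below are illustrative assumptions, not recommendations:

```yaml
# Rule file fragment (illustrative names and values).
groups:
  - name: my_recording_rules
    # Optional: overrides the global evaluation_interval for this group only.
    interval: 30s
    rules:
      - record: my_new_metric
        expr: sum(my_metric)
```

Groups without an interval field fall back to the evaluation_interval set under global in the main Prometheus configuration.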
Storage
The results of your recording rules are stored like any other metric in Prometheus. That means they are available for querying, graphing, and alerting.
Querying
You can query the new metrics by name, as you would any other metric. This simplifies dashboards and alerts, because each one queries a single metric rather than repeating a complex expression.
Creating Prometheus Recording Rules
Let’s get down to the actual way of doing things. The structure of a recording rule file is simple and easy to understand.
Recording Rule File Structure
You start with a group. Each group defines a set of related recording rules. The rules in a group are evaluated sequentially, in the order they are listed, at the same interval and with the same evaluation timestamp, so they should be related to each other in some way.
A rule_files section in the Prometheus config tells Prometheus where your rule files are.
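Here is a basic structure example. Note that rule_files belongs in the main Prometheus configuration file, while groups belongs in a separate rule file:
# prometheus.yml
rule_files:
  - "rules/*.yml"

# rules/my_rules.yml (example file name)
groups:
  - name: my_recording_rules
    rules:
      - record: my_new_metric
        expr: sum(my_metric)
      - record: my_second_new_metric
        expr: avg(my_other_metric)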
In this example:
* rule_files, in the main Prometheus configuration, tells Prometheus to read all .yml files in the rules/ directory.
* groups is the list of rule groups. Here it contains a single group named my_recording_rules.
* rules is the list of recording rules in that group. Each rule has:
  * record: the name of the new metric. This is the metric name you query later.
  * expr: the PromQL query that computes the result.
Defining a Simple Recording Rule
Now, let’s dive a little deeper into a simple recording rule and how it is defined:
groups:
  - name: my_recording_rules
    rules:
      - record: my_summed_metric
        expr: sum(my_original_metric)
In this example:
– record: the name of the new metric (my_summed_metric) that you will be able to use in your dashboards or alerts.
– expr: the PromQL expression (sum(my_original_metric)) that calculates the sum of all values of the metric my_original_metric.
This new metric, my_summed_metric, can then be queried as if it were a natively scraped Prometheus metric.
Using Labels in Recording Rules
Labels are a key part of working with Prometheus, and Recording Rules are no different. Labels give context to your metrics. Let’s extend the above example by adding labels:
groups:
  - name: my_recording_rules
    rules:
      - record: my_summed_metric
        expr: sum(my_original_metric) by (instance, job)
The by (instance, job) part ensures that the sum is calculated for each combination of instance and job labels.
So, this query:
sum(my_original_metric)
will output a single value across all instances and jobs: the sum of my_original_metric from every instance and every job that sends this metric to Prometheus. This is not that useful, as you can’t tell which instances or jobs contribute to it.
While this query:
sum(my_original_metric) by (instance, job)
will output a value for every instance and job combination: one for the instance my-server-a and job my-backend, another for the instance my-server-b and job my-frontend, and so on. This makes the metric useful, as you know where each value is coming from.
This gives you a more granular view of your data and makes it easier to filter the results based on your specific needs. You also keep all of the features of Prometheus labels, like filtering and further aggregation.
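Besides the labels that by (...) preserves from the source series, a rule can also attach static labels through its optional labels field. The sketch below assumes a hypothetical team label; the name and value are placeholders:

```yaml
groups:
  - name: my_recording_rules
    rules:
      - record: my_summed_metric
        expr: sum(my_original_metric) by (instance, job)
        # Static labels added to every series this rule produces.
        # "team: backend" is a made-up example value.
        labels:
          team: backend
```

Labels set this way overwrite any label of the same name on the query result, so it is worth choosing names that do not collide with labels you aggregate by.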
Working with Rates
Working with rates is important, especially for counter metrics. A counter doesn’t represent a current value. It is a cumulative total that only increases, and resets to zero when the process restarts. So when you want to graph it or alert on it, you have to calculate its rate of increase.
Here is how you can calculate a metric that represents the rate of increase per second of another metric, in a Recording Rule:
groups:
  - name: my_recording_rules
    rules:
      - record: my_metric_rate
        expr: rate(my_counter_metric[5m])
In this example:
– rate(my_counter_metric[5m]) calculates the per-second average rate of increase of my_counter_metric over the last 5 minutes.
This rate is useful for graphing and alerting, because a raw counter value isn’t very descriptive and is hard to alert on.
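When the rate also needs to be aggregated, the usual order is rate first, then sum, so that counter resets on individual series are handled before the values are combined. A minimal sketch, assuming my_counter_metric carries a job label and using a made-up record name:

```yaml
groups:
  - name: my_recording_rules
    rules:
      # Per-series rate first, then aggregation across instances.
      - record: my_metric_rate_by_job
        expr: sum by (job) (rate(my_counter_metric[5m]))
```

Taking the rate of an already summed counter would hide resets of individual series and can produce misleading spikes.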
Complex Recording Rule Example
Let’s see a more complex real-world example of using a Recording Rule that makes a composite metric out of multiple other metrics.
groups:
  - name: my_complex_rules
    rules:
      - record: my_requests_per_second
        expr: sum(rate(my_http_requests_total[5m])) by (job, instance)
      - record: my_average_request_duration
        expr: sum(rate(my_http_request_duration_seconds_sum[5m])) by (job, instance) / sum(rate(my_http_request_duration_seconds_count[5m])) by (job, instance)
      - record: my_requests_duration_per_second
        expr: my_requests_per_second * my_average_request_duration
In this example:
– my_requests_per_second is computed from my_http_requests_total, and it represents the number of requests per second per job and instance.
– my_average_request_duration is computed from the my_http_request_duration_seconds_sum and my_http_request_duration_seconds_count metrics, representing the average duration of the requests in seconds per job and instance.
– my_requests_duration_per_second takes those metrics and multiplies them to output the total amount of time requests take up per second, per job and instance. Because rules in a group are evaluated in order, it can build on the two rules listed before it.
You can use my_requests_duration_per_second to generate alerts for slow services and even show it in dashboards.
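As a sketch of how a recorded metric can feed an alert, the alerting rule below uses my_average_request_duration, since it maps most directly to slowness. The alert name, the 0.5 second threshold, the 10m hold time, and the severity label are all illustrative assumptions:

```yaml
groups:
  - name: my_alert_rules
    rules:
      - alert: MyHighAverageRequestDuration
        # The recorded metric keeps the alert expression short.
        # 0.5 (seconds) is an arbitrary example threshold.
        expr: my_average_request_duration > 0.5
        # Fire only if the condition holds for 10 minutes.
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Average request duration above 0.5s on {{ $labels.instance }}"
```

The alert expression stays a one-line comparison, and the heavier rate-and-divide logic lives in one place, the recording rule.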
Best Practices for Prometheus Recording Rules
Even though Recording Rules are a powerful way to improve your monitoring, you need to follow some best practices to get the most out of them.
Naming Conventions
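Use descriptive names for your new metrics, so that they can be easily identified and understood. For example, instead of new_metric_1, use http_requests_per_second. The Prometheus documentation recommends the form level:metric:operations for recorded metrics, such as job:http_requests:rate5m, which also makes recorded series easy to tell apart from scraped ones. Make the naming scheme consistent and easy to understand for the whole team.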
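One possible way to apply the level:metric:operations naming form from the Prometheus documentation to the request counter used earlier in this article is sketched below. The group name is an assumption, and the second rule reuses the first instead of recomputing the rate:

```yaml
groups:
  - name: my_http_rules
    rules:
      # level = instance, metric = my_http_requests, operation = rate5m
      - record: instance:my_http_requests:rate5m
        expr: sum by (job, instance) (rate(my_http_requests_total[5m]))
      # Coarser level built from the rule above rather than from raw data.
      - record: job:my_http_requests:rate5m
        expr: sum by (job) (instance:my_http_requests:rate5m)
```

The colons signal at a glance that a series comes from a rule, and the prefix tells the reader which labels are left after aggregation.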
Avoid Redundancy
Ensure you’re not recomputing data that already exists. It is better to reuse existing recording rules rather than create similar rules. This can make your recording rules easier to understand, debug, and share with the team.
Keep Rules Simple
Start with simple rules and add complexity only when necessary. Simple rules are easier to read, debug, and manage.
Review and Update
Periodically review and update your rules to make sure they match your monitoring needs. You might not need all of your rules as you change your systems, so deleting old rules is also a good practice.
Document Your Rules
Add a comment on each rule, explaining what it is doing. The team will appreciate the time you take to do this. Clear documentation will help you and others understand the rules later, and can help you debug things.
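Rule files are YAML, so ordinary # comments are enough for this. A small sketch of what a documented rule might look like (the comment wording is only a suggestion):

```yaml
groups:
  - name: my_recording_rules
    rules:
      # Total of my_original_metric per instance and job.
      # Used by: (list the dashboards or alerts that depend on this rule)
      - record: my_summed_metric
        expr: sum(my_original_metric) by (instance, job)
```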
Test Your Rules
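Before deploying, make sure your rules are tested. You can paste each expression into the Prometheus expression browser to check that it returns the data you expect. The promtool command-line tool that ships with Prometheus can also validate rule files with promtool check rules and run unit tests against them with promtool test rules.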
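A unit test for promtool test rules is itself a YAML file. The sketch below assumes a rule file named my_rules.yml in the same directory that records my_summed_metric as sum(my_original_metric) by (instance, job); the file name, the path label, and the sample values are made up for illustration:

```yaml
# my_rules_test.yml (hypothetical file name)
rule_files:
  - my_rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Two synthetic input series on the same instance and job.
    input_series:
      - series: 'my_original_metric{job="my-backend", instance="my-server-a", path="/a"}'
        values: '1 1 1 1 1'
      - series: 'my_original_metric{job="my-backend", instance="my-server-a", path="/b"}'
        values: '2 2 2 2 2'
    promql_expr_test:
      # After 4 minutes the recorded series should hold 1 + 2 = 3.
      - expr: my_summed_metric
        eval_time: 4m
        exp_samples:
          - labels: 'my_summed_metric{job="my-backend", instance="my-server-a"}'
            value: 3
```

Running promtool test rules my_rules_test.yml evaluates the rules against the synthetic series and reports any mismatch between actual and expected samples.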
Debugging Prometheus Recording Rules
Debugging Recording Rules can be a headache if not handled well. Here are some things you can check.
Query the Metric
If your recording rules aren’t working as they should, start by querying the new metric directly in the Prometheus expression browser. Check whether values are being recorded at all. If the metric isn’t being generated, check that the rule file is loaded (the Rules page in the Prometheus web UI lists each rule with its health and last error), and then look at the query itself.
Check your Expr
Check your expr to see if it returns the value you expect. Run the exact expression from your recording rule as an ad-hoc query and compare the output with what you expect. You can do this in the Prometheus expression browser, or with a tool like promtool.
Check the Labels
Check to see if your labels are correct. Make sure the by() part contains the correct labels, as these labels will define how your metrics are broken down.
Review the Prometheus Logs
Check the Prometheus logs to see if there are errors related to recording rules. This will be helpful when debugging syntax errors or any other configuration issue. You should also check the logs for errors related to query timeouts.
When to Use Prometheus Recording Rules
Knowing when to use Recording Rules is key to making the most of them. Here are some common use cases:
Complex Calculations
When you need to calculate complex metrics out of other metrics, Recording Rules are the way to go. You can use rates, averages, and a host of other functions to get the exact metrics you want.
Aggregated Metrics
When your use case requires you to aggregate metrics from many sources into one, Recording Rules can help. This includes things like summing metrics across different labels, or dividing one metric by another to create a ratio.
Simplifying Queries
When your PromQL queries get too complex for dashboards and alerts, Recording Rules can be a great help. You can create a new metric that represents what you need.
Historical Data
If your use case needs long-window summaries of existing metrics, Recording Rules are the right tool. You can generate a new metric that represents the average or max of a metric over the last hour, day, or even week. Note that a rule only produces data from the moment it is loaded; it does not fill in the past on its own.
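A minimal sketch of such long-window rules, assuming a gauge named my_gauge_metric (a hypothetical name) and a slower group interval, since hour-long and day-long windows change slowly:

```yaml
groups:
  - name: my_long_window_rules
    # Long windows move slowly, so a coarser interval is usually enough.
    interval: 5m
    rules:
      # Average of the gauge over the trailing hour.
      - record: my_gauge_metric:avg_over_time_1h
        expr: avg_over_time(my_gauge_metric[1h])
      # Maximum of the gauge over the trailing day.
      - record: my_gauge_metric:max_over_time_1d
        expr: max_over_time(my_gauge_metric[1d])
```

Dashboards can then read the recorded series directly instead of scanning a full day of raw samples on every refresh.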
When Not to Use Prometheus Recording Rules
Like every tool, Recording Rules are not meant for every use case. Here are some instances when they might not be the best option:
Simple Queries
If the query is already simple, there is little reason to create a recording rule for it. Creating a recording rule for a simple query may even add some overhead and make things more difficult.
Ad-Hoc Analysis
For ad-hoc analysis, Recording Rules are not really needed. It is better to run a one-time query directly in the Prometheus expression browser.
Highly Volatile Metrics
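Recording rules work best on stable, long-lived time series. If the label sets of the underlying metrics change constantly (for example, labels that carry request IDs or short-lived container names), each evaluation can create many new series. The rules then become expensive to evaluate and store, and may impact the overall performance of your Prometheus instance.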
Combining Recording Rules with Other Prometheus Features
Recording Rules are a good feature on their own. But when combined with other Prometheus features, they become an indispensable part of your monitoring toolkit.
Alerting
Recording Rules can make complex alert logic simpler. By pre-computing metrics, your alerts can become faster to evaluate, and easier to understand.
Dashboards
Recording Rules reduce the complexity of dashboards, making them easier to read and manage. This greatly enhances the experience of users that work with your dashboards.
Service Discovery
Recording rules are evaluated against whatever series exist at evaluation time. Combined with service discovery, this means newly discovered services are picked up automatically, without editing the rules. This dynamic setup will save you a lot of time.
Federation
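Using federation with Recording Rules, you can aggregate data from multiple Prometheus servers. A common pattern is for each server to record aggregated metrics locally and for a global server to pull only those recorded series. This allows you to generate a global view of your systems.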
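On the global server, this might look like the scrape configuration sketched below. It assumes the lower-level servers name their recorded series with a job: prefix, and the target host name is a placeholder:

```yaml
# Fragment of the global server's prometheus.yml.
scrape_configs:
  - job_name: federate
    # Keep the job and instance labels from the source server.
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        # Pull only series produced by recording rules named job:...
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          # Placeholder address of a lower-level Prometheus server.
          - 'prometheus-a.example.internal:9090'
```

Federating only the aggregated series keeps the global server small, while the detailed per-instance data stays on the server that scraped it.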
Advanced Techniques for Prometheus Recording Rules
Here are some advanced techniques that you can use to take Recording Rules to the next level.
Dynamic Labels
You can generate new labels based on existing ones, using functions like label_replace. This allows you to re-label existing metrics, which can help when you want to aggregate your data using specific labels.
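As a sketch, the rule below derives a host label by stripping the port from instance and then aggregates by it. It assumes instance values look like my-server-a:9100; the host label and record name are made up for illustration:

```yaml
groups:
  - name: my_relabel_rules
    rules:
      - record: my_summed_metric_by_host
        # label_replace(vector, dst_label, replacement, src_label, regex)
        # copies the part of "instance" before the colon into "host".
        expr: |
          sum by (host) (
            label_replace(my_original_metric, "host", "$1", "instance", "([^:]+):.*")
          )
```

Series whose instance value does not match the regex pass through without a host label and end up grouped together under an empty host, so it is worth checking the pattern against real label values first.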
Multi-Tenancy
You can use Recording Rules to generate metrics per tenant. This is useful for managed services and multi-tenant systems, where you want to break down your metrics by the tenant that is consuming the service.
Custom Functions
PromQL does not let you define your own functions. If you have a very specific use case that needs one, you can get a similar effect by chaining Recording Rules, with each rule building on the output of an earlier one. This can get very complex, so you should only do it when necessary.
Real-World Examples of Prometheus Recording Rules
Here are some more real-world examples of Recording Rules in different environments and setups:
Web Application Monitoring
For a web application you can use Recording Rules to compute metrics like:
– Requests per second
– Error rates per endpoint
– Average request duration per endpoint
– Number of active users
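The first two items in this list could be sketched as follows, assuming my_http_requests_total carries path and status labels (both are assumptions about how the application is instrumented):

```yaml
groups:
  - name: my_web_rules
    rules:
      # Requests per second, per job and endpoint.
      - record: job_path:my_http_requests:rate5m
        expr: sum by (job, path) (rate(my_http_requests_total[5m]))
      # Share of requests that returned a 5xx status, per job and endpoint.
      - record: job_path:my_http_request_errors:ratio_rate5m
        expr: |
          sum by (job, path) (rate(my_http_requests_total{status=~"5.."}[5m]))
          /
          sum by (job, path) (rate(my_http_requests_total[5m]))
```

An endpoint that has never returned a 5xx response has no matching numerator series, so the ratio is simply absent for it rather than zero.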
Database Monitoring
For a database you can use Recording Rules to compute metrics like:
– Number of active connections
– Average query time
– Number of slow queries
– Cache hit rate
Infrastructure Monitoring
For your infrastructure you can compute metrics such as:
– CPU usage by host
– Memory usage by host
– Network throughput
– Disk utilization
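For hosts monitored with node_exporter (an assumption; other exporters expose different metric names), CPU and memory usage by host could be recorded roughly like this. The record names are illustrative:

```yaml
groups:
  - name: my_node_rules
    rules:
      # Fraction of CPU time spent non-idle, per host (0 to 1).
      - record: instance:node_cpu_utilisation:rate5m
        expr: |
          1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # Fraction of memory in use, per host (0 to 1).
      - record: instance:node_memory_utilisation:ratio
        expr: |
          1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
```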
Batch Job Monitoring
For batch jobs you can use Recording Rules to compute things like:
– Number of jobs run per hour, day and week
– Average job execution time
– Number of successful and failed jobs
Prometheus Recording Rules: A Summary
Prometheus Recording Rules are a vital part of managing and monitoring your systems effectively. By pre-computing your metrics, you can streamline dashboards, simplify alerting, and save precious resources on your Prometheus server. Use this guide to help you use recording rules and enhance your monitoring setup. They are well worth the effort and will reward you greatly.
Mastering Prometheus Queries: Are Recording Rules the Missing Piece?
Now that you’ve explored the world of Prometheus Recording Rules, you can see how they can change the way you monitor your systems. They allow you to extract insights from complex metrics and provide you with actionable data. If you’ve been struggling with complex queries and slow dashboards, it’s clear that mastering Recording Rules can be the missing puzzle piece in taking your monitoring to the next level.