🔔 Labels, alerts & notifications

Get actionable insights from all telemetry signals - traces, metrics, and logs - by setting up custom labels, alert rules, and notifications in Helios.

Once telemetry signals are in Helios, it's possible to gain observability and set up monitoring by performing the following steps:

  1. Save a label based on an interesting and insightful search query
  2. Configure alert rules to define the conditions that should activate an alert
  3. Set notification frequency and channels, as desired

Customized and actionable observability plays a pivotal role in today's fast-paced, data-driven landscape of distributed applications. By tailoring monitoring to the unique needs of an organization, engineering teams gain invaluable insights into the performance, availability, and health of their applications. Collecting granular, relevant data points from instrumented services enables proactive identification of bottlenecks, anomalies, and potential issues before they escalate.

Labels

A label is essentially a saved search query: a query that a user finds interesting enough to name and track over time.

Each label has a dedicated overview showing its details - including its alerts - as well as widgets that display matches and frequency.

To create a label you can either:

  1. Hit the 'label' icon to the right of the search box; or
  2. Add a new label directly from the list of labels

A label is created for a specific environment and shows the relevant data from it.

💡

An example of a label is a search for all 400 HTTP response codes that occur in the prod environment.
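
As a minimal sketch (illustrative Python, assuming a conceptual data model rather than the actual Helios API), such a label boils down to a named query scoped to an environment:

```python
# Conceptual sketch only: a label is a named, saved search query scoped to an environment.
# Field names and query syntax here are illustrative assumptions, not the Helios data model.
label = {
    "name": "HTTP 400s in prod",
    "environment": "prod",
    "query": "http.status_code == 400",  # hypothetical query syntax
}
```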

Alerts

Once a label is set, it's possible to configure different alert rules that define when a search query becomes interesting to track from a business or engineering perspective.

An alert becomes active when the rules defining it are met. It remains active until its conditions are no longer met within the (sliding) time period.

Alert rules are evaluated against new data received from the moment the alert is saved.

Note that the alert business logic does not define the notification itself; that is set in the third section of the label settings.

📘

Alerts on Lambda metrics

Alerts on AWS Lambda metrics are generated based on the AWS integration. Create a label for a specific Lambda, or any Lambda, and then set the desired alert conditions and aggregation logic.

💡

An example of an alert rule is 3 or more occurrences of a 400 HTTP response code within 15 minutes, across all services in the prod environment.

A label overview for all HTTP 400 errors in 'prod', including an alert
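
Conceptually, a rule like the one above ("3 or more matches within 15 minutes") is a sliding-window count over incoming matches. The following is a rough sketch of that evaluation logic, assuming a simple threshold-over-window rule; it is not Helios code, and the function and parameter names are made up for illustration:

```python
from collections import deque
from datetime import datetime, timedelta

def make_rule(threshold: int = 3, window: timedelta = timedelta(minutes=15)):
    """Sketch of a sliding-window alert rule: the alert is active while at
    least `threshold` matching events occurred within the last `window`."""
    matches = deque()  # timestamps of recent matches

    def on_match(ts: datetime) -> bool:
        matches.append(ts)
        # Drop matches that have slid out of the evaluation window.
        while matches and ts - matches[0] > window:
            matches.popleft()
        return len(matches) >= threshold  # True => the alert is active

    return on_match
```

In this sketch the alert resolves once fewer than the threshold number of matches remain in the window, mirroring the "active until no longer met" behavior described above.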

Alert use cases

  1. Monitoring PostgreSQL queries starting to take longer across the application: Set the threshold that matches across any service to track a system-wide trend (see the sketch after this list).
    [block:image]
    {
    "images": [
    {
    "image": [
    "https://files.readme.io/4a81117-Monitoring_PostgreSQL_queries_starting_to_take_longer_across_the_application.png",
    null,
    "Monitoring PostgreSQL queries starting to take longer across the application"
    ],
    "align": "center",
    "sizing": "400px",
    "caption": "Monitoring PostgreSQL queries starting to take longer across the application"
    }
    ]
    }
    [/block]
  2. Monitoring when a specific Lambda is starting to fall behind: Set the threshold that matches on a specific service, to focus on a local (service-specific) pattern rather than a system-wide phenomenon.
    [block:image]
    {
    "images": [
    {
    "image": [
    "https://files.readme.io/61ee709-Monitoring_when_a_specific_Lambda_is_starting_to_fall_behind.png",
    null,
    "Monitoring when a specific Lambda is starting to fall behind"
    ],
    "align": "center",
    "sizing": "400px",
    "caption": "Monitoring when a specific Lambda is starting to fall behind"
    }
    ]
    }
    [/block]
  3. Tracking the performance of a newly-launched API: Set the threshold that matches on a specific API you wish to keep track of, and see which errors are encountered and how.
    [block:image]
    {
    "images": [
    {
    "image": [
    "https://files.readme.io/a495e01-Tracking_the_performance_of_a_newly-launched_API.png",
    null,
    "Tracking the performance of a newly-launched API"
    ],
    "align": "center",
    "sizing": "400px",
    "caption": "Tracking the performance of a newly-launched API"
    }
    ]
    }
    [/block]
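
For latency-oriented use cases like the first one, the rule is typically an aggregation of span duration (for example, an average or a percentile) compared against a threshold over the evaluation window. The sketch below only illustrates that idea; the function name and threshold value are assumptions, not Helios code:

```python
from statistics import mean

def postgres_latency_alert(durations_ms: list[float], threshold_ms: float = 250.0) -> bool:
    """Sketch: activate when the average PostgreSQL query duration observed in the
    evaluation window (across any service) exceeds a hypothetical threshold."""
    return bool(durations_ms) and mean(durations_ms) > threshold_ms
```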

Notifications

Each alert can also be reported via standard channels (such as Slack and PagerDuty), based on your settings.

You can choose to receive no notification (if you prefer to track alerts proactively in Helios), a notification each time the alert becomes active, or notifications both when it becomes active and when it's resolved.

The notification includes a link to the specific label and, when applicable, an example of a matching trace for which the alert rules were met.

💡

An example of a notification setting is to get a message in the Slack channel #sandbox-alert-notifications each time the alert becomes active and when it's resolved.

Notification on distributed tracing alert
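
Helios delivers these notifications for you based on the channels you configure; nothing needs to be coded. Purely to illustrate what such a message contains, the sketch below posts an equivalent notification to a Slack incoming webhook; the webhook URL, label URL, and message format are placeholders, not the Helios integration:

```python
import json
import urllib.request

def notify_slack(webhook_url: str, label_name: str, label_url: str, resolved: bool = False) -> None:
    """Sketch of an alert notification: a short message with a link back to the
    label, posted to a Slack incoming webhook."""
    status = "resolved" if resolved else "active"
    payload = {"text": f"Alert '{label_name}' is now {status}: {label_url}"}
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```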

Settings

At any point, it's possible to update the label, its alert rules, and/or its notification rules. The new logic is applied from that moment on.

Setting a label, including alert conditions and notification rules
