Kafka OpenTelemetry instrumentation

Helios' instrumentation of Kafka enables developers to visualize, troubleshoot and test their applicative flows that include Kafka operations.

Kafka instrumentation is supported out of the box with the Helios OpenTelemetry SDK.

Below are a few examples of how you can leverage this E2E observability in Helios.

Tracing & visualization

When you search for the Kafka component, you see all services and traces that handle Kafka messages. The traces are grouped by APIs (when the service consumes a message) and Outgoing operations (when the service produces a message).

Kafka auto-instrumentation leveraging OpenTelemetry

An example of Kafka auto-instrumentation in the Helios Sandbox
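The grouping described above can be sketched in a few lines of Python. This is only an illustration, not how Helios implements it: spans are reduced to plain dictionaries, the sample data is made up, and the attribute names follow the OpenTelemetry messaging semantic conventions (`messaging.system`, `messaging.destination.name`; older convention versions used `messaging.destination`).

```python
from collections import defaultdict

# Simplified, made-up span records; real spans carry many more fields.
spans = [
    {"service": "orders", "kind": "PRODUCER",
     "attributes": {"messaging.system": "kafka",
                    "messaging.destination.name": "orders.created"}},
    {"service": "billing", "kind": "CONSUMER",
     "attributes": {"messaging.system": "kafka",
                    "messaging.destination.name": "orders.created"}},
    {"service": "billing", "kind": "CLIENT",
     "attributes": {"db.system": "postgresql"}},
]


def group_kafka_spans(spans):
    """Split Kafka spans into consumed ("APIs") and produced ("Outgoing operations")."""
    groups = defaultdict(list)
    for span in spans:
        attrs = span["attributes"]
        if attrs.get("messaging.system") != "kafka":
            continue  # not a Kafka operation
        if span["kind"] == "CONSUMER":
            section = "APIs"
        elif span["kind"] == "PRODUCER":
            section = "Outgoing operations"
        else:
            continue
        groups[section].append((span["service"], attrs.get("messaging.destination.name")))
    return dict(groups)


grouped = group_kafka_spans(spans)
```

Here the `orders` service lands under Outgoing operations and the `billing` service under APIs, both for the same topic.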

When using trace visualization, it's very easy to see the E2E flow of operations (spans) handled by different Kafka topics.

An example of an E2E trace visualization in the [Helios Sandbox](https://sandbox.gethelios.dev?actionTraceId=1ae65064a86c3e01e0801ca4308ddaf0&activeSpanId=ca0a0df25564fc71), highlighting the Kafka operations (spans) and all relevant info regarding the message

Observability over message queue latency

In addition to tracing E2E applicative flows, Helios provides you with the insight and tools to investigate slow processing times of messages across your distributed application.

SDK-level queue latency calculation

Currently supported in the Helios OpenTelemetry SDK for Node.js version 1.0.84 or newer, and Python version 1.0.100 or newer.

Queue latency is calculated at the instrumentation level, so it can be handled like any other span property through the labels, alerts and notifications mechanism in Helios.
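As a rough sketch of what an instrumentation-level calculation could look like, the snippet below subtracts the message's write timestamp from the time it was consumed and stores the result as a span attribute. The attribute name `queue.latency_ms` and the alert threshold are assumptions made for illustration; they are not taken from the Helios SDK. Kafka records do carry a millisecond timestamp, which is what `produced_at_ms` stands in for.

```python
def queue_latency_ms(produced_at_ms, consumed_at_ms):
    """Milliseconds a message waited in the queue, clamped at 0 to absorb clock skew."""
    return max(0, consumed_at_ms - produced_at_ms)


# Hypothetical timestamps: the record timestamp and the moment the consumer read it.
produced_at_ms = 1_700_000_000_000
consumed_at_ms = 1_700_000_000_750

# Hypothetical attribute name; a real SDK may expose the value differently.
span_attributes = {
    "messaging.system": "kafka",
    "queue.latency_ms": queue_latency_ms(produced_at_ms, consumed_at_ms),
}

# Because the latency is an ordinary span property, an alert rule can be a simple comparison.
LATENCY_ALERT_THRESHOLD_MS = 500  # assumed threshold
should_alert = span_attributes["queue.latency_ms"] > LATENCY_ALERT_THRESHOLD_MS
```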

Alert to label dashboard indicates queue latency

From notification to the label dashboard in the Helios Sandbox with complete E2E context of the flow with the long queue latency

Queue latency is also easily available on the trace visualization, for any span consuming a message from a queue (and specifically, Kafka).

The [E2E trace visualization](https://sandbox.gethelios.dev?actionTraceId=b8baa990765f265c86ad8c90a5c788aa&activeSpanId=90719a18e0d98946) in the Helios Sandbox, also instrumenting Kafka operations and displaying the message queue latency

In addition, the API dashboard of every API that represents a service consuming a message now includes the queue latency distribution widget. Clicking a bar takes you to the relevant traces with the corresponding queue latency.

The queue latency distribution widget appears for each API involving consuming a message from a queue (and specifically, Kafka)
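A distribution like this can be approximated by bucketing per-trace latencies and keeping the trace IDs behind each bar, so that a bar can link back to its traces. The bucket edges and sample data below are assumptions chosen for illustration, not the boundaries Helios uses.

```python
from bisect import bisect_right
from collections import defaultdict

BUCKET_EDGES_MS = [100, 500, 1000, 5000]  # assumed bucket boundaries


def bucket_label(latency_ms):
    """Return the label of the histogram bar that a latency value falls into."""
    i = bisect_right(BUCKET_EDGES_MS, latency_ms)
    if i == 0:
        return f"<{BUCKET_EDGES_MS[0]}ms"
    if i == len(BUCKET_EDGES_MS):
        return f">={BUCKET_EDGES_MS[-1]}ms"
    return f"{BUCKET_EDGES_MS[i - 1]}-{BUCKET_EDGES_MS[i]}ms"


def latency_distribution(samples):
    """Map each bar label to the trace IDs whose queue latency falls in that bar."""
    bars = defaultdict(list)
    for trace_id, latency_ms in samples:
        bars[bucket_label(latency_ms)].append(trace_id)
    return dict(bars)


# Made-up (trace ID, queue latency in ms) pairs for one API.
samples = [("trace-a", 40), ("trace-b", 750), ("trace-c", 820), ("trace-d", 6000)]
distribution = latency_distribution(samples)
```

The height of each bar is the length of its list, and the list itself is what a click on the bar would resolve to.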

Server-side queue latency calculation

Finally, a dedicated view in Helios lets you analyze message queue times (the time that passes between a message being written and being read) across various topics, and identify bottlenecks or unwanted delays. This view is available regardless of

Observability over Kafka message queue time between write and read leveraging distributed tracing

Messages across various Kafka topics in the Helios Sandbox are grouped in a way that allows instant visibility into queue time and a direct link to the relevant trace for E2E visualization
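To make the per-topic view concrete, here is a minimal sketch that groups messages by topic, computes the queue time between write and read, and keeps the trace ID of the slowest message so it can be opened for E2E visualization. The record layout and the sample values are hypothetical; the actual server-side calculation in Helios is not described in this page.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: one per message, with write and read times in milliseconds.
events = [
    {"trace_id": "t1", "topic": "orders.created", "written_ms": 1000, "read_ms": 1200},
    {"trace_id": "t2", "topic": "orders.created", "written_ms": 2000, "read_ms": 2400},
    {"trace_id": "t3", "topic": "payments.settled", "written_ms": 3000, "read_ms": 3050},
]


def queue_time_by_topic(events):
    """Summarize queue time per topic and remember which trace was slowest."""
    per_topic = defaultdict(list)
    for event in events:
        queue_ms = event["read_ms"] - event["written_ms"]
        per_topic[event["topic"]].append((queue_ms, event["trace_id"]))

    summary = {}
    for topic, rows in per_topic.items():
        slowest_ms, slowest_trace = max(rows)
        summary[topic] = {
            "count": len(rows),
            "avg_ms": mean(ms for ms, _ in rows),
            "max_ms": slowest_ms,
            "slowest_trace_id": slowest_trace,
        }
    return summary


summary = queue_time_by_topic(events)
```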

Flow replay

It's easy to replay flows triggered by Kafka directly from the traces in Helios.

Flow replay for operations triggered by Kafka

1-click to generate flow replay code

Test generation

Kafka operations can be included in end-to-end tests generated in Helios. Kafka spans in any trace can be configured and set as validation checkpoints when generating test code.

Generating an E2E test based on OpenTelemetry distributed tracing

E2E tests including Kafka operations configured as validation checkpoints and allowing control over the various messaging properties and payloads
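Conceptually, a validation checkpoint on a Kafka span is an assertion that a trace contains a Kafka operation on a given topic with the expected message content. The sketch below shows that idea on plain dictionaries. It is not the test code that Helios generates: the `payload` field, the helper name and the sample trace are all hypothetical.

```python
def kafka_checkpoint_passes(spans, topic, expected_payload):
    """Return True if a Kafka span on `topic` carries every expected payload field."""
    for span in spans:
        attrs = span["attributes"]
        if attrs.get("messaging.system") != "kafka":
            continue
        if attrs.get("messaging.destination.name") != topic:
            continue
        payload = span.get("payload", {})  # hypothetical field holding the message body
        if all(payload.get(key) == value for key, value in expected_payload.items()):
            return True
    return False


# Made-up spans collected from one test run.
trace_spans = [
    {"attributes": {"http.request.method": "POST"}},
    {"attributes": {"messaging.system": "kafka",
                    "messaging.destination.name": "orders.created"},
     "payload": {"order_id": 42, "status": "created"}},
]

checkpoint_ok = kafka_checkpoint_passes(trace_spans, "orders.created", {"order_id": 42})
```

In a test, `checkpoint_ok` would be asserted after the flow under test has run and its trace has been collected.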


Helios is a dev-first observability platform that helps dev and ops teams shorten the time to find and fix issues in distributed applications. Built on OpenTelemetry, Helios provides traces and correlates them with logs and metrics, enabling end-to-end app visibility and faster troubleshooting.

Get started for free with Helios to simplify and enrich OpenTelemetry distributed tracing.