🔮 API observability

Apply the benefits of OpenTelemetry beyond distributed tracing to a single operation, the API, for better troubleshooting, issue analysis, and discovery of interactions between your microservices.

Helios extends observability to API monitoring: it uses OpenTelemetry and your services' existing instrumentation to provide actionable insights into how your APIs are used.

API observability in Helios includes:

  1. A dynamic API catalog that is generated based on real traffic instrumented in your application.
  2. An API dashboard for every single API and other interfaces between services; it includes recent operations (spans) as well as key metrics, errors and stats.
  3. An auto-generated API spec for all HTTP interactions, built based on the actual API calls made when running the application.

Dynamic API catalog

Once your application is fully instrumented in Helios, you can find a dynamic catalog of your APIs across all services. This catalog may include both documented and undocumented APIs, whether for internal use or customer-facing.

Auto-generated API catalog in Helios based on instrumented traffic as seen in the [Helios Sandbox](https://sandbox.gethelios.dev/)
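
To make the idea of a traffic-derived catalog concrete, here is a minimal sketch in Python. It is not Helios's implementation: the span records are hypothetical, simplified dictionaries (real OpenTelemetry spans carry many more fields), and treating any HTTP status of 400 or above as an error is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified span records used only for this illustration.
SPANS = [
    {"service": "orders", "method": "GET", "route": "/orders/{id}", "status": 200},
    {"service": "orders", "method": "GET", "route": "/orders/{id}", "status": 404},
    {"service": "orders", "method": "POST", "route": "/orders", "status": 201},
    {"service": "billing", "method": "POST", "route": "/charges", "status": 500},
]


def build_catalog(spans):
    """Group spans by service and operation, counting calls and errors."""
    catalog = defaultdict(lambda: defaultdict(lambda: {"calls": 0, "errors": 0}))
    for span in spans:
        operation = f'{span["method"]} {span["route"]}'
        entry = catalog[span["service"]][operation]
        entry["calls"] += 1
        # Assumption: any 4xx or 5xx response counts as an error.
        if span["status"] >= 400:
            entry["errors"] += 1
    # Convert the nested defaultdicts into plain dicts for easier inspection.
    return {service: dict(operations) for service, operations in catalog.items()}


if __name__ == "__main__":
    for service, operations in build_catalog(SPANS).items():
        print(service, operations)
```

Because the catalog is derived from observed spans rather than from hand-written documentation, an undocumented endpoint appears in it as soon as it receives traffic.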

Each service has three types of operations listed: APIs (external to the service), internal operations (internal to the service), and 3rd party APIs. Each interaction discovered by the instrumentation is classified accordingly and displayed here.
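
One plausible way to perform such a classification is to look at the span kind that OpenTelemetry records (SERVER, CLIENT, INTERNAL and so on). The sketch below is an assumption about how this could work, not a description of Helios's logic; the `peer_host` field and the `KNOWN_SERVICE_HOSTS` set are hypothetical names introduced for the example.

```python
# Hypothetical set of hosts that belong to your own instrumented services.
KNOWN_SERVICE_HOSTS = {"orders.internal", "billing.internal"}


def classify(span):
    """Return 'api', 'internal', 'third_party_api', or None for a span dict."""
    kind = span["kind"]
    if kind == "SERVER":
        # The service handled an incoming request: one of its APIs.
        return "api"
    if kind == "INTERNAL":
        # Work that never left the service.
        return "internal"
    if kind == "CLIENT" and span.get("peer_host") not in KNOWN_SERVICE_HOSTS:
        # Outgoing call to a host outside the instrumented services.
        return "third_party_api"
    # Assumption: calls to a known service are listed under the callee's APIs.
    return None


if __name__ == "__main__":
    print(classify({"kind": "SERVER"}))
    print(classify({"kind": "CLIENT", "peer_host": "api.stripe.com"}))
```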

When further investigation is needed, use the 'API errors' toggle to show only the operations where an error has occurred.

API dashboard

Each API has its own dashboard. In addition to the full context of the distributed traces under 'Newest traces', it displays the newest spans (runs of the API) and a few key widgets:

  1. Newest spans shows the newest spans and highlights each span's duration and error status
  2. Span duration distribution aggregates all spans under the current filter conditions into different duration buckets so that it's easy to identify the norm and the outliers
  3. HTTP response status code over time + distribution (as shown below) is available for HTTP calls; for other types of operations there is an Errors over time trend.

Each API has its own dashboard, displaying recent (instrumented) spans and key stats as well as easy access to errors
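
The span duration distribution widget described above amounts to a histogram over span durations. A minimal sketch, assuming hypothetical bucket boundaries in milliseconds (the boundaries Helios actually uses are not stated here):

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper bounds (inclusive) of each duration bucket, in milliseconds.
BUCKET_UPPER_BOUNDS_MS = [10, 50, 100, 500, 1000]


def bucket_label(duration_ms):
    """Return the label of the bucket that a span duration falls into."""
    index = bisect_left(BUCKET_UPPER_BOUNDS_MS, duration_ms)
    if index == len(BUCKET_UPPER_BOUNDS_MS):
        # Slower than the largest bound: the overflow bucket.
        return f">{BUCKET_UPPER_BOUNDS_MS[-1]}ms"
    return f"<={BUCKET_UPPER_BOUNDS_MS[index]}ms"


def duration_histogram(durations_ms):
    """Count how many spans fall into each duration bucket."""
    return Counter(bucket_label(duration) for duration in durations_ms)


if __name__ == "__main__":
    print(duration_histogram([5, 10, 12, 48, 75, 2300]))
```

With most spans landing in the low buckets, a single span in the overflow bucket stands out as an outlier worth opening.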

Auto-generated API specification

Given the thousands of calls to any API that pass through the microservices of a distributed app, it's possible to deduce the de facto API spec as seen in the field. For each supported interaction (currently HTTP only), you can access the API overview from each discovered API.

The API overview is essentially the inferred API specification, expressed in the OpenAPI Specification format (previously known as the Swagger Specification). It also includes real examples and can be downloaded for further use.
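
As a rough illustration of inferring a spec from observed calls, the sketch below folds a list of hypothetical call records into a skeleton OpenAPI 3.0 document. It is an assumption about the general technique, not Helios's algorithm: the call fields, the `infer_spec` helper, and the placeholder descriptions are invented for the example, and the output omits request and response schemas.

```python
import json
import re


def infer_spec(title, calls):
    """Build a skeleton OpenAPI document from observed HTTP calls."""
    paths = {}
    for call in calls:
        operation = paths.setdefault(call["route"], {}).setdefault(
            call["method"].lower(), {"responses": {}}
        )
        # Declare every templated segment such as {id} as a path parameter.
        names = re.findall(r"\{(\w+)\}", call["route"])
        if names:
            operation["parameters"] = [
                {"name": name, "in": "path", "required": True,
                 "schema": {"type": "string"}}
                for name in names
            ]
        # Record each status code that was actually seen for this operation.
        operation["responses"].setdefault(
            str(call["status"]), {"description": "Observed in traffic"}
        )
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": "inferred"},
        "paths": paths,
    }


if __name__ == "__main__":
    observed = [
        {"method": "GET", "route": "/orders/{id}", "status": 200},
        {"method": "GET", "route": "/orders/{id}", "status": 404},
        {"method": "POST", "route": "/orders", "status": 201},
    ]
    print(json.dumps(infer_spec("orders", observed), indent=2))
```

A spec inferred this way only reflects what the traffic exercised, so a rarely used status code or endpoint is absent until it is observed.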

📘

Learn more about how applying API observability based on instrumented data helps developers debug issues faster and boost productivity when building distributed applications.