Complete observability over Databricks Notebooks in Helios

Leverage Helios to see how flows propagate through the components of your application, including microservices and notebooks, how they are connected, and what triggers each notebook and what each notebook triggers in turn.

Add Helios to your Databricks cluster

Install the Helios SDK as a library on your cluster.
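You can install the SDK through the cluster's Libraries tab or directly from a notebook cell. A minimal sketch, assuming the Helios Python SDK is published on PyPI as helios-opentelemetry (confirm the exact package name in the Helios documentation):

# Install the Helios SDK into the notebook's Python environment
%pip install helios-opentelemetry  # assumed package name - verify before use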

Then, edit the cluster configuration and add the following environment variables under Advanced options > Spark > Environment variables:

AUTOWRAPT_BOOTSTRAP=helios
HS_TOKEN=<API_TOKEN> # TODO: Replace value with API token from Helios.
HS_SERVICE_NAME=<SERVICE_NAME> # TODO: Replace value with service name.
HS_ENVIRONMENT=<ENVIRONMENT_NAME> # TODO: Replace value with service environment.
HS_DATABRICKS=True

E2E flows in Helios

Run a job on the cluster and see your Databricks data in Helios.
It is recommended to add custom spans to your Databricks notebooks for increased visibility in Helios.
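For example, you can wrap a logical step of a notebook in a custom span using the standard OpenTelemetry API that the Helios SDK builds on (the tracer name, span name, and attribute below are illustrative):

from opentelemetry import trace

# Acquire a tracer from the global provider configured by the Helios SDK
tracer = trace.get_tracer("databricks-notebook")  # illustrative instrumentation name

# Wrap one step of the notebook in a custom span for better visibility in Helios
with tracer.start_as_current_span("transform-sales-data") as span:  # illustrative span name
    span.set_attribute("input.table", "sales_raw")  # illustrative attribute
    # ... notebook logic for this step ...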

Context propagation to Databricks jobs

If you trigger your Databricks jobs via the Jobs API (for example, the run-now endpoint), you can propagate your application's trace context so that the entire flow appears as a single trace in Helios.
Code example for Python:

from opentelemetry.propagate import inject
from opentelemetry.context import get_current
import requests

token = '<<YOUR_DATABRICKS_TOKEN>>'
endpoint = 'https://<<YOUR_DATABRICKS_DOMAIN>>/api/2.1/jobs/run-now'
headers = {"Authorization": f"Bearer {token}"}

# Inject the current trace context (e.g. the W3C traceparent header) into the
# notebook parameters so the notebook run is linked to the calling trace
notebook_params = {}
current_context = get_current()
inject(notebook_params, context=current_context)

# Trigger the job, passing the propagated context as notebook parameters
r = requests.post(endpoint, headers=headers, json={"job_id": <<JOB_ID>>, "notebook_params": notebook_params})
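On the notebook side, the Helios SDK is expected to pick up the propagated context automatically when HS_DATABRICKS is set. If you need to attach it manually, a minimal sketch, assuming the caller injected the W3C traceparent key into the notebook parameters as above and that it is read with dbutils.widgets.get:

from opentelemetry.propagate import extract
from opentelemetry import trace, context

# Rebuild a carrier dict from the propagated notebook parameter
# (assumption: the injected "traceparent" key is exposed as a widget)
carrier = {"traceparent": dbutils.widgets.get("traceparent")}

# Extract the remote context and run the notebook's spans under it
ctx = extract(carrier)
otel_token = context.attach(ctx)
try:
    tracer = trace.get_tracer("databricks-notebook")
    with tracer.start_as_current_span("notebook-entry"):  # illustrative span name
        pass  # ... notebook logic ...
finally:
    context.detach(otel_token)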

