Complete observability over Databricks Notebooks in Helios
Leverage Helios to see how flows propagate through the components of your application, including microservices and notebooks; how those components are connected; and what triggers each notebook and what each notebook triggers in turn.
Add Helios to your Databricks cluster
Install the Helios SDK as a library on your cluster.
Then, edit the cluster configuration and add the following environment variables under Advanced options > Spark > Environment variables:
AUTOWRAPT_BOOTSTRAP=helios
HS_TOKEN=<API_TOKEN> # TODO: Replace value with API token from Helios.
HS_SERVICE_NAME=<SERVICE_NAME> # TODO: Replace value with service name.
HS_ENVIRONMENT=<ENVIRONMENT_NAME> # TODO: Replace value with service environment.
HS_DATABRICKS=True
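If the cluster is defined programmatically (for example, as part of a new_cluster spec in a job definition), the same values go into the cluster's spark_env_vars map. A minimal sketch in Python, using the same placeholder values as above:
# spark_env_vars as it would appear inside a cluster spec,
# e.g. the new_cluster block of a job definition.
spark_env_vars = {
    "AUTOWRAPT_BOOTSTRAP": "helios",
    "HS_TOKEN": "<API_TOKEN>",              # API token from Helios
    "HS_SERVICE_NAME": "<SERVICE_NAME>",    # service name shown in Helios
    "HS_ENVIRONMENT": "<ENVIRONMENT_NAME>", # service environment
    "HS_DATABRICKS": "True",
}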
E2E flows in Helios
Run a job on the cluster and see your Databricks data in Helios.
It is recommended to add custom spans to your Databricks notebooks for increased visibility in Helios.
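Since the Helios instrumentation is based on OpenTelemetry (as in the Jobs API example below), custom spans can be created with the standard OpenTelemetry tracing API. A minimal sketch, where the span name and attribute are placeholders:
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Wrap a meaningful unit of work in the notebook so it shows up
# as its own span in the Helios trace view.
with tracer.start_as_current_span("transform-step") as span:
    span.set_attribute("rows.processed", 1000)  # placeholder attribute
    # ... notebook logic ...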
Context propagation to Databricks jobs
If you run your Databricks jobs through the Jobs API (for example, the run-now endpoint used below), you can propagate your application's context so that the entire flow appears as a single trace in Helios.
Code example for Python:
from opentelemetry.propagate import inject
from opentelemetry.context import get_current
import requests

token = '<<YOUR_DATABRICKS_TOKEN>>'
endpoint = 'https://<<YOUR_DATABRICKS_DOMAIN>>/api/2.1/jobs/run-now'
headers = {"Authorization": f"Bearer {token}"}

# Inject the current trace context into the notebook parameters so the
# job run is linked to the calling service's trace.
notebook_params = {}
current_context = get_current()
inject(notebook_params, context=current_context)

r = requests.post(endpoint, headers=headers, json={"job_id": <<JOB_ID>>, "notebook_params": notebook_params})
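On the notebook side, the injected context arrives in the notebook parameters. Depending on your setup, the Helios SDK may pick it up automatically; if you want to attach it to your own spans manually, a sketch using the standard OpenTelemetry extract API, assuming the default W3C propagator (which writes a traceparent entry) and that parameters are read through dbutils.widgets:
from opentelemetry import trace
from opentelemetry.propagate import extract

# Notebook parameters arrive as widgets; "traceparent" is the key written
# by the default W3C TraceContext propagator on the caller side.
carrier = {"traceparent": dbutils.widgets.get("traceparent")}
parent_context = extract(carrier)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("notebook-work", context=parent_context) as span:
    # ... notebook logic runs as part of the caller's trace ...
    pass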