Cost optimization, that ever-relevant and oh-so-delightful topic. No one likes paying extra, but in the whirlwind of development, it often gets overlooked. It's only when management or business controllers send angry emails about soaring expenses that we finally snap to attention. But how can we ensure that those angry emails are never sent when using Databricks?
Simple Cost Optimization Checklist
Cost optimization is an art, but a fundamental checklist will get you going. After that, we'll dig deeper into monitoring and cost management.
Always start with the smallest possible cluster & SQL warehouse. If you need more power, scale up gradually and methodically.
Always use job clusters for workflows. The cost-optimized serverless option is currently in private preview for workflows, but it will be a highly interesting option once it becomes available. Keep in mind, interactive clusters are significantly more expensive than job clusters.
Always validate cluster configurations - default settings might not be optimal for your needs. Pay particular attention to the auto-termination value: a setting of 20 minutes is typically suitable for a cluster, while 5-10 minutes is often ideal for a SQL warehouse. Additionally, ensure that enhanced autoscaling is enabled. (A query sketch after this checklist shows one way to audit auto-termination settings.)
Implement cluster policies to control who can create clusters and with which settings.
Optimize your code (easier said than done, but AI assistants can help).
Serverless options (clusters, SQL warehouses & endpoints) are worth testing. The more you use serverless, the better Databricks can optimize the virtual machine pool for your needs.
Monitor actively and react quickly.
Official page of Databricks pricing | Databricks
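To make the validation point concrete, the system tables introduced in the next section can be used to audit cluster settings. The sketch below assumes the system.compute.clusters table is enabled in your workspace and that column names such as auto_termination_minutes and cluster_source match the current compute schema - verify against the documentation before relying on it.

```sql
-- Sketch: all-purpose clusters whose latest configuration has no auto-termination,
-- or a value above the 20-minute guideline (assumes system.compute.clusters is enabled).
WITH latest AS (
  SELECT *
  FROM system.compute.clusters
  QUALIFY ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) = 1
)
SELECT cluster_id,
       cluster_name,
       owned_by,
       auto_termination_minutes
FROM latest
WHERE cluster_source != 'JOB'   -- ignore job clusters
  AND delete_time IS NULL       -- skip deleted clusters
  AND (auto_termination_minutes IS NULL OR auto_termination_minutes > 20)
```

Running this on a schedule (or wiring it into an alert) turns the checklist item from a one-off review into a continuous control.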
Databricks System Tables - A Gift from Heaven
System tables are a Databricks-hosted analytical store of your account’s operational data found in the system catalog. They offer historical observability across your account, enabling you to analyze and monitor past activities and performance. Naturally, costs are also logged there, and the best part is that they are recorded with the highest level of detail possible! System tables must first be activated, but this can be easily done with a few REST API calls. Once activated, they are available in the Unity Catalog under your designated catalog. The image below shows which system tables are currently live. Here, we will focus specifically on cost-related tables.
You can find more detailed documentation here: Monitor usage with system tables - Azure Databricks | Microsoft Learn
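If you just want to see which system schemas and tables are already available to you, the system catalog's own information schema can be queried directly (a minimal sketch; what you see depends on which system schemas have been enabled for your metastore):

```sql
-- Sketch: list the system tables currently visible in this metastore.
SELECT table_schema,
       table_name
FROM system.information_schema.tables
ORDER BY table_schema, table_name
```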
Within the billing schema, you can find the `list_prices` and `usage` tables. The `list_prices` table provides current DBU costs at the component level in USD. The `usage` table details exactly how costs are generated. By combining these two tables, you can accurately determine the total costs of your operations. To make your life even easier, here is the ready-to-use SQL query for that:
SELECT
  u.workspace_id,
  u.record_id,
  u.sku_name,
  u.billing_origin_product AS sku_type,
  u.usage_date,
  u.usage_start_time,
  u.usage_quantity,
  u.identity_metadata.run_as AS run_as,
  u.usage_metadata.cluster_id,
  u.usage_metadata.job_id,
  u.usage_metadata.warehouse_id,
  u.usage_metadata.instance_pool_id,
  u.usage_metadata.node_type,
  u.usage_metadata.job_run_id,
  u.usage_metadata.notebook_id,
  u.usage_metadata.dlt_pipeline_id,
  u.usage_metadata.endpoint_name,
  u.usage_metadata.endpoint_id,
  u.usage_metadata.dlt_update_id,
  u.usage_metadata.dlt_maintenance_id,
  u.usage_metadata.run_name,
  u.usage_metadata.job_name,
  u.usage_metadata.notebook_path,
  u.usage_metadata.central_clean_room_id,
  p.currency_code,
  p.pricing.default AS pricing,
  u.usage_quantity * p.pricing.default AS cost,
  u.custom_tags AS custom_tags
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
WHERE u.usage_start_time >= p.price_start_time
  AND u.usage_end_time < COALESCE(p.price_end_time, '2999-12-31')
AI/BI dashboards enable near real-time, advanced monitoring
The only limit here is your imagination. Databricks also provides prebuilt cost dashboards, which are accessible at the account level. Here is a glimpse of our AI/BI dashboards (formerly Lakeview dashboards) on costs using demo data. This dashboard has provided us with full transparency, enabling us to control costs, optimize component usage, and predict future expenses. It has become essential for effective cost optimization and management.
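As an illustration of what sits behind such a dashboard, a daily cost tile can be fed by a simple aggregation over the same join used above. This is a sketch based on list prices, so the figures approximate, rather than exactly match, your invoice:

```sql
-- Sketch: estimated daily cost per product for the last 30 days (list prices).
SELECT u.usage_date,
       u.billing_origin_product AS product,
       SUM(u.usage_quantity * p.pricing.default) AS estimated_cost_usd
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
  AND u.usage_start_time >= p.price_start_time
  AND u.usage_end_time < COALESCE(p.price_end_time, '2999-12-31')
WHERE u.usage_date >= DATE_SUB(CURRENT_DATE(), 30)
GROUP BY u.usage_date, u.billing_origin_product
ORDER BY u.usage_date
```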
But why stop at monitoring?
This is where the fun begins. Why stop at monitoring when you can build an AI assistant to handle Databricks cost optimization and monitoring for you? At Ikidata, we've developed KRATTI, an AI assistant that manages this process for us. KRATTI has saved us a significant amount of money: when testing new things on Databricks, it's easy to be unaware of how costs are generated, or to accidentally leave something running. While dashboards are a great first step, AI assistants dedicated to cost optimization represent the future. See below how KRATTI automatically detects the cause of cost spikes and provides solutions for fixing them:
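KRATTI's internals aren't shown here, but the detection step can be approximated with a plain rule-based query against the billing tables: flag any day whose estimated cost clearly exceeds its recent trailing average. The 7-day window and 1.5x threshold below are arbitrary assumptions for the sketch:

```sql
-- Sketch: flag days whose estimated cost exceeds 1.5x the trailing 7-day average.
WITH daily_cost AS (
  SELECT u.usage_date,
         SUM(u.usage_quantity * p.pricing.default) AS cost
  FROM system.billing.usage u
  JOIN system.billing.list_prices p
    ON u.sku_name = p.sku_name
    AND u.cloud = p.cloud
    AND u.usage_unit = p.usage_unit
    AND u.usage_start_time >= p.price_start_time
    AND u.usage_end_time < COALESCE(p.price_end_time, '2999-12-31')
  GROUP BY u.usage_date
)
SELECT usage_date, cost, trailing_avg
FROM (
  SELECT usage_date,
         cost,
         AVG(cost) OVER (ORDER BY usage_date ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS trailing_avg
  FROM daily_cost
) t
WHERE cost > 1.5 * trailing_avg
ORDER BY usage_date DESC
```

A real assistant would then drill into the usage_metadata columns (job, cluster, warehouse) for the flagged days to explain where the spike came from.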
Written by Aarni Sillanpää
Hopefully, the money saved will go towards the holiday party budget.