
Databricks Cost Optimization

Oct 2, 2024



Cost optimization, that ever-relevant and oh-so-delightful topic. No one likes paying extra, but in the whirlwind of development, it often gets overlooked. It's only when management or business controllers send angry emails about soaring expenses that we finally snap to attention. But how can we ensure that those angry emails are never sent when using Databricks?
Databricks cost saving possibilities

Simple Cost Optimization Checklist


Cost optimization is an art, but we can start with a fundamental checklist to get you going. Additionally, we'll delve deeper into monitoring and cost management.


  1. Always start with the smallest possible cluster & SQL warehouse. If you need more power, scale up gradually and methodically.

  2. Always use job clusters for workflows. The cost-optimized serverless option is currently in private preview for workflows, but it will be a highly interesting option once it becomes available. Keep in mind, interactive clusters are significantly more expensive than job clusters.

  3. Always validate cluster configurations - default settings might not be optimal for your needs. Pay particular attention to the auto-termination value; a setting of 20 minutes is typically suitable for a cluster, while 5-10 minutes is often ideal for a SQL warehouse. Additionally, ensure that autoscaling is enabled (for Delta Live Tables pipelines, that means enhanced autoscaling).

  4. By implementing cluster policies, you can gain the necessary control over cluster creation.

  5. Optimize your code (easier said than done, but AI assistants can help).

  6. Serverless options (clusters, SQL warehouses & endpoints) are worth testing. The more you use serverless, the better Databricks can optimize the virtual machine pool for your needs.

  7. Monitor actively and react quickly.
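
Items 3 and 4 of the checklist can be combined: a cluster policy can enforce the auto-termination and autoscaling limits so nobody has to remember them. Below is a minimal Python sketch that only builds such a policy definition as JSON. The attribute paths (`autotermination_minutes`, `autoscale.max_workers`, `node_type_id`) and the policy types (`range`, `fixed`, `allowlist`) follow the cluster policy definition format as we understand it, so verify them against the current Databricks documentation. The function name, the limits, and the node type are illustrative assumptions.

```python
import json


def build_policy_definition(max_idle_minutes=20, max_workers=4,
                            allowed_node_types=("Standard_DS3_v2",)):
    """Return a cluster policy definition as a plain dict (hypothetical helper)."""
    return {
        # Auto-termination must stay between 5 minutes and the chosen cap,
        # so it can never be switched off (0 would disable it).
        "autotermination_minutes": {
            "type": "range",
            "minValue": 5,
            "maxValue": max_idle_minutes,
            "defaultValue": max_idle_minutes,
        },
        # Start small: one worker minimum, capped maximum.
        "autoscale.min_workers": {"type": "fixed", "value": 1},
        "autoscale.max_workers": {
            "type": "range",
            "maxValue": max_workers,
            "defaultValue": min(2, max_workers),
        },
        # Only allow the (assumed) small node types listed by the caller.
        "node_type_id": {
            "type": "allowlist",
            "values": list(allowed_node_types),
            "defaultValue": allowed_node_types[0],
        },
    }


policy = build_policy_definition()
print(json.dumps(policy, indent=2))
```

The printed JSON is what you would paste into the policy definition field or send through the cluster policies API; the sketch deliberately stops short of calling any API.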


Databricks System Tables - A Gift from Heaven


System tables are a Databricks-hosted analytical store of your account’s operational data found in the `system` catalog. They offer historical observability across your account, enabling you to analyze and monitor past activities and performance. Naturally, costs are also logged there, and the best part is that they are recorded with the highest level of detail possible! System tables must first be activated, but this can be easily done with a few REST API calls. Once activated, they are available in Unity Catalog under the `system` catalog. The image below shows which system tables are currently live. Here, we will focus specifically on cost-related tables.


Available Databricks system tables

You can find more detailed documentation here: Monitor usage with system tables - Azure Databricks | Microsoft Learn
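
The REST calls mentioned above for activating a system schema can be sketched in Python. This is a minimal sketch under assumptions, not a definitive implementation: the endpoint path (`/api/2.0/unity-catalog/metastores/<metastore-id>/systemschemas/<schema-name>` with `PUT`) is how we understand the system schemas API at the time of writing, so check it against the linked documentation. The host, metastore ID, schema name, and token below are placeholders, and the helper name is hypothetical.

```python
import urllib.request


def build_enable_request(host, metastore_id, schema_name, token):
    """Build (but do not send) the PUT request that enables one system schema."""
    url = (
        f"{host.rstrip('/')}/api/2.0/unity-catalog/metastores/"
        f"{metastore_id}/systemschemas/{schema_name}"
    )
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values: replace with your workspace URL, metastore ID and token.
req = build_enable_request(
    "https://example.azuredatabricks.net",
    "my-metastore-id",
    "compute",
    "dapi-REDACTED",
)
print(req.get_method(), req.full_url)

# Sending is left out on purpose; with real values it would be:
# urllib.request.urlopen(req)
```

Repeating the call once per schema you need is what "a few REST API calls" amounts to in practice.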


Within the billing schema, you can find the `list_prices` and `usage` tables. The `list_prices` table provides the list price of each SKU in USD, together with the time window in which that price applies. The `usage` table details exactly how costs are generated. By combining these two tables, you can determine the total costs of your operations at list price (negotiated discounts are not reflected). To make your life even easier, here is the ready-to-use SQL query for that:

        -- Cost per usage record: usage quantity multiplied by the list price valid at that time
        SELECT
            u.workspace_id,
            u.record_id,
            u.sku_name,
            u.billing_origin_product AS sku_type,
            u.usage_date,
            u.usage_start_time,
            u.usage_quantity,
            u.identity_metadata.run_as AS run_as,
            u.usage_metadata.cluster_id,
            u.usage_metadata.job_id,
            u.usage_metadata.warehouse_id,
            u.usage_metadata.instance_pool_id,
            u.usage_metadata.node_type,
            u.usage_metadata.job_run_id,
            u.usage_metadata.notebook_id,
            u.usage_metadata.dlt_pipeline_id,
            u.usage_metadata.endpoint_name,
            u.usage_metadata.endpoint_id,
            u.usage_metadata.dlt_update_id,
            u.usage_metadata.dlt_maintenance_id,
            u.usage_metadata.run_name,
            u.usage_metadata.job_name,
            u.usage_metadata.notebook_path,
            u.usage_metadata.central_clean_room_id,
            p.currency_code,
            p.pricing.default AS pricing,
            u.usage_quantity * p.pricing.default AS cost,
            u.custom_tags AS custom_tags
        FROM
            system.billing.usage u
        JOIN
            system.billing.list_prices p
        ON
            u.sku_name = p.sku_name
            AND u.cloud = p.cloud
            AND u.usage_unit = p.usage_unit
        WHERE
            -- Keep only the price row whose validity window covers the usage record
            u.usage_start_time >= p.price_start_time
            -- An open-ended (current) price has a NULL price_end_time
            AND u.usage_end_time < COALESCE(p.price_end_time, '2999-12-31')
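
To make the join logic easier to follow, here is the same price-window matching as a small Python sketch on toy data. It is only an illustration of what the query does for a single usage record, not a replacement for it. The dictionary keys mirror the column names used in the query, with `price` standing in for `pricing.default`; the SKU name, prices, and quantities are made up.

```python
from datetime import datetime

FAR_FUTURE = datetime(2999, 12, 31)  # stand-in for an open-ended price window


def cost_of(usage, prices):
    """Return usage_quantity * price for the price window covering the record, else None."""
    for p in prices:
        same_sku = (
            (p["sku_name"], p["cloud"], p["usage_unit"])
            == (usage["sku_name"], usage["cloud"], usage["usage_unit"])
        )
        window_end = p["price_end_time"] or FAR_FUTURE
        if (same_sku
                and usage["usage_start_time"] >= p["price_start_time"]
                and usage["usage_end_time"] < window_end):
            return usage["usage_quantity"] * p["price"]
    return None  # no matching price row: the SQL inner join would drop the record


# Made-up price history: 0.50 until 1 July 2024, then 0.25 with no end date.
sku = {"sku_name": "DEMO_JOBS_COMPUTE", "cloud": "AZURE", "usage_unit": "DBU"}
prices = [
    {**sku, "price": 0.50, "price_start_time": datetime(2024, 1, 1),
     "price_end_time": datetime(2024, 7, 1)},
    {**sku, "price": 0.25, "price_start_time": datetime(2024, 7, 1),
     "price_end_time": None},
]
usage = {**sku, "usage_quantity": 2.0,
         "usage_start_time": datetime(2024, 8, 1, 10),
         "usage_end_time": datetime(2024, 8, 1, 11)}

print(cost_of(usage, prices))  # 2.0 DBU at the newer price of 0.25 -> 0.5
```

Note that a usage record spanning a price change matches neither window and is dropped, in the sketch and in the query alike.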

AI/BI dashboards enable close to real-time and advanced monitoring


The only limit here is your imagination. Databricks also provides prebuilt cost dashboards, which are accessible at the account level. Here is a glimpse of our AI/BI dashboards (formerly Lakeview dashboards) on costs using demo data. This dashboard has provided us with full transparency, enabling us to control costs, optimize component usage, and predict future expenses. It has become essential for effective cost optimization and management.

AI/BI Dashboards on Databricks Cost Optimization
AI/BI Dashboards on Databricks Cost Management
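
A dashboard shows a spike once someone looks at it; the same check can also run automatically. The sketch below is a deliberately simple baseline heuristic of our own choosing for illustration: it flags a day when its cost exceeds a multiple of the average of the preceding days. The function name, the 7-day window, the factor of 2, and the daily totals are all assumptions. In practice the input would be daily sums of the `cost` column from the query above.

```python
from statistics import mean


def find_cost_spikes(daily_costs, window=7, factor=2.0):
    """Return (day, cost, baseline) for days costing more than factor x the trailing mean."""
    spikes = []
    for i in range(window, len(daily_costs)):
        day, cost = daily_costs[i]
        # Baseline: average cost of the `window` days before this one.
        baseline = mean([c for _, c in daily_costs[i - window:i]])
        if baseline > 0 and cost > factor * baseline:
            spikes.append((day, cost, baseline))
    return spikes


# Made-up daily totals: a flat week, one expensive day, then back to normal.
daily_costs = [(f"2024-09-0{d}", 100.0) for d in range(1, 8)]
daily_costs += [("2024-09-08", 350.0), ("2024-09-09", 120.0)]

for day, cost, baseline in find_cost_spikes(daily_costs):
    print(f"{day}: {cost:.2f} vs. trailing average {baseline:.2f}")
```

A trailing average is easy to reason about but slow to adapt after a real step change in usage, so treat the thresholds as a starting point to tune.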

But why stop at monitoring?


This is where the fun begins. Why stop at monitoring when you can build an AI assistant to handle Databricks cost optimization and monitoring for you? At Ikidata, we've developed KRATTI, an AI assistant that manages this process for us. KRATTI has saved us a significant amount of money because, when testing new things on Databricks, it's easy to be unaware of how costs are generated, or you might accidentally leave something running. While dashboards are a great first step, AI assistants dedicated to cost optimization represent the future. See below how KRATTI automatically detects the cause of cost spikes and provides solutions for fixing them:







Written by Aarni Sillanpää

Hopefully, the money saved will go towards the holiday party budget.


Follow Ikidata on LinkedIn

More information about KRATTI

From Words to Action




