It’s a cold and wintry Sunday in Helsinki – the perfect moment for a coding session. I recently had a debate with a friend who claimed it’s impossible to automate the creation of an AI/BI dashboard in just one day. I beg to differ. Challenge accepted! It’s time to put that claim to the test and see just how much can be accomplished in a single day.
AI/BI dashboards have made a powerful entrance into the dashboard market, challenging tools like Power BI and similar solutions with their ease of use. Databricks offers an intuitive and user-friendly way to create a very basic AI/BI dashboard directly from the UI, while also allowing you to leverage AI as an assistant during the dashboard creation process. However, that’s not enough for me – in the spirit of automation, I want to automate the entire process.
The goal I set was to determine whether a GenAI agent could effortlessly automate the entire lifecycle — from validating dataset selection to creating a new AI/BI dashboard — while providing full flexibility in managing the process. In other words, could AI handle the data analysis, generate plans detailing which data is used and how, and independently create a complete dashboard based on that data?
To make the challenge a bit more fun, here were the rules:
Only one day to complete it. A hard limit.
Must operate entirely within the Databricks environment.
Only 3 cups of coffee allowed. No more.
The implementation must run on Azure OpenAI GPT-4o mini (for cost optimization).
I’ll walk you through the day’s timeline in chronological order, along with some insights into the thought process behind it. If you’re in a hurry, here’s the latest iteration of the solution architecture diagram so you don’t need to scroll all the way to the end.
Iteration 1 - The beginning
This is going to be an easy challenge...
I’ve worked with Databricks AI/BI Dashboards quite a lot already, so I assumed this would be a piece of cake. In the original plan, the architecture was much simpler, focusing largely on fine-tuning prompt engineering and automating the associated feedback loop. But let’s get to the point. To start, I checked Databricks' "API paradise" to figure out how a dashboard could be created programmatically and what parameters could be passed. This part was straightforward, but the problems began with the parameters. The documented payload example, {"pages":[{"name":"b532570b","displayName":"New Page"}]}, left much to the imagination when it came to creating a complete dashboard. I tried to find more detailed source code or documentation, but with little success. And given it was a Sunday, it was unlikely that Databricks employees would respond to any questions, no matter how politely they were asked. Given the time constraints, I decided to build a visually comprehensive dashboard by hand and extract the corresponding code using the export method. At this point, I hadn’t delved deeply into the actual dashboard code, assuming the AI would be able to interpret it out of the box.
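For reference, the create call itself is simple. Here’s a minimal sketch against the Lakeview REST endpoint, assuming the workspace URL and a personal access token are in environment variables (both placeholders). The tricky part is that serialized_dashboard is an entire dashboard definition serialized as a JSON string nested inside the JSON payload:

```python
import json
import os
import requests

# Placeholders: set these to your workspace URL and a personal access token.
host = os.environ["DATABRICKS_HOST"]   # e.g. "https://adb-1234567890.azuredatabricks.net"
token = os.environ["DATABRICKS_TOKEN"]

# serialized_dashboard is a JSON *string* inside the JSON payload.
serialized = json.dumps({"pages": [{"name": "b532570b", "displayName": "New Page"}]})

resp = requests.post(
    f"{host}/api/2.0/lakeview/dashboards",
    headers={"Authorization": f"Bearer {token}"},
    json={"display_name": "PoC dashboard", "serialized_dashboard": serialized},
)
resp.raise_for_status()
print(resp.json()["dashboard_id"])
```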
Next up was data crunching. For this PoC, I chose to work with one Delta table at a time. Since the LLM doesn’t inherently understand the context of the data, I decided to enrich the prompt context by fetching both the table and column descriptions to aid in the analysis. After that, "adding a couple of prompt chains, and voilà – it would be ready!" The plan for iteration 1 was complete.
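That enrichment step looks roughly like this. A sketch assuming Unity Catalog (whose information_schema exposes table and column comments) and a Databricks notebook where spark is predefined; the table name is hypothetical:

```python
# Hypothetical table used throughout these sketches.
catalog, schema, table = "main", "sales", "orders"

table_desc = spark.sql(f"""
    SELECT comment FROM {catalog}.information_schema.tables
    WHERE table_schema = '{schema}' AND table_name = '{table}'
""").collect()[0]["comment"]

columns = spark.sql(f"""
    SELECT column_name, data_type, comment
    FROM {catalog}.information_schema.columns
    WHERE table_schema = '{schema}' AND table_name = '{table}'
    ORDER BY ordinal_position
""").collect()

# Flatten everything into one context block for the LLM prompt.
context = f"Table {table}: {table_desc}\n" + "\n".join(
    f"- {c['column_name']} ({c['data_type']}): {c['comment']}" for c in columns
)
```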
Iteration 2 - Wait a second
If only brute force were the answer to everything.
After quick tests on Iteration 1, it became clear there was room for improvement. Like in many organizations, not all my tables had table and column descriptions (yes, I know, shame on me). To resolve this, I added automated description generation using GenAI for any missing descriptions in the selected table.
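A sketch of that gap-filling step, continuing the metadata sketch above. The Azure OpenAI endpoint, key, and deployment name are placeholders, and llm() is a hypothetical helper wrapping a one-shot chat completion:

```python
import os
from openai import AzureOpenAI

# Azure OpenAI client; endpoint, key, and deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def llm(prompt: str) -> str:
    """One-shot chat completion against the GPT-4o mini deployment."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # your deployment name may differ
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Generate a comment for any column that lacks one and write it back.
for c in columns:
    if not c["comment"]:
        description = llm(
            f"Write a one-sentence description of column '{c['column_name']}' "
            f"({c['data_type']}) in table '{table}'."
        ).strip().replace("'", "")
        spark.sql(
            f"ALTER TABLE {catalog}.{schema}.{table} "
            f"ALTER COLUMN {c['column_name']} COMMENT '{description}'"
        )
```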
After that, I shifted my focus to prompt engineering. "A couple of prompt attempts, and I’ll be able to move on" — oh, how wrong I was. I was using the table and column information as input for the LLM to help it understand the context of the data. Analyzing the data was a critical requirement; otherwise, the charts would have been generated with entirely random data. It quickly became evident that this process needed to be split into two separate tasks to function properly. The first prompt was designed to analyze the data and plan the dashboard content (which charts to use, where to place them, what data to include, etc.). This prompt quickly grew into quite the monster, and hallucinations began to appear early on. However, after numerous iterations of trial and error, I managed to optimize the first prompt. Next, it was time to tackle the second prompt: "Convert the plan into AI/BI dashboard code for a REST API call — here’s the example code..."
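In code, the two-stage chain looked roughly like this, reusing llm() and context from the sketches above; the exported-dashboard file path is hypothetical:

```python
# Sample rows so the model sees actual data, not just metadata.
sample_rows = "\n".join(spark.sql(
    f"SELECT * FROM {catalog}.{schema}.{table} LIMIT 5"
).toJSON().collect())

# Stage 1: analyze the data and plan the dashboard content.
plan = llm(
    "You are a BI designer. Based on the table context and sample rows "
    "below, plan a dashboard: list the charts, the columns each chart "
    f"uses, and the layout.\n\nContext:\n{context}\n\nSample rows:\n{sample_rows}"
)

# Stage 2: convert the plan into dashboard code, guided by the export
# of the hand-built dashboard from iteration 1.
example_dashboard = open("exported_dashboard.json").read()  # hypothetical path
dashboard_json = llm(
    "Convert the following dashboard plan into the serialized dashboard "
    "JSON accepted by the REST API. Follow the example's structure "
    f"exactly.\n\nPlan:\n{plan}\n\nExample:\n{example_dashboard}"
)
```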
Well, this is where the real challenges began. Decoding wasn’t exactly an out-of-the-box task. To make matters worse, the REST API calls didn’t provide any meaningful error messages, so brute-forcing code development through an automated feedback loop was out of the question. If I had access to the source code, I could have trained the model on top of that data or built a RAG (retrieval-augmented generation) solution, but, you know, accessibility and time limitations. It seemed like prompting was my only viable option here. And oh, did I try everything. I was literally burning tokens at this point. 🔥 I experimented with multiple models (GPT-4o, o1, o1-mini, Llama 3.3, Claude 3.5 Sonnet…), meta-prompting (letting the LLM optimize the prompt for itself, then using another LLM for the task and chaining the process), automated feedback loops, and ReAct approaches. I even used my "Who Wants to Be a Millionaire" lifelines and consulted my brother's baby, but to no avail. Nothing seemed to work. Sometimes, it almost worked — but "almost" isn’t good enough for reliability. Time was running out and so were my ideas. It was clear I needed to try something completely different—so I hit the gym.
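For the curious, the feedback-loop pattern I tried looked roughly like this (a sketch reusing names from the earlier snippets): post the generated JSON, feed the API error back to the model, retry. With such opaque error messages, it rarely converged:

```python
# Post, feed the rejection back to the model, retry a few times.
for attempt in range(5):
    resp = requests.post(
        f"{host}/api/2.0/lakeview/dashboards",
        headers={"Authorization": f"Bearer {token}"},
        json={"display_name": "PoC dashboard",
              "serialized_dashboard": dashboard_json},
    )
    if resp.ok:
        break
    dashboard_json = llm(
        "The dashboard JSON below was rejected by the API with this error:\n"
        f"{resp.text}\n\nFix the JSON and return only the corrected JSON.\n\n"
        f"{dashboard_json}"
    )
```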
Iteration 3 - Enlightenment
Sometimes, it’s best to pause and let the idea find you.
While swear... sweating at the gym, it hit me like a bolt of lightning from a clear blue sky: I should start using my own brain instead of outsourcing all the thinking to AI. What a genius idea! Inspired by this revelation, I decided to change my approach slightly and dive deeper into the AI/BI dashboard code to better understand its logic. Based on that, I could build dynamic tools for the agent to use, eliminating the need to cross my fingers and hope for no random hallucinations with every run. Time was running out, so I had to start prioritizing ideas. It was time to do something inherently Finnish — head to the sauna in peace.
After a relaxing sauna session, I got to work on breaking down the dashboard code. It was neatly standardized, making it highly automatable. You know the saying: if it's standardized, it can be automated. Instead of focusing solely on prompt engineering, I built individual tools for each widget type that the agent could use. This allowed the agent to construct the charts itself, with functionality guaranteed through parameterization. The process worked like a charm. Once all the building blocks for the dashboard were ready, the agent could assemble them into the complete dashboard. I won’t include the same solution architecture diagram again — this time, I have something better: a video. Here’s the latest version, where the GenAI agent independently handles the entire process from start to finish. FYI, I intentionally added sleep statements to slow down the processing so you can take a moment to enjoy the text outputs as well.
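To give an idea of what those tools look like, here’s one widget builder, heavily simplified. The field names mirror the structure of an exported dashboard, but treat the exact schema as illustrative; the real export has more required fields:

```python
import uuid

def make_bar_chart_widget(title: str, dataset: str, x_col: str, y_col: str,
                          x: int, y: int, width: int = 3, height: int = 6) -> dict:
    """Build one positioned bar-chart widget for the serialized dashboard.

    Illustrative sketch: field names follow the exported dashboard JSON,
    simplified here for readability.
    """
    return {
        "widget": {
            "name": uuid.uuid4().hex[:8],
            "queries": [{
                "name": "main_query",
                "query": {
                    "datasetName": dataset,
                    "fields": [
                        {"name": x_col, "expression": f"`{x_col}`"},
                        {"name": y_col, "expression": f"SUM(`{y_col}`)"},
                    ],
                    "disaggregated": False,
                },
            }],
            "spec": {
                "version": 3,
                "widgetType": "bar",
                "frame": {"title": title, "showTitle": True},
                "encodings": {
                    "x": {"fieldName": x_col, "scale": {"type": "categorical"}},
                    "y": {"fieldName": y_col, "scale": {"type": "quantitative"}},
                },
            },
        },
        "position": {"x": x, "y": y, "width": width, "height": height},
    }

# The agent calls one such tool per planned chart and drops the results
# into pages[0]["layout"] of the serialized dashboard.
layout = [make_bar_chart_widget("Sales by region", "orders_ds", "region", "amount", 0, 0)]
```

Because every widget the agent emits comes out of a parameterized builder like this, the output is valid by construction — the model only chooses parameters, never raw JSON.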
Post-Game Analysis on Databricks AI/BI Dashboard Automation
The challenge was accepted. And won.
Although this was a very quick PoC completed in extremely limited time, I couldn’t help but get excited about the possibilities. Automating Databricks AI/BI dashboard creation can be taken to great lengths with GenAI agents, freeing up resources to focus on using reports rather than creating them! You can imagine the kind of ROI (return on investment) this represents, given the massive amount of time and resources spent on report creation. By leveraging GenAI, even one-off reports can be created automatically — something that previously wasn’t feasible from a resource perspective. With that said, I hope this experiment gave you a bit of inspiration for exploring the fascinating world of Databricks. For me, it was a fun day of hacking, and I’m excited to see what the next challenge will be.
Written by Aarni Sillanpää
A challenge a day keeps the doctor away