Vibe coding is an AI-dependent programming technique where a person describes a problem in a few sentences as a prompt to a large language model (LLM) tuned for coding.
I was on my way to Oulu by train, enjoying the Easter holiday. Of course, the restaurant carriage was full and I forgot to bring a good book (accidentally grabbed a bad one I received as a Christmas gift, oh noes). As the old saying goes, never waste time on a bad book. After the fun I had with the previous agentic challenge a few months ago, I couldn’t think of a better way to spend my time than doing a second round. So instead of marveling at Finnish nature’s beauty, it was time for some action!

With everyone hyping up “vibe coding” these days, what better challenge than building my own setup on Databricks? While Databricks Assistant already provides fantastic support as your personal "Jarvis", I wanted to try something new. My goal was to create a simple "vibe coding setup" that runs on Databricks. To keep things straightforward, this setup involves an agent operated within a notebook. The agent should be capable of independently creating and modifying entire data projects, be easy to fine-tune and teach new tricks, and, most importantly, handle coding tasks for me. Here’s the list of functionalities I wanted the agent to have:
Create/Delete files
Generate close-to-production-grade code independently
Update existing files based on my requests
Run produced notebooks autonomously to test the code
Create a new feature branch from the latest commit and automatically generate a pull request to the main branch on DevOps
I guess it could be described as "Text-to-DataEngineerProcess", which helps to liberate data engineers from basic tech tasks, allowing them to focus on solving business challenges. In case you weren't aware, Databricks offers an excellent foundation for developing GenAI agents. Keeping solutions close to the data makes development truly enjoyable. And just like last time, it's time to add a little extra spice to this challenge:
Development must be done during a train trip before reaching Oulu (5.5 hours)
Everyone in Finland knows how unreliable train internet can be, even today...
And since the restaurant carriage was full, no coffee at all😲
Here is the story of how a vibe coding setup was built on Databricks
At this point, I played the Mission Impossible theme, opened my laptop and got to work. I almost reached the end of the song before the internet cut out for the first time. Hopefully, they won't face the same kind of issues in the upcoming final movie. But as you know, the first step is always to have a battle plan. Due to the tight deadline, the implementation had to be quick to deploy and scalable, since the list of features was long. Here you see the solution architecture, which we'll walk through step by step.
Solution architecture
Building the executor agent
I’ll keep this step short and sweet so you don’t fall asleep while reading. The implementation includes just one agent, boosted with a light chain-of-thought system prompt. This helps the agent mull over the given problem in the desired way, laying the groundwork for its autonomous work with a feedback loop. I also created the necessary tools, enabling the agent to access all the functionalities needed to execute the required actions. Now, the agent can analyze a request, call tools, interpret their responses (and thus improve its own answers) and try again and again until the goal is reached or the constraints trigger a stop. The short-term memory stores the session, allowing the agent to develop its responses independently. Setting up these loops is straightforward, and this feature makes the agent seem almost “magical” to outsiders. You can take this to incredible heights today, turning the agent into a work of art, but it’s always best to optimize according to the problem at hand.
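To make this less abstract, here's roughly what that action loop looks like in code. Consider it a minimal sketch rather than my exact implementation: it assumes an OpenAI-style chat client, and names like run_tool and MAX_ITERATIONS are illustrative.

```python
# Minimal sketch of the agent's action loop (illustrative names, not the
# exact implementation). Assumes an OpenAI-style chat client and a
# run_tool() dispatcher that executes one of the registered tools.
import json
from openai import AzureOpenAI

client = AzureOpenAI()  # endpoint, key and API version read from env vars

SYSTEM_PROMPT = "You are a coding agent. Think step by step before acting."
MAX_ITERATIONS = 10  # hard stop so the feedback loop can't run forever

def run_agent(user_request: str, tools: list, run_tool) -> str:
    # Short-term memory: the whole session lives in this message list
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_request}]
    for _ in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model="gpt-4.1-mini", messages=messages, tools=tools)
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:          # no tool call -> the agent is done
            return msg.content
        for call in msg.tool_calls:     # execute every requested tool
            result = run_tool(call.function.name,
                              json.loads(call.function.arguments))
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": str(result)})
    return "Stopped: iteration limit reached."
```

The iteration cap is the important design choice here: it's what turns "try again and again" into something you can safely leave unattended on a train.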
I divided the necessary functionalities into four different tools, which allow the agent to create new files with code, delete files (sometimes you gotta clean up the chaos), run notebooks (to autonomously validate the code’s functionality), and make pull requests to DevOps by pushing files with code there. Also, I restricted the use of the agent strictly to the notebook level.
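For the curious, a tool is really just a schema the agent can call. Below is what the create_code_file declaration could look like in OpenAI function-calling format; the other three tools follow the same pattern, and the exact parameter names here are my guesses, not the real signatures.

```python
# One illustrative tool schema in OpenAI function-calling format. The other
# three tools (delete, run notebook, create pull request) follow the same
# pattern; only create_code_file is named in the text.
tools = [{
    "type": "function",
    "function": {
        "name": "create_code_file",
        "description": "Create or overwrite a notebook file with given code.",
        "parameters": {
            "type": "object",
            "properties": {
                "file_name": {"type": "string",
                              "description": "Workspace path of the file"},
                "code": {"type": "string",
                         "description": "Full source code for the file"},
            },
            "required": ["file_name", "code"],
        },
    },
}]
```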
It all begins with managing files
Initially, it's helpful to break down the situation as concretely as possible. In this challenge, that means I want to automate coding, aka manage files using a GenAI agent, nothing more complicated than that. Databricks offers excellent REST API endpoints with comprehensive documentation, making this challenge feasible. First, I consolidated the agent’s "playground" into a single repository, which allowed me to easily retrieve all files from that path with a single REST call. Next, I exported each file in binary format using REST. After a quick string conversion, the information is passed to the agent as a simple prompt extension describing which files exist and what they contain. Easy as pie. Nowadays, the context window is so large that I kept things straightforward and didn’t build more advanced logic. For example, it would be possible to include only file names in the prompt and let the agent fetch the code in binary form using a tool as needed, saving a lot of prompt space.
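In practice, this boils down to two standard Workspace API endpoints: workspace/list and workspace/export. A minimal sketch, assuming a personal access token in an environment variable and a placeholder repo path:

```python
# Sketch of the "playground" scan. HOST and REPO_PATH are placeholders,
# and the PAT is assumed to live in DATABRICKS_TOKEN.
import base64
import os
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
REPO_PATH = "/Repos/agent/playground"  # hypothetical repo path

def list_files(path: str = REPO_PATH) -> list[str]:
    # One REST call returns every object directly under the repo path
    r = requests.get(f"{HOST}/api/2.0/workspace/list",
                     headers=HEADERS, params={"path": path})
    r.raise_for_status()
    return [o["path"] for o in r.json().get("objects", [])
            if o["object_type"] == "NOTEBOOK"]

def export_file(path: str) -> str:
    # Export returns the file base64-encoded; decode it back to a string
    r = requests.get(f"{HOST}/api/2.0/workspace/export",
                     headers=HEADERS,
                     params={"path": path, "format": "SOURCE"})
    r.raise_for_status()
    return base64.b64decode(r.json()["content"]).decode("utf-8")

# The prompt extension: which files exist and what they contain
prompt_extension = "\n\n".join(
    f"### {p}\n{export_file(p)}" for p in list_files())
```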
Now that the agent always knows which files are available thanks to prompt enrichment, it can easily modify existing code files or create new ones with the desired code snippets. In practice, it triggers the create_code_file tool, which requires the code input and file name as parameters. That’s it. The same logic applies to deleting files. And if you want to modify or delete multiple files at once, the agent can trigger the same tool multiple times within the same action loop. Simple, yet highly efficient.
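Under the hood, the create and delete tools can map almost one-to-one onto the workspace/import and workspace/delete endpoints. A sketch, reusing HOST and HEADERS from above (the function bodies are my assumptions, not the exact implementation):

```python
# Sketch of the create/delete tools behind the agent, reusing HOST and
# HEADERS from the previous snippet.
import base64
import requests

def create_code_file(file_name: str, code: str) -> str:
    # Import a Python source file as a notebook, overwriting if it exists
    r = requests.post(f"{HOST}/api/2.0/workspace/import", headers=HEADERS,
                      json={"path": file_name,
                            "format": "SOURCE",
                            "language": "PYTHON",
                            "overwrite": True,
                            "content": base64.b64encode(
                                code.encode("utf-8")).decode("utf-8")})
    r.raise_for_status()
    return f"Created {file_name}"

def delete_file(file_name: str) -> str:
    # Sometimes you gotta clean up the chaos
    r = requests.post(f"{HOST}/api/2.0/workspace/delete",
                      headers=HEADERS, json={"path": file_name})
    r.raise_for_status()
    return f"Deleted {file_name}"
```

Returning a plain string from each tool matters more than it looks: that string is exactly what lands back in the agent's message history as feedback.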
Testing, hmm, I think I’ve heard about that word
Of course, I also wanted the agent to test its own code. This way, it couldn't just throw together random code and say, like a used car salesman, "It works, it works, just trust me." There are a couple of tricks to achieve this, and I decided to go forward with Databricks notebooks. Therefore, I limited Python code files to the notebook format, which allowed me to give the agent the ability to run them easily through a tool. If a run fails, the agent receives an error message that it can use to troubleshoot and fix the problem in the code. But as everyone knows, AI never makes mistakes, so that part was probably a bit unnecessary.
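The run tool can be built on the Jobs API: submit a one-off notebook run, poll until it terminates, and hand any failure back to the agent as feedback. Roughly like this (CLUSTER_ID is a placeholder, and HOST/HEADERS come from the earlier snippets):

```python
# Sketch of the run_notebook tool: submit a one-off run, poll until it
# reaches a terminal state, and return the outcome as a string.
import time
import requests

CLUSTER_ID = "<existing-cluster-id>"  # placeholder

def run_notebook(notebook_path: str) -> str:
    r = requests.post(f"{HOST}/api/2.1/jobs/runs/submit", headers=HEADERS,
                      json={"run_name": "agent-test-run",
                            "tasks": [{"task_key": "test",
                                       "existing_cluster_id": CLUSTER_ID,
                                       "notebook_task": {
                                           "notebook_path": notebook_path}}]})
    r.raise_for_status()
    run_id = r.json()["run_id"]
    while True:  # poll until the run terminates
        state = requests.get(f"{HOST}/api/2.1/jobs/runs/get", headers=HEADERS,
                             params={"run_id": run_id}).json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "INTERNAL_ERROR",
                                         "SKIPPED"):
            break
        time.sleep(10)
    if state.get("result_state") == "SUCCESS":
        return "Notebook ran successfully."
    # The error text goes back into the loop so the agent can fix the code
    return f"Notebook failed: {state.get('state_message', 'unknown error')}"
```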
At this point, I took another look at the dining car situation. Turns out the line had grown even longer. Great, still no coffee.
DevOps repo integration
Here, I need to start with a small confession. Lately, I've been involved in automating DevOps repositories, which meant I already had existing code in place rather than starting from scratch. Because of that, I decided to add this new functionality as well. It's much more enjoyable to let the agent create its own pull requests to the repo, saving me some effort in the process. If only the agent could also optimize coffee brewing and serving... Speaking of which, I came across this incredible robot buddy during my trip to South Korea. I wish we had something like that in Finland too.
Anyway, the agent is now capable of taking the latest commit from the main branch, creating a new feature branch from it, pushing the desired files along with their code, and submitting a pull request to the main branch, complete with comments. All of this is achieved through a few simple parameterized REST calls from the agent.
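For the curious, here's roughly how those calls could chain together against the Azure DevOps Git REST API. This is a simplified sketch: the org, repo, and PAT are placeholders, and it assumes every pushed file is a new addition:

```python
# Hedged sketch of the DevOps tool: take main's latest commit, create a
# feature branch with all files in one push, then open a pull request.
import os
import requests

ORG_URL = "https://dev.azure.com/<org>/<project>"  # placeholder
REPO = "<repo-name>"                               # placeholder
AUTH = ("", os.environ["AZDO_PAT"])                # PAT via basic auth
API = "api-version=7.0"

def create_pull_request(branch: str, files: dict[str, str],
                        title: str) -> str:
    base = f"{ORG_URL}/_apis/git/repositories/{REPO}"
    # 1) The latest commit on main becomes the branch's starting point
    refs = requests.get(f"{base}/refs?filter=heads/main&{API}", auth=AUTH)
    refs.raise_for_status()
    old_id = refs.json()["value"][0]["objectId"]
    # 2) A single push creates the feature branch and commits all files
    #    (changeType "add" assumes the files are new; edits would use "edit")
    requests.post(f"{base}/pushes?{API}", auth=AUTH, json={
        "refUpdates": [{"name": f"refs/heads/{branch}",
                        "oldObjectId": old_id}],
        "commits": [{"comment": title,
                     "changes": [{"changeType": "add",
                                  "item": {"path": path},
                                  "newContent": {"content": code,
                                                 "contentType": "rawtext"}}
                                 for path, code in files.items()]}],
    }).raise_for_status()
    # 3) Open the pull request toward main, comments included
    pr = requests.post(f"{base}/pullrequests?{API}", auth=AUTH, json={
        "sourceRefName": f"refs/heads/{branch}",
        "targetRefName": "refs/heads/main",
        "title": title,
        "description": "Opened automatically by the agent."})
    pr.raise_for_status()
    return f"Created PR #{pr.json()['pullRequestId']}"
```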
Monitoring & content visualization
By this stage, time was running out. Isn't it said that visualization and colors are what matter most, and the actual content doesn't matter as much? I guess I made a rookie mistake here and left this part for last. I streamlined the process by using MLflow Tracing's autologging functionality from the beginning (which works really nicely, I highly recommend it) instead of manually coding intermediate steps. Regarding logging, I kept more detailed logs during development, and in the final stages, I only displayed the message content. As for response formatting and fine-tuning, well, time was running out.
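If you haven't tried it, enabling tracing really is a one-liner (this assumes the OpenAI client; other model flavors have their own autolog functions):

```python
# Enable MLflow Tracing for all OpenAI calls made by the agent.
import mlflow

mlflow.openai.autolog()  # every LLM call is traced automatically
```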
Show time!🍿
And now, it's time for a short video demo of how it looks in action. Unfortunately, I forgot to record the first demo run, which was even better: it included some errors that the agent was able to fix on its own. I was actually quite surprised at how much could be achieved with just a small amount of effort and tweaking. Of course, this is limited to the notebook format in Python and largely relies on the AI model in use, in this case GPT-4.1 mini from Azure. A/B testing of the models was left for another time, but it's easy to switch between models using parametrized values. I'm eagerly awaiting the availability of Anthropic Claude 3.7 Sonnet in Azure Databricks in West Europe. Personally, I just happen to like (Azure) OpenAI models a lot (this is not a sponsored comment, but I won't say no to free credits).
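For reference, that parametrization can be as simple as a notebook widget (a minimal sketch; the deployment name is a placeholder):

```python
# Model selection as a Databricks notebook widget: change the value,
# rerun, and every completions call uses the new deployment.
dbutils.widgets.text("model_name", "gpt-4.1-mini")
MODEL = dbutils.widgets.get("model_name")
```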
At this point, I can say that the challenge was successfully completed. The solution is not quite ready for a beauty pageant or conquering the world, but it has proven to work. Databricks Assistant is a great helper within the internal ecosystem, but you can't (yet) delegate entire data projects to it in the same way. With your own custom agents, however, it's possible to complement it, and some of these features may be added to the Assistant in the future.
This is just the beginning
But what's next? I see the future in "Agentic Vibe Coding." In reality, you don't need a human constantly prompting the AI model in frustration. Instead, you set up one agent to create the solution and another to monitor and provide feedback. This way, you externalize the frustration — wait, I mean the feedback and troubleshooting process. If your data is already in Databricks, it makes perfect sense to start building your army of agents there, keeping the architecture simple and manageable. Of course, AI can't solve the most complex problems, but especially in the data world, much of the work is trivial and repetitive — tasks that can and should be automated with agents.
Ikidata is a pioneer in Agentic Automation on Databricks, delivering deep insights into this emerging technology from technical, architectural, and business perspectives. We make it simple to turn your ideas into reality.

Written by Aarni Sillanpää
Keep the vibes rolling