How do you provide businesses with user-friendly tools that simplify their daily operations? This is a common challenge. Needs change quickly, and if development takes too long, the original requirement may no longer be relevant. Agile, fast development is therefore necessary to meet needs while they still matter.
Databricks Apps - from simplicity to efficiency
Databricks Apps is designed to empower data and AI teams to easily build and deploy applications within the Databricks Data Intelligence Platform, emphasizing speed, security, and simplicity. Developers can use popular frameworks like Dash, Gradio, and Streamlit, and benefit from serverless deployment, built-in governance, and pre-built Python templates. The platform supports seamless integration with IDEs such as Visual Studio Code and PyCharm, and provides robust security measures, including granular access controls, secure communication, and comprehensive data governance through Unity Catalog. This approach eliminates the need for additional infrastructure, allowing developers to focus on delivering impactful, production-ready solutions across various use cases, from natural language interfaces to AI-powered chatbots, all while ensuring data security and compliance.
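To make this concrete, the entry point of a Databricks App can be little more than a standard framework script. The following is a minimal sketch using Gradio; it is a hypothetical example, not one of the official templates, and a real app would wire the UI to actual workspace data.

```python
# app.py - a minimal Gradio app that could be deployed as a Databricks App.
# Hypothetical example; the official pre-built templates differ in structure.
import gradio as gr

def greet(name: str) -> str:
    """Return a simple greeting for the given name."""
    return f"Hello, {name}! Welcome to Databricks Apps."

# One text input, one text output - enough to have a working interface.
demo = gr.Interface(fn=greet, inputs="text", outputs="text",
                    title="Hello Databricks Apps")

if __name__ == "__main__":
    demo.launch()
```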
Databricks Apps - internal tools that are easy to find and use

Databricks Apps can be easily found and created in the UI on the Compute page, under the Apps section. Apps can also be managed via the REST API, enabling agile development and seamless integration into a robust CI/CD process for production. You can develop code locally on your own machine, keep it in Git repositories, and then deploy it to run in Databricks. This provides the freedom to develop flexibly: for example, you can build an initial implementation and visualizations on your local machine and then test the functionality against real data through the Databricks UI. This allows for rapid iteration and validation, making the development process efficient and effective.
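As a rough illustration of the REST-driven workflow, the sketch below lists apps and triggers a deployment of synced source code. It assumes the Apps endpoints live under `/api/2.0/apps` and that a token with sufficient permissions is available; the app name and workspace path are hypothetical.

```python
# Sketch: listing and deploying a Databricks App via the REST API.
# Assumptions: Apps endpoints under /api/2.0/apps, auth via a PAT;
# check your workspace's API docs for the exact contract.
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.azuredatabricks.net
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# List existing apps in the workspace.
apps = requests.get(f"{HOST}/api/2.0/apps", headers=HEADERS).json()
for app in apps.get("apps", []):
    print(app["name"], app.get("app_status", {}).get("state"))

# Trigger a deployment from source code synced to the workspace
# (hypothetical app name and path).
deployment = requests.post(
    f"{HOST}/api/2.0/apps/my-internal-tool/deployments",
    headers=HEADERS,
    json={"source_code_path": "/Workspace/Users/me@example.com/my-internal-tool"},
).json()
print("Deployment id:", deployment.get("deployment_id"))
```

This is what makes the local-first loop work in practice: sync or push your code from Git, deploy through the API from your CI/CD pipeline, and validate against real data in the workspace.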
Why is this such a big deal?
Previously, Databricks did not offer easy, user-friendly apps for non-technical users. While notebooks and workflows could be made interactive and parameterized, integrating them into business processes was often awkward. The functionality was there, but the user experience never reached the required level. As a result, adopting new tools and ways of working was sometimes challenging. Now, you can easily create secure interfaces with your own visualizations and color schemes. Naturally, the company's color themes are a crucial factor in adopting new tools 😉
On top of that, management is streamlined and data is natively accessible via Unity Catalog. Because apps run inside the Databricks Data Intelligence Platform, the architecture stays simple: data isn't accidentally copied to local machines, and user behavior can be logged. Instead of focusing on integrations between different systems, development effort can be directed toward building applications, such as Databricks Apps and internal tools, that serve the business by offering value and tools precisely when and where they are needed. This makes it easy to develop versatile user interfaces tailored to a wide range of business needs.
From words to action - Workflow Revival Engine
As everyone knows, failed data pipelines can be a significant source of frustration and gray hairs. To address this problem, I quickly developed the Workflow Revival Engine using Databricks Apps and Gradio. Here's how it works:
𝐂𝐨𝐧𝐟𝐢𝐠𝐮𝐫𝐚𝐭𝐢𝐨𝐧: The user can select the number of days to look back for failed workflows and choose the orchestration type to use, with support for both Databricks and Azure Data Factory orchestrations.
𝐅𝐞𝐭𝐜𝐡𝐢𝐧𝐠 𝐃𝐚𝐭𝐚: The app then retrieves all failed workflows based on these parameters. This list of workflows can be used as a task list for further actions.
𝐀𝐧𝐚𝐥𝐲𝐬𝐢𝐬 𝐚𝐧𝐝 𝐑𝐞𝐜𝐨𝐦𝐦𝐞𝐧𝐝𝐚𝐭𝐢𝐨𝐧: For each failed workflow, all necessary metadata is fetched (including affected delta tables). GenAI then automatically analyzes the situation and recommends whether a repair run can be executed or manual intervention is required, with an explanation for its recommendation.
𝐀𝐜𝐭𝐢𝐨𝐧: The user can choose to either ignore the failed workflow, activate a repair run, or start a manual fixing process with a single click. The manual fixing process includes automated messaging to Teams & Slack and the creation of a DevOps bug ticket. A simplified sketch of this fetch-analyze-repair loop follows below.
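The sketch below shows the core of this flow against the public Jobs API 2.1: fetch failed runs from a lookback window, ask GenAI for a recommendation, and trigger a repair run for the cases it approves. It is a simplification of the actual app, which adds the Gradio UI, Azure Data Factory support, Teams/Slack messaging, and DevOps tickets; `recommend_action` is a hypothetical placeholder for the GenAI step.

```python
# Simplified core of the Workflow Revival Engine (not the full app).
import os
import time
import requests

HOST = os.environ["DATABRICKS_HOST"]
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

def failed_runs(lookback_days: int) -> list[dict]:
    """Fetch completed job runs from the last N days that ended in FAILED."""
    since_ms = int((time.time() - lookback_days * 86400) * 1000)
    resp = requests.get(
        f"{HOST}/api/2.1/jobs/runs/list",
        headers=HEADERS,
        params={"completed_only": "true", "start_time_from": since_ms},
    ).json()
    return [r for r in resp.get("runs", [])
            if r.get("state", {}).get("result_state") == "FAILED"]

def recommend_action(run: dict) -> str:
    """Hypothetical GenAI step: feed run metadata (error messages, affected
    Delta tables) to an LLM and return 'repair' or 'manual'."""
    raise NotImplementedError("plug in your LLM of choice here")

def repair(run_id: int) -> dict:
    """Start a repair run that re-executes only the failed tasks."""
    return requests.post(
        f"{HOST}/api/2.1/jobs/runs/repair",
        headers=HEADERS,
        json={"run_id": run_id, "rerun_all_failed_tasks": True},
    ).json()

for run in failed_runs(lookback_days=3):
    if recommend_action(run) == "repair":
        print("Repairing", run["run_id"], repair(run["run_id"]))
```

In the actual app this loop sits behind the UI, so the human stays in control of the final click; the 'manual' branch fans out to the Teams/Slack notifications and the DevOps ticket instead of a repair call.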
This streamlined approach makes managing failed data pipelines more efficient and less stressful, allowing users to quickly address issues and maintain smooth data operations. With these convenient internal tools, the responsibility for maintaining processes can be delegated to the appropriate personnel, preventing developers from becoming bottlenecks.
I've also created a version where GenAI runs this process completely autonomously, but sometimes it's better to have a human in the loop for validation.
What does this enable?
In a nutshell, this facilitates a more efficient integration of the data world with business operations. Data platforms can more easily be harnessed to build automation and internal tools that directly support business activities, potentially allowing us to finally move away from the last remnants of Excel. Of course, this requires ongoing guidance and training for end-users, but with Databricks now providing an excellent technological foundation, it's all about implementation and execution from here on.

Written by Aarni Sillanpää
Data is just a cost if it can't be turned into business value