
Simplifying GenAI Architecture with Databricks Mosaic AI Gateway

Sep 29, 2024

5 min read

In this article, GenAI refers specifically to Large Language Models (LLMs).

GenAI has burst onto the scene and companies are racing to adopt it in hopes of gaining a competitive edge. We're past the initial test projects, and the next big step is making it part of the ecosystem. This is where solid and robust architecture becomes crucial. After all, you wouldn't build a house on quicksand, right?


Traditional architectural problems often include unnecessary complexity, siloed systems and poor integration between components. GenAI is no exception. It must be seamlessly integrated into the core data ecosystem. All extraneous elements should be cleaned up to maintain control from the perspectives of maintenance, costs, access right management, and security. Keeping it streamlined ensures that everything runs like a well-oiled machine. Therefore, the entire setup should be made as simple and clear as possible—ideally, all within a single platform.


One ecosystem to rule them all


I can confidently say that Databricks is the perfect choice for this role. It has evolved from a data platform into a comprehensive, full-scale data intelligence platform. Chances are, your data platform is already built on Databricks (and if it isn't, you might want to do something about it). Let's break down how it meets the basic requirements, one by one.


  1. Native access to data - Well, this was an easy task to tackle. In addition, with built-in Vector Search specifically designed for RAG needs, you can easily automate data updates. This makes the development process incredibly fast and effortless.
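To make the RAG flow above concrete, here is a minimal sketch in plain Python of the step that follows retrieval: folding chunks returned by a vector search into an LLM prompt. The function name and prompt template are my own illustration, not a Databricks API:

```python
def build_rag_prompt(question: str, chunks: list[str], max_chars: int = 4000) -> str:
    """Assemble a RAG prompt from retrieved document chunks.

    Chunks are concatenated in relevance order until the character
    budget is exhausted, then the user question is appended.
    """
    context_parts: list[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What is Mosaic AI Gateway?",
    ["Mosaic AI Gateway centralizes governance of model endpoints.",
     "Inference tables log all requests for monitoring."],
)
```

In a Databricks notebook, the `chunks` list would come from a Vector Search `similarity_search` call against an index that stays automatically in sync with your source tables.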


  2. Cost and usage transparency - Databricks provides all the metadata you need on costs and usage via system tables. The newest data is available within about 15 minutes, ready for monitoring or other purposes.
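As a sketch of what querying those system tables looks like, here's a cost-monitoring query over `system.billing.usage`. The table and column names follow the documented system-table schema, but treat the `MODEL_SERVING` filter value and the 7-day window as assumptions for this example:

```python
# Sketch of a cost-monitoring query over Databricks system tables.
# Column names follow the system.billing.usage schema; the
# MODEL_SERVING filter value is an assumption for this example.
USAGE_QUERY = """
SELECT
  usage_date,
  sku_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_date >= current_date() - INTERVAL 7 DAYS
GROUP BY usage_date, sku_name
ORDER BY usage_date DESC
"""

# In a Databricks notebook or job you would run it with:
# df = spark.sql(USAGE_QUERY)
```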


  3. Agile Development - You can easily build modularized solutions (I prefer using Python). With Mosaic AI Model serving, it's a breeze to switch between LLM models, making model comparison actually enjoyable. And let's not forget MLflow, which is natively integrated (well, MLflow is developed by Databricks).
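The model-comparison workflow mentioned above can be sketched as a tiny harness that sends one prompt to several serving endpoints. The endpoint names below are made-up examples, and the query function is injected so the same harness works with any client (e.g. MLflow deployments or the REST API); here it's demonstrated with a stub:

```python
def compare_models(prompt, endpoints, query_fn):
    """Send the same prompt to several serving endpoints and collect replies.

    query_fn(endpoint, prompt) is injected so the harness is client-agnostic:
    in Databricks it would call the serving endpoint, here it is a stub.
    """
    return {ep: query_fn(ep, prompt) for ep in endpoints}

# Demo with a stub client; endpoint names are illustrative only.
replies = compare_models(
    "Summarize our Q3 report.",
    ["llama-endpoint", "gpt-endpoint"],
    lambda ep, p: f"[{ep}] answer",
)
```

Because Mosaic AI Model Serving gives every model the same endpoint interface, swapping models really is just changing a string in that list.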


  4. Cost-effectiveness - Costs are generated solely based on usage, and you can easily track their accumulation through system tables. You have options like price-per-token or self-hosting with fixed hourly rates. The sizes of LLM models can also be optimized according to use cases, providing excellent cost optimization opportunities for users. It's like having a buffet where you only pay for what you eat, and you get to choose between a fancy restaurant or a cozy home-cooked meal!
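The buffet-vs-restaurant choice above boils down to simple arithmetic. Here's a minimal sketch comparing pay-per-token and fixed-hourly (provisioned) pricing; all prices and volumes are illustrative assumptions, not actual Databricks rates:

```python
def pay_per_token_cost(tokens: int, price_per_million: float) -> float:
    """Cost of a pay-per-token endpoint for a given token volume."""
    return tokens / 1_000_000 * price_per_million

def provisioned_cost(hours: float, hourly_rate: float) -> float:
    """Cost of a self-hosted (provisioned throughput) endpoint."""
    return hours * hourly_rate

# Example: 40M tokens/month at $0.50 per 1M tokens, vs a dedicated
# endpoint running 8h/day for 30 days at $2/hour (illustrative prices).
ppt = pay_per_token_cost(40_000_000, 0.50)   # 20.0
fixed = provisioned_cost(8 * 30, 2.0)        # 480.0
cheaper = "pay-per-token" if ppt < fixed else "provisioned"
```

At low or bursty volumes pay-per-token usually wins; once utilization is high and steady, the fixed-rate endpoint starts to pay off. Running this comparison per use case is exactly the optimization opportunity described above.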


  5. Security & IAM - Instead of using an API key, Mosaic AI offers user-level permissions. This makes it easy to monitor how users are interacting with LLM models and eliminates the risk of an API key leaking to outsiders. It's like having personalized VIP passes for each user, ensuring that only the right people get into the club.


  6. Scalability - Far too often, scalability is forgotten and then addressed too late. With Databricks, costs scale with usage, and the scalability is top-notch. You can set up your own dedicated model serving endpoints for your GenAI use cases and keep access controls absolutely mint.


Databricks Mosaic AI Gateway


How can you keep everything under control and easily manageable without any nasty surprises? Enter Databricks Mosaic AI Gateway—the ultimate solution for smooth sailing. Mosaic AI Gateway is a centralized service designed to streamline the usage and management of generative AI models within an organization. It enhances governance, monitoring, and production readiness for model serving endpoints, while also securing and managing AI traffic to democratize and accelerate AI adoption. By simplifying processes and adding essential security and governance layers, Mosaic AI Gateway ensures a smooth and secure AI integration experience.


Databricks GenAI architecture built on top of Mosaic AI Gateway
Simple is beautiful


Handling Control with Mosaic AI Gateway

Mosaic AI Gateway configurations

Mosaic AI Gateway can be configured on a model endpoint level. It offers you:


  1. The ability to monitor AI traffic almost in real time. Inference table log delivery is currently on a best-effort basis, with documentation indicating that logs should be available within one hour of a request. During my experimentation, I observed a delay of approximately 15 minutes, which I found to be reasonable.

  2. Rate limits! This is fantastic because you can set them at both personal and endpoint levels! No more stressing about endless loops or skyrocketing costs from unexpected heavy usage. Now you can finally kick back and relax! 😎

  3. Guardrails to secure traffic to any model API, enforcing safety policies and protecting sensitive information in real time. Both input and output guardrails are included, and they work great! Unfortunately, the built-in PII detection is optimized for U.S. use cases only, making it currently of little use here in Finland.

  4. Inference tables - these would deserve their own article. I can't emphasize enough how significant this addition is. Once activated, inference tables allow you to log all usage, easily monitor it through automation, and effortlessly track the performance of LLM models. This opens up substantial opportunities, but more on that later ;)
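Pulling the four capabilities above together, here is a sketch of what an AI Gateway configuration payload can look like. The field names mirror the AI Gateway REST API (`PUT /api/2.0/serving-endpoints/{name}/ai-gateway`), but treat the exact shape, and the catalog/schema names, as assumptions to verify against the current docs before use:

```python
# Sketch of an AI Gateway configuration payload combining usage
# tracking, inference tables, rate limits, and guardrails.
# Field names follow the AI Gateway REST API; catalog/schema names
# and limit values are illustrative assumptions.
gateway_config = {
    "usage_tracking_config": {"enabled": True},
    "inference_table_config": {
        "enabled": True,
        "catalog_name": "main",            # assumed catalog name
        "schema_name": "ai_gateway_logs",  # assumed schema name
    },
    "rate_limits": [
        {"calls": 100, "key": "user", "renewal_period": "minute"},
        {"calls": 1000, "key": "endpoint", "renewal_period": "minute"},
    ],
    "guardrails": {
        "input": {"safety": True},
        "output": {"safety": True},
    },
}
```

Because the configuration is just a payload like this, it is easy to template it once and roll identical rate limits and guardrails out to every endpoint programmatically.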


Metadata is the foundation of monitoring


One of the best things about Databricks is transparency and metadata! All the information is available, and it's now easy to create AI/BI dashboards (formerly Lakeview dashboards) for monitoring purposes. Another cool thing I found was that Genie (Databricks' AI assistant) is finally natively integrated into published dashboards 🎉 Below you can see a quickly created dashboard that monitors external LLM model usage in real time.

Databricks AI/BI Dashboard on LLM usage and cost behaviour monitoring
Monitoring AI/BI Dashboard

And We Still Haven't Discussed Automation!


In true Databricks fashion, you have multiple options for using the Mosaic AI Gateway. You can either manage it through the UI or take the programmatic route to build fully automated solutions. The latter is incredibly effortless and highly recommended, allowing you to centrally control best practices and updates — such as the essential guardrails that need regular updates. This ensures that best practices are consistently followed without any lapses.


Final Thoughts


Now that we've covered the basics, is Databricks Mosaic AI Gateway the missing piece for simplifying GenAI architecture? For Databricks, it certainly fills the gap effectively. Thanks to its support for external models, integrating third-party LLM models is a breeze.


No more distributing risky API keys for LLM models; you can now authorize access at an individual level to external models. This boosts security to the required level. Guardrails complement this by preventing users from misusing or using LLM models in undesirable ways, enhancing their reliability and functionality.


Cost management is easy with rate limits, and you can monitor expenses in real-time using system tables on the Databricks side. Calculating the costs of external LLM models is also straightforward based on tokens used. Inference tables are offered as an out-of-the-box solution which is a real game-changer. When combined with system tables, you can easily monitor usage at an individual level, costs, and the quality of LLM models—in short, you can oversee the entire process.
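To show what the individual-level monitoring described above amounts to, here is a minimal sketch that aggregates inference-table-style records into per-user cost. The record shape (`user`, `total_tokens`) is a simplified stand-in for the real inference table columns, and the price is illustrative:

```python
from collections import defaultdict

def cost_per_user(records, price_per_million: float):
    """Aggregate inference-table-style records into per-user cost.

    Each record is a dict with 'user' and 'total_tokens' keys, a
    simplified stand-in for rows logged to an inference table.
    """
    tokens = defaultdict(int)
    for rec in records:
        tokens[rec["user"]] += rec["total_tokens"]
    return {u: t / 1_000_000 * price_per_million for u, t in tokens.items()}

usage = [
    {"user": "alice", "total_tokens": 1_200_000},
    {"user": "bob", "total_tokens": 300_000},
    {"user": "alice", "total_tokens": 800_000},
]
costs = cost_per_user(usage, price_per_million=0.50)
# costs == {"alice": 1.0, "bob": 0.15}
```

In practice, the same aggregation would run as a scheduled query joining the inference tables with system tables, feeding the monitoring dashboard shown earlier.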


The GenAI architecture built on Databricks is incredibly efficient and reliable. The single-ecosystem approach ensures both robustness and ease of use. Instead of a black box, you get full transparency throughout the process, and most importantly, the ability to automate the entire workflow. Although we're just at the starting line, the overall setup is already highly impressive!




Written by Aarni Sillanpää

Testing was conducted over two mugs of coffee.


Follow Ikidata on LinkedIn
