vragovvolo/GenAI_Upperbound

GenAI UpperBound Workshop

Purpose of the Repo

GenAI UpperBound is a hands-on workshop designed to showcase how Databricks can be used to build Generative AI applications for business analytics. It uses a fictional bakery dataset, the UpperBound Bakehouse, to demonstrate three key capabilities on Databricks:

  • Natural language querying of data with Databricks AI/BI Genie,
  • Using AI Functions in SQL for tasks like text analysis and generation, and
  • Building an AI Agent (powered by a large language model) that can use tools to interact with data.

The workshop is aimed at first-time users (e.g. data analysts or data scientists new to generative AI) and workshop organizers who want an end-to-end example. By following this material, users will learn how to ask questions in plain English and have Genie translate them to SQL queries on the data, how to embed generative AI directly in SQL workflows, and how to create an intelligent agent that can augment its responses with actions like database lookups via tools. The end goal is to lower the barrier to data insights, showing how non-technical users can leverage AI on Databricks for faster, smarter business analytics.

Workshop Flow Overview

The workshop is organized into three main parts, each building on the previous:

  • Part 1: Natural Language SQL with Genie – Interact with the UpperBound Bakehouse data using Databricks’ AI/BI Genie (a natural language query interface).
  • Part 2: AI Functions Notebook – Dive into a notebook (02_ai_functions.ipynb) that demonstrates using built-in AI functions in Databricks SQL for generative tasks on the data.
  • Part 3: Agent Playground – Use the code in the agents_workshop directory to create and run an AI agent that can answer questions by calling tools (e.g. running SQL queries) with an LLM (Meta’s Llama 3.3 70B Instruct model) as its reasoning engine.

Each part is explained in detail below, including step-by-step usage and key concepts covered.

Part 1: Natural Language Queries with Databricks Genie

In the first part of the workshop, participants explore the UpperBound Bakehouse dataset through natural language questions using Databricks Genie. Genie is a conversational BI tool that allows users to interact with data as if they were chatting with an analyst. It automatically converts your question into SQL, runs it, and returns the results and query for transparency.

Step-by-Step (Genie NLQ on Databricks Express):

  • Set up a SQL Warehouse: Ensure you have a SQL Warehouse running in your Databricks workspace (Databricks Express users can create a free Serverless Starter warehouse). Genie needs an active SQL warehouse to execute queries. We recommend a serverless warehouse for quick startup and auto-scaling performance.
  • Load the Data: Make sure the UpperBound Bakehouse dataset is available as a table in your workspace. It comes pre-installed in Databricks Express workspaces.
  • Launch AI/BI Genie: In the Databricks UI, navigate to the SQL workspace and open the Genie chat interface. If a Genie space for the bakehouse data is not already configured, you can create one by selecting the UpperBound Bakehouse table(s) as the data source for your Genie chat space.
  • Ask Questions in Natural Language: Begin typing questions about the data. For example, “What were the total sales last week?” or “List the top 5 best-selling products by revenue.” Genie will interpret each question and translate it into an SQL query against the Bakehouse dataset. You will see the generated SQL and the results table.
  • Review and Refine: Examine Genie's results. You can ask follow-up questions or refine your query in conversation. Genie maintains context, so you could ask, “break that down by product category” right after a previous question, and it will adjust the SQL accordingly. If Genie is unsure or the question is ambiguous, it may ask clarifying questions. This step highlights how Genie uses your table metadata (columns, descriptions, etc.) to improve accuracy.
  • Insights and Discussion: This part of the workshop is interactive – try various queries to uncover insights in the Bakehouse data (e.g., peak sales days, monthly revenue trends, etc.). Participants see how Genie empowers non-SQL users to get answers from data simply by using natural language, with the platform handling the SQL generation and execution behind the scenes.
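The translation step described above can be made concrete with a small sketch. This is a hypothetical illustration only: the table and column names (`sales`, `product_name`, `revenue`) are assumptions for the example, not the actual Bakehouse schema, and the SQL shown is the kind of query Genie might generate, not its guaranteed output.

```python
# Hypothetical example: a plain-English question and the SQL that a
# natural-language-to-SQL layer like Genie might generate for it.
# Table and column names are assumptions, not the real Bakehouse schema.
question = "List the top 5 best-selling products by revenue."

generated_sql = (
    "SELECT product_name, SUM(revenue) AS total_revenue\n"
    "FROM sales\n"
    "GROUP BY product_name\n"
    "ORDER BY total_revenue DESC\n"
    "LIMIT 5"
)

print(question)
print(generated_sql)
```

Genie also shows the generated SQL alongside the results, so you can verify the translation rather than trusting it blindly.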

By the end of Part 1, users will understand the purpose of Genie and experience how natural language querying works in practice. They’ll also appreciate the importance of having well-defined data (with clear table names, column descriptions, and an active warehouse) for Genie to be effective.

Part 2: AI Functions Notebook (02_ai_functions.ipynb)

The second part of the workshop moves into a Databricks notebook environment to showcase AI Functions – built-in SQL functions that invoke generative AI models on your data. In the repository, the notebook 02_ai_functions.ipynb provides a guided walkthrough of several examples using these functions on the UpperBound Bakehouse dataset. Key topics and steps in this notebook include:

  • Introduction to AI Functions: The notebook starts by explaining what Databricks AI Functions are. In short, they are pre-packaged functions that let you perform tasks like text summarization, sentiment analysis, translation, and more, directly within SQL queries. This means you can apply AI to data in a familiar SQL workflow without needing external APIs or complex ML infrastructure.
  • Summarization and Generation (ai_gen / ai_summarize): One example in the notebook shows how to use ai_summarize or the general ai_gen function to generate a summary of text. If the Bakehouse dataset contains textual data (for instance, product descriptions or customer reviews), you can do something like: SELECT ai_summarize(review_text) AS summary FROM bakehouse_reviews;. This will call a state-of-the-art generative model to produce a concise summary of each review. The model is hosted by Databricks, so you don’t need to deploy anything – the function call routes the request to a powerful LLM in the background.
  • Sentiment Analysis (ai_analyze_sentiment): The notebook demonstrates using ai_analyze_sentiment on text data. For example, if you have customer feedback comments, a simple SQL query can classify each comment’s sentiment as positive, negative, or neutral using an AI model. This illustrates how quickly you can enrich your data with insights using one line of SQL.
  • Custom Q&A with ai_query: The ai_query function is a general-purpose AI function that lets you apply a chosen model to a custom prompt and data. In the workshop, you’ll see how ai_query can be used to answer a question by combining natural language prompting with a specific model. For instance, the notebook might show a query like: SELECT ai_query('databricks-meta-llama-3-3-70b-instruct', CONCAT('What was the total revenue in Q4 2024? ', CAST(sum(sales) as string)), ...) to illustrate how an LLM can be prompted within SQL to compute or explain results. This is a powerful concept: it blurs the line between SQL and natural language by letting you invoke the LLM directly from your query.
  • AI Function Performance and Usage Tips: Finally, the notebook provides notes on using AI Functions effectively. Since these functions run on managed models, they can handle batch processing (for example, summarizing thousands of rows) in a scalable way. The workshop may discuss how to adjust parameters like temperature or max_tokens in the function calls, and mention that using these functions requires an appropriate Databricks Runtime version (DBR 15.4 LTS or above) and Unity Catalog security (since the models are governed and accessed through the platform).

By running through 02_ai_functions.ipynb, participants get hands-on experience with embedding generative AI into their data pipelines. They see the immediacy of results – for example, applying a large 70B parameter model to summarize text or answer questions in a SQL query – without needing any model deployment of their own. This part of the workshop underscores how Databricks makes advanced AI accessible with simple functions.

Part 3: Agent Playground (Building an AI Agent with Tools)

The final part of the workshop is the Agent Playground, found in the agents_workshop directory of the repository. Here, participants will build and interact with a custom AI agent that can use external tools to answer user queries. This agent is powered by a large language model (the Meta Llama 3.3 70B Instruct model) and is enhanced with tools that allow it to perform actions like querying the database. This section ties together what was learned in Parts 1 and 2, demonstrating how an LLM can plan and execute complex tasks on Databricks. Key components and steps in the Agent Playground include:

  • Agent & LLM Setup: The workshop will show how to initialize an AI agent using an open-source foundation model. We use Databricks’ hosted Llama 70B Instruct model as the brain of the agent. This model is large and powerful, enabling the agent to understand complex questions and instructions. Databricks provides it as databricks-meta-llama-3-3-70b-instruct which we can call without manual deployment (similar to how AI Functions call it). The code will illustrate connecting to this model, either through the LangChain library or via the Databricks API, so the agent can generate and reason with high-quality responses.
  • Tool Creation: A highlight of this part is learning to create tools that the agent can use. In our case, a primary tool is a SQL query tool linked to the data provided in the repo. For example, using a framework like LangChain, we define a tool that, when invoked, executes a SQL query (or a Databricks notebook command) on the dataset and returns the results. The agent can use this tool to fetch actual data instead of relying solely on its trained knowledge. We might also define other simple tools, such as a calculator or a web search (if appropriate), but the core idea is to show at least one custom tool integration. By doing this, participants see how agents can extend beyond just text generation – they can retrieve real data or perform computations via tools.
  • Agent Interaction (Playground): Once the agent is set up, the workshop allows participants to interact with it in a conversational manner. Users can ask the agent questions such as, “Should the latest order be returned?”. The agent will decide how to answer – it might use the SQL tool to query the dataset. This showcases the power of agents with tools, where the LLM can dynamically fetch information and not just rely on static memory.
  • Experimentation: In the Playground, attendees are encouraged to ask various questions and even intentionally challenge the agent. For instance, asking a question that requires multiple steps will cause the agent to potentially use the tool, get data, and maybe use the tool again or do calculations, before responding. This lets users observe how the agent “thinks” (often the agent’s reasoning steps can be displayed) and how it handles different scenarios. The workshop might demonstrate how to adjust the agent if needed – for example, adding another tool or refining the prompt if the agent makes mistakes.
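The decide-call-respond cycle described above can be sketched without any framework. This is a minimal, framework-free illustration of the tool-calling pattern, not the workshop's actual agents_workshop code: a real agent lets the LLM choose the tool, whereas here a stub router stands in for that decision, and the "SQL tool" returns canned data instead of querying a warehouse.

```python
# Framework-free sketch of a tool-calling agent loop. Everything here is
# illustrative: a real agent would ask the LLM which tool to invoke, and
# the SQL tool would run against the Bakehouse data on a SQL warehouse.

def sql_tool(query: str) -> list:
    """Stand-in for a tool that runs SQL; returns canned rows."""
    return [("order_1042", "damaged on arrival", 12.5)]

TOOLS = {"run_sql": sql_tool}

def agent_answer(question: str) -> str:
    # Stub "planner": route data questions to the SQL tool. A real agent
    # delegates this routing decision to the LLM.
    if "order" in question.lower():
        rows = TOOLS["run_sql"]("SELECT * FROM orders ORDER BY ts DESC LIMIT 1")
        order_id, note, total = rows[0]
        return f"Latest order {order_id}: '{note}' (total ${total})."
    return "No tool needed for that question."

print(agent_answer("Should the latest order be returned?"))
```

The key observable behavior is the loop itself: the agent receives a question, recognizes it needs external data, calls a tool, and folds the tool's result into its answer.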

By the end of Part 3, participants will have built a functional AI agent that leverages both an LLM and the ability to act on data. This ties back to the earlier parts: the agent essentially combines natural language understanding (like Genie does) with actual data querying (like we did manually and with AI Functions). This illustrates the future of intelligent data assistants: they can converse naturally, but also reach out to databases, documents, or other APIs to get facts and figures before giving answers. For workshop organizers, this part is often the most engaging, as it feels like creating a simple “AI analyst bot”.
