Building an AI Agent for Natural Language to SQL Query Execution on Live Databases

Level: Intermediate Company/Institute: UW-Madison Room: {'en': 'B09'} Time: 2025-09-03T08:40:00+00:00

Abstract

This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, validates and executes them on live databases, and returns accurate responses. Participants will build a system that intelligently routes between a specialized SQL agent and a ReAct chat agent, implementing RAG for query similarity matching, comprehensive safety validation, and human-in-the-loop confirmation. By the end of this session, attendees will have created a powerful and extensible system they can adapt to their own data sources.

Prerequisites

- Python programming experience - Basic understanding of API interactions - Basic familiarity with SQL and database management - Laptop with [Git](https://git-scm.com/) and [UV](https://github.com/astral-sh/uv), and your preferred Python IDE (I recommend VSCode) installed. No prior experience with LLMs, RAG, or advanced NLP is required.

Description

Overview

Natural‑language interfaces unlock database insights for non‑technical users. This tutorial provides a practical implementation for building these systems reliably and effectively.

Participants will build an AI agent system that can:

Route intelligently between SQL generation and ReAct chat agent workflows
Ingest and understand database schemas with domain knowledge
Retrieve relevant context and similar query examples using RAG with vector similarity
Generate accurate SQL with validation and safety guardrails
Execute queries safely with human-in-the-loop approval
Present results in an understandable format
Track costs and monitor performance using LangSmith
Manage session-based memory and conversation context

We'll use the Kaggle dataset "Brazilian E-Commerce dataset by Olist" as our working example, demonstrating how to handle multiple tables across two schemas with complex relationships. This dataset will be hosted on an EC2 AWS instance for live interaction during the tutorial.

This tutorial addresses real-world database complexity with production-grade considerations. Participants will start from a repository with backbone code and implement the key components during the session. By the end, attendees will have a working system they can adapt to their own datasets.

Tools and Frameworks

This tutorial will leverage modern tools and frameworks for efficient development:

AI and Agent Frameworks:
- LangChain for agent components and LLM interactions
- LangGraph for agent orchestration and workflow management
- LangSmith for comprehensive cost tracking and monitoring
- OpenAI models with examples of alternatives

Database and Vector Store:
- SQLAlchemy for database interactions and schema retrieval
- PostgreSQL as the database engine for the live dataset
- PGVector for similarity-based query retrieval

Development:
- YAML for configuration management
- pyproject.toml for standardized project configuration
- UV reliable package management and Ruff for code formatting/linting

Speaker

Cainã Max Couto da Silva

Postdoctoral Data Scientist

I’m a data scientist and AI engineer with 10+ years of experience across academic research and industry, building GenAI and machine learning solutions for research labs, startups, and Fortune 500 companies. I’m also a passionate educator, contributing to data training programs as a professor and consultant, and an active open-source contributor and speaker at conferences like SciPy and PyData.

View Full Conference Program