Tutorial
This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, validates and executes them on live databases, and returns accurate responses. Participants will build a system that intelligently routes between a specialized SQL agent and a ReAct chat agent, implementing RAG for query similarity matching, comprehensive safety validation, and human-in-the-loop confirmation. By the end of this session, attendees will have created a powerful and extensible system they can adapt to their own data sources.
- Python programming experience - Basic understanding of API interactions - Basic familiarity with SQL and database management - Laptop with [Git](https://git-scm.com/) and [UV](https://github.com/astral-sh/uv), and your preferred Python IDE (I recommend VSCode) installed. No prior experience with LLMs, RAG, or advanced NLP is required.
Natural‑language interfaces unlock database insights for non‑technical users. This tutorial provides a practical implementation for building these systems reliably and effectively.
Participants will build an AI agent system that can:
We'll use the Kaggle dataset "Brazilian E-Commerce dataset by Olist" as our working example, demonstrating how to handle multiple tables across two schemas with complex relationships. This dataset will be hosted on an EC2 AWS instance for live interaction during the tutorial.
This tutorial addresses real-world database complexity with production-grade considerations. Participants will start from a repository with backbone code and implement the key components during the session. By the end, attendees will have a working system they can adapt to their own datasets.
This tutorial will leverage modern tools and frameworks for efficient development:
AI and Agent Frameworks:
- LangChain for agent components and LLM interactions
- LangGraph for agent orchestration and workflow management
- LangSmith for comprehensive cost tracking and monitoring
- OpenAI models with examples of alternatives
Database and Vector Store:
- SQLAlchemy for database interactions and schema retrieval
- PostgreSQL as the database engine for the live dataset
- PGVector for similarity-based query retrieval
Development:
- YAML for configuration management
- pyproject.toml
for standardized project configuration
- UV reliable package management and Ruff for code formatting/linting
Postdoctoral Data Scientist
I’m a data scientist and AI engineer with 10+ years of experience across academic research and industry, building GenAI and machine learning solutions for research labs, startups, and Fortune 500 companies. I’m also a passionate educator, contributing to data training programs as a professor and consultant, and an active open-source contributor and speaker at conferences like SciPy and PyData.