Tutorial
Most Python developers reach for Pandas or Polars when working with tabular data—but DuckDB offers a powerful alternative that’s more than just another DataFrame library. In this tutorial, you’ll learn how to use DuckDB as an in-process analytical database: building data pipelines, caching datasets, and running complex queries with SQL—all without leaving Python. We’ll cover common use cases like ETL, lightweight data orchestration, and interactive analytics workflows. You’ll leave with a solid mental model for using DuckDB effectively as the “SQLite for analytics.”
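To make that idea concrete before we start, here is a minimal sketch of DuckDB running SQL inside a Python process. The DataFrame contents are made up for illustration; the query works because DuckDB's replacement scans let SQL refer to a local pandas DataFrame by its variable name.

```python
import duckdb
import pandas as pd

# Toy data standing in for a real dataset (values are made up).
df = pd.DataFrame({"user": ["a", "b", "a"], "amount": [10, 20, 5]})

con = duckdb.connect()  # in-memory, in-process database: no server to run

# DuckDB scans the pandas DataFrame in place, referenced by variable name.
top_users = con.execute(
    "SELECT user, SUM(amount) AS total FROM df GROUP BY user ORDER BY total DESC"
).df()
print(top_users)
```

The query runs on DuckDB's columnar engine, and the result comes back as a regular pandas DataFrame, so it slots straight into existing code.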
Prerequisites: Basic SQL and Python skills
The goal of this tutorial is to help Python users understand and use DuckDB not just as a DataFrame interface, but as a fully featured analytics database embedded in their Python workflows. We'll highlight real-world patterns where DuckDB shines compared to traditional libraries, especially for medium-scale datasets that don’t justify a full data warehouse.
You’ll learn:
- When and why to reach for DuckDB instead of Pandas/Polars
- How DuckDB reads local files (CSV, Parquet, JSON) and external sources such as Postgres
- Using DuckDB to build lightweight, SQL-based data pipelines
- Techniques for caching intermediate data in-process
- How to analyze data from remote sources via HTTP or S3 (combined with caching in the sketch after this list)
- Tips for using DuckDB with Jupyter, dbt, or your favorite Python tools
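As a taste of the caching and remote-access bullets above, here is a hedged sketch of reading a remote Parquet file and caching it as a local table. The URL and table name are hypothetical; the httpfs extension, which ships with the standard DuckDB Python wheel, provides http(s):// and s3:// support.

```python
import duckdb

con = duckdb.connect("cache.duckdb")  # file-backed database acts as the cache
con.execute("INSTALL httpfs")         # one-time download of the extension
con.execute("LOAD httpfs")            # enable http(s):// and s3:// paths

# Pull the remote file once and materialize it locally (URL is hypothetical).
con.execute("""
    CREATE TABLE IF NOT EXISTS trips AS
    SELECT * FROM read_parquet('https://example.com/data/trips.parquet')
""")

# Subsequent queries hit the cached local table, not the network.
print(con.execute("SELECT count(*) FROM trips").fetchone())
```

Because the database lives in a single file, the cached table survives across sessions and can be handed to teammates like any other artifact.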
Target audience: Data engineers
I'm Mehdi, also known as mehdio, a data enthusiast with nearly a decade of experience in data engineering for companies of all sizes. I'm not your average data guy: I inject humor and fun into my work to make complex topics easier to digest. When I'm not actively contributing to the data community through my blog, YouTube, and social media, you can find me off-beat, marching to the beat of my own data drum. Recently, I joined MotherDuck as a developer advocate, where I bring my data engineering expertise to supercharge DuckDB.