Talk
Do you find yourself struggling with Pandas' limitations when handling **massive datasets** or **real-time data streams**? Discover **Polars**, the lightning-fast DataFrame library built in Rust. This talk presents two advanced features of this next-generation dataframe library: **lazy queries** and **streaming mode**. Lazy evaluation in Polars allows you to build complex data pipelines without the performance bottlenecks of eager execution. By deferring computation, Polars optimises your queries using techniques like **predicate and projection pushdown**, reducing unnecessary computations and memory overhead. This leads to significant performance improvements, particularly with datasets larger than your system’s physical memory. Polars' **`LazyFrame`s** form the foundation of the library’s streaming mode, enabling **efficient streaming pipelines**, real-time transformations, and seamless integration with various data sinks. This session will explore use cases and technical implementations of both lazy queries and streaming mode. We’ll also include **live-coding demonstrations** to introduce the tool, showcase best practices, and highlight common pitfalls. Attendees will walk away with practical knowledge of lazy queries and streaming mode, ready to apply these tools in their daily work as data engineers or data scientists.
Working experience and knowledge of Python. Good knowledge of at least one dataframe library (Pandas or Polars). Experience (and headaches) in data wrangling.
Do you find yourself struggling with Pandas' limitations when handling massive datasets or real-time data streams?
Discover Polars, the lightning-fast DataFrame library built in Rust. This talk presents two advanced features of this next-generation dataframe library: lazy queries and streaming mode.
Lazy evaluation in Polars allows you to build complex data pipelines without the performance bottlenecks of eager execution. By deferring computation, Polars optimises your queries using techniques like predicate and projection pushdown, reducing unnecessary computations and memory overhead. This leads to significant performance improvements, particularly with datasets larger than your system’s physical memory. For instance, if you need to filter a large CSV file based on certain criteria, Polars can push down the filter operation to the scan level, reading only the necessary rows from the file. This can drastically reduce the amount of data loaded into memory and speed up query execution.
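As a rough sketch of what this looks like in practice (the file name and column names below are invented for illustration), a lazy query might be written as:

```python
import polars as pl

# Build a lazy query: nothing is read or computed yet.
lazy_query = (
    pl.scan_csv("events.csv")           # lazy scan of a (hypothetical) large CSV
    .filter(pl.col("country") == "IT")  # predicate pushed down to the scan
    .select(["user_id", "amount"])      # projection: only these columns are read
    .group_by("user_id")
    .agg(pl.col("amount").sum().alias("total_amount"))
)

# Inspect the optimised query plan before running anything.
print(lazy_query.explain())

# Trigger execution; only now is the file actually scanned.
result = lazy_query.collect()
print(result.head())
```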
Polars' LazyFrames form the foundation of the library’s streaming mode, enabling efficient streaming pipelines, real-time transformations, and seamless integration with various data sinks. This means you can process data in chunks as it arrives, without having to load the entire dataset into memory. Imagine building a real-time analytics dashboard that processes data from a Kafka stream, applying aggregations and transformations on the fly, and updating the dashboard with the latest insights. Polars' streaming mode makes this possible with minimal effort.
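A minimal sketch of a streaming pipeline is shown below; the file name, columns, and aggregation are invented for illustration, and the exact way to enable the streaming engine can differ between Polars versions. Note also that Polars does not consume Kafka directly, so a real-time setup would typically land incoming messages to files or Parquet first:

```python
import polars as pl

# The same lazy API is used; only the execution engine changes.
pipeline = (
    pl.scan_csv("transactions.csv")  # hypothetical large input file
    .group_by("merchant_id")
    .agg(pl.col("amount").sum().alias("revenue"))
)

# Execute with the streaming engine: batches are processed in chunks,
# so the full dataset never has to fit in memory.
# (In older Polars releases this is collect(streaming=True); newer
# releases expose the streaming engine under a different flag.)
result = pipeline.collect(streaming=True)

# Alternatively, stream results straight to a sink without materialising
# the output in memory.
pipeline.sink_parquet("revenue_by_merchant.parquet")
```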
This session will explore use cases and technical implementations of both lazy queries and streaming mode. We’ll also include live-coding demonstrations to introduce the tool, showcase best practices, and highlight common pitfalls.
Attendees will walk away with practical knowledge of lazy queries and streaming mode, ready to apply these tools in their daily work as data engineers or data scientists.
Talk Outline
Introduction and motivation for lazy queries and streaming mode (5 mins)
Lazy queries: discussion, implementation and live coding (10 mins)
Streaming mode: implementation, main settings, most common issues, and live coding (10 mins)
Q&A (5 mins)
Head of AI
Emanuele is an engineer, researcher, and entrepreneur with a passion for artificial intelligence. He earned his PhD by exploring time series forecasting in the energy sector and spent time as a guest researcher at EPFL in Lausanne. Today, he is co-founder and Head of AI at xtream, a boutique company that applies cutting-edge technology to solve complex business challenges. Emanuele is also an adjunct professor of AI at the Catholic University of Milan. He has published eight papers in international journals and contributed to over 30 international conferences worldwide. His engagements include AMLD Lausanne, ODSC London, WeAreDevelopers Berlin, PyData Berlin, PyData Paris, PyCon Florence, the Swiss Python Summit in Zurich, and Codemotion Milan. Emanuele has been a guest lecturer at Italian, Swiss, and Polish universities.