PyData Berlin 2025 - Sessions

Browse all talks and tutorials

Community & Diversity Talk

Building a Thriving Tech Ecosystem: The Role of PyLadies in Fostering Growth and Inclusion

By: Gertrude Abagale
The global tech ecosystem continues to grow, yet challenges like limited mentorship, a lack of role models, and fragmented community support hinder progress, especially for underrepresented groups. Py...
Community & Diversity Talk

Not Just Code: Building Communities That Don’t Burn People Out

By: AISHAT MUIBUDEEN (Maya)
Open source runs on passion, but passion is not a renewable resource. This talk will explore the hidden emotional and social costs of contributing to open-source projects. From burnout to invisibility...
Computer Vision (incl. Generative AI CV) Talk

Lane detection in self-driving using only NumPy

By: Emma Saroyan
Are you a scientist or a developer looking to understand how to use NumPy to solve computer vision problems? NumPy is a Python package that provides the multidimensional array object which you can us...
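A minimal NumPy-only sketch of one early step in a classic lane-detection pipeline: convert an RGB frame to grayscale and flag strong left-to-right intensity changes as candidate lane-marking edges. The function names, the BT.601 luminance weights, and the threshold are illustrative assumptions, not the speaker's code.

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB array to (H, W) luminance using ITU-R BT.601 weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(float) @ weights

def horizontal_edges(gray: np.ndarray, threshold: float = 50.0) -> np.ndarray:
    """Boolean mask of pixels whose intensity differs strongly from their left neighbour."""
    # np.diff along axis 1 gives gray[:, j + 1] - gray[:, j]; pad one column on the left
    # so the mask has the same shape as the input image.
    grad = np.abs(np.diff(gray, axis=1))
    grad = np.pad(grad, ((0, 0), (1, 0)))
    return grad > threshold

# Synthetic 4x8 dark "road" with a bright vertical lane marking in columns 3-4.
frame = np.zeros((4, 8, 3), dtype=np.uint8)
frame[:, 3:5] = 255
mask = horizontal_edges(to_grayscale(frame))
print(np.nonzero(mask[0])[0])  # → [3 5]
```

A real pipeline would typically add smoothing, a region-of-interest mask, and line fitting on top of an edge mask like this one.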
Computer Vision (incl. Generative AI CV) Talk

Spot the difference: 🕵️ using foundation models to monitor for change with satellite imagery 🛰️

By: Ferdinand Schenck
Energy infrastructure is vulnerable to damage by erosion or third-party interference, which often takes the form of unsanctioned construction. In this talk, we discuss our experiences using deep learni...
Data Handling & Engineering Talk

Bye-Bye Query Spaghetti: Write Queries You'll Actually Understand Using Pipelined SQL Syntax

By: Tobias Lampert
Are your SQL queries becoming tangled webs that are difficult to decipher, debug, and maintain? This talk explores how to write shorter, more debuggable, and extensible SQL code using **Pipelined SQL*...
Data Handling & Engineering Tutorial

Deep Dive into the Synthetic Data SDK

By: Tobias Hann
In January, the Synthetic Data SDK was introduced, and it is quickly gaining traction as the emerging standard open-source library for creating privacy-preserving synthetic data. In this hands-on tutori...
Data Handling & Engineering Talk

Democratizing Experimentation: How GetYourGuide Built a Flexible and Scalable A/B Testing Platform

By: Konrad Richter
At GetYourGuide, we transformed experimentation from a centralized, closed system into a democratized, self-service platform accessible to all analysts, engineers, and product teams. In this talk, we'...
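The abstract does not say which statistics GetYourGuide's platform computes. As a generic illustration of what a self-service A/B testing tool typically automates, here is a two-sided, two-proportion z-test for conversion rates using only the standard library; the function name and inputs are hypothetical.

```python
import math

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: conversion rate A == conversion rate B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of equal rates.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A platform would usually wrap a test like this with guardrails such as pre-registered sample sizes and corrections for peeking or multiple metrics.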
Data Handling & Engineering Talk

Docling: Get your documents ready for gen AI

By: Michele Dolfi, Christoph Auer
Docling, an open-source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Earning close to 30,000 GitHub stars in less than one year and now part of...
Data Handling & Engineering Talk

Exploring Millions of High-dimensional Datapoints in the Browser for Early Drug Discovery

By: Tim Tenckhoff, Matthias Orlowski
The visual exploration of large, high-dimensional datasets presents significant challenges in data processing, transfer, and rendering for engineering in various industries. This talk will explore inn...
Data Handling & Engineering Talk (long)

Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

By: Orell Garten
Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-prem using Python and Duc...
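A minimal standard-library sketch of the pattern described: read newline-delimited records from a stream and write them to a local database in fixed-size batches rather than row by row. To stay self-contained, it uses `sqlite3` as a stand-in for DuckDB and an in-memory byte stream in place of a live TCP socket (a connected socket's `makefile("rb")` returns a similar line-iterable binary file object). The `sensor_id,value` record format, table name, and batch size are assumptions.

```python
import io
import sqlite3
from typing import BinaryIO, Iterator

def read_batches(stream: BinaryIO, batch_size: int) -> Iterator[list[tuple[str, float]]]:
    """Yield lists of parsed (sensor_id, value) records, at most batch_size per list."""
    batch: list[tuple[str, float]] = []
    for raw in stream:  # binary file objects iterate line by line
        line = raw.decode("utf-8").strip()
        if not line:
            continue  # skip blank keep-alive lines
        sensor_id, value = line.split(",")
        batch.append((sensor_id, float(value)))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch when the stream closes
        yield batch

def load(stream: BinaryIO, conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Insert all records from the stream, one transaction per batch; return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, value REAL)")
    total = 0
    for batch in read_batches(stream, batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany("INSERT INTO readings VALUES (?, ?)", batch)
        total += len(batch)
    return total

# In production the stream could be socket.create_connection((host, port)).makefile("rb").
fake_stream = io.BytesIO(b"s1,1.5\ns2,2.0\n\ns1,3.5\n")
conn = sqlite3.connect(":memory:")
print(load(fake_stream, conn, batch_size=2))  # → 3
```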
Data Handling & Engineering Talk

How We Automate Chaos: Agentic AI and Community Ops at PyCon DE & PyData

By: Alexander C.S. Hendorf
Using AI agents and automation, PyCon DE & PyData volunteers have transformed chaos into streamlined conference ops. From YAML files to LLM-powered assistants, they automate speaker logistics, FAQs, v...
Data Handling & Engineering Tutorial

More than DataFrames: Data Pipelines with the Swiss Army Knife DuckDB

By: Mehdi Ouazza
Most Python developers reach for Pandas or Polars when working with tabular data—but DuckDB offers a powerful alternative that’s more than just another DataFrame library. In this tutorial, you’ll lear...
Data Handling & Engineering Tutorial

See only what you are allowed to see: Fine-Grained Authorization

By: Maria Knorps
Managing who can see or do what with your data is a fundamental challenge, especially as applications and data grow in complexity. Traditional role-based systems often lack the granularity needed for ...
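One widely used fine-grained approach is relationship-based access control, popularised by Google's Zanzibar system, where permissions are derived from stored (object, relation, subject) tuples instead of coarse roles. The toy in-memory sketch below implements that idea with one inheritance rule (viewers of a folder can view documents in it). It is not the specific system or library covered in the tutorial, and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TupleStore:
    # Relationship tuples: (object, relation, subject), e.g. ("doc:plan", "viewer", "user:ana").
    tuples: set[tuple[str, str, str]] = field(default_factory=set)

    def write(self, obj: str, relation: str, subject: str) -> None:
        self.tuples.add((obj, relation, subject))

    def check(self, obj: str, relation: str, subject: str) -> bool:
        """Is `subject` related to `obj` via `relation`, directly or via a parent folder?"""
        if (obj, relation, subject) in self.tuples:
            return True
        # Inheritance rule: a relation on a parent folder applies to its children.
        # No cycle detection: assumes the folder hierarchy is a tree.
        for o, rel, parent in self.tuples:
            if o == obj and rel == "parent" and self.check(parent, relation, subject):
                return True
        return False

store = TupleStore()
store.write("folder:finance", "viewer", "user:ana")
store.write("doc:q3-report", "parent", "folder:finance")
store.write("doc:salaries", "viewer", "user:bob")

print(store.check("doc:q3-report", "viewer", "user:ana"))  # True, inherited from the folder
print(store.check("doc:salaries", "viewer", "user:ana"))   # False, no tuple grants it
```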
Data Handling & Engineering Talk

🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs

By: Vinayak Nair
I will share how our team built an end-to-end system to transform raw satellite imagery into analysis-ready datasets for use cases like vegetation monitoring, deforestation detection, and identifying ...
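For the vegetation-monitoring use case, a standard analysis-ready product is the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red). The NumPy sketch below computes it per pixel from two reflectance bands. The band values and the choice to return NaN where both bands are zero are assumptions; the abstract does not say which indices the speaker's system produces.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Per-pixel NDVI in [-1, 1]; NaN where NIR + Red == 0 (e.g. no-data pixels)."""
    nir = nir.astype(float)
    red = red.astype(float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero to avoid division warnings.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

nir = np.array([[0.6, 0.3], [0.0, 0.5]])
red = np.array([[0.2, 0.3], [0.0, 0.1]])
print(ndvi(nir, red))
```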
Education, Career & Life Tutorial

Probably Fun: Games to teach Machine Learning

By: Kristian Rother, Shreyaasri Prakash
In this tutorial, you will play several games that can be used to teach machine learning concepts. Each game can be played in big and small groups. Some involve hands-on material such as cards, some ...
Ethics & Privacy Talk

The EU AI Act: Unveiling Lesser-Known Aspects, Implementation Entities, and Exemptions

By: Adrin
The EU AI Act is already partly in effect, and it prohibits certain AI systems. After going through the basics, we cover some of the less talked about aspects of the Act, introducing entities involved i...
Ethics & Privacy Talk

What’s Really Going On in Your Model? A Python Guide to Explainable AI

By: Yashasvi Misra
As machine learning models become more complex, understanding why they make certain predictions is becoming just as important as the predictions themselves. Whether you're dealing with business stakeh...
Generative AI Talk

A quarter decade of learnings from scaling RAG to millions of users

By: Jakob Pörschmann
Drawing on experience at Google designing 50+ RAG applications rolled out to millions of users, this talk presents a practical RAG design blueprint. We'll dissect key decision points for building robu...
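Retrieval is the step every RAG design builds on. As a deliberately simplified illustration, not the blueprint from the talk, the sketch below ranks passages by cosine similarity of bag-of-words vectors and assembles the top matches into a prompt. Production systems typically replace word counts with learned embeddings and a vector index; the function names and prompt format here are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Lower-cased word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = vectorize(query)
    return sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are issued within 14 days of cancellation.",
    "Our office is closed on public holidays.",
    "Cancellation is free up to 24 hours before the tour.",
]
print(retrieve("How many days until refunds after cancellation?", docs, k=1))
```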
Generative AI Talk

Building Intelligent Systems with Agentic AI and Knowledge Graphs

By: Emily White, Vinay Babu
This talk explores how we're applying agentic AI, ontology, and GraphRAG to tackle some complex data integration challenges in the trucking and logistics industry. We'll detail the development of a sy...
Generative AI Tutorial

Building an AI Agent for Natural Language to SQL Query Execution on Live Databases

By: Cainã Max Couto da Silva
This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, executes them on live databases, and returns coheren...
Generative AI Talk (long)

From Manual to LLMs: Scaling Product Categorization

By: Giampaolo Casolla, Ansgar Grüne
How to use LLMs to categorize hundreds of thousands of products into 1,000 categories at scale? Learn about our journey from manual/rule-based methods, via fine-tuned semantic models, to a robust mult...
Generative AI Talk

Navigating healthcare scientific knowledge: building AI agents for accurate biomedical data retrieval

By: Laura
With a focus on healthcare applications where accuracy is non-negotiable, this talk highlights challenges and delivers practical insights on building AI agents which query complex biological and scien...
Generative AI Talk

One API to Rule Them All? LiteLLM in Production

By: Alina Dallmann
Using LiteLLM in a Real-World RAG System: What Worked and What Didn’t. LiteLLM provides a unified interface to work with multiple LLM providers, but how well does it hold up in practice? In this talk...
Infrastructure - Hardware & Cloud Talk

Edge of Intelligence: The State of AI in Browsers

By: Johannes Kolbe
API calls suck! Okay, not all of them. But building AI features that rely on third-party APIs can bring a lot of trouble. In this talk, you'll learn how to use web technologies to become more indepe...
Infrastructure - Hardware & Cloud Talk

Flying Beyond Keywords: Our Aviation Semantic Search Journey

By: Dat Tran, Dennis Schmidt
In aviation, search isn’t simple—people use abbreviations, slang, and technical terms that make exact matching tricky. We started with just Postgres, aiming for something that worked. Over time, we up...
Infrastructure - Hardware & Cloud Talk (long)

Template-based web app and deployment pipeline at an enterprise-ready level on Azure

By: Johannes Schöck
A practical deep-dive into Azure DevOps pipelines, the Azure CLI, and how to combine pipeline, Bicep, and Python templates to build a fully automated web app deployment system. Deploying a new proof o...
Natural Language Processing & Audio (incl. Generative AI NLP) Talk

Bridging Custom Schemas and Wikidata with an LLM-Assisted Interactive Python Tool

By: Sankalp Gilda, Ph.D.
Many projects build knowledge graphs with custom schemas but struggle to align them with standard hubs like Wikidata. Manual mapping is tedious and error-prone, while fully automated methods often lac...
Natural Language Processing & Audio (incl. Generative AI NLP) Talk

From Months to Minutes: Accelerating Compliance Reviews with GenAI

By: Elizaveta Zinovyeva
Transform time-consuming document compliance reviews into automated workflows with Generative AI. Through live demonstrations, learn how to build systems that extract policies from unstructured data, ...
PyData & Scientific Libraries Stack Tutorial

A Beginner's Guide to State Space Modeling

By: Alexandre Andorra, Jesse Grabowski
**State Space Models** (SSMs) are powerful tools for time series analysis, widely used in finance, economics, ecology, and engineering. They allow researchers to encode structural behavior into time s...
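For intuition, the simplest state space model is the local level model: a hidden level follows a random walk, x_t = x_{t-1} + w_t with w_t ~ N(0, q), and we observe y_t = x_t + v_t with v_t ~ N(0, r). The pure-Python Kalman filter below estimates the level recursively. It is a textbook sketch, not the Bayesian workflow the tutorial teaches; the noise variances, observations, and vague initial prior are assumptions.

```python
def local_level_filter(ys: list[float], q: float, r: float,
                       x0: float = 0.0, p0: float = 1e6) -> tuple[list[float], list[float]]:
    """Kalman filter for x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r).

    Returns the filtered mean and variance of the hidden level after each observation.
    """
    x, p = x0, p0  # a large p0 expresses a vague prior on the initial level
    means, variances = [], []
    for y in ys:
        # Predict: the random walk keeps the mean and adds process noise q.
        p = p + q
        # Update: blend prediction and observation using the Kalman gain k.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1 - k) * p
        means.append(x)
        variances.append(p)
    return means, variances

means, variances = local_level_filter([10.2, 9.8, 10.5, 10.1], q=0.01, r=0.25)
print([round(m, 2) for m in means])
```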
PyData & Scientific Libraries Stack Talk

Advanced Polars: Lazy Queries and Streaming Mode

By: Emanuele Fabbiani
Do you find yourself struggling with Pandas' limitations when handling **massive datasets** or **real-time data streams**? Discover **Polars**, the lightning-fast DataFrame library built in Rust. T...
PyData & Scientific Libraries Stack Talk

Building Reactive Data Apps with Shinylive and WebAssembly

By: Christoph Scheuch
WebAssembly is reshaping how Python applications can be delivered - allowing fully interactive apps that run directly in the browser, without a traditional backend server. In this talk, I’ll demonstra...
PyData & Scientific Libraries Stack Talk

Causal Inference in Network Structures: Lessons learned From Financial Services

By: Danial Senejohnny
*Causal inference techniques are crucial to understanding the impact of actions on outcomes.* *This talk shares lessons learned from applying these techniques in real-world scenarios where standard me...
PyData & Scientific Libraries Stack Talk

Consumer Choice Models with PyMC Marketing

By: Nathaniel Forde
Consumer choice models are an important part of product innovation and market strategy. In this talk we'll see how they can be used to learn about substitution goods and market shares in competitive m...
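A common building block for consumer choice is the multinomial logit model, in which each option's market share is the softmax of its utility. The pure-Python sketch below shows how removing a product redistributes its share. Under plain logit that substitution is strictly proportional (the independence of irrelevant alternatives property), a known limitation that richer choice models such as nested logit relax. The utilities are made-up illustrative numbers, and this is not the PyMC-Marketing API.

```python
import math

def logit_shares(utilities: dict[str, float]) -> dict[str, float]:
    """Multinomial logit: share_i = exp(u_i) / sum_j exp(u_j)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    expu = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(expu.values())
    return {k: v / total for k, v in expu.items()}

utilities = {"brand_a": 1.0, "brand_b": 0.5, "outside_option": 0.0}
before = logit_shares(utilities)
after = logit_shares({k: u for k, u in utilities.items() if k != "brand_a"})

# brand_a's share is split between the rest in proportion to their existing shares.
for k in after:
    print(f"{k}: {before[k]:.3f} -> {after[k]:.3f}")
```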
PyData & Scientific Libraries Stack Talk

Risk Budget Optimization for Causal Mix Models

By: Carlos Trujillo
Traditional budget planners chase the highest predicted return and hope for the best. Bayesian models take the opposite route: they quantify uncertainty first, then let us optimize budgets with that u...
PyData & Scientific Libraries Stack Talk

The Importance and Elegance of Polars Expressions

By: Jeroen Janssens
Polars is known for its speed, but its elegance comes from its use of expressions. In this talk, we’ll explore how Polars expressions work and why they are key to efficient and elegant data manipulati...
Visualisation & Jupyter Talk

Beyond Linear Funnels: Visualizing Conditional User Journeys with Python

By: Yaseen Esmaeelpour
Optimizing user funnels is a common task for data analysts and data scientists. In the real world, funnels are not always linear: often, the next step depends on earlier responses or actions. This resu...
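Conditional journeys are naturally represented as a graph of step-to-step transitions rather than one ordered list. As a minimal data-preparation sketch, assuming each user's journey is an ordered list of step names (the names below are invented), the code counts transitions between consecutive steps and converts them into parallel source/target/value lists, the shape a Sankey diagram such as Plotly's `go.Sankey` expects for its links.

```python
from collections import Counter

def transition_counts(journeys: list[list[str]]) -> Counter:
    """Count (from_step, to_step) pairs across all user journeys."""
    counts: Counter = Counter()
    for steps in journeys:
        counts.update(zip(steps, steps[1:]))
    return counts

def sankey_inputs(counts: Counter) -> tuple[list[str], list[int], list[int], list[int]]:
    """Convert transition counts to node labels plus parallel source/target/value lists."""
    labels = sorted({step for pair in counts for step in pair})
    index = {label: i for i, label in enumerate(labels)}
    pairs = sorted(counts)
    return (labels,
            [index[a] for a, _ in pairs],
            [index[b] for _, b in pairs],
            [counts[p] for p in pairs])

journeys = [
    ["landing", "quiz", "has_pet", "checkout"],
    ["landing", "quiz", "no_pet", "checkout"],
    ["landing", "quiz", "has_pet"],
]
counts = transition_counts(journeys)
print(counts[("quiz", "has_pet")], counts[("has_pet", "checkout")])  # → 2 1
```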
Visualisation & Jupyter Talk

Beyond the Black Box: Interpreting ML models with SHAP

By: Avik Basu
As machine learning models become more accurate and complex, explainability remains essential. Explainability helps not just with trust and transparency but also with generating actionable insights an...
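SHAP attributes a prediction to features using Shapley values from cooperative game theory. For a handful of features they can be computed exactly by averaging each feature's marginal contribution over all coalitions; the cost grows exponentially, which is why the `shap` library provides faster estimators. The pure-Python sketch below shows the exact definition on a tiny made-up model, filling in "absent" features with a baseline value, which is one simple choice among several.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(f: Sequence[float]) -> float:
    # Feature 2 is unused; features 0 and 1 interact.
    return 2 * f[0] + f[1] + f[0] * f[1]

phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The values sum to the model output minus the baseline output, and the unused feature receives zero.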
Visualisation & Jupyter Talk

Building an A/B Testing Framework with NiceGUI

By: Wessel van de Goor
NiceGUI is a Python-based web UI framework that enables developers to build interactive web applications without using JavaScript. In this talk, I’ll share how my team used NiceGUI to create an intern...
Visualisation & Jupyter Talk

Democratizing Digital Maps: How Protomaps Changes the Game

By: Veit Schiele
Digital mapping has long been dominated by commercial providers, creating barriers of cost, complexity, and privacy concerns. This talk introduces Protomaps, an open-source project that reimagines how...
Talk

Scaling Probabilistic Models with Variational Inference

By: Dr. Juan Orduz
This talk presents variational inference as a tool to scale probabilistic models. We describe practical examples with NumPyro and PyMC to demonstrate this method, going through the main concepts and d...
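To make the core idea concrete without NumPyro or PyMC: variational inference picks a simple family q(z) = N(mu, sigma^2) and maximises the evidence lower bound (ELBO), E_q[log p(z)] + H[q], instead of sampling the posterior. The sketch below uses a 1-D Gaussian target, chosen as an assumption for illustration because its ELBO and gradients have a closed form, so plain gradient ascent recovers the target's mean and standard deviation. Real models need stochastic ELBO estimates, which those libraries automate.

```python
import math

def fit_gaussian_vi(m: float, s: float, lr: float = 0.1, steps: int = 2000) -> tuple[float, float]:
    """Maximise the ELBO of q = N(mu, sigma^2) against an unnormalised N(m, s^2) target.

    For this target the ELBO is, up to a constant,
        -((mu - m)**2 + sigma**2) / (2 * s**2) + log(sigma),
    so its gradients are available in closed form.
    """
    mu, log_sigma = 0.0, 0.0  # optimise log(sigma) so sigma stays positive
    for _ in range(steps):
        sigma2 = math.exp(2 * log_sigma)
        grad_mu = -(mu - m) / s**2
        grad_log_sigma = 1.0 - sigma2 / s**2
        mu += lr * grad_mu
        log_sigma += lr * grad_log_sigma
    return mu, math.exp(log_sigma)

mu, sigma = fit_gaussian_vi(m=3.0, s=2.0)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")  # → mu = 3.000, sigma = 2.000
```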