Data Handling & Engineering

Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

Talk (long)

Level: Intermediate
Company/Institute: Orell Garten (Freelance)

Abstract

Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-premise with Python and DuckDB. In this talk, you'll learn how to turn raw TCP streams into structured datasets that are ready for analysis. We'll cover key patterns for batch processing, practical architecture examples, and real-world lessons from industrial projects. If you work with sensor data, logs, or telemetry, and you value simplicity, speed, and control, this talk is for you.

Prerequisites

Attendees should have a basic understanding of data engineering principles; no further specialized knowledge is required.

Description

Cloud-native tools are everywhere. But not every system can or should move to the cloud.

In industries such as manufacturing, logistics, and energy, TCP streams remain the backbone of real-time data exchange. These systems are often on-premise, resource-constrained, and mission-critical.

This talk shows how to build lean, powerful batch pipelines that ingest data from TCP streams using Python and DuckDB, all without the complexity of cloud services.
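As a rough illustration of the stream-to-batch idea, here is a minimal sketch using only the Python standard library. It is not the speaker's implementation. It assumes that the pipeline connects as a TCP client to the data source, and that the source sends one JSON object per line (newline-delimited JSON). Real industrial protocols often use binary framing instead. The function names `collect_batches` and `write_batch` and the `batch_*.jsonl` file layout are hypothetical. Each batch is written to a temporary file first and then renamed, so a downstream batch job never picks up a half-written file.

```python
import json
import socket
from pathlib import Path


def write_batch(out_dir: Path, batch_no: int, records: list) -> Path:
    """Hypothetical helper: write one batch as a .jsonl file, atomically."""
    path = out_dir / f"batch_{batch_no:06d}.jsonl"
    tmp = path.with_suffix(".tmp")
    tmp.write_text("".join(json.dumps(r) + "\n" for r in records), encoding="utf-8")
    # Rename only once the file is complete, so readers that glob for
    # batch_*.jsonl never see partial data.
    tmp.replace(path)
    return path


def collect_batches(host: str, port: int, out_dir: str, batch_size: int = 1000) -> list:
    """Hypothetical sketch: read newline-delimited JSON from a TCP stream and
    cut it into batch files of at most batch_size records each.

    Assumes the pipeline is the TCP client and the sender closes the
    connection when it is done (EOF ends the loop)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written, batch = [], []
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if not line:
                continue  # skip keep-alive / empty lines
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                written.append(write_batch(out, len(written), batch))
                batch = []
    if batch:
        # Flush the final partial batch after the stream ends.
        written.append(write_batch(out, len(written), batch))
    return written
```

Once batch files have landed, a separate batch job can query them directly with DuckDB, for example `SELECT sensor, avg(value) FROM read_json_auto('batches/batch_*.jsonl') GROUP BY sensor` (the `sensor` and `value` fields are illustrative). Decoupling ingestion from analysis this way means a slow or failed query does not cause data loss on the socket side.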

We'll cover:

  • Why TCP streams still matter
  • Stream vs. Batch: Choosing the right model for industrial data
  • Pipeline architecture: From streams to batch
  • DuckDB + Python: The perfect combo for lightweight analytics
  • Key pitfalls along the way
  • Limitations of this approach

You'll walk away with:

  • Ready-to-use patterns for TCP-based data pipelines
  • Insights on when to avoid unnecessary cloud complexity
  • Tips for building fast, reliable batch jobs on local infrastructure

Whether you process factory sensor data, machine logs, or legacy telemetry, this talk will give you modern tools to make your data streams actionable and efficient.

Speaker

Orell Garten
