Talk (long)
Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-prem using Python and DuckDB. In this talk, you'll learn how to turn raw TCP streams into structured datasets, ready for analysis, all running on-premise. We'll cover key patterns for batch processing, practical architecture examples, and real-world lessons from industrial projects. If you work with sensor data, logs, or telemetry, and you value simplicity, speed, and control, this talk is for you.
Attendees should have a basic understanding of data engineering principles, but no special knowledge is required.
Cloud-native tools are everywhere. But not every system can or should move to the cloud.
In many industries like manufacturing, logistics, or energy, TCP streams remain the backbone of real-time data exchange. These systems are often on-premise, resource-constrained, and mission-critical.
This talk shows how to build lean, powerful batch pipelines that ingest source data from TCP streams using Python and DuckDB, all without the complexity of cloud services.
We'll cover:
- How to turn raw TCP streams into structured datasets with Python and DuckDB
- Key patterns for batch processing on-premise
- Practical architecture examples
- Real-world lessons from industrial projects
You'll walk away with:
- A practical approach to building lean, on-premise batch pipelines
- Modern tools and patterns to make your data streams actionable and efficient
Whether you process factory sensor data, machine logs, or legacy telemetry, this talk will give you modern tools to make your data streams actionable and efficient.