Talk
The visual exploration of large, high-dimensional datasets presents significant engineering challenges in data processing, transfer, and rendering across industries. This talk will explore innovative approaches to harnessing massive datasets for early drug discovery, with a focus on interactive visualizations. We will demonstrate how our team at Bayer uses a modern tech stack to efficiently navigate and analyze millions of data points in a high-dimensional embedding space. Attendees will gain insights into overcoming performance challenges, optimizing data rendering, and developing user-friendly tools for effective data exploration. We aim to show how these technologies can transform the way we interact with complex datasets in engineering applications and ultimately allow us to find the needle in a multidimensional haystack.
Our talk is geared towards data engineers, visualization software developers, and scientific computing professionals. Experience level: intermediate to advanced. Attendees should be comfortable with Python and TypeScript and have a basic familiarity with data processing pipelines or browser-based visualization frameworks.
From initial screening to regulatory approval, developing new drugs can take over a decade. A major bottleneck is the early-stage identification of promising compounds, a process that increasingly relies on high-throughput image-based profiling and requires researchers to sift through vast oceans of potential molecular candidates. Analyzing these large-scale, high-dimensional datasets introduces challenges in data ingestion, transformation, and visualization. Overcoming those challenges has the potential to significantly accelerate the journey from discovery to delivery, thus providing life-saving treatments to patients faster.
In this talk, we share how our team at Bayer engineered a system to navigate millions of cell-level data points in the browser. Starting with raw microscopy images, we use computer vision and deep learning models to extract morphological features. These features are aggregated into “consensus profiles” that enable robust comparisons across treatment conditions and experimental batches.
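To make the aggregation step concrete, here is a minimal sketch of building consensus profiles from cell-level features. The column names, the toy data, and the choice of the per-feature median as the robust aggregate are illustrative assumptions, not the exact production pipeline:

```python
import pandas as pd

def consensus_profiles(cells: pd.DataFrame, group_cols, feature_cols) -> pd.DataFrame:
    """Collapse cell-level morphological features into one profile per
    group (e.g., per treatment condition) using the per-feature median,
    which is robust to outlier cells."""
    return cells.groupby(group_cols, as_index=False)[feature_cols].median()

# Toy data: two treatments, one morphological feature, one outlier cell in A
cells = pd.DataFrame({
    "treatment": ["A", "A", "A", "B", "B"],
    "feat_area": [1.0, 2.0, 100.0, 3.0, 5.0],
})
profiles = consensus_profiles(cells, ["treatment"], ["feat_area"])
# The outlier (100.0) barely affects treatment A's consensus value
```

The median is one common robust choice; in practice the aggregation may also involve batch correction or normalization before grouping.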
We’ll present how we automated and optimized what was previously a four-week manual workflow using a tech stack including:
• Apache Airflow for orchestrating parallel processing and ensuring reproducibility
• GraphQL combined with REST for a balance of flexibility and speed in serving data
• React and Next.js for building user interfaces that support real-time interaction with millions of records
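The core orchestration idea behind the Airflow setup is a fan-out/fan-in pattern: independent units of work (e.g., microscopy plates) are processed in parallel, then their results are gathered for aggregation. The sketch below illustrates that pattern with the standard library rather than Airflow itself; the plate names and `extract_features` body are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(plate: str) -> dict:
    # Placeholder for the per-plate image-processing step
    return {"plate": plate, "n_cells": 1000}

def run_pipeline(plates: list[str]) -> list[dict]:
    # Fan out: one task per plate runs concurrently;
    # fan in: results are collected in input order for aggregation.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(extract_features, plates))

results = run_pipeline(["plate_01", "plate_02", "plate_03"])
```

In Airflow, the same shape is typically expressed with dynamic task mapping, which also buys scheduling, retries, and reproducible run metadata on top of the raw parallelism.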
We’ll also showcase techniques for creating accessible and performant visualizations: scatter plots, dose-response curves, dendrograms, and similarity heatmaps. These visualizations were designed for scientists who are not software developers, so particular attention was paid to usability, accessibility, and performance.
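One family of techniques for keeping browser-side scatter plots responsive is server-side downsampling that preserves the overall shape of the point cloud. The grid-based sketch below is an illustrative assumption, not necessarily the exact method from the talk: it keeps at most one point per cell of a coarse grid, bounding the number of points shipped to the client:

```python
import numpy as np

def grid_downsample(x: np.ndarray, y: np.ndarray, bins: int = 128) -> np.ndarray:
    """Return indices of at most one point per grid cell, so the browser
    renders a bounded number of points while the plot keeps its shape."""
    xi = np.digitize(x, np.linspace(x.min(), x.max(), bins))
    yi = np.digitize(y, np.linspace(y.min(), y.max(), bins))
    # Encode each (xi, yi) cell as a single integer and keep the first
    # point that falls into each occupied cell.
    _, keep = np.unique(xi * (bins + 2) + yi, return_index=True)
    return keep

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)
idx = grid_downsample(x, y)  # far fewer than 100,000 indices
```

Density-aware variants (keeping more points in sparse regions) or progressive loading can be layered on top of the same idea.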
By presenting practical challenges and solutions, we will help attendees improve their approaches to data visualization and interaction in their own domains. We aim to convey how these technologies can transform the way we interact with complex datasets across a broad range of engineering applications, giving us more efficient methods to locate the needle in a multidimensional haystack.
Senior Software Development Consultant
Lead AI Engineer
As a Machine Learning Engineer at Bayer, Matthias Orlowski has contributed to various projects, focusing on natural language processing in pharmacovigilance and medical image processing in radiology and early drug discovery. Matthias studied in Konstanz, Nottingham (UK), Durham (North Carolina, USA), and Berlin, where he earned a PhD from Humboldt University in 2015. Prior to joining Bayer, Matthias gained diverse experience in multiple roles and organizations, tackling projects in consumer targeting, campaigning, and recommender systems.