Generative AI

Navigating healthcare scientific knowledge: building AI agents for accurate biomedical data retrieval

Talk

Level: Novice
Company/Institute: Owkin

Abstract

With a focus on healthcare applications, where accuracy is non-negotiable, this talk highlights the challenges of building AI agents that query complex biological and scientific data to answer sophisticated questions, and offers practical insights for overcoming them. Drawing on our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, with particular attention to preventing hallucinations and ensuring proper source attribution. This session is ideal for data scientists, ML engineers, and anyone interested in applying the Python and LLM ecosystem to the healthcare domain.

Prerequisites

Basic familiarity with Python and LLM concepts will be helpful but is not required.

Description

The growth of scientific healthcare literature and publicly available biomedical databases has created many opportunities but also great challenges for researchers. While large amounts of biological data are now freely available, finding and connecting relevant information across disparate sources remains time-consuming and complex. LLM-powered tools offer promising solutions to this challenge, but implementing them in healthcare, where accuracy can impact patient outcomes, requires specialised approaches and careful design considerations.
This talk will share practical lessons and technical strategies for addressing hallucinations, complex domain-specific terminology, and source citation.

The presentation will be structured into three main sections:

  1. The challenge of scientific data retrieval (5 mins)

    1. Overview of the current landscape of biological databases and scientific literature
    2. Common challenges researchers face when searching for information across multiple sources
    3. Specific constraints of the healthcare domain, where accuracy is critical
  2. Technical architecture for LLM-powered scientific search (15 mins)

    1. Reliable approaches to querying structured databases using natural language
    2. Vector database implementation for semantic search across scientific literature
    3. Strategies to ensure retrieved information is properly attributed to sources
    4. Real-world performance considerations: balancing accuracy, latency, and cost
  3. Lessons learned and future directions (5 mins)

    1. Performance metrics and user feedback from academic researchers
    2. Challenges and limitations of current approaches
    3. Future directions for AI-assisted scientific discovery
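
As a rough illustration of the retrieval and attribution ideas in section 2, the sketch below ranks passages against a question, keeps each passage tied to its source identifier, and returns nothing when no passage is similar enough, so that a caller can decline to answer instead of guessing. It is not the Owkin-K Navigator implementation: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory list stands in for a vector database, and the names `Passage`, `retrieve`, `answer_context`, the `doc-*` identifiers, and the `min_score` threshold are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical identifier for the originating document
    text: str


# Illustrative corpus; a real system would index literature and databases.
CORPUS = [
    Passage("doc-tp53", "TP53 is a tumor suppressor gene frequently mutated in human cancers."),
    Passage("doc-brca1", "BRCA1 is involved in DNA repair and is associated with hereditary breast cancer."),
    Passage("doc-egfr", "EGFR is a receptor tyrosine kinase targeted by several lung cancer therapies."),
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[Passage], k: int = 2, min_score: float = 0.2):
    """Return up to k (score, passage) pairs scoring at least min_score, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(p.text)), p) for p in corpus]
    kept = [(s, p) for s, p in scored if s >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)[:k]


def answer_context(query: str, corpus: list[Passage]):
    """Build an LLM context in which every passage carries its source tag.

    Returns None when nothing relevant is found, so the caller can decline
    to answer rather than let the model produce an unsupported claim.
    """
    hits = retrieve(query, corpus)
    if not hits:
        return None
    return "\n".join(f"[{p.source_id}] {p.text}" for _, p in hits)
```

Calling `answer_context("Which gene is associated with hereditary breast cancer?", CORPUS)` returns a context whose first line is tagged `[doc-brca1]`, while a question with no overlapping terms returns `None`. Keeping the source tag attached to each passage from retrieval through to the prompt is one simple way to make citations checkable downstream.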

Throughout the talk, I'll provide concrete examples of how these technologies are applied to real research questions in a production environment, demonstrating the practical value of AI agents in accelerating scientific discovery.

Intended audience: This talk is designed for data scientists, ML and software engineers, bioinformaticians, and researchers interested in leveraging AI for scientific data retrieval and analysis.
While examples will focus on biological data, the principles and techniques discussed are applicable across scientific domains. Basic familiarity with Python and AI concepts will be helpful but is not required.

Speaker

Laura

Senior Machine Learning Engineer

I have worked in the healthcare industry for more than 10 years and am currently a senior machine learning engineer at Owkin. Committed to open source and open science principles, I aspire to leverage Python and data science for social good, focusing on health, inclusion, and projects that make a meaningful difference in people's lives.
