A quarter decade of learnings from scaling RAG to millions of users

Level: Intermediate Company/Institute: Google Room: B05-B06 Time: 2025-09-02T11:40:00+00:00

Abstract

Drawing on experience at Google designing 50+ RAG applications rolled out to millions of users, this talk presents a practical RAG design blueprint. We'll dissect key decision points for building robust knowledge bases (data types, chunking), selecting effective retrieval strategies beyond basic vector search (including Knowledge Graphs and Text2SQL based on query types), and generating meaningful responses tailored to user needs. Attendees will learn reusable patterns and trade-offs essential for building production-ready, scalable RAG systems.

Prerequisites

https://weaviate.io/blog/introduction-to-rag#:~:text=The%20basic%20parts%20of%20a,leveraging%20valuable%20task%2Dspecific%20data.
https://www.deepset.ai/blog/the-beginners-guide-to-text-embeddings
https://hello-jp.net/building-beyond-the-buzz/graph-rag-a-conceptual-intro

Description

Target Audience & Prerequisites:
This talk is aimed at Data Scientists, Machine Learning Engineers, Software Engineers, and Architects who are building or planning to build applications integrating LLMs with proprietary or external knowledge bases. Attendees should have a basic understanding of Large Language Models and the core concept of RAG. Familiarity with different data storage paradigms (relational, graph, document, vector) is helpful but not essential.

Talk Outline & Content:
This session provides a structured blueprint for designing RAG systems, focusing on three critical stages:
Building the Knowledge Base:
Identifying and characterizing knowledge sources (internal/external, APIs).
Handling structured, unstructured, and semi-structured data.
Choosing appropriate chunking strategies (markdown, recursive, token-based, etc.) to preserve context for unstructured data.
Understanding the implications of data control (self-managed vs. external).
Retrieving the Right Content:
Matching Techniques to Queries:
Text Embeddings & Vector Search: Strengths for extractive queries on unstructured data.
Knowledge Graphs (KG): Benefits for aggregate queries and exploring relationships within unstructured data.
Text2SQL: Essential for analytical queries on structured data.
Hybrid Approaches: Combining methods for broader coverage.
Operational Considerations: Factors like query volume, latency requirements, data scale, and selecting appropriate storage/indexing solutions (using GCP options like Vertex AI Vector Search, AlloyDB, BigQuery as concrete examples of trade-offs, but keeping principles general). Integrating external sources like search APIs.
Generating Meaningful Responses: Retrieval is only half the battle. We'll discuss:
Tailoring generation to the user interface (Q&A, conversational, GUI).
Handling query complexity: Strategies for breaking down complex questions (e.g., "Deep Research" concept) before summarization.
Integrating RAG into wider systems or agentic applications.
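
The "Matching Techniques to Queries" stage in the outline can be pictured as a small routing step in front of the retrievers. The following is only a minimal sketch under stated assumptions, not the approach presented in the talk: the cue-word lists, the `route_query` function, and the `targets_structured_data` flag are all hypothetical, and a production system would more plausibly classify queries with an LLM or a trained classifier.

```python
from enum import Enum


class Retriever(Enum):
    VECTOR_SEARCH = "vector_search"      # extractive queries on unstructured data
    KNOWLEDGE_GRAPH = "knowledge_graph"  # aggregate and relationship queries
    TEXT2SQL = "text2sql"                # analytical queries on structured data


# Hypothetical cue words, chosen only for illustration.
ANALYTICAL_CUES = ("average", "sum of", "total", "how many", "per month")
RELATIONSHIP_CUES = ("related to", "connected to", "across all", "relationship between")


def route_query(query: str, targets_structured_data: bool) -> Retriever:
    """Pick a retrieval technique from a crude keyword-based guess at the query type."""
    q = query.lower()
    # Analytical questions over structured data go to Text2SQL.
    if targets_structured_data and any(cue in q for cue in ANALYTICAL_CUES):
        return Retriever.TEXT2SQL
    # Aggregate or relationship-style questions go to the knowledge graph.
    if any(cue in q for cue in RELATIONSHIP_CUES):
        return Retriever.KNOWLEDGE_GRAPH
    # Everything else is treated as an extractive lookup via vector search.
    return Retriever.VECTOR_SEARCH
```

A hybrid setup, as listed in the outline, could instead return several retrievers per query and merge their results; the single-choice router here is just the simplest form of the idea.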

Takeaway:
Attendees will leave with a practical, step-by-step blueprint for designing RAG systems. They will understand the critical design decisions at each stage, the trade-offs between different retrieval techniques (beyond just vectors), and how to tailor their architecture to specific data, query types, and scalability needs. The focus is on actionable patterns learned from real-world, large-scale deployments.

Speaker

Jakob Pörschmann

AI Customer Engineer

I'm a technologist, ML enthusiast, and indie hacker from Berlin. Currently I build generative AI stuff at Google.
