From Manual to LLMs: Scaling Product Categorization

Level: Intermediate
Company/Institute: GetYourGuide
Room: B05-B06
Time: 2025-09-02T14:00:00+00:00

Abstract

How can LLMs be used to categorize hundreds of thousands of products into 1,000 categories at scale? Learn about our journey from manual/rule-based methods, via fine-tuned semantic models, to a robust multi-step process that uses embeddings and LLMs via the OpenAI APIs. This talk offers data scientists and AI practitioners learnings and best practices for putting such a complex LLM-based system into production. This includes prompt development, balancing cost vs. accuracy via model selection, testing multi-case vs. single-case prompts, and saving costs by using the OpenAI Batch API and a smart early-stopping approach. We also describe our automation and monitoring in a PySpark environment.

Prerequisites

Attendees should have a foundational understanding of machine learning concepts and familiarity with the Python data science stack. Exposure to vector embeddings or Large Language Model (LLM) APIs is helpful but not mandatory.

Description

Target Audience: Data scientists, AI/ML engineers, and practitioners interested in applying large language models (LLMs) / generative AI to solve real-world classification problems at scale.

Takeaway: Attendees will gain practical insights and learn best practices for building, debugging, scaling, and productionizing a complex, multi-step generative AI system for large-scale product categorization. They will understand the evolution from traditional methods to LLMs, learn specific techniques for prompt engineering, batch processing, cost optimization with models like OpenAI's, and see the tangible business impact of such a system.

Detailed Outline:

This talk chronicles our journey tackling a challenging problem: accurately categorizing hundreds of thousands of diverse products into a fine-grained taxonomy of over 1,000 categories. We'll share our evolution from initial manual and rule-based systems to a sophisticated, production-ready Generative AI pipeline.

Part 1: The Challenge & Initial Approaches (10 minutes)
- Introduction to the business need for accurate product categorization at scale.
- Overview of the limitations encountered with traditional methods:
  - Manual Curation: Slow, expensive, inconsistent, and impossible to scale.
  - Rule-Based Systems: Brittle, hard to maintain, and unable to handle nuances or new product types.
  - Fine-tuned Semantic Models: An improvement, but struggled with zero-shot generalization to new categories and required significant labeled data and retraining.

Part 2: Entering the GenAI Era - Iterations & Lessons Learned (10 minutes)
- Our initial exploration using LLMs for categorization, what worked, and what failed.
- Developing the Prompt: We'll dive deep into the iterative process of prompt engineering for this complex multi-label, hierarchical classification task. We'll show examples of early prompts, their failure modes (e.g., inconsistent output format, hallucinated categories, difficulty handling multiple classification signals), and the refinements that led to more reliable results. We will discuss techniques for achieving structured output (e.g., JSON) from the LLM.
- Early Scaling Issues: Discussing the pitfalls of naive API usage, latency problems, and prohibitive costs when dealing with large volumes.
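One of the failure modes named above, hallucinated categories combined with inconsistent output format, can be guarded against with a small validation step after each LLM call. The following is a minimal sketch, not the speakers' implementation: the function name `parse_category_response` and the `"categories"` JSON key are hypothetical, chosen only for illustration.

```python
import json


def parse_category_response(raw_response: str, allowed_categories: set[str]) -> list[str]:
    """Parse an LLM reply and keep only labels that exist in the taxonomy.

    Assumption: the prompt asks the model to answer with a JSON object of
    the form {"categories": ["...", "..."]}. The key name is hypothetical.
    """
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        # Malformed output: the caller can retry or fall back to another method.
        return []

    if not isinstance(payload, dict):
        return []

    predicted = payload.get("categories", [])
    if not isinstance(predicted, list):
        return []

    valid: list[str] = []
    seen: set[str] = set()
    for label in predicted:
        # Drop hallucinated labels (not in the taxonomy) and duplicates,
        # while preserving the order the model returned.
        if isinstance(label, str) and label in allowed_categories and label not in seen:
            seen.add(label)
            valid.append(label)
    return valid
```

Returning an empty list for malformed replies keeps the retry or fallback decision with the caller, which is one possible design choice among several.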

Part 3: Building a Robust, Scalable GenAI Pipeline (10 minutes)
- The Hybrid Approach: Detailing our successful multi-step architecture that combines the strengths of semantic embeddings for efficient candidate retrieval/filtering and LLMs (specifically leveraging OpenAI models) for nuanced final categorization.
- Productionization Strategies:
  - Batching: Implementing efficient batch processing using asynchronous requests and the OpenAI Batch API to drastically reduce latency and cost.
  - Cost vs. Accuracy: Strategies for selecting the right model based on complexity and cost constraints.
  - Semantic Similarity & Early Stopping: Using vector similarity to intelligently prune the search space, avoiding the need to evaluate every product against all 1,000+ categories with the LLM, thus significantly optimizing cost and throughput.
  - Automation & Monitoring: How we process updates of categories and products automatically in PySpark and monitor that the live system works as expected.
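The hybrid idea above (embeddings shortlist candidate categories, the LLM only judges the shortlist) can be illustrated with a small sketch. It is not the pipeline described in the talk: the function names, the `top_k` and `min_similarity` parameters, and the specific stopping rule (stop once `max_labels` categories are accepted) are assumptions, and `llm_accepts` is a hypothetical stand-in for a real LLM call.

```python
import math
from typing import Callable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist_categories(
    product_vec: list[float],
    category_vecs: dict[str, list[float]],
    top_k: int = 3,
    min_similarity: float = 0.5,
) -> list[str]:
    """Return up to top_k category names, most similar first, above a threshold."""
    scored = sorted(
        ((cosine_similarity(product_vec, vec), name) for name, vec in category_vecs.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [name for score, name in scored[:top_k] if score >= min_similarity]


def categorize(
    product_vec: list[float],
    category_vecs: dict[str, list[float]],
    llm_accepts: Callable[[str], bool],
    top_k: int = 3,
    min_similarity: float = 0.5,
    max_labels: int = 1,
) -> tuple[list[str], int]:
    """Ask the (stand-in) LLM about shortlisted categories only.

    Returns the accepted categories and the number of LLM calls made.
    """
    accepted: list[str] = []
    llm_calls = 0
    for name in shortlist_categories(product_vec, category_vecs, top_k, min_similarity):
        llm_calls += 1
        if llm_accepts(name):
            accepted.append(name)
            if len(accepted) >= max_labels:
                break  # assumed early-stopping rule: enough labels found
    return accepted, llm_calls
```

In this sketch the number of LLM calls per product is bounded by `top_k` instead of the size of the taxonomy, which is where the cost saving would come from. A production system would typically replace the linear scan with a vector index.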

Part 4: Measuring Impact & Looking Ahead (10 minutes)
- Presenting the results: Showcasing the significant improvements in categorization accuracy and coverage compared to previous methods.
- Illustrative examples of challenging products correctly categorized by the GenAI system.
- Discussing the tangible business value derived, as measured in A/B tests.
- Briefly touching upon ongoing work and future directions.

This presentation will focus on the practical application and engineering challenges, sharing reusable techniques and hard-won lessons applicable to anyone looking to leverage the power of generative AI for large-scale, real-world problems. We aim to provide a transparent account of not just the successes, but also the crucial learnings from failures encountered along the way.

Speakers

Giampaolo Casolla

Senior Data Scientist

Giampaolo Casolla is a Senior Data Scientist at GetYourGuide, leveraging advanced machine learning and Generative AI to solve complex travel industry challenges. With expertise spanning areas like Safety, Risk, and Security, and strong skills in stats, Python, R, and cloud tech, he brings a diverse background to the role. Prior to GetYourGuide, Giampaolo developed award-winning ML solutions at Amazon and has a background in research with publications and conference presentations. At GetYourGuide, he's focused on integrating LLMs and GenAI into data products to drive innovation in travel technology.

Ansgar Grüne

Senior Data Scientist

Ansgar Grüne is a Senior Data Scientist at GetYourGuide in Berlin. His work focuses on ML/AI approaches to improve the users' search and discovery experience on the platform. He holds a Ph.D. in Theoretical Computer Science and has 10 years of experience as a Data Scientist in the travel industry, following several years as a software engineer.
