Generative AI

One API to Rule Them All? LiteLLM in Production

Talk

Session Card

Level: Intermediate
Company/Institute: scieneers

Abstract

Using LiteLLM in a Real-World RAG System: What Worked and What Didn't

LiteLLM provides a unified interface for working with multiple LLM providers, but how well does it hold up in practice? In this talk, I'll share how we used LiteLLM in a production system to simplify model access and handle token budgets. I'll outline the benefits, the hidden trade-offs, and the situations where the abstraction helped or got in the way. This is a practical, developer-focused session on integrating LiteLLM into real workflows, including lessons learned around deployment, limitations, and decision points. If you're considering LiteLLM, this talk offers a grounded look at using it beyond simple prototypes.

Prerequisites

- Audience should be familiar with basic concepts of LLM APIs (e.g., OpenAI, Anthropic)
- No deep knowledge of RAG systems required

Description

Building a real-world LLM system often means juggling different providers, endpoints, and API quirks. LiteLLM promises a unified interface across model backends—but how well does it hold up in production?
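
As a concrete taste of that unified interface, here is a minimal sketch (mine, not the speaker's code); the model names are placeholder examples, and provider API keys are assumed to be set as environment variables:

```python
# Minimal sketch of LiteLLM's unified interface: the same call shape
# works across providers, selected via the model string.
from litellm import completion

messages = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

# OpenAI-hosted model (assumes OPENAI_API_KEY is set).
openai_response = completion(model="gpt-4o-mini", messages=messages)

# Anthropic-hosted model (assumes ANTHROPIC_API_KEY is set); only the
# model string changes, the rest of the call stays the same.
anthropic_response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=messages,
)

# Responses follow the OpenAI-style shape regardless of provider.
print(openai_response.choices[0].message.content)
print(anthropic_response.choices[0].message.content)
```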

In this talk, I'll share how we integrated LiteLLM into a real-world system that includes budget usage tracking and other production concerns. From provider switching to budget handling, I'll walk through the benefits we saw and the challenges we hit. I'll also touch on the limits of abstraction and what to consider when balancing flexibility with simplicity. You'll get a practical look at where LiteLLM helped us reduce complexity, and where it introduced trade-offs we didn't expect.
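
To make the routing and budget points concrete, here is a hedged sketch of how such concerns are often wired up with LiteLLM's `Router` (for fallbacks across deployments) and its `completion_cost` helper (for per-call cost estimates); the deployment aliases and model names below are illustrative assumptions, not the production setup described in the talk:

```python
# Hedged sketch: provider fallback via litellm.Router plus per-call
# cost estimation via litellm.completion_cost. Names are illustrative.
from litellm import Router, completion_cost

router = Router(
    model_list=[
        {
            # Primary deployment behind the alias "chat-model".
            "model_name": "chat-model",
            "litellm_params": {"model": "gpt-4o-mini"},
        },
        {
            # Fallback deployment behind its own alias.
            "model_name": "chat-model-fallback",
            "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20240620"},
        },
    ],
    # If "chat-model" fails, retry the request on the fallback alias.
    fallbacks=[{"chat-model": ["chat-model-fallback"]}],
)

response = router.completion(
    model="chat-model",
    messages=[{"role": "user", "content": "Hello!"}],
)

# completion_cost estimates the USD cost from the response's token usage;
# a real budget tracker would accumulate this per user or per project.
spent = completion_cost(completion_response=response)
print(f"Estimated cost: ${spent:.6f}")
```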

Key Takeaways
- Understand how LiteLLM can be used to unify access to multiple LLM providers
- Learn how it fits into a real production pipeline (especially routing and budget management)
- Discover trade-offs related to abstraction, debugging, and control
- Get inspiration for how to evaluate abstraction layers in your own LLM projects

Target Audience
- Developers and engineers working with LLMs in production
- Anyone curious about LiteLLM’s strengths and limitations in a real system

Speaker

Alina Dallmann

Alina Dallmann is a computer scientist currently working as a Data Scientist at scieneers GmbH. Her enthusiasm for classical software development and data-driven work has recently come together in various projects focused on building retrieval-augmented generation (RAG) systems.
