Genie: How Uber Organized Data for AI That Works
At Uber, teams like Michelangelo maintained Slack channels for internal support. The volume was enormous: 45,000 questions per month. Users waited a long time for answers, information was fragmented across wikis, internal Stack Overflow, and docs, and the same questions were asked repeatedly.
The solution? Genie, a Gen AI on-call copilot that transformed internal support.
The Architectural Decision: RAG vs Fine-tuning
Uber chose RAG (Retrieval-Augmented Generation) over fine-tuning for practical reasons:
Why not fine-tuning:
- Requires high-quality curated data
- Needs diverse examples for the LLM to learn
- Requires computational resources to update
- Longer time-to-market
Why RAG:
- Does not require diverse examples to start
- Easy to update with new data
- Reduced time-to-market
- Responds based on real documentation
The challenges to solve with RAG included hallucinations, data security, and user experience.
Architecture: From Data to Response
Genie’s data flow can be generalized as a RAG application using Apache Spark.
Data Ingestion
- Sources: Internal wiki (Engwiki), internal Stack Overflow, requirements documents
- Processing: Apache Spark for ETL at scale
- Embeddings: OpenAI embedding model
- Storage: Terrablob (blob storage) + Sia (internal vector DB)
Serving (Response)
- Input: User question on Slack
- Knowledge Service: Converts question to embedding, searches relevant chunks
- LLM: Generates response using retrieved context
- Output: Response with source URL + action buttons
ETL Pipeline with Apache Spark
The ingestion pipeline has 4 stages:
1. Data Prep: Fetches content from sources via APIs. Output: DataFrame with URL and content.
2. Embeddings Creation: Chunking with LangChain + embedding generation via OpenAI using PySpark UDFs.
3. Vector Pusher: Pushes embeddings to Terrablob; Spark jobs build and merge the index.
4. Vector DB Sync: Each vector DB leaf node syncs daily, downloading the base index from Terrablob.
Why Spark? Distributed processing handles the high data volume, UDFs make it straightforward to call the OpenAI API, pipeline orchestration is simpler, and it integrates natively with blob storage.
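To make the pipeline shape concrete, here is a minimal PySpark sketch of stages 2 and 3, assuming a crawled DataFrame of (url, content) rows. The chunk sizes, embedding model, and output path are illustrative assumptions, not Uber's actual configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, FloatType, StringType

spark = SparkSession.builder.appName("genie-style-ingestion").getOrCreate()

# Stage 1 output: one row per source document.
docs = spark.createDataFrame(
    [("https://engwiki.example/spark-oom", "How to debug Spark OOM errors ...")],
    ["url", "content"],
)

# Stage 2a: chunking inside a UDF so it scales across the cluster.
@F.udf(ArrayType(StringType()))
def chunk_text(content):
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return splitter.split_text(content or "")

# Stage 2b: embedding each chunk via the OpenAI API inside a UDF.
@F.udf(ArrayType(FloatType()))
def embed_chunk(chunk):
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.embeddings.create(model="text-embedding-ada-002", input=chunk)
    return resp.data[0].embedding

chunks = (
    docs.withColumn("chunk", F.explode(chunk_text("content")))
        .withColumn("embedding", embed_chunk("chunk"))
        .select("url", "chunk", "embedding")
)

# Stage 3: persist embeddings to blob storage; a separate job builds and merges the index.
chunks.write.mode("overwrite").parquet("blob://embeddings/genie/")
```

In practice the embedding UDF would batch chunks per call rather than hitting the API one row at a time, but the overall shape stays the same.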
Knowledge Service: The Heart of Genie
The Knowledge Service is the backend that processes all queries. The flow:
- Receives question via Slack
- Generates embedding using Ada Embeddings Model
- Searches most relevant chunks in Vector DB
- Sends prompt with context to the LLM
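As a rough illustration of that flow, here is a minimal Python sketch. The vector store client and its `search` method are hypothetical stand-ins for Uber's internal Sia service, and the model names are assumptions.

```python
from openai import OpenAI

client = OpenAI()

def answer_question(question, vector_store, k=5):
    # 1. Embed the incoming Slack question.
    q_emb = client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding

    # 2. Retrieve the most relevant chunks from the vector DB.
    chunks = vector_store.search(q_emb, k=k)  # hypothetical: [(text, source_url), ...]

    # 3. Build a prompt from the retrieved context and ask the LLM.
    context = "\n\n".join(f"{text}\nSource: {url}" for text, url in chunks)
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context and cite source URLs."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```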
Integrated Cost Tracking: Each call passes a UUID through the context, allowing cost tracking by channel, team, or use case. Recommended practice: always implement cost tracking from day one.
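A minimal sketch of what UUID-based cost tracking can look like, assuming the OpenAI chat API's token usage fields. The pricing constants and log format are illustrative, not Uber's actual values.

```python
import uuid

# Assumed per-1K-token rates for illustration only.
PRICE_PER_1K_PROMPT = 0.01
PRICE_PER_1K_COMPLETION = 0.03

def call_llm_with_tracking(client, messages, channel, team):
    request_id = str(uuid.uuid4())  # propagated through the call context
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    usage = resp.usage
    cost = (usage.prompt_tokens * PRICE_PER_1K_PROMPT
            + usage.completion_tokens * PRICE_PER_1K_COMPLETION) / 1000
    # Emit a structured record so spend can be aggregated by channel, team, or use case.
    print({"request_id": request_id, "channel": channel, "team": team,
           "prompt_tokens": usage.prompt_tokens,
           "completion_tokens": usage.completion_tokens,
           "estimated_cost_usd": round(cost, 6)})
    return resp.choices[0].message.content
```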
The Crucial Insight: Documentation Quality
“If documentation quality is bad, it doesn’t matter how good the LLM is - there’s no way to have good performance.”
Uber created a system to evaluate and improve document quality in the knowledge base. The system returns:
- Evaluation score for each document
- Explanation of the score
- Actionable suggestions on how to improve
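Uber's scoring system isn't public, but a simple version can itself use an LLM as the grader. A minimal sketch, assuming OpenAI's JSON response mode; the model name, rubric, and field names are illustrative assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score this internal document from 1-5 for answerability: is it current, "
    "specific, and self-contained enough for a support bot to answer from? "
    'Reply as JSON: {"score": int, "explanation": str, "suggestions": [str]}'
)

def evaluate_document(doc_text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": doc_text},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```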
Reducing Hallucinations
The main strategy was structuring prompts with sub-contexts and URLs:
Sub-context 1: [content]
Source: [URL]
Sub-context 2: [content]
Source: [URL]
Instruction: Respond ONLY using the sub-contexts above
and cite the source URL for each response.
Result: Each response includes the source URL, allowing user verification.
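A minimal sketch of assembling that prompt from retrieved chunks; the (text, source_url) chunk format is an assumption about what retrieval returns.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    parts = []
    # One sub-context per retrieved chunk, each paired with its source URL.
    for i, (text, url) in enumerate(chunks, start=1):
        parts.append(f"Sub-context {i}: {text}\nSource: {url}")
    parts.append(
        "Instruction: Respond ONLY using the sub-contexts above "
        "and cite the source URL for each response."
    )
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```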
Other strategies:
- Source curation: Only use sources widely available to engineers
- Updated data: Daily pipeline ensures recent information
- Verification against sources: Mechanisms to verify responses against authoritative sources
Integrated Feedback System
Users give feedback by clicking buttons on Genie’s response:
- Resolved: Response completely solved the problem
- Helpful: Partially helped, but needs more
- Not Helpful: Wrong or irrelevant response
- Not Relevant: User needs human help
Real-time data allows quickly identifying problems and adjusting the system.
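For illustration, the buttons can be expressed as a Slack Block Kit "actions" block attached to Genie's reply; the action_id values and any downstream handling are assumptions.

```python
# Appended to the `blocks` array of the bot's reply message.
FEEDBACK_BLOCK = {
    "type": "actions",
    "elements": [
        {"type": "button", "action_id": "feedback_resolved",
         "text": {"type": "plain_text", "text": "Resolved"}},
        {"type": "button", "action_id": "feedback_helpful",
         "text": {"type": "plain_text", "text": "Helpful"}},
        {"type": "button", "action_id": "feedback_not_helpful",
         "text": {"type": "plain_text", "text": "Not Helpful"}},
        {"type": "button", "action_id": "feedback_not_relevant",
         "text": {"type": "plain_text", "text": "Not Relevant"}},
    ],
}
# Each click arrives as an interaction payload, which can be written to a
# streaming pipeline for real-time analysis.
```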
LLM as Judge
To evaluate responses at scale, Uber uses LLM as a Judge. The LLM compares responses against gold standards or human preferences.
Metrics evaluated:
- Hallucination rate
- Response relevance
- Context coverage
- Any custom metric
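A minimal sketch of an LLM-as-Judge grader, assuming a gold answer is available for comparison; the model, rubric wording, and metric names are illustrative assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

def judge(question: str, answer: str, gold_answer: str, context: str) -> dict:
    rubric = (
        "You are grading a support bot. Compare the answer to the gold answer "
        "and the retrieved context. Reply as JSON with keys: "
        '"hallucination" (true/false), "relevance" (1-5), "context_coverage" (1-5).'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": (
                f"Question: {question}\n\nContext: {context}\n\n"
                f"Gold answer: {gold_answer}\n\nAnswer to grade: {answer}"
            )},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```

Run against a sampled set of production questions, this kind of grader gives hallucination and relevance rates without waiting on manual feedback.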
Results Since Launch
Since September 2023:
- 154 Slack channels served
- 70,000+ questions answered
- 48.9% usefulness rate
- 13,000 engineering hours saved
Considering average engineer salary, 13,000 saved hours represent significant value in recovered productivity.
6 Insights to Replicate
1. RAG is faster for MVP: Doesn't require curated data to start. Fine-tuning can come later.
2. Doc quality matters more than the LLM: Continuously evaluate and improve documentation. Garbage in, garbage out.
3. Feedback loop from day 1: Integrate feedback collection into the flow. Use streaming systems for real-time data.
4. LLM as Judge for evaluation at scale: Allows measuring hallucinations and relevance without relying only on manual feedback.
5. Cite sources in every response: Structure prompts with sub-contexts + URLs. Reduces hallucinations and increases trust.
6. Track costs by UUID: Pass identifiers in each call to build an audit log. Enables cost optimization.
The Main Lesson
Genie demonstrates that AI that works in production doesn’t depend only on the most advanced model. It depends on:
- Well-organized data
- Quality documentation
- Continuous feedback
- Cost tracking
- Source citation
Data infrastructure is the true competitive differentiator.
At Victorino Group, we help companies organize data and build AI agents with real results. If you want to implement AI that works, let’s talk.
If this resonates, let's talk
We help companies implement AI without losing control.
Schedule a Conversation