Stop Saying RAG Is Dead
By Hamel Husain and Ben Clavié
Introduction: Stop Saying RAG Is Dead
🔗 https://hamel.dev/notes/llm/rag/not_dead.html
The AI community has recently cooled on RAG. Some claim that large models know everything, so why bother retrieving documents at all? But the authors remind us that:
Even the most advanced LLMs don’t have up-to-date, proprietary, or long-tail knowledge.
Hallucinations remain a big issue—especially when answers require specific facts.
There’s massive untapped potential in fine-tuned retrieval systems that can handle edge cases and specialized domains.
They put together a practical five-part series to rethink and rebuild RAG with modern tools and ideas.
Part 1: I Don’t Use RAG, I Just Retrieve Documents
🔗 https://hamel.dev/notes/llm/rag/p1-intro.html
Here’s the twist: many people who say they’re not using RAG are… actually using RAG.
If you're retrieving docs and stuffing them into an LLM prompt—you’re doing RAG.
Most production RAG today is naive: basic vector search, fixed chunking, and little thought about how retrieval shapes the output (a minimal sketch of this loop follows below).
The authors call for a more intentional approach: diagnose what works, understand retrieval goals, and measure quality.
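To make the point concrete, here’s a minimal sketch of that naive retrieve-and-stuff loop. Everything here is illustrative: the bag-of-words `embed` stands in for a real embedding model, and the returned prompt would go to an actual LLM call.

```python
# Minimal sketch of "naive RAG": embed docs, retrieve top-k by similarity,
# stuff them into the prompt. The embedding is a toy bag of words; a real
# system would use an embedding model, a vector index, and an LLM call.
from collections import Counter
import math

DOCS = [
    "ColBERT is a late-interaction retrieval model.",
    "BM25 is a classic sparse retrieval baseline.",
    "RAG stuffs retrieved documents into the LLM prompt.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a lowercase bag of words (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 2) -> list:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Stuff retrieved text into the prompt -- this is RAG, however informal."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does RAG do with retrieved documents?"))
```

Swap in a real embedding model and a vector index, and this is, structurally, most production RAG today.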
Part 2: Modern IR Evals for RAG
🔗 https://hamel.dev/notes/llm/rag/p2-evals.html
How do you know your RAG system is actually good?
The authors propose modern IR (Information Retrieval) evaluations that go beyond simple keyword matching.
They introduce task-aware retrieval evaluation: don’t just test whether the right document is returned; test whether it helps the LLM answer better.
They lean on human-in-the-loop and LLM-as-judge methods to evaluate grounding, completeness, and relevance.
Bottom line: you can’t improve what you don’t measure, so measure the right things (a small eval sketch follows below).
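As a rough illustration of what measuring the right things can look like, here’s a sketch of a two-level eval: standard IR metrics on retrieval, plus a hook for task-level judging. The `judge_answer` stub and the tiny labeled set are assumptions for illustration, not the authors’ code.

```python
# Sketch of a two-level RAG eval: classic IR metrics (recall@k, MRR) on the
# retriever, plus a hook for task-aware scoring (does the retrieved context
# actually help the answer?). The judge is a stub; real setups would use a
# human rater or an LLM-as-judge prompt.

def recall_at_k(ranked_ids: list, relevant: set, k: int) -> float:
    """Fraction of the relevant docs that appear in the top k results."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr(ranked_ids: list, relevant: set) -> float:
    """Reciprocal rank of the first relevant document (0 if none found)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def judge_answer(question: str, answer: str, context: str) -> float:
    """Stub for task-level scoring (grounding, completeness, relevance)."""
    raise NotImplementedError

# Tiny hypothetical labeled set: which docs are relevant to each query.
eval_set = [
    {"query": "q1", "ranked": ["d3", "d1", "d7"], "relevant": {"d1", "d2"}},
    {"query": "q2", "ranked": ["d5", "d9", "d2"], "relevant": {"d5"}},
]
for ex in eval_set:
    print(ex["query"],
          "recall@3 =", recall_at_k(ex["ranked"], ex["relevant"], 3),
          "MRR =", mrr(ex["ranked"], ex["relevant"]))
```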
Part 3: Optimizing Retrieval with Reasoning Models
🔗 https://hamel.dev/notes/llm/rag/p3_reasoning.html
Traditional RAG systems treat retrieval and generation as separate steps. But what if you could bring reasoning into retrieval itself?
Enter reasoning-aware retrieval: LLMs that think about what information is needed before retrieving it.
Use the LLM to generate hypotheses, sub-queries, or task decompositions, then retrieve with more purpose (sketched in code below).
This closes the gap between dumb retrieval and smart generation.
The future of RAG is cognitive—more like a research assistant than a keyword search.
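One hedged sketch of what reasoning-before-retrieval might look like: the model plans sub-queries first, then retrieves against each and merges the results. Both `llm_decompose` and `retrieve` are stand-ins here; the authors don’t prescribe a specific implementation.

```python
# Sketch of reasoning-before-retrieval: an LLM first decomposes the question
# into focused sub-queries, each is retrieved separately, and the merged,
# de-duplicated results feed generation.

def llm_decompose(question: str) -> list:
    """Stand-in for an LLM call that plans what to look up.
    Returns canned sub-queries here for illustration."""
    return [
        "ColBERT architecture late interaction",
        "ColBERT efficiency vs dense retrieval",
    ]

def retrieve(query: str, k: int = 3) -> list:
    """Stand-in for any retriever (BM25, dense, late interaction)."""
    return [f"<doc matching '{query}' #{i}>" for i in range(k)]

def reasoning_rag(question: str) -> str:
    sub_queries = llm_decompose(question)
    seen, context = set(), []
    for sq in sub_queries:
        for doc in retrieve(sq):
            if doc not in seen:  # de-duplicate across sub-queries
                seen.add(doc)
                context.append(doc)
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

print(reasoning_rag("How does ColBERT compare to dense retrieval?"))
```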
Part 4: Late Interaction Models for RAG
🔗 https://hamel.dev/notes/llm/rag/p4_late_interaction.html
Most RAG systems use sparse retrieval (e.g., BM25) or dense retrieval (e.g., vector DBs with embeddings). But both have limits.
The authors introduce late interaction models (like ColBERT), which allow for more nuanced matching between queries and documents.
They balance accuracy and efficiency, making it possible to scale up while keeping precision high.
These models compare query tokens against document tokens directly, rather than collapsing each document into a single embedding.
Think of it as next-gen retrieval: smarter, faster, and more flexible (see the scoring sketch below).
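For intuition, here’s a small sketch of the MaxSim scoring that late interaction models like ColBERT use: each query token is matched against its best document token, and the per-token maxima are summed. Random vectors stand in for real contextualized token embeddings.

```python
# Sketch of ColBERT-style "late interaction" scoring (MaxSim). Random unit
# vectors stand in for contextualized token embeddings from a real encoder.
import numpy as np

rng = np.random.default_rng(0)

def normalize(m: np.ndarray) -> np.ndarray:
    """L2-normalize each row so dot products become cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Sum, over query tokens, of the max similarity to any document token."""
    sims = query_tokens @ doc_tokens.T    # (n_query, n_doc) cosine matrix
    return float(sims.max(axis=1).sum())  # best doc token per query token

dim = 128
query = normalize(rng.normal(size=(5, dim)))  # 5 query token embeddings
docs = [normalize(rng.normal(size=(n, dim))) for n in (40, 80, 20)]

scores = [maxsim(query, d) for d in docs]
print("ranking:", sorted(range(len(docs)), key=lambda i: -scores[i]))
```

The token-level matrix is what makes the matching more nuanced than a single query-vs-document dot product, while the embeddings can still be precomputed offline.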
Part 5: RAG with Multiple Representations
🔗 https://hamel.dev/notes/llm/rag/p5_map.html
Why should a document have only one embedding? Why not several, based on different perspectives?
Use multiple embeddings to capture different facets of the same document: the facts, the tone, a summary, and so on.
At query time, let the LLM choose which view is most helpful.
This aligns retrieval with task-specific needs and improves grounding.
It’s like giving your LLM multiple lenses to look through, not just one blurry snapshot (a small routing sketch follows below).
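Here’s one hedged way the idea could be wired up: each document is indexed under several views, and a router (stubbed here, but plausibly an LLM) picks a view before retrieval. The view names, the data, and the keyword-overlap matching are all illustrative assumptions, not the authors’ design.

```python
# Sketch of multi-representation retrieval: each document is stored under
# several views (full text, summary, extracted facts), and the query is
# routed to whichever view suits the task. Matching uses toy keyword
# overlap; a real system would embed each view separately.

DOC_VIEWS = {
    "doc1": {
        "full":    "ColBERT scores every query token against document tokens.",
        "summary": "ColBERT is a late-interaction retrieval model.",
        "facts":   "ColBERT: introduced 2020, MaxSim scoring, token-level.",
    },
    "doc2": {
        "full":    "BM25 ranks documents with term frequency statistics.",
        "summary": "BM25 is a classic sparse retrieval baseline.",
        "facts":   "BM25: lexical, no embeddings, strong baseline.",
    },
}

def choose_view(query: str) -> str:
    """Stub router: an LLM could pick the view; here, a crude heuristic."""
    if query.lower().startswith(("summarize", "what is")):
        return "summary"
    return "facts"

def overlap(a: str, b: str) -> int:
    """Toy relevance signal: count of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve(query: str) -> str:
    view = choose_view(query)
    best = max(DOC_VIEWS, key=lambda d: overlap(query, DOC_VIEWS[d][view]))
    return DOC_VIEWS[best][view]  # return the chosen representation

print(retrieve("What is ColBERT"))
```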
Final Thought: The future of RAG isn't about copying documents into prompts. It’s about smart, dynamic, context-aware systems that retrieve the right info at the right time—in a form the model can actually use. If you think RAG is dead, you’re not looking closely enough. You might just be doing it wrong.