

Stop Saying RAG Is Dead

By Hamel Husain and Ben Clavié


Introduction: Stop Saying RAG Is Dead

🔗 https://hamel.dev/notes/llm/rag/not_dead.html


The AI community has recently cooled on RAG. Some claim that large models already know everything, so why bother retrieving documents at all? But the authors remind us that:

  • Even the most advanced LLMs don’t have up-to-date, proprietary, or long-tail knowledge.

  • Hallucinations remain a big issue—especially when answers require specific facts.

  • There’s massive untapped potential in fine-tuned retrieval systems that can handle edge cases and specialized domains.

They put together a practical 5-part series to rethink and rebuild RAG with modern tools and ideas.


Part 1: I Don’t Use RAG, I Just Retrieve Documents

🔗 https://hamel.dev/notes/llm/rag/p1-intro.html


Here’s the twist: many people who say they’re not using RAG are… actually using RAG.

  • If you're retrieving docs and stuffing them into an LLM prompt, you're doing RAG (a minimal sketch of this pattern follows the list).

  • Most production RAG today is naive—basic vector search, fixed chunking, minimal thought on how retrieval impacts output.

  • The authors call for a more intentional approach: diagnose what works, understand retrieval goals, and measure quality.
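To make the point concrete, here is a minimal sketch of that "retrieve and stuff" pattern. Everything in it is a placeholder (the keyword scorer stands in for a vector database, and call_llm for any chat model); it is not the authors' code, just the shape of the pipeline.

# Minimal "retrieve and stuff" sketch: if your pipeline looks like this,
# you are already doing RAG. The corpus, scorer, and call_llm are placeholders.

def retrieve(query, corpus, k=3):
    # Naive keyword-overlap scoring stands in for a vector DB or BM25 here.
    q_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_terms & set(d.lower().split())))
    return scored[:k]

def call_llm(prompt):
    # Placeholder: swap in any chat-completion client.
    return f"[LLM answer grounded in a prompt of {len(prompt)} chars]"

corpus = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm CET, Monday through Friday.",
    "Enterprise plans include SSO and audit logging.",
]

question = "When can customers get a refund?"
docs = retrieve(question, corpus)
prompt = ("Answer using only the context below.\n\nContext:\n"
          + "\n".join(docs) + f"\n\nQuestion: {question}")
print(call_llm(prompt))

Swapping the scorer for embeddings or BM25 doesn't change the shape: retrieval feeds generation, and that is RAG.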


Part 2: Modern IR Evals for RAG

🔗 https://hamel.dev/notes/llm/rag/p2-evals.html


How do you know your RAG system is actually good?

  • The authors propose modern IR (Information Retrieval) evaluations that go beyond simple keyword matching (a toy metric sketch follows this section).

  • Introduce Task-Aware Retrieval Evaluation: don’t just test if the right document is returned—test if it helps the LLM answer better.

  • Leverage human-in-the-loop and LLM-as-judge methods to evaluate grounding, completeness, and relevance.

Bottom line: You can’t improve what you don’t measure—so measure the right things.
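As a starting point before the fancier task-aware and LLM-as-judge evaluations, here is a toy sketch of two standard IR metrics, recall@k and MRR, over a tiny labeled query set. The queries, document ids, and relevance labels are made up for illustration.

# Toy retrieval eval: recall@k and MRR over a small labeled query set.

def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the relevant documents that appear in the top-k results.
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def reciprocal_rank(ranked_ids, relevant_ids):
    # 1/rank of the first relevant document, 0 if none was retrieved.
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Each entry: (ranked result ids from your retriever, ids labeled relevant).
eval_set = [
    (["d3", "d7", "d1"], {"d1"}),
    (["d2", "d9", "d4"], {"d9", "d4"}),
]

k = 3
print("recall@%d: %.2f" % (k, sum(recall_at_k(r, rel, k) for r, rel in eval_set) / len(eval_set)))
print("MRR:       %.2f" % (sum(reciprocal_rank(r, rel) for r, rel in eval_set) / len(eval_set)))

Task-aware evaluation then layers on top of numbers like these: instead of only checking ranks, you check whether the retrieved context actually improves the generated answer.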


Part 3: Optimizing Retrieval with Reasoning Models

🔗 https://hamel.dev/notes/llm/rag/p3_reasoning.html


Traditional RAG systems treat retrieval and generation as separate steps. But what if you could bring reasoning into retrieval itself?

  • Enter reasoning-aware retrieval: LLMs that think about what information is needed before retrieving it.

  • Use the LLM to generate hypotheses, queries, or task decompositions, then retrieve with more purpose (a decomposition sketch follows below).

  • This closes the gap between dumb retrieval and smart generation.

The future of RAG is cognitive—more like a research assistant than a keyword search.
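A minimal sketch of that idea, assuming a stubbed call_llm for the reasoning model and a stubbed retrieve for the index; the decomposition prompt and helper names are illustrative, not the authors' implementation.

# Reasoning-aware retrieval in miniature: ask the model what it needs to
# know, then retrieve per sub-query instead of firing one raw query at the index.

def call_llm(prompt):
    # Placeholder for a reasoning model; returns one sub-query per line.
    return "What was revenue in 2023?\nWhat was revenue in 2024?"

def retrieve(query, k=2):
    # Placeholder retriever.
    return [f"<doc matching '{query}' #{i}>" for i in range(k)]

question = "How did revenue change between 2023 and 2024?"
plan = call_llm(
    "Break the question into the distinct facts you would need to look up, "
    "one per line.\n\nQuestion: " + question
)
sub_queries = [line.strip() for line in plan.splitlines() if line.strip()]

context = []
for sq in sub_queries:
    context.extend(retrieve(sq))

answer_prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(answer_prompt)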


Part 4: Late Interaction Models for RAG

🔗 https://hamel.dev/notes/llm/rag/p4_late_interaction.html


Most RAG systems use sparse retrieval (e.g., BM25) or dense retrieval (e.g., vector DBs with embeddings). But both have limits.

The authors introduce:

  • Late Interaction models (like ColBERT), which allow for more nuanced matching between queries and documents.

  • They balance accuracy and efficiency, making it possible to scale up while keeping precision high.

  • These models match query tokens against document tokens directly rather than collapsing each document into a single embedding (see the MaxSim sketch below).

Think of it as next-gen retrieval: smarter, faster, and more flexible.
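For intuition, here is a miniature of ColBERT-style MaxSim scoring using NumPy. Real late-interaction systems use trained per-token encoders and compressed indexes; the random vectors below are placeholders that only show how the scoring itself works.

# ColBERT-style late interaction in miniature: score = sum over query tokens
# of the maximum similarity against any document token (MaxSim).
import numpy as np

rng = np.random.default_rng(0)

def embed_tokens(text, dim=8):
    # Placeholder per-token encoder: one random unit vector per token.
    vecs = rng.normal(size=(len(text.split()), dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def maxsim_score(query_vecs, doc_vecs):
    # For each query token, take its best-matching document token, then sum.
    sims = query_vecs @ doc_vecs.T          # (num_query_tokens, num_doc_tokens)
    return sims.max(axis=1).sum()

query = embed_tokens("late interaction retrieval")
docs = {
    "a": embed_tokens("colbert scores queries against document tokens"),
    "b": embed_tokens("the cafeteria menu changes every tuesday"),
}
ranked = sorted(docs, key=lambda d: -maxsim_score(query, docs[d]))
print(ranked)

Each query token gets to find its own best-matching document token, which is what makes the matching more nuanced than comparing two pooled vectors.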


Part 5: RAG with Multiple Representations

🔗 https://hamel.dev/notes/llm/rag/p5_map.html


Why should a document have only one embedding? Why not several, based on different perspectives?

  • Use multiple embeddings to capture different facets: #factual, #emotional, #summarized, etc.

  • At query time, let the LLM choose which view is most helpful (a routing sketch follows below).

  • This aligns retrieval with task-specific needs and greatly improves grounding.

It’s like giving your LLM multiple lenses to look through—not just one blurry snapshot.
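A rough sketch of what multiple representations per document can look like in practice, assuming a placeholder embedding and made-up facet names; the routing rule here is a simple heuristic, where the series suggests letting the LLM choose the view.

# Multi-representation indexing sketch: each document is stored under several
# views (facets), and the query is routed to the most useful view.

def embed(text):
    # Placeholder "embedding": a bag of words, compared with Jaccard overlap.
    return set(text.lower().split())

def similarity(a, b):
    return len(a & b) / max(len(a | b), 1)

# One document, three views of it.
doc_views = {
    "full_text": "Q3 revenue grew 12% year over year, driven by enterprise renewals.",
    "summary": "Revenue up 12% in Q3.",
    "questions": "How much did revenue grow in Q3? What drove Q3 growth?",
}
index = {facet: embed(text) for facet, text in doc_views.items()}

def search(query, facet):
    # Score the chosen view only; a real system would index many documents.
    return similarity(embed(query), index[facet])

query = "What drove growth last quarter?"
# Routing could be a heuristic or an LLM call; here, question-like queries
# are sent to the "questions" view.
facet = "questions" if query.strip().endswith("?") else "summary"
print(facet, search(query, facet))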


Final Thought: The future of RAG isn't about copying documents into prompts. It’s about smart, dynamic, context-aware systems that retrieve the right info at the right time—in a form the model can actually use. If you think RAG is dead, you’re not looking closely enough. You might just be doing it wrong.
