Heni writeups

My Journey with Retrieval-Augmented Generation (RAG)

2025-01-03

RAG: How I Learned to Stop Worrying and Love Knowledge Retrieval

After months of battling hallucinations and outdated information in my LLM applications, I finally dove into Retrieval-Augmented Generation. Here’s what I’ve learned, what works, and the honest challenges I’ve faced along the way.

What’s RAG and Why Should You Care?

At its core, RAG is simple: instead of asking an LLM to remember everything, we give it the ability to look things up. Think of it as the difference between a closed-book and open-book exam.

Traditional LLMs are like students who had to memorize the textbook before the test. RAG systems are like students who can consult reference materials during the exam: they still need to understand and apply the information, but they don’t need to memorize every fact.

Here’s a simple visualization of how it works:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e0f7fa', 'primaryTextColor': '#006064', 'primaryBorderColor': '#00acc1', 'lineColor': '#0097a7', 'secondaryColor': '#e1f5fe', 'tertiaryColor': '#e8f5e9'}}}%%
flowchart LR
    A[User Question] --> B[Search for Relevant Info]
    B --> C[Provide Info to LLM]
    C --> D[Generate Answer]
    
    style A fill:#bbdefb,stroke:#1976d2,stroke-width:2px
    style B fill:#fff59d,stroke:#fdd835,stroke-width:2px
    style C fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style D fill:#ffccbc,stroke:#e64a19,stroke-width:2px

My RAG Setup: Nothing Fancy, But It Works

After experimenting with several approaches, I settled on a fairly straightforward implementation:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#f3e5f5', 'primaryTextColor': '#4a148c', 'primaryBorderColor': '#9c27b0', 'lineColor': '#7b1fa2', 'secondaryColor': '#ede7f6', 'tertiaryColor': '#f3e5f5'}}}%%
graph TD
    A[Our Documentation] --> B[Split into Chunks]
    B --> C[Convert to Embeddings]
    C --> D[Store in Vector DB]
    
    E[User Question] --> F[Create Question Embedding]
    F --> G[Find Similar Documents]
    D --> G
    G --> H[Add Context to Prompt]
    H --> I[Send to LLM]
    I --> J[Return Answer]
    
    style A fill:#d1c4e9,stroke:#673ab7,stroke-width:2px
    style B fill:#c5cae9,stroke:#3f51b5,stroke-width:2px
    style C fill:#bbdefb,stroke:#2196f3,stroke-width:2px
    style D fill:#b2ebf2,stroke:#00bcd4,stroke-width:2px
    style E fill:#b2dfdb,stroke:#009688,stroke-width:2px
    style F fill:#c8e6c9,stroke:#4caf50,stroke-width:2px
    style G fill:#dcedc8,stroke:#8bc34a,stroke-width:2px
    style H fill:#fff9c4,stroke:#ffeb3b,stroke-width:2px
    style I fill:#ffecb3,stroke:#ffc107,stroke-width:2px
    style J fill:#ffe0b2,stroke:#ff9800,stroke-width:2px
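
To make the flow in the diagram concrete, here is a minimal sketch in plain Python. It is an illustration under loud assumptions, not the production setup: `embed` is a toy bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` stands in for a real vector DB, and every name and sample document is made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: a list of (chunk, embedding) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def search(self, question: str, k: int = 2) -> list[str]:
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Add the retrieved chunks to the prompt ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


store = InMemoryVectorStore()
store.add([
    "To reset your password, open Settings and choose Security.",
    "Invoices are emailed on the first day of each month.",
    "The mobile app supports offline mode on Android and iOS.",
])
question = "How do I reset my password?"
prompt = build_prompt(question, store.search(question, k=1))
print(prompt)  # this string is what would be sent to the LLM
```

Swapping the toy `embed` for a real embedding model and the list for a real vector DB changes the quality of the results, but not the shape of the pipeline.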

Why RAG Changed Everything for Our Team

When we implemented RAG in our customer support AI, three things immediately improved:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e8f5e9', 'primaryTextColor': '#1b5e20', 'primaryBorderColor': '#4caf50', 'lineColor': '#388e3c', 'secondaryColor': '#f1f8e9', 'tertiaryColor': '#e0f2f1'}}}%%
graph LR
    A[Before RAG] --> B[After RAG]
    
    subgraph "Accuracy"
    C[67% Correct] --> D[93% Correct]
    end
    
    subgraph "Freshness"
    E[Always Outdated] --> F[Always Current]
    end
    
    subgraph "Trust"
    G[Team Skeptical] --> H[Team Relies On It]
    end
    
    style A fill:#ffcdd2,stroke:#e53935,stroke-width:2px
    style B fill:#c8e6c9,stroke:#43a047,stroke-width:2px
    style C fill:#ffcdd2,stroke:#e53935,stroke-width:2px
    style D fill:#c8e6c9,stroke:#43a047,stroke-width:2px
    style E fill:#ffcdd2,stroke:#e53935,stroke-width:2px
    style F fill:#c8e6c9,stroke:#43a047,stroke-width:2px
    style G fill:#ffcdd2,stroke:#e53935,stroke-width:2px
    style H fill:#c8e6c9,stroke:#43a047,stroke-width:2px

The biggest win? Our support team went from fact-checking every AI response to trusting the system enough to focus on the tough cases the AI couldn’t handle.

Real-World Applications I’ve Seen Work

I’ve either built or seen colleagues build these RAG applications with impressive results:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1f5fe', 'primaryTextColor': '#01579b', 'primaryBorderColor': '#03a9f4', 'lineColor': '#0288d1', 'secondaryColor': '#e3f2fd', 'tertiaryColor': '#e8eaf6'}}}%%
mindmap
  root((My RAG Projects))
    Support Knowledge Base
      Product documentation
      Troubleshooting guides
      Customer conversations
    Research Assistant
      Academic papers
      Internal research
      Competitive analysis
    Code Documentation Helper
      GitHub repositories
      API docs
      Stack Overflow solutions
    Personalized Learning
      Course materials
      Student questions
      Learning progress

The support knowledge base was by far the most successful - we saw a 42% reduction in escalations and a 27% improvement in first-contact resolution.

The Not-So-Pretty Parts: RAG Challenges

Let me be honest about the struggles. RAG isn’t magic, and these issues still challenge me daily:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fce4ec', 'primaryTextColor': '#880e4f', 'primaryBorderColor': '#e91e63', 'lineColor': '#d81b60', 'secondaryColor': '#f8bbd0', 'tertiaryColor': '#f3e5f5'}}}%%
graph TD
    A[Real RAG Challenges] --> B[Garbage In, Garbage Out]
    A --> C[Hallucinations Still Happen]
    A --> D[Context Window Limits]
    A --> E[Slow Retrieval at Scale]
    
    B --> B1[Hard to automate quality control]
    C --> C1[LLM still makes things up sometimes]
    D --> D1[Can't fit all relevant docs]
    E --> E1[Latency issues with large DBs]
    
    style A fill:#f8bbd0,stroke:#c2185b,stroke-width:3px
    style B fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
    style C fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
    style D fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
    style E fill:#e1bee7,stroke:#8e24aa,stroke-width:2px
    style B1 fill:#d1c4e9,stroke:#5e35b1,stroke-width:1px
    style C1 fill:#d1c4e9,stroke:#5e35b1,stroke-width:1px
    style D1 fill:#d1c4e9,stroke:#5e35b1,stroke-width:1px
    style E1 fill:#d1c4e9,stroke:#5e35b1,stroke-width:1px
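
One common way to live with context window limits is to pack the best-ranked chunks into a fixed budget. The sketch below is one possible approach, not a description of any particular library: `pack_context` is a hypothetical helper, and counting words is only a rough proxy for tokens (a real system would count with the model's own tokenizer).

```python
def pack_context(ranked_chunks: list[str], max_words: int) -> list[str]:
    """Greedily keep top-ranked chunks until the word budget is used up."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:      # assumed to be sorted best-first
        cost = len(chunk.split())    # rough proxy for token count (assumption)
        if used + cost > max_words:
            continue                 # does not fit; a shorter chunk further down still might
        selected.append(chunk)
        used += cost
    return selected


ranked = [
    "alpha beta gamma delta",
    "one two three four five six",
    "short chunk",
]
print(pack_context(ranked, max_words=6))
```

Skipping an oversized chunk instead of stopping at it favors filling the budget; stopping at the first miss would instead favor strict rank order. Which is better depends on how much the ranking can be trusted.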

The biggest lesson? Retrieval quality matters more than anything else. In my experience, a sophisticated LLM with poor retrieval underperforms a simpler LLM with excellent retrieval.
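
Retrieval quality is easier to improve once it is a number. A minimal sketch of one such measure, hit rate at k, assuming a small hand-labeled set of questions with the document each one should retrieve; the function name, question strings, and document ids are all invented for illustration.

```python
def hit_rate_at_k(results: dict[str, list[str]], expected: dict[str, str], k: int = 3) -> float:
    """Fraction of questions whose expected document id appears in the top-k results."""
    if not expected:
        return 0.0
    hits = sum(
        1 for question, doc_id in expected.items()
        if doc_id in results.get(question, [])[:k]
    )
    return hits / len(expected)


# Hypothetical labels: question -> id of the document that answers it.
expected = {
    "reset password": "doc-security",
    "invoice date": "doc-billing",
    "offline mode": "doc-mobile",
    "export data": "doc-export",
}
# Hypothetical retriever output: question -> ranked document ids.
results = {
    "reset password": ["doc-security", "doc-mobile"],
    "invoice date": ["doc-mobile", "doc-billing"],
    "offline mode": ["doc-billing", "doc-security"],
    "export data": ["doc-export"],
}
print(hit_rate_at_k(results, expected, k=1))
print(hit_rate_at_k(results, expected, k=2))
```

Rerunning the same labeled set after each change to chunking, embeddings, or query processing shows whether retrieval actually got better.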

Practical Tips From My Experience

Here are some real tips that saved me countless hours:

  1. Start small - Begin with a focused document set you know well
  2. Chunk thoughtfully - Document splitting affects everything downstream
  3. Test with real users - Their questions rarely match what you expect
  4. Build evaluation early - You need to measure to improve
  5. Prompt engineering still matters - How you instruct the LLM to use the retrieved context makes a huge difference

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e0f7fa', 'primaryTextColor': '#006064', 'primaryBorderColor': '#00bcd4', 'lineColor': '#00acc1', 'secondaryColor': '#e0f2f1', 'tertiaryColor': '#e8f5e9'}}}%%
graph TB
    A[My RAG Process] --> B[Identify Valuable Knowledge]
    B --> C[Preprocess & Clean Text]
    C --> D[Experiment with Chunk Sizes] 
    D --> E[Test Different Embeddings]
    E --> F[Refine Query Processing]
    F --> G[Optimize Prompts]
    G --> H[Evaluate & Iterate]
    
    style A fill:#b2ebf2,stroke:#00acc1,stroke-width:3px
    style B fill:#b3e5fc,stroke:#039be5,stroke-width:2px
    style C fill:#b3e5fc,stroke:#039be5,stroke-width:2px
    style D fill:#b3e5fc,stroke:#039be5,stroke-width:2px
    style E fill:#b3e5fc,stroke:#039be5,stroke-width:2px
    style F fill:#b3e5fc,stroke:#039be5,stroke-width:2px
    style G fill:#b3e5fc,stroke:#039be5,stroke-width:2px
    style H fill:#b3e5fc,stroke:#039be5,stroke-width:2px
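
For the chunk-size experiments, a small splitter with tunable size and overlap is enough to get started. This is a sketch under assumptions: it splits on whitespace and measures size in words, whereas real splitters often respect sentence or heading boundaries; `chunk_words` and its default values are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `chunk_size` words; neighbors share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
for size in (4, 6):
    pieces = chunk_words(sample, chunk_size=size, overlap=1)
    print(size, len(pieces), pieces)
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.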

Simple RAG Implementation: How I Started

My first implementation used just a few components:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e0f2f1', 'primaryTextColor': '#004d40', 'primaryBorderColor': '#009688', 'lineColor': '#00897b', 'secondaryColor': '#e8f5e9', 'tertiaryColor': '#f1f8e9'}}}%%
graph LR
    A[Python + LangChain] --> B[OpenAI Embeddings]
    B --> C[Chroma Vector DB]
    C --> D[GPT-3.5]
    
    style A fill:#b2dfdb,stroke:#00796b,stroke-width:2px
    style B fill:#b2dfdb,stroke:#00796b,stroke-width:2px
    style C fill:#b2dfdb,stroke:#00796b,stroke-width:2px
    style D fill:#b2dfdb,stroke:#00796b,stroke-width:2px

This basic setup handled about 5,000 documents and served 50 users quite well. It wasn’t perfect, but it worked much better than what we had before.

The Future of RAG (In My View)

Here’s where I see RAG heading in the next year:

  1. Multi-step reasoning - RAG systems that can plan their retrieval strategy
  2. Hybrid retrieval - Combining multiple retrieval methods for better results
  3. Self-improving systems - RAG systems that learn from user feedback and usage patterns
  4. Multi-modal retrieval - Finding and using images, video, and audio alongside text

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e8eaf6', 'primaryTextColor': '#1a237e', 'primaryBorderColor': '#3f51b5', 'lineColor': '#3949ab', 'secondaryColor': '#e3f2fd', 'tertiaryColor': '#e1f5fe'}}}%%
graph TD
    A[Evolution of RAG] --> B[Today: Basic Context]
    B --> C[Next: Strategic Retrieval]
    C --> D[Future: Agentic Knowledge Systems]
    
    style A fill:#c5cae9,stroke:#3f51b5,stroke-width:3px
    style B fill:#bbdefb,stroke:#2196f3,stroke-width:2px
    style C fill:#bbdefb,stroke:#2196f3,stroke-width:2px
    style D fill:#bbdefb,stroke:#2196f3,stroke-width:2px
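
Hybrid retrieval from the list above can be sketched with reciprocal rank fusion (RRF), one well-known way to merge ranked lists from different retrievers. This is an illustration, not a claim about how any specific product does it: the two input rankings and document ids are made up, and `k=60` is a commonly used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_hits = ["doc-a", "doc-b", "doc-c"]  # e.g. from a keyword search (hypothetical)
vector_hits = ["doc-b", "doc-d", "doc-a"]   # e.g. from embedding similarity (hypothetical)
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because RRF uses only ranks, it avoids having to put keyword scores and similarity scores on a common scale.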

Conclusion: Why RAG Matters To Me

RAG isn’t just a technical approach - it’s changed how I think about AI systems. Instead of trying to build models that know everything, I now focus on building systems that know when and how to look things up.

This feels more honest and more useful. Our RAG systems are explicit about where their information comes from, which builds trust with users and makes the systems more maintainable for our team.

If you’re just starting with RAG, my advice is simple: pick a small, well-defined knowledge domain you care about, and build a basic prototype. Even a simple implementation can deliver impressive results, and you’ll learn so much by doing.

Feel free to reach out if you’re building something similar - I’m always happy to compare notes with fellow RAG enthusiasts!


