
Retrieval-Augmented Generation (RAG)

Last updated by Simon Späti

Retrieval-augmented generation (RAG) is a technique that can provide more accurate answers to queries than a generative large language model (LLM) on its own, because RAG augments the model with knowledge external to the data already contained in the LLM.

  • vector database
  • alternative to training your own model

RAG is specifically designed to enhance existing LLMs without retraining them: it is an extension technique, not a training technique.
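To make that point concrete, here is a minimal sketch in Python of what RAG changes: the model's weights stay untouched, and the only difference is that retrieved text is prepended to the prompt at inference time. The `retrieve` and `llm_complete` names are placeholders, not a specific library's API.

```python
def build_rag_prompt(question: str, retrieve, k: int = 3) -> str:
    """Augment a prompt with retrieved context; the LLM itself is unchanged."""
    # retrieve() is a placeholder for any retriever (vector store, search index, ...)
    context = "\n\n".join(retrieve(question, k=k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Same model, two prompts:
# llm_complete(question)                              # relies on training data alone
# llm_complete(build_rag_prompt(question, retrieve))  # grounded in retrieved context
```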

# AI Architectures

RAG in combination with GenBI:

```mermaid
flowchart TB
    subgraph "User Interface Layer"
        Human["👤 Human User"]
        BITool["🖥️ BI Tool Interface"]
    end

    subgraph "LLM Integration Layer"
        LLMPool["Available LLMs Pool"]
        OpenAI & Claude & Gemini & Others --> LLMPool
    end

    subgraph "Natural Language Processing Layer"
        NLP["Natural Language Processing"]
        RAG["Retrieval-Augmented Generation (RAG)"]
        QueryEngine["Query Understanding Engine"]
    end

    subgraph "Data Layer"
        MetricsLayer["📊 Metrics Layer"]
        DataSources["💾 Data Sources"]
        VectorStore["Vector Store"]
    end

    %% User Interface Connections
    Human <-->|"Prompts/Feedback"| BITool
    BITool -->|"Natural Language Query"| NLP

    %% LLM Processing Flow
    LLMPool -->|"Base Knowledge"| NLP
    NLP -->|"Query Processing"| QueryEngine
    NLP -->|"Context Enhancement"| RAG

    %% Data Layer Connections
    RAG -->|"Retrieve Context"| VectorStore
    VectorStore -->|"Indexed Metrics"| MetricsLayer
    QueryEngine -->|"Execute Queries"| DataSources
    MetricsLayer -->|"Metric Definitions"| DataSources

    %% Feedback Loops
    QueryEngine -.->|"Query Refinement"| NLP
    RAG -.->|"Context Refinement"| NLP

    classDef interface fill:#f9f,stroke:#333,stroke-width:2px
    classDef storage fill:#fcf,stroke:#333,stroke-width:2px
    classDef process fill:#fff,stroke:#333,stroke-width:2px
    
    class Human,BITool interface
    class DataSources,VectorStore storage
    class NLP,RAG,QueryEngine process
```
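As a rough sketch of the flow in the diagram, the function below wires the layers together in Python. Every callable is a hypothetical stub injected by the caller (vector store lookup, an LLM from the pool, query execution); only the orchestration order is meant to match the diagram.

```python
from typing import Callable

def answer_bi_question(
    question: str,
    vector_search: Callable[[str, int], list[str]],  # Vector Store: indexed metrics (stub)
    generate_sql: Callable[[str, str], str],         # any LLM from the pool (stub)
    run_query: Callable[[str], list[tuple]],         # Data Sources execution (stub)
) -> list[tuple]:
    """Hypothetical top-to-bottom pass through the diagram's layers."""
    # RAG: retrieve indexed metric definitions as grounding context
    metric_defs = "\n".join(vector_search(question, 5))
    # LLM Integration Layer: generate SQL grounded in the metric definitions
    sql = generate_sql(question, metric_defs)
    # Data Layer: execute against the actual data sources, returning rows to the BI tool
    return run_query(sql)
```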

Another image by Aurimas Griciūnas (source)

It has come to my attention that some of you have not realized how central the database is to RAG implementations. So I drew it out for y’all. As Hubert said: “It is a database + LLM”:
https://twitter.com/gwenshap/status/1768709687636271520 ^95372c


Knowledge Graphs as a source of trust for LLM-powered enterprise question answering by Juan Sequeda


Origin: Daniel Svonava on LinkedIn: RAG + Knowledge Graphs cut customer support resolution time by 29.6%. 📉… | 85 comments
Source: Data Engineering Whitepapers

# Dead within a Year

I have similar thoughts, as we are re-creating what we already have in data engineering. There are already vector databases, as well as databases that support vectors or vector-based indices. I believe more existing databases will support this use case, rather than completely new databases being built.

And the key piece is where the transformation happens: instead of locking transformations away in another silo, we should add them to the orchestrators.

Here is another opinion:

My prediction is that RAG will be dead in a year or two.

RAG emerged as a workaround for LLMs’ limited context, retrieving external information to provide factual grounding. But RAG introduces many challenges of its own.

Two promising research directions towards LLMs overcoming these limitations intrinsically:

  • Faster attention mechanisms like Flash Attention are enabling far larger context sizes. LLMs like Claude 2.1 now support 200k tokens - the size of a book! As context size increases, RAG becomes redundant.
  • New architectures like Mamba and RWKV aim to replace transformers, reducing quadratic complexity for more efficient and scalable sequence modeling.

RAG allowed more knowledge to be included in the limited context of large language models. But if context capacity becomes virtually unlimited, RAG would be unnecessary, as the entire knowledge base could live within the context.

Source: Donato Riccio on LinkedIn: The End of RAG? | 39 comments; full article: The End of RAG

# Examples
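A small, self-contained sketch of the retrieval step, runnable as-is: cosine similarity over a toy in-memory corpus with numpy. The vectors and the `embed` function are fake stand-ins for a real embedding model (so the retrieval here is not semantically meaningful); the assembled prompt would then be sent to any LLM.

```python
import numpy as np

# Toy corpus; in practice the chunks and their vectors come from your
# documents and an embedding model. These vectors are random, for illustration.
documents = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday through Friday, 9-17 CET.",
    "Premium plans include priority support.",
]
doc_vectors = np.random.default_rng(42).normal(size=(len(documents), 8))

def embed(text: str) -> np.ndarray:
    """Fake embedding: a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=8)

def top_k(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(-sims)[:k]]

question = "When can I get a refund?"
context = "\n".join(top_k(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # this augmented prompt is what gets sent to the LLM
```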

# History

From Vanna.ai LinkedIn:

This started as a quick experiment back in May. Could we get ChatGPT to generate SQL for a stock market database? It worked ok as a proof of concept but only worked about 50% of the time. We were using OpenAI’s now-deprecated text-davinci-003 model. That feels like a lifetime ago now.

Then I stumbled upon a Hacker News article that talked about how to use the OpenAI “embeddings” API to generate “vectors” that you could use for semantic similarity search and then use that information as part of an LLM prompt. I actually built my own vector database in Go to store the embeddings, which seems like a crazy thing to do but no good self-hosted options existed at the time and now there are hundreds. We actually briefly considered becoming a vector database company.

We found that if you store enough example correct SQL queries and just search for queries that answer similar questions, the LLM can reliably use that example query to generate SQL that’s correct >90% of the time.

Now, the industry has a term for this method. It’s called RAG (Retrieval Augmented Generation).
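A hedged sketch of that method (not Vanna’s actual code): store verified question/SQL pairs, retrieve the pairs most similar to a new question, and include them in the prompt as few-shot examples. `similarity_search` and `llm_complete` stand in for whatever embedding model, vector store, and LLM are used, and the table and column names are invented.

```python
# Store of verified question -> SQL examples; schema names are invented.
examples = [
    {"question": "What was AAPL's closing price yesterday?",
     "sql": "SELECT close FROM prices WHERE ticker = 'AAPL' ORDER BY date DESC LIMIT 1;"},
    {"question": "What is the average volume for MSFT?",
     "sql": "SELECT AVG(volume) FROM prices WHERE ticker = 'MSFT';"},
]

def generate_sql(question: str, similarity_search, llm_complete, k: int = 3) -> str:
    """Retrieve similar solved examples and let the LLM imitate them."""
    # similarity_search() stands in for an embedding + vector-store lookup
    shots = similarity_search(question, examples, k=k)
    few_shot = "\n\n".join(f"Q: {ex['question']}\nSQL: {ex['sql']}" for ex in shots)
    prompt = (
        "Write a SQL query answering the last question, "
        "following the style of these verified examples.\n\n"
        f"{few_shot}\n\nQ: {question}\nSQL:"
    )
    return llm_complete(prompt)  # any LLM completion call
```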

More:

It worked really really well for our financial database. So we built a web app around it and generalized it so it would work with any SQL database. We released the web app… and the demos went really well but we found that people were unwilling to put their live database credentials into a random web app on the internet. We also talked to a number of large companies and they needed to customize it in ways that we couldn’t do with a small team.

So we decided to go open-source. This technology is going to revolutionize the way that people interact with databases — but only if they actually try it with their live database.

After workshopping many, many, many versions (we’re on the 35th release as of today), we made something that not only works really well, it works with any configuration that you need (and that your IT security will allow).

We’ve settled on a framework that lets you connect any SQL database, any LLM, any vector storage, and any front-end. You could run everything locally if you wanted to.

We have an article talking about how you can “Build a Chatbot for your SQL database in 20 lines of Python using Streamlit and Vanna”, which did reasonably well.

Then one of our users posted about us on Hacker News.

It blew up in a way that I was not expecting. We got a huge influx of users, stars on GitHub, tons of people joining our Discord, people submitting issues, pull requests, and everything else.

Thank you to everyone who’s helped us get here.

This is still just the beginning. ⏩


Origin: Blending Fine-Tuning and RAG for Collaborative Filtering with LLMs | by Anthony Alcaraz | Nov, 2023 | Medium
References:
Created 2023-11-07