# Retrieval-Augmented Generation (RAG)
Retrieval-augmented generation (RAG) is a technique that can answer queries more accurately than a generative large language model (LLM) on its own, because it supplements the model with knowledge external to the data already contained in the LLM.
- vector database
- alternative to training your own model
RAG is specifically designed to enhance existing LLMs without retraining them: it is an extension technique, not a training technique.
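To make the retrieve-augment-generate loop concrete, here is a minimal sketch in Python. All of the names (`embed`, `vector_store`, `llm`) are hypothetical placeholders for whatever embedding model, vector database, and LLM client you actually use; this illustrates the pattern, not any specific library's API.

```python
# Minimal RAG sketch. `embed`, `vector_store`, and `llm` are hypothetical
# placeholders for an embedding model, a vector database client, and an LLM client.
def answer_with_rag(question: str, vector_store, llm, embed, top_k: int = 3) -> str:
    # 1. Retrieve: embed the question and look up the most similar chunks
    #    of external knowledge in the vector database.
    query_vector = embed(question)
    chunks = vector_store.search(query_vector, top_k=top_k)

    # 2. Augment: put the retrieved text into the prompt.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # 3. Generate: the LLM answers grounded in the retrieved context;
    #    no retraining or fine-tuning of the model is involved.
    return llm.complete(prompt)
```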
# AI Architectures
RAG in combination with GenBI:
```mermaid
flowchart TB
    subgraph "User Interface Layer"
        Human["👤 Human User"]
        BITool["🖥️ BI Tool Interface"]
    end
    subgraph "LLM Integration Layer"
        LLMPool["Available LLMs Pool"]
        OpenAI & Claude & Gemini & Others --> LLMPool
    end
    subgraph "Natural Language Processing Layer"
        NLP["Natural Language Processing"]
        RAG["Retrieval-Augmented Generation (RAG)"]
        QueryEngine["Query Understanding Engine"]
    end
    subgraph "Data Layer"
        MetricsLayer["📊 Metrics Layer"]
        DataSources["💾 Data Sources"]
        VectorStore["Vector Store"]
    end

    %% User Interface Connections
    Human <-->|"Prompts/Feedback"| BITool
    BITool -->|"Natural Language Query"| NLP

    %% LLM Processing Flow
    LLMPool -->|"Base Knowledge"| NLP
    NLP -->|"Query Processing"| QueryEngine
    NLP -->|"Context Enhancement"| RAG

    %% Data Layer Connections
    RAG -->|"Retrieve Context"| VectorStore
    VectorStore -->|"Indexed Metrics"| MetricsLayer
    QueryEngine -->|"Execute Queries"| DataSources
    MetricsLayer -->|"Metric Definitions"| DataSources

    %% Feedback Loops
    QueryEngine -.->|"Query Refinement"| NLP
    RAG -.->|"Context Refinement"| NLP

    classDef interface fill:#f9f,stroke:#333,stroke-width:2px
    classDef storage fill:#fcf,stroke:#333,stroke-width:2px
    classDef process fill:#fff,stroke:#333,stroke-width:2px

    class Human,BITool interface
    class DataSources,VectorStore storage
    class NLP,RAG,QueryEngine process
```
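As a rough sketch of the RAG path in the diagram above (the "Context Enhancement" and "Retrieve Context" edges), the following assumes a hypothetical `metrics_store` vector index over metric definitions plus generic `embed`, `llm`, and `data_source` clients; none of these names come from a real GenBI tool.

```python
# Sketch of the "Context Enhancement" / "Retrieve Context" path in the diagram
# above. `metrics_store`, `embed`, `llm`, and `data_source` are hypothetical.
def answer_bi_question(nl_query: str, metrics_store, data_source, llm, embed):
    # RAG step: pull the metric definitions most relevant to the question
    # from the vector store that indexes the metrics layer.
    metrics = metrics_store.search(embed(nl_query), top_k=5)
    definitions = "\n".join(m.definition for m in metrics)

    # Query engine step: have the LLM translate the question into SQL,
    # grounded in the retrieved metric definitions.
    sql = llm.complete(
        f"Metric definitions:\n{definitions}\n\n"
        f"Write a SQL query that answers: {nl_query}"
    )

    # Execute the generated query against the underlying data sources.
    return data_source.execute(sql)
```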
Another image by Aurimas Griciūnas (source)
It has come to my attention that some of you have not realized how central the database is to RAG implementations. So I drew it out for y’all. As Hubert said: “It is a database + LLM”:
https://twitter.com/gwenshap/status/1768709687636271520 ^95372c
Knowledge Graphs as a source of trust for LLM-powered enterprise question answering by Juan Sequeda
Origin:
Daniel Svonava on LinkedIn: RAG + Knowledge Graphs cut customer support resolution time by 29.6%. 📉… | 85 comments
Source: Data Engineering Whitepapers
# Dead within a Year
I have similar thoughts, as we are re-creating what we already have in data engineering. There are already vector databases, as well as databases that support vectors or vector-based indices. I believe more existing databases will support this use case rather than completely new databases being built.
The key piece is where the transformation happens. Rather than locking these transformations away in another silo, we should add them to the orchestrators instead.
Here is another opinion:
My prediction is that RAG will be dead in a year or two.
RAG emerged as a workaround for LLMs’ limited context, retrieving external information to provide factual grounding. But RAG introduces many challenges of its own.
Two promising research directions towards LLMs overcoming these limitations intrinsically:
- Faster attention mechanisms like Flash Attention are enabling far larger context sizes. LLMs like Claude 2.1 now support 200k tokens - the size of a book! As context size increases, RAG becomes redundant.
- New architectures like Mamba and RWKV aim to replace transformers, reducing quadratic complexity for more efficient and scalable sequence modeling.
RAG allowed more knowledge to be included in the limited context of large language models. But if context capacity becomes virtually unlimited, RAG would be unnecessary, as the entire knowledge base could live within the context. (Donato Riccio on LinkedIn: The End of RAG? | 39 comments; full article: The End of RAG.)
# Examples
# History
From Vanna.ai LinkedIn:
This started as a quick experiment back in May. Could we get ChatGPT to generate SQL for a stock market database? It worked ok as a proof of concept but only worked about 50% of the time. We were using OpenAI’s now-deprecated text-davinci-003 model. That feels like a lifetime ago now.
Then I stumbled upon a Hacker News article that talked about how to use the OpenAI “embeddings” API to generate “vectors” that you could use for semantic similarity search and then use that information as part of an LLM prompt. I actually built my own vector database in Go to store the embeddings, which seems like a crazy thing to do but no good self-hosted options existed at the time and now there are hundreds. We actually briefly considered becoming a vector database company.
We found that if you store enough example correct SQL queries and just search for queries that answer similar questions, the LLM can reliably use that example query to generate SQL that’s correct >90% of the time.
Now, the industry has a term for this method. It’s called RAG (Retrieval Augmented Generation).
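A minimal sketch of this example-based approach, with hypothetical names rather than Vanna's actual API: known-correct (question, SQL) pairs are indexed by the embedding of their question, and the closest matches are retrieved as few-shot examples for the prompt.

```python
# Sketch of the example-based approach described above (hypothetical names,
# not Vanna's actual API). Known-correct (question, SQL) pairs are indexed by
# the embedding of their question and retrieved as few-shot examples.
def sql_from_examples(question: str, example_store, llm, embed) -> str:
    # Find previously validated SQL whose questions resemble this one.
    examples = example_store.search(embed(question), top_k=3)
    few_shot = "\n\n".join(
        f"Question: {ex.question}\nSQL: {ex.sql}" for ex in examples
    )

    # The retrieved examples steer the LLM toward the right tables and columns.
    return llm.complete(
        f"Here are similar questions with known-correct SQL:\n\n{few_shot}\n\n"
        f"Now write SQL for: {question}"
    )
```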
More:
It worked really really well for our financial database. So we built a web app around it and generalized it so it would work with any SQL database. We released the web app… and the demos went really well but we found that people were unwilling to put their live database credentials into a random web app on the internet. We also talked to a number of large companies and they needed to customize it in ways that we couldn’t do with a small team.
So we decided to go open-source. This technology is going to revolutionize the way that people interact with databases — but only if they actually try it with their live database.
After workshopping many, many, many versions (we’re on the 35th release as of today), we made something that not only works really well, it works with any configuration that you need (and that your IT security will allow).
We’ve settled on a framework that lets you connect any SQL database, any LLM, any vector storage, and any front-end. You could run everything locally if you wanted to.
We have an article about how you can “Build a Chatbot for your SQL database in 20 lines of Python using Streamlit and Vanna”, which did reasonably well.
Then one of our users posted about us on Hacker News.
It blew up in a way that I was not expecting. We got a huge influx of users, stars on GitHub, tons of people joining our Discord, people submitting issues, pull requests, and everything else.
Thank you to everyone who’s helped us get here.
This is still just the beginning. ⏩
Origin:
Blending Fine-Tuning and RAG for Collaborative Filtering with LLMs | by Anthony Alcaraz | Nov, 2023 | Medium
References:
Created 2023-11-07