Prompt design and sources curated by Greg Walters
At its core, RAG is a hybrid model that marries the depth and breadth of information retrieval with the sophisticated understanding and generation capabilities of language models. Traditional language models, while adept at generating coherent and contextually relevant text, are limited to the information they were trained on. They can't access or incorporate new information post-training, which restricts their usefulness in dynamic, real-world scenarios where up-to-date information is crucial.
Enter RAG, which addresses this limitation by dynamically retrieving information from external databases or documents during the generation process. This approach allows the model to pull in the most current and relevant information, ensuring that its responses are not just contextually appropriate but also factually accurate and up-to-date.
How does RAG work?
The process involves two key stages: retrieval and generation. In the retrieval stage, the model queries a large database or set of documents based on the input it receives. This query returns a set of documents or passages that are likely to contain relevant information. Next, in the generation stage, the model uses this retrieved information, along with the original input, to generate a response. This response is not only informed by the model's training but also enriched by the specific, real-time information it has just accessed.
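The two stages above can be sketched in a toy example. Everything here is illustrative: the corpus, the word-overlap scoring, and the prompt template are simplifying assumptions, not the mechanics of any particular RAG system (real pipelines typically use dense vector embeddings for retrieval and a language model for the final generation step).

```python
def tokenize(text):
    # Naive tokenizer: lowercase, split on whitespace.
    return set(text.lower().split())

def retrieve(query, corpus, k=2):
    """Retrieval stage: rank documents by word overlap with the query.
    (A stand-in for embedding-based similarity search.)"""
    scored = sorted(
        corpus,
        key=lambda doc: len(tokenize(query) & tokenize(doc)),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages):
    """Generation stage (input side): fuse the retrieved passages with the
    original query so the model's answer is grounded in them."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical mini-corpus for demonstration.
corpus = [
    "RAG retrieves documents at query time.",
    "Transformers were introduced in 2017.",
    "RAG was coined in a 2020 paper by Patrick Lewis.",
]

query = "Who coined RAG and when?"
passages = retrieve(query, corpus)
prompt = build_prompt(query, passages)
```

The resulting `prompt` would then be handed to a language model, whose answer is informed both by its training and by the freshly retrieved passages.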
“We definitely would have put more thought into the name had we known our work would become so widespread,” said Patrick Lewis, lead author of the 2020 paper that coined the term RAG, in an interview from Singapore, where he was sharing his ideas with a regional conference of database developers. “We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea,” said Lewis, who now leads a RAG team at AI startup Cohere.