How RAG boosts LLM accuracy and reduces AI hallucination

By now, I don’t doubt that you’ve all used a variety of LLM (Large Language Model) based GenAI chatbots – like OpenAI’s ChatGPT, Google Gemini or Microsoft’s Copilot (in the Edge browser). But having used them to generate content, have you ever wondered how these AI assistants manage to answer your questions with such seemingly expert fluency, even on unfamiliar topics? And what about the innate risk of the AI model enthusiastically generating responses that are inaccurate, misleading, or outright hallucinated?

This is where a powerful technique called Retrieval-Augmented Generation – or just RAG for short – comes in. Unlike the most commonly available GenAI chatbot models, which rely solely on internal knowledge accumulated during their training phase, RAG adds a much-needed twist: it consults external sources to enhance the overall quality and accuracy of generated responses. IBM defines RAG as an AI framework for retrieving facts from an external knowledge base to ensure LLMs have access to the most accurate, up-to-date information for generating relevant responses to user prompts.

How does the RAG AI model work?

In order to better understand what’s happening behind the scenes, imagine a librarian with an encyclopaedic memory – that’s essentially what an LLM aspires to be. LLMs are trained on massive amounts of text data, allowing them to generate human-quality text, translate languages, and answer your questions in an easy-to-understand fashion – just as a human librarian would type out a response to your query. But just as a librarian can’t possibly know everything and has finite limits on the time and breadth of their accumulated knowledge, LLMs too can struggle with factual accuracy or with serving up the most up-to-date information. This is where RAG steps in, acting as the ever-reliable research assistant that empowers the LLM to deliver responses that are more trustworthy, both in terms of recency of information and overall accuracy.

Also read: AI hallucination in LLM and beyond: Will it ever be fixed?

According to Luis Lastras, director of language technologies at IBM Research, “In a RAG system, you are asking the model to respond to a question by browsing through the content in a book, as opposed to trying to remember facts from memory.” This approach, for example, helps reduce hallucination in an AI chatbot’s responses.

Further extending the librarian analogy, think of RAG as a constantly updated research database at the librarian’s fingertips. Imagine the librarian can instantly access and integrate the latest scholarly articles, news reports, and factual resources alongside their own vast knowledge. This is the power of RAG – it empowers the LLM (the librarian) to generate responses that are not only human-quality but also grounded in the most current and reliable information available.

RAG = Retriever + Generator

Internally, RAG relies on two key elements working in tandem – the retriever and the generator, which power the context pipeline and the response pipeline, respectively. The retriever acts like a skilled researcher, scouring the knowledge store for relevant information based on a user’s query. The generator functions like a creative writer, using the retrieved information to craft a response that is both informative and factually coherent.

RAG works by first analysing the user’s query to grasp its meaning and intent. It then uses a powerful retrieval component that acts like a super-powered search engine, sifting through a vast knowledge base of text documents, articles, or specific databases, and pulls out the passages holding the most pertinent information for the user’s question. This retrieved information is then presented to the LLM, acting as a source of truth and factual grounding for the LLM’s response generation process. This interplay between the knowledge store and RAG’s internal components is what fuels its ability to deliver reliable and insightful responses – and this innate grounding in a set of retrieved facts is what helps reduce AI hallucination in LLMs and GenAI chatbots.
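To make the retriever-generator interplay concrete, here is a minimal, hypothetical sketch in Python. The keyword-overlap scoring and the function names (retrieve, build_grounded_prompt) are illustrative stand-ins only – real systems use learned embeddings, vector databases, and an actual LLM call in place of the toy pieces below.

```python
# Toy RAG pipeline: a keyword-overlap retriever plus prompt grounding.
# Everything here is illustrative; production systems rank by embedding
# similarity and send the final prompt to a real LLM.

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Generator input: the user's question plus retrieved facts as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using ONLY the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# A hypothetical knowledge store with a handful of documents.
corpus = [
    "RAG retrieves facts from an external knowledge base.",
    "LLMs are trained on massive amounts of text data.",
    "Grounding generation in retrieved facts reduces hallucination.",
]

passages = retrieve("How does RAG reduce hallucination?", corpus)
prompt = build_grounded_prompt("How does RAG reduce hallucination?", passages)
print(prompt)  # In a real system, this prompt is sent to the LLM (generator).
```

The key idea is simply that the generator never answers from memory alone: it is handed the retrieved passages as context and instructed to stay within them.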

Let’s take the example of ChatGPT to understand how RAG works in practice. If you prompt ChatGPT with a question about the latest developments in the field of quantum computing, it may not have the latest facts as part of its internal knowledge (accumulated during a training phase that most likely concluded weeks or months earlier). In this case, a RAG-enabled ChatGPT would first use the retriever component to search trusted sources for the most recent relevant articles or passages on the topic. After retrieving this information, ChatGPT would then incorporate the latest quantum computing developments into its generation process, producing a response that is not only coherent but also grounded in the most recent, factually accurate data.
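In practice, the retriever in such a system typically matches on embeddings (dense vectors that capture meaning) rather than raw keywords. The sketch below illustrates that idea with a deliberately crude hash-based vectoriser and cosine similarity; the embed function and the sample articles are hypothetical, and a real deployment would use a learned embedding model and a vector database instead.

```python
import math
from collections import Counter

def embed(text: str, dims: int = 64) -> list[float]:
    """Toy embedding: hash each word into a fixed-size count vector.
    Real systems use learned embedding models, not hashing."""
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dims] += count
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical fresh news snippets that the model's training data would lack.
articles = [
    "Researchers report a new error-correction milestone in quantum computing.",
    "A sports team won the championship final last night.",
]

query = "latest developments in quantum computing"
best = max(articles, key=lambda a: cosine(embed(query), embed(a)))
print(best)  # The freshest relevant passage is fed to the LLM for grounding.
```

Swapping word overlap for vector similarity is what lets a retriever find passages that are about the query even when they share few exact words with it.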

Applications of RAG

By implementing the RAG AI framework, GenAI applications can unlock a whole host of possibilities that LLMs otherwise lack. As NVIDIA rightly suggests, RAG-powered LLMs essentially allow users to have conversations with data repositories (information stored in files and folders), opening up new kinds of GenAI experiences.

Think of a doctor who can consult an AI assistant loaded with medical journals, hosted securely inside a hospital’s knowledge bank. This AI assistant could instantly answer questions, suggest treatment options, and even help with diagnoses. Similarly, financial advisors could get the latest trusted information from AI assistants that understand market trends and data powered by Bloomberg terminals or securities exchange portals.

But RAG isn’t just for doctors and finance experts. Any business can use RAG to turn their existing resources, like training manuals, customer service logs, or even videos, into a treasure trove of knowledge for AI. This is already becoming a game-changer for things like customer support chats, employee training programs, and even helping developers write code faster.

Limitations of RAG

While RAG is a powerful AI technique for improving any LLM’s generative accuracy, it’s not without limitations. The quality of RAG’s responses is only as good as its underlying knowledge base. RAG can perpetuate biases or errors present in these sources, potentially misleading users and failing at its most fundamental job. For example, if a RAG-based AI model used in a healthcare application relies on Wikipedia as its main source of up-to-date information, the resulting generative content may not be as reliable as it needs to be. This highlights the need for continual, critical evaluation of the information RAG retrieves from its reference sources.

Accurately interpreting user intent is another challenge for RAG AI models. OpenAI’s research indicates that even advanced LLMs struggle with tasks requiring complex reasoning and understanding of user intent, according to a Forbes report. If the LLM misunderstands the user’s query, RAG may retrieve irrelevant information, leading to a less-than-ideal response.

Keeping knowledge bases current is an ongoing battle: a McKinsey report estimates that global data volume doubles every two years. This constantly growing sea of information makes it difficult to ensure the retrieved data reflects the latest developments every single time.

Despite these limitations, RAG is a significant step forward for LLMs. Advancements in areas like knowledge base curation and enhanced contextual understanding can help address these shortcomings and make AI-powered responses even more reliable and trustworthy in the near future.

Also read: Rise of NPU: The most important ingredient of AI PCs

Jayesh Shinde

Executive Editor at Digit. Technology journalist since Jan 2008, with stints at Indiatimes.com and PCWorld.in. Enthusiastic dad, reluctant traveler, weekend gamer, LOTR nerd, pseudo bon vivant.
