How RAG Works – Step by Step

Retrieval-Augmented Generation (RAG) enhances language models by allowing them to pull information from external sources rather than relying solely on pre-trained knowledge. Below is a detailed breakdown of how RAG operates:

Step-by-Step RAG Process

1️⃣ User Query Input

  • A user submits a question, e.g., “Can it do PDF exports?”
  • On its own the question lacks context (it is unclear what “it” refers to), so it needs refinement.

2️⃣ LLM Rephrases the Question

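  • The LLM uses the chat history to rewrite the query as a standalone question. For example, if the user previously asked “What features does the Pro plan include?”, the follow-up “Can it do PDF exports?” becomes:
    “Can the Pro plan export PDFs?”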
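
A minimal sketch of how the rewrite prompt for this step might be built. The function name `build_condense_prompt` and the prompt wording are assumptions for illustration, not a specific library's API; the resulting string would be sent to whichever LLM you use.

```python
def build_condense_prompt(chat_history, follow_up):
    """Build a prompt asking an LLM to rewrite follow_up as a standalone question."""
    # Render each (speaker, text) turn of the history on its own line.
    history_text = "\n".join(f"{speaker}: {text}" for speaker, text in chat_history)
    return (
        "Given the conversation below, rewrite the follow-up question "
        "so it can be understood without the conversation.\n\n"
        f"Conversation:\n{history_text}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


# Hypothetical chat history matching the example in this step.
chat_history = [("User", "What features does the Pro plan include?")]
condense_prompt = build_condense_prompt(chat_history, "Can it do PDF exports?")
print(condense_prompt)
```

The prompt ends with "Standalone question:" so that the model's completion is the rewritten question and nothing else.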

3️⃣ Semantic Search Activation

  • The standalone question is converted into a vector embedding.
  • A vector database (e.g., ChromaDB, FAISS, Pinecone) retrieves relevant document chunks via similarity search.
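
To make the similarity search concrete, here is a small sketch using only the Python standard library. The three-dimensional vectors and chunk names are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database would replace the brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy embeddings (hypothetical values, not produced by a real model).
chunk_vecs = {
    "pro_plan_features": [0.9, 0.1, 0.0],
    "enterprise_api": [0.1, 0.9, 0.0],
    "billing_faq": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]
nearest = top_k(query_vec, chunk_vecs, k=2)
print(nearest)
```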

4️⃣ Prompt Assembly

  • A QA Chain combines:
      • The standalone question
      • Retrieved context chunks
      • Predefined answer template
  • Example prompt structure:
    Answer the question based on the context: 
    Question: {standalone_question} 
    Context: {retrieved_documents} 
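
A minimal sketch of this assembly step, reusing the template above. The function name `assemble_prompt` and the choice to separate chunks with blank lines are assumptions for illustration.

```python
PROMPT_TEMPLATE = (
    "Answer the question based on the context:\n"
    "Question: {standalone_question}\n"
    "Context: {retrieved_documents}\n"
)


def assemble_prompt(standalone_question, chunks):
    """Fill the answer template with the question and the retrieved chunks."""
    # Separate chunks with a blank line so the model can tell them apart.
    retrieved_documents = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(
        standalone_question=standalone_question,
        retrieved_documents=retrieved_documents,
    )


prompt = assemble_prompt(
    "Can the Pro plan export PDFs?",
    ["The Pro plan includes PDF exports.", "Enterprise has API access."],
)
print(prompt)
```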
    

5️⃣ LLM Processes the Full Prompt

  • The model generates an answer using both its internal knowledge and the retrieved external data.

6️⃣ Final Answer Generation

  • The response is grounded in the retrieved context, so it can cover information that was not part of the model’s original training data.

Use Cases for RAG

✔ Internal Knowledge Assistants (e.g., company docs)

✔ Support Chatbots (dynamic FAQ responses)

✔ Legal & Policy Q&A (up-to-date compliance info)

You Should Know: Essential RAG Implementation Commands & Code

1. Setting Up a Vector Database (ChromaDB)

import chromadb

# Initialize an in-memory ChromaDB client
client = chromadb.Client()

# Create a collection
collection = client.create_collection("knowledge_base")

# Add documents; ChromaDB embeds them with its default embedding function
collection.add(
    documents=["The Pro plan includes PDF exports.", "Enterprise has API access."],
    ids=["doc1", "doc2"],
)

# Query the database
results = collection.query(
    query_texts=["Does the Pro plan support PDF exports?"],
    n_results=2,
)
print(results)
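
The printed result is a nested structure with one inner list per query text. The sketch below shows one way to flatten it into per-document records. It runs on a hand-written dictionary in that shape rather than on a live database; the helper name `flatten_query_results` and the distance values are made up for illustration.

```python
def flatten_query_results(results, query_index=0):
    """Turn the nested query result into a list of per-document records."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    return [
        {"id": doc_id, "text": text, "distance": distance}
        for doc_id, text, distance in zip(ids, documents, distances)
    ]


# Hand-written sample in the nested-list layout; distances are hypothetical.
sample_results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["The Pro plan includes PDF exports.", "Enterprise has API access."]],
    "distances": [[0.31, 1.42]],
}
hits = flatten_query_results(sample_results)
for hit in hits:
    print(hit["id"], hit["distance"], hit["text"])
```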

2. Generating Embeddings (Using OpenAI)

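from openai import OpenAI

# Replace with your own key, or omit api_key to read OPENAI_API_KEY from the environment
client = OpenAI(api_key="your_api_key")

response = client.embeddings.create(
    input="What features does the Pro plan include?",
    model="text-embedding-3-small",
)

# The response holds one embedding per input; take the first
embedding = response.data[0].embedding
print(embedding)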

3. Semantic Search with FAISS

import faiss
import numpy as np

# Generate random embeddings (example)
dim = 768  # Embedding dimension (example value; must match your embedding model)
data = np.random.rand(100, dim).astype('float32')

# Build FAISS index (exact search using L2 distance)
index = faiss.IndexFlatL2(dim)
index.add(data)

# Perform a search
query_embedding = np.random.rand(1, dim).astype('float32')
k = 3  # Number of nearest neighbors
distances, indices = index.search(query_embedding, k)
print("Nearest docs:", indices)

4. Running a QA Chain (LangChain Example)

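from langchain.chains import RetrievalQA
from langchain.llms import OpenAI  # import path varies by LangChain version

# vector_db is assumed to be an existing LangChain vector store (e.g., Chroma or FAISS)
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",  # "stuff" places all retrieved chunks into a single prompt
    retriever=vector_db.as_retriever(),
)

response = qa_chain.run("Can the Pro plan export PDFs?")
print(response)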

What Undercode Says

RAG bridges the gap between static LLM knowledge and dynamic real-world data. By integrating vector databases and semantic search, AI applications become more accurate and adaptable. Future advancements may include:
  • Hybrid search (combining keyword + vector search)
  • Self-updating knowledge bases (automated doc ingestion)
  • Multi-modal RAG (text + images + audio)
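
As an illustration of the hybrid search idea, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is one common fusion method, chosen here as an assumption; the post does not prescribe a particular technique, and the document ids and rankings are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings from a keyword search and a vector search.
keyword_ranking = ["doc2", "doc1", "doc3"]
vector_ranking = ["doc1", "doc3", "doc2"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

The constant `k=60` is a conventional default that dampens the influence of any single list's top position; it can be tuned.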

For cybersecurity applications, RAG can enhance threat intelligence by pulling the latest CVE databases or malware analysis reports.

Expected Output

A fully functional RAG system that retrieves and generates answers based on external knowledge, improving accuracy and reducing hallucinations in AI responses.

Prediction

RAG will become a standard in enterprise AI, reducing dependency on fine-tuning and enabling real-time knowledge updates. Future models may integrate self-correcting RAG, where incorrect retrievals trigger automatic re-searches.
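
One way the self-correcting idea could look in code is a retry loop: if no retrieved chunk is close enough to the query, the query is rewritten and the search runs again. This is a speculative sketch; `retrieve_with_retry`, the distance threshold, and the stub `search` and `rewrite` functions are all hypothetical stand-ins for a real vector database and LLM.

```python
def retrieve_with_retry(question, search, rewrite, max_distance=0.5, max_attempts=3):
    """Search, and if no hit is close enough, rewrite the query and search again."""
    query = question
    for attempt in range(1, max_attempts + 1):
        hits = search(query)
        good_hits = [hit for hit in hits if hit["distance"] <= max_distance]
        if good_hits:
            return good_hits, attempt
        # Nothing relevant was found: reformulate the query before retrying.
        query = rewrite(query)
    return [], max_attempts


def search(query):
    # Stub retriever: only queries that name the plan get a close match.
    distance = 0.2 if "Pro plan" in query else 0.9
    return [{"id": "doc1", "distance": distance}]


def rewrite(query):
    # Stub rewriter: a real system would ask an LLM to rephrase the query.
    return query.replace(" it ", " the Pro plan ")


good_hits, attempts = retrieve_with_retry("Can it export PDFs?", search, rewrite)
print(good_hits, attempts)
```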

(URLs for further reading: LangChain RAG, ChromaDB Docs)

Reported By: Ninadurann How – Hackers Feeds