The problem RAG solves
You have tried asking ChatGPT about your business and it made things up. You asked Claude to answer customer questions and it gave generic responses. The AI is smart - it just does not know anything about your company. It has never seen your product documentation, your pricing, your policies, your internal processes, or your customer data.
This is the fundamental limitation of general-purpose AI models. They were trained on public internet data. They know about the world in general, but they know nothing about your business in particular. Ask Claude about your returns policy and it will either hallucinate a plausible-sounding policy that is completely wrong, or tell you it does not have that information.
Fine-tuning - training a model on your data - sounds like the obvious solution, but it is expensive (thousands to tens of thousands of dollars), slow (weeks to set up), requires clean training data (which most businesses do not have), and needs to be redone every time your information changes. For most businesses, fine-tuning is overkill and impractical.
RAG is the alternative that actually works.
What RAG is - the library analogy
RAG stands for Retrieval-Augmented Generation. The name is technical but the concept is simple. Think of it like hiring a brilliant researcher who has never worked in your industry.
If you sit them down and ask them questions cold, they will give you general answers based on what they already know. Some will be useful. Some will be wrong. They are smart but uninformed.
Now give that researcher full access to your company library - every document, policy, product spec, FAQ, training manual, and customer communication you have ever written. Tell them: before you answer any question, go find the most relevant documents first, read them, and then answer based on what you found.
That is RAG. The AI model is the smart researcher. Your business documents are the library. The retrieval system is what finds the right documents. The generation step is the AI composing an answer based on what it retrieved.
The result is an AI that answers questions using your actual data, cites its sources, and stays current as your documents change - without any model training or fine-tuning.
How RAG works: step by step
Here is what happens under the hood, explained without assuming any technical background.
Step 1 - Prepare your documents
Your business documents - knowledge bases, SOPs, product docs, FAQs, policies, training manuals, customer guides - are collected and broken into chunks. Chunking is simply cutting long documents into smaller, manageable pieces. A 50-page employee handbook might be broken into 200 chunks of a few paragraphs each.
The chunk size matters. Too small and each chunk lacks context - a paragraph about your returns policy is less useful without the surrounding information about what qualifies for a return. Too large and the retrieval becomes imprecise - sending the entire handbook when someone asks about parking is wasteful and confusing. Finding the right chunk size for your content type is part of the setup process.
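To make chunking concrete, here is a minimal sketch in Python. The character limits and overlap are illustrative defaults, not recommendations - real pipelines often chunk on paragraph or heading boundaries instead of raw character counts.

```python
def chunk_text(text, max_chars=800, overlap=100):
    """Split a long document into chunks of roughly max_chars characters.
    Consecutive chunks overlap so that context at a chunk boundary
    (e.g. a sentence that straddles two chunks) is not lost."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so chunks share some context
    return chunks
```

A 50-page handbook run through a function like this yields a few hundred chunks, each small enough to retrieve and send to the model individually.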
Step 2 - Create embeddings
Each chunk is converted into a vector embedding - a list of numbers that represents the meaning of that text. This is not keyword matching. The embedding for "our refund policy for sale items" and "can I return something I bought on discount?" would be very similar numerically, even though they share almost no words.
This is the breakthrough that makes RAG work. Traditional search finds documents that contain the same words as your query. Vector search finds documents that mean the same thing as your query. For business knowledge bases where people ask questions in dozens of different ways, this is transformative.
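The "closeness" of two embeddings is usually measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors purely to illustrate the calculation - real embeddings have hundreds or thousands of dimensions and come from an embedding model, and the numbers here are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two vectors: close to 1.0 means similar meaning,
    close to 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, hand-made for illustration:
policy_chunk = [0.9, 0.1, 0.2]       # "our refund policy for sale items"
user_question = [0.85, 0.15, 0.25]   # "can I return something bought on discount?"
unrelated_chunk = [0.1, 0.9, 0.05]   # "office parking arrangements"

# The question and the policy chunk share almost no words, yet their
# vectors point in nearly the same direction - so they score as similar.
assert cosine_similarity(policy_chunk, user_question) > cosine_similarity(policy_chunk, unrelated_chunk)
```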
Step 3 - Store the vectors
The embeddings are stored in a vector database alongside the original text chunks. We typically use Supabase with pgvector, which means your vectors live in the same PostgreSQL database as the rest of your application data. No separate infrastructure to manage. Other options include Pinecone, Weaviate, and Qdrant, but for most business applications, pgvector is simpler and cheaper.
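For the pgvector route, the schema is a single table with a vector column. The SQL below is an illustrative sketch - the table and column names are examples, and `VECTOR(1536)` assumes a 1,536-dimension embedding model; the dimension must match whatever model you use. The `<=>` operator is pgvector's cosine-distance operator.

```python
# Illustrative pgvector schema - names and dimensions are example choices.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document_chunks (
    id BIGSERIAL PRIMARY KEY,
    source_document TEXT NOT NULL,   -- e.g. 'employee-handbook.pdf'
    content TEXT NOT NULL,           -- the original chunk text
    embedding VECTOR(1536)           -- must match your embedding model's dimension
);
"""

def similarity_query(top_k=5):
    """SQL that finds the top_k chunks closest to a query embedding.
    <=> is pgvector's cosine-distance operator (smaller = more similar)."""
    return f"""
    SELECT content, source_document
    FROM document_chunks
    ORDER BY embedding <=> %(query_embedding)s
    LIMIT {top_k};
    """
```

Because the chunks and vectors sit in ordinary PostgreSQL rows, you can join them against the rest of your application data with plain SQL.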
Step 4 - Retrieve at query time
When someone asks a question, that question is also converted into a vector embedding using the same model. The system then performs a similarity search against all your stored vectors, finding the chunks that are semantically closest to the question. Typically, the top 5-10 most relevant chunks are retrieved.
This happens in milliseconds. Even with tens of thousands of document chunks, vector similarity search is fast.
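Conceptually, the retrieval step is just "rank every stored chunk by similarity to the question and keep the top k". This in-memory sketch stands in for the vector database - the chunks and 2-dimensional embeddings are invented for illustration.

```python
import math

def top_k_chunks(query_vec, stored, k=5):
    """Return the k stored chunks whose embeddings are closest to query_vec.
    'stored' is a list of (chunk_text, embedding) pairs - an in-memory
    stand-in for the vector database."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    ranked = sorted(stored, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy store: three chunks with hand-made 2-d embeddings.
store = [
    ("Returns are accepted within 30 days.", [0.9, 0.1]),
    ("Parking permits are issued by reception.", [0.1, 0.9]),
    ("Sale items may be returned within 14 days.", [0.8, 0.2]),
]

# A returns-related query vector ranks both returns chunks above parking.
print(top_k_chunks([0.95, 0.05], store, k=2))
```

A real vector database does the same ranking with an approximate index so it stays fast at scale, rather than scoring every chunk one by one.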
Step 5 - Generate an answer
The retrieved chunks are sent to the AI model along with the original question and a system prompt that says something like: "Answer the user's question based only on the following context. If the context does not contain enough information to answer, say so." The AI reads the relevant documents and composes a natural-language answer grounded in your actual data.
Because the AI is told to answer only from the provided context, it is far less likely to hallucinate. It is working from your documents, not making things up from its general training data.
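Assembling that grounded prompt is plain string construction. This sketch shows one way to lay it out - the exact wording and source-labelling format are illustrative choices, not a fixed standard.

```python
def build_prompt(question, retrieved_chunks):
    """Assemble the grounded prompt sent to the model: system instruction,
    the retrieved context labelled by source, then the user's question."""
    context = "\n\n".join(
        f"[Source {i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the user's question based only on the following context. "
        "If the context does not contain enough information to answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is the returns window for sale items?",
    ["Sale items may be returned within 14 days."],
)
```

Labelling each chunk lets the model cite which source its answer came from, which is what makes the answers auditable.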
Real business use cases
RAG is not theoretical. Here are concrete applications we build for clients as part of our agent development service.
Customer support agent
A customer asks: "What is your returns policy for items bought on sale?"
Without RAG, the AI either makes something up or gives a generic answer. With RAG, it retrieves your actual returns policy document, finds the section about sale items, and gives an accurate answer with the correct timeframes, conditions, and any exceptions specific to discounted products.
Scale this across hundreds of policy questions, product queries, and account-specific issues and you have a support agent that handles 60-80% of incoming questions accurately, available 24/7, with instant response times. Human agents focus on the complex cases that actually need a person.
Internal knowledge assistant
An employee asks: "What is the process for requesting time off if I am a contractor?"
The RAG system retrieves the relevant HR policy, distinguishes between employee and contractor processes, and gives the specific steps with the correct approval chain. It knows the difference because it is reading from your actual HR documentation, not guessing based on general knowledge of how companies typically handle time-off requests.
For companies with 50+ employees, an internal knowledge assistant reduces the load on HR and operations teams dramatically. New starters get instant answers to onboarding questions. Managers find policy information without trawling through SharePoint. The knowledge assistant works at 3am when no one from HR is available.
Document Q&A for complex domains
A legal firm uploads 500 contracts to a RAG system. A partner asks: "Which of our client contracts have change-of-control clauses, and what are the notification periods?" The system retrieves the relevant clause from each contract and provides a summary with specific references.
This kind of analysis would take a junior associate days of manual review. The RAG system provides a first draft in seconds, which the associate can then verify and refine. It does not replace legal judgement - it eliminates the tedious search-and-compile work that consumes most of the time.
Sales enablement
A salesperson asks: "What features do we have that competitor X does not?"
The system retrieves your competitive analysis docs, product feature list, and recent release notes to give an up-to-date comparison. Because you update the source documents when features launch or competitors change, the AI's answers stay current automatically - no retraining needed.
Costs: what to expect
RAG is significantly cheaper than fine-tuning. Here is a rough breakdown for a typical business deployment.
Document processing (one-time): Converting your documents to embeddings costs pennies per document. Processing an entire knowledge base of 1,000 documents might cost $5-20 in embedding API fees.
Vector storage: With Supabase pgvector, this is included in your existing database hosting. With a dedicated vector database like Pinecone, expect $70-200/month depending on data volume.
Query costs: Each question requires an embedding call (a fraction of a cent) plus an LLM call to generate the answer (anywhere from a fraction of a cent to a few cents per query, depending on the model and context length). At 1,000 queries per day with an efficient model, you are looking at $30-150/month in AI API costs.
Total cost for most businesses: $100-500/month in running costs once built, depending on volume. Compared to a single customer support agent's salary, this is transformative economics.
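The query-cost arithmetic is simple enough to sanity-check yourself. The figures below are the rough ranges from this article, not quotes - actual per-query cost depends on your model choice and how much context each answer needs.

```python
def monthly_query_cost(queries_per_day, cost_per_query_cents, days=30):
    """Rough monthly AI API cost in dollars. cost_per_query_cents covers
    the embedding call plus the LLM call for one answered question."""
    return queries_per_day * days * cost_per_query_cents / 100

# 1,000 queries/day at 0.1-0.5 cents per query with an efficient model:
low_estimate = monthly_query_cost(1000, 0.1)   # ~$30/month
high_estimate = monthly_query_cost(1000, 0.5)  # ~$150/month
```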
RAG vs fine-tuning: when to use each
| Factor | RAG | Fine-tuning |
|--------|-----|-------------|
| Setup time | Hours to days | Weeks to months |
| Cost | Low (document processing) | High (GPU training) |
| Updating knowledge | Add new documents instantly | Retrain the model |
| Accuracy | Cites specific sources | Can hallucinate training data |
| Data privacy | Data stays in your infrastructure | Data sent to training pipeline |
| Best for | Factual Q&A, document search, support | Tone/style changes, specialised reasoning |
Use RAG when you need the AI to answer questions using specific, up-to-date business information. Use fine-tuning when you need the AI to reason differently - for example, writing in your exact brand voice or performing industry-specific analysis that requires specialised knowledge baked into the model's weights.
For most business applications, RAG is the right choice. Fine-tuning is for edge cases where the model needs to think differently, not just know more.
Limitations to be honest about
RAG is powerful but not magic. Here are the real limitations.
Quality depends on your documents. RAG is only as good as your source material. If your knowledge base is outdated, contradictory, or poorly written, your AI will give outdated, contradictory, or confusing answers. The first step in any RAG project is auditing and cleaning your documents. This is unsexy but essential.
It cannot reason beyond the documents. RAG retrieves and synthesises. It does not create new knowledge. If the answer to a question is not in your documents (even implicitly), RAG cannot help. It will either say it does not know (good) or stretch the available context to give a partial answer (less good).
Complex multi-step questions are harder. "What is our returns policy?" works brilliantly. "Compare our Q3 performance across all regions, factor in the seasonal adjustment methodology from the finance handbook, and recommend next quarter's targets" is asking RAG to retrieve from multiple document types and reason across them - which is possible but requires careful architecture.
Retrieval quality is everything. The AI's answer is only as good as the documents it retrieves. If the retrieval step returns the wrong documents - because the question is ambiguous, the chunks are poorly structured, or the embedding model struggles with your domain terminology - the answer will be wrong no matter how smart the AI model is. Testing and tuning retrieval quality is the most important part of a RAG deployment.
Getting started
RAG is built into most of the AI agents we develop at Bloodstone. Whether you need a customer-facing support agent, an internal knowledge assistant, or a document search tool, RAG is the architecture we use to ground AI in your actual business data.
The first step is understanding what documents you have and what questions people need answered. From there, we can typically scope and build a RAG-powered agent in 2-3 weeks. The process involves auditing your documents, setting up the vector infrastructure, building the retrieval and generation pipeline, and testing with real questions from your team.
If you are exploring how AI can help your business but want it grounded in your actual data rather than generic responses, let's talk about your use case. We will tell you honestly whether RAG is the right approach, what it will cost, and how long it will take.
For a broader view of how AI strategy fits into your business, our AI strategy service covers everything from identifying the right use cases to building a roadmap for implementation.
Need help with this?
Bloodstone Projects helps businesses implement the strategies covered in this article. Talk to us about AI Agent Development.
Get in touch