# RAG
RAG stands for Retrieval-Augmented Generation: a pattern where the workflow searches trusted content first, then passes the matching content to the model before asking it to answer.
Use RAG when an answer must be grounded in specific documents, page content, policies, product data, or knowledge that may not be in the model.
## Why RAG exists
A chat model answers from the context it can see and from patterns learned during training. If your workflow asks about private, recent, or highly specific information without providing that information, the model may guess.
RAG changes the question from:
“What do you know about this?”
to:
“Using these retrieved sources, answer this question.”
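That second framing shows up directly in the prompt. The sketch below is a minimal illustration of it; the wording and the numbered-source format are assumptions, not a required template.

```ts
// A minimal sketch of a source-grounded prompt. The instruction wording and
// the "[n]" source numbering are illustrative assumptions, not a fixed format.
function buildGroundedPrompt(question: string, sources: string[]): string {
  const numbered = sources
    .map((text, i) => `[${i + 1}] ${text}`)
    .join("\n\n");
  return [
    "Answer the question using ONLY the sources below.",
    'If the sources do not contain the answer, reply "Not found in sources."',
    "",
    "Sources:",
    numbered,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```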
## The RAG pipeline
```mermaid
flowchart TD
  Docs["Source documents or page text"] --> Split["Split into chunks"]
  Split --> Embed["Create embeddings"]
  Embed --> Store["Store in vector store"]
  Question["User question"] --> QEmbed["Embed question"]
  QEmbed --> Search["Search vector store"]
  Store --> Search
  Search --> Context["Top matching chunks"]
  Context --> Prompt["Prompt with sources"]
  Prompt --> Model["Chat model"]
  Model --> Answer["Answer with source grounding"]
  style Store fill:#e8f5e9,stroke:#2e7d32
  style Search fill:#e1f5fe,stroke:#0277bd
  style Answer fill:#fff3e0,stroke:#ef6c00
```
There are two phases: indexing and answering. Indexing prepares the knowledge. Answering searches that knowledge and asks the model to respond from the retrieved context.
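A compact sketch of both phases, assuming an in-memory store and a hypothetical `embed` function standing in for whatever embeddings dependency the workflow uses:

```ts
// Hypothetical embeddings call; stands in for the workflow's embeddings dependency.
declare function embed(text: string): Promise<number[]>;

type IndexedChunk = { text: string; vector: number[] };

// Phase 1: indexing — embed each chunk and store it with its vector.
async function indexChunks(chunks: string[]): Promise<IndexedChunk[]> {
  const store: IndexedChunk[] = [];
  for (const text of chunks) {
    store.push({ text, vector: await embed(text) });
  }
  return store;
}

// Phase 2: answering — embed the question, rank stored chunks by similarity,
// and return the top matches to place in the prompt.
async function retrieve(store: IndexedChunk[], question: string, k = 4): Promise<string[]> {
  const q = await embed(question);
  return store
    .map((c) => ({ text: c.text, score: cosine(q, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((c) => c.text);
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```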
## Indexing decisions
| Decision | Tradeoff |
|---|---|
| Chunk size | Smaller chunks improve precision; larger chunks preserve context |
| Chunk overlap | Helps avoid cutting important meaning at boundaries |
| Metadata | Lets you filter by source, date, page, category, or customer |
| Embedding model | Affects search quality and cost |
| Re-index schedule | Keeps answers current when source content changes |
Bad indexing leads to bad retrieval. If the correct source never appears in search results, the model cannot reliably answer from it.
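For illustration, here is a minimal fixed-size splitter with overlap. Real splitters usually also respect sentence or paragraph boundaries; the default sizes below are placeholder assumptions, not recommendations.

```ts
// Fixed-size chunking with overlap. Each chunk repeats the last `overlap`
// characters of the previous one, so meaning at a boundary appears in both.
function splitText(text: string, chunkSize = 800, overlap = 100): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // assumes overlap < chunkSize
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```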
## Answering decisions
| Decision | Tradeoff |
|---|---|
| Number of results | Too few misses context; too many adds noise |
| Similarity threshold | Higher thresholds reduce weak matches but may return nothing |
| Prompt wording | Should tell the model to answer only from provided sources |
| Citation format | Helps users verify the answer |
| Fallback behavior | Defines what to do when retrieval finds nothing useful |
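The first two rows combine into one selection step, as in the sketch below. The `k = 4` and `0.75` values are illustrative placeholders, not recommended settings, and an empty result is what triggers the fallback behavior in the last row.

```ts
type Scored = { text: string; score: number };

// Apply a similarity threshold and a result cap together. May return an
// empty array, which the caller should treat as "nothing useful found".
function selectContext(ranked: Scored[], k = 4, minScore = 0.75): string[] {
  return ranked
    .filter((c) => c.score >= minScore) // drop weak matches
    .sort((a, b) => b.score - a.score)
    .slice(0, k)                        // cap noise from extra chunks
    .map((c) => c.text);
}
```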
## What “grounded” means
A grounded answer should be traceable to retrieved context. It should not add unsupported facts just because they sound plausible.
```mermaid
flowchart LR
  Retrieved["Retrieved source text"] --> Claim["Answer claim"]
  Claim --> Verify{"Can the claim be traced?"}
  Verify -->|Yes| Keep["Keep in answer"]
  Verify -->|No| Remove["Remove or mark unknown"]
```
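In code form, the trace-or-remove rule might look like the sketch below, assuming the answer cites sources with a `[n]` marker. Both the citation format and the `verifyClaims` helper are hypothetical.

```ts
// Keep claims that cite a real retrieved source; mark the rest as unknown.
// The "[unverified]" marker is an illustrative choice, not a standard.
function verifyClaims(
  claims: { text: string; citedSource?: number }[],
  sourceCount: number
): string[] {
  return claims.map((claim) =>
    claim.citedSource !== undefined &&
    claim.citedSource >= 1 &&
    claim.citedSource <= sourceCount
      ? claim.text                      // traceable: keep in answer
      : `[unverified] ${claim.text}`    // not traceable: mark unknown
  );
}
```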
## Common failure modes
| Symptom | Likely cause | Fix |
|---|---|---|
| Answer is generic | No relevant context was retrieved | Improve chunking, metadata, or query |
| Answer mixes sources incorrectly | Too many unrelated chunks | Lower result count or add filters |
| Answer invents details | Prompt allows guessing | Require “not found in sources” fallback |
| Correct source is missing | Index is stale or incomplete | Re-index source content |
| Answer ignores source | Prompt is too broad | Put source-grounding rules near the task |
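The “not found in sources” fallback can be enforced before the model is ever called. In the sketch below, `askModel` is a hypothetical chat-model call; the point is that empty retrieval short-circuits to a fixed response instead of letting the model guess.

```ts
// Hypothetical chat-model call; stands in for the workflow's model node.
declare function askModel(prompt: string): Promise<string>;

async function answer(question: string, context: string[]): Promise<string> {
  if (context.length === 0) {
    return "Not found in sources."; // deterministic fallback, no guessing
  }
  const prompt =
    "Answer only from the sources below. " +
    'If they do not contain the answer, say "Not found in sources."\n\n' +
    context.map((c, i) => `[${i + 1}] ${c}`).join("\n\n") +
    `\n\nQuestion: ${question}`;
  return askModel(prompt);
}
```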
## Agentic WorkFlow pattern
For a browser-based RAG workflow:
- Extract content with Get All Text or URL content nodes.
- Split text with a text splitter dependency.
- Create embeddings with an embeddings dependency.
- Store content in Local Knowledge.
- Ask with RAG Agent or a Tools Agent connected to retrieval.