RAG

RAG stands for Retrieval-Augmented Generation. It is a pattern where the workflow searches trusted content first, then gives the matching content to the model before asking it to answer.

Use RAG when an answer must be grounded in specific documents, page content, policies, product data, or knowledge that may not be in the model.

A chat model answers from the context it can see and from patterns learned during training. If your workflow asks about private, recent, or highly specific information without providing that information, the model may guess.

RAG changes the question from:

“What do you know about this?”

to:

“Using these retrieved sources, answer this question.”
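
As a sketch of that shift, the snippet below assembles a RAG prompt from retrieved chunks. The `buildRagPrompt` helper, the source names, and the instruction wording are all illustrative, not a fixed API.

```ts
// Illustrative sketch: inline retrieved chunks ahead of the task instead of
// asking the model what it already knows. All names here are placeholders.

interface RetrievedChunk {
  source: string; // e.g. a URL or document title
  text: string;   // the matched passage
}

function buildRagPrompt(question: string, chunks: RetrievedChunk[]): string {
  const sources = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join("\n\n");
  return [
    "Answer using ONLY the sources below.",
    "If the sources do not contain the answer, reply “not found in sources”.",
    "",
    "Sources:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}

console.log(
  buildRagPrompt("What is the refund window?", [
    { source: "policies/returns.md", text: "Refunds are accepted within 30 days." },
  ])
);
```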

```mermaid
flowchart TD
  Docs["Source documents or page text"] --> Split["Split into chunks"]
  Split --> Embed["Create embeddings"]
  Embed --> Store["Store in vector store"]
  Question["User question"] --> QEmbed["Embed question"]
  QEmbed --> Search["Search vector store"]
  Store --> Search
  Search --> Context["Top matching chunks"]
  Context --> Prompt["Prompt with sources"]
  Prompt --> Model["Chat model"]
  Model --> Answer["Answer with source grounding"]

  style Store fill:#e8f5e9,stroke:#2e7d32
  style Search fill:#e1f5fe,stroke:#0277bd
  style Answer fill:#fff3e0,stroke:#ef6c00
```

There are two phases: indexing and answering. Indexing prepares the knowledge. Answering searches that knowledge and asks the model to respond from the retrieved context.

Key indexing decisions:

| Decision | Tradeoff |
| --- | --- |
| Chunk size | Smaller chunks improve precision; larger chunks preserve context |
| Chunk overlap | Helps avoid cutting important meaning at boundaries |
| Metadata | Lets you filter by source, date, page, category, or customer |
| Embedding model | Affects search quality and cost |
| Re-index schedule | Keeps answers current when source content changes |
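
To make the chunk size and overlap tradeoffs concrete, here is a minimal fixed-size splitter. The 800-character size and 100-character overlap are placeholder values, and real splitters usually respect sentence or token boundaries rather than cutting at raw character offsets.

```ts
// Minimal sketch of fixed-size chunking with overlap, assuming plain text.
// Sizes are illustrative; tune them for your content and embedding model.

function chunkText(text: string, chunkSize = 800, overlap = 100): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap; // overlap keeps boundary meaning intact
  }
  return chunks;
}
```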

Bad indexing leads to bad retrieval. If the correct source never appears in search results, the model cannot reliably answer from it.

Key answering decisions:

| Decision | Tradeoff |
| --- | --- |
| Number of results | Too few misses context; too many adds noise |
| Similarity threshold | Higher thresholds reduce weak matches but may return nothing |
| Prompt wording | Should tell the model to answer only from provided sources |
| Citation format | Helps users verify the answer |
| Fallback behavior | Defines what to do when retrieval finds nothing useful |
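
The result count, similarity threshold, and fallback rows above can be sketched as one retrieval function. `cosineSimilarity` and `retrieve` are illustrative helpers over plain number arrays, not any specific vector store's API.

```ts
// Sketch of the answering-phase decisions: top-k result count, a similarity
// threshold, and a fallback when nothing clears the bar. Any embedding model
// that returns vectors would fit this shape.

interface IndexedChunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function retrieve(
  queryEmbedding: number[],
  index: IndexedChunk[],
  topK = 4,        // too few misses context; too many adds noise
  threshold = 0.75 // higher reduces weak matches but may return nothing
): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .filter(({ score }) => score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ chunk }) => chunk);
}

// Fallback behavior: an empty result should trigger a "not found in sources"
// answer rather than letting the model guess.
```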

A grounded answer should be traceable to retrieved context. It should not add unsupported facts just because they sound plausible.

```mermaid
flowchart LR
  Retrieved["Retrieved source text"] --> Claim["Answer claim"]
  Claim --> Verify{"Can the claim be traced?"}
  Verify -->|Yes| Keep["Keep in answer"]
  Verify -->|No| Remove["Remove or mark unknown"]
```
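
One crude, purely mechanical version of that check, assuming the prompt asked for `[n]` citation markers: flag any sentence that carries no marker. It catches missing citations only, not incorrect ones; real verification needs semantic comparison against the retrieved text.

```ts
// Hypothetical grounding heuristic: sentences without a [n] citation marker
// are flagged for removal or for a "not found in sources" rewrite.

function findUncitedClaims(answer: string): string[] {
  return answer
    .split(/(?<=[.!?])\s+/)             // naive sentence split
    .filter((s) => s.trim().length > 0)
    .filter((s) => !/\[\d+\]/.test(s)); // no citation marker present
}

const answer = "Refunds are accepted within 30 days [1]. Shipping is always free.";
console.log(findUncitedClaims(answer)); // ["Shipping is always free."]
```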

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Answer is generic | No relevant context was retrieved | Improve chunking, metadata, or query |
| Answer mixes sources incorrectly | Too many unrelated chunks | Lower result count or add filters |
| Answer invents details | Prompt allows guessing | Require “not found in sources” fallback |
| Correct source is missing | Index is stale or incomplete | Re-index source content |
| Answer ignores source | Prompt is too broad | Put source-grounding rules near the task |

For a browser-based RAG workflow (a code sketch follows the steps):

  1. Extract content with Get All Text or URL content nodes.
  2. Split text with a text splitter dependency.
  3. Create embeddings with an embeddings dependency.
  4. Store content in Local Knowledge.
  5. Ask with RAG Agent or a Tools Agent connected to retrieval.
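
The skeleton below mirrors those five steps with hypothetical stand-ins. None of these functions are real node APIs; each stub only marks where the corresponding dependency (text extraction, splitting, embeddings, Local Knowledge, a retrieval-connected agent) would plug in.

```ts
// Hypothetical stand-ins for the five workflow steps. Replace each stub with
// the real node or dependency when wiring up the actual workflow.

async function getAllPageText(): Promise<string> {
  return "Example page text extracted in step 1."; // stand-in for text extraction
}

function splitText(text: string, size = 200): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks; // stand-in for the text splitter dependency
}

async function embed(texts: string[]): Promise<number[][]> {
  return texts.map(() => [0, 0, 0]); // stand-in for an embeddings dependency
}

const localKnowledge: { chunk: string; vector: number[] }[] = [];

async function store(chunks: string[], vectors: number[][]): Promise<void> {
  chunks.forEach((chunk, i) => localKnowledge.push({ chunk, vector: vectors[i] })); // stand-in for local storage
}

async function ask(question: string): Promise<string> {
  // stand-in for an agent that retrieves, then prompts the model with sources
  return `Answer to "${question}" grounded in ${localKnowledge.length} stored chunks.`;
}

async function runRagWorkflow(question: string): Promise<string> {
  const text = await getAllPageText();  // 1. extract
  const chunks = splitText(text);       // 2. split
  const vectors = await embed(chunks);  // 3. embed
  await store(chunks, vectors);         // 4. store
  return ask(question);                 // 5. ask
}
```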