# RAG
RAG stands for Retrieval-Augmented Generation: a pattern where the workflow searches trusted content first, then passes the matching content to the model before asking it to answer.
Use RAG when an answer must be grounded in specific documents, page content, policies, product data, or knowledge that may not be in the model.
## Why RAG exists
A chat model answers from the context it can see and from patterns learned during training. If your workflow asks about private, recent, or highly specific information without providing that information, the model may guess.
RAG changes the question from:
“What do you know about this?”
to:
“Using these retrieved sources, answer this question.”
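That second framing shows up directly in the prompt. The sketch below is a minimal illustration of it; the wording and the numbered-source format are assumptions, not a required template.

```ts
// A minimal sketch of a source-grounded prompt. The instruction wording and
// the "[n]" source numbering are illustrative assumptions, not a fixed format.
function buildGroundedPrompt(question: string, sources: string[]): string {
  const numbered = sources
    .map((text, i) => `[${i + 1}] ${text}`)
    .join("\n\n");
  return [
    "Answer the question using ONLY the sources below.",
    'If the sources do not contain the answer, reply "Not found in sources."',
    "",
    "Sources:",
    numbered,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```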
## The RAG pipeline
```mermaid
flowchart TD
  Docs["Source documents or page text"] --> Split["Split into chunks"]
  Split --> Embed["Create embeddings"]
  Embed --> Store["Store in vector store"]
  Question["User question"] --> QEmbed["Embed question"]
  QEmbed --> Search["Search vector store"]
  Store --> Search
  Search --> Context["Top matching chunks"]
  Context --> Prompt["Prompt with sources"]
  Prompt --> Model["Chat model"]
  Model --> Answer["Answer with source grounding"]
  style Store fill:#e8f5e9,stroke:#2e7d32
  style Search fill:#e1f5fe,stroke:#0277bd
  style Answer fill:#fff3e0,stroke:#ef6c00
```
There are two phases: indexing and answering. Indexing prepares the knowledge. Answering searches that knowledge and asks the model to respond from the retrieved context.
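A compact sketch of both phases, assuming an in-memory store and a hypothetical `embed` function standing in for whatever embeddings dependency the workflow uses:

```ts
// Hypothetical embeddings call; stands in for the workflow's embeddings dependency.
declare function embed(text: string): Promise<number[]>;

type IndexedChunk = { text: string; vector: number[] };

// Phase 1: indexing — embed each chunk and store it with its vector.
async function indexChunks(chunks: string[]): Promise<IndexedChunk[]> {
  const store: IndexedChunk[] = [];
  for (const text of chunks) {
    store.push({ text, vector: await embed(text) });
  }
  return store;
}

// Phase 2: answering — embed the question, rank stored chunks by similarity,
// and return the top matches to place in the prompt.
async function retrieve(store: IndexedChunk[], question: string, k = 4): Promise<string[]> {
  const q = await embed(question);
  return store
    .map((c) => ({ text: c.text, score: cosine(q, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((c) => c.text);
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```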
## Indexing decisions
| Decision | Tradeoff |
|---|---|
| Chunk size | Smaller chunks improve precision; larger chunks preserve context |
| Chunk overlap | Helps avoid cutting important meaning at boundaries |
| Metadata | Lets you filter by source, date, page, category, or customer |
| Embedding model | Affects search quality and cost |
| Re-index schedule | Keeps answers current when source content changes |
Bad indexing leads to bad retrieval. If the correct source never appears in search results, the model cannot reliably answer from it.
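For illustration, here is a minimal fixed-size splitter with overlap. Real splitters usually also respect sentence or paragraph boundaries; the default sizes below are placeholder assumptions, not recommendations.

```ts
// Fixed-size chunking with overlap. Each chunk repeats the last `overlap`
// characters of the previous one, so meaning at a boundary appears in both.
function splitText(text: string, chunkSize = 800, overlap = 100): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // assumes overlap < chunkSize
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```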
## Answering decisions
| Decision | Tradeoff |
|---|---|
| Number of results | Too few misses context; too many adds noise |
| Similarity threshold | Higher thresholds reduce weak matches but may return nothing |
| Prompt wording | Should tell the model to answer only from provided sources |
| Citation format | Helps users verify the answer |
| Fallback behavior | Defines what to do when retrieval finds nothing useful |
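The first two rows combine into one selection step, as in the sketch below. The `k = 4` and `0.75` values are illustrative placeholders, not recommended settings, and an empty result is what triggers the fallback behavior in the last row.

```ts
type Scored = { text: string; score: number };

// Apply a similarity threshold and a result cap together. May return an
// empty array, which the caller should treat as "nothing useful found".
function selectContext(ranked: Scored[], k = 4, minScore = 0.75): string[] {
  return ranked
    .filter((c) => c.score >= minScore) // drop weak matches
    .sort((a, b) => b.score - a.score)
    .slice(0, k)                        // cap noise from extra chunks
    .map((c) => c.text);
}
```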
## What “grounded” means
A grounded answer should be traceable to retrieved context. It should not add unsupported facts just because they sound plausible.
```mermaid
flowchart LR
  Retrieved["Retrieved source text"] --> Claim["Answer claim"]
  Claim --> Verify{"Can the claim be traced?"}
  Verify -->|Yes| Keep["Keep in answer"]
  Verify -->|No| Remove["Remove or mark unknown"]
```
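In code form, the trace-or-remove rule might look like the sketch below, assuming the answer cites sources with a `[n]` marker. Both the citation format and the `verifyClaims` helper are hypothetical.

```ts
// Keep claims that cite a real retrieved source; mark the rest as unknown.
// The "[unverified]" marker is an illustrative choice, not a standard.
function verifyClaims(
  claims: { text: string; citedSource?: number }[],
  sourceCount: number
): string[] {
  return claims.map((claim) =>
    claim.citedSource !== undefined &&
    claim.citedSource >= 1 &&
    claim.citedSource <= sourceCount
      ? claim.text                      // traceable: keep in answer
      : `[unverified] ${claim.text}`    // not traceable: mark unknown
  );
}
```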
## Common failure modes
| Symptom | Likely cause | Fix |
|---|---|---|
| Answer is generic | No relevant context was retrieved | Improve chunking, metadata, or query |
| Answer mixes sources incorrectly | Too many unrelated chunks | Lower result count or add filters |
| Answer invents details | Prompt allows guessing | Require “not found in sources” fallback |
| Correct source is missing | Index is stale or incomplete | Re-index source content |
| Answer ignores source | Prompt is too broad | Put source-grounding rules near the task |
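The “not found in sources” fallback can be enforced before the model is ever called. In the sketch below, `askModel` is a hypothetical chat-model call; the point is that empty retrieval short-circuits to a fixed response instead of letting the model guess.

```ts
// Hypothetical chat-model call; stands in for the workflow's model node.
declare function askModel(prompt: string): Promise<string>;

async function answer(question: string, context: string[]): Promise<string> {
  if (context.length === 0) {
    return "Not found in sources."; // deterministic fallback, no guessing
  }
  const prompt =
    "Answer only from the sources below. " +
    'If they do not contain the answer, say "Not found in sources."\n\n' +
    context.map((c, i) => `[${i + 1}] ${c}`).join("\n\n") +
    `\n\nQuestion: ${question}`;
  return askModel(prompt);
}
```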
## Agentic WorkFlow pattern
For a browser-based RAG workflow:
- Extract content with Get All Text or URL content nodes.
- Split text with a text splitter dependency.
- Create embeddings with an embeddings dependency.
- Store content in Local Knowledge.
- Ask with RAG Agent or a Tools Agent connected to retrieval.