MokingBird RAG — ai.mokingbird.xyz/mbrag

Build Production RAG Pipelines in Minutes

From document to answer — 4 pipeline levels, 12 document formats, 8 LLM providers.
Local, cloud, or hybrid.

Four Levels of Intelligence

Choose the pipeline that fits your latency and quality requirements. Scale up or down without changing your code.

BasicRAG

Fast, direct retrieval with no overhead. Ideal for prototyping, simple Q&A, and latency-critical tasks where speed matters most.

~200–500ms

Direct vector similarity search
Single-pass retrieval
Minimal context processing
No memory overhead

EnhancedRAG

Adds re-ranking, query expansion, and context refinement for significantly improved answer quality over L1.

~500ms–1.2s

Query expansion + reformulation
Multi-stage re-ranking
Context deduplication
Source attribution

Conversational

Persistent memory across turns with session-aware retrieval. Maintains conversation history for coherent multi-turn dialogues.

~800ms–1.8s

Session memory management
Context-aware retrieval
Turn-level coreference
History summarization

Advanced

Maximum intelligence pipeline with entity extraction, multi-hop reasoning, and full analytics. Built for production enterprise use cases.

~1.5s–4s

Multi-hop & chain-of-thought
Entity & relationship extraction
Hybrid dense + sparse retrieval
Full analytics & tracing

The 8-Step RetrievalOrchestrator

Every query runs through a deterministic 8-step pipeline. Observable, debuggable, and fully configurable at each stage.

1. Memory Recall: history lookup
2. Query Enhancement: expand + rewrite
3. Retrieval: vector + sparse
4. Context Formatting: rank + dedupe
5. Prompt Assembly: template + inject
6. LLM Generation: stream output
7. Analytics Recording: latency + quality
8. Memory Update: persist session

Fully observable: every step emits structured logs and timing metadata. Use the built-in Analytics view to inspect latency, token counts, and retrieval quality for any query — in real time.
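As a rough illustration of what per-step timing metadata could look like, here is a minimal sketch of a fixed-order pipeline runner. The step names mirror the eight stages listed above, but `run_pipeline`, `StepRecord`, and `Trace` are hypothetical names chosen for this example, not the MokingBird RAG API, and each step is a placeholder that only marks itself as visited.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical identifiers for the eight stages described above.
STEP_NAMES = [
    "memory_recall", "query_enhancement", "retrieval", "context_formatting",
    "prompt_assembly", "llm_generation", "analytics_recording", "memory_update",
]

Step = Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class StepRecord:
    name: str
    elapsed_ms: float


@dataclass
class Trace:
    records: list[StepRecord] = field(default_factory=list)

    def total_ms(self) -> float:
        return sum(r.elapsed_ms for r in self.records)


def run_pipeline(steps: list[tuple[str, Step]],
                 state: dict[str, Any]) -> tuple[dict[str, Any], Trace]:
    """Run the steps in the given order and record wall-clock time for each."""
    trace = Trace()
    for name, step in steps:
        start = time.perf_counter()
        state = step(state)
        elapsed = (time.perf_counter() - start) * 1000
        trace.records.append(StepRecord(name, elapsed))
    return state, trace


def make_marker(name: str) -> Step:
    """Placeholder step: appends its own name to the 'visited' list."""
    def step(state: dict[str, Any]) -> dict[str, Any]:
        return {**state, "visited": state.get("visited", []) + [name]}
    return step


steps = [(name, make_marker(name)) for name in STEP_NAMES]
final_state, trace = run_pipeline(steps, {"query": "What is RAG?"})
```

Because the order is a plain list, the same query always passes through the same stages, and `trace.records` gives one timing entry per stage that an analytics view could display.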

12 Document Formats. Zero Friction.

Drop in your files — PDFs, spreadsheets, emails, code, or web URLs. MokingBird RAG handles ingestion automatically.

- PDF (.pdf)
- Word Documents (.docx)
- Excel (.xlsx, .xls)
- CSV Data (.csv)
- Markdown (.md, .mdx)
- Plain Text (.txt)
- JSON (.json)
- Email (.eml, .msg)
- PowerPoint (.pptx, .ppt)
- Web URLs (http/https)
- Images (Florence-2)
- MultiModal (mixed)

Connect Everything. Lock In Nothing.

Swap LLM providers, embedding models, or vector stores without rewriting your pipeline. MokingBird RAG stays provider-agnostic.

🤖

LLM Providers

8 supported providers

Ollama, OpenAI, Anthropic, HuggingFace, vLLM, TGI, llama.cpp, Custom API

🔢

Embedding Providers

7 supported providers

SentenceTransformers Ollama OpenAI HuggingFace Cohere API Custom

🗃️

Vector Stores

4 supported backends

ChromaDB, FAISS, Pinecone, Qdrant

Retrieval Strategies

Dense Vector, Sparse BM25, Hybrid, MMR, Re-rank
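One common way to keep a pipeline provider-agnostic is to code against a small interface instead of a concrete client. The sketch below assumes a hypothetical `LLMProvider` protocol with a single `generate` method. The two toy providers stand in for real backends and make no network calls. None of these names come from MokingBird RAG itself.

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical minimal interface that any LLM backend would satisfy."""

    def generate(self, prompt: str) -> str: ...


class EchoLLM:
    """Toy stand-in for one provider."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperLLM:
    """Toy stand-in for a second provider."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


def answer(llm: LLMProvider, question: str, context: list[str]) -> str:
    """Build one prompt from the context and delegate to whichever provider is passed in."""
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return llm.generate(prompt)


first = answer(EchoLLM(), "Who signed?", ["Alice signed the contract."])
second = answer(UpperLLM(), "Who signed?", ["Alice signed the contract."])
```

The `answer` function never changes when the provider does, which is the property the section above describes.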

6 Context Enhancement Levels

Dial up context quality from raw retrieval to full entity-aware enrichment. Each level adds measurable quality improvement.

| Level | Name | Description | Added latency | Quality gain |
|-------|------|-------------|---------------|--------------|
| L1 | Raw | Chunks returned as-is from vector store. No processing. | +0ms | Baseline |
| L2 | Deduped | Removes duplicate and near-duplicate chunks for cleaner context. | +30ms | +Low |
| L3 | Re-ranked | Cross-encoder re-ranking for relevance-ordered context windows. | +80ms | +Medium |
| L4 | Summarized | LLM-generated summaries reduce context length while preserving signal. | +200ms | +High |
| L5 | Structured | Formats context as structured data for better prompt injection. | +150ms | +Very High |
| L6 | Entity Extraction | Named entities and relationships extracted and injected as enriched context. | +350ms | +Maximum |
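To make the L2 "Deduped" row concrete, here is a minimal sketch of near-duplicate removal using Jaccard similarity over lowercase token sets. The similarity measure and the 0.8 threshold are assumptions picked for illustration. The page does not say how MokingBird RAG detects near-duplicates.

```python
def token_set(text: str) -> set[str]:
    """Lowercase whitespace tokens; a deliberately crude tokenizer."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_chunks(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a chunk only if it is below the threshold against every kept chunk."""
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    for chunk in chunks:
        tokens = token_set(chunk)
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(chunk)
            kept_tokens.append(tokens)
    return kept


chunks = [
    "The contract ends on March 15.",
    "the contract ends on march 15.",
    "Payment is due within 30 days.",
]
unique = dedupe_chunks(chunks)
```

The first chunk in retrieval order wins, so higher-ranked chunks are never displaced by a later duplicate.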

Full Desktop Application
PySide6

Not a web app. Not a CLI tool. A full native desktop experience built with PySide6 — 12 interactive views, 7 themes, and everything local.

12 Interactive Views

Every part of your RAG pipeline has a dedicated view. Switch, configure, and monitor — all in one application.

Dashboard

Query

Chat

Documents

Collections

Retrievers

Context Levels

Memory

Pipeline Status

Analytics

Configuration

Settings

7 Built-in Themes

Dark (Default), Midnight Blue, Deep Purple, Solarized, Mocha, Nord, High Contrast

See a sample interactive preview of the desktop app below. The full interactive design file is available when you download and install MokingBird RAG.

Interactive Preview

🔍

MokingBird RAG Desktop

This is a design preview of the MokingBird RAG desktop application built with PySide6.

Download and install to start using all 12 views, configure your LLM providers, and run fully local RAG pipelines.


MokingBird RAG — Query View

Ask a question about your documents...

Run →

PIPELINE: L2 EnhancedRAG · 3 sources retrieved · 487ms

Based on the documents in your collection, the answer is...

You Own It. Completely.

No telemetry. No data leaving your machine. No subscriptions. No vendor lock-in.

🔒

Fully Offline

Run the entire stack — models, embeddings, vector store — without an internet connection.

🚫

Zero Telemetry

We collect nothing. No usage data, no crash reports, no analytics phoning home.

🧠

You Own the Models

Connect your own LLMs and embeddings. No dependency on our infrastructure.

🔓

No Subscriptions

Free to download and use. No seat licenses, no monthly fees, no paywalls.

🔌

No Vendor Lock-in

Swap providers, formats, and stores anytime. Your data and pipeline belong to you.


Built for Every Domain

From legal due diligence to customer support automation — MokingBird RAG adapts to your use case, not the other way around.

📡

Telecommunications

Network documentation Q&A, fault analysis, and technical specification retrieval for NOC teams.

⚖️

Legal

Contract review, case law retrieval, regulatory compliance search across large document libraries.

🏥

Medical

Clinical guideline lookup, research paper synthesis, and patient record Q&A — fully local and private.

💻

Software Engineering

Codebase Q&A, documentation search, architecture decision record (ADR) retrieval.

🔬

Research

Academic paper ingestion, citation search, and multi-document synthesis for researchers.

🏢

Enterprise IT

Internal knowledge bases, runbook retrieval, IT support automation over corporate documentation.

🎓

Education

Curriculum Q&A, lecture note search, and intelligent tutoring grounded in course materials.

💬

Customer Support

Product documentation Q&A, support ticket deflection, and knowledge base-grounded responses.

Start building today. No signup required.

Downloads are being prepared for launch. Join the notify list and we will email you when MokingBird RAG is available.


Blog

mbRAG — Articles

Technical deep-dives on retrieval-augmented generation with mbRAG


How mbRAG Delivers Research-Grade Accuracy with 4 Pipeline Levels

April 13, 2026 · MokingBird Team · Tags: mbRAG, RAG, local AI, LLM, accuracy

Retrieval-Augmented Generation is one of the most practically useful ideas in applied AI. Instead of asking a language model to answer from its training data alone, RAG retrieves relevant passages from your documents and gives them to the model as context. The result is answers that are grounded in your actual data.

The idea is simple. The implementation is where most frameworks fail.

The Problem with Existing RAG Frameworks

LangChain made RAG accessible. It also made it fragile. Breaking changes between versions. Vendor lock-in through obscured abstractions. Poor retrieval accuracy on anything beyond toy datasets.

The deeper problem is that most RAG implementations use a single retrieval strategy — typically simple cosine similarity — and tune nothing else. This breaks down for complex multi-part questions, documents with overlapping information, long documents where chunk context is lost, and queries that require reasoning across multiple passages.

mbRAG was built to solve these failure modes by implementing every major RAG approach from scratch in a unified, stable system.

Four Levels, One System

| Level | Latency | Best for |
|-------|---------|----------|
| L1 Basic | ~0.8s | Fast lookup, simple Q&A |
| L2 Enhanced | ~1.5s | Improved recall, standard workflows |
| L3 Smart | ~2.2s | Context-aware, complex documents |
| L4 Advanced | ~3.5s | Maximum accuracy, research-grade |

6-Level Contextual Retrieval: The Signature Innovation

Standard RAG splits documents into chunks and indexes them. The problem: chunks lose context. MokingBird's 6-Level Contextual Retrieval enriches every chunk before indexing:

| Level | Context added |
|-------|-------------|
| L1 | Document title and section heading |
| L2 | + Preceding paragraph summary |
| L3 | + Following paragraph summary |
| L4 | + Document-level metadata |
| L5 | + Entity relationships from surrounding text |
| L6 | + Full document summary as additional context signal |

This is the primary driver of the 40–50% accuracy improvement on complex document queries.

What You Can Connect

- 17 document formats: PDF (with multi-engine fallback), DOCX, Excel, CSV, JSON, Markdown, PowerPoint, Email, Images with OCR, Web content, and more
- 10 LLM providers: OpenAI, Anthropic Claude, Google Gemini, Ollama, HuggingFace, vLLM, llama.cpp, and more
- 8 embedding providers including local sentence-transformers and Ollama
- 6 vector store backends: ChromaDB, FAISS, Qdrant, Pinecone, Weaviate, Milvus
- 6 retrieval strategies including Ensemble and Contextual retrieval

Fully Local, Fully Yours

mbRAG runs on your hardware. Your documents are indexed locally. Your vector stores are stored on your filesystem. If you use a local LLM (Ollama, llama.cpp), the entire pipeline runs without a single network request. Complete air-gap operation.

Download Free

mbRAG is available as part of MokingBird AI — free to download. Advanced features are available in the Premium tier. Download at ai.mokingbird.xyz.

---
title: "How mbRAG Delivers Research-Grade Accuracy with 4 Pipeline Levels"
date: "2026-04-13"
author: "MokingBird Team"
tags: ["mbRAG", "RAG", "retrieval-augmented generation", "local AI", "LLM", "accuracy"]
---

# How mbRAG Delivers Research-Grade Accuracy with 4 Pipeline Levels

Retrieval-Augmented Generation is one of the most practically useful ideas in applied AI. Instead of asking a language model to answer from its training data alone — which is frozen at a cutoff date and doesn't include your specific documents — RAG retrieves relevant passages from your documents and gives them to the model as context. The result is answers that are grounded in your actual data.

The idea is simple. The implementation is where most frameworks fail.

---

## The Problem with Existing RAG Frameworks

LangChain made RAG accessible. It also made it fragile. Breaking changes between versions. Vendor lock-in through obscured abstractions. Poor retrieval accuracy on anything beyond toy datasets. A configuration surface that exposes implementation details you shouldn't have to care about.

The deeper problem is that most RAG implementations use a single retrieval strategy — typically simple cosine similarity over dense embeddings — and tune nothing else. This works acceptably for simple queries over well-structured documents. It breaks down for:

- Complex, multi-part questions
- Documents with overlapping or contradictory information
- Long documents where chunk context is lost
- Queries that require reasoning across multiple passages

mbRAG was built to solve these failure modes. Not by wrapping LangChain, but by implementing every major RAG approach from scratch in a unified, stable system.

---

## Four Levels, One System

mbRAG organizes its capabilities into four pipeline levels. You choose based on your latency requirements and accuracy needs.

### L1 Basic (~0.8s)
**Similarity search + direct generation**

Dense embedding lookup over your vector store, followed by direct LLM generation from the top-k retrieved chunks. Fast, reliable for straightforward Q&A, appropriate when latency matters more than maximum accuracy.

Best for: chatbots, real-time lookup, simple factual queries over clean documents.
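The retrieval half of L1 can be sketched in a few lines. This is a minimal, dependency-free illustration of top-k cosine similarity over an in-memory index. The two-dimensional vectors are toy values, and `top_k` is a hypothetical helper, not an mbRAG function. A real deployment would query a vector store such as ChromaDB or FAISS instead.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          index: dict[str, list[float]],
          k: int = 2) -> list[tuple[str, float]]:
    """Score every chunk against the query and return the k best (id, score) pairs."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: chunk id -> embedding.
index = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
results = top_k([1.0, 0.1], index, k=2)
```

The chunks behind the returned ids would then be placed directly into the LLM prompt, with no reranking or expansion in between.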

### L2 Enhanced (~1.5s)
**Reranking + improved context**

Adds a reranking step after initial retrieval. Candidate chunks are scored a second time using a cross-encoder (or LLM-based reranker) that considers the full query-chunk pair — not just the embedding similarity. This step alone typically improves precision by 15–25% over L1 on complex queries.

Best for: knowledge bases, document Q&A, customer support systems.
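The two-stage shape of L2 (retrieve candidates, then rescore each query-chunk pair) can be sketched as follows. The scorer here is a simple term-overlap function used purely as a stand-in so the example runs without a model. It is not a cross-encoder, and `rerank` is a hypothetical name. In practice the `scorer` argument would wrap a cross-encoder or LLM-based reranker.

```python
from typing import Callable

PairScorer = Callable[[str, str], float]


def term_overlap_score(query: str, chunk: str) -> float:
    """Stand-in scorer: fraction of query terms that appear in the chunk."""
    query_terms = set(query.lower().split())
    chunk_terms = set(chunk.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & chunk_terms) / len(query_terms)


def rerank(query: str,
           candidates: list[str],
           scorer: PairScorer = term_overlap_score,
           top_n: int = 2) -> list[str]:
    """Rescore every candidate against the full query and keep the top_n."""
    ordered = sorted(candidates, key=lambda chunk: scorer(query, chunk), reverse=True)
    return ordered[:top_n]


candidates = [
    "Refunds are processed by the billing team.",
    "The refund policy allows returns within 30 days.",
    "Our office is closed on public holidays.",
]
reranked = rerank("refund policy for returns", candidates)
```

The key difference from L1 is that the scorer sees the query and the chunk together, rather than comparing two independently computed embeddings.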

### L3 Smart (~2.2s)
**Multi-query + ensemble retrieval**

Generates multiple query variations from your original question (multi-query expansion), retrieves results for each variation, and combines them using an ensemble approach that merges sparse retrieval (BM25) with dense retrieval. Covers more of the relevant result space and handles ambiguous queries significantly better.

Best for: research assistance, legal document review, technical documentation search.
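The article does not say how the sparse and dense result lists are merged. One widely used option is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than as mbRAG's actual method. Each document earns `1 / (k + rank)` from every list it appears in, and `k = 60` is the conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; ties are broken alphabetically by id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy rankings: one from dense retrieval, one from BM25.
dense_ranking = ["d1", "d2", "d3"]
sparse_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

The same function also covers multi-query expansion: pass one ranking per query variation and the documents that appear near the top of several lists rise in the fused order.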

### L4 Advanced (~3.5s)
**Full contextual retrieval + reasoning**

The full pipeline. Contextual retrieval at all 6 levels, multi-query expansion, ensemble retrieval, cross-encoder reranking, parent document retrieval (retrieve the full section around a relevant chunk rather than the chunk alone), and LLM-based reasoning over the assembled context.

This is the pipeline that delivers the 40–50% accuracy improvement over naive implementations on complex queries.

Best for: research-grade applications, sensitive document analysis, high-stakes Q&A where accuracy is critical.
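Parent document retrieval, as described above, swaps each matching chunk for the larger section it came from. A minimal sketch, assuming every chunk stores the id of its parent section. The `Chunk` class and `expand_to_parents` function are hypothetical names, and the texts are placeholder data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    parent_id: str  # id of the section this chunk was cut from
    text: str


def expand_to_parents(hits: list[Chunk], parents: dict[str, str]) -> list[str]:
    """Replace chunk hits with their parent sections, keeping hit order and skipping repeats."""
    seen: set[str] = set()
    sections: list[str] = []
    for hit in hits:
        if hit.parent_id not in seen:
            seen.add(hit.parent_id)
            sections.append(parents[hit.parent_id])
    return sections


parents = {
    "s1": "Full text of section 1.",
    "s2": "Full text of section 2.",
}
hits = [
    Chunk("c1", "s1", "fragment one"),
    Chunk("c2", "s1", "fragment two"),
    Chunk("c3", "s2", "fragment three"),
]
context_sections = expand_to_parents(hits, parents)
```

Two hits from the same section yield that section only once, which keeps the assembled context from repeating itself.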

---

## The 8-Step RetrievalOrchestrator

Every query through mbRAG (at any level) passes through the RetrievalOrchestrator — the core of the system. Here's what happens:

1. **Query analysis** — Parse the query, identify entity types, determine optimal retrieval strategy
2. **Query expansion** (L3+) — Generate multiple reformulations to cover different phrasings
3. **Embedding** — Embed the query using your configured embedding model
4. **Vector retrieval** — Fetch candidate chunks from your vector store
5. **Sparse retrieval** (L3+) — BM25 retrieval in parallel for ensemble fusion
6. **Reranking** (L2+) — Score candidates using cross-encoder for precision
7. **Context enhancement** — Apply contextual enrichment based on your level configuration
8. **Response generation** — Pass assembled context to the LLM with appropriate prompt structure

The result is a grounded response with source attribution — you know which passages informed each answer.
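The level gating in the step list above (reranking at L2+, query expansion and sparse retrieval at L3+) can be expressed as a small lookup table. This is a sketch of that rule only. The step identifiers and the `active_steps` function are hypothetical and are not taken from the mbRAG codebase.

```python
# (step name, minimum pipeline level at which the step runs)
ORCHESTRATOR_STEPS = [
    ("query_analysis", 1),
    ("query_expansion", 3),
    ("embedding", 1),
    ("vector_retrieval", 1),
    ("sparse_retrieval", 3),
    ("reranking", 2),
    ("context_enhancement", 1),
    ("response_generation", 1),
]


def active_steps(level: int) -> list[str]:
    """Return, in pipeline order, the steps that run at the given level (1 to 4)."""
    if level not in (1, 2, 3, 4):
        raise ValueError("level must be 1, 2, 3, or 4")
    return [name for name, min_level in ORCHESTRATOR_STEPS if level >= min_level]


l1_steps = active_steps(1)
l2_steps = active_steps(2)
l4_steps = active_steps(4)
```

At L1 five of the eight steps run, L2 adds reranking, and L3 and above run all eight.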

---

## 6-Level Contextual Retrieval: The Signature Innovation

Standard RAG splits documents into chunks and indexes them. The problem: chunks lose context. A chunk that says "the agreement was terminated on March 15th" doesn't tell you which agreement. A chunk that references "the method described above" doesn't include what that method is.

MokingBird's **6-Level Contextual Retrieval** enriches every chunk before indexing it:

| Level | Context added |
|-------|-------------|
| L1 | Document title and section heading |
| L2 | + Preceding paragraph summary |
| L3 | + Following paragraph summary |
| L4 | + Document-level metadata (author, date, type) |
| L5 | + Entity relationships extracted from surrounding text |
| L6 | + Full document summary as additional context signal |

Each level adds computational cost at indexing time but dramatically improves retrieval quality. At L6, every chunk carries rich contextual signals that allow the vector store to match it against queries that would have missed it entirely in a naive chunking setup.

This is the primary driver of the 40–50% accuracy improvement on complex document queries.
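Since each level in the table adds one more layer on top of the previous ones, the enrichment step can be sketched as building a cumulative header that is prepended to the chunk text before embedding. The field names, label strings, and sample values below are illustrative assumptions. How mbRAG formats the enriched text is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class ChunkContext:
    text: str
    title: str
    section: str
    prev_summary: str
    next_summary: str
    metadata: str
    entities: str
    doc_summary: str


def enrich(ctx: ChunkContext, level: int) -> str:
    """Prepend the first `level` context layers (1 to 6) to the raw chunk text."""
    if not 1 <= level <= 6:
        raise ValueError("level must be between 1 and 6")
    layers = [
        f"Document: {ctx.title} / {ctx.section}",   # L1
        f"Preceding: {ctx.prev_summary}",           # L2
        f"Following: {ctx.next_summary}",           # L3
        f"Metadata: {ctx.metadata}",                # L4
        f"Entities: {ctx.entities}",                # L5
        f"Summary: {ctx.doc_summary}",              # L6
    ]
    return "\n".join(layers[:level] + [ctx.text])


# Illustrative values only.
ctx = ChunkContext(
    text="The agreement was terminated on March 15th.",
    title="Supplier Contract",
    section="Termination",
    prev_summary="Notice periods for both parties.",
    next_summary="Obligations that survive termination.",
    metadata="type=contract",
    entities="agreement -> supplier",
    doc_summary="Supply agreement between two companies.",
)
level_1_text = enrich(ctx, 1)
level_6_text = enrich(ctx, 6)
```

Even at level 1, the chunk about "the agreement" now carries the document title, so a query that names the contract can match it.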

---

## What You Can Connect

**17 document formats:**
PDF (PyMuPDF + pypdf + pdfplumber with auto-fallback), DOCX, Excel, CSV, JSON, Markdown, PowerPoint, Email, Images with OCR (Tesseract + EasyOCR), Web content, Multimodal documents.

**10 LLM providers:**
- Cloud: OpenAI (GPT-4, GPT-3.5), Anthropic Claude, Google Gemini
- Local: Ollama (llama3, mistral, phi, and any locally available model), HuggingFace Direct, vLLM, llama.cpp

**8 embedding providers:**
OpenAI, local sentence-transformers, Ollama, HuggingFace, Cohere, and custom API endpoints.

**6 vector store backends:**
ChromaDB (default, local), FAISS, Qdrant, Pinecone, Weaviate, Milvus.

**6 retrieval strategies:**
Similarity search, MMR (Maximal Marginal Relevance — diversity-aware retrieval), Multi-Query, Ensemble (sparse + dense fusion), Parent Document, Contextual (all 6 levels).
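Of the strategies listed, MMR is the easiest to show in isolation. The sketch below is a plain greedy implementation of the standard MMR scoring rule, `lam * relevance - (1 - lam) * redundancy`, over toy two-dimensional vectors. It illustrates the technique in general and makes no claim about mbRAG's internal implementation or parameter defaults.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr(query_vec: list[float],
        candidates: dict[str, list[float]],
        k: int,
        lam: float = 0.5) -> list[str]:
    """Greedily pick k ids, trading relevance to the query against similarity to earlier picks."""
    selected: list[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(chunk_id: str) -> float:
            relevance = cosine(query_vec, remaining[chunk_id])
            redundancy = max(
                (cosine(remaining[chunk_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# "a" and "b" are near-duplicates; "c" is less relevant but different.
candidates = {"a": [1.0, -0.2], "b": [1.0, -0.21], "c": [0.6, 0.8]}
diverse = mmr([1.0, 0.0], candidates, k=2, lam=0.5)
relevance_only = mmr([1.0, 0.0], candidates, k=2, lam=1.0)
```

With `lam=1.0` the selection reduces to plain similarity search and returns both near-duplicates. Lowering `lam` lets the less similar chunk take the second slot.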

---

## Use Cases

**Legal document review.** A law firm has thousands of contracts. Using mbRAG at L4 with parent document retrieval, associates can query across the entire contract corpus and get answers with precise source citations — without uploading documents to any cloud service.

**Medical literature search.** A researcher is working with a corpus of clinical papers. mbRAG's 6-level contextual retrieval maintains the meaning of clinical concepts across chunk boundaries, reducing false retrievals that would otherwise introduce incorrect context into LLM responses.

**Enterprise knowledge base.** An engineering team wants to query across internal wikis, Confluence pages, Slack exports, and technical specifications. mbRAG's 17-format support ingests the full corpus; L3 multi-query expansion handles the varied phrasings engineers use when searching for the same concept.

**Software engineering assistance.** A developer feeds mbRAG their entire codebase documentation, API specs, and internal runbooks. Queries like "how does the auth middleware work with the rate limiter?" are answered accurately because contextual retrieval preserves the relationships between components across documents.

---

## Accuracy vs. Latency: Choosing Your Level

The right pipeline level depends on your use case:

| Use case | Recommended level | Reason |
|----------|------------------|--------|
| Real-time chat | L1–L2 | Lowest latency (~0.8–1.5s) |
| Document Q&A | L2–L3 | Good accuracy, acceptable latency |
| Research / analysis | L3–L4 | Maximum accuracy, latency acceptable |
| Batch processing | L4 | Latency irrelevant, accuracy is everything |

You can also configure mbRAG to automatically select the pipeline level based on query complexity — simple queries route to L1, complex multi-part queries escalate to L4.
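The article does not describe how query complexity is measured. As one possible approach, the sketch below scores a query with three crude signals (length, connective words, multiple question marks) and maps the score to a level from 1 to 4. The signals, thresholds, and the `route_level` name are all assumptions made for illustration.

```python
import re

# Words that often signal a multi-part or comparative question (assumed list).
CONNECTIVES = re.compile(r"\b(and|or|versus|compared|while|whereas)\b")


def route_level(query: str) -> int:
    """Map a query to a pipeline level (1 to 4) using simple surface heuristics."""
    word_count = len(query.split())
    connective_count = len(CONNECTIVES.findall(query.lower()))
    question_marks = query.count("?")

    score = 0
    if word_count > 12:
        score += 1
    if connective_count >= 1:
        score += 1
    if question_marks > 1 or connective_count >= 2:
        score += 1
    return 1 + score


simple_level = route_level("What is the refund policy?")
complex_level = route_level(
    "How does the auth middleware interact with the rate limiter and the "
    "session cache, and what changed compared to the previous release?"
)
```

A short factual question stays at L1, while a long question with several joined parts is escalated to L4.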

---

## Fully Local, Fully Yours

mbRAG runs on your hardware. Your documents are indexed locally. Your vector stores are stored on your filesystem. Your queries are processed on your machine.

If you use a local LLM (Ollama, llama.cpp), the entire pipeline — from document ingestion to final answer — runs without a single network request. Complete air-gap operation.

If you connect to a cloud LLM using your own API key, your key is stored locally and transmitted only to the provider's API, never to MokingBird servers.

---

## Beyond a Single Chat Interface

RAG is often implemented as a chat feature. But retrieval is infrastructure — and mbRAG is designed for that scope.

The same core retrieval framework can power multiple product surfaces:

- **Internal knowledge assistants** — query HR policies, engineering runbooks, or company wikis
- **Documentation copilots** — help developers navigate large codebases or API documentation
- **Compliance-support retrieval** — surface relevant regulatory clauses or audit evidence
- **Research synthesis pipelines** — aggregate and query across large academic or technical corpora
- **Customer support tooling** — answer support queries from product documentation

When designed correctly, one well-built RAG layer serves all of these. mbRAG is built for that role.

---

## Download Free

mbRAG is available as part of MokingBird AI — free to download and use.

Advanced features (L3 and L4 pipelines, all 6 vector store backends, all 10 LLM providers, full analytics) are available in the Premium tier.

Download at [ai.mokingbird.xyz](https://ai.mokingbird.xyz).
