MokingBird RAG — ai.mokingbird.xyz/mbrag

Build Production RAG Pipelines in Minutes

From document to answer — 4 pipeline levels, 12 document formats, 8 LLM providers.
Local, cloud, or hybrid.

  • 4 Pipeline Levels
  • 12 Document Formats
  • 8 LLM Providers
  • 7 Embedding Providers
  • 4 Vector Stores
  • 5 Retrieval Strategies
  • 6 Context Enhancement Levels
  • 100% Local Option
  • Free Download

Four Levels of Intelligence

Choose the pipeline that fits your latency and quality requirements. Scale up or down without changing your code.

L1 · BasicRAG

Fast, direct retrieval with no overhead. Ideal for prototyping, simple Q&A, and latency-critical tasks.

Typical latency: ~200–500 ms
  • Direct vector similarity search
  • Single-pass retrieval
  • Minimal context processing
  • No memory overhead
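
Here is a minimal Python sketch of single-pass retrieval. It assumes chunks are already embedded as plain lists of floats. The `cosine` and `top_k` helpers and the toy `index` are hypothetical illustrations, not mbRAG's API. A real setup would hand the search to a vector store such as ChromaDB or FAISS.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Single pass: score every chunk once against the query and keep the k best."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding models produce hundreds of dimensions.
index = {
    "doc1#0": [1.0, 0.0, 0.0],
    "doc1#1": [0.7, 0.7, 0.0],
    "doc2#0": [0.0, 0.0, 1.0],
}
hits = top_k([1.0, 0.1, 0.0], index, k=2)
print(hits)
```

There is no re-ranking, expansion, or memory step. That absence is what keeps this level cheap.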

L2 · EnhancedRAG

Adds query expansion, re-ranking, and context refinement to improve answer quality over L1.

Typical latency: ~500 ms–1.2 s
  • Query expansion + reformulation
  • Multi-stage re-ranking
  • Context deduplication
  • Source attribution

L3 · Conversational

Persistent memory across turns with session-aware retrieval. Maintains conversation history for coherent multi-turn dialogue.

Typical latency: ~800 ms–1.8 s
  • Session memory management
  • Context-aware retrieval
  • Turn-level coreference
  • History summarization
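
The following is a minimal sketch of session memory with history summarization. It assumes a sliding window of recent turns plus a running summary of older ones. `SessionMemory` and `summarize` are hypothetical names. The summarizer here only concatenates earlier questions, where a real pipeline would call an LLM. Coreference resolution and session-aware retrieval are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


def summarize(previous: str, user_msg: str) -> str:
    """Placeholder for an LLM summarization call: here it just appends earlier questions."""
    return f"{previous}; {user_msg}" if previous else user_msg


@dataclass
class SessionMemory:
    """Keeps the last `max_turns` turns verbatim and folds older ones into a summary."""
    max_turns: int = 4
    turns: Deque[Tuple[str, str]] = field(default_factory=deque)
    summary: str = ""

    def add_turn(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))
        # Evict the oldest turns once the window is full, keeping a trace in the summary.
        while len(self.turns) > self.max_turns:
            old_user, _ = self.turns.popleft()
            self.summary = summarize(self.summary, old_user)

    def as_context(self) -> str:
        """Render the summary plus recent turns for injection into the next prompt."""
        lines = [f"Earlier topics: {self.summary}"] if self.summary else []
        for user, assistant in self.turns:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        return "\n".join(lines)


memory = SessionMemory(max_turns=2)
memory.add_turn("What is the MTU?", "9000 bytes.")
memory.add_turn("On which switches?", "The core switches.")
memory.add_turn("Since when?", "Firmware 4.2.")
print(memory.as_context())
```

Bounding the verbatim window keeps the prompt size roughly constant however long the conversation runs.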

L4 · Advanced

The most thorough pipeline: entity extraction, multi-hop reasoning, and full analytics. Built for production enterprise use cases.

Typical latency: ~1.5–4 s
  • Multi-hop & chain-of-thought
  • Entity & relationship extraction
  • Hybrid dense + sparse retrieval
  • Full analytics & tracing
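
L4 lists hybrid dense + sparse retrieval, but the page does not say how the two result lists are merged. One common technique is reciprocal rank fusion (RRF). The sketch below assumes RRF is used. Each input is a ranked list of chunk IDs, for example one from vector search and one from BM25. The function name is hypothetical. `k=60` is the constant conventionally used with RRF, not a documented mbRAG setting.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk IDs: each appearance at rank r adds 1 / (k + r)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


dense = ["c3", "c1", "c7"]   # e.g. from vector similarity search
sparse = ["c1", "c9", "c3"]  # e.g. from BM25 keyword search
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

This prints `['c1', 'c3', 'c9', 'c7']`. `c1` comes first because it ranks highly in both lists, even though dense search alone put `c3` on top. RRF uses only ranks, so dense and sparse scores never need to be put on a common scale.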

The 8-Step RetrievalOrchestrator

Every query runs through a deterministic 8-step pipeline. Observable, debuggable, and fully configurable at each stage.

  1. Memory Recall: history lookup
  2. Query Enhancement: expand + rewrite
  3. Retrieval: vector + sparse
  4. Context Formatting: rank + dedupe
  5. Prompt Assembly: template + inject
  6. LLM Generation: stream output
  7. Analytics Recording: latency + quality
  8. Memory Update: persist session

Fully observable: every step emits structured logs and timing metadata. Use the built-in Analytics view to inspect latency, token counts, and retrieval quality for any query — in real time.
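
The eight steps can be sketched as an ordered list of functions that share one state dictionary, with the orchestrator timing each call. This is an illustration under stated assumptions, not mbRAG's implementation. Every step function below is a hypothetical stub, and names such as `Orchestrator` and `timings_ms` are invented for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], None]


# Hypothetical stubs: each reads and writes the shared state dict.
def memory_recall(state: State) -> None:
    state["history"] = []  # would load prior turns for this session


def query_enhancement(state: State) -> None:
    state["queries"] = [state["query"], state["query"].lower()]  # stand-in for expand + rewrite


def retrieval(state: State) -> None:
    state["chunks"] = ["chunk A", "chunk B", "chunk A"]  # stand-in for vector + sparse search


def context_formatting(state: State) -> None:
    state["context"] = list(dict.fromkeys(state["chunks"]))  # dedupe, preserving rank order


def prompt_assembly(state: State) -> None:
    state["prompt"] = "\n".join(state["context"]) + f"\n\nQuestion: {state['query']}"


def llm_generation(state: State) -> None:
    state["answer"] = "stub answer"  # would call the configured LLM provider


def analytics_recording(state: State) -> None:
    state["num_sources"] = len(state["context"])


def memory_update(state: State) -> None:
    state["history"].append((state["query"], state["answer"]))


@dataclass
class Orchestrator:
    """Runs named steps in a fixed order and records each step's wall-clock time."""
    steps: List[Tuple[str, Step]]
    timings_ms: Dict[str, float] = field(default_factory=dict)

    def run(self, query: str) -> State:
        state: State = {"query": query}
        for name, step in self.steps:
            start = time.perf_counter()
            step(state)
            self.timings_ms[name] = (time.perf_counter() - start) * 1000.0
        return state


PIPELINE: List[Tuple[str, Step]] = [
    ("memory_recall", memory_recall),
    ("query_enhancement", query_enhancement),
    ("retrieval", retrieval),
    ("context_formatting", context_formatting),
    ("prompt_assembly", prompt_assembly),
    ("llm_generation", llm_generation),
    ("analytics_recording", analytics_recording),
    ("memory_update", memory_update),
]

orchestrator = Orchestrator(PIPELINE)
result = orchestrator.run("What is RAG?")
for name, ms in orchestrator.timings_ms.items():
    print(f"{name:<20} {ms:.3f} ms")
```

Keeping the order in a plain list is what makes each run deterministic. Timing inside the orchestrator, rather than in each step, means any new step gets latency tracking without extra code.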

12 Document Formats. Zero Friction.

Drop in your files — PDFs, spreadsheets, emails, code, or web URLs. MokingBird RAG handles ingestion automatically.

  • 📄 PDF (.pdf)
  • 📝 Word Documents (.docx)
  • 📊 Excel (.xlsx, .xls)
  • 📋 CSV Data (.csv)
  • ✏️ Markdown (.md, .mdx)
  • 📃 Plain Text (.txt)
  • 🗄️ JSON (.json)
  • 📧 Email (.eml, .msg)
  • 📑 PowerPoint (.pptx, .ppt)
  • 🌐 Web URLs (http/https)
  • 🖼️ Images (Florence-2)
  • 🎞️ MultiModal (mixed)
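
Automatic ingestion usually begins by routing each source to a parser based on its extension or URL scheme. The sketch below assumes that approach. `FORMATS` and `detect_format` are hypothetical. The image extensions (`.png`, `.jpg`, `.jpeg`) are also assumptions, because the list above names Florence-2 rather than specific file types.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical routing table mirroring the formats listed above.
FORMATS = {
    ".pdf": "pdf",
    ".docx": "word",
    ".xlsx": "excel", ".xls": "excel",
    ".csv": "csv",
    ".md": "markdown", ".mdx": "markdown",
    ".txt": "text",
    ".json": "json",
    ".eml": "email", ".msg": "email",
    ".pptx": "powerpoint", ".ppt": "powerpoint",
    ".png": "image", ".jpg": "image", ".jpeg": "image",  # assumed image types
}


def detect_format(source: str) -> str:
    """Return an ingestion route for a file path or URL, or raise ValueError."""
    if urlparse(source).scheme in ("http", "https"):
        return "web"
    suffix = Path(source).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported source: {source}")
    return FORMATS[suffix]


for src in ["report.PDF", "notes/adr-001.mdx", "https://example.com/docs"]:
    print(src, "->", detect_format(src))
```

Lower-casing the suffix lets `report.PDF` and `report.pdf` take the same route. Failing loudly on unknown types beats silently indexing nothing.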

Connect Everything. Lock In Nothing.

Swap LLM providers, embedding models, or vector stores without rewriting your pipeline. MokingBird RAG stays provider-agnostic.

  • 🤖 LLM Providers (8 supported): Ollama, OpenAI, Anthropic, HuggingFace, vLLM, TGI, llama.cpp, Custom API
  • 🔢 Embedding Providers (7 supported): SentenceTransformers, Ollama, OpenAI, HuggingFace, Cohere, API, Custom
  • 🗃️ Vector Stores (4 supported backends): ChromaDB, FAISS, Pinecone, Qdrant
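
Provider-agnostic pipelines are commonly built by coding against small interfaces and injecting concrete providers. The sketch below uses Python's `typing.Protocol` for this. `Embedder`, `VectorStore`, `LLM`, `ask`, and the toy implementations are hypothetical stand-ins, not mbRAG classes. Swapping in a real model or store means writing a class with the same method signatures, and `ask` does not change.

```python
from typing import Dict, List, Protocol, Sequence


class Embedder(Protocol):
    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class VectorStore(Protocol):
    def add(self, ids: Sequence[str], vectors: Sequence[List[float]]) -> None: ...
    def search(self, vector: List[float], k: int) -> List[str]: ...


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


class ToyEmbedder:
    """Stand-in for a real model: embeds text as (length, vowel count)."""
    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        return [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))] for t in texts]


class InMemoryStore:
    """Stand-in for a real backend: brute-force nearest neighbours by squared distance."""
    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, ids: Sequence[str], vectors: Sequence[List[float]]) -> None:
        self._vectors.update(zip(ids, vectors))

    def search(self, vector: List[float], k: int) -> List[str]:
        def sq_dist(item_id: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(vector, self._vectors[item_id]))
        return sorted(self._vectors, key=sq_dist)[:k]


class EchoLLM:
    """Stand-in for a real LLM: returns the first line of the prompt (the top source)."""
    def generate(self, prompt: str) -> str:
        return prompt.splitlines()[0]


def ask(docs: Dict[str, str], question: str, embedder: Embedder,
        store: VectorStore, llm: LLM, k: int = 1) -> str:
    """Pipeline code depends only on the three protocols, never on a concrete provider."""
    ids = list(docs)
    store.add(ids, embedder.embed([docs[i] for i in ids]))
    hits = store.search(embedder.embed([question])[0], k)
    prompt = "\n".join(docs[i] for i in hits) + f"\nQuestion: {question}"
    return llm.generate(prompt)


docs = {"a": "short", "b": "a much longer document text"}
print(ask(docs, "tiny", ToyEmbedder(), InMemoryStore(), EchoLLM()))
```

`Protocol` checks structure, not inheritance. A wrapper around any vendor SDK qualifies as long as its methods match.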

Retrieval Strategies

Dense Vector, Sparse BM25, Hybrid, MMR, Re-rank
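
MMR (maximal marginal relevance) picks results greedily. Each candidate is scored as `lambda_ * relevance - (1 - lambda_) * redundancy`, where redundancy is its highest similarity to anything already picked. The minimal sketch below assumes precomputed embeddings and cosine similarity. The function name `mmr` and the `lambda_` parameter are illustrative, not mbRAG settings.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec: List[float], candidates: Dict[str, List[float]],
        k: int = 3, lambda_: float = 0.5) -> List[str]:
    """Greedy MMR: trade relevance to the query against similarity to chunks already chosen."""
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        best_id, best_score = "", -math.inf
        for cid, vec in remaining.items():
            relevance = cosine(query_vec, vec)
            redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
            score = lambda_ * relevance - (1.0 - lambda_) * redundancy
            if score > best_score:
                best_id, best_score = cid, score
        selected.append(best_id)
        del remaining[best_id]
    return selected


candidates = {
    "a": [1.0, 0.0],
    "b": [0.99, 0.1],  # near-duplicate of "a"
    "c": [0.6, 0.8],   # less relevant, but adds new information
}
print(mmr([1.0, 0.0], candidates, k=2, lambda_=0.3))
```

With `lambda_=0.3`, the near-duplicate `b` loses to `c` in the second round even though `b` is more relevant. Plain top-2 by relevance (`lambda_=1.0`) would return `a` and `b`.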

6 Context Enhancement Levels

Dial up context quality from raw retrieval to full entity-aware enrichment. Each level adds measurable quality improvement.

  • L1 Raw (+0 ms latency, baseline quality): Chunks returned as-is from the vector store, with no processing.
  • L2 Deduped (+30 ms latency, low quality gain): Removes duplicate and near-duplicate chunks for cleaner context.
  • L3 Re-ranked (+80 ms latency, medium quality gain): Cross-encoder re-ranking orders the context window by relevance.
  • L4 Summarized (+200 ms latency, high quality gain): LLM-generated summaries shorten the context while preserving signal.
  • L5 Structured (+150 ms latency, very high quality gain): Formats context as structured data for more reliable insertion into the prompt.
  • L6 Entity Extraction (+350 ms latency, maximum quality gain): Named entities and relationships are extracted and injected as enriched context.
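
Level L2 (Deduped) can be approximated with word-shingle Jaccard similarity, one common near-duplicate test. The sketch assumes word trigrams and a 0.8 threshold. Both values and all function names are illustrative choices, not mbRAG's documented behavior.

```python
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 3) -> Set[Shingle]:
    """Word n-grams of the lower-cased text (the whole text if it is shorter than n words)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedupe(chunks: List[str], threshold: float = 0.8) -> List[str]:
    """Keep chunks in rank order, dropping any whose overlap with a kept chunk reaches threshold."""
    kept: List[str] = []
    kept_shingles: List[Set[Shingle]] = []
    for chunk in chunks:
        sh = shingles(chunk)
        if all(jaccard(sh, other) < threshold for other in kept_shingles):
            kept.append(chunk)
            kept_shingles.append(sh)
    return kept


chunks = [
    "The core router reboots every night at 2am",
    "the core router reboots every night at 2am",
    "Firmware 4.2 fixes the nightly reboot loop",
]
print(dedupe(chunks))
```

Iterating in rank order means the higher-ranked copy of a duplicate always survives.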

Full Desktop Application (PySide6)

Not a web app. Not a CLI tool. A full native desktop experience built with PySide6 — 12 interactive views, 7 themes, and everything local.

12 Interactive Views

Every part of your RAG pipeline has a dedicated view. Switch, configure, and monitor — all in one application.

  1. Dashboard
  2. Query
  3. Chat
  4. Documents
  5. Collections
  6. Retrievers
  7. Context Levels
  8. Memory
  9. Pipeline Status
  10. Analytics
  11. Configuration
  12. Settings

7 Built-in Themes

Dark (default), Midnight Blue, Deep Purple, Solarized, Mocha, Nord, High Contrast

Design preview (Query View): a sample question answered by the L2 EnhancedRAG pipeline, with 3 sources retrieved in 487 ms.

You Own It. Completely.

No telemetry. No data leaving your machine. No subscriptions. No vendor lock-in.

  • 🔒 Fully Offline: Run the entire stack (models, embeddings, vector store) without an internet connection.
  • 🚫 Zero Telemetry: We collect nothing. No usage data, no crash reports, no analytics phoning home.
  • 🧠 You Own the Models: Connect your own LLMs and embeddings. No dependency on our infrastructure.
  • 🔓 No Subscriptions: Free to download and use. No seat licenses, no monthly fees, no paywalls.
  • 🔌 No Vendor Lock-in: Swap providers, formats, and stores anytime. Your data and pipeline belong to you.

Built for Every Domain

From legal due diligence to customer support automation — MokingBird RAG adapts to your use case, not the other way around.

  • 📡 Telecommunications: Network documentation Q&A, fault analysis, and technical specification retrieval for NOC teams.
  • ⚖️ Legal: Contract review, case law retrieval, and regulatory compliance search across large document libraries.
  • 🏥 Medical: Clinical guideline lookup, research paper synthesis, and patient record Q&A, fully local and private.
  • 💻 Software Engineering: Codebase Q&A, documentation search, and architecture decision record (ADR) retrieval.
  • 🔬 Research: Academic paper ingestion, citation search, and multi-document synthesis for researchers.
  • 🏢 Enterprise IT: Internal knowledge bases, runbook retrieval, and IT support automation over corporate documentation.
  • 🎓 Education: Curriculum Q&A, lecture note search, and intelligent tutoring grounded in course materials.
  • 💬 Customer Support: Product documentation Q&A, support ticket deflection, and knowledge-base-grounded responses.

Start building today. No signup required.

Downloads are being prepared for launch. Join the notify list and we will email you when MokingBird RAG is available.

Blog

mbRAG — Articles

Technical deep-dives on retrieval-augmented generation with mbRAG
How mbRAG Delivers Research-Grade Accuracy with 4 Pipeline Levels

April 13, 2026 · MokingBird Team · Tags: mbRAG, RAG, local AI, LLM, accuracy

Retrieval-Augmented Generation is one of the most practically useful ideas in applied AI. Instead of asking a language model to answer from its training data alone, RAG retrieves relevant passages from your documents and gives them to the model as context. The result is answers that are grounded in your actual data.
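
The step where passages are given to the model as context is usually a prompt template with numbered sources, which also makes source attribution possible. The minimal sketch below assumes passages arrive as (source name, text) pairs that have already been retrieved and ranked. The template wording and `assemble_prompt` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical template; the instructions and citation format are illustrative.
TEMPLATE = (
    "Answer the question using only the numbered sources below.\n"
    "Cite sources as [n]. If the sources do not contain the answer, say so.\n\n"
    "{sources}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def assemble_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Number each (source_name, text) passage so the model can cite it."""
    sources = "\n\n".join(
        f"[{i}] ({name}) {text}" for i, (name, text) in enumerate(passages, start=1)
    )
    return TEMPLATE.format(sources=sources, question=question)


print(assemble_prompt(
    "What MTU do the core switches use?",
    [("switch-guide.pdf", "Core switches use an MTU of 9000."),
     ("faq.md", "Jumbo frames require MTU 9000 end to end.")],
))
```

Telling the model to answer only from the numbered sources is what keeps the answer grounded in your documents rather than in its training data.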

The idea is simple. The implementation is where most frameworks fail.

The Problem with Existing RAG Frameworks

LangChain made RAG accessible. It also made it fragile: breaking changes between versions, vendor lock-in through opaque abstractions, and poor retrieval accuracy on anything beyond toy datasets.

The deeper problem is that most RAG implementations use a single retrieval strategy — typically simple cosine similarity — and tune nothing else. This breaks down for complex multi-part questions, documents with overlapping information, long documents where chunk context is lost, and queries that require reasoning across multiple passages.

mbRAG was built to solve these failure modes by implementing every major RAG approach from scratch in a unified, stable system.

Four Levels, One System

  • L1 Basic, ~0.8 s: best for fast lookup and simple Q&A
  • L2 Enhanced, ~1.5 s: best for improved recall and standard workflows
  • L3 Smart, ~2.2 s: best for context-aware retrieval over complex documents
  • L4 Advanced, ~3.5 s: best for maximum, research-grade accuracy

6-Level Contextual Retrieval: The Signature Innovation

Standard RAG splits documents into chunks and indexes them. The problem: chunks lose context. MokingBird's 6-Level Contextual Retrieval enriches every chunk before indexing:

Each level adds context on top of the previous one:
  • L1: document title and section heading
  • L2: preceding paragraph summary
  • L3: following paragraph summary
  • L4: document-level metadata
  • L5: entity relationships from surrounding text
  • L6: full document summary as an additional context signal

This is the primary driver of the 40–50% accuracy improvement on complex document queries.
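
The following is a minimal sketch of building the text that gets embedded at each level. It assumes the summaries, metadata, and entity strings have already been produced upstream, for example by an LLM pass. The `Chunk` fields and the `enrich` function are hypothetical, and the order in which context is prepended is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    text: str
    doc_title: str
    section: str
    prev_summary: Optional[str] = None                       # used from L2
    next_summary: Optional[str] = None                       # used from L3
    metadata: Dict[str, str] = field(default_factory=dict)   # used from L4
    entities: List[str] = field(default_factory=list)        # used from L5
    doc_summary: Optional[str] = None                        # used at L6


def enrich(chunk: Chunk, level: int) -> str:
    """Prefix the chunk with every context layer up to `level` (1-6) before embedding."""
    parts = [f"Document: {chunk.doc_title} | Section: {chunk.section}"]  # L1, always present
    if level >= 2 and chunk.prev_summary:
        parts.append(f"Previous: {chunk.prev_summary}")
    if level >= 3 and chunk.next_summary:
        parts.append(f"Next: {chunk.next_summary}")
    if level >= 4 and chunk.metadata:
        parts.append("Metadata: " + ", ".join(f"{k}={v}" for k, v in sorted(chunk.metadata.items())))
    if level >= 5 and chunk.entities:
        parts.append("Entities: " + "; ".join(chunk.entities))
    if level >= 6 and chunk.doc_summary:
        parts.append(f"Document summary: {chunk.doc_summary}")
    parts.append(chunk.text)
    return "\n".join(parts)


chunk = Chunk(
    text="Set the MTU to 9000 on all core ports.",
    doc_title="Switch Configuration Guide",
    section="Jumbo Frames",
    prev_summary="Explains why larger frames reduce CPU load.",
    metadata={"version": "2.1"},
)
print(enrich(chunk, level=4))
```

Only the enriched string is embedded and indexed. The original `text` can still be shown to the LLM and cited at answer time.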

What You Can Connect

  • 17 document formats: PDF (with multi-engine fallback), DOCX, Excel, CSV, JSON, Markdown, PowerPoint, Email, Images with OCR, Web content, and more
  • 10 LLM providers: OpenAI, Anthropic Claude, Google Gemini, Ollama, HuggingFace, vLLM, llama.cpp, and more
  • 8 embedding providers including local sentence-transformers and Ollama
  • 6 vector store backends: ChromaDB, FAISS, Qdrant, Pinecone, Weaviate, Milvus
  • 6 retrieval strategies including Ensemble and Contextual retrieval

Fully Local, Fully Yours

mbRAG runs on your hardware. Your documents are indexed locally, and your vector stores live on your filesystem. If you pair a local LLM (Ollama, llama.cpp) with local embeddings (sentence-transformers or Ollama), the entire pipeline runs without a single external network request. Complete air-gap operation.

Download Free

mbRAG is available as part of MokingBird AI — free to download. Advanced features are available in the Premium tier. Download at ai.mokingbird.xyz.