MokingbirdRAG Desktop — 🦜 Retrieval-Augmented Generation
v2.0.0-beta
MokingbirdRAG Dashboard
● Ollama Online · fast_local preset
Dashboard
MokingbirdRAG — fast_local preset · RetrievalOrchestrator active
Vectors
12,847
indexed chunks
↑ 847 today
Documents
23
in default collection
Queries
1,204
this session
↑ 18% vs avg
Avg Latency
0.84s
last 100 queries
↓ 12% improved
Pipeline Levels
mbBasicRAGPipeline
Embed → Similarity Retrieve → LLM → Answer
Active
mbEnhancedRAGPipeline
+ Query expansion, context enhancement (6 levels)
Active
mbConversationalPipeline
+ Multi-turn memory (conversation + vector + summary)
Idle
mbAdvancedPipeline
+ Full analytics, streaming, ensemble retrievers
Idle
RetrievalOrchestrator — 8-Step Pipeline
🧠 Memory Check → 🔍 Query Enhance → 📄 Retrieve Chunks → 🧱 Context Format → ✍️ Prompt Build → 🤖 LLM Generate → 📊 Analytics Log → 💾 Memory Update
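The eight steps shown above can be read as a linear chain in which each step receives a shared state object and returns it. The sketch below is only an illustration of that shape: `QueryState`, the step functions, and `run` are hypothetical names, and every step body is a stub rather than the behaviour of the actual RetrievalOrchestrator.

```python
from dataclasses import dataclass, field


@dataclass
class QueryState:
    """State passed between steps; all field names are illustrative."""
    query: str
    history: list = field(default_factory=list)
    chunks: list = field(default_factory=list)
    context: str = ""
    prompt: str = ""
    answer: str = ""
    log: list = field(default_factory=list)


def memory_check(s):      # 1. stub: history is already attached to the state
    return s

def query_enhance(s):     # 2. stub: only trims whitespace
    s.query = s.query.strip()
    return s

def retrieve_chunks(s):   # 3. stub: returns a single placeholder chunk
    s.chunks = [f"chunk about {s.query}"]
    return s

def context_format(s):    # 4. join retrieved chunks into one context string
    s.context = "\n".join(s.chunks)
    return s

def prompt_build(s):      # 5. combine context and question
    s.prompt = f"Context:\n{s.context}\n\nQuestion: {s.query}"
    return s

def llm_generate(s):      # 6. stub: no model is called here
    s.answer = f"(stub answer to: {s.query})"
    return s

def analytics_log(s):     # 7. record a minimal metrics entry
    s.log.append({"query": s.query, "chunks": len(s.chunks)})
    return s

def memory_update(s):     # 8. append the finished turn to history
    s.history.append((s.query, s.answer))
    return s


STEPS = [memory_check, query_enhance, retrieve_chunks, context_format,
         prompt_build, llm_generate, analytics_log, memory_update]


def run(query, history=None):
    """Run all eight steps in order and return the final state."""
    state = QueryState(query=query, history=list(history or []))
    for step in STEPS:
        state = step(state)
    return state
```

Keeping every step to the same signature makes it straightforward to time each one individually, which is the kind of per-step breakdown a diagnostics view would need.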
System Resources
CPU: 34%
RAM: 58%
GPU VRAM: 42%
Recent Queries
Query                               Pipeline  Retriever   Latency  Score  Tokens
What is eCPRI fronthaul interface?  Enhanced  Similarity  0.72s    0.97   412
O-RAN split options comparison      Enhanced  MMR         0.91s    0.94   638
5G NR timing requirements           Basic     Similarity  0.58s    0.89   287
What are the CPRI line bit rates?   Advanced  Ensemble    1.43s    0.96   891
Query
Ask questions across your indexed document corpus
Conversational Chat
Multi-turn dialogue with persistent memory — mbConversationalPipeline
Session Info
Memory: Conversation + Vector
Turns: 3
Context tokens: 1,842
Session: sess-a3f9
Memory Types
Conversation — last 10 turns
Vector — semantic search over history
Summary — compressed long context
Entity — named entity tracking
U
What is eCPRI and how does it differ from CPRI?
M
eCPRI (enhanced Common Public Radio Interface) is a next-generation fronthaul standard that improves on CPRI in three key ways:

1. Transport flexibility — runs over standard Ethernet/IP instead of proprietary serial links
2. Bandwidth efficiency — moves more processing to the radio (7-2x functional split), reducing fronthaul bandwidth 10–20×
3. Multi-vendor interoperability — enables open ecosystem deployment

CPRI uses constant-bitrate TDM; eCPRI uses packet-based transport with QoS.
U
What bandwidth savings can we expect in a typical 5G NR deployment?
M
For a typical 5G NR Massive MIMO deployment with 64T64R antennas:

CPRI: ~160 Gbps per cell (IQ samples × carriers × antennas)
eCPRI (7-2x split): ~25 Gbps per cell

That's roughly a 6–8× reduction. For mmWave with 256T256R it can reach 12–15× reduction. This enables fronthaul over standard 25GbE/100GbE Ethernet infrastructure.
U
What about latency requirements?
Documents
23 documents · 12,847 vectors · default collection
Total Docs
23
in collection
Indexed
21
91%
Processing
1
network_arch.pdf
Total Chunks
12,847
avg 610/doc
Document Library
Name                      Type  Chunks  Splitter   Loader       Status        Size
ecpri_specification.pdf   PDF   847     Paragraph  PyMuPDF      ● Indexed     12 MB
oran_standard.docx        DOCX  312     Semantic   python-docx  ● Indexed     4.2 MB
network_architecture.pdf  PDF   0/940…  Auto       PyMuPDF      ⟳ Processing  28 MB
deployment_guide.md       MD    94      Markdown   Text         ● Indexed     0.3 MB
telecom_diagram.png       IMG   1       -          Florence-2   ● Indexed     2.4 MB
Context Enhancement Levels
Core innovation — all 6 levels stored at ingestion. Select which levels to use at query time for optimal context richness.
6-Level Context System
Click to toggle levels for current queries
1
Raw Chunk
Exact retrieved chunk text (always included)
✓ On
2
Surrounding Window
±2 adjacent chunks for local context continuity
✓ On
3
Section Context
Full section/heading the chunk belongs to
✓ On
4
Document Context
Title, abstract, document-level summary
Off
5
Entity / Cross-Reference
Named entities, related chunks from other documents
Off
6
Semantic Cluster
Thematically similar chunks from entire corpus
Off
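As a rough illustration of how the toggled levels could be combined at query time, the sketch below assembles Levels 1 to 3 for a single retrieved chunk. The function names, the `enabled` tuple, and the `section_text` argument are assumptions made for this example; Levels 4 to 6 are omitted because the page does not say how they are stored.

```python
def surrounding_window(chunks, hit_index, radius=2):
    """Level 2: the hit plus up to `radius` neighbours on each side."""
    start = max(0, hit_index - radius)
    end = min(len(chunks), hit_index + radius + 1)
    return chunks[start:end]


def build_context(chunks, hit_index, section_text, enabled=(1, 2, 3)):
    """Return (label, text) pairs for the enabled levels, in level order."""
    # Level 1 (raw chunk) is always included, whatever `enabled` says.
    parts = [("raw", chunks[hit_index])]
    if 2 in enabled:
        window = surrounding_window(chunks, hit_index)
        parts.append(("window", " ".join(window)))
    if 3 in enabled:
        parts.append(("section", section_text))
    return parts


# Usage with placeholder chunk texts
chunks = [f"c{i}" for i in range(7)]
for label, text in build_context(chunks, 3, "Section: Fronthaul"):
    print(label, "->", text)
```

The window is clipped at document boundaries, so a hit on the first or last chunk simply yields fewer neighbours.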
Context Impact Analysis
Answer Relevance (Levels 1–2): 84%
Answer Relevance (Levels 1–3): 91%
Answer Relevance (Levels 1–4): 94%
Answer Relevance (Levels 1–6): 97%
Latency Trade-off
Levels 1–2 (fast): 0.62s
Levels 1–3 (balanced): 0.84s
Levels 1–4 (rich): 1.12s
Levels 1–6 (maximum): 1.84s
Retrievers
Configure retrieval strategy for your queries
Available Retrievers
Similarity (mbSimilarityRetriever)
Cosine similarity search. Best for focused, specific queries. Top-K = 5.
Default · Fast
MMR (mbMMRRetriever)
Maximal Marginal Relevance: diverse, non-redundant results. λ = 0.5.
Diverse
Multi-Query (mbMultiQueryRetriever)
LLM generates N query variants → union of results. N = 3.
Thorough
Parent Document (mbParentDocumentRetriever)
Retrieves child chunks → returns parent context. Good for long docs.
Contextual
Ensemble (mbEnsembleRetriever)
Combines Similarity + BM25 with Reciprocal Rank Fusion. Best accuracy.
Highest Quality · Slower
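Reciprocal Rank Fusion, as named for the Ensemble retriever, merges ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below is a generic version of that formula, not the code of mbEnsembleRetriever; the constant k = 60 is the value commonly used in the literature and is an assumption here, as are the function name and the example document IDs.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (assumed value; not taken from the app).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Usage: one list from vector similarity, one from BM25 (placeholder IDs)
similarity_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([similarity_hits, bm25_hits]))
# ['b', 'a', 'd', 'c']
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale.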
Retriever Settings
λ: 0 = max diversity, 1 = max relevance
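The effect of the λ setting can be seen in a small greedy MMR selection. This is a textbook-style sketch in plain Python, assuming cosine similarity for both relevance and redundancy; `cosine`, `mmr`, and the toy vectors are illustrative and do not describe the internals of mbMMRRetriever.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec, doc_vecs, top_k=5, lam=0.5):
    """Greedy MMR: returns indices into doc_vecs in selection order."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Docs 0 and 1 are near-duplicates; doc 2 points in a different direction.
query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.0, 1.0]]
print(mmr(query, docs, top_k=2, lam=0.5))  # [0, 2]
print(mmr(query, docs, top_k=2, lam=1.0))  # [0, 1]
```

With λ = 1 the redundancy term vanishes and the result is the plain similarity ranking; lowering λ penalises the near-duplicate and lets the more distinct document in.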
Analytics
Query performance, usage, and system metrics
Total Queries
8,492
all time
Avg Latency
0.84s
↓ 12% improved
Cache Hit
34%
query cache
Avg Score
0.91
retrieval relevance
Latency by Pipeline
mbBasicRAGPipeline: 0.62s
mbEnhancedRAGPipeline: 0.84s
mbConversationalPipeline: 1.12s
mbAdvancedPipeline: 1.84s
Queries by Retriever
Similarity: 5,214 (61%)
MMR: 1,698 (20%)
Ensemble: 1,019 (12%)
Multi-Query: 561 (7%)
Top Queries (by frequency)
Query Pattern                    Count  Avg Latency  Avg Score
eCPRI interface definition       284    0.71s        0.97
O-RAN split options              198    0.88s        0.94
5G NR timing requirements        156    0.64s        0.89
fronthaul bandwidth calculation  142    1.02s        0.92
Configuration
Pipeline, model, and provider settings
Quick Presets
⚡ fast_local
LOCAL embed + Ollama LLM + ChromaDB
Active
⚖ balanced
Ollama embed + Ollama LLM + ChromaDB
☁ cloud_quality
OpenAI embed + GPT-4 + ChromaDB
🔒 privacy_focused
All local, no cloud calls
🔬 research
All retrievers + max context levels
LLM Provider
Embedding Provider
Vector Store
Settings
Application preferences and system settings
Appearance
Theme
Choose light or dark interface
Font Size
UI text size
Compact Mode
Reduce padding and spacing
Query Behavior
Streaming Mode
Stream tokens as generated
Auto-copy Answer
Copy answer to clipboard
Show Source Scores
Display relevance scores
System
Analytics Logging
Save query metrics locally
Auto-refresh Analytics
Update stats every 30s
Ollama Health Check
Ping Ollama on startup
Cache Queries
Cache results for identical queries
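One plausible reading of the "Cache Queries" option is an exact-match cache keyed on the query text plus the settings that affect the answer. The sketch below assumes that reading; the `QueryCache` class, its LRU eviction, and the choice of key fields are all hypothetical and are not taken from the application.

```python
from collections import OrderedDict


class QueryCache:
    """Exact-match LRU cache for answers; an illustrative sketch only."""

    def __init__(self, max_entries=256):
        self.max_entries = max_entries
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query, pipeline, retriever):
        """Return the cached answer, or None on a miss."""
        key = (query, pipeline, retriever)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def put(self, query, pipeline, retriever, answer):
        """Store an answer, evicting the least recently used entry if full."""
        key = (query, pipeline, retriever)
        self._store[key] = answer
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)

    def hit_rate(self):
        """Fraction of lookups served from the cache (0.0 if none yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Including the pipeline and retriever in the key avoids returning an answer produced under different settings for the same query text.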
Keyboard Shortcuts
Run Query: Ctrl+Enter
Go to Dashboard: Ctrl+1
Go to Query: Ctrl+2
Add Document: Ctrl+O
Toggle Sidebar: Ctrl+\
Pipeline Status
Real-time service health and pipeline diagnostics
Service Health
Ollama LLM Server
http://localhost:11434 · llama3.2:3b loaded
● Online
ChromaDB Vector Store
Local · 12,847 vectors · default collection
● Ready
Sentence-Transformers
all-MiniLM-L6-v2 · 384 dim · loaded
● Loaded
FastAPI Server
port 8000 · not started
Stopped
Florence-2 (Image Captioner)
Optional · not installed
Optional
RetrievalOrchestrator Diagnostics
🧠 Memory Check → 🔍 Query Enhance → 📄 Retrieve Chunks → 🧱 Context Format → ✍️ Prompt Build → 🤖 LLM Generate → 📊 Analytics Log → 💾 Memory Update
All 8 steps operational · avg execution: 0.84s
Memory check: 0.01s
Query enhancement: 0.04s
Retrieval: 0.08s
Context formatting: 0.02s
LLM generation: 0.69s
Collections
default
23 documents · 12,847 vectors
● Active
oran_specs
7 documents · 3,242 vectors
Inactive
+
New Collection
Memory
Conversation memory management for mbConversationalPipeline and mbAdvancedPipeline
Memory Types Available
ConversationMemory
Stores last N turns. Simple buffer. Low overhead.
Active (10 turns)
VectorMemory
Stores turns as embeddings for semantic search over history.
Inactive
SummaryMemory
Compresses old turns to a running summary. Good for long sessions.
Inactive
EntityMemory
Tracks named entities (people, organizations, places) across the conversation.
Inactive
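The "last N turns" buffer described for ConversationMemory can be approximated with a bounded deque. The sketch below is a minimal stand-in under that assumption; `TurnBuffer` and its methods are hypothetical names and do not reflect the real ConversationMemory interface.

```python
from collections import deque


class TurnBuffer:
    """Keeps only the most recent `max_turns` (user, assistant) pairs."""

    def __init__(self, max_turns=10):
        # deque drops the oldest entry automatically once maxlen is reached.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, user_text, assistant_text):
        self._turns.append((user_text, assistant_text))

    def __len__(self):
        return len(self._turns)

    def as_prompt(self):
        """Render the retained turns as plain text for a prompt."""
        lines = []
        for user_text, assistant_text in self._turns:
            lines.append(f"User: {user_text}")
            lines.append(f"Assistant: {assistant_text}")
        return "\n".join(lines)


# Usage: with max_turns=2, the first of three turns is discarded.
buf = TurnBuffer(max_turns=2)
buf.add_turn("q1", "a1")
buf.add_turn("q2", "a2")
buf.add_turn("q3", "a3")
print(buf.as_prompt())
```

A fixed-size buffer keeps overhead low but forgets anything older than N turns, which is the gap the summary and vector memory types are described as covering.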
Current Session Memory
Session: sess-a3f9 · 3 turns · 1,842 tokens
[Turn 1 - User]
"What is eCPRI and how does it differ from CPRI?"
[Turn 1 - Assistant]
"eCPRI (enhanced Common Public Radio Interface) is a next-generation fronthaul standard…"
[Turn 2 - User]
"What bandwidth savings can we expect?"
[Turn 2 - Assistant]
"For a typical 5G NR Massive MIMO deployment with 64T64R antennas…"
[Turn 3 - User]
"What about latency requirements?"