MokingbirdRAG — Desktop Application Design

Dashboard

MokingbirdRAG — fast_local preset · RetrievalOrchestrator active

Vectors: 12,847 indexed chunks (↑ 847 today)

Documents: 23 in default collection

Queries: 1,204 this session (↑ 18% vs avg)

Avg Latency: 0.84s over last 100 queries (↓ 12% improved)

Pipeline Levels

mbBasicRAGPipeline

Embed → Similarity Retrieve → LLM → Answer

Active

mbEnhancedRAGPipeline

+ Query expansion, context enhancement (6 levels)

Active

mbConversationalPipeline

+ Multi-turn memory (conversation + vector + summary)

Idle

mbAdvancedPipeline

+ Full analytics, streaming, ensemble retrievers

Idle

RetrievalOrchestrator — 8-Step Pipeline

🧠 Memory Check › 🔍 Query Enhance › 📄 Retrieve Chunks › 🧱 Context Format › ✍️ Prompt Build › 🤖 LLM Generate › 📊 Analytics Log › 💾 Memory Update
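The eight steps run in a fixed order over one query. As a minimal sketch, assuming each step is a plain function that reads and updates a shared state dict (the `Step` type, `run_orchestrator`, and all step bodies are hypothetical placeholders, not the actual RetrievalOrchestrator API), the sequencing and the per-step timings shown on the Pipeline Status page could look like this:

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]
Step = Callable[[State], State]  # hypothetical step signature


def run_orchestrator(steps: List[Tuple[str, Step]], query: str,
                     history: Optional[List[Tuple[str, str]]] = None) -> Tuple[State, Dict[str, float]]:
    """Run each step in order on a shared state dict, timing each one."""
    state: State = {"query": query, "history": history if history is not None else []}
    timings: Dict[str, float] = {}
    for name, step in steps:
        start = time.perf_counter()
        state = step(state)
        timings[name] = time.perf_counter() - start
    return state, timings


# Placeholder step bodies; real ones would call the memory, retriever and LLM components.
def memory_check(s: State) -> State:
    s["past_turns"] = list(s["history"])
    return s


def query_enhance(s: State) -> State:
    s["queries"] = [s["query"]]  # expansion variants would be appended here
    return s


def retrieve_chunks(s: State) -> State:
    s["chunks"] = [f"chunk for: {q}" for q in s["queries"]]
    return s


def context_format(s: State) -> State:
    s["context"] = "\n\n".join(s["chunks"])
    return s


def prompt_build(s: State) -> State:
    s["prompt"] = f"Context:\n{s['context']}\n\nQuestion: {s['query']}"
    return s


def llm_generate(s: State) -> State:
    s["answer"] = "(LLM output)"  # stand-in for the model call
    return s


def analytics_log(s: State) -> State:
    s["logged"] = True
    return s


def memory_update(s: State) -> State:
    s["history"].append((s["query"], s["answer"]))
    return s


STEPS: List[Tuple[str, Step]] = [
    ("Memory Check", memory_check),
    ("Query Enhance", query_enhance),
    ("Retrieve Chunks", retrieve_chunks),
    ("Context Format", context_format),
    ("Prompt Build", prompt_build),
    ("LLM Generate", llm_generate),
    ("Analytics Log", analytics_log),
    ("Memory Update", memory_update),
]
```

Passing the same `history` list into successive calls lets Memory Check see earlier turns; summing the recorded step times approximates the end-to-end latency reported per query.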

System Resources

CPU: 34%

RAM: 58%

GPU VRAM: 42%

Recent Queries

Query	Pipeline	Retriever	Latency	Score	Tokens
What is eCPRI fronthaul interface?	Enhanced	Similarity	0.72s	0.97	412
O-RAN split options comparison	Enhanced	MMR	0.91s	0.94	638
5G NR timing requirements	Basic	Similarity	0.58s	0.89	287
What are the CPRI line bit rates?	Advanced	Ensemble	1.43s	0.96	891

Query

Ask questions across your indexed document corpus

Pipeline Level

Retriever

Context Levels

Query

Conversational Chat

Multi-turn dialogue with persistent memory — mbConversationalPipeline

Session Info

Memory: Conversation + Vector

Turns: 3

Context tokens: 1,842

Session: sess-a3f9

Memory Types

Conversation — last 10 turns

Vector — semantic search over history

Summary — compressed long context

Entity — named entity tracking

User

What is eCPRI and how does it differ from CPRI?

Assistant

eCPRI (enhanced Common Public Radio Interface) is a next-generation fronthaul standard that improves on CPRI in three key ways:

1. Transport flexibility — runs over standard Ethernet/IP instead of proprietary serial links
2. Bandwidth efficiency — moves more processing to the radio (7-2x functional split), reducing fronthaul bandwidth roughly 6–15× depending on the antenna configuration
3. Multi-vendor interoperability — enables open ecosystem deployment

CPRI uses constant-bitrate TDM; eCPRI uses packet-based transport with QoS.

User

What bandwidth savings can we expect in a typical 5G NR deployment?

Assistant

For a typical 5G NR Massive MIMO deployment with 64T64R antennas:

• CPRI: ~160 Gbps per cell (IQ samples × carriers × antennas)
• eCPRI (7-2x split): ~25 Gbps per cell

That's roughly a 6–8× reduction. For mmWave with 256T256R it can reach 12–15× reduction. This enables fronthaul over standard 25GbE/100GbE Ethernet infrastructure.

User

What about latency requirements?

Documents

23 documents · 12,847 vectors · default collection

Total Docs: 23 in collection

Indexed: 21 (91%)

Processing: 1 (network_architecture.pdf)

Total Chunks: 12,847 (avg 610/doc)

Document Library

Name	Type	Chunks	Splitter	Loader	Status	Size
ecpri_specification.pdf	PDF	847	Paragraph	PyMuPDF	● Indexed	12 MB
oran_standard.docx	DOCX	312	Semantic	python-docx	● Indexed	4.2 MB
network_architecture.pdf	PDF	0/940…	Auto	PyMuPDF	⟳ Processing	28 MB
deployment_guide.md	MD	94	Markdown	Text	● Indexed	0.3 MB
telecom_diagram.png	IMG	1	—	Florence-2	● Indexed	2.4 MB

Context Enhancement Levels

Core innovation: all 6 levels are computed and stored at ingestion time. Select which levels to use at query time to balance context richness against latency.

6-Level Context System

Click to toggle levels for current queries

1

Raw Chunk

Exact retrieved chunk text (always included)

✓ On

2

Surrounding Window

±2 adjacent chunks for local context continuity

✓ On

3

Section Context

Full section/heading the chunk belongs to

✓ On

4

Document Context

Title, abstract, document-level summary

Off

5

Entity / Cross-Reference

Named entities, related chunks from other documents

Off

6

Semantic Cluster

Thematically similar chunks from entire corpus

Off
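Because every level is precomputed at ingestion, query-time work reduces to picking stored pieces. A minimal sketch covering levels 1 to 4 only (levels 5 and 6 need cross-document lookups and are left out), assuming each stored chunk carries its position plus precomputed section and document text; `StoredChunk`, `surrounding_window`, and `build_context` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class StoredChunk:
    """Hypothetical ingestion record: a chunk plus precomputed higher-level context."""
    doc_id: str
    index: int        # position within its document; assumes doc_chunks[i].index == i
    text: str         # level 1: raw chunk
    section: str      # level 3: full section/heading text
    doc_summary: str  # level 4: title, abstract, document-level summary


def surrounding_window(doc_chunks: List[StoredChunk], index: int, radius: int = 2) -> List[StoredChunk]:
    """Level 2: the chunk plus up to `radius` neighbours on each side, clipped at document edges."""
    lo = max(0, index - radius)
    hi = min(len(doc_chunks), index + radius + 1)
    return doc_chunks[lo:hi]


def build_context(hit: StoredChunk, doc_chunks: List[StoredChunk], levels: Set[int]) -> str:
    """Assemble context for one retrieved chunk from the enabled levels (1-4)."""
    parts: List[str] = []
    if 4 in levels:
        parts.append(f"[Document] {hit.doc_summary}")
    if 3 in levels:
        parts.append(f"[Section] {hit.section}")
    if 2 in levels:
        # The window already contains the raw chunk, so level 1 is not repeated.
        window = surrounding_window(doc_chunks, hit.index)
        parts.append("[Window] " + " ".join(c.text for c in window))
    else:
        parts.append(f"[Chunk] {hit.text}")  # level 1 is always included
    return "\n".join(parts)
```

Each extra level adds text to the prompt, which is consistent with the latency rising as more levels are enabled in the trade-off figures below.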

Context Impact Analysis

Answer Relevance (Levels 1–2): 84%

Answer Relevance (Levels 1–3): 91%

Answer Relevance (Levels 1–4): 94%

Answer Relevance (Levels 1–6): 97%

Latency Trade-off

Levels 1–2 (fast): 0.62s

Levels 1–3 (balanced): 0.84s

Levels 1–4 (rich): 1.12s

Levels 1–6 (maximum): 1.84s

Retrievers

Configure retrieval strategy for your queries

Available Retrievers

Similarity (mbSimilarityRetriever)

Cosine similarity search. Best for focused, specific queries. Top-K = 5.

Default · Fast

MMR (mbMMRRetriever)

Maximal Marginal Relevance: diverse, non-redundant results. λ = 0.5.

Diverse

Multi-Query (mbMultiQueryRetriever)

LLM generates N query variants → union of results. N = 3.

Thorough

Parent Document (mbParentDocumentRetriever)

Retrieves child chunks → returns parent context. Good for long docs.

Contextual

Ensemble (mbEnsembleRetriever)

Combines Similarity + BM25 with Reciprocal Rank Fusion. Best accuracy.

Highest Quality · Slower
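Reciprocal Rank Fusion merges ranked lists using only rank positions, which is why it can combine cosine-similarity results with BM25 results whose raw scores live on different scales. A minimal sketch, assuming each retriever returns an ordered list of chunk IDs and using the commonly cited constant k = 60 (the value mbEnsembleRetriever actually uses is not stated here); `reciprocal_rank_fusion` is an illustrative helper:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d), ranks start at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

For example, fusing a similarity ranking `["c12", "c7", "c3"]` with a BM25 ranking `["c7", "c3", "c40"]` returns `["c7", "c3", "c12", "c40"]`: chunks found by both retrievers rise to the top.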

Retriever Settings

Active Retriever

Top-K Results

Score Threshold

MMR Lambda (diversity)

0 = max diversity, 1 = max relevance

Multi-Query N Variants
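The λ setting follows the usual Maximal Marginal Relevance trade-off: each pick maximizes λ·sim(query, d) − (1 − λ)·max sim(d, already selected), so λ = 1 ranks purely by relevance and λ = 0 purely by novelty. A minimal pure-Python sketch, assuming embeddings are plain float vectors compared by cosine similarity; `cosine` and `mmr` are illustrative helpers, not the mbMMRRetriever API:

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int = 5,
        lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to k doc indices by Maximal Marginal Relevance."""
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))

    def mmr_score(i: int) -> float:
        # Redundancy = similarity to the closest already-selected document.
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)  # ties go to the earliest candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

With the defaults shown on the Retrievers page (Top-K = 5, λ = 0.5), a call would be `mmr(query_vec, chunk_vecs, k=5, lambda_mult=0.5)`; lowering λ makes near-duplicate chunks less likely to be picked together.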

Analytics

Query performance, usage, and system metrics

Total Queries: 8,492 (all time)

Avg Latency: 0.84s (↓ 12% improved)

Cache Hit: 34% (query cache)

Avg Score: 0.91 (retrieval relevance)

Latency by Pipeline

mbBasicRAGPipeline: 0.62s

mbEnhancedRAGPipeline: 0.84s

mbConversationalPipeline: 1.12s

mbAdvancedPipeline: 1.84s

Queries by Retriever

Similarity: 5,214 (61%)

MMR: 1,698 (20%)

Ensemble: 1,019 (12%)

Multi-Query: 561 (7%)

Top Queries (by frequency)

Query Pattern	Count	Avg Latency	Avg Score
eCPRI interface definition	284	0.71s	0.97
O-RAN split options	198	0.88s	0.94
5G NR timing requirements	156	0.64s	0.89
fronthaul bandwidth calculation	142	1.02s	0.92

Configuration

Pipeline, model, and provider settings

Quick Presets

⚡ fast_local

LOCAL embed + Ollama LLM + ChromaDB

Active

⚖ balanced

Ollama embed + Ollama LLM + ChromaDB

☁ cloud_quality

OpenAI embed + GPT-4 + ChromaDB

🔒 privacy_focused

All local, no cloud calls

🔬 research

All retrievers + max context levels
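Each preset is essentially a named bundle of provider choices. A minimal sketch of how the three presets whose components are spelled out above could be represented; the `Preset` type, the lowercase provider identifiers, and `is_fully_local` are hypothetical, and privacy_focused and research are omitted because their exact components are not listed:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Preset:
    embed_provider: str
    llm_provider: str
    vector_store: str


# Component choices taken from the preset cards; identifiers are illustrative.
PRESETS: Dict[str, Preset] = {
    "fast_local": Preset(embed_provider="local", llm_provider="ollama", vector_store="chromadb"),
    "balanced": Preset(embed_provider="ollama", llm_provider="ollama", vector_store="chromadb"),
    "cloud_quality": Preset(embed_provider="openai", llm_provider="openai", vector_store="chromadb"),
}

# Hypothetical list of providers that send data off the machine.
CLOUD_PROVIDERS: FrozenSet[str] = frozenset({"openai"})


def is_fully_local(preset: Preset) -> bool:
    """True if neither the embedder nor the LLM is a cloud provider."""
    return (preset.embed_provider not in CLOUD_PROVIDERS
            and preset.llm_provider not in CLOUD_PROVIDERS)
```

A check like `is_fully_local` is one way the privacy_focused guarantee ("no cloud calls") could be enforced when a user edits provider fields after picking a preset.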

LLM Provider

Provider

Model

Temperature

Max Tokens

Embedding Provider

Provider

Model

Chunk Size

Chunk Overlap
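Chunk Size and Chunk Overlap define a sliding window over the text: each new chunk starts `chunk_size - chunk_overlap` units after the previous one. A minimal sketch that measures size in characters (whether the app counts characters or tokens is an assumption here) and ignores the structure-aware splitters listed in the Document Library; `chunk_text` is a hypothetical helper:

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters overlapping by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)` returns `['abcd', 'defg', 'ghij']`; the overlap keeps a sentence cut at a boundary partially visible in both neighbouring chunks.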

Vector Store

Backend

Persist Directory

Collection Name

Settings

Application preferences and system settings

Appearance

Theme

Choose light or dark interface

Font Size

UI text size

Compact Mode

Reduce padding and spacing

Query Behavior

Streaming Mode

Stream tokens as generated

Auto-copy Answer

Copy answer to clipboard

Show Source Scores

Display relevance scores

System

Analytics Logging

Save query metrics locally

Auto-refresh Analytics

Update stats every 30s

Ollama Health Check

Ping Ollama on startup

Cache Queries

Cache results for identical queries
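Cache Queries lets a repeated query skip the pipeline entirely; the Cache Hit card on the Analytics page presumably reports the fraction of lookups served this way. A minimal in-memory sketch, assuming the key should also include the settings that change the answer (pipeline, retriever, context levels) and that case and whitespace differences count as the same query; `QueryCache` and its methods are hypothetical:

```python
import hashlib
import json
from typing import Any, Callable, Dict


class QueryCache:
    """In-memory cache of answers keyed on (normalized query, result-affecting settings)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(query: str, settings: Dict[str, Any]) -> str:
        # Assumption: case and whitespace differences are treated as the same query.
        normalized = " ".join(query.lower().split())
        # settings must be JSON-serializable; sort_keys makes the key order-independent.
        payload = json.dumps({"q": normalized, "s": settings}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, query: str, settings: Dict[str, Any],
                       compute: Callable[[str], Any]) -> Any:
        key = self.make_key(query, settings)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(query)
        self._store[key] = result
        return result

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

This sketch never evicts entries; a real cache would likely need a size limit and should be cleared when documents in the collection change, since cached answers would otherwise go stale.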

Keyboard Shortcuts

Run Query: Ctrl+Enter

Go to Dashboard: Ctrl+1

Go to Query: Ctrl+2

Add Document: Ctrl+O

Toggle Sidebar: Ctrl+\

Pipeline Status

Real-time service health and pipeline diagnostics

Service Health

Ollama LLM Server

http://localhost:11434 · llama3.2:3b loaded

● Online

ChromaDB Vector Store

Local · 12,847 vectors · default collection

● Ready

Sentence-Transformers

all-MiniLM-L6-v2 · 384 dim · loaded

● Loaded

FastAPI Server

port 8000 · not started

Stopped

Florence-2 (Image Captioner)

Optional · not installed

Optional
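The Ollama Health Check setting and the Ollama LLM Server row can both be backed by one HTTP request to the local Ollama server. A minimal standard-library sketch, assuming the default address shown above and Ollama's `GET /api/tags` endpoint, which lists locally available models (not necessarily the ones currently loaded in memory); `ollama_models` and `ollama_status` are hypothetical helpers:

```python
import http.client
import json
import urllib.request
from typing import List, Optional


def ollama_models(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[List[str]]:
    """Return locally available model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None
    return [m.get("name", "") for m in data.get("models", [])]


def ollama_status(base_url: str = "http://localhost:11434") -> str:
    """Produce a status string in the style of the Service Health rows."""
    models = ollama_models(base_url)
    if models is None:
        return "Offline"
    return "Online · " + (", ".join(models) if models else "no models pulled")
```

A short timeout keeps the startup ping from delaying the UI when Ollama is not running.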

RetrievalOrchestrator Diagnostics

🧠 Memory Check › 🔍 Query Enhance › 📄 Retrieve Chunks › 🧱 Context Format › ✍️ Prompt Build › 🤖 LLM Generate › 📊 Analytics Log › 💾 Memory Update

All 8 steps operational · avg execution: 0.84s

Memory check: 0.01s

Query enhancement: 0.04s

Retrieval: 0.08s

Context formatting: 0.02s

LLM generation: 0.69s

Collections

default

23 documents · 12,847 vectors

● Active

oran_specs

7 documents · 3,242 vectors

Inactive

+

New Collection

Memory

Conversation memory management for mbConversationalPipeline and mbAdvancedPipeline

Memory Types Available

ConversationMemory

Stores last N turns. Simple buffer. Low overhead.

Active (10 turns)

VectorMemory

Stores turns as embeddings for semantic search over history.

Inactive

SummaryMemory

Compresses old turns to a running summary. Good for long sessions.

Inactive

EntityMemory

Tracks named entities (people, organizations, places) across the conversation.

Inactive
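ConversationMemory's "last N turns" behaviour is a bounded buffer: once N turns are stored, each new turn evicts the oldest. A minimal sketch using `collections.deque(maxlen=N)` and rendering turns in a format similar to the session view below; the `TurnBuffer` class is hypothetical, and its turn numbering restarts from 1 after older turns are evicted:

```python
from collections import deque
from typing import Deque, List, Tuple


class TurnBuffer:
    """Keep only the most recent `max_turns` (user, assistant) exchanges."""

    def __init__(self, max_turns: int = 10) -> None:
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add_turn(self, user: str, assistant: str) -> None:
        self._turns.append((user, assistant))  # oldest turn drops off when full

    def __len__(self) -> int:
        return len(self._turns)

    def render(self) -> str:
        """Format retained turns for inclusion in the prompt."""
        lines: List[str] = []
        for n, (user, assistant) in enumerate(self._turns, start=1):
            lines.append(f'[Turn {n} - User] "{user}"')
            lines.append(f'[Turn {n} - Assistant] "{assistant}"')
        return "\n".join(lines)
```

The default of 10 matches the "Active (10 turns)" setting; because overhead is fixed by N, this is the cheapest of the four memory types, while SummaryMemory trades extra LLM calls for longer effective history.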

Current Session Memory

Session: sess-a3f9 · 3 turns · 1,842 tokens

[Turn 1 - User]
"What is eCPRI and how does it differ from CPRI?"

[Turn 1 - Assistant]
"eCPRI (enhanced Common Public Radio Interface) is a next-generation fronthaul standard…"

[Turn 2 - User]
"What bandwidth savings can we expect?"

[Turn 2 - Assistant]
"For a typical 5G NR Massive MIMO deployment with 64T64R antennas…"

[Turn 3 - User]
"What about latency requirements?"