Runs Locally · Download Full Node for Free · Exposable APIs

The AI Suite Built for Privacy.

RAG. Dataset Generation. Fine-Tuning.
All running on your hardware. No cloud. No subscriptions. No compromises.

// zero telemetry · air-gap compatible · you own every byte

One Ecosystem, Four Tools

MokingBird Node sits at the center, connecting RAG, DataGen, and Fine-Tuning into one seamless local workflow.

RAG Pipeline
DataGen Pipeline
Fine-Tune Pipeline

How It Fits Together

MokingBird Node acts as the central hub, connecting all specialized tools into one coherent local workflow. Node is the wrapper layer; RAG, DataGen, and FT are the core product modules.

▸ MokingBird Node (Hub Layer)

Orchestration, scheduling, model management, and coordinated pipeline execution.

🔍 RAG · ⚡ DataGen · 🧬 FT

Modular tools. Download the full Node or install each tool separately, depending on your setup.

-- Your Hardware · Your Models · Your Data --

Explore Each Product


MokingBird Node

The unified hub that orchestrates your entire local AI workflow. One interface to manage models, pipelines, and all MokingBird tools — all on your hardware.

  • Unified dashboard for all tools
  • Local LLM model management & hot-swap
  • Background pipeline scheduling
  • Real-time GPU & memory monitoring
// Node Hub Specs
Type: Orchestration Hub
Interface: Desktop GUI + API
Integrates: RAG, DataGen, FT
Model Formats: GGUF, GGML, HF
Cloud Required: Never

MokingBird RAG

A comprehensive RAG framework with 6-level contextual retrieval. From basic keyword search up to graph-augmented retrieval — all local, all private.

  • 6-level retrieval hierarchy
  • Hybrid semantic + keyword (RRF fusion)
  • 30–50% accuracy gain vs naive RAG
  • All embeddings & indexes stored locally
// Retrieval Levels
Level 1: Keyword BM25
Level 2: Dense Semantic
Level 3: Hybrid + RRF
Level 4: Contextual Compress
Level 5: Agentic Re-rank
Level 6: Graph-Augmented
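
Level 3 combines keyword and dense results with Reciprocal Rank Fusion (RRF). As a rough illustration of the general technique, and not of MokingBird's actual implementation, the sketch below fuses two ranked lists of document IDs. The function name `rrf_fuse`, the sample document IDs, and the constant k=60 (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the best hit in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) search and a dense semantic search.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why rank fusion tends to be more robust than either retriever alone.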

MokingBird DataGen ★ Flagship

The crown jewel of MokingBird. GPRO-Hybrid Reinforcement Learning drives quality-controlled dataset synthesis — generating training data that actually makes your models better.

  • GPRO-Hybrid RL quality control engine
  • Instruction, preference, DPO, conversation formats
  • Automated scoring, dedup & diversity filtering
  • Scale to millions of samples, all offline
// GPRO-Hybrid RL Loop
Step 1: Generate candidates
Step 2: Score w/ reward model
Step 3: Filter top-k
Step 4: Policy refinement
Output: High-quality dataset
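
Steps 1 to 3 of the loop can be pictured with a small sketch. This is only an illustration under stated assumptions: `generate_candidates` is a hypothetical stand-in for a model call and returns random placeholder scores, and Step 4 (policy refinement) is omitted. The 0.7/0.3 reward weighting and K=4 candidates are the figures given in the About section of this document; every function and field name is invented for the example.

```python
import random


def hybrid_reward(process_reward, outcome_reward):
    # Weights as stated in this document: 0.7 process + 0.3 outcome.
    return 0.7 * process_reward + 0.3 * outcome_reward


def generate_candidates(prompt, k, rng):
    # Hypothetical stand-in for an LLM call. The scores are random
    # placeholders, not the output of a real reward model.
    return [
        {"text": f"{prompt} (candidate {i})",
         "process": rng.random(),
         "outcome": rng.random()}
        for i in range(k)
    ]


def select_top(candidates, top_k):
    # Step 2: score each candidate. Step 3: keep the top_k highest scores.
    scored = [(hybrid_reward(c["process"], c["outcome"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = generate_candidates("Write a quiz question about RAG", k=4, rng=rng)
kept = select_top(candidates, top_k=2)
for score, candidate in kept:
    print(f"{score:.3f}  {candidate['text']}")
```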

MokingBird FT

Universal fine-tuning platform supporting 7+ training frameworks and techniques such as LoRA, QLoRA, DPO, and ORPO. Your fine-tuned models are yours alone, stored locally.

  • 7+ frameworks, with techniques including LoRA, QLoRA, DPO, ORPO, PPO…
  • Auto hyperparameter optimization
  • Built-in eval benchmarks & loss visualization
  • Export to GGUF, HuggingFace, ONNX
// Framework Support
LoRA / QLoRA: ✓ Full
DPO / IPO: ✓ Full
ORPO: ✓ Full
PPO / RLHF: ✓ Full
Full Fine-tune: ✓ Full
GaLore and more: ✓

Built for Real Work

Every tool is production-grade, privacy-first, and designed to run on your own hardware indefinitely.

🔮
Hub · Orchestration
MokingBird Node

The all-in-one control plane for your local AI stack. Manage models, schedule pipelines, monitor resources, and orchestrate RAG, DataGen, and FT — from one unified interface.

Desktop GUI · REST API · Model Manager · Pipeline Scheduler
🔍
RAG · Retrieval
MokingBird RAG

Comprehensive RAG with 6-level contextual retrieval hierarchy. Outperforms standard cloud RAG by 30–50% — no SaaS subscription needed, no data leaving your network.

6-Level Retrieval · Hybrid Search · Graph RAG · Local Embeddings
★ Flagship
DataGen · Flagship · GPRO-Hybrid RL
MokingBird DataGen

The jewel of the suite. Generate high-quality AI training datasets locally using GPRO-Hybrid Reinforcement Learning. Quality-controlled, diverse, and at any scale — on your hardware.

GPRO-Hybrid RL · Instruction Tuning · Preference Data · DPO Format
🧬
Fine-Tuning · Universal
MokingBird FT

Universal fine-tuning platform with 7+ frameworks. From LoRA adapters to full fine-tuning to RLHF — train the model you need, own the weights, export anywhere.

LoRA / QLoRA · DPO · ORPO · 7+ Frameworks

Who Uses MokingBird AI

Designed for anyone who takes AI privacy seriously.

🔬
ML Researcher
Reproduce & Extend Research

Generate custom datasets, fine-tune open models, and run reproducible RAG experiments — all offline, all verifiable.

DataGen · FT · RAG
🏢
Enterprise Developer
Private Document Q&A

Build RAG systems over sensitive internal docs. No cloud. No data leaks. No compliance headaches. Air-gap ready.

RAG · Node
🧑‍💻
Indie AI Developer
Build & Ship Local AI Apps

Use MokingBird's Python API to power local AI features in your products. Keep your users' data off the cloud by design.

Node API · DataGen
🛡️
Security-Conscious Team
Zero-Trust AI Deployment

Deploy in air-gapped environments. MokingBird never phones home. Zero-telemetry local stack you control end-to-end.

Air-Gap · Zero Telemetry · Audit
⚙️
Fine-Tuning Engineer
Domain-Specific Model Training

Generate domain datasets with DataGen, then fine-tune with FT across 7+ frameworks. Evaluate, iterate, and export — all locally.

DataGen · FT · Eval
📚
Academic Institution
Research Without API Costs

Run large-scale AI experiments on institutional hardware. No per-token costs, no data agreements with cloud providers.

All Tools · Free · Scale

You Own It. All of It.

No telemetry. No data leaving your machine. No subscriptions. No vendor lock-in. Your AI, your rules.

Fully Local

All core workflows run on your own hardware. Air-gap compatible by design.

Zero Telemetry

No hidden call-home behavior. You control data movement and integration boundaries.

Model Ownership

You own model artifacts, datasets, and operational lifecycle from training to deployment.

Launching Soon

Your AI.
Your Hardware.

Downloads are being finalized. Join the notify list and we will email you as soon as MokingBird AI launches.

⬇ Download Full Node for Free
// Zero telemetry · air-gap compatible · you own every byte

About MokingBird AI

MokingBird Oy · Business ID: 3615646-1 · Finland · ai.mokingbird.xyz

Local AI Infrastructure, Built for the Real World

MokingBird AI is the desktop AI ecosystem developed by MokingBird — a suite of three powerful, production-grade tools for working with large language models, entirely on your own hardware. No cloud accounts. No data leaving your machine. No subscriptions required to get started.

The suite is organized around the MokingBird Node — a desktop hub application that brings mbRAG, mbDataGen, and mbFT together in one place. Use them as an integrated system through the Node, or run each tool independently as a standalone application.



Our Mission

Your models. Your hardware. Your data. Your rules.

Local-first AI infrastructure for builders, researchers, and teams who need professional capability without cloud lock-in or data exposure.


What is MokingBird AI?

MokingBird AI is an umbrella for our local AI infrastructure products. The core architecture is:

MokingBird Node (Desktop Hub)
├── mbRAG     — Retrieval-Augmented Generation framework
├── mbDataGen — Synthetic dataset generation platform
└── mbFT      — Universal fine-tuning platform

Each tool can be installed and used standalone, or together through the Node. The Node is the hub — a PySide6 desktop application (Python) that surfaces all three tools in a unified interface and exposes local FastAPI endpoints for programmatic access.

Platforms: Windows, macOS, Linux
Architecture: 100% local execution — all AI operations run on your machine
Internet: Optional (for connecting to cloud LLM providers using your own API keys)
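
To make the idea of programmatic access through local endpoints more concrete, here is a minimal sketch of a client that could talk to such an endpoint using only the Python standard library. The host, port, route (`/rag/query`), and JSON field name are all assumptions made for illustration; the actual endpoints are not documented in this text.

```python
import json
import urllib.request

# Hypothetical address of a locally running Node API.
BASE_URL = "http://127.0.0.1:8000"


def build_query_request(question, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request to a hypothetical route."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/rag/query",  # hypothetical route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_query_request("What does the contract say about renewal?")
print(request.get_method(), request.full_url)
```

Because the request targets the loopback address, the question and any retrieved text would stay on the local machine in this setup.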


Our Three Tools

mbRAG — Advanced Retrieval-Augmented Generation

mbRAG is a production-ready, stable alternative to LangChain. It implements every major RAG approach in a single unified system, without the breaking changes, vendor lock-in, or opacity that plague other frameworks.

At a glance:

  • 4 pipeline levels: L1 Basic (~0.8s) → L4 Advanced (~3.5s)
  • 17 supported document formats (PDF with 3 engine fallbacks, DOCX, Excel, CSV, JSON, Markdown, PowerPoint, Images with OCR, Web content, and more)
  • 10 LLM providers (OpenAI, Anthropic Claude, Google Gemini, Ollama, HuggingFace, vLLM, llama.cpp, and more)
  • 8 embedding providers
  • 6 vector store backends (ChromaDB, FAISS, Qdrant, Pinecone, Weaviate, Milvus)
  • 6 retrieval strategies including Ensemble and Contextual
  • 6-Level Contextual Retrieval — MokingBird's signature innovation that enriches every document chunk with surrounding context before retrieval, dramatically improving accuracy
  • 40–50% accuracy improvement over naive RAG implementations

mbDataGen — Synthetic Dataset Generation

mbDataGen solves one of the most persistent problems in applied ML: getting high-quality, domain-specific training data. Instead of scraping, labeling manually, or accepting noisy public datasets, mbDataGen generates clean, validated synthetic data from your own documents.

At a glance:

  • 5-phase pipeline: Extract → Enrich → Generate → Validate → Deploy
  • GPRO-Hybrid RL — our original reward learning approach: Total Reward = 0.7 × Field/Process Reward + 0.3 × Outcome/Overall Reward
  • K=4 candidate generation with comparative reward scoring
  • 5-stage validator: Schema → Distribution → Dedupe → Grounding → Novelty
  • HMAC-signed RunManifest for complete data provenance
  • Minimum 6GB VRAM, 8GB recommended
  • Generates output in any schema you define — Jogg quiz format, instruction-following pairs, preference data, and more
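
An HMAC-signed manifest makes tampering with run metadata detectable. The following is a minimal sketch of that general technique, assuming a JSON manifest and HMAC-SHA256; the manifest fields, the function names, and the key handling are hypothetical and do not describe the actual RunManifest format.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest, secret_key):
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    # sort_keys and fixed separators make the encoding deterministic.
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret_key, body, hashlib.sha256).hexdigest()


def verify_manifest(manifest, signature, secret_key):
    """Check a signature using a constant-time comparison."""
    expected = sign_manifest(manifest, secret_key)
    return hmac.compare_digest(expected, signature)


# Hypothetical manifest contents, for illustration only.
manifest = {
    "run_id": "example-run",
    "phases": ["Extract", "Enrich", "Generate", "Validate", "Deploy"],
    "candidates_per_prompt": 4,
}
key = b"local-secret-for-illustration"

signature = sign_manifest(manifest, key)
tampered = dict(manifest, candidates_per_prompt=8)

print(verify_manifest(manifest, signature, key))
print(verify_manifest(tampered, signature, key))
```

Anyone holding the key can confirm that a manifest is unchanged since it was signed; any edit to the manifest invalidates the signature.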

mbFT — Universal Fine-Tuning Platform

mbFT makes fine-tuning accessible without hiding what's actually happening. The Smart Config Engine handles the complexity of choosing hyperparameters and memory-efficient configurations, while keeping you in full control.

At a glance:

  • 16 fine-tuning techniques:
    • 6 SFT methods (LoRA, QLoRA, Full Fine-Tuning, Prefix Tuning, Prompt Tuning, Adapter Layers)
    • 5 RL methods (GRPO, PPO, DPO, ORPO, Kahneman-Tversky Optimization)
    • 5 Multimodal methods (Vision-Language, Audio-Text, Code-Specialized, Medical Imaging, Document Understanding)
  • 7 supported frameworks (Unsloth, Axolotl, LLaMAFactory, Transformers, DeepSpeed, FSDP, TRL)
  • VRAM pre-simulation — before you start a run, the system estimates memory requirements so you know if your hardware can handle it
  • Hybrid GRPO — MokingBird's original contribution, combining reward model and rule-based signals
  • 6-tier hardware classification system
  • Comparison interface vs manual setups in Unsloth/Axolotl/LLaMAFactory
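
To give a feel for what a memory pre-check involves, here is a back-of-the-envelope estimate based on a widely used rule of thumb for mixed-precision Adam training. It is not the estimator used by mbFT. The byte counts, the 20% overhead allowance for activations and buffers, and the 40M-parameter adapter size are assumptions, and real usage depends heavily on batch size and sequence length.

```python
GIB = 1024 ** 3

# Rule-of-thumb bytes per trainable parameter for mixed-precision Adam:
# fp16 gradients (2) + fp32 master weights (4) + two fp32 Adam moments (8).
TRAIN_STATE_BYTES = 14


def estimate_vram_gib(total_params, trainable_params, weight_bytes, overhead=0.2):
    """Rough VRAM estimate in GiB.

    weight_bytes is the storage cost per model weight
    (about 2.0 for fp16, about 0.5 for 4-bit quantization).
    overhead is an assumed allowance for activations and buffers.
    """
    weights = total_params * weight_bytes
    train_state = trainable_params * TRAIN_STATE_BYTES
    return (weights + train_state) * (1.0 + overhead) / GIB


def fits(estimate_gib, available_gib):
    return estimate_gib <= available_gib


params_7b = 7_000_000_000
adapter_params = 40_000_000  # hypothetical LoRA adapter size

full = estimate_vram_gib(params_7b, params_7b, weight_bytes=2.0)
lora = estimate_vram_gib(params_7b, adapter_params, weight_bytes=2.0)
qlora = estimate_vram_gib(params_7b, adapter_params, weight_bytes=0.5)

for name, value in [("full", full), ("lora", lora), ("qlora", qlora)]:
    print(f"{name}: {value:.1f} GiB, fits in 8 GiB: {fits(value, 8)}")
```

The ordering matters more than the exact numbers: full fine-tuning needs far more memory than LoRA, and quantizing the frozen base weights (QLoRA) lowers the requirement further.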

Why Local AI?

The cloud AI model has a hidden cost: your data. When you send documents to an API endpoint to answer questions, generate datasets, or fine-tune a model, those documents leave your control. Even with privacy-protecting terms of service, the fundamental architecture means your data travels.

MokingBird AI is built on a different premise: the model comes to your data, not the other way around.

  • Your documents stay on your machine
  • Your API keys are stored locally, never transmitted to MokingBird
  • Your fine-tuned models are yours — we have no access to them
  • No registration required
  • Works fully offline when using local LLMs via Ollama

This isn't just privacy positioning. It's an architecture choice that also means lower latency, no per-query costs for local models, and no dependency on service uptime. Critical workflows don't become unusable when an external API is down.


Our Mission for AI

We believe powerful AI tools should not require:

  • A cloud account
  • A corporate API budget
  • Trusting a third party with proprietary data
  • A PhD to configure

MokingBird AI is our contribution toward democratizing serious AI infrastructure — the kind of RAG, data generation, and fine-tuning capability that has historically been available only to well-funded research teams or large enterprises.


Part of MokingBird

MokingBird AI is developed by MokingBird Oy — The Everything Lab. We also build MB Viewer, Sortify, Jogg, and Jogg Mini.

Learn more about MokingBird at mokingbird.xyz/about.


Contact


Privacy Policy

MokingBird Oy · Business ID: 3615646-1 · Last updated: April 2026


This policy covers all MokingBird AI products and websites, including:

  • https://ai.mokingbird.xyz/
  • https://ai.mokingbird.xyz/mbrag
  • https://ai.mokingbird.xyz/datagen
  • https://ai.mokingbird.xyz/finetune

Introduction

MokingBird AI is a suite of desktop applications developed by MokingBird Oy. This Privacy Policy describes how our AI products handle your data. It applies to:

  • MokingBird Node — the desktop hub application
  • mbRAG — Retrieval-Augmented Generation framework
  • mbDataGen — Synthetic dataset generation platform
  • mbFT — Universal fine-tuning platform

All of the above are desktop applications that run locally on your device. They were designed from the ground up with a core architectural commitment: your data stays on your machine.

This is not a privacy policy that describes what we do with data we've collected about you. It is a policy that explains why we collect essentially none, and what limited data interactions do exist.

Contact us for privacy inquiries at [email protected].


1. Our Privacy Commitment

MokingBird AI products are built on the principle that the model comes to your data, not the other way around.

When you use mbRAG to query your documents, those documents are processed on your hardware. When you use mbDataGen to generate training data from your corpus, that corpus stays on your machine. When you use mbFT to fine-tune a model, the resulting model is yours — we have no visibility into it.

This is not a product feature. It is the architecture. There are no MokingBird servers receiving your documents, queries, datasets, or model outputs during normal application operation.


2. Data We Collect During App Operation

During normal operation of any MokingBird AI application: none.

Specifically, the following are never transmitted to MokingBird servers:

  • Documents you load into mbRAG
  • Queries you submit to any pipeline
  • Vector embeddings or index files created on your machine
  • Datasets generated by mbDataGen
  • Model weights or fine-tuning results produced by mbFT
  • Your configuration files or settings
  • Your usage patterns, session duration, or feature usage statistics
  • Crash reports or error logs
  • Any other information generated during app use

There is no background analytics process. There is no telemetry SDK. There is no crash reporter sending data to our servers. If you want to verify this, you can monitor the application's network activity — you will find nothing from the app itself during normal operation.


3. Your API Keys

Many users connect MokingBird AI tools to external LLM providers (OpenAI, Anthropic Claude, Google Gemini) or embedding providers (Cohere, HuggingFace API). To do so, you provide API keys.

Your API keys:

  • Are stored locally on your device (in a configuration file or OS keychain, depending on your platform)
  • Are never transmitted to MokingBird servers
  • Are sent only to the third-party provider you are connecting to (e.g., an OpenAI API key is sent to OpenAI's servers when you make a query, exactly as it would be in any OpenAI client)
  • Are not logged or cached by MokingBird systems

MokingBird Oy has no access to your API keys.


4. Documents and Data You Process

When you feed documents into mbRAG, generate data with mbDataGen, or provide training data to mbFT:

  • All processing happens on your machine using your hardware (CPU/GPU)
  • Document contents are loaded into memory locally and never transmitted over the network to MokingBird
  • Vector stores (embeddings databases) are saved locally to a directory you control
  • Generated datasets are written to local files on your machine
  • Fine-tuned model checkpoints are saved to local directories on your machine

The only exception is if you have configured a cloud LLM provider: in that case, queries sent to that provider (e.g., OpenAI API) are governed by that provider's own privacy policy. MokingBird has no control over, or access to, what you send to those providers.


5. LLM Provider Privacy

MokingBird AI supports both local LLMs (via Ollama, llama.cpp, HuggingFace local, vLLM) and cloud LLMs (OpenAI, Anthropic, Google Gemini).

When using local LLMs:
All queries stay entirely on your device. No internet connection is required. Complete privacy end-to-end.

When using cloud LLMs:
Your queries are sent to the provider's API servers (e.g., api.openai.com). This is governed by OpenAI's, Anthropic's, or the relevant provider's privacy policy — not ours. MokingBird is a client that calls the API on your behalf using your key. We do not see, log, or store those queries.

We recommend reviewing the privacy policy of any cloud LLM provider you connect to MokingBird AI tools.


6. Optional Update Checks

MokingBird AI applications may offer optional checks for software updates. When an update check occurs:

  • A single HTTPS request is made to our release endpoint (GitHub Releases)
  • The request contains: current application version, your operating system type
  • No personal data, user ID, API keys, or document data is included
  • The response contains: latest version number and release notes

You can disable automatic update checks in the application Settings.
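
As a rough sketch of how little such a request needs to carry, the example below builds an update-check URL containing only the application version and operating system type, then compares version strings. The endpoint URL, the parameter names, and the dotted version format are assumptions for illustration; the code does not perform any network request.

```python
import platform
from urllib.parse import urlencode


def build_update_check_url(current_version, endpoint):
    """Build a URL carrying only the app version and OS type."""
    params = {"version": current_version, "os": platform.system()}
    return f"{endpoint}?{urlencode(params)}"


def parse_version(text):
    # Assumes plain dotted numeric versions such as "1.4.2" or "v1.4.2".
    return tuple(int(part) for part in text.lstrip("v").split("."))


def update_available(current_version, latest_version):
    return parse_version(latest_version) > parse_version(current_version)


# Placeholder endpoint; the real release endpoint is not specified here.
url = build_update_check_url("1.2.0", "https://example.invalid/releases/latest")
print(url)
print(update_available("1.2.0", "1.10.0"))
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "1.10.0" would sort before "1.2.0".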


7. Data We Do Not Intend to Collect by Default

For local-first product operation, we do not require or intend to collect:

  • Account registration details for basic desktop usage
  • Behavioral tracking data for advertising or profiling
  • User dataset content for data brokerage or model training
  • File contents processed during app operation

This is architectural, not just a policy statement — the app has no mechanism to transmit these data types in local mode.


8. No Telemetry, No Analytics

Our applications contain no:

  • Analytics SDKs (Google Analytics, Mixpanel, Amplitude, etc.)
  • Crash reporting services (Sentry, Bugsnag, etc.)
  • Usage tracking or feature analytics
  • Session recording or heatmapping

We do not know how you use the software. We do not track which features you use, how many documents you process, or what models you fine-tune. This is intentional.


9. Website Analytics (ai.mokingbird.xyz)

The MokingBird AI website may use privacy-friendly, cookieless web analytics (such as Plausible or Cloudflare Web Analytics) to understand aggregate traffic patterns. If analytics are in use, they:

  • Do not track individual users
  • Do not use cookies
  • Do not build behavioral profiles
  • Collect only aggregate page view counts and referrer data

We will update this section if our analytics setup changes.


10. International Transfers

MokingBird Oy is based in Finland, within the EU. Where infrastructure services process data outside the EU/EEA, we apply required GDPR safeguards including standard contractual clauses or adequacy decisions where applicable.


11. Your GDPR Rights

As a user in the European Union (or anyone protected by GDPR-equivalent legislation), you have the following rights:

  • Right of access — Request information about personal data we hold about you
  • Right to erasure — Request deletion of your personal data
  • Right to rectification — Request correction of inaccurate data
  • Right to portability — Receive your data in a machine-readable format
  • Right to object — Object to processing of your data
  • Right to withdraw consent — Withdraw consent for any processing based on consent (e.g., newsletter subscriptions)

Since our desktop applications collect no personal data, most of these rights are satisfied by design. For any data held through website contact forms, newsletter subscriptions, or support tickets, contact [email protected]. We respond within 30 days.


12. Data Retention (Website and Support)

Data type | Retention
Website server logs | 30 days
Contact form submissions | 90 days after resolution
Newsletter subscriptions | Until you unsubscribe
Support ticket history | 1 year after resolution

Application data (documents, models, configurations, datasets) is stored only on your device. We have no copies.


13. Children's Privacy

MokingBird AI products are professional developer and researcher tools. They are not intended for use by children under 13. We do not knowingly collect personal information from children under 13. Since the applications collect no personal data during operation, there is no such data at risk. If you believe a child has used these applications in a way that has generated personal data, contact [email protected].


14. Changes to This Policy

We will update this policy when our data practices change. Material changes will be noted on the website and in the application's release notes. The "Last updated" date at the top of this document will always reflect the most recent revision.


15. Contact

MokingBird Oy
Business ID: 3615646-1
Finland



Terms of Service

MokingBird Oy · Business ID: 3615646-1 · Last updated: April 2026


Note: Pricing and subscription tier details are subject to change. Current pricing is available at ai.mokingbird.xyz/pricing.

1. Acceptance of Terms

By downloading, installing, or using any MokingBird AI application — including the MokingBird Node, mbRAG, mbDataGen, and mbFT (collectively "the Software" or "the Products") — you agree to these Terms of Service ("Terms"). If you do not agree, do not download or use the Software.

These Terms constitute a legal agreement between you ("User") and MokingBird Oy, a company registered in Finland (Business ID: 3615646-1).


2. Products Covered

These Terms apply to all MokingBird AI products:

  • MokingBird Node — Desktop hub application bundling all three tools
  • mbRAG — Retrieval-Augmented Generation framework (available standalone or via Node)
  • mbDataGen — Synthetic dataset generation platform (available standalone or via Node)
  • mbFT — Universal fine-tuning platform (available standalone or via Node)

These Terms also apply to the MokingBird AI website (ai.mokingbird.xyz) and any associated APIs or services provided by MokingBird Oy in connection with the above products.


3. License Tiers

3.1 Free Tier

MokingBird AI products are available for free download and use. The Free Tier grants you a non-exclusive, non-transferable license to:

  • Download and install the Software on devices you own or control
  • Use the Software for personal and commercial purposes
  • Access core features as defined in the current Free Tier feature set

The Free Tier includes the core functionality of mbRAG, mbDataGen, and mbFT with certain limitations on advanced features, pipeline configurations, or compute-intensive operations.

3.2 Premium Tier

Premium subscribers pay a monthly or annual subscription fee (pricing available at ai.mokingbird.xyz/pricing) and receive:

  • Access to all advanced features across mbRAG, mbDataGen, and mbFT
  • Priority support with guaranteed response times
  • Access to new features before they reach the Free Tier
  • Higher limits on pipeline configurations and concurrent operations
  • Premium documentation and tutorials

Premium subscriptions are billed in advance. You may cancel at any time; cancellation takes effect at the end of the current billing period.

3.3 Enterprise Tier

Enterprise licenses are available for organizations requiring:

  • Volume licensing (multiple users or devices)
  • Custom deployment and integration support
  • Service Level Agreements (SLAs) with guaranteed uptime and response times
  • Custom feature development and priority roadmap influence
  • Dedicated support channel

Enterprise pricing is available on request at [email protected].

3.4 Usage-Based Features

Certain compute-intensive features within mbDataGen and mbFT may be offered on a usage basis — charged per job, per generated record, or per training run — in addition to or in place of subscription tiers. Usage-based pricing will be clearly indicated before you initiate a chargeable operation.

3.5 One-Time Purchase

MokingBird Oy may offer one-time purchase options for specific feature sets or perpetual licenses. Details will be available at ai.mokingbird.xyz/pricing.


4. Permitted Use

You may use MokingBird AI products to:

  • Query your own documents using mbRAG
  • Generate synthetic training datasets using mbDataGen
  • Fine-tune language models using mbFT
  • Build applications and pipelines on top of the local APIs exposed by the Software
  • Use the Software for personal research, commercial projects, or enterprise deployments (subject to your license tier)

5. Restrictions

You may not:

  • Sell, sublicense, or distribute the Software as a product in your own name without prior written agreement with MokingBird Oy
  • Reverse engineer, decompile, or disassemble the Software except as explicitly permitted by applicable law
  • Remove or alter any MokingBird Oy copyright notices, branding, or proprietary notices
  • Use the Software to develop a competing product with substantially similar functionality
  • Circumvent license key validation, usage limits, or other technical controls
  • Represent the Software as your own product
  • Use the Software in any way that violates applicable laws or regulations

6. User Responsibilities

Users are responsible for:

  • Ensuring they have the legal right to use any input data (documents, training datasets) with MokingBird AI products
  • Complying with applicable law and regulation in their jurisdiction
  • Managing backups, access control, and security on their local infrastructure
  • Reviewing third-party provider terms when enabling external model APIs (OpenAI, Anthropic, etc.)
  • Not using the products for unlawful purposes

7. Availability and Changes

MokingBird Oy may update, add, modify, or retire features at any time to improve safety, reliability, or product quality. For subscription users, material reductions in functionality will be communicated in advance.


8. Payment Terms

For Premium and Enterprise tiers, and for any usage-based charges:

  • Payments are processed in EUR (Euro)
  • Subscription fees are billed in advance (monthly or annually)
  • Usage-based charges are billed at the end of the billing period in which they occurred
  • Accepted payment methods are listed at ai.mokingbird.xyz/pricing
  • All prices are exclusive of VAT; applicable taxes are added at checkout

9. Refund Policy

  • Free Tier: No charges, no refunds applicable
  • Premium subscriptions: Eligible for a full refund within 14 days of initial purchase. After 14 days, no refunds for the current billing period. Cancellations take effect at the end of the billing period.
  • Enterprise licenses: Refund terms are specified in the individual enterprise agreement
  • Usage-based charges: Non-refundable once a job has been initiated, unless a technical failure on our part prevented completion

To request a refund, contact [email protected].


10. Your Data and Intellectual Property

Your data is yours. MokingBird Oy does not claim any rights to:

  • Documents you feed into mbRAG
  • Training data you generate with mbDataGen
  • Model weights you produce with mbFT
  • Configurations, pipelines, or workflows you create

All output produced by the Software using your data belongs to you. See our Privacy Policy for details on how we handle data.

MokingBird Oy retains all intellectual property rights in the Software, including its source code, algorithms, interfaces, and branding.


11. Third-Party Services

MokingBird AI tools may be configured to connect to third-party LLM providers (OpenAI, Anthropic, Google, etc.), embedding services, or vector store APIs. Your use of those services is governed by their respective terms and privacy policies. MokingBird Oy is not responsible for the availability, accuracy, or data practices of third-party services.


12. Disclaimer of Warranties

The Software is provided "as is" and "as available" without warranties of any kind, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, or non-infringement.

MokingBird Oy does not warrant that:

  • The Software will be error-free or uninterrupted
  • AI-generated outputs (RAG responses, synthetic datasets, fine-tuned models) will be accurate, complete, or suitable for any particular purpose
  • The Software will meet your specific requirements

AI outputs are probabilistic in nature. Always validate AI-generated content before using it in production systems or making decisions based on it.


13. Limitation of Liability

To the maximum extent permitted by Finnish and EU law, MokingBird Oy shall not be liable for:

  • Indirect, incidental, special, consequential, or punitive damages
  • Loss of data, profits, business opportunities, or goodwill
  • Damages resulting from AI outputs (RAG answers, generated datasets, fine-tuned models)
  • Any damages exceeding the amount paid by you to MokingBird Oy in the twelve months preceding the claim

14. Termination

Your license terminates immediately upon breach of these Terms. MokingBird Oy may suspend or terminate your access to subscription services with reasonable notice for non-payment or material breach.

To terminate a Premium subscription, cancel through your account settings or contact [email protected]. To uninstall the Software, delete the application files from your device.


15. Governing Law and Dispute Resolution

These Terms are governed by the laws of Finland and applicable European Union law. Any disputes shall be subject to the jurisdiction of Finnish courts. EU consumer protection law applies where applicable.


16. Changes to These Terms

We may update these Terms from time to time. Material changes will be communicated through the website and in application release notes. Continued use of the Software after changes constitutes acceptance of the updated Terms.


17. Contact

MokingBird Oy
Business ID: 3615646-1
Finland



Security

MokingBird Oy · Business ID: 3615646-1 · Last updated: April 2026



Security Overview

MokingBird AI products are built on a local-first architecture — meaning the primary security model is architectural rather than perimeter-based. By running AI operations entirely on your device, we eliminate the largest attack surface present in cloud AI systems: a central server holding your data.

This document explains our security approach across all MokingBird AI products (Node, mbRAG, mbDataGen, mbFT), what data is protected, how, and how to report security issues.


1. Local-First Architecture

The most significant security property of MokingBird AI is that we have no servers holding your data.

In cloud AI systems, your documents, queries, and model outputs are transmitted to and stored on provider servers. This creates:

  • A central target for attackers (your data plus that of millions of other users)
  • Insider threat risk at the provider
  • Regulatory exposure from cross-border data transfers
  • Supply chain risk if the provider is compromised

With MokingBird AI, your documents stay on your machine. Your vector stores are local. Your model checkpoints are local. Your datasets are local. There is no MokingBird server that, if breached, would expose your data — because we don't have your data.

This is not a claim about perfect security. It is a statement about threat model: the primary threats are local (your device's security) rather than remote (a provider's server breach).


2. API Key Security

When you connect MokingBird AI tools to external LLM or embedding providers, you provide API keys. These keys:

  • Are stored in a local configuration file on your device (or the OS keychain if supported by your platform)
  • Are encrypted at rest using platform-standard encryption where available
  • Are transmitted only to the third-party provider when making API calls — not to MokingBird servers
  • Are never logged in plain text by the application
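
One way to keep keys out of log output, as the last point describes, is a logging filter that masks anything resembling a key before a record is written. The following is a minimal sketch and not MokingBird's actual implementation; the key pattern (`sk-` or `key-` prefix) and the names `redact` and `RedactingFilter` are assumptions made for illustration.

```python
import logging
import re

# Assumed key shape for illustration only: a short prefix followed by 16 or
# more token characters. Real providers use a variety of formats.
KEY_PATTERN = re.compile(r"\b(?:sk|key)-[A-Za-z0-9_-]{16,}")


def redact(text: str) -> str:
    """Replace anything that looks like an API key with a fixed placeholder."""
    return KEY_PATTERN.sub("[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that masks key-like strings before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so keys passed as arguments are caught too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter to a handler with `handler.addFilter(RedactingFilter())` applies it to every record that handler emits.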

Best practices for API key security:

  • Use provider-specific API keys rather than organization master keys
  • Set spending limits on your API keys at the provider's dashboard
  • Rotate keys periodically or after sharing with others
  • Do not put API keys in documents you feed into the system

3. Document and Data Security

Files you process through mbRAG, mbDataGen, or mbFT are processed in memory on your machine and, where persistence is needed, written to local disk.

Local file storage:

  • Vector stores (ChromaDB, FAISS, local Qdrant) are stored in directories you configure
  • Generated datasets from mbDataGen are written to output paths you specify
  • Fine-tuned model checkpoints from mbFT are written to local directories
  • None of these are uploaded or synced by MokingBird

Recommendations:

  • Store sensitive vector stores and datasets in encrypted directories or drives (e.g., BitLocker on Windows, FileVault on macOS)
  • Apply appropriate filesystem permissions to restrict access on shared machines
  • Follow your organization's data handling policies for sensitive documents

4. Network Security

MokingBird AI applications make network requests in the following controlled circumstances only:

Request | When | Contains | Does not contain
Update check | Optional, user-initiated | App version, OS type | Any personal data, documents, keys
LLM API call | When you make a query using a cloud LLM | Your query (governed by provider's policy) | Document contents unless you include them
Embedding API call | When using a cloud embedding provider | Text chunks you submit | API keys in the request body (keys are sent in headers per provider standard)

All API calls to third-party providers use HTTPS. We enforce TLS for all outbound connections.

When using local LLMs (Ollama, llama.cpp, local HuggingFace models), inference makes no requests beyond your machine, so the application can run air-gapped as far as the LLM is concerned.


5. Application Security

MokingBird AI desktop applications are built using PySide6 (Python) and follow application security best practices:

  • Dependency management: Dependencies are pinned and audited for known CVEs before each release
  • Input validation: Document parsers validate file types and content before processing to prevent parser-based attacks (e.g., malicious PDF or ZIP bombs)
  • Process isolation: Each module (RAG, DataGen, FT) runs in its own process context where possible
  • No remote access by default: The local FastAPI endpoints exposed by the Node are bound to localhost (127.0.0.1) by default and are not accessible from the network without explicit user configuration
  • Secure defaults: No services are exposed to the network by default
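
To illustrate the input-validation point, a parser can reject archives that declare an implausible expansion before extracting anything. This is a sketch under assumed limits, not the check MokingBird ships; `is_suspicious_zip` and both thresholds are hypothetical. Declared sizes can be forged, so a real parser would also cap the bytes actually written during extraction.

```python
import io
import zipfile

# Assumed limits for illustration; tune them for your own workloads.
MAX_TOTAL_UNCOMPRESSED = 200 * 1024 * 1024  # 200 MiB
MAX_COMPRESSION_RATIO = 100


def is_suspicious_zip(data: bytes) -> bool:
    """Return True if the archive declares an implausibly large expansion."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
    total = sum(info.file_size for info in infos)
    compressed = sum(info.compress_size for info in infos)
    if total > MAX_TOTAL_UNCOMPRESSED:
        return True
    # Skip the ratio test for empty archives to avoid dividing by zero.
    return compressed > 0 and total / compressed > MAX_COMPRESSION_RATIO
```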

Local API security:
The FastAPI endpoints exposed for programmatic access listen on localhost only. If you expose them to a network interface for integration purposes, you are responsible for securing that endpoint (authentication, firewall rules, VPN).
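
If you do change the bind address, a small guard can make the trade-off explicit. The sketch below refuses a non-loopback host unless an auth token is configured. The helper names and the token requirement are assumptions for illustration and do not describe a built-in MokingBird setting.

```python
import ipaddress
from typing import Optional


def is_loopback_host(host: str) -> bool:
    """Return True if `host` is a loopback address or the name 'localhost'."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames are treated as network-facing to stay on the safe side.
        return False


def check_bind(host: str, auth_token: Optional[str]) -> None:
    """Raise if the API would be network-reachable without authentication."""
    if not is_loopback_host(host) and not auth_token:
        raise ValueError(f"Refusing to bind to {host} without authentication")
```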


6. Model Security

When downloading models (e.g., from Hugging Face) for use with mbFT or local inference:

  • MokingBird does not curate or host model files directly — you download from the model provider's official repository
  • We recommend only downloading models from verified, reputable sources (official Hugging Face model cards, research paper repositories)
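  • Be aware that model files from unverified sources may contain malicious payloads. Pickle-based checkpoints (.bin, .pt, .ckpt) can execute arbitrary code when loaded; .safetensors and .gguf are designed to hold only tensor data and metadata, but their loaders can still have parsing bugs. Apply the same scrutiny you would to any executable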

MokingBird Oy is not responsible for security issues arising from third-party model files you download and use with our tools.


7. User-Side Security Responsibilities

Because MokingBird AI runs locally on user-controlled hardware, users are responsible for:

  • Keeping the host OS and endpoint protection up to date
  • Applying disk encryption and maintaining secure backups of local data, vector stores, and model outputs
  • Applying network segmentation where required by organizational policy
  • Managing access controls on local machines and shared internal networks
  • Using dedicated, scoped API keys rather than organization master keys
  • Reviewing third-party provider terms and data residency settings when using cloud LLMs

8. Secure Operations Practices

Recommended practices for teams deploying MokingBird AI:

  • Keep software versions current — security patches are included in regular updates
  • Enforce least-privilege access on machines where the tools are installed
  • Maintain audit trails for sensitive workflows where policy requires
  • Use separate environments for testing/experimentation and production workloads
  • Rotate API keys periodically and immediately if exposure is suspected

9. Reporting Security Issues

We take security vulnerabilities seriously and respond promptly.

To report a security vulnerability:

  1. Email [email protected]
  2. Include a description of the vulnerability, steps to reproduce, and potential impact
  3. If possible, include a proof-of-concept (without deploying it against systems you don't own)
  4. We will acknowledge receipt within 48 hours
  5. We aim to provide an initial assessment within 7 days and a fix timeline within 14 days for critical issues

Please do not:

  • Publicly disclose vulnerabilities before we have had a chance to fix them (responsible disclosure)
  • Test against systems other than your own installations
  • Use the vulnerability for any purpose other than demonstrating the issue

We are working toward a formal bug bounty program. Until that is live, we appreciate responsible disclosure and will acknowledge contributors publicly (with their permission) in release notes.


10. Security Updates

MokingBird Oy releases security patches as part of our regular update cycle. For critical security vulnerabilities, we release out-of-band patches as quickly as possible.

We strongly recommend:

  • Enabling automatic update checks in the application settings
  • Following @mokingbirdxyz for security announcements
  • Subscribing to our newsletter for important updates

Security advisories are published in our GitHub repository release notes.


11. Forward Security Roadmap

Planned security maturity improvements include:

  • Expanded threat-model documentation per deployment mode (local-only, hybrid, multi-user)
  • Hardening guides for enterprise deployments
  • More explicit enterprise security controls and configuration checklists
  • Formal security audit cycle

This is a first-version security overview. It will be refined with formal security review cycles.


Contact

MokingBird Oy
Business ID: 3615646-1
Finland


Pricing

MokingBird Oy · Last updated: April 2026
Note: Pricing structure is coming soon. Download free to get started today.
Exact pricing for Premium and Enterprise tiers will be announced at launch.



Start for Free

MokingBird AI is free to download and use. No credit card required.


Plans

Free

€0 — Free to download and use

Get started immediately with the core features of the entire MokingBird AI suite.

Includes:

  • MokingBird Node (desktop hub) — full access
  • mbRAG — L1 and L2 pipeline levels, 8 document formats, ChromaDB vector store, up to 3 LLM provider connections
  • mbDataGen — 5-phase pipeline, up to [N] records per run, standard schema output
  • mbFT — 8 fine-tuning techniques (4 SFT + 4 RL), 4 frameworks, VRAM pre-simulation
  • Local API access (localhost)
  • Community support (GitHub Issues)
  • Free updates

Best for: Individual researchers, developers, students, and teams evaluating the platform.


Premium

Pricing coming soon — per user/month (monthly or annual billing)

Everything in Free, plus:

mbRAG — Advanced:

  • All 4 pipeline levels (L1 → L4 Advanced, up to ~3.5s)
  • All 17 document formats including multimodal
  • All 6 vector store backends (ChromaDB, FAISS, Qdrant, Pinecone, Weaviate, Milvus)
  • All 10 LLM providers
  • All 6 retrieval strategies including Contextual Retrieval (all 6 levels)
  • Analytics dashboard (usage, cost tracking, performance metrics)
  • Advanced pipeline customization

mbDataGen — Advanced:

  • Unlimited records per run (hardware permitting)
  • All output schemas and custom schema definitions
  • Full GPRO-Hybrid RL with all K=4 candidates
  • Full 5-stage validator (Schema → Distribution → Dedupe → Grounding → Novelty)
  • HMAC-signed RunManifest with provenance tracking
  • Async batch processing

mbFT — Advanced:

  • All 16 fine-tuning techniques (6 SFT + 5 RL + 5 Multimodal)
  • All 7 frameworks (Unsloth, Axolotl, LLaMAFactory, Transformers, DeepSpeed, FSDP, TRL)
  • Full 6-tier hardware classification
  • Hybrid GRPO (MokingBird-original)
  • Advanced VRAM simulation and optimization

Support:

  • Priority email support
  • Access to premium documentation and tutorials
  • Early access to new features

Best for: Professionals, researchers, and teams with regular AI workflows.


Enterprise

Custom pricing — contact us

Everything in Premium, plus:

  • Volume licensing (teams of any size)
  • Custom deployment and on-premise support
  • Service Level Agreements (SLA) with guaranteed response times
  • Custom feature development
  • Priority roadmap influence
  • Dedicated support channel and account manager
  • Custom integration and onboarding support
  • Invoice billing

Contact [email protected] for a quote.

Best for: Organizations, enterprises, and research institutions with large-scale AI infrastructure needs.


Usage-Based Options

Certain compute-intensive operations in mbDataGen and mbFT may be available on a usage basis — charged per job or per batch — for users who need occasional high-compute runs without a full Premium subscription. Pricing will be available at launch.


One-Time Purchase

We are exploring a perpetual license option for users who prefer one-time payment over subscriptions. Details will be announced.


Packaging Options

Depending on your needs, MokingBird AI may be available in the following configurations:

  • Full Node bundle — all three tools (mbRAG + mbDataGen + mbFT) in one download
  • Individual modules — install only mbRAG, only mbDataGen, or only mbFT as standalone apps
  • Team / enterprise deployment — volume licensing with contract-based terms

What We Commit To

  1. Published pricing tables before any broad commercial rollout — no surprise charges
  2. No hidden ad-based monetization — we make money when you upgrade, not by selling your data
  3. Clear separation between free and paid capabilities — no bait-and-switch
  4. Transparent policy updates — pricing changes communicated before they take effect

Frequently Asked Questions

Is the free tier really free?
Yes. Free to download and use forever for core features. No credit card required. No time limit.

Do I need an internet connection?
No. All core features work offline. An internet connection is needed only for optional update checks and for calls to cloud LLM or embedding providers, if you choose to use them.

Can I use the free tier for commercial work?
Yes. Free and Premium tiers both permit commercial use.

What happens if I cancel Premium?
Your account reverts to the Free tier at the end of your billing period. Your data, configurations, and locally generated files are unaffected.

Is there a student or academic discount?
We plan to offer academic discounts. Contact [email protected] with your institutional email for availability.


Questions?

Blog

MokingBird AI — Articles

Insights, deep dives, and technical articles about local AI infrastructure

Browse all articles about MokingBird AI and the Node ecosystem.


Introducing MokingBird AI: The Private AI Desktop Suite Built for Real Work

April 13, 2026 · MokingBird Team · Tags: local AI, RAG, fine-tuning, synthetic data, privacy

There's a version of AI adoption that looks something like this: your team starts using an AI assistant to query internal documents. It works well. Then someone asks: where exactly are those documents going? The answer is "to a cloud API." That answer starts a conversation with legal, then compliance, then security. Three months later, the project is on hold.

This isn't a hypothetical. It's the pattern many serious AI deployments hit.

MokingBird AI was built to remove that friction — not by making privacy features a checkbox, but by making local-first execution the architecture.


The Problem with Cloud AI for Real Workloads

Cloud AI APIs are genuinely impressive. They're easy to start with and require no hardware investment. But for organizations and researchers doing real work with sensitive data, the cloud model has structural problems:

Your data leaves your control. When you send documents to an API for RAG, generate training data through a cloud service, or fine-tune using uploaded datasets, your data is on someone else's server. Even with strong privacy terms, you're trusting a third party.

Per-query costs scale unpredictably. Embedding thousands of documents for a vector store, generating large synthetic datasets, or running extensive fine-tuning experiments can produce surprising API bills.

Internet dependency creates fragility. If the API is down, your pipeline is down, and in an air-gapped environment it never runs at all. Critical workflows can't afford that dependency. Local-first architecture means your tools work whether or not any external service is reachable.

Vendor lock-in is real. Switching from one LLM provider to another — even if you want to — often requires significant pipeline rework.

MokingBird AI addresses each of these, not by being clever about cloud architecture, but by running on your hardware.


The MokingBird Node: One Hub, Three Tools

MokingBird AI is organized around the MokingBird Node — a desktop application that serves as a hub for three standalone tools:

MokingBird Node
├── mbRAG       — Retrieval-Augmented Generation
├── mbDataGen   — Synthetic Dataset Generation
└── mbFT        — Fine-Tuning Platform

You can install the Node to get all three tools in one place, or install any tool individually. Each tool also exposes a local FastAPI REST endpoint for programmatic access — so you can integrate them into existing pipelines without changing your workflow and without forcing a full platform migration.

Everything runs on your machine. Windows, macOS, or Linux.


mbRAG: Research-Grade Retrieval, Without the Framework Pain

mbRAG is a production-ready Retrieval-Augmented Generation framework — a comprehensive, stable alternative to LangChain, built from the ground up to be reliable and transparent.

It implements every major RAG approach in a unified system: sparse retrieval, dense retrieval, hybrid ensemble methods, multi-query expansion, parent document retrieval, and more. No swapping libraries when you need a different strategy. Everything is in one place.

Four pipeline levels let you trade latency for accuracy:

Level | Latency | Best for
L1 Basic | ~0.8s | Fast lookup, simple Q&A
L2 Enhanced | ~1.5s | Improved recall, standard workflows
L3 Smart | ~2.2s | Context-aware, complex documents
L4 Advanced | ~3.5s | Maximum accuracy, research-grade

Document support covers 17 formats — PDF (with three parser engines and automatic fallback), DOCX, Excel, CSV, JSON, Markdown, PowerPoint, Email, Images with OCR via Tesseract and EasyOCR, Web content, and more.

The 8-Step RetrievalOrchestrator processes every query through: document loading → chunking strategy selection → embedding → vector store retrieval → reranking → context enhancement → response generation → quality validation.

The signature innovation: 6-Level Contextual Retrieval. Traditional RAG loses context when it splits documents into chunks: a chunk about "the agreement" doesn't know what "the agreement" refers to unless the surrounding context is preserved. MokingBird's Contextual Retrieval enriches every chunk with up to 6 levels of surrounding context before indexing, so retrieval doesn't lose the thread of meaning. This alone accounts for much of mbRAG's 30–50% accuracy gain over naive RAG implementations.
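
To make the idea concrete, here is a minimal sketch of chunk enrichment that uses just two kinds of context: the document title and the section heading. It is not mbRAG's implementation, and the article does not say what the six levels are. The `Chunk` type, the bracketed prefix format, and the fixed-size splitting are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str
    section: str
    text: str


def contextualize(chunk: Chunk) -> str:
    """Prefix a chunk with its document and section so it is self-describing."""
    return f"[Document: {chunk.doc_title}] [Section: {chunk.section}]\n{chunk.text}"


def split_with_context(doc_title: str, sections: dict[str, str], size: int = 200) -> list[str]:
    """Split each section into fixed-size character chunks and enrich each one."""
    enriched = []
    for heading, body in sections.items():
        for start in range(0, len(body), size):
            chunk = Chunk(doc_title, heading, body[start:start + size])
            enriched.append(contextualize(chunk))
    return enriched
```

With this prefix, a chunk that only says "the agreement may be terminated" is indexed together with the name of the agreement it belongs to.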

You can connect mbRAG to 10 LLM providers — cloud (OpenAI, Anthropic Claude, Google Gemini) or local (Ollama, llama.cpp, vLLM, HuggingFace local) — and 8 embedding providers. Use it fully offline with Ollama for complete air-gap operation, or connect to cloud APIs using your own keys.


mbDataGen: Synthetic Data That Actually Works

Getting high-quality training data is one of the most persistent bottlenecks in applied ML. Public datasets are often noisy, domain-mismatched, or not structured for your specific task. Manual labeling is expensive and slow. mbDataGen is a different approach: generate clean, validated synthetic data from your own documents.

The 5-phase pipeline:

  1. Extract — Load and parse your source documents (17 formats supported)
  2. Enrich — Add metadata, context, and structural information
  3. Generate — Produce candidate data using the GPRO-Hybrid RL approach
  4. Validate — Run candidates through the 5-stage validator
  5. Deploy — Export to your training format with full provenance metadata

GPRO-Hybrid RL is mbDataGen's core innovation — an original reward learning approach that generates K=4 candidate outputs per data point and scores them:

Total Reward = 0.7 × Field/Process Reward + 0.3 × Outcome/Overall Reward

This means the system simultaneously optimizes for field-level accuracy (is this specific field correct?) and overall quality (does this example make sense as a whole?), producing data that passes both micro and macro quality checks.
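
The weighting can be sketched in a few lines. The 0.7/0.3 weights and K=4 come from the formula above. The `Candidate` type, the assumption that both rewards lie in [0, 1], and the step of keeping the highest-scoring candidate are illustrative guesses rather than a description of mbDataGen's internals.

```python
from dataclasses import dataclass

FIELD_WEIGHT = 0.7    # weight on the field/process reward
OUTCOME_WEIGHT = 0.3  # weight on the outcome/overall reward
K = 4                 # candidates generated per data point


@dataclass
class Candidate:
    record: dict
    field_reward: float    # assumed to lie in [0, 1]
    outcome_reward: float  # assumed to lie in [0, 1]


def total_reward(c: Candidate) -> float:
    """Weighted sum of the two reward signals."""
    return FIELD_WEIGHT * c.field_reward + OUTCOME_WEIGHT * c.outcome_reward


def pick_best(candidates: list[Candidate]) -> Candidate:
    """Keep the candidate with the highest total reward (assumed selection rule)."""
    if len(candidates) != K:
        raise ValueError(f"expected {K} candidates, got {len(candidates)}")
    return max(candidates, key=total_reward)
```

For example, a candidate with field reward 0.6 and outcome reward 1.0 scores 0.72 and beats one with 0.9 and 0.2, which scores 0.69.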

The 5-stage validator then filters every generated record through: Schema validation → Distribution checking (does the generated dataset match realistic distributions?) → Deduplication → Grounding (can claims be traced back to the source?) → Novelty (does this add value over existing data?).
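
A staged filter of this kind can be sketched as an ordered list of named checks, where a record is rejected at the first stage it fails. The stage order follows the text. Everything else is an assumption: the schema, dedupe, and grounding checks are deliberately simplistic, and the distribution and novelty stages are pass-through placeholders because they need dataset-level statistics.

```python
from typing import Callable, Optional

Record = dict
Stage = tuple[str, Callable[[Record], bool]]


def make_stages(source_text: str, required: set[str]) -> list[Stage]:
    """Build the five stages in order; three are toy checks, two are placeholders."""
    seen: set[str] = set()

    def schema(r: Record) -> bool:
        return required.issubset(r.keys())

    def dedupe(r: Record) -> bool:
        key = repr(sorted(r.items()))
        if key in seen:
            return False
        seen.add(key)
        return True

    def grounding(r: Record) -> bool:
        # Toy check: the answer must appear verbatim in the source text.
        return str(r.get("answer", "")) in source_text

    def passthrough(r: Record) -> bool:
        return True

    return [("schema", schema), ("distribution", passthrough),
            ("dedupe", dedupe), ("grounding", grounding), ("novelty", passthrough)]


def first_failure(record: Record, stages: list[Stage]) -> Optional[str]:
    """Return the name of the first stage that rejects the record, or None."""
    for name, check in stages:
        if not check(record):
            return name
    return None
```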

Every output includes an HMAC-signed RunManifest, a cryptographically verifiable provenance record that documents exactly how each data point was generated, what sources it came from, and what validation scores it received. This matters when you need to audit or certify training data.
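
For readers unfamiliar with HMAC signing, the sketch below shows the general technique using Python's standard library. The manifest fields, the canonical-JSON serialization, and the choice of SHA-256 are assumptions; the actual RunManifest format is not specified here.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_manifest(manifest, secret)
    return hmac.compare_digest(expected, signature)
```

Sorting keys before signing matters: without a canonical encoding, two equal manifests could serialize differently and fail verification.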

Hardware requirements: minimum 6GB VRAM, 8GB recommended.


mbFT: Fine-Tuning Without the Complexity Tax

Fine-tuning a language model well requires expertise across multiple dimensions: choosing the right technique for your use case, selecting the right framework, estimating memory requirements, setting hyperparameters. mbFT's Smart Config Engine handles this complexity while keeping you in full control.

16 techniques across three categories:

  • 6 SFT methods: LoRA (the practical standard), QLoRA (memory-efficient LoRA), Full Fine-Tuning, Prefix Tuning, Prompt Tuning, Adapter Layers
  • 5 RL methods: GRPO, PPO, DPO (Direct Preference Optimization), ORPO, KTO (Kahneman-Tversky Optimization)
  • 5 Multimodal methods: Vision-Language, Audio-Text, Code-Specialized, Medical Imaging, Document Understanding

7 supported frameworks — Unsloth (speed-optimized), Axolotl, LLaMAFactory, Hugging Face Transformers, DeepSpeed, FSDP, TRL — with automatic selection recommendations based on your hardware and task.

VRAM pre-simulation is the feature that saves the most time. Before you start a training run, mbFT estimates your memory requirements based on model size, technique, batch size, and sequence length. You see the estimate, compare it to your available VRAM, and decide whether to proceed — without paying for the 20-minute run that would have OOM'd on step 3.
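
A back-of-the-envelope version of such an estimate can be written down directly. The sketch below is not mbFT's simulator. It uses the common rule of thumb for mixed-precision Adam (14 bytes of gradients, master weights, and optimizer moments per trainable parameter) plus a deliberately crude activation term. The constant `ACT_TENSORS_PER_LAYER` and all names are assumptions, and real usage varies with attention implementation, gradient checkpointing, and framework overhead.

```python
from dataclasses import dataclass

GB = 1e9

# Per trainable parameter: 2 bytes fp16 gradients + 4 bytes fp32 master
# weights + 8 bytes Adam moments.
TRAIN_STATE_BYTES = 14
# Crude assumption: about this many (batch, seq_len, hidden) tensors are
# kept per layer for the backward pass.
ACT_TENSORS_PER_LAYER = 16


@dataclass
class RunConfig:
    n_params: float            # total model parameters
    trainable_fraction: float  # 1.0 for full fine-tuning, ~0.01 for adapters
    weight_bytes: float        # 2 for fp16/bf16 weights, 0.5 for 4-bit
    batch_size: int
    seq_len: int
    hidden_size: int
    n_layers: int


def estimate_vram_gb(cfg: RunConfig) -> float:
    """Sum weights, training state, and a rough activation estimate, in GB."""
    weights = cfg.n_params * cfg.weight_bytes
    train_state = cfg.n_params * cfg.trainable_fraction * TRAIN_STATE_BYTES
    activations = (cfg.batch_size * cfg.seq_len * cfg.hidden_size
                   * cfg.n_layers * ACT_TENSORS_PER_LAYER * 2)  # 2 bytes each
    return (weights + train_state + activations) / GB


def fits(cfg: RunConfig, available_gb: float, headroom: float = 0.9) -> bool:
    """True if the estimate fits within a fraction of the available VRAM."""
    return estimate_vram_gb(cfg) <= available_gb * headroom
```

Under these assumptions, full fine-tuning of a 7B-parameter model comes out well above 100 GB, while a 4-bit adapter run on the same model lands in the low teens, which is the kind of gap a pre-flight check is meant to surface.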

Hybrid GRPO is MokingBird's original contribution: a fine-tuning method that combines reward model signals with rule-based signals, allowing you to shape model behavior with both learned preferences and hard constraints simultaneously. It's particularly useful for domain-specific applications where you have both preference data and hard rules to enforce.
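
As a rough illustration of mixing learned and rule-based signals, the sketch below subtracts a fixed penalty per violated rule from a reward-model score, then normalizes rewards within a group of sampled responses, which is the group-relative step GRPO-style methods are known for. The penalty scheme and function names are assumptions and do not describe how mbFT's Hybrid GRPO combines its signals.

```python
import statistics


def hybrid_reward(model_score: float, rule_checks: list[bool], penalty: float = 1.0) -> float:
    """Reward-model score minus a fixed penalty for each violated hard rule."""
    violations = rule_checks.count(False)
    return model_score - penalty * violations


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against the group's mean and standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every response scored the same, so none is preferred.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]
```

A response that scores well with the reward model but breaks a hard rule ends up with a negative advantage relative to compliant responses in the same group.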

The 6-tier hardware classification system automatically classifies your hardware and adjusts defaults accordingly — from laptop-class GPUs to multi-GPU research workstations.


Who It's For

Researchers working with proprietary or sensitive datasets who cannot send data to cloud APIs. mbRAG for literature review, mbDataGen for generating domain-specific training data, mbFT for adapting foundation models to specialized tasks.

Developers building LLM-powered applications who need reliable, controllable RAG infrastructure and the ability to fine-tune models for specific behaviors without outsourcing the process.

Enterprises with compliance requirements, air-gapped environments, or data residency obligations that preclude cloud AI.

ML engineers who want production-grade tooling — not a tutorial notebook — for RAG, data generation, and fine-tuning.


API and Integration Direction

An important part of the MokingBird AI design is composability. Each tool exposes a local FastAPI REST endpoint — meaning you can integrate mbRAG, mbDataGen, or mbFT into your own pipelines, internal tooling, or product prototypes without changing your existing workflow and without forcing a full platform migration.

Teams rarely operate in completely isolated environments. Local APIs let MokingBird AI slot into the infrastructure you already have.


Download Free

MokingBird AI is free to download. No account required. No credit card. The core features of all three tools are available on the Free tier.

Advanced features — full pipeline levels, all frameworks, all document formats — are available in Premium. Enterprise licensing is available for organizations.

Download from ai.mokingbird.xyz for Windows, macOS, and Linux.

Your data stays yours. It always has.

Your AI should run on your terms.