Skip to content

Bee Models

One engine, seven progressive tiers.

Cell → Brood → Comb → Buzz → Hive → Swarm → Enclave. Each tier serves a defined role across capability, usage, governance, and deployment control — from public access to sovereign deployment. Bee Cell is live today; the higher tiers land behind the governed-release pipeline as each clears the eval harness. Each tier ships on a curated open-weight production base; specific bases are auditable via the Enclave deployment manifest under NDA. Bee Ignite is internal R&D, not customer-selectable.

Live in production

Live serverless inference, OpenAI Chat Completions compatible. The Bee Cell production base ships on a curated open-weight Apache-2.0 release under our governed release policy; specific base disclosure is contractual via the Enclave deployment manifest.

Bee Cell

bee-cell · 128K context · cutoff 2026-03

Input

$0.15 / 1M

Output

$0.60 / 1M

Best for

  • Solo developers
  • Mobile / on-device chat
  • Local-first workflows

Capabilities

ChatToolsJSON modeRAGStreaming
Try Bee Cell

Roadmap tiers

Roadmap tiers are in final training and validation. Each tier lands behind its own Live badge once the trained checkpoint clears the eval harness. Evidence and validation criteria are published at /trust.

Bee Brood

In training

bee-brood · 256K context · target pricing $0.30 / $1.20 per 1M tok

base · Bee Brood reasoning base

Premium reasoning tier. Depends on the governed-release pipeline that lands each capability behind the eval harness.

ChatToolsThinking modeJSON modeRAGStreaming

Bee Comb

In training

bee-comb · 256K context · target pricing $0.50 / $1.50 per 1M tok

base · Bee Comb production base

Builder + production workflow tier. Specialised reasoning for coding, automation, API applications, and technical workflows. First domain adapter is in final validation.

ChatToolsJSON modeRAGStreamingVision (Hive+)

Bee Buzz

In training

bee-buzz · 256K context · target pricing $1.00 / $3.00 per 1M tok

base · Bee Buzz agent base

Team + agent workflow tier. Sized for collaboration, tool use, agents, pooled usage, internal tools, and operational workflows.

ChatToolsFunction callingJSON modeRAGStreaming

Bee Hive

In training

bee-hive · 256K context · target pricing $2.00 / $8.00 per 1M tok

base · Bee Hive multi-base ensemble

Enterprise specialist intelligence tier. High-capability specialised reasoning for enterprise analysis, technical depth, regulated-domain preparation, and knowledge-intensive workflows.

ChatToolsVisionExtended thinkingMulti-base routingQuantum reasoning

Bee Swarm

In training

bee-swarm · 1M context · target pricing $5.00 / $15.00 per 1M tok

base · Bee Swarm routed fabric (frontier-class)

High-assurance advanced reasoning tier. Premium intelligence for complex reasoning, research workflows, multi-agent coordination, and mission-critical analysis.

ChatToolsVisionExtended thinkingFrontier-class routingMulti-vendor fabric

Under the hood

A hybrid architecture, not a vanilla decoder.

The seven customer tiers (Cell, Brood, Comb, Buzz, Hive, Swarm, Enclave) share the same engine design. The numbers below describe the architectural design target. Bee Cell ships today on a curated open-weight base under our governed release policy while the rest of the ladder lands tier by tier through the eval-gated rollout.

Base parameters367.1M
Hidden size4,096
Transformer layers48
Attention32 heads / 8 KVGrouped Query Attention (GQA) + RoPE
Mixture of Experts16 experts, top-2every 4th layer
State SpaceMamba-style SSMevery 6th layer
Compressive memory4096 slots4× compression
Self-thinking depth8-level CoTself-verified

Capabilities

What every Bee model can do.

Adaptive router

Difficulty-scored routing between local execution and frontier-teacher escalation. Scoring blends keyword complexity, query length, conversation depth, code/math detection, and a per-domain multiplier.

  • Local lane: difficulty score < 0.4
  • Teacher lane: difficulty score > 0.7
  • Domain multipliers — quantum 1.5×, fintech / cybersecurity 1.3×
  • Self-verify pass threshold: 0.45 (coherence + relevance + completeness)

source · bee/adaptive_router.py

Evolution engine (architecture)

The evolution loop generates candidate neural modules — attention variants, SSM discretisations, compression codecs, memory protocols — runs them through a sandboxed eval, and only accepts winners. Implemented in bee/evolution.py + bee/invention_engine.py; not currently running against the live Bee Cell deployment.

  • 6 candidates × 3 generations per cycle
  • Orchestration cadence: every 300s when active
  • AST-based safety checks; forbidden imports/calls blacklist
  • Eval-gated acceptance, regression detection, automatic rollback

source · bee/evolution.py · bee/invention_engine.py

Self-coding & self-healing (architecture)

When a request needs code, the self-coding module is designed to write it, execute in a sandbox, read the error, and iterate. Activates on roadmap tiers as each clears the eval harness.

  • Self-coding: up to 5 iterations, 30s timeout
  • Self-healing: gradient norm monitor, loss-spike detection, NaN guard
  • Auto-tunes learning rate, checkpoints, rolls back to last good state
  • Algorithm invention, compression, crypto primitives, math proofs

source · bee/self_coding.py · bee/self_heal.py

RAG pipeline

Document upload → chunk → embed → FAISS index → cite. End-to-end retrieval is built into every tier; pass document IDs in the chat request and Bee handles the rest.

  • Vector store: FAISS
  • Embedding model: all-MiniLM-L6-v2
  • Embedding dimension: 384
  • Per-tenant isolation; shared index on Buzz and above

source · bee/rag* · bee/data_engine.py

Quantum integration (opt-in)

Real qiskit-ibm-runtime integration to IBM Heron r2 hardware, with a local statevector simulator fallback. The integration is real code in the repo — every API call is NOT routed through quantum today; it activates per-request on roadmap tiers as they ship.

  • Backends: ibm_kingston, ibm_fez, ibm_marrakesh
  • Family: IBM Heron r2 (156 qubits each)
  • Local statevector simulator fallback (~28 qubits)
  • Quantum tier — none / simulator / queued / pack / pack-pro / dedicated

source · bee/quantum_reasoning.py · bee/quantum_ibm.py

Domain adapters

LoRA adapters specialise the base model for low-cost fine-tuning per domain. Adapters are released through the governed-release pipeline once each clears the eval harness — every released adapter has a published validation record at /trust.

  • Domain-specific data collection · eval-harness-gated · governed release
  • General, technical, business, regulated, and research workflow domains
  • Restricted-domain (Tier-3) workflows include explicit acknowledgement and jurisdictional gates per /legal/acceptable-use
  • Specific LoRA configuration is contractual via the Enclave deployment manifest

source · Per-adapter validation records published at /trust

Internal eval suite

Real numbers, reproducible.

The Cell base (google/gemma-4-E4B-it) on our 40-task internal suite, run on Apple Silicon (MPS). Every score traces to the committed raw prompts + outputs in data/eval_reports/report.json — reproduce with python -m bee.eval_harness --device mps. A small internal suite on the base model (pre-adapter), not a comparative public benchmark — the raw outputs even show where the strict grader is over-strict.

Overall

70.0%

BenchmarkScorePassedAvg latency
Coding60%6 / 106740 ms
Reasoning60%6 / 10449 ms
Instruction following90%9 / 101445 ms
Grounded factual40%2 / 5718 ms
Domain (specialised)100%5 / 52168 ms

source · bee/eval_harness.py · data/eval_reports/report.json · google/gemma-4-E4B-it (7941M params) · 100.8s

API compatibility

OpenAI Chat Completions — drop-in.

Bee speaks the same wire protocol as OpenAI's /chat/completions. Switch the base URL and the API key — your existing client code keeps working. Tools, JSON mode, structured outputs, and streaming all supported.

Deployment

Bee Enclave

Deployment mode

bee-enclave · contracted · cutoff 2026-03

Run any Hive- or Swarm-class workload in your private VPC, regulated, or air-gapped environment. Same models, different deployment posture.

Hive / Swarm in private VPCPQC transportAudit + compliance evidence
Contact sales

Research track

Bee Ignite

Research only

bee-ignite · CUI Labs internal · not commercially available

Bee Ignite is the experimental Bee-native architecture: MoE, SSM memory, neural compression, distillation, and quantum-assisted modules. Findings backflow into production tiers but Ignite itself is not user-selectable.

Bee-native fusionMoE experimentsSSM memoryNeural compression