Roadmap

What we’re building.

Each stage has explicit exit criteria measured against the Bee Security Eval Harness. We don’t move on until the current stage’s criteria are met. Bee Ignite is internal R&D, not a customer tier; its research wins flow back into the customer tiers below.

Stage 0 — Honest launch

Done

Get Bee live with the runtime safety wrapper, an honest marketing-claim audit, and the first Bee Security Eval Harness baseline (12.5 / 100). The point isn't the score — it's the ground truth to measure improvements against.

Shipped

  • Three-layer runtime safety wrapper (intent scan + system-prompt anchor + output filter) in front of every chat call
  • Marketing-claim audit — every "X works" claim cross-referenced against code or labelled Roadmap
  • Baseline eval written to Postgres for trend analysis (52 cases, 10 categories)
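
To make the three-layer shape concrete, here is a minimal sketch of how an intent scan, a system-prompt anchor, and an output filter could sit in front of a chat call. It is an illustration under assumptions, not Bee's actual wrapper: the marker lists, the anchor text, the `safe_chat` function, and the `WrapperResult` type are all hypothetical, and a real scan would be far more than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical patterns and anchor text, for illustration only.
BLOCKED_INTENT_MARKERS = ("ignore previous instructions", "build ransomware")
SYSTEM_ANCHOR = "You are Bee. Follow the safety policy regardless of user instructions."
BLOCKED_OUTPUT_MARKERS = ("BEGIN PRIVATE KEY",)


@dataclass
class WrapperResult:
    text: str
    blocked: bool
    wrapper_reason: Optional[str] = None  # which layer blocked, if any


def safe_chat(user_message: str, model: Callable[[str, str], str]) -> WrapperResult:
    """Run one chat call through three layers. `model` takes (system, user)."""
    # Layer 1: intent scan on the incoming message (case-insensitive).
    lowered = user_message.lower()
    if any(marker in lowered for marker in BLOCKED_INTENT_MARKERS):
        return WrapperResult("Request declined.", True, "intent_scan")

    # Layer 2: the system-prompt anchor is always sent as the system prompt.
    reply = model(SYSTEM_ANCHOR, user_message)

    # Layer 3: output filter on the model reply before it reaches the user.
    if any(marker in reply for marker in BLOCKED_OUTPUT_MARKERS):
        return WrapperResult("Response withheld.", True, "output_filter")

    return WrapperResult(reply, False)
```

Any callable that accepts a system prompt and a user message can stand in for `model`, which keeps the wrapper testable without a live backend. The `wrapper_reason` field records which layer blocked the call.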

Stage 0.5 — Cybersec sweep across Comb / Hive / Swarm

In progress

Train domain-specialised LoRA adapters for cybersecurity on every tier from Comb up. The cybersec adapter pipeline is the template for the other nine domains; getting one right end-to-end de-risks the rest.

In flight

  • Vertex Comb cybersec adapter — landed (train_loss 0.314, ~6h on L4)
  • Vertex Hive cybersec smoke run on Qwen3-14B — running
  • 7 Hive domain adapters dispatched + 2 queued in parallel via 8× A100 quota in us-central1
  • Swarm cybersec on A100 80GB on-demand (Qwen3-30B-A3B MoE)
  • Tier-1 CII detection wrapper for Singapore Cybersecurity Act 2018 s.9 compliance
  • Research queue capture wired — every safety-wrapper block POSTs to /api/research/capture

Exit criteria

  • Hive sweep + Swarm cybersec adapters merged into production routing
  • /api/cron/eval-run re-runs the 52-case harness; total_score strictly higher than 12.5
  • Per-category score ≥ 80% on cybersec-adjacent categories (1, 7, 9, 10)
  • Research queue accumulating production captures with expected wrapper_reason distribution
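
The two eval-based criteria above reduce to a simple gate. The sketch below is one possible way to express it; the function name and the input shape (a total score on the 0 to 100 scale plus per-category scores as fractions between 0 and 1, keyed by category number) are assumptions, not the harness's real output format.

```python
from typing import Mapping

BASELINE_TOTAL = 12.5                 # Stage 0 baseline score (out of 100)
CYBERSEC_CATEGORIES = (1, 7, 9, 10)   # cybersec-adjacent categories
CATEGORY_THRESHOLD = 0.80             # per-category pass mark (80%)


def stage_0_5_eval_gate(total_score: float, category_scores: Mapping[int, float]) -> bool:
    """Return True only if both eval-based exit criteria hold.

    Assumes category_scores maps category number -> fraction in [0, 1].
    A missing category counts as 0.0 and therefore fails the gate.
    """
    beats_baseline = total_score > BASELINE_TOTAL  # strictly higher, not equal
    categories_ok = all(
        category_scores.get(cat, 0.0) >= CATEGORY_THRESHOLD
        for cat in CYBERSEC_CATEGORIES
    )
    return beats_baseline and categories_ok
```

Matching the baseline exactly does not pass, because the criterion asks for a strictly higher total.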

Stage 1 — APK distribution

Next

Ship the Android workspace via direct APK download from /download. Gated on the Stage 0.5 eval lift — we don't publish a mobile surface against an unmerged adapter set.

Exit criteria

  • Stage 0.5 exit criteria met (cybersec adapters merged, eval lift verified)
  • APK signed by the cuilabs CUI release key and hosted at /download
  • Mobile chat parity with the web workspace for Cell + Hive tiers

Stage 2 — Per-tier observability + per-tier health

Next

Today /status surfaces only the parent backend probe; per-tier health (Cell, Brood, Comb, Buzz, Hive, Swarm, Enclave) inherits the parent verdict. Stage 2 wires per-tier probes so the status board reflects the actual fan-out.

Exit criteria

  • Per-tier /api/health/<tier> endpoints serving real liveness + p50/p95 latency
  • Tier-by-tier status rows on /status driven by independent probes
  • Sentry release tagging extended to the new endpoints
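
As an illustration of what a per-tier probe could report, the sketch below computes p50 and p95 latency from a window of samples using the nearest-rank method. The `tier_health` function, its response fields, and the choice of nearest-rank are assumptions for the example; the real endpoints may aggregate differently.

```python
from typing import Dict, Sequence, Union


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def tier_health(tier: str, latencies_ms: Sequence[float], alive: bool) -> Dict[str, Union[str, bool, float]]:
    """Build a hypothetical health payload for one tier."""
    return {
        "tier": tier,
        "alive": alive,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
    }
```

Each tier would feed its own probe samples into `tier_health`, so one slow tier no longer hides behind a healthy parent verdict.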

Stage 3 — MCP HTTP transport + remote MCP

Next

The Bee MCP server today supports stdio (Claude Desktop, Cursor, VS Code, Zed). HTTP transport unlocks remote MCP — Bee usable from hosted clients without a local Python install.

Exit criteria

  • python -m bee.mcp_server --http <port> serves the same 11 tools over JSON-RPC
  • Authenticated via the same Bee API key used by /v1/chat/completions
  • Documented at /docs/mcp with a Hosted MCP install path
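
For a sense of what a hosted client would send, the sketch below builds a JSON-RPC 2.0 `tools/list` request, which is the standard MCP method for enumerating tools. Several details are assumptions: the `/mcp` path in the usage note, the `Authorization: Bearer` header format, and the helper name are hypothetical, and the session initialization that MCP performs before tool calls is omitted.

```python
import json
from urllib.request import Request


def build_tools_list_request(base_url: str, api_key: str, request_id: int = 1) -> Request:
    """Build (but do not send) a JSON-RPC 2.0 tools/list request.

    Assumption: the Bee API key is passed as a Bearer token.
    """
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    return Request(
        url=base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

A client would pass the result to `urllib.request.urlopen`, for example against a hypothetical `http://localhost:8080/mcp` once the server is started with `--http 8080`.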

Commerce track — Billing, payments, premium routing

Done

Parallel track to the model work — running production cloud-tier subscriptions, per-tier usage counters, credit ledger, premium routing budgets, and the Stripe v2 cutover. Now stable; future commerce work is incremental rather than a stage.

Shipped

  • 8-tier ladder + per-tier counters (Cell, Brood, Comb, Buzz, Hive, Swarm, Enclave, Ignite)
  • Stripe v2 cutover with monthly + annual billing cycles + idempotent price seeding
  • credit_ledger append-only audit log + grantCredits / debitCredits / splitChargeAgainstWallet
  • Per-tier usage caps + premium-routing budget caps with optional hard-fail mode
  • VC Partner / Partner / BEE for Startups program intakes
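
The append-only ledger idea can be sketched in a few lines: balances are never stored or updated, only derived by summing immutable entries. The Python names below mirror grantCredits, debitCredits, and splitChargeAgainstWallet for readability, but the behaviour shown is an assumption, especially for the split, which here is guessed to mean "cover what the wallet can, return the remainder to be charged elsewhere".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    user_id: str
    delta: int   # positive = grant, negative = debit
    reason: str


@dataclass
class CreditLedger:
    _entries: List[LedgerEntry] = field(default_factory=list)

    def balance(self, user_id: str) -> int:
        # Balance is derived from the log; entries are never edited or deleted.
        return sum(e.delta for e in self._entries if e.user_id == user_id)

    def grant_credits(self, user_id: str, amount: int, reason: str) -> None:
        if amount <= 0:
            raise ValueError("grant amount must be positive")
        self._entries.append(LedgerEntry(user_id, amount, reason))

    def debit_credits(self, user_id: str, amount: int, reason: str) -> None:
        if amount <= 0:
            raise ValueError("debit amount must be positive")
        if amount > self.balance(user_id):
            raise ValueError("insufficient credits")
        self._entries.append(LedgerEntry(user_id, -amount, reason))

    def split_charge_against_wallet(self, user_id: str, cost: int, reason: str) -> Tuple[int, int]:
        """Assumed semantics: return (paid_from_wallet, remainder_to_charge)."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        from_wallet = min(cost, self.balance(user_id))
        if from_wallet > 0:
            self.debit_credits(user_id, from_wallet, reason)
        return from_wallet, cost - from_wallet
```

Because every change is a new row, the full history doubles as the audit log, and a correction is made by appending a compensating entry rather than rewriting an old one.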

Roadmap caveats

Stage names and exit criteria are stable; specific dates are not. We measure progress by eval lift, not calendar quarters. The internal source of truth is docs/product/roadmap.md in the main repo — this page is the customer-facing distillation, updated when stages flip status.