Docs

Build with Bee.

Bee exposes an OpenAI-compatible Chat Completions API. It is drop-in compatible with any client that already speaks the OpenAI API: switch the base URL and API key, and you're shipping.

Quickstart

Three steps. Sign up, mint a key, send your first request.

  1. Open a workspace (free tier, no card required).
  2. Generate an API key under Settings → API keys.
  3. Send your first request:
curl
curl https://api.bee.cuilabs.io/chat/completions \
  -H "Authorization: Bearer $BEE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bee-comb",
    "messages": [
      { "role": "user", "content": "Explain post-quantum cryptography in one paragraph." }
    ]
  }'
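The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official SDK: the helper name build_chat_request and the constant BEE_BASE_URL are hypothetical, and the host is copied from the curl example above.

```python
import json
import urllib.request

# Host taken from the curl example above; the constant name is illustrative.
BEE_BASE_URL = "https://api.bee.cuilabs.io"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /chat/completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BEE_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the JSON body; the key would typically come from the BEE_API_KEY environment variable, as in the curl example.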

Authentication

Bearer tokens. Pass your API key in the Authorization header.

HTTP
Authorization: Bearer sk-bee-...

Public-tier requests run over standard TLS 1.3. Bee Enclave Sovereign customer transport runs over the QNSP post-quantum stack (FIPS 203 ML-KEM + FIPS 204 ML-DSA); see /security for the rollout posture.

Chat completions

Same shape as OpenAI's /chat/completions — see the request schema below. Tools, JSON mode, and structured outputs are supported on all production models.

POST /chat/completions
{
  "model": "bee-hive",
  "messages": [
    { "role": "system", "content": "You are a senior security engineer." },
    { "role": "user",   "content": "Audit this Kyber key exchange." }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "tools": [/* function-calling spec */],
  "response_format": { "type": "json_object" }
}
200 response
{
  "id": "cmpl_01HZ...",
  "object": "chat.completion",
  "model": "bee-hive",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 287,
    "total_tokens": 329
  }
}
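Reading the response usually means pulling out the assistant text, the finish reason, and the token usage. The sketch below assumes the response has exactly the shape shown above; summarize_completion is a hypothetical helper name, and the sample string reuses the placeholder values from the example.

```python
import json


def summarize_completion(raw: str) -> dict:
    """Extract the first choice's text, finish reason, and total token count."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage", {})  # tolerate a missing usage object
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


# Sample body mirroring the 200 response above (placeholders kept as-is).
sample = json.dumps({
    "id": "cmpl_01HZ...",
    "object": "chat.completion",
    "model": "bee-hive",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 42, "completion_tokens": 287, "total_tokens": 329},
})
summary = summarize_completion(sample)
```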

Models

Six production tiers. The adaptive router picks the right size automatically when you specify model: "bee"; pin a tier explicitly to control cost and latency.

  • bee-cell · 128K context · $0.15 / $0.60
  • bee-brood · 256K context · $0.30 / $1.20
  • bee-comb · 256K context · $0.50 / $1.50
  • bee-buzz · 256K context · $1.00 / $3.00
  • bee-hive · 256K context · $2.00 / $8.00
  • bee-swarm · 1M context · $5.00 / $15.00

Prices are USD per 1M tokens (input / output). Full lineup on the models page.
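To see how the per-million-token prices translate into the cost of a single request, here is a small estimator. The price table is copied from the list above; the function name estimate_cost_usd is hypothetical, and actual billing may round or meter differently.

```python
# USD per 1M tokens as (input, output), copied from the model list above.
PRICES_PER_MILLION = {
    "bee-cell": (0.15, 0.60),
    "bee-brood": (0.30, 1.20),
    "bee-comb": (0.50, 1.50),
    "bee-buzz": (1.00, 3.00),
    "bee-hive": (2.00, 8.00),
    "bee-swarm": (5.00, 15.00),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost from the usage counts returned in a response."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# Usage figures from the sample chat completion response (42 in, 287 out).
cost = estimate_cost_usd("bee-hive", 42, 287)
```

For the sample usage figures, that is (42 × 2.00 + 287 × 8.00) / 1,000,000, or about $0.00238.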

RAG (retrieval)

Upload documents and let Bee retrieve relevant chunks at inference time. Vector store is FAISS; embedding model is all-MiniLM-L6-v2 (384-dim).

POST /documents
# Upload (multipart)
curl https://api.bee.cuilabs.io/documents \
  -H "Authorization: Bearer $BEE_API_KEY" \
  -F "file=@whitepaper.pdf"

# Retrieve at inference time: pass the document IDs in the "documents" field of the chat request:
{
  "model": "bee-hive",
  "messages": [...],
  "documents": ["doc_abc123", "doc_def456"]
}
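The two steps above can be sketched in Python without third-party packages. This is an illustration under assumptions: the form field name "file" and the "documents" field come from the examples above, while the helper names are hypothetical. The shape of the upload response is not documented here, so the sketch does not try to parse a document ID from it.

```python
import uuid
from typing import List, Tuple


def encode_multipart_file(field: str, filename: str, content: bytes,
                          content_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns (body, value for the Content-Type request header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def chat_payload_with_documents(model: str, messages: List[dict],
                                document_ids: List[str]) -> dict:
    """Build a chat request body that references previously uploaded documents."""
    return {"model": model, "messages": messages, "documents": list(document_ids)}
```

The encoded body would be sent as a POST to /documents with the returned Content-Type value and the usual Authorization header.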

Streaming

Set stream: true. Responses arrive as Server-Sent Events with the same delta format OpenAI uses.

data: lines
data: {"choices":[{"delta":{"content":"Post-"}}]}
data: {"choices":[{"delta":{"content":"quantum"}}]}
data: {"choices":[{"delta":{"content":" cryptography"}}]}
data: [DONE]
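A client consumes the stream by reading each data: line, stopping at [DONE], and concatenating the content deltas. The parser below is a minimal sketch that assumes the line format shown above; iter_content_deltas is a hypothetical name, and a production client would also need to handle reconnects and partial reads.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from Server-Sent Event lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# The sample stream from above, fed in as plain lines.
sample_lines = [
    'data: {"choices":[{"delta":{"content":"Post-"}}]}',
    'data: {"choices":[{"delta":{"content":"quantum"}}]}',
    'data: {"choices":[{"delta":{"content":" cryptography"}}]}',
    "data: [DONE]",
]
full_text = "".join(iter_content_deltas(sample_lines))
```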

Errors

Standard HTTP status codes. Body is { error: { code, message } }.

  • 400 — invalid request (schema)
  • 401 — missing or invalid API key
  • 402 — pool exhausted on hard-limited plan
  • 429 — rate limited; honour Retry-After
  • 500 — engine fault; safe to retry
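One way to act on these codes is a small retry policy: retry only 429 and 500, prefer the server's Retry-After value when present, and otherwise back off exponentially. The sketch below is an assumption-laden illustration, not Bee's recommended policy. The function name and the base and cap defaults are hypothetical, and it only understands Retry-After expressed in seconds, not as an HTTP date.

```python
from typing import Optional

# Per the list above: 429 and 500 are retryable; 400, 401, and 402 are not.
RETRYABLE_STATUSES = {429, 500}


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is not retryable.

    attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # honour the server's hint
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    return min(cap, base * (2 ** attempt))  # capped exponential backoff
```

Adding random jitter to the backoff value is a common refinement when many clients may retry at once.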

Rate limits

Per-key, per-minute. Limits scale with plan tier; see pricing for the per-plan token allowance.

Every response includes:

response headers
X-Bee-RateLimit-Limit:     1000
X-Bee-RateLimit-Remaining: 873
X-Bee-RateLimit-Reset:     1714867200
X-Bee-Pool-Remaining:      9842917
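These headers let a client pause before it hits a 429. The sketch below assumes that X-Bee-RateLimit-Reset is a Unix timestamp in seconds, which the sample value suggests but the text does not state, and that header names are looked up with the exact casing shown. seconds_until_reset is a hypothetical helper name.

```python
from typing import Mapping


def seconds_until_reset(headers: Mapping[str, str], now: float) -> float:
    """Return how long to wait before the next request; 0.0 if budget remains.

    Assumes X-Bee-RateLimit-Reset is Unix epoch seconds (an assumption).
    """
    remaining = int(headers["X-Bee-RateLimit-Remaining"])
    if remaining > 0:
        return 0.0
    reset_at = int(headers["X-Bee-RateLimit-Reset"])
    return max(0.0, reset_at - now)


# Sample headers from above: budget remains, so no wait is needed.
sample_headers = {
    "X-Bee-RateLimit-Limit": "1000",
    "X-Bee-RateLimit-Remaining": "873",
    "X-Bee-RateLimit-Reset": "1714867200",
    "X-Bee-Pool-Remaining": "9842917",
}
wait = seconds_until_reset(sample_headers, now=1714867140.0)
```

In real use, pass time.time() as now and sleep for the returned number of seconds before sending the next request.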

More docs are on the way. SDK reference (Python, Node, Rust, Go), tool-use cookbook, and migration guides from OpenAI / Anthropic ship as we cut their APIs over.

Need something now? Ask support.