Overview

The Blink Server acts as the control plane for AI agent deployments. Key architectural principles:
  • Agents are HTTP servers deployed as Docker containers
  • The control loop runs inside the server, not in agents
  • Communication is over HTTP; chat streaming to clients uses SSE and WebSockets
  • State is centralized in PostgreSQL (chats, runs, deployments, files, logs, traces, KV)
The server also hosts the web UI and the HTTP API used by the CLI and SDK.

Server Components

  • API server handles chats, agents, webhooks, files, logs, traces, and devhook routing
  • WebSocket server is used for chat streaming and auth token handshakes
  • Startup runs database migrations before accepting traffic
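
Because migrations must finish before the server accepts traffic, startup is strictly ordered. A minimal sketch of that ordering, with a hypothetical runMigrations helper standing in for Blink's actual migration code:

```ts
import { createServer } from "node:http";

// Hypothetical stand-in for applying pending PostgreSQL migrations.
async function runMigrations(): Promise<void> {
  // e.g. apply each pending SQL migration in order here
}

async function main() {
  await runMigrations();                                // 1. bring the schema up to date
  const server = createServer((_req, res) => res.end("ok"));
  server.listen(Number(process.env.PORT ?? 3000));      // 2. only then accept traffic
}

main();
```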

Agent Execution Model

Agents are deployed as Docker containers using a configurable image (default: ghcr.io/coder/blink-agent:latest).
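
As a rough illustration (not Blink's deployment code), launching such a container amounts to running docker run with whichever image is configured; the BLINK_AGENT_IMAGE variable below is purely illustrative:

```ts
import { spawn } from "node:child_process";

// Illustrative only: the agent image is configurable, with the published image as the default.
const image = process.env.BLINK_AGENT_IMAGE ?? "ghcr.io/coder/blink-agent:latest";

const container = spawn(
  "docker",
  ["run", "--rm", "-e", "PORT=8080", "-p", "8080:8080", image],
  { stdio: "inherit" },
);

container.on("exit", (code) => console.log(`agent container exited with code ${code}`));
```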

Container Structure

Each agent container includes:
  • Agent bundle: built files staged into /app from the deployment output
  • Runtime wrapper: starts the agent and the internal API server, proxies requests, and injects auth
  • Internal API server: serves /kv, /chat, and /otlp/v1/traces for agent code and forwards to the Blink Server
  • OpenTelemetry Collector: collects agent logs and forwards them to the server

Runtime Wiring (self-hosted)

On deployment:
  1. The server downloads the deployment output files, writes them to a temp dir, and adds a runtime wrapper (__wrapper.js).
  2. The server launches a container and sets environment variables such as ENTRYPOINT, PORT, INTERNAL_BLINK_API_SERVER_URL, INTERNAL_BLINK_API_SERVER_LISTEN_PORT, BLINK_REQUEST_URL, BLINK_REQUEST_ID, and BLINK_DEPLOYMENT_TOKEN.
  3. The wrapper starts an internal API server inside the container and patches fetch so that internal API calls include x-blink-internal-auth.
  4. The wrapper runs the agent entrypoint on PORT+1 and proxies incoming requests on PORT to the agent (sketched below).
  5. The OpenTelemetry collector starts and reads the agent log pipe.
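
A heavily simplified sketch of the wrapper's role in steps 3 and 4. The env var and header names come from the list above; everything else, including using the deployment token as the internal auth value and loading the entrypoint in-process, is an assumption rather than Blink's actual __wrapper.js:

```ts
import http from "node:http";

const publicPort = Number(process.env.PORT);                   // port the server calls
const agentPort = publicPort + 1;                              // agent listens one above
const internalApi = process.env.INTERNAL_BLINK_API_SERVER_URL!;
const deploymentToken = process.env.BLINK_DEPLOYMENT_TOKEN!;   // auth value assumed here

// Patch fetch so calls from agent code to the internal API carry the auth header.
const realFetch = globalThis.fetch;
globalThis.fetch = (input: RequestInfo | URL, init: RequestInit = {}) => {
  const url =
    typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
  if (url.startsWith(internalApi)) {
    const headers = new Headers(init.headers);
    headers.set("x-blink-internal-auth", deploymentToken);
    init = { ...init, headers };
  }
  return realFetch(input, init);
};

// Load the agent entrypoint so it binds PORT + 1.
process.env.PORT = String(agentPort);
await import(process.env.ENTRYPOINT!);

// Proxy incoming requests on the public port through to the agent.
http
  .createServer((req, res) => {
    const upstream = http.request(
      { host: "127.0.0.1", port: agentPort, path: req.url, method: req.method, headers: req.headers },
      (agentRes) => {
        res.writeHead(agentRes.statusCode ?? 502, agentRes.headers);
        agentRes.pipe(res);
      },
    );
    req.pipe(upstream);
  })
  .listen(publicPort);
```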

Control Loop

The control loop is the core orchestration mechanism. It runs inside the server, not in agents.

Request Flow

  1. External event arrives (API call, Slack message, GitHub webhook)
  2. Server routes the event to the appropriate agent deployment
  3. Server invokes the agent’s /_agent/chat endpoint with an invocation token
  4. Agent processes the request and streams a response back (SSE)
  5. Server persists messages and run/step state to PostgreSQL and fans out to clients
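
Steps 3 and 4 are plain HTTP. A rough sketch of that invocation from the server side; the x-blink-invocation-token header comes from this page, while the request body and chunk handling are assumptions:

```ts
// Sketch: invoke an agent's chat endpoint and consume its SSE stream.
async function invokeAgentChat(agentUrl: string, invocationToken: string) {
  const res = await fetch(`${agentUrl}/_agent/chat`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      accept: "text/event-stream",
      "x-blink-invocation-token": invocationToken,
    },
    body: JSON.stringify({ messages: [] }), // payload shape is an assumption
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // In the real server each chunk is persisted and fanned out to clients.
    console.log(decoder.decode(value, { stream: true }));
  }
}
```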

Chat Run Lifecycle

  • Each chat run has one or more steps stored in the DB.
  • The server selects the latest step, invokes the active deployment, and streams chunks as they arrive.
  • If the response includes tool calls, the server creates a new step and continues the loop.
  • Interrupts cancel an in-flight step and restart with the latest state.
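
In pseudocode terms, the loop could be sketched like this; all of the helper names and shapes are placeholders, not Blink's internals:

```ts
// Hypothetical shapes; stand-ins for Blink's DB rows and agent responses.
type Step = { id: string; chatId: string };
type AgentResponse = { messages: unknown[]; toolCalls: unknown[] };

// Dependencies are injected so the loop itself stays readable; all of these
// are illustrative placeholders.
interface Deps {
  createStep(chatId: string, toolCalls?: unknown[]): Promise<Step>;
  invokeDeployment(step: Step): Promise<AgentResponse>; // POST /_agent/chat, streams chunks
  persistMessages(chatId: string, messages: unknown[]): Promise<void>;
}

async function runChat(chatId: string, deps: Deps) {
  let step = await deps.createStep(chatId);                   // first step of the run
  for (;;) {
    const response = await deps.invokeDeployment(step);       // invoke the active deployment
    await deps.persistMessages(chatId, response.messages);
    if (response.toolCalls.length === 0) break;               // no tool calls: run is complete
    step = await deps.createStep(chatId, response.toolCalls); // new step, continue the loop
  }
}
```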

Streaming and Buffering

  • The server broadcasts message.chunk.added events to WebSocket and SSE clients.
  • The current streaming buffer is kept in memory to allow reconnects.
  • This in-memory session state is the main blocker for horizontal scaling today.
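
A minimal sketch of that buffering idea (not Blink's code; the ws package and the session shape are assumptions): each chat keeps its streamed chunks plus the connected sockets, so a reconnecting client is replayed before joining the live stream.

```ts
import { WebSocket } from "ws"; // assumes the `ws` package

// Per-chat in-memory state: buffered chunks so far plus connected sockets.
const sessions = new Map<string, { chunks: string[]; clients: Set<WebSocket> }>();

function addChunk(chatId: string, chunk: string) {
  const session = sessions.get(chatId) ?? { chunks: [], clients: new Set<WebSocket>() };
  sessions.set(chatId, session);
  session.chunks.push(chunk);
  const event = JSON.stringify({ type: "message.chunk.added", chunk });
  for (const client of session.clients) client.send(event);
}

function onClientConnect(chatId: string, socket: WebSocket) {
  const session = sessions.get(chatId) ?? { chunks: [], clients: new Set<WebSocket>() };
  sessions.set(chatId, session);
  // Replay what has streamed so far, then keep the socket for future chunks.
  for (const chunk of session.chunks) {
    socket.send(JSON.stringify({ type: "message.chunk.added", chunk }));
  }
  session.clients.add(socket);
}
```

Because this map lives in a single process's memory, a second server instance could not replay or continue a stream started on the first, which is the scaling constraint noted under Limitations.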

Why the Control Loop is Server-Side

Running the control loop in the server rather than agents provides:
  • Centralized state in PostgreSQL
  • Agent simplicity (no orchestration logic)
  • Observability and auditability
  • Consistent tool-call looping behavior
For more details about the control loop, see the agent structure guide.

Request Routing

For details on webhook routing and devhooks, see the webhooks and devhooks guide.

Communication

Server -> Agent

The server communicates with agents via HTTP:
  • GET /_agent/health: health check
  • POST /_agent/chat: chat request, SSE response
  • GET /_agent/capabilities: check supported handlers
  • GET /_agent/ui: UI schema for dynamic inputs
  • POST /_agent/flush-otel: flush telemetry buffers
  • ANY /_agent/*: custom request handler
Older deployments may still be called via /sendMessages or /_agent/send-messages. All server -> agent calls include x-blink-invocation-token. Chat runs also include run, step, and chat ID headers.
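
From the agent's side these are ordinary HTTP routes. A minimal hand-rolled sketch of the two most important ones (a real agent would normally be built with the Blink SDK, and the SSE payload shape here is assumed):

```ts
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.method === "GET" && req.url === "/_agent/health") {
    res.writeHead(200).end("ok");
    return;
  }
  if (req.method === "POST" && req.url === "/_agent/chat") {
    // Stream the reply back as server-sent events (payload shape assumed).
    res.writeHead(200, { "content-type": "text/event-stream" });
    res.write(`data: ${JSON.stringify({ text: "Hello from the agent" })}\n\n`);
    res.end();
    return;
  }
  res.writeHead(404).end();
});

server.listen(Number(process.env.PORT ?? 8080));
```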

Agent -> Server

Agents running in containers do not call the public API directly. Instead, the wrapper exposes an internal API server:
  • /kv for agent key-value storage
  • /chat for chat CRUD and message operations
  • /otlp/v1/traces for trace export (logs are forwarded by the collector)
The wrapper forwards these to the Blink Server using the invocation token and the deployment token.
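
From agent code, using these routes is just a fetch against the internal server, with the wrapper handling auth and forwarding. A sketch assuming simple GET/PUT semantics on /kv (the exact request and response shapes are assumptions):

```ts
// Illustrative only: the internal API base URL comes from the wrapper's env.
const base = process.env.INTERNAL_BLINK_API_SERVER_URL;

// Write a value to the agent's key-value store (shape assumed).
await fetch(`${base}/kv`, {
  method: "PUT",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ key: "last-run", value: new Date().toISOString() }),
});

// Read it back (shape assumed).
const res = await fetch(`${base}/kv?key=last-run`);
console.log(await res.json());
```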

Data and Storage

PostgreSQL stores:
  • chat messages, runs, and steps
  • agents, deployments, and deployment targets
  • files and attachments
  • logs and traces (self-hosted)
Migrations run automatically at server startup.

Limitations

Current architectural constraints to be aware of:
  • Single node only: in-memory chat streaming buffers prevent horizontal scaling
  • Docker required: agents must run as Docker containers (no Kubernetes, ECS, etc.)
  • Local Docker daemon: the server must have direct access to the Docker socket
These limitations exist because Blink is in early access. We plan to support horizontal scaling and other deployment options in the future.