Dispatch System

How Opengram batches, queues, and delivers messages to your agent workers.

The dispatch system is the bridge between the Opengram UI and your agent. When a user sends a message, the dispatch system collects it, optionally batches it with other pending messages, and makes it available for your worker to claim and process.

Dispatch modes

Configure the mode with the server.dispatch.mode field in opengram.config.json. Three modes are available:

immediate

Each message is dispatched the moment it arrives. No batching, no debouncing. Use this when latency matters more than efficiency, or when your agent processes each message independently.

sequential

Messages are dispatched one at a time in the order they were received. The next message is not dispatched until the current one is completed. This is useful when message order is critical and your agent cannot handle concurrent inputs.

batched_sequential (default)

Messages are collected into batches using a debounce window, then dispatched sequentially. This is the default mode and works well for most use cases -- it handles rapid-fire messages from users without overwhelming your agent.

Batching is controlled by three timing parameters:

  • batchDebounceMs (default: 600) -- Wait this long after the last message before sealing the batch.
  • typingGraceMs (default: 2000) -- Extra grace period if the user is still typing.
  • maxBatchWaitMs (default: 30000) -- Maximum time to wait before forcing the batch to dispatch.
opengram.config.json
{
  "server": {
    "dispatch": {
      "mode": "batched_sequential",
      "batchDebounceMs": 600,
      "typingGraceMs": 2000,
      "maxBatchWaitMs": 30000
    }
  }
}
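To build intuition for how the three timers interact, here is a small simulation sketch. This is not Opengram code; the sealing rule is inferred from the parameter descriptions above, and the exact interaction with typing state is an assumption.

```python
def batch_seal_time(message_times, typing_until=0.0,
                    debounce_ms=600, typing_grace_ms=2000, max_wait_ms=30000):
    """Return the (simulated) time at which a batch seals, in ms.

    - The batch opens at the first message.
    - Each new message resets the debounce window.
    - If the user is still typing when the debounce would fire,
      sealing waits out an extra typing grace period (assumed rule).
    - max_wait_ms caps the total wait from the first message.
    """
    opened = message_times[0]
    seal = message_times[-1] + debounce_ms      # debounce runs from the last message
    if typing_until > seal:                     # user still typing: extend the window
        seal = typing_until + typing_grace_ms
    return min(seal, opened + max_wait_ms)      # never wait past the hard cap

# Two quick messages: the debounce window runs from the second one.
print(batch_seal_time([0, 200]))        # 800
# A long stream of messages hits the 30 s cap.
print(batch_seal_time([0, 29_800]))     # 30000
```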

Input sources

A dispatch can be triggered by different input sources:

  • user_message -- the user sent one or more messages in the chat.
  • request_resolved -- the user resolved an interactive request (e.g. answered a question or submitted a form).

Your worker receives the input source in the dispatch payload so it can decide how to handle each case.
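A worker might branch on that field like this. This is a sketch: the payload field names used here (`inputSource`, `messages`) are assumptions, so check the batch payload schema for the real shape.

```python
def handle_dispatch(payload):
    """Route a dispatch by its input source (field names assumed)."""
    source = payload.get("inputSource")
    if source == "user_message":
        # Normal chat traffic: one or more user messages in the batch.
        return f"replying to {len(payload.get('messages', []))} message(s)"
    elif source == "request_resolved":
        # The user answered a question or submitted a form.
        return "processing the user's answer to an interactive request"
    else:
        return f"unknown input source: {source!r}"

print(handle_dispatch({"inputSource": "user_message",
                       "messages": [{"text": "hi"}]}))
```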

Worker claiming flow

Your agent worker pulls dispatches from Opengram using a claim-based protocol:

  1. Claim -- call POST /api/v1/dispatch/claim to claim a single dispatch, or POST /api/v1/dispatch/claim-many to claim up to N dispatches at once. The response includes the batch payload with all messages, chat context, and an agentIdHint indicating which agent configuration applies.
  2. Heartbeat -- while processing, call POST /api/v1/dispatch/{id}/heartbeat periodically to extend the lease. If the lease expires without a heartbeat, the dispatch is returned to the queue automatically by a background lease sweeper.
  3. Complete or fail -- call POST /api/v1/dispatch/{id}/complete when done, or POST /api/v1/dispatch/{id}/fail if something went wrong.

During processing, your worker can send messages, stream tokens, attach files, and create interactive requests using the standard API endpoints.

A successful claim returns 200 with the batch payload. When no work is available, the endpoint long-polls for up to claimWaitMs (default: 10 seconds) before returning 204 No Content. Your worker should loop and call claim again.
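Put together, the claim loop can be sketched like this. The HTTP client is injected so the control flow is visible without a live server; anything beyond the endpoints and status codes documented above (such as the `id` field and request bodies) is an assumption.

```python
def claim_loop(client, max_iterations=3):
    """Skeleton worker loop: claim, process, then complete or fail.

    `client` is any object with .post(path) -> (status, body); a real
    worker would issue authenticated HTTP requests to Opengram here.
    """
    handled = []
    for _ in range(max_iterations):
        status, batch = client.post("/api/v1/dispatch/claim")
        if status == 204:            # long-poll timed out: no work, loop again
            continue
        dispatch_id = batch["id"]    # field name assumed
        try:
            # ... process messages, stream tokens, heartbeat periodically ...
            client.post(f"/api/v1/dispatch/{dispatch_id}/complete")
            handled.append(dispatch_id)
        except Exception:
            client.post(f"/api/v1/dispatch/{dispatch_id}/fail")
    return handled

class FakeClient:
    """Stand-in client: one empty poll, then one dispatch, then idle."""
    def __init__(self):
        self.claims = [(204, None), (200, {"id": "d1"}), (204, None)]
    def post(self, path):
        if path == "/api/v1/dispatch/claim":
            return self.claims.pop(0)
        return (200, None)

print(claim_loop(FakeClient()))  # ['d1']
```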

For the full batch payload schema and a complete claim loop example, see Building an Agent — The batch payload.

Lease sweeper

A background process periodically checks for batches whose lease has expired (the worker stopped heartbeating or crashed). Expired leases are automatically returned to the queue so another worker can claim them. The sweeper runs every schedulerTickMs (default: 500ms).
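The sweeper's job reduces to a periodic scan over claimed batches. A minimal sketch of the idea, not the actual implementation:

```python
import time

def sweep_expired_leases(claimed, queue, now=None):
    """Return expired dispatches to the queue.

    `claimed` maps dispatch id -> lease expiry timestamp (seconds);
    a heartbeat would push that expiry forward by the lease duration.
    """
    now = time.time() if now is None else now
    for dispatch_id, expires_at in list(claimed.items()):
        if expires_at <= now:            # worker stopped heartbeating or crashed
            del claimed[dispatch_id]
            queue.append(dispatch_id)    # another worker can now claim it

claimed = {"d1": 100.0, "d2": 200.0}
queue = []
sweep_expired_leases(claimed, queue, now=150.0)
print(queue)  # ['d1']
```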

Autoscaling

The dispatch system supports autoscaling the number of concurrent workers it expects:

  • execution.autoscaleEnabled (default: true) -- Whether autoscaling is active.
  • execution.minConcurrency (default: 2) -- Minimum concurrent workers.
  • execution.maxConcurrency (default: 10) -- Maximum concurrent workers.
  • execution.scaleCooldownMs (default: 5000) -- Cooldown period (ms) before scaling down.

These settings control how claim-many distributes work. When the queue is deep, Opengram signals workers to scale up. When idle, it scales back down after the cooldown period. The maximum number of batches returned by a single claim-many call is controlled by claim.claimManyLimit (default: 10, hard cap: 50).
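For example, the effective batch count for a claim-many request could be clamped like this (a sketch of the limits described above; how Opengram combines them internally is an assumption):

```python
def effective_claim_limit(requested, claim_many_limit=10, hard_cap=50):
    """Clamp a worker's requested batch count to the configured
    claimManyLimit and the server-side hard cap."""
    return max(1, min(requested, claim_many_limit, hard_cap))

print(effective_claim_limit(25))                        # 10 (configured limit wins)
print(effective_claim_limit(100, claim_many_limit=60))  # 50 (hard cap wins)
```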

Retry behavior

When your worker calls /fail with retryable: true, the dispatch is retried with exponential backoff:

  • Base delay: retryBaseMs (default: 500ms)
  • Maximum delay: retryMaxMs (default: 30s)
  • Maximum attempts: maxAttempts (default: 8)

Your worker controls whether a failure is retryable via the retryable field in the /fail request body, and can override the next retry delay with retryDelayMs. If retryable is false, or all attempts are exhausted, the dispatch is marked as permanently failed and a system message, visible to the user, is posted to the chat.
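The resulting delay schedule can be computed as follows. This sketch assumes plain doubling with no jitter, which the settings above imply but do not spell out:

```python
def retry_delay_ms(attempt, base_ms=500, max_ms=30_000):
    """Exponential backoff delay for a given attempt (1-based),
    capped at max_ms. Jitter, if any, is not modeled here."""
    return min(base_ms * 2 ** (attempt - 1), max_ms)

# Delays for the default maxAttempts=8 schedule:
print([retry_delay_ms(n) for n in range(1, 9)])
# [500, 1000, 2000, 4000, 8000, 16000, 30000, 30000]
```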

Configuration reference

All dispatch settings live under server.dispatch in opengram.config.json. Every field is optional and falls back to its default.

  • mode (default: "batched_sequential") -- Dispatch mode: immediate, sequential, or batched_sequential.
  • batchDebounceMs (default: 600) -- Wait after the last message before sealing a batch.
  • typingGraceMs (default: 2000) -- Extra grace period while the user is typing.
  • maxBatchWaitMs (default: 30000) -- Maximum time before forcing a batch to dispatch.
  • schedulerTickMs (default: 500) -- Polling interval for the batch scheduler and lease sweeper.
  • leaseMs (default: 30000) -- Default lease duration for claimed batches.
  • heartbeatIntervalMs (default: 5000) -- Recommended heartbeat interval.
  • claimWaitMs (default: 10000) -- Long-poll timeout for claim requests.
  • retryBaseMs (default: 500) -- Exponential backoff base delay.
  • retryMaxMs (default: 30000) -- Maximum backoff delay.
  • maxAttempts (default: 8) -- Maximum retry attempts before permanent failure.
  • execution.autoscaleEnabled (default: true) -- Whether autoscaling is active.
  • execution.minConcurrency (default: 2) -- Minimum concurrent workers.
  • execution.maxConcurrency (default: 10) -- Maximum concurrent workers.
  • execution.scaleCooldownMs (default: 5000) -- Cooldown before scaling down.
  • claim.claimManyLimit (default: 10) -- Max batches per claim-many call (hard cap: 50).
opengram.config.json
{
  "server": {
    "dispatch": {
      "mode": "batched_sequential",
      "batchDebounceMs": 600,
      "typingGraceMs": 2000,
      "maxBatchWaitMs": 30000,
      "schedulerTickMs": 500,
      "leaseMs": 30000,
      "heartbeatIntervalMs": 5000,
      "claimWaitMs": 10000,
      "retryBaseMs": 500,
      "retryMaxMs": 30000,
      "maxAttempts": 8,
      "execution": {
        "autoscaleEnabled": true,
        "minConcurrency": 2,
        "maxConcurrency": 10,
        "scaleCooldownMs": 5000
      },
      "claim": {
        "claimManyLimit": 10
      }
    }
  }
}