Dispatch System
How Opengram batches, queues, and delivers messages to your agent workers.
The dispatch system is the bridge between the Opengram UI and your agent. When a user sends a message, the dispatch system collects it, optionally batches it with other pending messages, and makes it available for your worker to claim and process.
Dispatch modes
Configure the mode with the server.dispatch.mode field in opengram.config.json. Three modes are available:
immediate
Each message is dispatched the moment it arrives. No batching, no debouncing. Use this when latency matters more than efficiency, or when your agent processes each message independently.
sequential
Messages are dispatched one at a time in the order they were received. The next message is not dispatched until the current one is completed. This is useful when message order is critical and your agent cannot handle concurrent inputs.
batched_sequential (default)
Messages are collected into batches using a debounce window, then dispatched sequentially. This is the default mode and works well for most use cases -- it handles rapid-fire messages from users without overwhelming your agent.
Batching is controlled by three timing parameters:
| Parameter | Default | Description |
|---|---|---|
| batchDebounceMs | 600 | Wait this long after the last message before sealing the batch. |
| typingGraceMs | 2000 | Extra grace period if the user is still typing. |
| maxBatchWaitMs | 30000 | Maximum time to wait before forcing the batch to dispatch. |
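The interplay of the three timers can be sketched as a single sealing decision. This is an illustrative model, not Opengram's implementation; the constant names mirror the config fields.

```python
# Default values of the three batching parameters from the table above.
BATCH_DEBOUNCE_MS = 600
TYPING_GRACE_MS = 2000
MAX_BATCH_WAIT_MS = 30000

def should_seal(now_ms: int, batch_start_ms: int,
                last_message_ms: int, user_typing: bool) -> bool:
    """Return True when a pending batch should be sealed and dispatched."""
    # Hard ceiling: never hold a batch longer than maxBatchWaitMs.
    if now_ms - batch_start_ms >= MAX_BATCH_WAIT_MS:
        return True
    # While the user is typing, require a longer quiet window (typingGraceMs);
    # otherwise the normal debounce window applies.
    quiet_needed = TYPING_GRACE_MS if user_typing else BATCH_DEBOUNCE_MS
    return now_ms - last_message_ms >= quiet_needed
```

Note that maxBatchWaitMs wins over the typing grace period: even a continuously typing user cannot delay dispatch past the ceiling.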
```json
{
  "server": {
    "dispatch": {
      "mode": "batched_sequential",
      "batchDebounceMs": 600,
      "typingGraceMs": 2000,
      "maxBatchWaitMs": 30000
    }
  }
}
```

Input sources
A dispatch can be triggered by different input sources:
- user_message -- the user sent one or more messages in the chat.
- request_resolved -- the user resolved an interactive request (e.g. answered a question or submitted a form).
Your worker receives the input source in the dispatch payload so it can decide how to handle each case.
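A worker's handler might branch on the input source like this. The payload field name (inputSource) and shape are assumptions for the sketch, not the documented schema.

```python
def handle_dispatch(payload: dict) -> str:
    """Route a dispatch payload based on its input source (field name assumed)."""
    source = payload["inputSource"]
    if source == "user_message":
        # One or more chat messages arrived: feed the whole batch to the agent.
        return f"processing {len(payload['messages'])} message(s)"
    if source == "request_resolved":
        # The user answered an interactive request: resume with its result.
        return "resuming with resolved request"
    raise ValueError(f"unknown input source: {source}")
```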
Worker claiming flow
Your agent worker pulls dispatches from Opengram using a claim-based protocol:
- Claim -- call POST /api/v1/dispatch/claim to claim a single dispatch, or POST /api/v1/dispatch/claim-many to claim up to N dispatches at once. The response includes the batch payload with all messages, chat context, and an agentIdHint indicating which agent configuration applies.
- Heartbeat -- while processing, call POST /api/v1/dispatch/{id}/heartbeat periodically to extend the lease. If the lease expires without a heartbeat, the dispatch is returned to the queue automatically by a background lease sweeper.
- Complete or fail -- call POST /api/v1/dispatch/{id}/complete when done, or POST /api/v1/dispatch/{id}/fail if something went wrong.
During processing, your worker can send messages, stream tokens, attach files, and create interactive requests using the standard API endpoints.
A successful claim returns 200 with the batch payload. When no work is available, the endpoint long-polls for up to claimWaitMs (default: 10 seconds) before returning 204 No Content. Your worker should loop and call claim again.
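The control flow of that loop can be sketched with the HTTP layer abstracted away. Here claim_once stands in for a POST to the claim endpoint and returns a status code plus an optional payload; the loop is the point, not the transport.

```python
def run_claim_loop(claim_once, process, max_polls: int) -> list:
    """Poll the claim endpoint: process the batch on 200, loop again on 204."""
    handled = []
    for _ in range(max_polls):
        status, payload = claim_once()  # e.g. POST /api/v1/dispatch/claim
        if status == 204:
            continue  # no work arrived within claimWaitMs; poll again
        if status == 200:
            handled.append(process(payload))
    return handled
```

In a real worker the loop would run forever, heartbeat while processing, and call complete or fail when done; max_polls exists here only to keep the sketch bounded.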
For the full batch payload schema and a complete claim loop example, see Building an Agent — The batch payload.
Lease sweeper
A background process periodically checks for batches whose lease has expired (the worker stopped heartbeating or crashed). Expired leases are automatically returned to the queue so another worker can claim them. The sweeper runs every schedulerTickMs (default: 500ms).
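One sweeper pass amounts to scanning claimed batches for expired lease deadlines and requeuing them. The data layout below (a dict of dispatch id to lease-expiry timestamp) is an assumption made for illustration.

```python
def sweep_expired(claimed: dict, now_ms: int) -> list:
    """Remove expired leases from `claimed`; return the requeued dispatch ids."""
    expired = [d_id for d_id, lease_expires_ms in claimed.items()
               if lease_expires_ms <= now_ms]
    for d_id in expired:
        del claimed[d_id]  # in Opengram, the dispatch goes back on the queue here
    return expired
```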
Autoscaling
The dispatch system supports autoscaling the number of concurrent workers it expects:
| Parameter | Default | Description |
|---|---|---|
| execution.autoscaleEnabled | true | Whether autoscaling is active. |
| execution.minConcurrency | 2 | Minimum concurrent workers. |
| execution.maxConcurrency | 10 | Maximum concurrent workers. |
| execution.scaleCooldownMs | 5000 | Cooldown period (ms) before scaling down. |
These settings control how claim-many distributes work. When the queue is deep, Opengram signals workers to scale up. When idle, it scales back down after the cooldown period. The maximum number of batches returned by a single claim-many call is controlled by claim.claimManyLimit (default: 10, hard cap: 50).
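The effect of the concurrency bounds can be shown with a naive scaling target. Opengram's actual scaling formula is not documented here; this sketch only illustrates how minConcurrency and maxConcurrency clamp whatever signal is derived from queue depth.

```python
MIN_CONCURRENCY = 2
MAX_CONCURRENCY = 10

def target_workers(queue_depth: int) -> int:
    """Naive target: one worker per queued batch, clamped to the configured bounds."""
    return max(MIN_CONCURRENCY, min(queue_depth, MAX_CONCURRENCY))
```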
Retry behavior
When your worker calls /fail with retryable: true, the dispatch is retried with exponential backoff:
- Base delay: retryBaseMs (default: 500ms)
- Maximum delay: retryMaxMs (default: 30s)
- Maximum attempts: maxAttempts (default: 8)
Your worker controls whether a failure is retryable via the retryable field in the /fail request body. You can also override the next retry delay with retryDelayMs. If retryable is false, or all attempts are exhausted, the dispatch is marked as permanently failed and a user-visible system message is posted to the chat.
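Under those defaults, the backoff schedule looks like the following. Whether Opengram adds jitter is not stated in this doc, so treat this as the uncapped-then-capped shape of the schedule rather than the exact delays.

```python
RETRY_BASE_MS = 500
RETRY_MAX_MS = 30000

def retry_delay_ms(attempt: int) -> int:
    """Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped."""
    return min(RETRY_BASE_MS * 2 ** (attempt - 1), RETRY_MAX_MS)
```

With the defaults this yields 500ms, 1s, 2s, 4s, ... until the 30s cap, after which every remaining attempt waits the full retryMaxMs.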
Configuration reference
All dispatch settings live under server.dispatch in opengram.config.json. Every field is optional and falls back to its default.
| Parameter | Default | Description |
|---|---|---|
| mode | "batched_sequential" | Dispatch mode: immediate, sequential, or batched_sequential. |
| batchDebounceMs | 600 | Wait after last message before sealing a batch. |
| typingGraceMs | 2000 | Extra grace period while the user is typing. |
| maxBatchWaitMs | 30000 | Maximum time before forcing a batch to dispatch. |
| schedulerTickMs | 500 | Polling interval for the batch scheduler and lease sweeper. |
| leaseMs | 30000 | Default lease duration for claimed batches. |
| heartbeatIntervalMs | 5000 | Recommended heartbeat interval. |
| claimWaitMs | 10000 | Long-poll timeout for claim requests. |
| retryBaseMs | 500 | Exponential backoff base delay. |
| retryMaxMs | 30000 | Maximum backoff delay. |
| maxAttempts | 8 | Maximum retry attempts before permanent failure. |
| execution.autoscaleEnabled | true | Whether autoscaling is active. |
| execution.minConcurrency | 2 | Minimum concurrent workers. |
| execution.maxConcurrency | 10 | Maximum concurrent workers. |
| execution.scaleCooldownMs | 5000 | Cooldown before scaling down. |
| claim.claimManyLimit | 10 | Max batches per claim-many call (hard cap: 50). |
```json
{
  "server": {
    "dispatch": {
      "mode": "batched_sequential",
      "batchDebounceMs": 600,
      "typingGraceMs": 2000,
      "maxBatchWaitMs": 30000,
      "schedulerTickMs": 500,
      "leaseMs": 30000,
      "heartbeatIntervalMs": 5000,
      "claimWaitMs": 10000,
      "retryBaseMs": 500,
      "retryMaxMs": 30000,
      "maxAttempts": 8,
      "execution": {
        "autoscaleEnabled": true,
        "minConcurrency": 2,
        "maxConcurrency": 10,
        "scaleCooldownMs": 5000
      },
      "claim": {
        "claimManyLimit": 10
      }
    }
  }
}
```