Broker Election
The Problem
Only one broker should write to the queue state at a time. Multiple active writers would cause constant CAS conflicts and potential data corruption. osqueue solves this with CAS-based leader election.
How Election Works
The broker's address and heartbeat timestamp are stored in queue.json alongside job data:
{
"broker": "0.0.0.0:8080",
"brokerHeartbeat": 1706000000000,
"jobs": []
}
Election Flow
When a broker starts, it calls tryElect():
- Read
queue.jsonfrom storage - If no state exists, create it with this broker as leader (via
createIfNotExists) - If state exists and the registered broker is this broker with a fresh heartbeat →
"already_leader" - If another broker has a fresh heartbeat →
"other_leader"(back off) - If the registered broker's heartbeat is stale (older than
heartbeatTimeoutMs) → attempt takeover via CAS write - If the CAS write succeeds →
"elected" - If the CAS write fails (another broker won the race) →
"conflict"
type ElectionResult =
| { status: "elected" }
| { status: "already_leader" }
| { status: "other_leader"; leader: string }
| { status: "conflict" };
Heartbeat-Based Liveness
The active broker periodically writes a register_broker mutation to update its heartbeat timestamp. The default heartbeat interval is 3 seconds, and the timeout is 10 seconds.
If a broker crashes or becomes unreachable, its heartbeat goes stale. A standby broker detects this and takes over.
Failover Sequence
Time ──────────────────────────────────────────────▶
Broker A: [elected] ──heartbeat──heartbeat── ✖ crash
│
Broker B: [other_leader]─────retry(10s)─────────┤
│
heartbeat stale (>10s) │
▼
Broker B: ────────────────────────────── [elected]
In the default production entrypoint (entrypoint.sh), two brokers run simultaneously on ports 8080 and 8081. One becomes leader; the other retries every 10 seconds via a shell loop (the BrokerElection class itself is single-shot — the retry logic is in the entrypoint script). If the leader dies, the standby takes over within one retry cycle.
Note: if a broker's own heartbeat goes stale (e.g., due to a temporary network partition), it falls through to the takeover logic and re-registers itself rather than returning "already_leader".
Leadership Checks
The active broker also periodically reads state to verify it's still the leader:
const isLeader = await election.checkLeadership();
if (!isLeader) {
console.log("Lost leadership — shutting down");
await server.stop();
}
This prevents split-brain: if a network partition resolves and another broker took over, the old leader self-demotes.