Daily operations (runbook)

This page describes a practical, restart-safe workflow for running MesoLive automation day-to-day: startup, monitoring, reconnect recovery, and shutdown.

Persisted state (minimum viable)

To make your automation restart-safe, persist:

  • Event cursor: last processed EventSeqId (Event Hub)
  • Idempotency keys: any in-flight / recently submitted Control Hub intents (SendOrder, CancelOrder, ApplyPaperFills, etc.)
  • Async job ids: any in-flight JobId returned by prepare flows (entry/exit/adjustment)
  • Dedupe keys: processed EventId values (signals + updates) for your session window
  • Order/position mapping: your own mapping of (strategy/account/position) → live ids you care about

Startup checklist

Recommended connection order:

  1. Control Hub: connect, then verify agents/accounts/strategies.
  2. Data Hub: connect, then verify streaming data is flowing.
  3. Event Hub: connect, replay history from your persisted cursor, then subscribe for live callbacks.
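Assuming duck-typed hub clients (every method name below is a stand-in, not the real SDK surface), the startup order can be captured in one function so it is applied consistently on both cold start and reconnect:

```python
def startup(control_hub, data_hub, event_hub, cursor):
    """Connect in the recommended order: Control -> Data -> Event.

    Hub objects are illustrative duck-typed stand-ins: each needs
    connect() and verify(); event_hub also needs replay_from(cursor)
    and subscribe(). `cursor` is the persisted EventSeqId.
    """
    control_hub.connect()
    control_hub.verify()            # agents/accounts/strategies present?
    data_hub.connect()
    data_hub.verify()               # streaming data flowing?
    event_hub.connect()
    event_hub.replay_from(cursor)   # history first...
    event_hub.subscribe()           # ...then live callbacks
```

Replaying before subscribing keeps the gap between history and live events as small as possible; the dedupe step described below absorbs any overlap.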

CLI health checks (examples package)

# Control Hub: verify connected providers/agents
python -m mesolive_sdk_examples.agents_example list

# Control Hub: verify accounts and strategies
python -m mesolive_sdk_examples.accounts_example list --limit 50
python -m mesolive_sdk_examples.strategies_example list --limit 50

# Data Hub: verify streaming quotes work (pick a symbol you have access to)
python -m mesolive_sdk_examples.data_example underlying stream --symbol SPX --seconds 10

# Event Hub: verify callbacks are arriving
python -m mesolive_sdk_examples.events_example dump --seconds 10

Replay + subscribe (Event Hub)

On startup (and after reconnect), do both:

  • Replay from your persisted cursor using GetEventsSince
  • Subscribe to live callbacks (often with Strategies=None to subscribe to all strategies)

Your handlers should tolerate overlap between replay results and live callbacks by deduping on EventId.
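A minimal dedupe sketch with a bounded memory window (the class and its API are hypothetical helpers, not part of the SDK). Both the replay loop and the live callback route every event through the same instance:

```python
from collections import OrderedDict

class EventDeduper:
    """Drop events already seen by EventId, with bounded memory."""

    def __init__(self, max_size=100_000):
        self._seen = OrderedDict()   # insertion-ordered -> oldest evicted first
        self._max_size = max_size

    def is_new(self, event_id) -> bool:
        if event_id in self._seen:
            return False             # replayed and live copies overlap here
        self._seen[event_id] = None
        if len(self._seen) > self._max_size:
            self._seen.popitem(last=False)   # evict oldest
        return True
```

A handler then reads `if deduper.is_new(event_id): process(event)`. Size the window so it comfortably covers your replay overlap; events older than the window could in principle be processed twice, so downstream intents should still carry idempotency keys.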

Monitoring open positions with Data Hub

Streams are best-effort; build monitoring as a loop that can restart:

  • bootstrap open positions by listing strategies and then listing open positions (paged)
  • run one stream task per monitored position (or per account/strategy), with backoff on failures
  • periodically resync from snapshots (Get*Snapshot) to correct drift
  • start/stop monitoring based on OnPositionUpdate callbacks (Created/Closed)
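The per-position stream task above can be sketched as a loop with exponential backoff. `stream_factory` and `on_update` are illustrative stand-ins for the SDK stream call and your handler; the `sleep` parameter is injected only so the backoff is testable:

```python
import time

def monitor_position(stream_factory, on_update, *,
                     max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Run one best-effort stream, restarting on failure with backoff.

    stream_factory() returns an iterable of updates and may raise;
    both are illustrative stand-ins, not the real SDK surface.
    """
    retries = 0
    while retries <= max_retries:
        try:
            for update in stream_factory():
                on_update(update)
                retries = 0               # a healthy stream resets the backoff
            return                        # stream ended cleanly
        except Exception:
            sleep(base_delay * (2 ** retries))
            retries += 1
    raise RuntimeError("stream failed repeatedly; resync from snapshot")
```

When the loop gives up, fall back to the periodic snapshot resync rather than trusting the last streamed value.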

Reconnect recovery

After any disconnect/reconnect, assume you missed events:

  • resubscribe to strategies
  • replay history from the last persisted cursor
  • restart streams and resync with snapshots
  • reconcile in-flight jobs: for any pending JobId, call GetPreparePosition*Status(JobId) and complete the workflow based on the returned status/result

This job-status reconciliation avoids “stuck” workflows where you were waiting for a completion callback that never arrives.

Shutdown checklist

To stop cleanly:

  • stop accepting new signals/work
  • wait for in-flight workflows to complete (or mark them abandoned and rely on idempotency recovery on next start)
  • persist your latest EventSeqId cursor and any dedupe/idempotency state
  • disconnect from hubs (Event Hub → Data Hub → Control Hub)
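The checklist can be sketched as one sequence. The hub objects and the `drain`/`persist` callables are illustrative, not SDK APIs; `drain()` is assumed to return True once no workflows are in flight, and `persist()` writes the cursor and dedupe/idempotency state:

```python
import time

def shutdown(event_hub, data_hub, control_hub, drain, persist, *,
             timeout_s=30.0, poll_s=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Stop cleanly; hub objects and helper callables are illustrative."""
    deadline = clock() + timeout_s
    while not drain() and clock() < deadline:
        sleep(poll_s)       # past the deadline, work is abandoned and
                            # recovered via idempotency keys on next start
    persist()               # latest EventSeqId + dedupe/idempotency state
    event_hub.disconnect()  # Event Hub first...
    data_hub.disconnect()
    control_hub.disconnect()  # ...Control Hub last
```

Persisting before the first disconnect matters: once the Event Hub drops, the cursor on disk is the only record of where replay must resume.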