Skip to content

Workflows

Workflows are Flo’s durable orchestration layer. They compose Actions into multi-step business processes defined in YAML, with built-in resilience: retries, circuit breakers, health-weighted routing, signal handling, timeouts, polling, cron scheduling, stream triggers, and idempotency.

A workflow definition is a directed graph of steps. Each step either runs a target (an action, an inline plan, or a child workflow) or waits for an external signal. Every step declares transitions that map outcomes to the next step or to a terminal state. The engine walks the graph from the start step until it reaches a terminal.

Client ──start──▸ validate ──success──▸ charge ──success──▸ ship ──success──▸ flo.Completed
│ │ │
failure failure failure
▼ ▼ ▼
flo.Failed PaymentFailed flo.Failed
ConcernWithout Flo WorkflowsWith Flo Workflows
StateYour app manages a state machine in an external DB, guarding against crashesFlo persists every state transition to the UAL. Recovery replays the log.
RetriesHand-written retry loops with ad-hoc backoffDeclarative retry: block with exponential, linear, constant, or jittered backoff
FailoverIf the orchestrating process dies, work stallsAnother node picks up from the last committed state
RoutingManual circuit breaking, health scoring, fallback chainsInline plans with health-weighted selection, circuit breakers, and fallback values
Human-in-the-loopPolling, webhooks, custom queueswaitForSignal with typed signals, timeouts, and approval flows
SchedulingExternal cron daemonEmbedded schedule: block with 5-field cron or interval
Event-drivenWiring consumer groups to trigger logictrigger: block listens on a stream and starts runs per event

1. Define the workflow in a YAML file:

kind: Workflow
name: process-order
version: "1.0.0"
idempotency: required
start:
run: "@actions/validate-order"
transitions:
success: charge
failure: flo.Failed
steps:
charge:
run: "@actions/charge-payment"
retry:
max_attempts: 3
backoff: exponential
initial_delay_ms: 1000
max_delay_ms: 30000
transitions:
success: ship
failure: flo.Failed
ship:
run: "@actions/create-shipment"
transitions:
success: flo.Completed
failure: flo.Failed

2. Register the actions the workflow calls:

Terminal window
flo action register validate-order --wasm ./validate.wasm
flo action register charge-payment --wasm ./charge.wasm
flo action register create-shipment --wasm ./ship.wasm

3. Deploy the workflow:

Terminal window
flo workflow create -f process-order.yaml

4. Start a run:

Terminal window
flo workflow start process-order '{"order_id": "ORD-123", "amount": 99.99}'
# → wfrun-1

5. Check status:

Terminal window
flo workflow status wfrun-1
# → {"run_id":"wfrun-1","workflow":"process-order","version":"1.0.0","status":"completed",...}

Every workflow YAML file has this top-level structure:

FieldRequiredDescription
kindyesMust be Workflow
nameyesUnique workflow name
versionyesArbitrary version string (clients can request specific versions)
idempotencynonone (default), optional, or required
startyesThe entry-point step
stepsnoNamed steps (map of step name → step definition)
terminalsnoCustom terminal states (map of name → {status: ...})
plansnoInline plan definitions (map of plan name → plan config)
schedulenoCron or interval schedule for automatic runs
triggernoStream trigger — starts a run per event
searchAttributesnoCustom queryable fields extracted from input

A step is the smallest unit of work. There are two kinds:

Executes a target and transitions based on the outcome:

charge:
run: "@actions/charge-payment" # target
inputMapping: '{"amount": "$.input.amount"}' # optional JSONPath transform
retry: # optional retry policy
max_attempts: 3
backoff: exponential
poll: # optional pending-outcome polling
maxAttempts: 10
backoff: exponential
transitions:
success: ship
failure: flo.Failed
pending: check_status # only if poll is configured
target_not_found: flo.Failed # execution-level outcome

Targets use a prefix to indicate what to invoke:

PrefixInvokesExample
@actions/A registered action (WASM or user-hosted)@actions/charge-stripe
@plan/An inline plan defined in the same workflow@plan/payment
@workflow/A child workflow (starts a nested run)@workflow/fulfillment

Pauses the workflow until an external signal arrives or a timeout expires:

await_approval:
waitForSignal:
type: approval # signal type to match
timeoutMs: 3600000 # 1 hour timeout (optional)
onTimeout: flo.Failed # transition on timeout (optional)
transitions:
success: fulfill # after signal received

When a signal of the matching type is delivered, the engine follows the success transition. If timeoutMs is configured and the timeout expires before a signal arrives, the engine either follows onTimeout (if set) or transitions the run to timed_out.

Each step produces an outcome string that maps to a transition target.

Business outcomes (from your action’s return value):

OutcomeMeaning
successAction completed successfully
failureAction reported a failure
timeoutAction timed out
pendingAction is still running (async/polling)

Execution-level outcomes (from the runtime, before your logic runs):

OutcomeMeaning
target_not_foundThe action or plan doesn’t exist
target_disabledThe action is disabled
execution_failureInternal error during dispatch

Execution-level outcomes cascade through a fallback chain: the engine tries target_not_foundexecution_failurefailure in order, using the first transition that matches. Business outcomes match exactly — no fallback.

Transitions map outcomes to the next step or to a terminal:

transitions:
success: next_step # go to named step
failure: flo.Failed # go to built-in terminal
timeout: PaymentFailed # go to custom terminal

Terminals end the workflow run. Four are built-in:

TerminalStatusDescription
flo.CompletedcompletedWorkflow succeeded
flo.FailedfailedWorkflow failed
flo.CancelledcancelledCancelled by user
flo.TimedOuttimed_outSignal timeout

You can define custom terminals that map to a base status:

terminals:
PaymentFailed:
status: failed
FraudDetected:
status: failed
OrderCompleted:
status: completed

Custom terminals appear in history events, giving you finer-grained tracking than the four base statuses.

A workflow run passes through these states:

pending → running ⇄ waiting → completed
→ failed
→ cancelled
→ timed_out
StatusDescription
pendingCreated but not yet started
runningActively executing steps
waitingBlocked on a signal, async action, or poll timer
completedReached a terminal with completed status
failedReached a terminal with failed status
cancelledCancelled by a user via flo workflow cancel
timed_outSignal wait timed out with no timeout target

Any run step can have a retry: block:

start:
run: "@actions/flaky-service"
retry:
max_attempts: 5 # total attempts including first
backoff: exponential_jitter # constant | linear | exponential | exponential_jitter
initial_delay_ms: 500 # delay before first retry
max_delay_ms: 30000 # cap on backoff delay
within_ms: 120000 # total time budget (optional)
transitions:
success: flo.Completed
failure: flo.Failed

When the step outcome is failure or execution_failure and retries remain, the engine re-executes the same step immediately (the retry counter increments). Retries reset when the step transitions to a different step.

StrategyDelay formula
constantinitial_delay_ms always
linearinitial_delay_ms × (attempt + 1)
exponentialinitial_delay_ms × 2^attempt, capped at max_delay_ms
exponential_jitterExponential + up to 25% random jitter

Plans let you define multiple executors for the same logical step — a failover chain with smart routing. Plans are defined inline in the workflow YAML under the plans: key.

plans:
payment:
selection: health-weighted
executors:
- name: stripe-primary
action: "@actions/charge-stripe"
priority: 100
retry:
max_attempts: 3
backoff: exponential
initial_delay_ms: 1000
max_delay_ms: 30000
breaker:
failure_threshold: 5
cooldown_ms: 30000
half_open_max_calls: 3
rate_limit:
max_per_second: 100
max_per_minute: 5000
tracking:
mode: async
timeout_ms: 300000
- name: adyen-fallback
action: "@actions/charge-adyen"
priority: 50
retry:
max_attempts: 2
backoff: exponential_jitter
initial_delay_ms: 500
max_delay_ms: 10000
health:
window_ms: 300000
decay: 0.9
min_samples: 10
cache:
ttl_ms: 3600000
key: "charge:{input.customer_id}:{input.idempotency_key}"
invalidate_on: ["payment.refunded", "customer.deleted"]
fallback:
value: '{"status": "declined", "reason": "all_providers_unavailable"}'
condition: exhausted
errors:
retryable: ["timeout", "rate_limited", "temporary_failure"]
fatal: ["invalid_card", "fraud_detected", "insufficient_funds"]

Reference the plan from a step:

start:
run: "@plan/payment"
transitions:
success: notify
failure: flo.Failed
StrategyBehavior
static-orderAlways try executors in priority order (highest first)
round-robinRotate the starting executor across requests
randomRandom selection for even load distribution
health-weightedPrefer executors with higher health scores; requires health: config

Each executor can have an independent circuit breaker:

breaker:
failure_threshold: 5 # consecutive failures before opening
cooldown_ms: 30000 # how long to stay open before half-open
half_open_max_calls: 3 # probe requests allowed in half-open state

States:

  • Closed — Normal operation, requests flow through
  • Open — Executor is skipped (after failure_threshold consecutive failures)
  • Half-open — After cooldown_ms, allows half_open_max_calls probe requests. If probes succeed → closed. If probes fail → open again.
closed ──(N consecutive failures)──▸ open ──(cooldown expires)──▸ half_open
▲ │
└──────────(probe succeeds)───────────────────────────────────────┘
open ◂──────────(probe fails)─────────────────────────────────────┘

When a circuit breaker is open, the engine emits a plan_breaker_skip history event and moves to the next executor without attempting the action. This means a failing executor is taken out of rotation within seconds rather than consuming retries on every request.

rate_limit:
max_per_second: 100
max_per_minute: 5000
max_per_hour: 50000 # all three are optional

Some executors (e.g., user-hosted HTTP actions) report their outcome asynchronously via webhook:

tracking:
mode: async # sync (default) | async
timeout_ms: 300000 # timeout for async outcome

When mode: async, the workflow parks in waiting until the action result arrives or the timeout expires.

When selection: health-weighted, the engine tracks per-executor success rates and uses them to prefer healthier executors:

health:
window_ms: 300000 # rolling window for health calculation
decay: 0.9 # exponential decay factor (0–1]
min_samples: 10 # minimum samples before health-weighted kicks in

Until min_samples is reached, the engine falls back to priority-based ordering.

Health Score (0.0–1.0) is computed as:

score = (api_success_rate × 0.7) + (business_success_rate × 0.3)

With circuit breaker penalties:

  • Open breaker → score × 0.1
  • Half-open breaker → score × 0.5

At each plan invocation, executors are sorted by descending health score and tried in that order. A healthy executor with no failures scores 1.0; an executor with an open breaker scores ≤ 0.1.

Health state is in-memory only — it resets to defaults on server restart. This is intentional:

  • Fast convergence: With a failure_threshold of 5, the engine rediscovers a broken executor within a handful of requests.
  • No stale data: An executor that was down before restart may have recovered. Starting from a clean slate avoids unnecessary penalization.
  • Restart = recovery: Every executor gets a fresh chance. This matches how circuit breaker libraries like Hystrix, resilience4j, and Polly behave.

Health metrics are tracked per executor and updated on every action completion (synchronous or async). The engine emits history events for observability:

EventMeaning
plan_executor_startBeginning an attempt on this executor
plan_executor_successExecutor returned success
plan_executor_retryRetrying the same executor
plan_executor_exhaustedExecutor’s retries exhausted, moving to next
plan_breaker_skipSkipped executor because circuit breaker is open
cache:
ttl_ms: 3600000 # cache TTL
key: "charge:{input.customer_id}:{input.idempotency_key}"
invalidate_on:
- payment.refunded
- customer.deleted

When all executors fail, the plan can return a static fallback:

fallback:
value: '{"status": "declined", "reason": "all_providers_unavailable"}'
condition: exhausted # exhausted (all executors failed) | any_error

Classify error codes to control retry behavior:

errors:
retryable:
- timeout
- rate_limited
- temporary_failure
fatal:
- invalid_card
- fraud_detected

Retryable errors trigger the executor’s retry policy. Fatal errors skip retries and immediately try the next executor (or return failure).


Signals are typed external events delivered to a running workflow. They’re used for human-in-the-loop approvals, webhooks, callbacks, or any scenario where the workflow must wait for an asynchronous external event.

steps:
await_approval:
waitForSignal:
type: manager_approval # signal type to match
timeoutMs: 86400000 # 24 hour timeout
onTimeout: OrderRejected # step or terminal on timeout
transitions:
success: fulfill_order # after signal received
Terminal window
flo workflow signal <run-id> --type manager_approval '{"decision": "approved"}'

When the signal arrives:

  1. It’s stored in the run’s signal history
  2. If the run is waiting for this signal type, the engine resumes and follows the success transition
  3. If the run is not waiting, the signal is stored for future matching

If timeoutMs is set and the timeout expires:

  • If onTimeout is set → transition to that step or terminal
  • If onTimeout is not set → the run transitions to timed_out status

The engine periodically checks waiting runs for timeouts and resumes them automatically.


When an action returns a pending outcome (e.g., a payment that’s still processing), you can configure the step to poll with backoff until a terminal outcome arrives:

steps:
check_status:
run: "@actions/check-payment-status"
poll:
initialDelayMs: 2000 # wait before first poll
maxAttempts: 30 # max poll attempts
backoff: exponential # constant | linear | exponential | exponential_jitter
baseDelayMs: 2000 # base delay between polls
maxDelayMs: 30000 # cap on poll delay
transitions:
success: notify
failure: payment_failed
timeout: payment_timeout # max attempts exceeded

Each poll re-executes the action. If the outcome is still pending, the engine parks the run and schedules the next poll. If the outcome becomes success or failure, the engine follows the corresponding transition.


Steps can transform their input using JSONPath expressions. This lets you wire data from the workflow input or from previous step outputs into the current step’s input.

steps:
enrich:
run: "@actions/enrich-customer"
inputMapping: '{"customer_id": "$.input.customer_id", "domain": "$.input.company_domain"}'
transitions:
success: charge
failure: flo.Failed
charge:
run: "@actions/charge-payment"
inputMapping: '{"email": "$.steps.enrich.output.email", "amount": "$.input.amount"}'
transitions:
success: flo.Completed
failure: flo.Failed
Path PatternResolves To
$.inputThe workflow run’s input JSON
$.input.field.subfieldA nested field from the input
$.steps.{name}.outputThe full output of a previously completed step
$.steps.{name}.output.fieldA nested field from a step’s output
$.steps.{name}.outcomeThe outcome string of a step (success, failure, etc.)
$.flo.run_idThe current workflow run ID
$.flo.timestampCurrent epoch timestamp in milliseconds

String values in the input mapping that start with $. are treated as path references and resolved at runtime. Non-path values are passed through as-is.


Workflows support idempotency keys to prevent duplicate processing:

kind: Workflow
name: process-order
version: "1.0.0"
idempotency: required # none | optional | required
ModeBehavior
noneNo idempotency checking (default)
optionalIdempotency key accepted but not required
requiredEvery workflow start must include an idempotency key

When an idempotency key is provided and a run with the same key already exists for this workflow, the existing run ID is returned instead of creating a new run.

Terminal window
flo workflow start process-order '{"order_id":"ORD-123"}' --idempotency-key order-123
# → wfrun-1
flo workflow start process-order '{"order_id":"ORD-123"}' --idempotency-key order-123
# → wfrun-1 (same run, no duplicate)

Workflows can run on a recurring schedule using cron expressions or fixed intervals:

kind: Workflow
name: reconcile-accounts
version: "1.0.0"
schedule:
cron: "0 */6 * * *" # every 6 hours
max_concurrent: 1 # at most 1 run at a time
input: '{"mode": "full"}' # input for scheduled runs
start:
run: "@actions/reconcile"
transitions:
success: generate_report
failure: flo.Failed
steps:
generate_report:
run: "@actions/reconcile-report"
transitions:
success: flo.Completed
failure: flo.Failed
schedule:
interval: 30000 # every 30 seconds
max_concurrent: 1
input: '{"mode": "incremental"}'

Standard 5-field cron: minute hour day-of-month month day-of-week

FieldRangeSpecial Characters
Minute0–59* , - /
Hour0–23* , - /
Day of month1–31* , - /
Month1–12* , - /
Day of week0–7 (0 and 7 = Sunday)* , - /

Examples:

ExpressionMeaning
*/5 * * * *Every 5 minutes
0 */6 * * *Every 6 hours
0 9 * * 1-59 AM weekdays
0 0 1 * *Midnight on the 1st of each month
30 2 * * 02:30 AM every Sunday
FieldDefaultDescription
cronCron expression (mutually exclusive with interval)
intervalInterval in milliseconds (mutually exclusive with cron)
max_concurrent1Maximum concurrent runs from this schedule
input"{}"Input JSON override for scheduled runs
pausedfalseWhether the schedule starts paused

Disabling a workflow (flo workflow disable) pauses the schedule. Re-enabling resumes it.


Stream triggers start a workflow run for each event on a Flo stream. This turns a workflow into an event-driven processor.

kind: Workflow
name: order-processor
version: "1.0.0"
trigger:
stream: orders # source stream name
namespace: prod # source namespace (optional)
consumer_group: wf-orders # consumer group (optional, auto-generated)
mode: shared # shared | exclusive | key_shared
batch_size: 1 # events per run (1 = single, >1 = array)
start:
run: "@actions/process-order"
transitions:
success: flo.Completed
failure: flo.Failed

The event payload becomes $.input inside the workflow and is accessible via JSONPath.

FieldDefaultDescription
streamSource stream name (required)
namespaceworkflow’s namespaceSource stream namespace
consumer_groupwf-{workflow_name}Consumer group name
modesharedshared (competing consumers), exclusive (single consumer), key_shared (partition-key affinity)
batch_size1Events per workflow run

Search attributes let you extract queryable fields from workflow input for filtering and discovery:

searchAttributes:
- name: customer_id
type: string
from: input.customer_id
- name: order_amount
type: number
from: input.amount
- name: created_at
type: timestamp
from: input.timestamp
TypeDescription
stringText value
numberNumeric value
timestampEpoch timestamp

A step can start a child workflow using the @workflow/ prefix:

start:
run: "@workflow/payment-flow"
transitions:
success: flo.Completed
failure: flo.Failed

The parent workflow waits for the child to reach a terminal state, then follows the corresponding transition. The child’s completion maps to success; the child’s failure maps to failure.

You can also specify a version:

start:
run: "@workflow/payment-flow:2.0.0"
transitions:
success: flo.Completed
failure: flo.Failed

Workflows can be disabled at runtime to prevent new runs from starting:

Terminal window
flo workflow disable reconcile-accounts

When disabled:

  • The cron schedule is paused
  • Manual starts via flo workflow start are blocked
  • Existing running instances are not affected

To re-enable:

Terminal window
flo workflow enable reconcile-accounts

Workflow definitions are validated on create. The validator checks:

CategoryChecks
Structurekind, name, version, and start are present
ReferencesAll @actions/*, @plan/*, and @workflow/* targets are well-formed
TransitionsEvery transition target is a valid step name, custom terminal, or built-in terminal
ReachabilityAll steps are reachable from start (warnings for unreachable steps)
DuplicatesNo duplicate step names, terminal names, or executor names within a plan
PlansEach plan has at least one executor; executor configs are valid
HealthDecay in (0, 1], min_samples > 0, window_ms > 0
SignalswaitForSignal steps warn if no timeout is configured

Validation errors are reported with error codes (E1xx–E5xx) and human-readable messages.


  1. The engine starts at the start step
  2. For run steps:
    • If inputMapping is set, resolve JSONPath references against $.input and $.steps.*
    • Invoke the target (action, plan, or child workflow)
    • WASM actions complete synchronously; user-hosted actions may complete asynchronously
    • If the action returns pending and poll: is configured, schedule the next poll
    • On outcome, check retries (if failure and retries remain, re-execute)
    • Follow the matching transition to the next step or terminal
  3. For waitForSignal steps:
    • Check if a matching signal was already received before parking
    • If yes, follow the success transition immediately
    • If no, set status to waiting and record the timeout deadline
  4. Repeat until a terminal state is reached or MAX_ADVANCE_STEPS (256) is hit

When a user-hosted action doesn’t complete immediately, the workflow parks:

  1. Run status → waiting
  2. The action run ID and step name are stored
  3. The engine periodically checks (checkPendingActions) for completed action results
  4. When the action completes, the engine resumes the workflow from that step and continues advancing

Every significant state change is recorded as a history event:

Event TypeDetail
workflow_startedInput JSON
step_startedStep name
step_completedStep name
step_retryStep name
waiting_for_signalSignal type
signal_receivedSignal type
signal_matchedSignal type
signal_timeoutTimeout target
action_not_foundAction name
action_disabledAction name
awaiting_actionAction name
action_completedOutcome
workflow_completedTerminal name
workflow_failedTerminal name
workflow_cancelledReason
workflow_timed_outDetail

Workflow definitions and run state are persisted to the Unified Append Log (UAL). On node restart, the engine replays persisted entries to rebuild in-memory state:

  • workflow_create entries restore the definition registry
  • workflow_start entries restore the run registry

This ensures workflows survive node restarts without external databases.


This example demonstrates inline plans, health-weighted routing, circuit breakers, polling, and caching:

kind: Workflow
name: payment-processing
version: "1.0.0"
idempotency: required
plans:
charge-payment:
selection: health-weighted
executors:
- name: stripe-primary
action: "@actions/charge-stripe"
priority: 100
retry:
max_attempts: 3
backoff: exponential
initial_delay_ms: 1000
max_delay_ms: 30000
within_ms: 120000
breaker:
failure_threshold: 5
cooldown_ms: 30000
half_open_max_calls: 3
rate_limit:
max_per_second: 100
tracking:
mode: async
timeout_ms: 300000
- name: adyen-fallback
action: "@actions/charge-adyen"
priority: 50
retry:
max_attempts: 2
backoff: exponential_jitter
health:
window_ms: 300000
decay: 0.9
min_samples: 10
cache:
ttl_ms: 3600000
key: "charge:{input.customer_id}:{input.idempotency_key}"
invalidate_on: ["payment.refunded"]
fallback:
value: '{"status": "declined", "reason": "all_providers_unavailable"}'
condition: exhausted
errors:
retryable: ["timeout", "rate_limited"]
fatal: ["invalid_card", "fraud_detected"]
start:
run: "@plan/charge-payment"
inputMapping: '{"customer_id": "$.input.customer_id", "amount": "$.input.amount"}'
transitions:
success: notify_customer
pending: check_payment_status
failure: flo.Failed
steps:
check_payment_status:
run: "@actions/check-payment-status"
inputMapping: '{"payment_id": "$.steps._start.output.payment_id"}'
poll:
initialDelayMs: 2000
maxAttempts: 30
backoff: exponential
baseDelayMs: 2000
maxDelayMs: 30000
transitions:
success: notify_customer
failure: flo.Failed
notify_customer:
run: "@actions/send-notification"
inputMapping: '{"customer_id": "$.input.customer_id", "template": "payment_success"}'
transitions:
success: flo.Completed
failure: flo.Completed # notification failure doesn't fail the workflow

Complete Example: Order Processing with Signals

Section titled “Complete Example: Order Processing with Signals”
kind: Workflow
name: order-processing
version: "1.0.0"
idempotency: required
terminals:
OrderCompleted:
status: completed
OrderRejected:
status: failed
PaymentFailed:
status: failed
start:
run: "@actions/validate-order"
transitions:
success: process_payment
failure: flo.Failed
steps:
process_payment:
run: "@actions/charge-payment"
retry:
max_attempts: 3
backoff: exponential
initial_delay_ms: 1000
transitions:
success: check_approval
failure: PaymentFailed
check_approval:
run: "@actions/check-approval-needed"
transitions:
success: fulfill # no approval needed
failure: await_approval # high-value order, needs approval
await_approval:
waitForSignal:
type: approval
timeoutMs: 3600000 # 1 hour
onTimeout: OrderRejected
transitions:
success: fulfill
fulfill:
run: "@actions/ship-order"
transitions:
success: OrderCompleted
failure: flo.Failed

Trigger the approval:

Terminal window
flo workflow signal wfrun-42 --type approval '{"decision": "approved", "approver": "manager@co.com"}'

  • Actions — Actions are the building blocks that workflows compose
  • Stream Processing — For continuous, stateless data pipelines (filter/map/aggregate)
  • Streams — Source data for stream triggers
  • KV Store — Used by actions for state lookups