
Failures

FailureClass

Bases: StrEnum

Classification of a pipeline failure for deciding retry behavior.

  • RECOVERABLE: The failure is transient and the node should be retried.
  • TERMINAL: The failure is permanent and the pipeline should stop.
  • AMBIGUOUS: The failure may or may not be recoverable; limited retries are attempted.

FailurePolicy

Bases: BaseModel

Retry behavior for a given failure class.

Sets max_retries (the maximum number of retries) and backoff_seconds (the delay between retries, with jitter applied).

Example

# Up to 5 retries, with a 2-second (jittered) backoff
policy = FailurePolicy(max_retries=5, backoff_seconds=2.0)
# No retries
policy = FailurePolicy(max_retries=0, backoff_seconds=0.0)
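
The reference does not say how the jitter is computed. As a rough illustration only, the sketch below scales the base backoff by a random factor in [0.5, 1.5]. The names PolicySketch and jittered_delay, and the jitter formula itself, are assumptions made for this example and are not part of the documented API.

```python
import random
from dataclasses import dataclass


@dataclass
class PolicySketch:
    """Hypothetical stand-in for FailurePolicy with the two documented fields."""
    max_retries: int
    backoff_seconds: float


def jittered_delay(policy, rng=random):
    """Return the base backoff scaled by a random factor in [0.5, 1.5].

    Assumption: the real jitter formula is not given in the source.
    """
    return policy.backoff_seconds * rng.uniform(0.5, 1.5)


policy = PolicySketch(max_retries=5, backoff_seconds=2.0)
# A seeded generator keeps the example reproducible.
delay = jittered_delay(policy, random.Random(0))
```

With backoff_seconds=2.0 the delay falls between 1.0 and 3.0 seconds; with backoff_seconds=0.0 it is always 0.0.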

NodeContext dataclass

Context passed to failure classifiers for making classification decisions.

Carries the node_id that raised, the 1-based attempt number, and the run_id of the pipeline.
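
To make the three fields concrete, here is a minimal sketch of what such a context could look like. The class name NodeContextSketch, the field name attempt, and the field types are assumptions; the source only states that the context carries a node_id, a 1-based attempt number, and a run_id.

```python
from dataclasses import dataclass


@dataclass
class NodeContextSketch:
    """Hypothetical stand-in for NodeContext."""
    node_id: str  # node that raised the exception
    attempt: int  # 1-based: 1 is the first try, 2 is the first retry
    run_id: str   # pipeline run the node belongs to


ctx = NodeContextSketch(node_id="fetch", attempt=1, run_id="run-001")
# Because attempts are 1-based, the number of retries so far is attempt - 1.
retries_so_far = ctx.attempt - 1
```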

classify(exc, context, classifiers=None)

Determine the failure class of an exception.

Custom classifiers are checked first, in the order given. If none of them returns a result, the built-in rules apply:

  • ContractViolation → TERMINAL
  • BudgetExceeded → RECOVERABLE
  • TimeoutError → RECOVERABLE
  • ValueError → AMBIGUOUS
  • Everything else → AMBIGUOUS
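
The lookup order described above can be sketched as follows. This is not the library's implementation: the enum values, the two exception classes, and the convention that a custom classifier returns None when it has no opinion are all stand-ins assumed for the example.

```python
from enum import Enum


class FailureClass(str, Enum):
    """Stand-in enum; the documented class derives from StrEnum."""
    RECOVERABLE = "recoverable"  # member values are assumed
    TERMINAL = "terminal"
    AMBIGUOUS = "ambiguous"


class ContractViolation(Exception):
    """Hypothetical stand-in for the library's exception."""


class BudgetExceeded(Exception):
    """Hypothetical stand-in for the library's exception."""


def classify(exc, context, classifiers=None):
    # Custom classifiers run first, in order; the first non-None result wins.
    for classifier in classifiers or ():
        result = classifier(exc, context)
        if result is not None:
            return result
    # Built-in rules, as listed above.
    if isinstance(exc, ContractViolation):
        return FailureClass.TERMINAL
    if isinstance(exc, (BudgetExceeded, TimeoutError)):
        return FailureClass.RECOVERABLE
    # ValueError and everything else.
    return FailureClass.AMBIGUOUS


def key_error_is_terminal(exc, context):
    """Example custom classifier: treat KeyError as permanent."""
    return FailureClass.TERMINAL if isinstance(exc, KeyError) else None


builtin_result = classify(TimeoutError(), None)
custom_result = classify(KeyError("k"), None, [key_error_is_terminal])
```

Here builtin_result is RECOVERABLE because no custom classifier is supplied, while custom_result is TERMINAL because the custom classifier answers before the built-in rules are consulted.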

RetryBudget

Tracks retry counts per (run_id, node_id) pair and enforces limits.

increment(run_id, node_id)

Increment and return the retry count for the given run/node pair.

exhausted(run_id, node_id, policy)

Return True if the retry count has reached the policy's max_retries.
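
A minimal sketch of how such a budget might be kept, assuming an in-memory dictionary keyed by (run_id, node_id). RetryBudgetSketch is a hypothetical stand-in, and SimpleNamespace is used in place of FailurePolicy only to supply a max_retries attribute.

```python
from types import SimpleNamespace


class RetryBudgetSketch:
    """Hypothetical stand-in for RetryBudget."""

    def __init__(self):
        self._counts = {}  # maps (run_id, node_id) to the retry count

    def increment(self, run_id, node_id):
        """Increment and return the retry count for the run/node pair."""
        key = (run_id, node_id)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def exhausted(self, run_id, node_id, policy):
        """Return True once the count has reached policy.max_retries."""
        return self._counts.get((run_id, node_id), 0) >= policy.max_retries


budget = RetryBudgetSketch()
policy = SimpleNamespace(max_retries=2)  # stand-in for FailurePolicy

before = budget.exhausted("run-001", "fetch", policy)
first = budget.increment("run-001", "fetch")
second = budget.increment("run-001", "fetch")
after = budget.exhausted("run-001", "fetch", policy)
```

Under these assumptions before is False and after is True. A policy with max_retries=0 is exhausted from the start, which matches the "no retries" example given for FailurePolicy.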