Cost Tracking

KNOWN_MODELS module-attribute

KNOWN_MODELS = {
    'gpt-4o': (2.5, 10.0),
    'gpt-4o-mini': (0.15, 0.6),
    'gpt-4-turbo': (10.0, 30.0),
    'claude-opus-4-6': (15.0, 75.0),
    'claude-sonnet-4-6': (3.0, 15.0),
    'claude-haiku-4-5': (0.8, 4.0),
    'gemini-1.5-pro': (3.5, 10.5),
    'gemini-1.5-flash': (0.35, 1.05),
}

estimate_cost_usd(model, input_tokens, output_tokens=0)

Estimate the USD cost for a given model and token counts.

Looks up the model in KNOWN_MODELS and calculates cost based on per-million-token pricing. Returns 0.0 for unknown models.
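
The lookup described above can be sketched roughly as follows. This is an illustrative re-implementation, not the library's source. It assumes each KNOWN_MODELS tuple is (input price, output price) in USD per million tokens, which this page does not state explicitly, and it reproduces only a subset of the table.

```python
# Illustrative stand-in, not the library's implementation.
# Assumption: each tuple is (input_price, output_price) in USD per million tokens.
KNOWN_MODELS = {
    "gpt-4o": (2.5, 10.0),
    "gpt-4o-mini": (0.15, 0.6),
    "claude-sonnet-4-6": (3.0, 15.0),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate USD cost from per-million-token prices; 0.0 for unknown models."""
    prices = KNOWN_MODELS.get(model)
    if prices is None:
        # Unknown models are not an error; they simply cost 0.0.
        return 0.0
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


cost = estimate_cost_usd("gpt-4o", input_tokens=1_000_000, output_tokens=500_000)
unknown = estimate_cost_usd("not-a-model", input_tokens=1_000)
print(cost, unknown)
```

Because unknown models return 0.0 rather than raising, a zero estimate may mean "unpriced" rather than "free"; callers that care can check membership in KNOWN_MODELS first.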

ExecutionBudget

Bases: BaseModel

Optional budget constraints for a pipeline run.

All fields are optional; None means no limit on that dimension. Set max_tokens_total, max_cost_usd, and/or max_latency_ms to cap resource usage across all nodes.

Example

budget = ExecutionBudget.unlimited()
budget = ExecutionBudget(max_tokens_total=10_000, max_cost_usd=0.50)

unlimited() classmethod

Return a budget with no constraints on any dimension.

NodeUsage dataclass

Observed resource usage for a single node execution.

Records tokens_used, cost_usd, and latency_ms for the given node_id.

CostTracker

Accumulates per-node resource usage and checks it against budgets.

Usage is keyed by node_id and accumulated across multiple calls to record() for the same node.

Example

tracker = CostTracker()
tracker.record(NodeUsage(node_id="llm", tokens_used=500, cost_usd=0.01, latency_ms=200))
tracker.check_budget(ExecutionBudget(max_tokens_total=1000), "llm")
print(tracker.total_tokens)  # 500

total_tokens property

Total tokens consumed across all nodes.

total_cost_usd property

Total cost in USD across all nodes.

total_latency_ms property

Total latency in milliseconds across all nodes.

record(usage)

Record a node execution's resource usage, accumulating with any prior usage for that node.
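
To make the accumulation behaviour concrete, here is a minimal sketch using hypothetical stand-ins (SketchNodeUsage and SketchTracker are not library classes). It only assumes what this page states: usage is keyed by node_id, repeated record() calls for the same node add up, and summary() returns a copy of the per-node map.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SketchNodeUsage:
    # Hypothetical stand-in mirroring the NodeUsage fields named on this page.
    node_id: str
    tokens_used: int = 0
    cost_usd: float = 0.0
    latency_ms: float = 0.0


class SketchTracker:
    """Hypothetical stand-in for CostTracker showing accumulation by node_id."""

    def __init__(self) -> None:
        self._usage: Dict[str, SketchNodeUsage] = {}

    def record(self, usage: SketchNodeUsage) -> None:
        prior = self._usage.get(usage.node_id)
        if prior is None:
            # First record for this node: store a copy so the caller's object is not shared.
            self._usage[usage.node_id] = SketchNodeUsage(
                usage.node_id, usage.tokens_used, usage.cost_usd, usage.latency_ms
            )
        else:
            # Later records for the same node add to the running totals.
            prior.tokens_used += usage.tokens_used
            prior.cost_usd += usage.cost_usd
            prior.latency_ms += usage.latency_ms

    @property
    def total_tokens(self) -> int:
        return sum(u.tokens_used for u in self._usage.values())

    def summary(self) -> Dict[str, SketchNodeUsage]:
        # Shallow copy of the per-node map.
        return dict(self._usage)


tracker = SketchTracker()
tracker.record(SketchNodeUsage("llm", tokens_used=500, cost_usd=0.01, latency_ms=200))
tracker.record(SketchNodeUsage("llm", tokens_used=300, cost_usd=0.006, latency_ms=150))
tracker.record(SketchNodeUsage("retriever", tokens_used=50, latency_ms=40))
usage_by_node = tracker.summary()
print(tracker.total_tokens)  # 850
```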

check_budget(budget, node_id)

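Check aggregate usage across all nodes against the budget. The node_id argument identifies the node reported as triggering the violation if a limit is exceeded.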

Raises BudgetExceeded if any dimension is exceeded.

summary()

Return a copy of the per-node usage map.

BudgetExceeded

Bases: Exception

Raised when cumulative resource usage exceeds a budget limit.

Carries the dimension ("tokens", "cost", or "latency"), the limit that was exceeded, the actual cumulative value, and the node_id that triggered the violation.
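
The sketch below shows one way the check and the exception payload could fit together. All names here are hypothetical stand-ins: SketchBudget replaces ExecutionBudget (a pydantic model in the library), and the attribute names on SketchBudgetExceeded are assumed from the description above. Treating "exceeded" as strictly greater than the limit is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBudget:
    # Hypothetical stand-in for ExecutionBudget; None means no limit.
    max_tokens_total: Optional[int] = None
    max_cost_usd: Optional[float] = None
    max_latency_ms: Optional[float] = None


class SketchBudgetExceeded(Exception):
    # Hypothetical stand-in; attribute names are assumed, not confirmed by the source.
    def __init__(self, dimension: str, limit: float, actual: float, node_id: str) -> None:
        super().__init__(
            f"{dimension} budget exceeded at node {node_id!r}: {actual} > {limit}"
        )
        self.dimension = dimension
        self.limit = limit
        self.actual = actual
        self.node_id = node_id


def check_totals(tokens: int, cost_usd: float, latency_ms: float,
                 budget: SketchBudget, node_id: str) -> None:
    """Compare cumulative totals against each limit that is set."""
    checks = [
        ("tokens", budget.max_tokens_total, tokens),
        ("cost", budget.max_cost_usd, cost_usd),
        ("latency", budget.max_latency_ms, latency_ms),
    ]
    for dimension, limit, actual in checks:
        # Assumption: a limit is violated only when usage is strictly above it.
        if limit is not None and actual > limit:
            raise SketchBudgetExceeded(dimension, limit, actual, node_id)


caught = None
try:
    check_totals(1200, 0.02, 350.0, SketchBudget(max_tokens_total=1000), "llm")
except SketchBudgetExceeded as exc:
    caught = exc

# An unconstrained budget never raises, whatever the totals are.
check_totals(10**9, 1e6, 1e9, SketchBudget(), "llm")
```

A caller can branch on the dimension to decide whether to stop the run, switch models, or simply log the overrun.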

ModelHint dataclass

Hint for model selection on a per-node basis.

Binds a node_id to a preferred_model with an optional fallback_model and per-node max_tokens limit.

FallbackPolicy dataclass

Policy that downgrades a model when cumulative spend crosses a budget threshold.

When resolve_model finds that the spend ratio tracker.total_cost_usd / budget.max_cost_usd has reached at_budget_pct, it returns fallback_model in place of preferred_model.

resolve_model(declared_model, tracker, budget, fallbacks)

Pick the effective model by checking fallback policies against current spend.

Iterates fallbacks in order. For each policy whose preferred_model matches declared_model and whose spend threshold has been reached, returns fallback_model. If nothing triggers, returns declared_model unchanged. A declared_model of None always passes through as None.
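
The resolution order described above might look like the following sketch. It is not the library's code: resolve_model_sketch takes the spend and the cost cap as plain numbers instead of a tracker and budget so that it stays self-contained, SketchFallbackPolicy is a hypothetical stand-in, at_budget_pct is assumed to be a fraction (0.8 meaning 80%), and a missing cost cap is assumed to mean no policy can trigger.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SketchFallbackPolicy:
    # Hypothetical stand-in for FallbackPolicy.
    preferred_model: str
    fallback_model: str
    at_budget_pct: float  # Assumption: a fraction of max_cost_usd, e.g. 0.8 for 80%.


def resolve_model_sketch(
    declared_model: Optional[str],
    total_cost_usd: float,
    max_cost_usd: Optional[float],
    fallbacks: Sequence[SketchFallbackPolicy],
) -> Optional[str]:
    if declared_model is None:
        # None passes through untouched.
        return None
    if not max_cost_usd:
        # Assumption: without a positive cost cap there is no ratio to compare.
        return declared_model
    spent_ratio = total_cost_usd / max_cost_usd
    for policy in fallbacks:
        # The first matching policy whose threshold has been reached wins.
        if policy.preferred_model == declared_model and spent_ratio >= policy.at_budget_pct:
            return policy.fallback_model
    return declared_model


policies = [SketchFallbackPolicy("gpt-4o", "gpt-4o-mini", at_budget_pct=0.8)]
early = resolve_model_sketch("gpt-4o", 0.10, 0.50, policies)   # 20% of budget spent
late = resolve_model_sketch("gpt-4o", 0.45, 0.50, policies)    # 90% of budget spent
other = resolve_model_sketch("claude-sonnet-4-6", 0.45, 0.50, policies)
missing = resolve_model_sketch(None, 0.45, 0.50, policies)
```

Because policies are checked in order, placing the most specific or most aggressive downgrade first determines which fallback is chosen when several match.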