Cost Tracking
KNOWN_MODELS = {'gpt-4o': (2.5, 10.0), 'gpt-4o-mini': (0.15, 0.6), 'gpt-4-turbo': (10.0, 30.0), 'claude-opus-4-6': (15.0, 75.0), 'claude-sonnet-4-6': (3.0, 15.0), 'claude-haiku-4-5': (0.8, 4.0), 'gemini-1.5-pro': (3.5, 10.5), 'gemini-1.5-flash': (0.35, 1.05)}
module-attribute
estimate_cost_usd(model, input_tokens, output_tokens=0)
Estimate the USD cost for a given model and token counts.
Looks up the model in KNOWN_MODELS and calculates cost based on
per-million-token pricing. Returns 0.0 for unknown models.
ExecutionBudget
Bases: BaseModel
Optional budget constraints for a pipeline run.
All fields are optional — None means no limit on that dimension.
Set max_tokens_total, max_cost_usd, and/or max_latency_ms to
cap resource usage across all nodes.
Example
budget = ExecutionBudget.unlimited()
budget = ExecutionBudget(max_tokens_total=10_000, max_cost_usd=0.50)
unlimited()
classmethod
Return a budget with no constraints on any dimension.
NodeUsage
dataclass
Observed resource usage for a single node execution.
Records tokens_used, cost_usd, and latency_ms for the given
node_id.
CostTracker
Accumulates per-node resource usage and checks it against budgets.
Usage is keyed by node_id and accumulated across multiple calls to
record() for the same node.
Example
tracker = CostTracker()
tracker.record(NodeUsage(node_id="llm", tokens_used=500, cost_usd=0.01, latency_ms=200))
tracker.check_budget(ExecutionBudget(max_tokens_total=1000), "llm")
print(tracker.total_tokens) # 500
total_tokens
property
Total tokens consumed across all nodes.
total_cost_usd
property
Total cost in USD across all nodes.
total_latency_ms
property
Total latency in milliseconds across all nodes.
record(usage)
Record a node execution's resource usage, accumulating with any prior usage for that node.
check_budget(budget, node_id)
Check aggregate usage across all nodes against the budget.
Raises BudgetExceeded if any dimension is exceeded.
summary()
Return a copy of the per-node usage map.
BudgetExceeded
Bases: Exception
Raised when cumulative resource usage exceeds a budget limit.
Carries the dimension ("tokens", "cost", or "latency"), the
limit that was exceeded, the actual cumulative value, and the
node_id that triggered the violation.
ModelHint
dataclass
Hint for model selection on a per-node basis.
Binds a node_id to a preferred_model with an optional
fallback_model and per-node max_tokens limit.
FallbackPolicy
dataclass
Policy that downgrades a model when cumulative spend crosses a budget threshold.
When resolve_model sees that tracker.total_cost_usd / budget.max_cost_usd
has reached at_budget_pct, it swaps preferred_model for fallback_model.
resolve_model(declared_model, tracker, budget, fallbacks)
Pick the effective model by checking fallback policies against current spend.
Iterates fallbacks in order. For each policy whose preferred_model
matches declared_model and whose spend threshold has been reached,
returns fallback_model. If nothing triggers, returns declared_model
unchanged. None always passes through as None.