Benchmark proof

Paid API Benchmark Scoring

The Paid API Benchmark Lab is designed for agents and crawlers that need comparable evidence about paid endpoints before spending on them. Scores are built from fixtures, unpaid x402 challenge metadata, schema evidence, and, where Ontario has them, saved readiness reports.

Scoring Weights

Category                    Points
Uptime                      20
X402 Payment Correctness    25
Schema Quality              15
Price Clarity               15
Network Asset Clarity       10
Report History              15
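
As a rough illustration of how the weights above could combine into a total, the sketch below assumes the total is a plain sum of per-category points, each capped at its maximum. The dictionary keys and the total_score helper are hypothetical names chosen for this example, not identifiers from the benchmark payload.

```python
# Maximum points per category, copied from the weights table.
# The snake_case keys are illustrative, not confirmed payload field names.
WEIGHTS = {
    "uptime": 20,
    "x402_payment_correctness": 25,
    "schema_quality": 15,
    "price_clarity": 15,
    "network_asset_clarity": 10,
    "report_history": 15,
}


def total_score(category_scores: dict) -> float:
    """Sum per-category points, clamping each to the range [0, maximum].

    Assumes an additive model; categories missing from the input earn 0.
    """
    total = 0.0
    for category, max_points in WEIGHTS.items():
        earned = category_scores.get(category, 0.0)
        total += min(max(earned, 0.0), max_points)
    return total


# The six category maximums add up to a 100-point scale.
assert sum(WEIGHTS.values()) == 100
```

Under this assumption, an endpoint that earns full marks everywhere except a price clarity score of 5 would total 90.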

Safety policy: no signed payment headers, no paid x402 settlement calls, and no facilitator settle call.

No paid calls are made

Benchmark scoring is a pre-payment evaluation. It does not sign a payment header, submit a payment payload, or call a facilitator settle route.

Evidence: The benchmark payload exposes safety.paid_settlement_calls_made=false and safety.facilitator_settle_called=false.

Fix: Use the benchmark to shortlist endpoints, then run a separate controlled settlement test only when your own policy allows spending.
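
A consumer that wants to confirm the pre-payment guarantee before trusting a result could check the two safety flags named in the evidence. This is a minimal sketch: it assumes the payload has already been parsed into a dictionary with a top-level safety object, and is_prepayment_only is a hypothetical helper name.

```python
def is_prepayment_only(payload: dict) -> bool:
    """Return True only when both safety flags are present and explicitly False.

    Missing flags are treated as a failure, so an incomplete payload is
    never mistaken for a safe one.
    """
    safety = payload.get("safety", {})
    return (
        safety.get("paid_settlement_calls_made") is False
        and safety.get("facilitator_settle_called") is False
    )


# Example payload fragment using the field names from the evidence line.
example_payload = {
    "safety": {
        "paid_settlement_calls_made": False,
        "facilitator_settle_called": False,
    }
}
assert is_prepayment_only(example_payload)
```

Comparing with `is False` rather than relying on truthiness means a null or absent flag fails the check instead of passing silently.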

Readiness reports are reused first

If Ontario already has saved x402 readiness reports for an endpoint or origin, the benchmark uses that history for report-derived signals. Fixture reports are used when no saved history exists.

Evidence: Each benchmark row includes report_history.source, actual_report_count, fixture_report_count, and matching_strategy.

Fix: Generate a fresh readiness report with /api/verify/x402-readiness to replace fixture-only evidence for a real endpoint.
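
To find rows that still rely on fixture evidence, a client could filter on the report counts. The sketch below assumes actual_report_count sits inside the report_history object of each row and that rows arrive as a list of dictionaries. The endpoint key and the fixture_only_rows helper are hypothetical and used only for illustration.

```python
def fixture_only_rows(rows: list) -> list:
    """Return the rows whose report history contains no saved (actual) reports.

    A missing report_history or count is treated as zero saved reports.
    """
    return [
        row
        for row in rows
        if row.get("report_history", {}).get("actual_report_count", 0) == 0
    ]


# Illustrative rows; the "endpoint" key and URLs are made up for this example.
sample_rows = [
    {"endpoint": "https://api.example.com/a",
     "report_history": {"actual_report_count": 3, "fixture_report_count": 0}},
    {"endpoint": "https://api.example.com/b",
     "report_history": {"actual_report_count": 0, "fixture_report_count": 2}},
]
needs_fresh_report = [row["endpoint"] for row in fixture_only_rows(sample_rows)]
assert needs_fresh_report == ["https://api.example.com/b"]
```

The endpoints returned here are the candidates for a fresh readiness report.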

Scores are decomposed

Every total score includes a category-level breakdown, so agents can distinguish a cheap endpoint with weak schema from an expensive endpoint with strong report history.

Evidence: The score_breakdown object lists category scores, maximum points, explanations, and the signals behind each category.

Fix: Improve the weakest category first: publish clearer price fields, fix network or asset ambiguity, add OpenAPI, or keep report history fresh.
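
Picking the weakest category can be sketched as choosing the lowest ratio of earned points to maximum points. The shape below is an assumption: score_breakdown is treated as a mapping from category name to an entry with score and max_points keys, which are hypothetical names for the category scores and maximum points the payload is said to list.

```python
def weakest_category(score_breakdown: dict) -> str:
    """Return the category with the lowest earned-to-maximum ratio.

    Entries with a non-positive maximum are skipped to avoid dividing by zero.
    """
    ratios = {
        name: entry["score"] / entry["max_points"]
        for name, entry in score_breakdown.items()
        if entry["max_points"] > 0
    }
    if not ratios:
        raise ValueError("score_breakdown has no scorable categories")
    return min(ratios, key=ratios.get)


# Illustrative breakdown: price clarity earns 5 of 15, the lowest share.
example_breakdown = {
    "uptime": {"score": 18, "max_points": 20},
    "price_clarity": {"score": 5, "max_points": 15},
    "network_asset_clarity": {"score": 6, "max_points": 10},
}
assert weakest_category(example_breakdown) == "price_clarity"
```

Using a ratio rather than the raw score keeps a 10-point category from always looking weaker than a 25-point one.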
