from runloop_api_client import Runloop

# Authenticate with your Runloop API key
rl = Runloop(bearer_token="YOUR_API_KEY")

# Start the benchmark run
benchmark_run_view = rl.benchmarks.start_run(
    benchmark_id="bmd_2zmp3Mu3LhWu7yDVIfq3m",
)

# Complete the run, passing the run ID returned by start_run
# (not the benchmark ID, which is a different resource)
benchmark_run_view = rl.benchmarks.runs.complete(
    benchmark_run_view.id,
)
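Note that complete takes the ID of the run created by start_run, not the benchmark ID. If you want to check on a run before completing it, here is a minimal sketch, assuming the SDK follows its usual resource conventions with a runs.retrieve method and a state field on the returned view:

# Fetch the current view of an in-flight run
# (runs.retrieve and the state field are assumptions, not confirmed here)
run = rl.benchmarks.runs.retrieve(benchmark_run_view.id)
print(f"run {run.id} is {run.state}")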
Benchmark Your AI Coding Agents

Public Benchmarks

Validate your AI coding agents against industry-standard performance tests with on-demand access to comprehensive benchmark suites. Get standardized metrics, performance tracking, and transparent scoring across all benchmark types.
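For example, you can discover the available suites programmatically before starting a run. A minimal sketch, assuming the SDK exposes Runloop's public-benchmarks listing as list_public and that each entry carries id and name fields:

from runloop_api_client import Runloop

rl = Runloop(bearer_token="YOUR_API_KEY")

# Enumerate the public benchmark suites and pick one to run against
for benchmark in rl.benchmarks.list_public():
    print(benchmark.id, benchmark.name)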

Streamlined AI Evaluation

Everything You Need to Know

Runloop turns complex, resource-intensive AI agent evaluation into an accessible, standardized process with transparent scoring. Our platform integrates seamlessly with existing infrastructure, automatically allocating compute resources and test environments within secure, isolated containers. This reduces both time and cost while supporting iterative improvement cycles for development teams of all sizes.
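Those isolated test environments correspond to Runloop devboxes, which the same SDK exposes directly. A minimal sketch of that isolation model, assuming await_running is the SDK's readiness-polling helper and that execute_sync returns a view with a stdout field:

# Create an isolated devbox, run a command inside it, then shut it down
devbox = rl.devboxes.create()
rl.devboxes.await_running(devbox.id)  # assumed polling helper; waits until ready
result = rl.devboxes.execute_sync(devbox.id, command="echo hello from an isolated container")
print(result.stdout)
rl.devboxes.shutdown(devbox.id)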

Get in Touch

Get Started with Public Benchmarks

Ready to validate your AI coding agents? Contact our team to learn more about Runloop's Public Benchmarks platform and get started with industry-standard testing.
