Feat/cross rate early exit #605

Open
ushaket wants to merge 7 commits into vllm-project:main from ushaket:feat/cross-rate-early-exit

@ushaket commented Feb 23, 2026

Summary

This PR adds cross-rate early-exit behavior for multi-rate benchmark profiles: once a terminal failure condition is hit at a lower rate or stream count, the benchmark stops escalating to higher ones. It also makes rate/stream ordering deterministic (ascending) and updates user-facing docs and help text to reflect multi-rate semantics and skip behavior.

Details

  • Added shared failure-check logic in profile flow to detect terminal scheduler constraints (request_processing=stop_all) from the previous benchmark state.
  • Updated AsyncProfile.next_strategy() to stop scheduling higher rates after a terminal failure; continue normally on stop_local.
  • Updated ConcurrentProfile.next_strategy() with the same early-exit behavior for stream escalation.
  • Updated SweepProfile.next_strategy() so synchronous and throughput always run, with early-exit applied only during async-rate continuation.
  • Sorted multi-value rates/streams ascending in argument resolution for deterministic progression; added warning logs when input order is changed.
  • Updated CLI help and README to document per-profile --rate semantics, multi-value behavior, and failure-triggered skipping.
  • Added unit tests covering:
    • sorting behavior
    • continuation on normal completion (stop_local)
    • early exit on terminal failures (stop_all)
    • sweep-specific phase behavior and edge cases
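
A minimal, standalone sketch of the sort-then-early-exit flow described in the list above. It does not import guidellm; `PrevState`, `resolve_rates`, `should_stop_escalating`, and `run_rates` are hypothetical names chosen for illustration, and only the `stop_all` / `stop_local` values come from this PR description.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class PrevState:
    """Hypothetical stand-in for the previous benchmark's scheduler state."""

    # "stop_local" = normal completion, "stop_all" = terminal failure
    request_processing: str


def resolve_rates(rates: List[float]) -> List[float]:
    """Sort rates ascending and warn if the input order was changed."""
    ordered = sorted(rates)
    if ordered != list(rates):
        logger.warning("Rates reordered ascending: %s -> %s", rates, ordered)
    return ordered


def should_stop_escalating(prev: Optional[PrevState]) -> bool:
    """True only when the previous benchmark ended with a terminal failure."""
    return prev is not None and prev.request_processing == "stop_all"


def run_rates(rates: List[float], outcomes: Dict[float, str]) -> List[float]:
    """Simulate rate escalation; `outcomes` maps each rate to how it ended."""
    executed: List[float] = []
    prev: Optional[PrevState] = None
    for rate in resolve_rates(rates):
        if should_stop_escalating(prev):
            break  # skip every remaining (higher) rate
        executed.append(rate)
        prev = PrevState(request_processing=outcomes[rate])
    return executed
```

In this sketch, `run_rates([10, 1, 5], {1: "stop_local", 5: "stop_all", 10: "stop_local"})` runs rates 1 and 5 and skips 10, while an all-`stop_local` run executes every rate.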

Test Plan

  • Run unit tests for benchmark profiles:
    • pytest tests/unit/benchmark/test_profiles.py
  • Sanity-check existing benchmark profile tests still pass:
    • pytest tests/unit/benchmark -k profile
  • Run a multi-rate benchmark and verify higher rates are skipped after a terminal failure:
    • guidellm benchmark ... --profile constant --rate 1 --rate 5 --rate 10

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@ushaket force-pushed the feat/cross-rate-early-exit branch from 0d2a434 to 0698383 (Compare), February 23, 2026 14:27
@sjmonson self-requested a review February 23, 2026 18:37
@ushaket force-pushed the feat/cross-rate-early-exit branch from 6d7b4d0 to 2719655 (Compare), February 23, 2026 20:30
When running multiple rates (constant, poisson, concurrent profiles) or
sweeping, stop escalating to higher rates if a failure constraint
(over-saturation, max errors, error rate) triggers at a lower rate.

- Sort rates/streams ascending in AsyncProfile and ConcurrentProfile
- Add _should_stop_escalating() on base Profile class using stop_all
  as the failure signal (vs stop_local for normal completions)
- Skip failure check after throughput phase in SweepProfile since
  over-saturation is expected at maximum load
- Log warning when rate order is changed by sorting
- Update CLI help and README with multi-rate documentation
- Add comprehensive unit tests for all profile types
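
The sweep-specific rule above (skip the failure check after the throughput phase) can be sketched as follows. This is an illustrative model only, not the `SweepProfile` implementation: `sweep_plan` and the `"async@<rate>"` labels are hypothetical, and the async rates are passed in explicitly as an assumption rather than derived from measured results.

```python
from typing import Dict, List, Optional


def sweep_plan(async_rates: List[float], outcomes: Dict[str, str]) -> List[str]:
    """Return the phases a sweep would run under the early-exit rule.

    `outcomes` maps a phase label to "stop_local" (normal completion) or
    "stop_all" (terminal failure); missing labels count as "stop_local".
    """
    # Synchronous and throughput always run.
    executed = ["synchronous", "throughput"]

    # The throughput outcome is deliberately ignored: over-saturation is
    # expected at maximum load, so it must not cancel the async phases.
    prev_outcome: Optional[str] = None

    for rate in sorted(async_rates):
        if prev_outcome == "stop_all":
            break  # a lower async rate failed terminally; skip the rest
        label = f"async@{rate}"
        executed.append(label)
        prev_outcome = outcomes.get(label, "stop_local")
    return executed
```

Under these assumptions, a `stop_all` recorded for `"throughput"` does not prevent the first async rate from running, whereas a `stop_all` at `"async@4.0"` causes `"async@6.0"` to be skipped.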

Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
@ushaket force-pushed the feat/cross-rate-early-exit branch from 2719655 to 4ef5ffb (Compare), February 24, 2026 13:36
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
This reverts commit 2538f49.

Signed-off-by: Uri Shaket <ushaket@redhat.com>
@ushaket force-pushed the feat/cross-rate-early-exit branch from ed54a85 to 6ffc31d (Compare), February 24, 2026 15:07

Development

Successfully merging this pull request may close these issues.

Stop multi-rate benchmarks after first failure threshold
