Skip to content

Fix worker status for unusable terminal backend responses#615

Open
ushaket wants to merge 6 commits intovllm-project:mainfrom
ushaket:fix/worker-unusable-terminal-response
Open

Fix worker status for unusable terminal backend responses#615
ushaket wants to merge 6 commits intovllm-project:mainfrom
ushaket:fix/worker-unusable-terminal-response

Conversation

@ushaket
Copy link
Contributor

@ushaket ushaket commented Feb 26, 2026

Summary

This PR fixes a scheduler correctness bug where requests could be marked as completed even when the backend resolved without a usable terminal response. It adds explicit terminal-response validation in the worker so malformed/empty terminal results are surfaced as errored with a clear diagnostic instead of being counted as successful requests.

Details

  • Added terminal response validation in WorkerProcess to guard final status transitions.
  • Updated request finalization logic so unusable terminal responses set:
    • status: errored
    • error message: [UNUSABLE_BACKEND_RESPONSE] backend resolved without a usable terminal response payload
  • Implemented GenerationResponse-aware usability criteria:
    • usable if non-empty text or output_metrics.total_tokens > 0
    • None terminal response is always unusable
    • non-GenerationResponse fallback remains bool(response) for generic/test compatibility
  • Added regression tests in tests/unit/scheduler/test_worker.py for:
    • no terminal response object -> errored
    • empty GenerationResponse -> errored
    • token-bearing GenerationResponse with empty text -> completed
  • Preserved existing cancellation/error flow behavior outside terminal-response validation.

Test Plan

  • Run targeted worker regression tests:
    • uv run pytest -q tests/unit/scheduler/test_worker.py -k "terminal_response or empty_generation_response or generation_response_with_tokens or invalid_initialization"
  • Verify expected outcomes:
    • missing terminal response is not marked completed
    • empty GenerationResponse is not marked completed
    • non-empty signal (output_tokens > 0) is accepted as completed
  • Confirm no lint issues on touched files.

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
@ushaket ushaket marked this pull request as draft February 26, 2026 09:38
@ushaket
Copy link
Contributor Author

ushaket commented Feb 26, 2026

This introduce worker dependency on GenerationResponse, not sure if that's the right way to go,
The reason I added it is that the we need to somehow know what we're looking for, in the future we might have ResponseT that doesn't return text and total_tokens, so if we want to keep this check in the worker, we'll need to be aware of the different ResponseT possible

@sjmonson
Copy link
Collaborator

sjmonson commented Mar 2, 2026

Hmm, yeah I am not a fan of depending on GenerationResponse in the worker. Can you maybe just check inside the OpenAIRequestHandler.compile_streaming and OpenAIRequestHandler.compile_non_streaming methods? It does not generalize as nicely but what qualifies as an unusable response may differ between backends/endpoints. If you throw an exception in those methods it should propagate up to the worker.

ushaket added 3 commits March 2, 2026 23:55
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
@ushaket ushaket marked this pull request as ready for review March 2, 2026 21:59
@ushaket
Copy link
Contributor Author

ushaket commented Mar 2, 2026

Done

Copy link
Collaborator

@sjmonson sjmonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase and get rid of the merge commits?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Worker marks completed/successful when final response has no usable data

2 participants