Streaming requests are using vLLM specific options (not part of the OpenAI API) #603

@rgerganov

Description

Bug Description

Streaming requests use the vLLM-specific option continuous_usage_stats, which is not part of the standard OpenAI API, e.g.:

{
  "model": "google/gemma3",
  "stream": true,
  "stream_options": {
    "include_usage": true,
    "continuous_usage_stats": true
  },
  ...
}

I'd like to benchmark vLLM instances behind a reverse proxy that performs strict OpenAI API validation. This is currently not possible because the proxy returns an HTTP 4xx error for every request guidellm makes.

Expected Behavior

There should be an option that disables continuous_usage_stats, making the API requests compatible with the OpenAI API spec.
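Until such an option exists, one possible mitigation is a small client- or proxy-side filter that drops non-standard stream_options keys before the request reaches the validating proxy. This is a hypothetical sketch (sanitize_stream_options is not part of guidellm or vLLM), assuming include_usage is the only stream_options key that needs to survive:

```python
# Hypothetical helper: strip stream_options keys that are not part of
# the OpenAI API spec before forwarding a chat-completions request.
OPENAI_STREAM_OPTION_KEYS = {"include_usage"}  # key defined by the OpenAI API

def sanitize_stream_options(payload: dict) -> dict:
    """Return a copy of the payload whose stream_options contain only
    keys defined in the OpenAI API spec; the input is left untouched."""
    cleaned = dict(payload)
    opts = cleaned.get("stream_options")
    if isinstance(opts, dict):
        cleaned["stream_options"] = {
            k: v for k, v in opts.items() if k in OPENAI_STREAM_OPTION_KEYS
        }
    return cleaned

# Example: the request body from the bug description above.
request = {
    "model": "google/gemma3",
    "stream": True,
    "stream_options": {"include_usage": True, "continuous_usage_stats": True},
}
print(sanitize_stream_options(request)["stream_options"])
```

A filter like this could live in the proxy itself, but a guidellm option to omit continuous_usage_stats entirely would avoid the need for any middleware.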

Steps to Reproduce

Run a guidellm benchmark against an endpoint that enforces strict OpenAI API validation and observe an error for every request made.

Operating System

Ubuntu 24.04

Python Version

Python 3.12.3

GuideLLM Version

0.5.3

Installation Method

pip install guidellm

Installation Details

No response

Error Messages or Stack Traces

Additional Context

No response