Streaming requests are using vLLM specific options (not part of the OpenAI API) #603
Closed
Description
Bug Description
Streaming requests use the vLLM-specific option continuous_usage_stats, which is not part of the standard OpenAI API, e.g.:
```json
{
  "model": "google/gemma3",
  "stream": true,
  "stream_options": {
    "include_usage": true,
    "continuous_usage_stats": true
  },
  ...
}
```
I'd like to benchmark vLLM instances behind a reverse proxy that performs strict OpenAI API validation. This is currently not possible because the proxy returns an HTTP 4xx error for every request guidellm makes.
Expected Behavior
An option that disables continuous_usage_stats, making the API requests compatible with the OpenAI API spec.
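A minimal sketch of what this could look like on the client side: build `stream_options` with only the standard OpenAI field by default, and add the vLLM extension only when explicitly requested. The `vllm_extensions` flag and `build_stream_options` helper are hypothetical names for illustration, not existing guidellm options.

```python
def build_stream_options(vllm_extensions: bool = False) -> dict:
    """Build stream_options for a streaming chat completion request.

    By default, emit only the standard OpenAI API field so that strict
    proxies accept the request; opt in to vLLM extensions explicitly.
    """
    options = {"include_usage": True}  # standard OpenAI API field
    if vllm_extensions:
        # vLLM-specific: include usage stats in every streamed chunk
        options["continuous_usage_stats"] = True
    return options


# Spec-compatible payload (no vLLM-specific keys):
payload = {
    "model": "google/gemma3",
    "stream": True,
    "stream_options": build_stream_options(),
}
```

With this default, a strict OpenAI-compatible proxy would no longer reject the request, while vLLM users could still opt in to per-chunk usage stats.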
Steps to Reproduce
Run guidellm benchmark and observe an error for every request being made.
Operating System
Ubuntu 24.04
Python Version
Python 3.12.3
GuideLLM Version
0.5.3
Installation Method
pip install guidellm
Installation Details
No response
Error Messages or Stack Traces
Additional Context
No response