Tags: vllm-project/guidellm
Added rampup to constant rate type (#549)

## Summary

Adds an optional linear ramp-up to the constant rate profile.

## Test Plan

The simplest test is to run a short constant-rate benchmark at 4 requests per second with a long ramp-up and observe that the rate ramps up as expected. New unit tests are also included.

## Related Issues

- Fulfills part of the goals of #428

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [x] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)
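A linear ramp-up to a constant target rate can be sketched as below. This is a minimal illustration, not guidellm's implementation; `target_rate` and `rampup_duration` are hypothetical parameter names.

```python
def rate_at(elapsed: float, target_rate: float, rampup_duration: float) -> float:
    """Current request rate, ramping linearly from 0 to target_rate.

    After rampup_duration seconds (or if no ramp-up is configured),
    the rate holds constant at target_rate.
    """
    if rampup_duration <= 0 or elapsed >= rampup_duration:
        return target_rate
    return target_rate * (elapsed / rampup_duration)
```

For example, with a target of 4 requests per second and a 30-second ramp-up, the rate halfway through the ramp is `rate_at(15.0, 4.0, 30.0)`, i.e. 2.0 requests per second.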
OpenAI API-Key Support (#535)

## Summary

A basic set of changes to add the API key as a bearer token to all relevant requests.

## Details

- The user passes the API key as an argument to the backend.
- OpenAI's protocol specifies that it be sent as a bearer token.
- Headers are merged, because individual requests can carry their own headers.
- This PR also removes dead code that was unused since the refactor.
- The API key is excluded from the info data structure for security purposes. Let me know if some of that info belongs there, such as a boolean indicating whether an API key is provided, or a cryptographic hash of the token; otherwise I think it's good as-is.

## Test Plan

Run a vLLM server with the option `--api-key <your API key>`. Without the options documented in this PR, guidellm would usually fail against such a server; with them, it should work.

## Related Issues

- Resolves: #491

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)
Record output_tokens for incomplete requests (#519)

## Summary

Sets `continuous_usage_stats` to get token usage for incomplete requests. If usage is still unavailable, fall back to the iteration count.

## Details

In v0.3.0 and earlier, the number of iterations was used as a proxy for the output token count of incomplete requests that did not return usage metrics. v0.4.0 removed this behavior, which led to large discrepancies in output token counts depending on the percentage of the benchmark consisting of incomplete requests. This PR restores the original behavior of falling back to the number of iterations. Additionally, it sets the `continuous_usage_stats` flag to enable usage metrics on every iteration, when available.

## Test Plan

- Run a long-generation, high-concurrency benchmark using a `max-seconds` constraint. Check that `output_tokens` is greater than 0 for some of the incomplete requests.

## Related Issues

- Resolves #514

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)
Allow unlimited connections per-worker (#488)

## Summary

By default each httpx client supports a maximum of 100 connections ([ref](https://www.python-httpx.org/advanced/resource-limits/)). We want this uncapped, since the connection count is already maintained by a semaphore.

## Test Plan

- See #487

## Related Issues

- Resolves #487

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)
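The pattern of bounding concurrency with a semaphore rather than the HTTP client's pool can be sketched as below. This is a generic illustration, not guidellm's worker code; in httpx itself, the pool cap can be lifted with `httpx.Limits(max_connections=None)`.

```python
import asyncio


async def run_all(request_fns, max_concurrency: int):
    """Run request coroutines with concurrency capped by a semaphore.

    Because the semaphore bounds in-flight requests, the HTTP client's
    own connection limit can safely be removed.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(fn):
        async with sem:
            return await fn()

    return await asyncio.gather(*(bounded(fn) for fn in request_fns))
```

A design note: keeping the cap in one place (the semaphore) avoids two competing limits, where the client pool's default of 100 would silently queue requests behind the scheduler's back.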
Configurable max_tokens/max_completion_tokens key (#399)

## Summary

Makes the `max_tokens` request key configurable through an environment variable, per endpoint type. Defaults to `max_tokens` for the legacy `completions` endpoint and `max_completion_tokens` for `chat/completions`.

## Details

- Adds the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option, a dict mapping route name -> output tokens key. The default is `{"text_completions": "max_tokens", "chat_completions": "max_completion_tokens"}`.

## Related Issues

- Closes #395
- Closes #269
- Related: #210

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
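The route-to-key mapping can be sketched as follows. This is a simplified illustration: guidellm's settings layer handles the env var parsing, and reading it as a JSON dict here is an assumption of this sketch, as is the `output_tokens_key` helper name. The defaults match those stated in the PR.

```python
import json
import os

DEFAULT_OUTPUT_KEYS = {
    "text_completions": "max_tokens",
    "chat_completions": "max_completion_tokens",
}


def output_tokens_key(route: str) -> str:
    """Resolve the output-token request key for a route.

    The GUIDELLM__OPENAI__MAX_OUTPUT_KEY env var (assumed JSON here)
    overrides the defaults per route.
    """
    raw = os.environ.get("GUIDELLM__OPENAI__MAX_OUTPUT_KEY")
    overrides = json.loads(raw) if raw else {}
    return {**DEFAULT_OUTPUT_KEYS, **overrides}[route]
```

With no override set, a chat completion request would use `max_completion_tokens`, while a legacy text completion would use `max_tokens`.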