
Tags: vllm-project/guidellm


# v0.5.4
PATCH Change UI template version to match branch

Signed-off-by: Samuel Monson <smonson@redhat.com>

# v0.5.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Added rampup to constant rate type (#549)

## Summary

Allows a linear rampup of the constant-rate profile.

## Test Plan

The simplest test is to run a short constant-rate benchmark at 4 requests
per second with a long rampup and observe the rate ramping up as expected.
New unit tests are also included.

## Related Issues

Fulfills part of the goals of #428 

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [x] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

# v0.5.2

OpenAI API-Key Support (#535)

## Summary

A basic set of changes that adds the API key as a bearer token to all
relevant requests.

## Details

- The user passes the API key as an argument to the backend
- OpenAI's protocol specifies that the key be sent as a bearer token
- Headers are merged, because requests can carry their own headers
- This PR also cleans up dead code that was unused since the refactor
- I excluded the API key from the info data structure for security
purposes. Let me know if some info belongs there, like a boolean value
indicating whether an API key is provided, or if a hash of the token
would be helpful. But otherwise I think it's good as-is.
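The header merge described above can be sketched as a stand-alone function (illustrative only, not the backend's actual code):

```python
def merged_headers(base: dict, api_key=None) -> dict:
    """Merge user-supplied headers with the OpenAI-style bearer token."""
    headers = dict(base)  # copy so the caller's dict is untouched
    if api_key:
        # OpenAI's protocol expects the key in the Authorization header.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```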

## Test Plan

Run a vLLM server with the option `--api-key <your API key>` passed in.
Without the key configured, guidellm requests would fail as before. Run
guidellm with the options as documented in this PR's content, and the
benchmark should succeed.

## Related Issues

- Resolves: #491 

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

# v0.5.1

Record output_tokens for incomplete requests (#519)

## Summary

Sets `continuous_usage_stats` to get token usage on incomplete requests.
If usage is still unavailable, fall back to the iteration count.

## Details

In v0.3.0 and earlier the number of iterations was used as a proxy for
the output token count in incomplete requests that did not return usage
metrics. In v0.4.0 this behavior was removed, which led to large
discrepancies in output token count depending on the percentage of the
benchmark consisting of incomplete requests.

This PR restores the original behavior of falling back to the number of
iterations. Additionally, it sets the `continuous_usage_stats` flag to
enable usage metrics on every iteration, when available.
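The fallback amounts to something like the following sketch (hypothetical function, not the exact guidellm code):

```python
def output_token_count(usage_tokens, iterations: int) -> int:
    """Prefer server-reported usage; otherwise fall back to iteration count."""
    # usage_tokens comes from the usage block that continuous_usage_stats
    # enables on streamed chunks; it may be None for incomplete requests.
    if usage_tokens is not None:
        return usage_tokens
    return iterations
```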

## Test Plan

- Run a long-generation, high concurrency benchmark using a
`max-seconds` constraint. For incomplete requests check that
output_tokens is greater than 0 for some requests.

## Related Issues

- Resolves #514 

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

# v0.5.0

Allow unlimited connections per-worker (#488)

## Summary

By default each httpx client supports a maximum of 100 connections
([ref](https://www.python-httpx.org/advanced/resource-limits/)). We want
this uncapped, since the connection count is already bounded by a semaphore.

## Test Plan

- See #487

## Related Issues

- Resolves #487 

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

# v0.4.0

Release v0.4.0

# v0.3.1

Configurable max_tokens/max_completion_tokens key (#399)

## Summary


Makes the `max_tokens` request key configurable through an environment
variable per endpoint type. Defaults to `max_tokens` for the legacy
`completions` endpoint and `max_completion_tokens` for `chat/completions`.
## Details

- Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option which is a
dict mapping from route name -> output tokens key. Default is
`{"text_completions": "max_tokens", "chat_completions":
"max_completion_tokens"}`
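Resolving the key then reduces to a dict lookup, roughly as below (hypothetical names; the real mapping is read from the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` setting):

```python
DEFAULT_MAX_OUTPUT_KEY = {
    "text_completions": "max_tokens",
    "chat_completions": "max_completion_tokens",
}

def max_output_field(route: str, key_map=DEFAULT_MAX_OUTPUT_KEY) -> str:
    """Return the request key used to cap output tokens for a route."""
    return key_map[route]
```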

## Test Plan

-

## Related Issues

- Closes #395
- Closes #269
- Related #210

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

# v0.3.0

Release v0.3.0

# v0.2.1

Release version 0.2.1

# v0.2.0

Release version 0.2.0