Fix the guidellm benchmark --sample-requests command line option #591
Merged
sjmonson merged 2 commits into vllm-project:main (Feb 11, 2026)
Conversation
sjmonson (Collaborator) requested changes on Feb 9, 2026
sjmonson left a comment:
Not a huge fan of this because it's papering over some bad design, but it will do for now.
The sample_requests defaults are all over the place; by default we should not sample, so this patch is needed as well:
diff --git a/src/guidellm/benchmark/benchmarker.py b/src/guidellm/benchmark/benchmarker.py
index 56cdb9a7..c0caba40 100644
--- a/src/guidellm/benchmark/benchmarker.py
+++ b/src/guidellm/benchmark/benchmarker.py
@@ -64,7 +64,7 @@ class Benchmarker(
environment: Environment,
warmup: TransientPhaseConfig,
cooldown: TransientPhaseConfig,
- sample_requests: int | None = 20,
+ sample_requests: int | None = None,
prefer_response_metrics: bool = True,
progress: (
BenchmarkerProgress[BenchmarkAccumulatorT, BenchmarkT] | None
diff --git a/src/guidellm/benchmark/schemas/base.py b/src/guidellm/benchmark/schemas/base.py
index 9a41171f..9370c215 100644
--- a/src/guidellm/benchmark/schemas/base.py
+++ b/src/guidellm/benchmark/schemas/base.py
@@ -273,7 +273,7 @@ class BenchmarkConfig(StandardBaseDict):
description="Constraint definitions applied to scheduler strategy execution",
)
sample_requests: int | None = Field(
- default=20,
+ default=None,
description="Request count for statistical sampling in final metrics",
)
warmup: TransientPhaseConfig = Field(
diff --git a/src/guidellm/benchmark/schemas/generative/entrypoints.py b/src/guidellm/benchmark/schemas/generative/entrypoints.py
index 45d9a4b2..e85a5ba5 100644
--- a/src/guidellm/benchmark/schemas/generative/entrypoints.py
+++ b/src/guidellm/benchmark/schemas/generative/entrypoints.py
@@ -252,7 +252,7 @@ class BenchmarkGenerativeTextArgs(StandardBaseModel):
)
# Benchmarker configuration
sample_requests: int | None = Field(
- default=10,
+ default=None,
description="Number of requests to sample for detailed metrics (None for all)",
)
warmup: int | float | dict | TransientPhaseConfig | None = Field(
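The semantics the review asks for can be sketched in a few lines. This is a hypothetical illustration of the proposed behavior, not guidellm's actual code: with `sample_requests=None` as the default, no requests are retained unless the caller opts in, and an integer caps how many are kept.

```python
# Hypothetical sketch of the opt-in sampling semantics proposed in review
# (sample_for_metrics is an illustrative helper, not part of guidellm).
import random


def sample_for_metrics(requests, sample_requests=None, seed=0):
    """Return the subset of completed requests to keep for detailed metrics.

    sample_requests=None or 0 -> keep nothing (the opt-in default the
    review proposes); sample_requests=N -> keep at most N requests.
    """
    if not sample_requests:
        return []
    if sample_requests >= len(requests):
        return list(requests)
    # Deterministic sampling for reproducible benchmark reports.
    rng = random.Random(seed)
    return rng.sample(requests, sample_requests)
```

With this shape, `--sample-requests` left unset produces no per-request payload in the output, and `--sample-requests=0` behaves identically rather than regressing to a nonzero default.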
Contributor (Author)
@sjmonson thanks for reviewing - would you prefer those as a follow-up commit or rolled into the original? Re "by default we should not sample" - that'd be my preference too; I think it'd be better to opt in.
Collaborator
No preference, either is fine.
The default behaviour of saving requests in the benchmarks.json file can result in extremely large JSON output there. This can be quite unwieldy if all one needs is benchmark results. Such verbose logging could fill the root or another local filesystem and accidentally cause benchmark or machine failure.

I had previously used --sample-requests=0 to avoid this problem, but it regressed at some point late last year (v0.4?). This patch addresses the issue, and I am having good success with it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nathan Scott <nathans@redhat.com>
Reviewed-by: Samuel Monson <smonson@redhat.com>

Signed-off-by: Samuel Monson <smonson@redhat.com>
Reviewed-by: Nathan Scott <nathans@redhat.com>
Force-pushed 1c38def to 7f183ee
sjmonson approved these changes on Feb 11, 2026
sjmonson added a commit that referenced this pull request on Mar 12, 2026
Summary
Fix the guidellm benchmark --sample-requests command line option
Details
The default behavior of saving requests in the benchmarks.json file can result in very large JSON output files. These can be quite unwieldy if all one needs is benchmark results.
Such verbose logging could fill the root or other local filesystem, and accidentally cause benchmark or machine failure.
I had previously used --sample-requests=0 to avoid this problem, but it regressed at some point late last year (v0.4?). This patch addresses the issue, and I am having good success with it.
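The scale of the problem is easy to demonstrate with a toy serialization. The record shape below is hypothetical (not guidellm's actual schema); it only shows how the output grows linearly with the number of requests retained, versus staying tiny when none are kept.

```python
# Illustrative only: hypothetical request record, not guidellm's schema.
# Shows how benchmarks.json size scales with the number of saved requests.
import json


def benchmark_json_bytes(num_kept_requests):
    """Serialized size of a benchmark document keeping N request records."""
    record = {"prompt": "x" * 2048, "output": "y" * 2048, "latency_ms": 42.0}
    doc = {
        "metrics": {"p50_latency_ms": 40.1},  # summary stats stay small
        "requests": [record] * num_kept_requests,
    }
    return len(json.dumps(doc))
```

With ~4 KB per request record, keeping every request from a long run reaches tens of megabytes quickly, while keeping none leaves a document of well under a kilobyte.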
Use of AI
🤖 Generated with Claude Code
- [ ] "I certify that all code in this PR is my own, except as noted below." (none of this code is my own; it is 100% generated by AI)
- [ ] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)