Pull requests: vllm-project/vllm
[Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph
nvidia
v1
#35376
opened Feb 26, 2026 by
izhuhaoran
[Bug] correct out dtype of rms_norm_gated native path
bug
Something isn't working
#35369
opened Feb 26, 2026 by
zufangzhu
[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression
bug
Something isn't working
multi-modality
Related to multi-modality (#4194)
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
#35368
opened Feb 26, 2026 by
linyueqian
[WIP] [Feature] Add Qwen3-ForcedAligner support via token classification pooling
documentation
Improvements or additions to documentation
new-model
Requests to new models
qwen
Related to Qwen models
[WIP] [Bugfix] Fix MXFP4 weight_loader crash on per-expert 2-D weights
bug
Something isn't working
#35366
opened Feb 26, 2026 by
haosdent
[Bugfix] Fix SymmMemCommunicator disabled on SM 10.3/12.0 GPUs
bug
Something isn't working
#35360
opened Feb 26, 2026 by
haosdent
feat: add trace spans for failed API requests
frontend
#35359
opened Feb 26, 2026 by
RichardoMrMu
[Bugfix][Frontend] Fix reasoning-end detection to check prompt tail o…
bug
Something isn't working
frontend
#35358
opened Feb 26, 2026 by
Julien-ser
5 tasks done
[Bugfix] Use is_integrated to detect UMA GPUs for memory reporting
bug
Something isn't working
nvidia
#35356
opened Feb 26, 2026 by
haosdent
[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint
bug
Something isn't working
ready
ONLY add when PR is ready to merge/full CI is needed
#35354
opened Feb 26, 2026 by
LucasWilkinson
1 of 2 tasks
[Bug] Fix missing <think> tag after tool call in MiniMax 2.1
bug
Something isn't working
#35352
opened Feb 26, 2026 by
stingoChen
3 of 5 tasks
[XPU] special handle for pooler models w8a16 gemm
#35351
opened Feb 26, 2026 by
yma11
5 tasks
[Model Runner V2] Add model states [1/N]
nvidia
v1
#35350
opened Feb 26, 2026 by
WoosukKwon
Fix Qwen 3.5 tool calling problem
qwen
Related to Qwen models
#35347
opened Feb 26, 2026 by
sunqingn7
5 tasks
Cpu dispatcher
ci/build
cpu
Related to CPU backends
documentation
Improvements or additions to documentation
nvidia
performance
Performance-related issues
rocm
Related to AMD ROCm
v1
#35346
opened Feb 26, 2026 by
majian4work
feat(kv-offload): Strategy B — AdaptiveOffloadingPolicy + non-blocking loads
kv-connector
v1
#35343
opened Feb 26, 2026 by
Srinivasoo7
5 tasks
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency
v1
#35342
opened Feb 26, 2026 by
Srinivasoo7
5 tasks
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends
documentation
Improvements or additions to documentation
rocm
Related to AMD ROCm
v1
#35334
opened Feb 25, 2026 by
gshtras
[Perf] Optimize model runner v2 prepare_inputs copy logic, 6.1% E2E throughput improvement
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#35333
opened Feb 25, 2026 by
yewentao256
[Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
#35330
opened Feb 25, 2026 by
yewentao256