[core] add bucket padding to tpu_model_runner #14995

Chenyaaang · 2025-03-18T03:34:29Z

Add bucket padding to the TPU model runner. Instead of always padding to a power of 2: if num_tokens < bucket_padding_gap, pad to the next power of 2; if num_tokens > bucket_padding_gap, bucket sizes grow in increments of bucket_padding_gap.
For example, with bucket_padding_gap = 64 and max_num_batched_tokens = 512, the padding sizes are 16, 32, 64, 128, 192, 256, 320, 384, 448, 512. This reduces the computation cost for large num_tokens: e.g. num_tokens = 300 is now padded to 320 instead of 512.
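A minimal Python sketch of the bucketing rule described above. It assumes the smallest bucket is 16 tokens (as in the example) and that the smallest bucket does not exceed the gap. `compute_paddings` and `padded_len` are hypothetical names used for illustration only; they are not taken from the PR's code.

```python
import bisect


def compute_paddings(min_size: int, max_size: int, gap: int) -> list[int]:
    """Hypothetical helper: sorted bucket sizes for the padding rule.

    Assumes min_size <= gap. Sizes double from min_size while below the
    gap, then grow linearly by gap; max_size is always the last bucket.
    """
    paddings = []
    size = min_size
    # Power-of-2 phase: double while still below the gap (and max_size).
    while size < gap and size < max_size:
        paddings.append(size)
        size *= 2
    # Linear phase: step by the gap until max_size is reached.
    size = gap
    while size < max_size:
        paddings.append(size)
        size += gap
    paddings.append(max_size)
    return paddings


def padded_len(paddings: list[int], num_tokens: int) -> int:
    """Return the smallest bucket that can hold num_tokens."""
    idx = bisect.bisect_left(paddings, num_tokens)
    if idx == len(paddings):
        raise ValueError(f"{num_tokens} exceeds the largest bucket {paddings[-1]}")
    return paddings[idx]


paddings = compute_paddings(16, 512, 64)
print(paddings)                   # [16, 32, 64, 128, 192, 256, 320, 384, 448, 512]
print(padded_len(paddings, 300))  # 320
```

`bisect_left` returns the first index whose bucket is >= num_tokens, so a batch whose size exactly matches a bucket is not padded further.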

FIX #14581

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

github-actions · 2025-03-18T03:34:40Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

vllm/config.py

vllm/v1/worker/tpu_model_runner.py

yaochengji · 2025-03-18T18:06:59Z

@Chenyaaang Thanks for your contribution, left some comments above!

@robertgshaw2-redhat I know there's another configuration option, cudagraph_capture_sizes, which is similar to bucket_padding_gap in this PR. Do you think we should merge them into one option with a default value for each platform?

mergify · 2025-03-18T23:04:21Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chenyaaang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

alexm-redhat

Good idea!

DarkLight1337 · 2025-03-20T04:37:19Z

Please fix the merge conflict

NickLucche

looks good! Also in favor of unifying with cudagraph_capture_sizes

vllm/v1/worker/tpu_model_runner.py

mergify · 2025-03-20T14:04:31Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chenyaaang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Chenyaaang · 2025-03-21T04:21:01Z

> Do you think that we should allow the user to specify this? I would think that we should try to keep this internal unless there is a strong reason why someone would toggle this.

I think the parameter is useful for online serving: by exposing it, users can adjust the gap based on the model size. My understanding is that a small gap, for example, leads to a long precompile time, which is something users can tune.
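To illustrate the trade-off discussed above, the sketch below counts buckets for several gaps. It assumes, as an illustration rather than a measured result, that each bucket size corresponds to one precompiled graph, so more buckets means a longer precompile time. The bucketing rule is re-implemented in a hypothetical self-contained helper, with 16 and 512 taken from the PR's example as the smallest and largest sizes.

```python
def compute_paddings(min_size: int, max_size: int, gap: int) -> list[int]:
    """Hypothetical helper: powers of 2 below the gap, then steps of gap."""
    paddings = []
    size = min_size
    while size < gap and size < max_size:
        paddings.append(size)
        size *= 2
    size = gap
    while size < max_size:
        paddings.append(size)
        size += gap
    paddings.append(max_size)
    return paddings


def padded_len(paddings: list[int], num_tokens: int) -> int:
    """Smallest bucket >= num_tokens (assumes num_tokens <= largest bucket)."""
    return next(p for p in paddings if p >= num_tokens)


# Smaller gap: more buckets (more graphs to precompile), less padding waste.
for gap in (32, 64, 128, 512):
    buckets = compute_paddings(16, 512, gap)
    print(f"gap={gap:>3}: {len(buckets):>2} buckets, "
          f"300 tokens -> {padded_len(buckets, 300)}")
```

Under these assumptions the loop reports 17, 10, 7 and 6 buckets and pads 300 tokens to 320, 320, 384 and 512 respectively; setting the gap equal to the maximum size reproduces plain power-of-2 padding.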

robertgshaw2-redhat · 2025-03-21T14:24:15Z

This PR seems to have broken CUDA

alexm-redhat · 2025-03-21T14:25:48Z

@Chenyaaang thanks for reducing the padding gap, this is useful. Could you please address @robertgshaw2-redhat comment and fix the build (so we can merge it). Thanks!

Chenyaaang · 2025-03-21T20:27:43Z

> @Chenyaaang thanks for reducing the padding gap, this is useful. Could you please address @robertgshaw2-redhat comment and fix the build (so we can merge it). Thanks!

I've fixed the build and replied to @robertgshaw2-redhat's comment.

lsy323 · 2025-03-24T20:56:02Z

tests/tpu/test_compilation.py

```diff
-# Check we have 4 compiled codes
-assert len(compiled_codes) == 4
+# Check we have 3 compiled codes
+assert len(compiled_codes) == 3
```


@Chenyaaang Thanks for looking into this! Could we branch this for V0 and V1? In V0 there should be 4 compiled codes.

Done, thanks!

add bucket padding to tpu

048e3b1

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

Chenyaaang requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners March 18, 2025 03:34

mergify bot added ci/build v1 labels Mar 18, 2025

revert test.txt

5c93dfd

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

yaochengji reviewed Mar 18, 2025

vllm/config.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

move bucket_padding to compilationConfig and add unit test

4f8f3d2

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

mergify bot added the needs-rebase label Mar 18, 2025

Merge remote-tracking branch 'origin/main' into bucket_padding

da80880

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

mergify bot removed the needs-rebase label Mar 18, 2025

fix bug

3869ec4

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

alexm-redhat approved these changes Mar 19, 2025

alexm-redhat enabled auto-merge (squash) March 19, 2025 21:09

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 19, 2025

NickLucche suggested changes Mar 20, 2025

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

mergify bot added the needs-rebase label Mar 20, 2025

Chenyaaang added 2 commits March 20, 2025 16:46

fix comments

21ba037

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

Merge remote-tracking branch 'origin/main' into bucket_padding

7e18e5a

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

auto-merge was automatically disabled March 20, 2025 17:22
Head branch was pushed to by a user without write access

mergify bot removed the needs-rebase label Mar 20, 2025

Chenyaaang added 2 commits March 21, 2025 04:23

fix comments

722a8f6

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

Merge remote-tracking branch 'origin/main' into bucket_padding

cbdb517

initialize compilation config in vllmconfig

2bd994a

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

Chenyaaang and others added 5 commits March 24, 2025 11:01

Merge branch 'vllm-project:main' into bucket_padding

d39a642

convert to environment variable

88c56b6

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

nit

f34b27e

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

revert

8d3a1e8

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

update test

f7bdb02

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

lsy323 reviewed Mar 24, 2025

update tpu/test_compilation to differentiate V0 and V1

7d92244

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

Chenyaaang force-pushed the bucket_padding branch from a725d2b to 7d92244 Compare March 24, 2025 21:15

robertgshaw2-redhat and others added 3 commits March 24, 2025 21:17

updated

3f4f850

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

Revert "update tpu/test_compilation to differentiate V0 and V1"

dcfd108

This reverts commit 7d92244. Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

Revert "update test"

7535bd9

This reverts commit f7bdb02. Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

Chenyaaang force-pushed the bucket_padding branch from 33a8bbc to 7535bd9 Compare March 24, 2025 22:15

Chenyaaang added 2 commits March 25, 2025 18:02

Merge remote-tracking branch 'origin/main' into bucket_padding

3e25f0b

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

add unit test to the existing test_tpu_model_runner

ea00dca

Signed-off-by: Chenyaaang <llccyy1212@gmail.com>

robertgshaw2-redhat enabled auto-merge (squash) March 25, 2025 21:25

robertgshaw2-redhat approved these changes Mar 25, 2025

robertgshaw2-redhat merged commit ac3cd6e into vllm-project:main Mar 25, 2025
33 checks passed

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

Chenyaaang mentioned this pull request Apr 23, 2025

Introduce PaddingConfig to combine GPU cudagraph_capture_sizes and TPU num_tokens_paddings #17081

Draft
