[V1][TPU] Change kv cache shape. #15145

vanbasten23 · 2025-03-19T17:29:03Z

This PR changes the kv cache shape from [num_blocks, block_size, num_kv_heads, head_size] to [num_blocks, block_size, num_kv_heads * head_size], in accordance with the ragged paged attention kernel change pytorch/xla#8851, in order to unblock the multi-chip scenario:

Before this change, the ragged paged attention kernel fails in certain scenarios (e.g., num_kv_heads == 1 with dtype=bfloat16), because in that case num_kv_heads sits in the second-minor dimension, where it is implicitly padded and wastes memory. With the kv cache shape in this PR, num_kv_heads is no longer in the second-minor dimension. For details, please refer to the kernel change PR.
We can also avoid the reshape key = key.view(num_tokens, self.num_kv_heads, self.head_size).
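
The two points above can be sketched in plain Python. This is only an illustration, not code from this PR or from the kernel: the tile sizes in ASSUMED_TILE (the last two dimensions padded to multiples of (16, 128) for bfloat16 and (8, 128) for float32) are an assumption about TPU memory layout, and padded_elements, flat_offset, and layouts_agree are hypothetical helpers. The first part estimates the padding overhead of each shape; the second checks that both shapes address the same row-major memory.

```python
import math

# ASSUMPTION: the two minor-most dimensions are padded to these multiples
# (second-minor, minor). Illustrative values, not taken from this PR.
ASSUMED_TILE = {"float32": (8, 128), "bfloat16": (16, 128)}


def round_up(x, multiple):
    """Round x up to the nearest multiple of `multiple`."""
    return math.ceil(x / multiple) * multiple


def padded_elements(shape, dtype):
    """Element count after padding the last two dims to the assumed tile."""
    second_minor_tile, minor_tile = ASSUMED_TILE[dtype]
    *major, second_minor, minor = shape
    return (
        math.prod(major)
        * round_up(second_minor, second_minor_tile)
        * round_up(minor, minor_tile)
    )


def flat_offset(index, shape):
    """Row-major offset of `index` in a contiguous array of `shape`."""
    offset = 0
    for i, dim in zip(index, shape):
        offset = offset * dim + i
    return offset


def layouts_agree(num_blocks, block_size, num_kv_heads, head_size):
    """True if element (b, s, h, d) of the old 4-D shape and element
    (b, s, h * head_size + d) of the new 3-D shape share one offset."""
    old = (num_blocks, block_size, num_kv_heads, head_size)
    new = (num_blocks, block_size, num_kv_heads * head_size)
    return all(
        flat_offset((b, s, h, d), old)
        == flat_offset((b, s, h * head_size + d), new)
        for b in range(num_blocks)
        for s in range(block_size)
        for h in range(num_kv_heads)
        for d in range(head_size)
    )


# Example sizes chosen for illustration only.
num_blocks, block_size, num_kv_heads, head_size = 4, 32, 1, 128
old_shape = (num_blocks, block_size, num_kv_heads, head_size)
new_shape = (num_blocks, block_size, num_kv_heads * head_size)

logical = math.prod(old_shape)
old_ratio = padded_elements(old_shape, "bfloat16") / logical
new_ratio = padded_elements(new_shape, "bfloat16") / logical
print(f"old layout: {old_ratio:.0f}x logical size, new layout: {new_ratio:.0f}x")

# The flattened shape addresses the same memory in the same order.
print(layouts_agree(2, 4, 2, 8))
```

Under the assumed bfloat16 tile, the old shape pads a num_kv_heads of 1 up to 16, while the new shape places block_size in the second-minor position, which needs no padding for the sizes chosen here. Since both shapes share one memory order, a key tensor of shape [num_tokens, num_kv_heads * head_size] already matches the trailing dimension of the new cache and needs no view.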

Test plan:

VLLM_USE_V1=1 python vllm/examples/offline_inference/tpu.py 2>&1 | tee out.txt
VLLM_USE_V1=1 pytest -s -v vllm/tests/entrypoints/llm/test_accuracy.py::test_lm_eval_accuracy_v1_engine 2>&1 | tee out.txt

cc @miladm @bythew3i

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>

github-actions · 2025-03-19T17:29:12Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

yaochengji

LGTM, thanks!

bythew3i

Thanks Xiongfei!

alexm-redhat

Thanks @vanbasten23 !

update torch_xla wheel and kv cache shape

6a3e160

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>

mergify bot added the ci/build and v1 labels Mar 19, 2025

vanbasten23 marked this pull request as ready for review March 19, 2025 17:50

vanbasten23 requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners March 19, 2025 17:50

vanbasten23 requested a review from yaochengji March 19, 2025 18:23

vanbasten23 changed the title ~~[V1][TPU] Change kv cache shape for better performance~~ [V1][TPU] Change kv cache shape. Mar 19, 2025

yaochengji approved these changes Mar 19, 2025

bythew3i approved these changes Mar 19, 2025

alexm-redhat approved these changes Mar 19, 2025

alexm-redhat enabled auto-merge (squash) March 19, 2025 18:46

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 19, 2025

WoosukKwon disabled auto-merge March 19, 2025 19:16

WoosukKwon merged commit b0e96aa into vllm-project:main Mar 19, 2025
28 of 51 checks passed

vanbasten23 mentioned this pull request Mar 19, 2025

[DO NOT REVIEW YET] Integrate with the write-to-kvcache Pallas kernel #15067

Draft

gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Apr 1, 2025

[V1][TPU] Change kv cache shape. (vllm-project#15145)

ed7c990

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

[V1][TPU] Change kv cache shape. (vllm-project#15145)

8a54154

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>

nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025

[V1][TPU] Change kv cache shape. (vllm-project#15145)

93faabb

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed
