[torch.compile] hide slicing under custom op for inductor #8384

Merged

merged 11 commits into vllm-project:main on Sep 12, 2024

Conversation

youkaichao
Member

see pytorch/pytorch#131192

When Inductor sees a view being mutated, it copies the tensor.

Hiding the slicing operation under a custom op solves the issue for Inductor.
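For illustration, a minimal sketch of the pattern (the op name mylib::write_to_cache is hypothetical, not vLLM's actual op, and torch.library.custom_op requires PyTorch 2.4+): both the slice and the mutation happen inside the custom op, so the compiled graph only sees an opaque in-place op on the full tensor, never a mutated view.

```python
import torch

# Hypothetical op for illustration; not vLLM's actual op. The slice and
# the mutation both happen inside the custom op, so the traced graph sees
# an opaque op mutating kv_cache rather than a mutated view.
@torch.library.custom_op("mylib::write_to_cache", mutates_args=("kv_cache",))
def write_to_cache(kv_cache: torch.Tensor, new_kv: torch.Tensor) -> None:
    # Inductor never sees this view, so it has no reason to insert a
    # defensive copy of kv_cache.
    kv_cache[: new_kv.shape[0]].copy_(new_kv)

@write_to_cache.register_fake
def _(kv_cache: torch.Tensor, new_kv: torch.Tensor) -> None:
    # Fake (meta) implementation so torch.compile can trace the op.
    return None
```

A compiled function would then call torch.ops.mylib.write_to_cache(kv_cache, new_kv) instead of mutating kv_cache[:n] directly.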


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao
Member Author

cc @laithsakka

@youkaichao
Member Author

After this PR, the major Inductor bug (it tries to copy the KV cache) should be fixed.

However, there are still some inefficiencies with Inductor: it cannot handle in-place custom ops well.
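One way to observe this (a hedged sketch reusing the hypothetical op above): compile a small function that calls the in-place custom op and inspect the code Inductor generates; on older PyTorch versions the functionalized graph may still contain extra clones of the mutated buffer.

```python
import torch

@torch.compile
def step(kv_cache: torch.Tensor, new_kv: torch.Tensor) -> torch.Tensor:
    # Calls the hypothetical in-place custom op from the sketch above.
    torch.ops.mylib.write_to_cache(kv_cache, new_kv)
    return kv_cache.sum()

kv_cache = torch.zeros(16, 8, device="cuda")
new_kv = torch.ones(4, 8, device="cuda")
# Run with TORCH_LOGS="output_code" to dump Inductor's generated code and
# check whether it clones kv_cache around the functionalized op.
print(step(kv_cache, new_kv))
```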

@youkaichao youkaichao merged commit 7de49aa into vllm-project:main Sep 12, 2024
25 of 29 checks passed
@youkaichao youkaichao deleted the fix_inductor branch September 12, 2024 07:11
@bnellnm
Contributor

bnellnm commented Sep 12, 2024

Thanks for finding and fixing this!

@laithsakka

This is fixed in PyTorch 2.5 by setting TORCHDYNAMO_AUTO_FUNCTIONALIZED_V2=1.
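For reference, a minimal sketch of enabling the flag; the assumption (worth verifying) is that the environment variable must be set before torch is imported, since Dynamo reads its config at import time:

```python
import os

# Assumption: TORCHDYNAMO_* flags are read when torch._dynamo initializes,
# so set the variable before importing torch.
os.environ["TORCHDYNAMO_AUTO_FUNCTIONALIZED_V2"] = "1"

import torch  # imported after the flag is set
```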

@laithsakka

Can you elaborate on "there are still some inefficiencies with Inductor: it cannot handle in-place custom ops well"?

@youkaichao
Member Author

You can take a look at the log when you run with Inductor:

# GPU blocks: 790

When I turn on Inductor, the number of blocks becomes smaller, which means Inductor takes more memory and leaves less memory for the KV cache.

@laithsakka

takes more memory and leaves less memory for the KV cache.

I see. I will create an issue to revisit this. For the models I ran Inductor on, it was slightly better than not using torch.compile. I will sync with you on that: pytorch/pytorch#136269

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025