[torch.compile] hide slicing under custom op for inductor #8384

Merged

merged 11 commits into vllm-project:main on Sep 12, 2024

Conversation

youkaichao
Member

see pytorch/pytorch#131192

When Inductor sees a view being mutated, it copies the tensor.

Hiding the slicing operation under a custom op solves the issue for Inductor.
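For illustration, a minimal sketch of the pattern (the op name mylib::write_to_cache is hypothetical, not vLLM's actual op, and torch.library.custom_op requires PyTorch 2.4+): both the slice and the mutation happen inside the custom op, so the compiled graph only sees an opaque in-place op on the full tensor, never a mutated view.

```python
import torch

# Hypothetical op for illustration; not vLLM's actual op. The slice and
# the mutation both happen inside the custom op, so the traced graph sees
# an opaque op mutating kv_cache rather than a mutated view.
@torch.library.custom_op("mylib::write_to_cache", mutates_args=("kv_cache",))
def write_to_cache(kv_cache: torch.Tensor, new_kv: torch.Tensor) -> None:
    # Inductor never sees this view, so it has no reason to insert a
    # defensive copy of kv_cache.
    kv_cache[: new_kv.shape[0]].copy_(new_kv)

@write_to_cache.register_fake
def _(kv_cache: torch.Tensor, new_kv: torch.Tensor) -> None:
    # Fake (meta) implementation so torch.compile can trace the op.
    return None
```

A compiled function would then call torch.ops.mylib.write_to_cache(kv_cache, new_kv) instead of mutating kv_cache[:n] directly.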


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao
Member Author

cc @laithsakka

@youkaichao
Member Author

After this PR, the major Inductor bug (it tries to copy the KV cache) should be fixed.

However, there are still some inefficiencies with Inductor: it cannot handle in-place custom ops well.
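One way to observe this (a hedged sketch reusing the hypothetical op above): compile a small function that calls the in-place custom op and inspect the code Inductor generates; on older PyTorch versions the functionalized graph may still contain extra clones of the mutated buffer.

```python
import torch

@torch.compile
def step(kv_cache: torch.Tensor, new_kv: torch.Tensor) -> torch.Tensor:
    # Calls the hypothetical in-place custom op from the sketch above.
    torch.ops.mylib.write_to_cache(kv_cache, new_kv)
    return kv_cache.sum()

kv_cache = torch.zeros(16, 8, device="cuda")
new_kv = torch.ones(4, 8, device="cuda")
# Run with TORCH_LOGS="output_code" to dump Inductor's generated code and
# check whether it clones kv_cache around the functionalized op.
print(step(kv_cache, new_kv))
```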

@youkaichao youkaichao merged commit 7de49aa into vllm-project:main Sep 12, 2024
25 of 29 checks passed
@youkaichao youkaichao deleted the fix_inductor branch September 12, 2024 07:11
@bnellnm
Contributor

bnellnm commented Sep 12, 2024

Thanks for finding and fixing this!

@laithsakka

This is fixed in PyTorch 2.5 by setting TORCHDYNAMO_AUTO_FUNCTIONALIZED_V2=1.
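For reference, a minimal sketch of enabling the flag; the assumption (worth verifying) is that the environment variable must be set before torch is imported, since Dynamo reads its config at import time:

```python
import os

# Assumption: TORCHDYNAMO_* flags are read when torch._dynamo initializes,
# so set the variable before importing torch.
os.environ["TORCHDYNAMO_AUTO_FUNCTIONALIZED_V2"] = "1"

import torch  # imported after the flag is set
```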

@laithsakka

Can you elaborate on "there are still some inefficiencies with Inductor: it cannot handle in-place custom ops well"?

@youkaichao
Member Author

You can take a look at the log when you run with Inductor:

# GPU blocks: 790

When I turn on Inductor, the number of blocks becomes smaller, which means Inductor takes more memory and leaves less memory for the KV cache.

@laithsakka

takes more memory and leaves less memory for the KV cache.

I see. I will create an issue to revisit this. For the models I ran Inductor on, it was slightly better than not using torch.compile. I will sync with you on that: pytorch/pytorch#136269

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025