[VLM][Model] TP support for ViTs #7186
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI, as it is required to merge (or just use auto-merge). 🚀
(Force-pushed from f214028 to 414040f.)
[Intermediate status]
With the following simple test code, I can successfully run all the listed models on both [...]
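(The test script itself isn't included in this thread excerpt. For illustration only, here is a minimal sketch of this kind of smoke test using vLLM's offline `LLM` API; the model name, prompt format, and image path are placeholders rather than the actual test parameters.)

```python
# Hypothetical smoke test, not the test file from this PR: load a VLM with
# tensor parallelism enabled and check that generation still works.
from PIL import Image

from vllm import LLM, SamplingParams

image = Image.open("example.jpg")  # placeholder image path

llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",  # placeholder; any of the listed VLMs
    tensor_parallel_size=2,            # shard the ViT and the LLM across 2 GPUs
)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```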
This is great and thank you so much for the implementation and thorough testing coverage. I will take a look this week and get back to you!
@ywang96 @DarkLight1337
LGTM! I've run all the models again with your test file, so let's get this in! Thank you for the work! @ChristopherCho
Ah... this will actually break the CPU test. How about using transformers [...]
@ywang96 I've checked the error message and found some issues here.
However, the other VLM models are deselected (I think we can avoid the error by just importing [...])
@ChristopherCho I see, let me try to move those import statements to inside the run_test call and see if that helps.
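(For reference, a minimal sketch of the kind of change being discussed, assuming the failure comes from importing `xformers` at module level on a CPU-only machine; the test below is a placeholder, not the actual vLLM test file.)

```python
# Hypothetical sketch: defer the xformers import into the test body so that
# merely collecting this file on a CPU-only machine does not fail at import
# time; the test is then skipped/deselected cleanly instead of erroring out.
import pytest
import torch


def run_test(batch: int = 1, seq_len: int = 16, heads: int = 4, dim: int = 32):
    # Deferred import: only executed when the test actually runs on GPU.
    from xformers import ops as xops

    q = torch.randn(batch, seq_len, heads, dim,
                    device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    out = xops.memory_efficient_attention_forward(q, k, v)
    assert out.shape == q.shape


@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires GPU")
def test_memory_efficient_attention():
    run_test()
```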
#8055 (comment)
As a follow-up PR to #6942, I've implemented the TP version of various ViTs. The following models have been changed:

Following the `Idefics2VisionAttention` implementation, I've used `memory_efficient_attention_forward` from `xformers`. To load the models correctly, the `load_weights` part of the models that use these ViTs should also be updated; thus, the following models have been changed as well.
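(For illustration, a minimal sketch of the pattern described above, assuming vLLM's `QKVParallelLinear`/`RowParallelLinear` layers and xformers' `memory_efficient_attention_forward`; the class name and constructor arguments are placeholders, not the exact code added in this PR.)

```python
# Hypothetical tensor-parallel ViT attention block (placeholder name), not the
# exact code from this PR: the fused QKV projection is column-parallel, the
# output projection is row-parallel, and the attention itself uses xformers'
# memory_efficient_attention_forward.
import torch
import torch.nn as nn
from xformers import ops as xops

from vllm.distributed import get_tensor_model_parallel_world_size
from vllm.model_executor.layers.linear import (QKVParallelLinear,
                                               RowParallelLinear)


class ViTParallelAttention(nn.Module):

    def __init__(self, hidden_size: int, num_heads: int, quant_config=None):
        super().__init__()
        self.head_dim = hidden_size // num_heads
        self.scale = self.head_dim**-0.5
        tp_size = get_tensor_model_parallel_world_size()
        # Each TP rank only holds num_heads / tp_size attention heads.
        self.num_heads_per_rank = num_heads // tp_size

        # Column-parallel fused QKV projection.
        self.qkv_proj = QKVParallelLinear(
            hidden_size,
            self.head_dim,
            num_heads,
            bias=True,
            quant_config=quant_config,
        )
        # Row-parallel output projection (all-reduces across TP ranks).
        self.out_proj = RowParallelLinear(
            hidden_size,
            hidden_size,
            bias=True,
            quant_config=quant_config,
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = hidden_states.shape
        qkv, _ = self.qkv_proj(hidden_states)
        q, k, v = qkv.chunk(3, dim=-1)
        # xformers expects (batch, seq_len, num_heads, head_dim).
        q = q.view(batch, seq_len, self.num_heads_per_rank, self.head_dim)
        k = k.view(batch, seq_len, self.num_heads_per_rank, self.head_dim)
        v = v.view(batch, seq_len, self.num_heads_per_rank, self.head_dim)
        out = xops.memory_efficient_attention_forward(q, k, v, scale=self.scale)
        out = out.reshape(batch, seq_len, -1)
        out, _ = self.out_proj(out)
        return out
```

Because the separate q/k/v projections become a single column-parallel `qkv_proj`, the `load_weights` of the models wrapping these ViTs typically needs a stacked-params mapping that routes the `q_proj`/`k_proj`/`v_proj` checkpoint weights into the corresponding shards of the fused `qkv_proj`, which is why those models had to be touched as well.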