[core] Add tags parameter to wake_up() #15500
Conversation
Signed-off-by: Eric <erictang000@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Also cc @youkaichao
Signed-off-by: Eric <erictang000@gmail.com>
vllm/executor/executor_base.py (outdated)
```diff
     logger.warning("Executor is not sleeping.")
     return
 time_before_wakeup = time.perf_counter()
-self.collective_rpc("wake_up")
+self.collective_rpc("wake_up", kwargs=dict(tags=tags))
 time_after_wakeup = time.perf_counter()
 self.is_sleeping = False
 logger.info("It took %.6f seconds to wake up.",
```
We should add the wake-up tags to the logging. We should also track how many tags are sleeping / woken up, and set `self.is_sleeping = False` only after all tags are woken up. Since you cannot access the allocator in the executor, I'm fine with hard-coding `sleeping_tags = ("weights", "kv_caches")` when we call `sleep()`.
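The bookkeeping the reviewer suggests could look roughly like the following sketch. This is not the actual vLLM executor code; the class and the hard-coded tag names just mirror the suggestion above, and `sleep()`/`wake_up()` are reduced to the tag-tracking logic only.

```python
# Sketch of tag-aware sleep state tracking, per the review suggestion:
# is_sleeping is only cleared once every sleeping tag has been woken up.
class ExecutorSketch:
    def __init__(self) -> None:
        self.is_sleeping = False
        self.sleeping_tags: set[str] = set()

    def sleep(self) -> None:
        # Hard-coded, since the executor cannot query the allocator.
        self.sleeping_tags = {"weights", "kv_caches"}
        self.is_sleeping = True

    def wake_up(self, tags=None) -> None:
        # No tags means "wake everything", preserving the old behavior.
        woken = set(self.sleeping_tags) if tags is None else set(tags)
        self.sleeping_tags -= woken
        if not self.sleeping_tags:
            self.is_sleeping = False


executor = ExecutorSketch()
executor.sleep()
executor.wake_up(tags=["weights"])
assert executor.is_sleeping          # kv_caches still asleep
executor.wake_up(tags=["kv_caches"])
assert not executor.is_sleeping      # all tags woken up
```

With this shape, the wake-up log line can also include the `woken` set, as requested.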
added!
```python
@create_new_process_for_each_test()
@pytest.mark.parametrize("model, use_v1", [("meta-llama/Llama-3.2-1B", True),
                                           ("meta-llama/Llama-3.2-1B", False)])
def test_end_to_end_with_tags(monkeypatch: pytest.MonkeyPatch, model: str,
```
IMO it's too heavy to test both with and without tags in the CI. Let's remove this test and only test it in the API server then.
Moved the test logic under the existing `test_end_to_end` to avoid the reinitialization for now, if that's better? Can also delete it entirely, but it's probably good to keep a check that memory utilization looks correct with the `wake_up("weights")` call, since that's the core motivation for this PR.
Signed-off-by: Eric <erictang000@gmail.com>
This pull request has merge conflicts that must be resolved before it can be merged.
…up_tags Signed-off-by: Eric <erictang000@gmail.com>
LGTM. Just nits. Leave to @youkaichao
Signed-off-by: Eric <erictang000@gmail.com>
This pull request has merge conflicts that must be resolved before it can be merged.
LGTM in general, thanks for adding the functionality! One comment: please use `query_params` instead of `json`, to keep consistent with the rest of the code. In addition, update the tests to use `params`, as is done in #14373.
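The server-side change the reviewer asks for amounts to reading the tags from the URL query string rather than a JSON body. A small stand-in sketch (the real endpoint would use the web framework's `query_params` accessor; the function name here is illustrative, and `parse_qs` from the standard library plays the role of the framework's parser):

```python
from typing import Optional
from urllib.parse import parse_qs


def tags_from_query_string(query: str) -> Optional[list[str]]:
    """Collect repeated ?tags=... entries from a raw query string.

    No entries returns None, meaning "wake everything", which keeps the
    default wake_up() behavior unchanged.
    """
    values = parse_qs(query).get("tags")
    return values if values else None


# e.g. POST /wake_up?tags=weights wakes only the weights
assert tags_from_query_string("tags=weights") == ["weights"]
# repeated parameters accumulate into a list
assert tags_from_query_string("tags=weights&tags=kv_cache") == ["weights", "kv_cache"]
# no tags at all -> wake everything
assert tags_from_query_string("") is None
```

On the client/test side, this matches passing `params={"tags": [...]}` to the HTTP client instead of `json=...`, as the comment suggests.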
Signed-off-by: Eric <erictang000@gmail.com>
…up_tags Signed-off-by: Eric <erictang000@gmail.com>
fixed!
Please fix the merge conflict
This pull request has merge conflicts that must be resolved before it can be merged.
…up_tags Signed-off-by: Eric <erictang000@gmail.com>
fixed!
Signed-off-by: Eric <erictang000@gmail.com>
Signed-off-by: Eric <erictang000@gmail.com> Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Signed-off-by: Eric <erictang000@gmail.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
Signed-off-by: Eric <erictang000@gmail.com>
This is a memory optimization method implemented based on this [fix](vllm-project/vllm#15500). I just successfully ran a 72B model on 8*H800 cards. Before the fix, I would encounter an OOM issue. Please note that this fix is only effective for vLLM >= 0.8.3.
Addresses #15254
Adds an optional `tags` parameter to all calls to `wake_up()` (for both online and offline mode). The previous behavior of calling `wake_up()` with no arguments remains unchanged (it reallocates both weights and kv_cache together), but the user now has the option to call `wake_up(tags=["weights"])` followed by `wake_up(tags=["kv_cache"])`, in order to support better weight updating for RLHF (more details in #15254).
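To see why staging the wake-up helps, consider peak GPU memory during an RLHF weight update. The toy model below is purely illustrative (the sizes are made-up numbers, not vLLM measurements): waking only the weights leaves headroom to stage the updated weights before the KV cache is reallocated, whereas a single `wake_up()` puts the update buffer on top of both.

```python
def peak_memory(staged: bool,
                weights: int = 40,
                kv_cache: int = 30,
                update_buffer: int = 20) -> int:
    """Toy peak-memory model of an RLHF weight update (arbitrary units)."""
    if staged:
        # wake_up(tags=["weights"]) -> copy in new weights -> free the
        # update buffer -> wake_up(tags=["kv_cache"]).
        # Peak is whichever phase is larger; the buffer and the KV cache
        # never coexist.
        return max(weights + update_buffer, weights + kv_cache)
    # A single wake_up() reallocates weights and KV cache together,
    # so the update buffer lands on top of both at once.
    return weights + kv_cache + update_buffer


assert peak_memory(staged=True) < peak_memory(staged=False)
```

With these example sizes the staged path peaks at 70 units versus 90 for the unstaged path, which is the kind of headroom the OOM reports above are about.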
Also fixes tests/entrypoints/openai/test_sleep.