[core] Add tags parameter to wake_up() #15500


Merged
merged 8 commits into vllm-project:main on Apr 2, 2025

Conversation

erictang000 (Contributor) commented Mar 25, 2025

Addresses #15254

Adds an optional tags parameter to all calls to wake_up() (for both online and offline mode). The previous behavior of calling wake_up() with no arguments is unchanged (it reallocates both the weights and the KV cache together), but the user can now call wake_up(tags=["weights"]) followed by wake_up(tags=["kv_cache"]) to support better weight updating for RLHF (more details in #15254).

Also fixes tests/entrypoints/openai/test_sleep.
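
For context, a minimal usage sketch of the staged wake-up this enables; the model choice and surrounding setup are illustrative, not taken from this PR:

```python
# Minimal sketch of the staged wake-up; model choice and setup are
# illustrative, the tag names follow the PR description.
from vllm import LLM

llm = LLM("meta-llama/Llama-3.2-1B", enable_sleep_mode=True)

# Release GPU memory, e.g. while an RLHF trainer updates the policy.
llm.sleep(level=1)

# Stage 1: restore only the weights so they can be updated in place
# before any KV cache memory is reserved.
llm.wake_up(tags=["weights"])
# ... push updated weights into the engine here ...

# Stage 2: reallocate the KV cache and resume generation.
llm.wake_up(tags=["kv_cache"])

# wake_up() with no arguments still restores everything at once.
```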

Signed-off-by: Eric <erictang000@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

comaniac (Collaborator) left a comment

Also cc @youkaichao

Signed-off-by: Eric <erictang000@gmail.com>
```diff
         logger.warning("Executor is not sleeping.")
         return
     time_before_wakeup = time.perf_counter()
-    self.collective_rpc("wake_up")
+    self.collective_rpc("wake_up", kwargs=dict(tags=tags))
     time_after_wakeup = time.perf_counter()
     self.is_sleeping = False
     logger.info("It took %.6f seconds to wake up.",
```
Member commented:

we should add the wake-up tags to the logging.

we should also track how many tags are sleeping / woken up, and set self.is_sleeping = False only after all tags are woken up.

since you cannot access the allocator in the executor, i'm fine with hard-coding sleeping_tags = ("weights", "kv_caches") when we call sleep()
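
A rough sketch of that bookkeeping (the class, attribute, and method shapes are assumptions for illustration, not the merged code):

```python
# Rough illustration of the suggested tag bookkeeping; names and method
# shapes are assumptions, not the merged implementation.
class ExecutorSleepMixin:
    def sleep(self, level: int = 1) -> None:
        self.collective_rpc("sleep", kwargs=dict(level=level))
        # The executor cannot query the allocator, so hard-code the full set.
        self.sleeping_tags = {"weights", "kv_caches"}
        self.is_sleeping = True

    def wake_up(self, tags: list[str] | None = None) -> None:
        if tags is None:
            tags = list(self.sleeping_tags)
        self.collective_rpc("wake_up", kwargs=dict(tags=tags))
        for tag in tags:
            self.sleeping_tags.discard(tag)
        # Report fully awake only once every sleeping tag has been woken.
        if not self.sleeping_tags:
            self.is_sleeping = False
```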

erictang000 (Contributor Author) commented:

added!

```python
@create_new_process_for_each_test()
@pytest.mark.parametrize("model, use_v1", [("meta-llama/Llama-3.2-1B", True),
                                           ("meta-llama/Llama-3.2-1B", False)])
def test_end_to_end_with_tags(monkeypatch: pytest.MonkeyPatch, model: str,
```
Member commented:

imo it's too heavy to test both w/ and w/o flags in the ci. let's remove this test and only test it in the api server then.

erictang000 (Contributor Author) commented:

moved the test logic under the existing test_end_to_end to avoid the reinitialization for now, if that's better? could also delete it entirely, but it may be good to keep a check that memory utilization looks correct after the wake_up(tags=["weights"]) call, since that's the core motivation for this PR.
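
For illustration, one hypothetical shape such a check could take (not the actual test code; assumes a sleeping llm engine as in the sketch above):

```python
# Hypothetical memory check, not the actual test: after waking only the
# weights, more GPU memory should remain free than after a full wake-up.
import torch

llm.wake_up(tags=["weights"])
free_after_weights, _ = torch.cuda.mem_get_info()

llm.wake_up(tags=["kv_cache"])
free_after_full, _ = torch.cuda.mem_get_info()

# The KV cache is only reallocated by the second call.
assert free_after_weights > free_after_full
```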

Signed-off-by: Eric <erictang000@gmail.com>

mergify bot commented Mar 27, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @erictang000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 27, 2025
…up_tags

Signed-off-by: Eric <erictang000@gmail.com>
@mergify mergify bot removed the needs-rebase label Mar 27, 2025
comaniac (Collaborator) left a comment

LGTM. Just nits. Leave to @youkaichao

@comaniac comaniac added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 27, 2025
Signed-off-by: Eric <erictang000@gmail.com>

mergify bot commented Mar 28, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @erictang000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 28, 2025
youkaichao (Member) left a comment

LGTM in general, thanks for adding the functionality! one comment: please use query_params instead of json, to keep consistent with the rest of the code.

in addition, update the tests to use params, as is done in #14373
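
For illustration, the endpoints could be exercised with query parameters rather than a JSON body along these lines (paths, port, and the exact parameter encoding are assumptions):

```python
# Illustrative only: call the sleep/wake_up endpoints with query
# parameters rather than a JSON body. Paths and port are assumptions.
import requests

base = "http://localhost:8000"

requests.post(f"{base}/sleep", params={"level": "1"})

# Wake the weights first, then the KV cache, via repeated "tags" params.
requests.post(f"{base}/wake_up", params={"tags": ["weights"]})
requests.post(f"{base}/wake_up", params={"tags": ["kv_cache"]})
```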

Signed-off-by: Eric <erictang000@gmail.com>
…up_tags

Signed-off-by: Eric <erictang000@gmail.com>
@mergify mergify bot removed the needs-rebase label Mar 31, 2025
erictang000 (Contributor Author) commented:

fixed!

DarkLight1337 (Member) commented:

Please fix the merge conflict


mergify bot commented Apr 1, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @erictang000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 1, 2025
…up_tags

Signed-off-by: Eric <erictang000@gmail.com>
@mergify mergify bot removed the needs-rebase label Apr 1, 2025
erictang000 (Contributor Author) commented:

fixed!

@vllm-bot vllm-bot merged commit ddb94c2 into vllm-project:main Apr 2, 2025
33 of 37 checks passed
StevenShi-23 pushed a commit to StevenShi-23/vllm that referenced this pull request Apr 3, 2025
Signed-off-by: Eric <erictang000@gmail.com>
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
Signed-off-by: Eric <erictang000@gmail.com>
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: Eric <erictang000@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
Signed-off-by: Eric <erictang000@gmail.com>
hiyouga pushed a commit to volcengine/verl that referenced this pull request Apr 10, 2025
This is a memory optimization implemented based on this [fix](vllm-project/vllm#15500). I successfully ran a 72B model on 8*H800 cards; before the fix, I would encounter an OOM issue. Note that this fix is only effective for vLLM >= 0.8.3.
zszheng pushed a commit to zszheng/verl that referenced this pull request Apr 11, 2025
yanfeng98 pushed a commit to yanfeng98/fork-verl that referenced this pull request Apr 11, 2025
wangyuchen333 pushed a commit to wangyuchen333/verl that referenced this pull request Apr 25, 2025
yhyang201 pushed a commit to yhyang201/verl that referenced this pull request Apr 26, 2025
Labels: frontend, ready, v1
5 participants