[V1] [Feature] Collective RPC #15444
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
I have verified that it passes the tests when tp-size=1. I may get access to a multi-GPU machine within several days; I can test it after that, or someone else could help verify it. Thanks!
cc @russellb for security
Thanks for this @wwl2755.

The `collective_rpc` changes look good, but we may want to look at the `cloudpickle` change more closely. IIRC `cloudpickle` is generally slower than `pickle`, so using it as the default here might have an impact on the performance of other aspects (I'm thinking about the multi-modal case in particular, where this path is used to transfer large data to the engine). One possibility is to have it configured via an env var so that it can be turned on if needed. Something like `VLLM_PICKLE` = `pickle`, `cloudpickle`, or `disabled`. And we may actually want to aim to make `disabled` the default for security reasons (though that will need modifying some built-in things, like the multi-modal case, to work without it).
Could you also remove the use of `VLLM_ENABLE_V1_MULTIPROCESSING=0` from the tests in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml, to make sure that they now pass without it?
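A minimal sketch of the env-var switch idea above, assuming a hypothetical `VLLM_PICKLE` variable and helper (neither exists in vLLM today):

```python
import os
import pickle

import cloudpickle


def get_object_serializer():
    """Pick (dumps, loads) based on the hypothetical VLLM_PICKLE env var.

    "pickle": fast, but only safe for trusted data.
    "cloudpickle": slower, but can serialize locally defined functions.
    "disabled": refuse to (de)serialize arbitrary objects at all.
    """
    mode = os.environ.get("VLLM_PICKLE", "pickle")
    if mode == "pickle":
        return pickle.dumps, pickle.loads
    if mode == "cloudpickle":
        return cloudpickle.dumps, cloudpickle.loads
    if mode == "disabled":
        def refuse(*_args, **_kwargs):
            raise RuntimeError(
                "Arbitrary object serialization is disabled (VLLM_PICKLE=disabled)")
        return refuse, refuse
    raise ValueError(f"Unknown VLLM_PICKLE value: {mode!r}")
```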
I'm OK with the change as it is here from a security perspective. Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.
Yes,
Maybe we could make
Sure. With the updated commit,
I'm not so familiar with the CI/CD build. After I have removed the use of
Are you also considering the latency effect? Otherwise, maybe we could come up with our own logic to do the serialization?
No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7. It's not just a concern with network communications; it's also a problem if used as a serialization format for sharing data. See this example that's a result of
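For context, here is a minimal, benign illustration of why unpickling untrusted bytes amounts to arbitrary code execution (the payload below only calls `print`, but it could be any callable):

```python
import pickle


class Payload:
    # pickle calls __reduce__ to decide how to serialize this object; on
    # pickle.loads(), the returned callable is invoked with the given args.
    def __reduce__(self):
        return (print, ("arbitrary code executed during pickle.loads()",))


blob = pickle.dumps(Payload())
pickle.loads(blob)  # the embedded callable runs here, before any type checks
```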
I see. I guess it's similar to an injection attack. Then, we should probably have a stricter format restriction on the function call, or make our own serialization method? One naive solution I can think of now is that we restrict the… Do you have any suggestions?
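For illustration of the "restrict the function call" idea, a naive allowlist guard might look like the sketch below; the method names and the `check_rpc_method` helper are made up, not part of vLLM:

```python
# Hypothetical allowlist of plain string method names that workers may execute.
ALLOWED_RPC_METHODS = {"get_cache_usage", "reset_prefix_cache"}


def check_rpc_method(method) -> None:
    # Reject arbitrary callables outright; only allowlisted names pass.
    if not isinstance(method, str) or method not in ALLOWED_RPC_METHODS:
        raise ValueError(f"RPC method not allowed: {method!r}")
```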
There's a tension here between the ultimate flexibility of pickle and providing a secure interface. I'm working on some improvements here, but I don't want to give too much detail because the details are a bit sensitive. For now, don't worry about it. |
I see. Thank you for commenting.
Besides the
The comments have been addressed in the latest commits. PTAL. Thank you! @njhill
Thanks @wwl2755
It seems weird, because I have tested on my local machine and it passed all the tests from
Retrying
This one is a legit failure: https://buildkite.com/vllm/ci/builds/16314#0195d722-84b7-4833-a6e3-734e8e657969/206-571. @wwl2755 perhaps you could add a third custom type in serial_utils.py, like this:

```python
# Imports and type codes are assumed to already exist in serial_utils.py
# (the constant values here are illustrative).
import pickle
from types import FunctionType
from typing import Any

import cloudpickle
import torch
from msgspec import msgpack

CUSTOM_TYPE_TENSOR = 1
CUSTOM_TYPE_PICKLE = 2
CUSTOM_TYPE_CLOUDPICKLE = 3


def custom_enc_hook(obj: Any) -> Any:
    if isinstance(obj, torch.Tensor):
        # NOTE(rob): it is fastest to use numpy + pickle
        # when serializing torch tensors.
        # https://gist.github.com/tlrmchlsmth/8067f1b24a82b6e2f90450e7764fa103 # noqa: E501
        return msgpack.Ext(CUSTOM_TYPE_TENSOR, pickle.dumps(obj.numpy()))
    if isinstance(obj, FunctionType):
        return msgpack.Ext(CUSTOM_TYPE_CLOUDPICKLE, cloudpickle.dumps(obj))
    return msgpack.Ext(CUSTOM_TYPE_PICKLE, pickle.dumps(obj))


def custom_ext_hook(code: int, data: memoryview) -> Any:
    if code == CUSTOM_TYPE_TENSOR:
        return torch.from_numpy(pickle.loads(data))
    if code == CUSTOM_TYPE_PICKLE:
        return pickle.loads(data)
    if code == CUSTOM_TYPE_CLOUDPICKLE:
        return cloudpickle.loads(data)
    raise NotImplementedError(f"Extension type code {code} is not supported")
```
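For reference, a quick round-trip check of how these hooks would be wired into msgspec's msgpack encoder/decoder (this assumes the `custom_enc_hook`/`custom_ext_hook` definitions and type codes from the snippet above):

```python
import torch
from msgspec import msgpack

encoder = msgpack.Encoder(enc_hook=custom_enc_hook)
decoder = msgpack.Decoder(ext_hook=custom_ext_hook)

# A locally defined lambda needs cloudpickle; the tensor takes the numpy path.
payload = {"tensor": torch.ones(2, 2), "fn": lambda x: x + 1}
restored = decoder.decode(encoder.encode(payload))

assert torch.equal(restored["tensor"], torch.ones(2, 2))
assert restored["fn"](1) == 2
```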
I have retried the test script again on my local machine.
After modifying it based on your suggestion, all tests pass. PTAL. Thanks! @njhill
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @njhill, I see that the CI failed in some kernel and TPU tests, which I don't think are relevant to this PR. Do these failures need to be addressed before merging?
@wwl2755 could you merge in the latest main branch one more time? It should address those failures. Thanks!
Updated.
Collective RPC is supported now in vLLM (nightly), so we no longer need to always set `VLLM_ENABLE_V1_MULTIPROCESSING=0` when we use vLLM V1. vllm-project/vllm#15444 But we still need to set that when we enable the full_determinism mode, as vLLM does not guarantee the reproducibility of the results by default, for the sake of performance. We need to turn off multiprocessing to make the scheduling deterministic. This is not needed and will be ignored for V0. Signed-off-by: Hollow Man <hollowman@opensuse.org>
FIX: #15430 and #15349.

Support `LLM` to directly use `collective_rpc()`, as mentioned in [Feature]: [V1] Collective RPC #15430 and [Bug]: LLM.collective_rpc is broken in v1 by default #15349. Use cloudpickle to make sure that local functions can still be serialized.
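As a rough usage sketch of what this enables (the model name is arbitrary, and `worker.rank` assumes the worker object exposes a `rank` attribute):

```python
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # any model; a tiny one for illustration


def echo_rank(worker, msg):
    # Runs on every worker process; with this PR the locally defined function
    # is serialized via cloudpickle before being sent to the engine workers.
    return f"{msg} from rank {worker.rank}"


# Returns one result per worker, e.g. ["hello from rank 0"].
print(llm.collective_rpc(echo_rank, args=("hello",)))
```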