[MISC] Dump model runner inputs when crashing #8305

comaniac · 2024-09-09T19:30:47Z

To make model runner crashes caused by illegal memory access (and possibly other errors) easier to reproduce, this PR introduces a utility that dumps the model runner inputs when a crash occurs. Since the model runner inputs may be large, I dump them using pickle for now. Suggestions and better ideas are welcome.

cc @robertgshaw2-neuralmagic @simon-mo @DarkLight1337
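
For readers who want a concrete picture of the idea, here is a minimal sketch of a "dump inputs on crash" wrapper. It is not the code from this PR: the decorator name `dump_inputs_on_error`, the `input_dump_path` attribute, the file naming, and the use of the system temp directory are all assumptions made for illustration.

```python
import functools
import os
import pickle
import tempfile
import time


def dump_inputs_on_error(func):
    """Pickle the call arguments to a file if ``func`` raises, then re-raise.

    Hypothetical helper for illustration; not the utility added by the PR.
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as err:
            # Assumed location and naming scheme for the dump file.
            stamp = time.strftime("%Y%m%d-%H%M%S")
            path = os.path.join(
                tempfile.gettempdir(),
                f"err_{func.__name__}_input_{stamp}.pkl",
            )
            try:
                with open(path, "wb") as f:
                    pickle.dump({"args": args, "kwargs": kwargs}, f)
                # Record where the inputs went so callers can log it.
                err.input_dump_path = path
            except Exception:
                # Dumping is best effort: never mask the original error.
                pass
            # Re-raise the original exception unchanged.
            raise

    return wrapper
```

A function decorated this way behaves normally on success. On failure, the original exception still propagates, and (if the dump succeeded) it carries the path of the pickle file so it can be attached to a bug report.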

github-actions · 2024-09-09T19:30:58Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR.
- Enable auto-merge.

🚀

vllm/worker/model_runner_base.py

DarkLight1337 · 2024-09-10T01:57:16Z

Can we add instructions in the GitHub issues template so users can share their logs upon encountering such errors?

comaniac · 2024-09-10T04:11:51Z

> Can we add instructions in the GitHub issues template so users can share their logs upon encountering such errors?

Good point. Will do.

youkaichao · 2024-09-10T07:05:39Z

do we need to add a flag for this? looks like some debugging feature that can also be added in https://docs.vllm.ai/en/latest/getting_started/debugging.html

robertgshaw2-redhat · 2024-09-10T07:07:18Z

> do we need to add a flag for this? looks like some debugging feature that can also be added in https://docs.vllm.ai/en/latest/getting_started/debugging.html

The goal is to be able to get logs from production usage to help track down hard-to-replicate bugs (like illegal memory access in prefix caching). So having a flag defeats the purpose.

youkaichao · 2024-09-10T07:10:28Z

makes sense then. please ignore my comment.

comaniac · 2024-09-10T18:20:19Z

@DarkLight1337 added to issue template. PTAL.

DarkLight1337

LGTM. We should be careful when loading untrusted pickle files though.

comaniac · 2024-09-11T01:09:07Z

It should be fine, since we never load it automatically? But yeah, you could get a virus if someone posts a malicious pickle file to an issue...
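
To illustrate the concern about untrusted pickle files, here is a sketch of the "restricting globals" pattern from the Python `pickle` documentation: an `Unpickler` subclass whose `find_class` only allows a small set of names. The allow-list and the `restricted_loads` helper are illustrative assumptions. A real dump would likely reference classes beyond builtins, so the allow-list would have to be extended before it could load one.

```python
import builtins
import io
import os
import pickle

# Assumed allow-list for illustration; real dumps would need more entries.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like ``pickle.loads`` but with the restricted global lookup above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers load fine because they need no global lookups.
plain = restricted_loads(pickle.dumps({"ids": [1, 2, 3]}))

# A pickle that references a function from the os module is rejected
# before anything is called.
try:
    restricted_loads(pickle.dumps(os.getcwd))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

This reduces, but does not eliminate, the risk of loading a file from an untrusted source. Loading such files only in a throwaway environment remains the safer habit.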

vllm-project#8305 was recently added to dump model runner inputs when encountering a fatal error. If this happens during decode, however, the dump includes the kvcache tensors, which are typically huge (~60GB in the case I was testing) and can therefore take minutes to write to disk. While this happens, the engine loop is blocked and health checks time out, causing the server to be killed. This change replaces kvcache tensors with their dtype and shape. With this, pickling takes under a second, and the file size in my test case was 7KB.
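
The replacement described above can be sketched roughly as follows. This is not the code from the follow-up change: `summarize_tensors` and the `FakeTensor` stand-in are hypothetical names, and the stand-in is used only so the example runs without a tensor library. Any object exposing `dtype` and `shape` is treated as a tensor here.

```python
import pickle
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a large tensor; only ``dtype`` and ``shape`` matter here."""

    dtype: str
    shape: tuple
    data: bytes = b""


def summarize_tensors(obj):
    """Recursively replace tensor-like objects with a small dtype/shape dict."""
    if hasattr(obj, "dtype") and hasattr(obj, "shape"):
        return {"dtype": str(obj.dtype), "shape": tuple(obj.shape)}
    if isinstance(obj, list):
        return [summarize_tensors(x) for x in obj]
    if isinstance(obj, tuple):
        return tuple(summarize_tensors(x) for x in obj)
    if isinstance(obj, dict):
        return {k: summarize_tensors(v) for k, v in obj.items()}
    return obj


# Four fake cache tensors, each carrying about 1 MB of payload.
kv_caches = [
    FakeTensor("float16", (2, 1024, 16, 128), data=bytes(1_000_000))
    for _ in range(4)
]

full_size = len(pickle.dumps(kv_caches))
small_size = len(pickle.dumps(summarize_tensors(kv_caches)))
```

Comparing `full_size` with `small_size` shows the same effect as in the description: the summary keeps enough information to see what the cache looked like, while the pickled size no longer depends on the tensor contents.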


Dump inputs when crash

d364a93

simon-mo reviewed Sep 9, 2024

vllm/worker/model_runner_base.py (outdated)

comments

e70f66d

doc

e974d89

DarkLight1337 approved these changes Sep 11, 2024

comaniac added the ready label Sep 11, 2024

comaniac enabled auto-merge (squash) September 11, 2024 15:59

comaniac merged commit a65cb16 into vllm-project:main Sep 12, 2024
67 of 68 checks passed

comaniac deleted the dump_inputs branch September 12, 2024 16:11

njhill mentioned this pull request Sep 16, 2024

[Misc] Don't dump contents of kvcache tensors on errors #8527

Merged

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[MISC] Dump model runner inputs when crashing (vllm-project#8305)

088630a

Signed-off-by: Alvant <alvasian@yandex.ru>

comaniac mentioned this pull request Jan 30, 2025

[MISC] Remove model input dumping when exception #12582

Merged

wallashss mentioned this pull request Feb 17, 2025

[Core][Feature] Input metadata dump on crash #13407

Open

LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025

[MISC] Dump model runner inputs when crashing (vllm-project#8305)

be636dc

Signed-off-by: LeiWang1999 <leiwang1999@outlook.com>
