
[VLM] Florence-2 supports online serving #16164


Merged
merged 3 commits into vllm-project:main from Isotr0py:florence-2-online
Apr 7, 2025

Conversation

@Isotr0py (Collaborator) commented Apr 7, 2025

Example command to launch the server:

vllm serve microsoft/Florence-2-large --tokenizer facebook/bart-large --trust-remote-code --chat-template examples/template_florence2.jinja

Inference:

from openai import OpenAI

# Client setup is assumed here (the usual vLLM OpenAI-compatible defaults);
# adjust base_url and image_url to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = "microsoft/Florence-2-large"
image_url = "https://example.com/image.jpg"  # any reachable image URL

chat_response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url,
                    },
                },
                {"type": "text", "text": "<DETAILED_CAPTION>"},
            ],
        }
    ],
)
print(chat_response.choices[0].message.content)

FIX #15968

Isotr0py added 2 commits April 6, 2025 17:28

github-actions bot commented Apr 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which executes a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@Isotr0py Isotr0py requested a review from DarkLight1337 April 7, 2025 07:18
@mergify mergify bot added the documentation (Improvements or additions to documentation) and frontend labels Apr 7, 2025
@DarkLight1337 (Member) left a comment

I can successfully run the model using the chat template and task token (which I have added to the PR description), thanks for working on this!

@DarkLight1337 (Member)

However it seems that test_common.py is now failing...

@DarkLight1337 (Member)

I suggest updating the chat template to handle the task token externally.

@Isotr0py (Collaborator, Author) commented Apr 7, 2025

However it seems that test_common.py is now failing...

Oh, it's just because tokenizer.encode was missing add_special_tokens=False. The common tests should pass now.
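
(For reference, a minimal sketch of the difference, assuming the facebook/bart-large tokenizer used above; this is illustrative, not the PR's actual code:)

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")

# By default, encode() wraps the input with BART's <s> ... </s> special tokens.
print(tokenizer.encode("<DETAILED_CAPTION>"))
# With add_special_tokens=False, only the ids of the text itself are returned.
print(tokenizer.encode("<DETAILED_CAPTION>", add_special_tokens=False))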

@DarkLight1337 (Member)

Can confirm, this should be good to go then

@DarkLight1337 DarkLight1337 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Apr 7, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 7, 2025 08:41
@vllm-bot vllm-bot merged commit 7c80368 into vllm-project:main Apr 7, 2025
44 of 49 checks passed
@Isotr0py Isotr0py deleted the florence-2-online branch April 7, 2025 13:37
lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 7, 2025
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
@PedroMiolaSilva

@Isotr0py nice, thanks a lot!

Just a quick question: I've been trying to use the Florence tasks that are supposed to return object positions (OD, OCR_WITH_REGION, etc.), and I'm getting an empty response, or just the object names with no box positions. Description tasks work fine.

This is how I'm using it:

docker run \
    --runtime nvidia \
    -e VLLM_USE_V1=0 \
    --gpus 0 \
    --ipc=host \
    -p "8000:8000" \
    --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
    -v "${HF_HOME}:/root/.cache/huggingface" \
    -v "$(pwd):/app" \
    vllm/vllm-openai:latest \
    --tensor-parallel-size 1 \
    --model microsoft/Florence-2-base \
    --tokenizer facebook/bart-large \
    --gpu-memory-utilization 0.95 \
    --trust-remote-code \
    --chat-template /app/template_florence2.jinja \
    --max-model-len 1024 \
    --max-num-seqs 8 \
    --dtype float16

With the following cURL to test it:

curl -X POST http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "microsoft/Florence-2-base",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://media.istockphoto.com/id/1152862811/pt/foto/bavarian-dance.jpg?s=1024x1024&w=is&k=20&c=ie6_xDrdmnkDd4Udqn8n2kP_xeRpjkGdTduvO3J4KT4="
                        }
                    },
                    { "type": "text", "text": "<OCR_WITH_REGION>" }
                ]
            }
        ]
    }'

And the response:

{
  "id": "chatcmpl-a9740a45bc1441dab29ad22fe4c1a791",
  "object": "chat.completion",
  "created": 1744725371,
  "model": "microsoft/Florence-2-base",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "iStockCredit: Foo TooEditorial use only1152862811",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 590,
    "total_tokens": 637,
    "completion_tokens": 47,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}

Is there anything I'm missing or doing wrong?

@Isotr0py (Collaborator, Author)

@PedroMiolaSilva Can you try using this tokenizer (Isotr0py/Florence-2-tokenizer) and adding skip_special_tokens=False to extra_body as a sampling parameter?
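
(For reference, a minimal sketch of that suggestion with the OpenAI Python client; the base_url, API key, and image URL below are placeholders, not values from this thread:)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_response = client.chat.completions.create(
    model="microsoft/Florence-2-base",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "<OCR_WITH_REGION>"},
            ],
        }
    ],
    # vLLM accepts extra sampling parameters through extra_body.
    extra_body={"skip_special_tokens": False},
)
print(chat_response.choices[0].message.content)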

@PedroMiolaSilva

@Isotr0py it worked, thanks a lot!

For this model, shouldn't the default value for skip_special_tokens be set to True?

@Isotr0py (Collaborator, Author)

For this model, shouldn't the default value for skip_special_tokens be set to True?

No, because Florence-2 uses special tokens like <loc_{x}> (x = 1, 2, ..., 999) to represent locations. If we set skip_special_tokens=True, these special tokens are skipped during tokenizer decoding, so you would get just the object names without their locations.
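
(A small, hypothetical sketch of recovering those coordinates from the decoded text; the sample output string below is made up:)

import re

# Florence-2 emits a label followed by four <loc_*> tokens per box,
# with coordinates quantized onto a grid scaled to the image size.
text = "person<loc_52><loc_333><loc_932><loc_774>"

coords = [int(v) for v in re.findall(r"<loc_(\d+)>", text)]
print(coords)  # [52, 333, 932, 774]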

@PedroMiolaSilva

I'm sorry, I meant set to False, because by default it is set to True, right?

@Isotr0py (Collaborator, Author)

because by default it is set to True, right?

Yes. It's set to True by default in SamplingParams.
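
(A quick way to verify this, assuming vllm is installed locally:)

from vllm import SamplingParams

# skip_special_tokens defaults to True, hence the extra_body override above.
print(SamplingParams().skip_special_tokens)  # True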

yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
Labels: documentation, frontend, ready

Successfully merging this pull request may close these issues:
[Bug]: Crashing server running Florence-2 when trying to call as multi modal