Hello everyone,
I have been experimenting with the llava-onevision model.
Converting the model to TRT-LLM and serving it with Triton Inference Server works well,
but I have the impression that I haven't configured the individual models in the pipeline optimally.
I ran quick benchmarks of the same model on vLLM and SGLang (the serving framework recommended in the LLaVA-OneVision documentation).
With no concurrency (single sequential requests), Triton is faster.
As concurrency increases, Triton still outperforms vLLM, but falls significantly behind SGLang.
I have tried various parameter combinations, especially around batch size and in-flight batching,
which improved inference times, but not enough to match SGLang.
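To make that concrete, the kind of sweep I mean looks roughly like the sketch below. The parameter names follow the tensorrtllm_backend config.pbtxt template I am using (exact names can differ between backend versions), and the values are purely illustrative, not a recommendation:

```python
from itertools import product

# Hypothetical sweep over the batching-related knobs of the tensorrt_llm model's
# config.pbtxt. Names follow the tensorrtllm_backend template; exact names and
# accepted values depend on the backend version -- treat this as a sketch only.
grid = {
    "max_batch_size": [8, 16, 32],
    "gpt_model_type": ["inflight_fused_batching"],            # vs. "V1" static batching
    "batch_scheduler_policy": ["max_utilization", "guaranteed_no_evict"],
    "max_queue_delay_microseconds": [0, 1000],
    "kv_cache_free_gpu_mem_fraction": [0.85, 0.90],
}

for combo in product(*grid.values()):
    settings = dict(zip(grid.keys(), combo))
    print(settings)  # each dict is one configuration to rebuild/relaunch Triton with
```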
Looking at the /metrics endpoint, everything seems fine.
The only oddity is that the cumulative queue time for the preprocessing and multimodal_encoders models is quite high,
though not high enough to fully explain the performance gap.
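For reference, this is roughly how I read the per-model queue time out of the Prometheus metrics; the port and the set of model names are assumptions about my particular ensemble:

```python
import re
import urllib.request

# Scrape Triton's Prometheus endpoint and report cumulative queue time per model.
# nv_inference_queue_duration_us and nv_inference_count are standard Triton metrics;
# the URL (default metrics port 8002) and model names are assumptions about this setup.
METRICS_URL = "http://localhost:8002/metrics"
text = urllib.request.urlopen(METRICS_URL).read().decode()

def metric(name, model):
    pattern = rf'{name}\{{[^}}]*model="{model}"[^}}]*\}}\s+([0-9.e+]+)'
    m = re.search(pattern, text)
    return float(m.group(1)) if m else 0.0

for model in ("preprocessing", "multimodal_encoders", "tensorrt_llm", "postprocessing"):
    queue_us = metric("nv_inference_queue_duration_us", model)
    count = metric("nv_inference_count", model)
    avg_ms = queue_us / count / 1000 if count else 0.0
    print(f"{model:22s} total queue {queue_us / 1e6:8.2f} s   avg {avg_ms:6.1f} ms/inference")
```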
For example, to complete 300 requests at a concurrency of 30,
Triton takes around 65 seconds, whereas SGLang takes about 50 seconds.
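The benchmark itself is nothing sophisticated; a minimal sketch of the measurement is below. The endpoint, model name, and payload fields are assumptions about my ensemble (a real multimodal request would also carry the image input, whose field name depends on the ensemble config):

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Minimal fixed-concurrency throughput test: 300 requests, ~30 in flight at a time.
# The generate endpoint and payload fields are assumptions about my ensemble;
# text-only here for brevity (the image input is omitted).
URL = "http://localhost:8000/v2/models/ensemble/generate"
PAYLOAD = {"text_input": "Describe the weather today.", "max_tokens": 128}
TOTAL, CONCURRENCY = 300, 30

def one_request(_):
    req = urllib.request.Request(
        URL,
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    list(pool.map(one_request, range(TOTAL)))
elapsed = time.perf_counter() - start
print(f"{TOTAL} requests @ concurrency {CONCURRENCY}: {elapsed:.1f} s "
      f"({TOTAL / elapsed:.1f} req/s)")
```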
My question is:
Since TRT-LLM as an inference backend appears to be faster than SGLang's engine
(as seen in the lower end-to-end latency on single inference requests),
can I expect to eventually match or beat SGLang at high concurrency by finding the right combination of parameters,
or does SGLang apply additional optimizations that make it inherently impossible to reach the same performance with Triton Inference Server?