TensorRT-LLM Backend with multiple LLMs (not to be confused with multi-model) #714

Open
dduck999 opened this issue Feb 27, 2025 · 5 comments

Comments

dduck999 commented Feb 27, 2025

I followed this guide: https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#quick-start

I got this working, but it only serves one model. It also breaks my understanding of the one-model-per-folder structure of the Triton model repository: an inflight-batching setup for a single model already consists of 4-5 model folders, so how would I add another model, which needs another 4-5 folders of its own, and how would those folders be named?

Is inflight batching just one option (although every example for Triton + the TRT-LLM backend seems to use it)? Or can what I want to do simply be accomplished by copying the engine files (rank0.engine, rank1.engine, etc.) into the model folders?

I've been stuck on this for a week now. Any hints/help/comments would be greatly appreciated!
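
For context, the working single-model repository I got from the quick start looks roughly like this (folder names as in the repo's all_models/inflight_batcher_llm example):

```
triton_model_repo/
├── ensemble/           # chains preprocessing -> tensorrt_llm -> postprocessing
├── preprocessing/      # tokenizer (Python backend)
├── tensorrt_llm/       # config.pbtxt plus the engine files (rank0.engine, ...)
├── tensorrt_llm_bls/   # optional BLS alternative to the ensemble
└── postprocessing/     # detokenizer (Python backend)
```

So "one model" already occupies five Triton model folders, which is exactly why I'm unsure how a second model is supposed to fit in.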

@adityarajsahu

Hi @dduck999, I am trying to deploy the Qwen2.5-0.5B-Instruct model on Triton using tensorrtllm_backend. I have converted the model to a TensorRT-LLM engine, but I am unable to deploy that engine on Triton. Could you please share the steps you used to deploy the engine, along with the Docker image and package versions?

@dduck999 (Author)

Hi @adityarajsahu, have you built the folder structure with the template config files as in the example, and then run the script that fills in those templates?
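
For reference, the flow from the quick start is roughly the following (a sketch only; the exact fill_template.py keys and paths vary between tensorrtllm_backend releases, so check the README of your version):

```bash
# Copy the inflight-batching example into a fresh model repository
cp -r tensorrtllm_backend/all_models/inflight_batcher_llm triton_model_repo

# Fill in the placeholders in config.pbtxt (keys and paths here are illustrative)
python3 tensorrtllm_backend/tools/fill_template.py -i \
    triton_model_repo/tensorrt_llm/config.pbtxt \
    triton_backend:tensorrtllm,engine_dir:/engines/qwen2.5-0.5b,triton_max_batch_size:64,decoupled_mode:False

# Launch Triton against the filled-in repository
python3 tensorrtllm_backend/scripts/launch_triton_server.py \
    --world_size 1 --model_repo triton_model_repo
```

All of this runs inside the Triton TRT-LLM container, e.g. nvcr.io/nvidia/tritonserver:&lt;xx.yy&gt;-trtllm-python-py3, where the tag has to match the TensorRT-LLM version you built the engine with.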

@adityarajsahu

Hey @dduck999, just found the issue, thanks.

dduck999 (Author) commented Mar 12, 2025

> Hey @dduck999, just found the issue, thanks.

@adityarajsahu great! If you ever find a way to host more than one model behind one Triton TRT-LLM endpoint, please share 😊

jadhosn commented Mar 24, 2025

> If you ever find a way to host more than one model behind one Triton TRT-LLM endpoint, please share 😊

Have you tried a custom Python BLS model or an ensemble? You can host any number of models behind a single endpoint that way.
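
A very rough sketch of what the BLS route could look like (tensor names such as model_name, input_ids, request_output_len and the downstream model names are placeholders you would adapt to your repository, and this assumes the downstream TRT-LLM models are not running in decoupled/streaming mode):

```python
# model.py for a Python-backend BLS model that routes each request to one of
# several TRT-LLM models living in the same Triton model repository.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Hypothetical extra input naming the downstream model to call,
            # e.g. b"qwen_tensorrt_llm" or b"llama_tensorrt_llm".
            selector = pb_utils.get_input_tensor_by_name(request, "model_name")
            target = selector.as_numpy().flatten()[0].decode()

            # Forward the relevant input tensors to the chosen model.
            infer_request = pb_utils.InferenceRequest(
                model_name=target,
                requested_output_names=["output_ids"],
                inputs=[
                    pb_utils.get_input_tensor_by_name(request, "input_ids"),
                    pb_utils.get_input_tensor_by_name(request, "request_output_len"),
                ],
            )
            infer_response = infer_request.exec()

            if infer_response.has_error():
                responses.append(pb_utils.InferenceResponse(
                    error=pb_utils.TritonError(infer_response.error().message())))
                continue

            output_ids = pb_utils.get_output_tensor_by_name(infer_response, "output_ids")
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_ids]))
        return responses
```

Clients then always call the router model, and which engine actually runs is just another input value; each model's preprocessing/engine/postprocessing folders only need unique model names inside the repository.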
