TensorRT-LLM Backend with multiple LLMs (not to be confused with multi-model) #714
Comments
Hi @dduck999, I am trying to deploy the Qwen2.5-0.5B-Instruct model on Triton using tensorrtllm_backend. I have converted the model to a TensorRT-LLM engine, but I am unable to deploy the engine on Triton. Can you please share the steps to deploy the engine on Triton, along with the Docker image and package versions used?
Hi @adityarajsahu, have you built the folder structure along with the template files as in the example? And have you run the script that fills in these files?
Hey @dduck999, I just found the issue, thanks.
@adityarajsahu great! If you ever find a way to host more than one model behind one Triton TRT-LLM endpoint, please share 😊
Have you tried a custom Python BLS or an ensemble? You can host N models behind a single ensemble endpoint.
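To make the ensemble idea concrete, here is one assumed layout (folder names are illustrative, modeled on the `all_models/inflight_batcher_llm` templates): because model names within a single Triton model repository must be unique, each LLM gets its own prefixed copy of the pipeline folders:

```
model_repo/
├── qwen_preprocessing/
├── qwen_tensorrt_llm/
├── qwen_postprocessing/
├── qwen_ensemble/
├── llama_preprocessing/
├── llama_tensorrt_llm/
├── llama_postprocessing/
└── llama_ensemble/
```

Each prefixed ensemble's `config.pbtxt` has to reference its matching prefixed sub-models in the `ensemble_scheduling` steps, and clients then pick the LLM by the ensemble name they call, e.g. `/v2/models/qwen_ensemble/...` vs. `/v2/models/llama_ensemble/...`.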
I followed this guide: https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#quick-start
I got this working, but it only serves one model. It also breaks my understanding of the one-model-per-folder structure of the Triton model repository: an inflight-batching deployment consists of 4-5 model folders, so how would I add another model, which needs another 4-5 folders of its own, and how should they be named?
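One way to answer the naming question is to replicate the template folder set once per LLM with a unique prefix. The sketch below (an assumption, not an official tool; `make_repo` and the model list are hypothetical names) only creates the empty skeleton; the `config.pbtxt` templates and the engine files still have to be copied in afterwards:

```python
import os

# The five template models shipped in tensorrtllm_backend's
# all_models/inflight_batcher_llm directory.
TEMPLATE_MODELS = [
    "preprocessing",
    "tensorrt_llm",
    "postprocessing",
    "tensorrt_llm_bls",
    "ensemble",
]

def make_repo(repo_root, llm_names):
    """Create a prefixed folder skeleton, one pipeline per LLM.

    Triton requires unique model names within a repository, so each
    template folder is duplicated under a per-LLM prefix.
    """
    for llm in llm_names:
        for tmpl in TEMPLATE_MODELS:
            # e.g. model_repo/qwen2.5-0.5b_tensorrt_llm/1/
            os.makedirs(os.path.join(repo_root, f"{llm}_{tmpl}", "1"),
                        exist_ok=True)

make_repo("model_repo", ["qwen2.5-0.5b", "second_llm"])
```

After filling in the configs, each LLM is addressed through its own prefixed ensemble (or BLS) model name on the same Triton server.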
Is inflight batching optional? (Every example for Triton + the TRT-LLM backend uses inflight batching!) Can what I want be accomplished simply by copying the engine files (rank0.engine, rank1.engine, etc.) into the model folders?
I've been stuck on this for a week now. Any hints, help, or comments would be appreciated!
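For reference, once the single-model setup from the quick-start guide is running, the ensemble can be exercised through the HTTP generate endpoint like this (prompt and token count are placeholders; a server must be listening on port 8000):

```
curl -X POST localhost:8000/v2/models/ensemble/generate \
  -d '{"text_input": "What is machine learning?", "max_tokens": 64, "bad_words": "", "stop_words": ""}'
```

With multiple LLMs in one repository, only the model name in the URL would change, e.g. `/v2/models/qwen_ensemble/generate`.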