NVIDIA says TensorRT-LLM delivers up to 8x higher performance for AI inferencing on its hardware.
As companies like d-Matrix squeeze into the lucrative artificial intelligence market with coveted inferencing infrastructure, AI leader NVIDIA today announced TensorRT-LLM, a software library designed to speed up large language model inference.
What is TensorRT-LLM?
TensorRT-LLM is an open-source library that runs on NVIDIA Tensor Core GPUs. It is designed to give developers a way to experiment with and optimize large language models, the bedrock of generative AI services like ChatGPT.
In particular, TensorRT-LLM covers inference, the stage at which a trained model applies what it has learned to connect concepts and make predictions about new inputs, and it handles defining, optimizing and executing LLMs. TensorRT-LLM aims…
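For a concrete sense of what "defining, optimizing and executing" an LLM looks like in practice, here is a minimal sketch using the high-level Python LLM API that recent TensorRT-LLM releases expose; the model name and sampling settings are illustrative placeholders, not part of NVIDIA's announcement.

```python
# A minimal sketch, assuming the high-level LLM API shipped in
# recent tensorrt_llm releases. The model name is a placeholder;
# any supported Hugging Face checkpoint could stand in here.
from tensorrt_llm import LLM, SamplingParams

# Loading the model compiles it into an optimized TensorRT engine
# for the local GPU, which is where the advertised speedups come from.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

prompts = ["What is AI inference?"]
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Run optimized inference and print the generated text.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```

The API keeps engine building and execution behind a single object, so developers describe the model and sampling behavior while the library handles the GPU-specific optimization.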