NVIDIA Boosts LLM Inference Performance With New TensorRT-LLM Software Library

TensorRT-LLM can provide up to 8x higher performance for AI inferencing on NVIDIA hardware.

An illustration of LLM inferencing. Image credit: NVIDIA

As companies like d-Matrix squeeze into the lucrative artificial intelligence market with coveted inferencing infrastructure, AI leader NVIDIA today announced TensorRT-LLM, a software library designed to speed up LLM inference on its hardware.


What is TensorRT-LLM?

TensorRT-LLM is an open-source library that runs on NVIDIA Tensor Core GPUs. It is designed to give developers a space to experiment with building and optimizing new large language models, the bedrock of generative AI tools like ChatGPT.

In particular, TensorRT-LLM covers inference, the stage at which a trained model applies what it has learned to connect concepts and make predictions, along with defining, optimizing and executing LLMs. TensorRT-LLM aims…
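The announcement focuses on what the library does rather than how it is called, but for a sense of the developer workflow, here is a minimal sketch using the high-level Python LLM API that ships with recent TensorRT-LLM releases. The model name, prompt and sampling settings below are illustrative assumptions, not details from NVIDIA's announcement.

```python
# Minimal sketch of text generation with TensorRT-LLM's high-level Python API.
# Assumes TensorRT-LLM is installed on a machine with an NVIDIA Tensor Core GPU;
# the model name and sampling values are illustrative, not from the announcement.
from tensorrt_llm import LLM, SamplingParams

# Build (or load a cached) TensorRT engine for a Hugging Face checkpoint.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Standard decoding knobs: temperature, nucleus sampling, output length cap.
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["Explain what LLM inference is in one sentence."]
for output in llm.generate(prompts, sampling):
    # Each result carries the prompt and one or more generated completions.
    print(output.outputs[0].text)
```

Under the hood, the library compiles the model into an optimized TensorRT engine for the target GPU before serving requests, which is where the claimed inference speedups come from.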


