Following the introduction of Copilot, its AI assistant for Windows 11, Microsoft is once again pushing generative AI deeper into Windows. At the ongoing Ignite 2023 developer conference in Seattle, the company announced a partnership with Nvidia on TensorRT-LLM, which promises to improve the user experience on Windows desktops and laptops with RTX GPUs.
The new release is set to add support for more large language models, making demanding AI workloads more accessible. Particularly notable is its compatibility with OpenAI’s Chat API, which enables models to run locally (rather than in the cloud) on PCs and workstations with RTX GPUs that have at least 8GB of VRAM.
Nvidia’s TensorRT-LLM library was released just last month and is designed to improve the performance of large language models (LLMs) using the Tensor Cores on RTX graphics cards. It provides Python APIs that let developers define LLMs and build fast TensorRT engines without deep knowledge of C++ or CUDA.
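As an illustration, the high-level Python API in recent TensorRT-LLM releases lets a developer load a model and generate text in a few lines. The sketch below is just that, a sketch: the entry points shown here and the model identifier are examples, and the exact API surface varies between TensorRT-LLM versions.

```python
# A minimal sketch of TensorRT-LLM's high-level Python API.
# The class names and the model identifier below are illustrative;
# exact entry points vary between TensorRT-LLM releases.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Engine building happens under the hood when the model loads,
    # so no hand-written C++ or CUDA is required.
    llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
    params = SamplingParams(max_tokens=64, temperature=0.8)
    for output in llm.generate(["What does TensorRT-LLM do?"], params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```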
With the release of TensorRT-LLM v0.6.0, navigating the complexities of custom generative AI projects should become simpler thanks to the introduction of AI Workbench. This integrated toolkit facilitates the quick creation, testing, and optimization of pre-trained generative AI models and LLMs. The platform is also expected to help developers streamline collaboration and deployment while ensuring efficient and scalable model development.
Recognizing the importance of supporting AI developers, Nvidia and Microsoft are also releasing DirectML enhancements. These optimizations accelerate foundational AI models like Llama 2 and Stable Diffusion, giving developers increased options for cross-vendor deployment and setting new standards for performance.
The TensorRT-LLM update also promises substantial improvements in inference performance, with speeds up to five times faster. It adds support for more popular LLMs, including Mistral 7B and Nemotron-3 8B, and brings fast, accurate local LLM inference to a wider range of portable Windows devices.
The integration of TensorRT-LLM for Windows with OpenAI’s Chat API through a new wrapper will allow hundreds of AI-powered projects and applications to run natively on RTX-equipped PCs. This could eliminate the need to rely on cloud services and keep private and proprietary data secure on Windows 11 PCs.
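In practice, an OpenAI-compatible local endpoint means existing code built on the official OpenAI Python client can be redirected with a single configuration change. The sketch below assumes the wrapper serves an OpenAI-style API at a local address; the URL, port, and model name are placeholders, not values from the announcement.

```python
# Hypothetical sketch: pointing the standard OpenAI Python client at a
# local, OpenAI-compatible endpoint instead of the hosted cloud service.
# The base_url and model name are placeholders, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local TensorRT-LLM endpoint
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-llm",  # placeholder identifier for the locally served model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize TensorRT-LLM in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, applications written against the cloud API could switch to local inference without restructuring their request or response handling.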
AI on Windows 11 PCs still has a long way to go. With AI models becoming increasingly available and developers continuing to innovate, harnessing the power of Nvidia’s RTX GPUs could be a game-changer. However, it’s too early to say whether this will be the final piece of the puzzle Microsoft needs to fully unlock the capabilities of AI on Windows PCs.