VMware Inc. (NYSE: VMW) and NVIDIA (NASDAQ: NVDA) today announced the expansion of their strategic partnership to ready the hundreds of thousands of enterprises that run on VMware’s cloud infrastructure for the era of generative AI.
With VMware Private AI Foundation with NVIDIA, enterprises will be able to deploy generative AI applications such as chatbots, virtual assistants, intelligent search, and summarization tools. The platform will be a turnkey, AI-optimized solution built on VMware Cloud Foundation and featuring NVIDIA accelerated computing and generative AI software.
VMware CEO Raghu Raghuram said, “Generative AI and multi-cloud are a perfect match. Customer data is everywhere: in the cloud, at the edge, and in their own data centers. Together with NVIDIA, we’ll address enterprises’ concerns about data privacy, security, and control so they can confidently run generative AI workloads adjacent to their data.”
“Enterprises everywhere are racing to integrate generative AI into their businesses,” said NVIDIA founder and CEO Jensen Huang. “Our expanded collaboration with VMware will offer hundreds of thousands of customers in industries such as finance, healthcare, and manufacturing the full stack of software and computing they need to build custom applications with their own data.”
Full-Stack Computing to Supercharge Generative AI
To realize business benefits faster, enterprises are looking to streamline the development, testing, and deployment of generative AI applications. McKinsey estimates that generative AI could add up to $4.4 trillion annually to the global economy.(1)
VMware Private AI Foundation with NVIDIA will help enterprises capture this opportunity by customizing large language models; producing more secure and private models for internal use; offering generative AI as a service to their users; and more securely running inference workloads at scale.
The platform will feature integrated AI tools that help enterprises run proven models trained on their private data in a cost-efficient manner. Built on VMware Cloud Foundation and NVIDIA AI Enterprise, the platform is expected to deliver the following benefits:
- Privacy: Enterprises will be able to run AI services adjacent to their data with an architecture that preserves the privacy of sensitive information and restricts access to authorized users.
- Choice: Enterprises will have broad choice in where to build and run their models, from NVIDIA NeMo™ to Llama 2 and beyond, including leading OEM hardware configurations and, in the future, public cloud and service provider offerings.
- Performance: Recent industry benchmarks have shown NVIDIA-accelerated infrastructure delivering performance equal to, and in some cases exceeding, bare metal.
- Data Center Scale: GPU scaling optimizations in virtualized environments will let AI workloads scale across up to 16 vGPUs/GPUs in a single virtual machine and across multiple nodes, speeding the fine-tuning and deployment of generative AI models (see the sketch after this list).
- Lower Cost: The platform will pool compute resources across CPUs, GPUs, and DPUs and maximize their utilization, reducing overall costs and creating a shared resource environment that all teams can use efficiently.
- Accelerated Storage: VMware vSAN Express Storage Architecture will provide performance-optimized NVMe storage and support GPUDirect® storage over RDMA, enabling direct I/O transfer from storage to GPUs without CPU involvement.
- Accelerated Networking: Deep integration between vSphere and NVIDIA NVSwitch™ technology will enable multi-GPU models to run without inter-GPU bottlenecks.
- Rapid Deployment and Time to Value: vSphere Deep Learning VM images and an image repository will provide a stable turnkey solution image with pre-installed frameworks and performance-optimized libraries.
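To make the data center scale item above concrete, here is a minimal, hypothetical sketch of how a fine-tuning job could spread across all GPUs exposed to a single virtual machine using PyTorch’s DistributedDataParallel. It is not from the announcement; the placeholder model and launch parameters are illustrative assumptions.

```python
# Hypothetical sketch: scaling a training step across all GPUs visible
# inside one VM (e.g., up to 16 vGPUs), launched with:
#   torchrun --nproc_per_node=16 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE for each worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a real job would load an LLM checkpoint here.
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    for _ in range(10):  # stand-in for the real training loop
        x = torch.randn(8, 4096, device=local_rank)
        loss = ddp_model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across the GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```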
The platform will include NVIDIA NeMo, an end-to-end, cloud-native framework included in NVIDIA AI Enterprise, the operating system of the NVIDIA AI platform, that enables enterprises of virtually any size to build, customize, and deploy generative AI models anywhere. NeMo combines customization frameworks, guardrail toolkits, data curation tools, and pretrained models to give enterprises an easy, cost-effective, and fast way to adopt generative AI.
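As one illustrative piece of that toolkit, the sketch below shows how NVIDIA’s open-source NeMo Guardrails library is typically driven from Python. The config directory and the user message are assumptions made up for illustration, not part of the announcement.

```python
# Illustrative sketch of NeMo Guardrails (pip install nemoguardrails).
# Assumes a local "config/" directory containing a config.yml that names
# an LLM provider plus Colang files defining the rails (hypothetical here).
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# The rails intercept the conversation, applying the configured guardrails
# before and after the underlying model generates a response.
response = rails.generate(messages=[
    {"role": "user", "content": "Summarize our Q3 sales figures."}
])
print(response["content"])
```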
To deploy generative AI in production, NeMo uses TensorRT for Large Language Models (TRT-LLM), which accelerates and optimizes inference performance on the latest LLMs on NVIDIA GPUs. With NeMo, VMware Private AI Foundation with NVIDIA will enable enterprises to pull in their own data to build and run custom generative AI models on VMware’s hybrid cloud infrastructure.
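For a sense of what TRT-LLM inference looks like in practice, here is a minimal sketch using the high-level Python API found in recent TensorRT-LLM releases; whether that API applies to a given deployment, along with the model name and sampling settings below, are assumptions for illustration only.

```python
# Minimal sketch of inference with TensorRT-LLM's high-level Python API
# (available in recent releases). Model name and sampling settings are
# illustrative assumptions, not part of the announcement.
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) a TensorRT engine for the model, then serves generation.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

params = SamplingParams(temperature=0.7, max_tokens=128)
for output in llm.generate(["What is VMware Cloud Foundation?"], params):
    print(output.outputs[0].text)
```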
At VMware Explore 2023, NVIDIA and VMware will demonstrate how enterprise developers can use the new NVIDIA AI Workbench to pull community models, such as Llama 2 available on Hugging Face, customize them remotely, and deploy production-grade generative AI in VMware environments. VMware Private AI Foundation with NVIDIA has broad ecosystem support: Dell Technologies, Hewlett Packard Enterprise, and Lenovo will be among the first to offer systems that supercharge enterprise LLM customization and inference workloads with NVIDIA L40S GPUs, NVIDIA BlueField®-3 DPUs, and NVIDIA ConnectX®-7 SmartNICs.
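Before turning to the supporting hardware, here is a minimal sketch of the first step in the community-model workflow the demo describes: pulling Llama 2 from Hugging Face with the transformers library. The checkpoint name is real but gated, so access must be requested from Meta on huggingface.co first; the prompt is an illustrative assumption.

```python
# Sketch of pulling the Llama 2 community model from Hugging Face with the
# transformers library, as a starting point for customization. The checkpoint
# is gated: you must accept Meta's license on huggingface.co before download.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Explain generative AI in one sentence.", return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```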
When compared to the NVIDIA A100 Tensor Core GPU, the NVIDIA L40S GPU delivers up to 1.2x faster generative AI inference performance and up to 1.7x faster training performance.
NVIDIA BlueField-3 DPUs accelerate, offload, and isolate the tremendous compute load of virtualization, networking, storage, security, and other cloud-native AI services from the GPU or CPU.
NVIDIA ConnectX-7 SmartNICs deliver smart, accelerated networking for data center infrastructure, boosting some of the world’s most demanding AI workloads.
VMware Private AI Foundation with NVIDIA builds on the companies’ decade-long partnership. Their joint engineering work has already optimized VMware’s cloud infrastructure to run NVIDIA AI Enterprise with performance comparable to bare metal. Mutual customers further benefit from the resource and infrastructure management flexibility that VMware Cloud Foundation provides.