Microsoft Azure delivers game-changing performance for generative AI inference


Microsoft Azure has delivered industry-leading results for AI inference workloads among cloud service providers in the most recent MLPerf Inference results published by MLCommons. The Azure results were achieved using the new NC H100 v5 series virtual machines (VMs) powered by NVIDIA H100 NVL Tensor Core GPUs, and they reinforce Azure’s commitment to designing AI infrastructure that is optimized for training and inferencing in the cloud.

The evolution of generative AI models

Models for generative AI are rapidly expanding in size and complexity, reflecting a prevailing trend in the industry toward ever-larger architectures. Industry-standard benchmarks and cloud-native workloads consistently push the boundaries, with models now reaching billions and even trillions of parameters. A prime example of this trend is the recent unveiling of Llama 2, which has 70 billion parameters, making it MLPerf’s most significant test of generative AI to date (figure 1). The leap in model size is evident when comparing it to previous industry standards such as the large language model GPT-J, which has roughly 10x fewer parameters. This growth underscores the evolving demands and ambitions within the AI industry, as customers strive to tackle increasingly complex tasks and generate more sophisticated outputs.

Tailored specifically to address the dense or generative inferencing needs of models like Llama 2, the Azure NC H100 v5 VMs mark a significant leap forward in performance for generative AI applications. Their purpose-driven design ensures optimized performance, making them an ideal choice for organizations seeking to harness the power of AI with reliability and efficiency. With the NC H100 v5-series, customers can expect enhanced capabilities from their AI infrastructure, empowering them to tackle complex tasks with ease and efficiency.

Figure 1: Evolution of the size of the models in the MLPerf Inference benchmarking suite, which has increased to up to 70 billion parameters.

However, the transition to larger model sizes necessitates a shift toward a different class of hardware that is capable of accommodating the large models on fewer GPUs. This paradigm shift presents a unique opportunity for high-end systems, highlighting the capabilities of advanced solutions like the NC H100 v5 series. As the industry continues to embrace the era of mega-models, the NC H100 v5 series stands ready to meet the challenges of tomorrow’s AI workloads, offering unparalleled performance and scalability in the face of ever-expanding model sizes.



Enhanced performance with purpose-built AI infrastructure

The NC H100 v5-series shines with purpose-built infrastructure, featuring a superior hardware configuration that yields remarkable performance gains compared to its predecessors. Each GPU within this series is equipped with 94GB of HBM3 memory. This increase in memory capacity and bandwidth translates to a 17.5% boost in memory size and a 64% boost in memory bandwidth over the previous generations. Powered by NVIDIA H100 NVL PCIe GPUs and 4th-generation AMD EPYC™ Genoa processors, these virtual machines feature up to 2 GPUs, alongside up to 96 non-multithreaded AMD EPYC Genoa processor cores and 640 GiB of system memory.

In today’s announcement from MLCommons, the NC H100 v5 series premiered performance results in the MLPerf Inference v4.0 benchmark suite. Noteworthy among these achievements is a 46% performance gain over competing products equipped with GPUs with 80GB of memory (figure 2), based solely on the 17.5% increase in memory size (94GB) of the NC H100 v5-series. This leap in performance is attributed to the series’ ability to fit the large models into fewer GPUs efficiently. For smaller models like GPT-J with 6 billion parameters, there is a notable 1.6x speedup from the previous generation (NC A100 v4) to the new NC H100 v5. This enhancement is particularly advantageous for customers with dense inferencing jobs, as it enables them to run multiple tasks in parallel with greater speed and efficiency while using fewer resources.

Figure 2: Azure results on the model Llama 2 (70 billion parameters) from MLPerf Inference v4.0 in March 2024 (4.0-0004) and (4.0-0068).
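A rough back-of-the-envelope sketch may help show why per-GPU memory matters for a 70-billion-parameter model. It checks the 17.5% memory increase quoted above (94GB against 80GB) and estimates how much GPU memory remains after loading the weights. The bytes-per-parameter values (2 for 16-bit and 1 for 8-bit weights), the GPU counts, and all function names are illustrative assumptions; the precision and parallelism used in the MLPerf submissions are not stated here.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def headroom_gb(params_billion: float, bytes_per_param: float,
                gpu_memory_gb: float, num_gpus: int) -> float:
    """Memory left for activations and key-value cache after the weights.

    A negative value means the weights alone do not fit.
    """
    total = gpu_memory_gb * num_gpus
    return total - weight_footprint_gb(params_billion, bytes_per_param)


if __name__ == "__main__":
    # Memory size increase quoted in the article: 94GB versus 80GB per GPU.
    print(f"memory increase: {percent_increase(94, 80):.1f}%")

    # Assumed scenarios: (label, bytes per parameter, number of GPUs).
    scenarios = [("16-bit weights, 2 GPUs", 2, 2),
                 ("8-bit weights, 1 GPU", 1, 1)]
    for label, bytes_per_param, num_gpus in scenarios:
        for gpu_gb in (80, 94):
            left = headroom_gb(70, bytes_per_param, gpu_gb, num_gpus)
            print(f"{label}, {gpu_gb}GB per GPU: {left:.0f} GB left after weights")
```

Under these assumptions, the extra 14GB per GPU goes entirely to working memory rather than weights, which is one plausible reading of how a modest increase in memory size can yield a larger throughput gain.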

Performance delivering a competitive edge

The increase in performance is important not just compared to previous generations of comparable infrastructure solutions. In the MLPerf benchmark results, the Azure NC H100 v5 series virtual machines also stand out compared to other cloud computing submissions. Notably, when compared to cloud offerings with smaller memory capacities per accelerator, such as those with 16GB of memory per accelerator, the NC H100 v5 series VMs exhibit a substantial performance boost. With nearly six times the memory per accelerator, Azure’s purpose-built AI infrastructure series demonstrates a performance speedup of 8.6x to 11.6x (figure 3). This represents a performance increase of roughly 50% to 100% for every byte of GPU memory, showcasing the unparalleled capacity of the NC H100 v5 series. These results underscore the series’ ability to lead the performance standards in cloud computing, offering organizations a robust solution to address their evolving computational requirements.

Figure 3: Performance results on the model GPT-J (6 billion parameters) from MLPerf Inference v4.0 in March 2024 on Azure NC H100 v5 (4.0-0004) and an offering with 16GB of memory per accelerator (4.0-0045), with one accelerator each. The throughput of the Azure NC H100 v5 virtual machine is up to 11.6 times higher than that of its equivalent with 16GB of memory per GPU.
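The per-byte figure quoted above can be reproduced with simple arithmetic: divide the measured speedup by the ratio of memory per accelerator. The short sketch below does this using only the numbers given in the text (94GB, 16GB, 8.6x, 11.6x); the function names are illustrative.

```python
def memory_ratio(gpu_memory_gb: float, baseline_memory_gb: float) -> float:
    """How many times more memory one accelerator has than the baseline."""
    return gpu_memory_gb / baseline_memory_gb


def per_byte_gain(speedup: float, mem_ratio: float) -> float:
    """Speedup normalized by the memory ratio (1.0 means no per-byte gain)."""
    return speedup / mem_ratio


if __name__ == "__main__":
    ratio = memory_ratio(94, 16)  # "nearly six times" the memory
    print(f"memory ratio: {ratio:.3f}x")
    for speedup in (8.6, 11.6):
        gain = per_byte_gain(speedup, ratio)
        print(f"{speedup}x speedup -> {(gain - 1) * 100:.0f}% more performance per byte")
```

The results come out to about 46% and 97%, which the text rounds to 50% and 100%.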

In conclusion, the launch of the NC H100 v5 series marks a significant milestone in Azure’s relentless pursuit of innovation in cloud computing. With its outstanding performance, advanced hardware capabilities, and seamless integration with Azure’s ecosystem, the NC H100 v5 series is revolutionizing the landscape of AI infrastructure, enabling organizations to fully leverage the potential of generative AI inference workloads. The latest MLPerf Inference v4.0 results underscore the NC H100 v5 series’ unparalleled capacity to excel in the most demanding AI workloads, setting a new standard for performance in the industry. With its exceptional performance metrics and enhanced efficiency, the NC H100 v5 series reaffirms its position as a frontrunner in the realm of AI infrastructure, empowering organizations to unlock new possibilities and achieve greater success in their AI initiatives. Furthermore, Microsoft’s commitment, as announced during the NVIDIA GPU Technology Conference (GTC), to continue innovating by introducing even more powerful GPUs to the cloud, such as the NVIDIA Grace Blackwell GB200 Tensor Core GPUs, further enhances the prospects for advancing AI capabilities and driving transformative change in the cloud computing landscape.

Learn more about Azure generative AI


