MLPerf 3.1 adds large language model benchmarks for inference

September 11, 2023 1:01 PM

MLCommons is expanding its suite of MLPerf AI benchmarks. It is adding tests of large language models (LLMs) for inference, along with a new benchmark that measures the performance of storage systems for machine learning (ML) workloads.

MLCommons is a vendor-neutral, multi-stakeholder organization. It aims to provide a level playing field on which vendors can report different aspects of AI performance using the MLPerf set of benchmarks. The new MLPerf Inference 3.1 benchmarks released today are the second major update of the results this year, following the 3.0 results that came out in April. The MLPerf 3.1 benchmarks include a large set of data, with more than 13,500 performance results.

Submitters include: ASUSTeK, Azure, cTuning, Connect Tech, Dell, Fujitsu, Giga Computing, Google, H3C, HPE, IEI, Intel, Intel-Habana-Labs, Krai, Lenovo, Moffett, Neural Magic, Nvidia, Nutanix, Oracle, Qualcomm, Quanta Cloud Technology, SiMA, Supermicro, TTA and xFusion.

Continued performance improvement

A common theme across MLPerf benchmarks is that each update brings continued performance improvement for vendors, and the MLPerf 3.1 Inference results follow that pattern. The inference benchmarks cover multiple types of tests and configurations. Even so, MLCommons founder and executive director David Kanter said in a press briefing that many submitters improved their performance by 20% or more over the 3.0 benchmark.

Beyond continued performance gains, MLPerf continues to expand its scope with the 3.1 inference benchmarks.

“We’re evolving the benchmark suite to reflect what’s going on,” he said. “Our LLM benchmark is brand new this quarter and really reflects the explosion of generative AI large language models.”

What the new MLPerf Inference 3.1 LLM benchmarks are all about

This isn’t the first time MLCommons has attempted to benchmark LLM performance.

Back in June, the MLPerf 3.0 Training benchmarks added LLMs for the first time. Training LLMs, however, is a very different task from running inference.

“One of the critical differences is that for inference, the LLM is fundamentally performing a generative task as it’s writing multiple sentences,” Kanter said.

The MLPerf Inference benchmark for LLMs uses the 6-billion-parameter GPT-J 6B model to perform text summarization on the CNN/Daily Mail dataset. Kanter emphasized that the MLPerf training benchmark focuses on very large foundation models. By contrast, the task MLPerf performs in the inference benchmark represents a wider set of use cases that more organizations can deploy.

“Many folks simply don’t have the compute or the data to support a really large model,” said Kanter. “The actual task we’re performing with our inference benchmark is text summarization.”

Inference isn’t just about GPUs, at least according to Intel

High-end GPU accelerators often top the MLPerf rankings for training and inference. According to Intel, however, those headline numbers are not what every organization is looking for.

Intel silicon is well represented in MLPerf Inference 3.1, with results submitted for Habana Gaudi accelerators, 4th Gen Intel Xeon Scalable processors and Intel Xeon CPU Max Series processors. According to Intel, the 4th Gen Intel Xeon Scalable processor performed well on the GPT-J news summarization task, summarizing one paragraph per second in real-time server mode.

VentureBeat asked a question during the Q&A portion of the MLCommons press briefing. In response, Jordan Plawner, Intel’s senior director of AI products, said that organizations have diverse needs for inference.

“At the end of the day, enterprises, businesses and organizations need to deploy AI in production and that clearly needs to be done in all kinds of compute,” said Plawner. “To have so many representatives of both software and hardware showing that it [inference] can be run in all kinds of compute is really a leading indicator of where the market goes next, which is now scaling out AI models, not just building them.”

Nvidia claims Grace Hopper MLPerf Inference gains, with more to come

While Intel is keen to show that CPUs are valuable for inference, GPUs from Nvidia are well represented in the MLPerf Inference 3.1 benchmarks.

MLPerf Inference 3.1 marks the first time Nvidia’s GH200 Grace Hopper Superchip has been included. The Grace Hopper Superchip pairs an Nvidia CPU with a GPU to optimize AI workloads.

“Grace Hopper made a very strong first showing delivering up to 17% more performance versus our H100 GPU submissions, which we’re already delivering across the board leadership,” Dave Salvator, director of AI at Nvidia, said during a press briefing.

Grace Hopper is intended for the largest and most demanding workloads, but that is not the only market Nvidia is targeting. Salvator also highlighted the MLPerf Inference 3.1 results for Nvidia’s L4 GPUs.

“L4 also had a very strong showing up to 6x more performance versus the best x86 CPUs submitted this round,” he said.

Copyright for syndicated content belongs to the linked source: VentureBeat – https://venturebeat.com/ai/mlperf-3-1-adds-large-language-model-benchmarks-for-inference/