On April 2, 2025, MLCommons announced the results of its MLPerf Inference v5.0 benchmark suite, showcasing significant advancements in machine learning system performance. This latest round highlights the growing focus on generative AI scenarios, driven by hardware and software innovations that have dramatically improved performance over the past year.

The MLPerf Inference v5.0 results reveal a substantial increase in submissions for generative AI benchmarks, particularly the Llama 2 70B test. This benchmark, which implements a large generative AI inference workload, saw a 2.5x increase in submissions compared to the previous year. Its performance results also improved markedly: the median submitted score doubled, and the best score was 3.3x higher than in Inference v4.0.
David Kanter, head of MLPerf at MLCommons, commented, “It’s clear now that much of the ecosystem is focused squarely on deploying generative AI, and that the performance benchmarking feedback loop is working. We’re seeing an unprecedented flood of new generations of accelerators paired with innovative software techniques, setting new records for generative AI inference performance.”
MLPerf Inference v5.0 introduces four new benchmarks, reflecting the rapid advancements in the AI community:
Llama 3.1 405B: A new benchmark utilizing a model with 405 billion parameters, supporting input and output lengths up to 128,000 tokens. This benchmark tests general question-answering, math, and code generation tasks.
Llama 2 70B Interactive: An extension of the Llama 2 70B benchmark with low-latency requirements, designed for interactive chatbots and next-generation reasoning systems.
RGAT: A new datacenter benchmark implementing a graph neural network (GNN) model for applications such as recommendation systems and fraud detection.
Automotive PointPainting: An edge benchmark for 3D object detection in camera feeds, relevant to self-driving cars and other automotive applications.
This round of MLPerf Inference includes 17,457 performance results from 23 submitting organizations, among them AMD, Google, Intel, and NVIDIA. The benchmark suite continues to provide critical technical information for customers procuring and tuning AI systems, driving innovation, performance, and energy efficiency across the industry.
David Kanter added, “The continuing growth in the community of submitters is a testament to the importance of accurate and trustworthy performance metrics to the AI community. We are increasing the scale of AI models being trained and deployed, achieving new levels of interactive responsiveness, and deploying more energy-efficient systems.”
For more information on the MLPerf Inference v5.0 benchmark results, visit https://mlcommons.org/