ML.ENERGY Research and Tech Blog

LLM Inference Energy: A Longitudinal Analysis

The ML.ENERGY Leaderboard went from v2.0 (September 2024) to v3.0 (December 2025) with major changes: up-to-date models, hardware, software, and datasets. The v3.0 blog post covered the v3.0 results in detail, but how do they compare to v2.0? Are we making progress on energy efficiency? In this short post, we look at the impact of software optimizations on energy efficiency over time, using the Llama 3.1 family as a case study.

Diagnosing Inference Energy Consumption with the ML.ENERGY Leaderboard v3.0

With The ML.ENERGY Benchmark v3.0, released in December 2025, we expanded our scope to cover up-to-date models, tasks, and GPU hardware. This includes 46 models across 7 tasks, producing 1,858 configurations on NVIDIA H100 and B200 GPUs.1 As always, the latest benchmarking results are public and can be browsed on The ML.ENERGY Leaderboard.

In this post, we first present empirical observations from our measurements, and then develop a reasoning framework that explains why we observe certain energy behaviors.