Power limiting an RTX 3090 GPU to increase power efficiency
I plotted this chart and thought I'd share it in case it's useful to others. It shows the tok/s output at different power limits with an RTX 3090 during single-inferencing. While maximum efficiency is achieved around 211W, this reduces output by around 20%.
Running between 260W-280W gives good energy savings while maintaining nearly maximum output.
While this gives a good rule of thumb, the actual numbers will vary with the model used, and particularly if you are batch-inferencing instead of single-inferencing.
Why power limit a GPU?
Why would you voluntarily leave performance on the table when you paid a lot of money for a GPU? There are several reasons:
- The most important reason is that you don't actually leave much performance on the table: the default power limits on consumer GPUs squeeze the last drops of performance out of the card, even at the expense of much higher power consumption.
By dropping performance by low single-digit percentage points, you can save double-digit percentage points of power.
- Reducing peak and sustained power consumption means you will not need as powerful and expensive a PSU to power the GPUs.
In builds that would otherwise require multiple PSUs, this can eliminate the additional units or allow you to use cheaper, lower-rated PSUs, which saves on costs and reduces complexity.
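On Linux (and Windows), the power limit can be set at runtime with `nvidia-smi -pl`; this typically requires root and does not persist across reboots unless you re-apply it from a startup script. A minimal sketch of a wrapper, with a hypothetical `dry_run` flag added here purely for illustration:

```python
import subprocess

def set_power_limit(watts, gpu_index=0, dry_run=False):
    """Set the software power limit on one GPU via nvidia-smi.

    dry_run=True only returns the command string instead of running it
    (illustrative helper; not part of nvidia-smi itself).
    """
    cmd = ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]
    if not dry_run:
        # Requires root privileges and an NVIDIA driver installation.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)

# Example: cap GPU 0 at 270 W (dry run shown so this is safe to import).
print(set_power_limit(270, dry_run=True))
```

Note that `nvidia-smi -q -d POWER` will show the allowed min/max limits for your card before you pick a value.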
Code and Data
I had a request to share the data behind the chart, so the data and chart-plotting code are below:
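The original dataset isn't reproduced in this excerpt. As a stand-in, here is a minimal matplotlib sketch of this kind of plot; the power/throughput values are illustrative placeholders, not the measured data, and the `best_efficiency` helper is a hypothetical addition that finds the most energy-efficient point:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt

# Placeholder values for illustration only -- not the original measurements.
power_w = [200, 225, 250, 275, 300, 325, 350]
tok_s = [38.0, 41.5, 44.0, 46.0, 47.0, 47.5, 47.8]

def best_efficiency(power, throughput):
    """Return (watts, tok/s per watt) at the most energy-efficient point."""
    eff = [t / p for p, t in zip(power, throughput)]
    i = max(range(len(eff)), key=eff.__getitem__)
    return power[i], eff[i]

fig, ax = plt.subplots()
ax.plot(power_w, tok_s, marker="o")
ax.set_xlabel("Power limit (W)")
ax.set_ylabel("Output (tok/s)")
ax.set_title("RTX 3090 single-inference throughput vs. power limit")
fig.savefig("power_vs_toks.png")
```

With real measurements, plotting tok/s per watt on a second axis makes the efficiency peak and the knee of the curve easy to spot.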
Idle power
Peak and sustained power draw is just one side of the equation. Limiting it can increase efficiency, reduce the initial purchase cost, and make for a simpler and more compact AI server by reducing the number of PSUs required.
However there are two other things to consider:
- Controlling idle power consumption; and
- How to power multiple high performance GPUs in a single server in an efficient way.
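If you want to check where your GPU sits at idle today, the instantaneous board power can be read with `nvidia-smi`'s query interface. A small sketch (the query itself assumes an NVIDIA GPU and driver are present; the parsing helper is separated out so it can be exercised on its own):

```python
import subprocess

def parse_power(csv_line):
    """Parse one 'power.draw' value from csv,noheader,nounits output."""
    return float(csv_line.strip())

def read_power_draw(gpu_index=0):
    """Query instantaneous board power draw in watts via nvidia-smi."""
    out = subprocess.check_output([
        "nvidia-smi", "-i", str(gpu_index),
        "--query-gpu=power.draw", "--format=csv,noheader,nounits",
    ])
    return parse_power(out.decode())

# Example (requires a GPU): print(read_power_draw())
```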
If you'd like to read about these topics, subscribe to get alerted when the follow-up articles become available.