<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Janky AI]]></title><description><![CDATA[Powered by duct tape and dreams.]]></description><link>https://jankyai.droidgram.com/</link><image><url>https://jankyai.droidgram.com/favicon.png</url><title>Janky AI</title><link>https://jankyai.droidgram.com/</link></image><generator>Ghost 5.86</generator><lastBuildDate>Tue, 05 May 2026 10:12:12 GMT</lastBuildDate><atom:link href="https://jankyai.droidgram.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Notable LLMs that are Apache/MIT licensed]]></title><description><![CDATA[<p>Sometimes you want LLMs that are unencumbered by non-commercial licenses. Below is a list of some notable LLMs that have friendly license agreements.</p><ul><li>Mistral family<ul><li>Mistral 7B, Mixtral 8x7B, Mixtral 8x22B</li><li>Mistral <a href="https://mistral.ai/news/mistral-nemo/?ref=jankyai.droidgram.com" rel="noreferrer">Nemo</a> 12B with quantization-aware training for good FP8 performance</li></ul></li><li>Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B</li></ul>]]></description><link>https://jankyai.droidgram.com/notable-llms-that-are-apache-mit-licensed/</link><guid isPermaLink="false">6699542274c0a200017641c5</guid><dc:creator><![CDATA[DeltaSqueezer]]></dc:creator><pubDate>Thu, 18 Jul 2024 17:52:35 GMT</pubDate><media:content url="https://jankyai.droidgram.com/content/images/2024/07/Default_an_abstract_image_that_evokes_open_source_large_langua_3.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://jankyai.droidgram.com/content/images/2024/07/Default_an_abstract_image_that_evokes_open_source_large_langua_3.jpg" alt="Notable LLMs that are Apache/MIT licensed"><p>Sometimes you want LLMs that are unencumbered by non-commercial 
licenses. Below is a list of some notable LLMs that have friendly license agreements.</p><ul><li>Mistral family<ul><li>Mistral 7B, Mixtral 8x7B, Mixtral 8x22B</li><li>Mistral <a href="https://mistral.ai/news/mistral-nemo/?ref=jankyai.droidgram.com" rel="noreferrer">Nemo</a> 12B with quantization-aware training for good FP8 performance</li></ul></li><li>Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B (MoE)</li><li>Phi family<ul><li>Phi-1, Phi-1.5</li><li>Phi-2 2.7B</li><li>Phi-3 3.8B, 7B, 14B</li></ul></li><li>Yi family<ul><li>Yi 34B, 9B, and 6B</li><li>Yi 1.5 34B, 9B, and 6B</li></ul></li><li>Falcon 7B, Falcon 40B, LLM360/K2, OLMo-7B</li><li>Neo_7B</li><li>IBM Granite models</li><li>XVERSE 7B/13B/65B</li><li>Snowflake Arctic</li><li>Grok</li><li>DeepSeek-Coder-V2</li><li>Danube 2, Danube 3 (for small models)</li></ul>]]></content:encoded></item><item><title><![CDATA[Reducing idle power consumption for Nvidia P100 and P40 GPUs]]></title><description><![CDATA[<p>One overlooked aspect of GPU usage is the power they consume when idle. Idle power draw refers to the amount of electricity a GPU consumes when it&apos;s not performing intensive tasks. 
This can significantly impact both energy consumption and electricity costs over time.</p><p>Without any tricks, a P40</p>]]></description><link>https://jankyai.droidgram.com/reducing-idle-power-consumption-for-nvidia-p100-and-p40-gpus/</link><guid isPermaLink="false">6693959274c0a200017640b9</guid><dc:creator><![CDATA[DeltaSqueezer]]></dc:creator><pubDate>Sun, 14 Jul 2024 20:52:35 GMT</pubDate><media:content url="https://jankyai.droidgram.com/content/images/2024/07/Default_gpu_pcb_surrounded_by_flames_with_a_black_background_2.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://jankyai.droidgram.com/content/images/2024/07/Default_gpu_pcb_surrounded_by_flames_with_a_black_background_2.jpg" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"><p>One overlooked aspect of GPU usage is the power they consume when idle. Idle power draw refers to the amount of electricity a GPU consumes when it&apos;s not performing intensive tasks. This can significantly impact both energy consumption and electricity costs over time.</p><p>Without any tricks, a P40 with VRAM loaded can burn 45W at idle. With some tweaks, this idle power can be reduced to around 10W.</p><h2 id="idle-power-draw-10w-vs-45w">Idle Power Draw: 10W vs 45W</h2><p>Let&apos;s consider the impact of the difference between a 45W and 10W  idle draw. 
While the difference might seem small at first glance, the cumulative effect over a year can be substantial.</p><h3 id="annual-energy-consumption">Annual Energy Consumption</h3><p>To calculate the annual energy consumption, we use the formula: Energy (kWh)&#xA0;= Power (W) &#xD7; Time (hours)&#xA0;/ 1000</p><p>Assuming the GPUs are idle 24 hours a day for 365 days a year, we get:</p><ul><li><strong>10W GPU:</strong> 10W &#xD7; 24 &#xD7; 365 / 1000 = 88 kWh</li><li><strong>45W GPU:</strong> 45W &#xD7; 24 &#xD7; 365 / 1000 = 394 kWh</li></ul><h3 id="annual-cost-of-electricity">Annual Cost of Electricity</h3><p>The cost of electricity can vary substantially from place to place, but where I live it is approximately $0.25 per kWh, which gives the annual costs as follows:</p><table>
<thead>
<tr>
<th><strong>GPU Idle Power Draw (W)</strong></th>
<th style="text-align:right"><strong>Annual Energy Consumption (kWh)</strong></th>
<th style="text-align:right"><strong>Annual Cost ($)</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td style="text-align:right">88</td>
<td style="text-align:right">$22.00</td>
</tr>
<tr>
<td>45</td>
<td style="text-align:right">394</td>
<td style="text-align:right">$99.00</td>
</tr>
</tbody>
</table>
<p><em>Table 1: Annual cost comparison of P40 idling at 10W vs 45W</em></p>
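<p>The figures in Table 1 follow directly from the formula above. Here is a quick sketch of the calculation (the $0.25/kWh rate is my local price; adjust for yours):</p><figure class="kg-card kg-code-card"><pre><code class="language-python"># Annual energy and cost of idle draw: Energy (kWh) = Power (W) x hours / 1000
ELECTRICITY_PRICE = 0.25  # $/kWh, local rate used in this article

def annual_idle_cost(idle_watts):
    # Idle 24 hours a day, 365 days a year, converted from Wh to kWh
    energy_kwh = idle_watts * 24 * 365 / 1000
    return energy_kwh, energy_kwh * ELECTRICITY_PRICE

for watts in (10, 45):
    kwh, dollars = annual_idle_cost(watts)
    print(f"{watts}W idle: {kwh:.0f} kWh/year, ${dollars:.2f}/year")</code></pre><figcaption><p><span style="white-space: pre-wrap;">Recomputing Table 1 (the table rounds to whole dollars)</span></p></figcaption></figure>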
<p>The difference in idle power draw between 10W and 45W might seem minor on a per-second basis, but over the span of a year, it results in significant energy consumption and cost differences, especially when you put multiple GPUs in a system.</p><h2 id="p40-idle-state-quirks">P40 idle state quirks</h2><p>The P40 has only P0 and P8 states and idle draw can be as low as 10W when VRAM is empty, but the P40 seems to have a quirk when content is loaded into VRAM: the power draw can be 45W even when the GPU is performing no work.</p><p>Luckily, there are ways to work around this and reduce idle power draw by directly adjusting pstates.</p><h3 id="reducing-idle-power-draw-by-directly-adjusting-pstates">Reducing idle power draw by directly adjusting pstates</h3><p>A library and CLI utilities to manage pstates are available <a href="https://github.com/sasha0552/nvidia-pstate?ref=jankyai.droidgram.com" rel="noreferrer">here</a>: </p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/sasha0552/nvidia-pstate?ref=jankyai.droidgram.com"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - sasha0552/nvidia-pstate: A library and CLI utilities for managing performance states of NVIDIA GPUs.</div><div class="kg-bookmark-description">A library and CLI utilities for managing performance states of NVIDIA GPUs. 
- sasha0552/nvidia-pstate</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">sasha0552</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/ca68088ee824108b5e7f18ebb90168e6d85fd0d5db4ef3f53c6627b3d833f220/sasha0552/nvidia-pstate" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"></div></a></figure><p>and <a href="https://github.com/sasha0552/nvidia-pstated?ref=jankyai.droidgram.com" rel="noreferrer">daemon</a>:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/sasha0552/nvidia-pstated?ref=jankyai.droidgram.com"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - sasha0552/nvidia-pstated: A daemon that automatically manages the performance states of NVIDIA GPUs.</div><div class="kg-bookmark-description">A daemon that automatically manages the performance states of NVIDIA GPUs. 
- sasha0552/nvidia-pstated</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">sasha0552</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/784f3ed33d9c9df94c2de6779a3c966d33c5d7a19aa66be250a6ed1b0dc9848e/sasha0552/nvidia-pstated" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"></div></a></figure><p>Patches to automatically drop pstates while idle for llama.cpp and vLLM are available <a href="https://github.com/sasha0552/ToriLinux/tree/main/airootfs/home/tori/.local/share/tori/patches?ref=jankyai.droidgram.com" rel="noreferrer">here</a>:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/sasha0552/ToriLinux/tree/main/airootfs/home/tori/.local/share/tori/patches?ref=jankyai.droidgram.com"><div class="kg-bookmark-content"><div class="kg-bookmark-title">ToriLinux/airootfs/home/tori/.local/share/tori/patches at main &#xB7; sasha0552/ToriLinux</div><div class="kg-bookmark-description">Linux LiveCD for offline AI training and inference. 
- sasha0552/ToriLinux</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">sasha0552</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/dd18b3391e6e5a38bc9fd589fdf78037ff1809c23833de133e6d67b1015df90c/sasha0552/ToriLinux" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"></div></a></figure><p>There&apos;s also a separate project called <a href="https://github.com/crashr/gppm?ref=jankyai.droidgram.com" rel="noreferrer">gppm</a> that aims to do something similar, handling multiple cards and llama.cpp instances independently.</p><h2 id="p100-has-no-pstates">P100 has no pstates</h2><p>The P100 is a datacenter GPU that was originally designed for training workloads. Since the target workload aimed at continuous maximum utilization, these GPUs have no low-power pstates.</p><p>Even at idle with no data loaded into VRAM, these can consume just under 30W of idle power. Put four of them in a server and you have 120W of idle power just for the GPUs.</p><table>
<thead>
<tr>
<th><strong>GPU Idle Power Draw (W)</strong></th>
<th style="text-align:right"><strong>Annual Energy Consumption (kWh)</strong></th>
<th style="text-align:right"><strong>Annual Cost ($)</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>120</td>
<td style="text-align:right">1,051</td>
<td style="text-align:right">$263.00</td>
</tr>
</tbody>
</table>
<p><em>Table 2: Annual cost of running 4xP100s at idle power</em></p>
<p>Given this power profile, you would choose P100s if:</p><ul><li>You expect to have high utilization with little idle time</li><li>You want to run computations in batches and will turn off the server when batches are done</li><li>You want the server to double as a space heater or have money to burn</li></ul><p>Since the P100 is not very popular for home use due to this idle power issue and having only 16GB of VRAM compared to the P40&apos;s 24GB, the prices of P100s on the second-hand market have remained relatively low even as P40 prices have skyrocketed.</p><h3 id="but-what-if">But what if...</h3><p>One last power-saving possibility is to mount the GPU on a riser that allows its power to be disconnected, then perform PCIe hot-unplugging. This could theoretically save power at the expense of start-up latency.</p><p>Getting PCIe hot-plugging to work on consumer-grade hardware may be challenging and frustrating (massive understatement alert).</p><h2 id="what-about-operating-power">What about operating power?</h2><p>Idle power is only one aspect; see this article on how to manage active power to maximize efficiency:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://jankyai.droidgram.com/power-limiting-rtx-3090-gpu-to-increase-power-efficiency/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Power limiting RTX 3090 GPU to increase power efficiency</div><div class="kg-bookmark-description">I plotted this chart and thought I&#x2019;d share it in case it was useful to others. It is the tok/s output at different power limits with a RTX 3090 during single-inferencing. 
While maximum efficiency is achieved around 211W, this reduces output by around 20%. Running between 260W-280W gives good</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://jankyai.droidgram.com/content/images/size/w256h256/format/jpeg/2024/06/favicon.jpg" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"><span class="kg-bookmark-author">Janky AI</span><span class="kg-bookmark-publisher">DeltaSqueezer</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://jankyai.droidgram.com/content/images/2024/07/3090-hr.png" alt="Reducing idle power consumption for Nvidia P100 and P40 GPUs"></div></a></figure><h2 id="cooling-gpus">Cooling GPUs</h2><p>One final challenge with re-purposing these datacenter GPUs for home use is that the cards do not have active cooling, instead relying on forced air cooling from the server.</p><p>Cooling these cards involves several factors and is not straightforward - or at least not if you don&apos;t want hairdryer levels of screaming fans in the server. Subscribe using the link below to get our guide on the options for cooling these GPUs while retaining your sanity!</p><h2 id="readers-comments">Readers&apos; comments</h2><blockquote>
<p>Thanks for the inspiration.</p>
<p>I just updated someone else&apos;s repo (PR pending approval) to give .net control of the same API that nvidia_pstate is using because unfortunately the python script didn&apos;t enumerate my Tesla GPUs.</p>
<p>Here&apos;s my fork of the .net wrapper: <a href="https://github.com/maz-net-au/NvAPIWrapper?ref=jankyai.droidgram.com">https://github.com/maz-net-au/NvAPIWrapper</a></p>
<p>You can control it like this: (8 is for P8, use 16 to restore the default, auto-switching mode)</p>
<pre><code class="language-csharp">PhysicalGPUHandle[] handles = GPUApi.EnumTCCPhysicalGPUs();
foreach (PhysicalGPUHandle ph in handles)
{
   GPUApi.SetForcePstate(ph, 8, 2); // the 2 is from nvidia_pstate python script
}
</code></pre>
<p>I&apos;m keeping the units at P8 and watching for GPU utilization, allowing P0 for 2 mins after the last poll detected utilization above 10%. That is, as soon as you start inference, I allow the cards to switch to P0, and if they are unused for a couple of minutes, it forces them back to P8.</p>
<p>My Frankenstein&apos;s monster of a Dell R720XD has 2x Tesla P40s and 2x Tesla T4s in it, and if I leave llama.cpp and ComfyUI both running, just the idle P0 power usage heats up the compute units and runs the chassis fans at 80%. This is all a convoluted fix for the issue of not wanting to piss off my wife with the soothing hum of server fans.</p>
</blockquote>
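<p>The commenter&apos;s switching rule is easy to sketch as a tiny state machine. This is a hypothetical sketch (names are mine): the thresholds mirror the comment, and the chosen pstate would then be applied with something like the SetForcePstate call above:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">import time

# Hysteresis policy from the comment above: force P8 by default, and allow
# P0 for two minutes after the last poll that saw utilization above 10%.
ALLOW_P0_SECONDS = 120
UTIL_THRESHOLD_PCT = 10

class PstateGovernor:
    def __init__(self):
        self.last_busy = float("-inf")  # time of the last busy poll

    def target_pstate(self, utilization_pct, now=None):
        # Returns 0 (allow P0) or 8 (force P8) for the latest poll.
        now = time.monotonic() if now is None else now
        if utilization_pct > UTIL_THRESHOLD_PCT:
            self.last_busy = now
        return 8 if now - self.last_busy > ALLOW_P0_SECONDS else 0</code></pre></figure><p>On each poll you would read GPU utilization (e.g. from nvidia-smi or NVML) and force the returned pstate whenever it changes.</p>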
]]></content:encoded></item><item><title><![CDATA[Sometimes when you don't have 340 GB of VRAM]]></title><description><![CDATA[<p>You just have to resort to running on your computer with 12 sticks of 32GB RAM!</p><figure class="kg-card kg-embed-card"><iframe width="200" height="150" src="https://www.youtube.com/embed/TX0eppc88TU?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen title="NVIDIA Nemotron-4 340B Q8_0 running on AMD Epyc 9374F - real time generation speed"></iframe></figure>]]></description><link>https://jankyai.droidgram.com/sometimes-when-you-dont-have-340-gb-of-vram/</link><guid isPermaLink="false">6691abd874c0a2000176409f</guid><dc:creator><![CDATA[DeltaSqueezer]]></dc:creator><pubDate>Fri, 12 Jul 2024 22:21:04 GMT</pubDate><media:content url="https://jankyai.droidgram.com/content/images/2024/07/nemotron.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://jankyai.droidgram.com/content/images/2024/07/nemotron.jpg" alt="Sometimes when you don&apos;t have 340 GB of VRAM"><p>You just have to resort to running on your computer with 12 sticks of 32GB RAM!</p><figure class="kg-card kg-embed-card"><iframe width="200" height="150" src="https://www.youtube.com/embed/TX0eppc88TU?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen title="NVIDIA Nemotron-4 340B Q8_0 running on AMD Epyc 9374F - real time generation speed"></iframe></figure>]]></content:encoded></item><item><title><![CDATA[How many GPUs do you want to cram into your box? Yes.]]></title><description><![CDATA[<p>Custom case, or special server cards to fit 4 GPUs into a case? 
No need, we&apos;ll just squash them in there.</p><p>Congrats to stonedoubt for this Tetris-like feat and great thermal density!</p><figure class="kg-card kg-embed-card"><blockquote class="reddit-embed-bq" style="height:500px">
<a href="https://www.reddit.com/r/LocalLLaMA/comments/1dz81sf/behold_my_dumb_sht/?ref=jankyai.droidgram.com">Behold my dumb sh*t &#x1F602;&#x1F602;&#x1F602;</a><br> by
<a href="https://www.reddit.com/user/stonedoubt/?ref=jankyai.droidgram.com">u/stonedoubt</a> in
<a href="https://www.reddit.com/r/LocalLLaMA/?ref=jankyai.droidgram.com">LocalLLaMA</a>
</blockquote>
<script async src="https://embed.reddit.com/widgets.js" charset="UTF-8"></script></figure>]]></description><link>https://jankyai.droidgram.com/how-many-gpus-do-you-want-to-cram-in-your-box-yes/</link><guid isPermaLink="false">668d76dc74c0a20001763fd2</guid><dc:creator><![CDATA[DeltaSqueezer]]></dc:creator><pubDate>Tue, 09 Jul 2024 17:45:43 GMT</pubDate><media:content url="https://jankyai.droidgram.com/content/images/2024/07/behold-my-dumb-sh-t-v0-e74or9th1jbd1-1.webp" medium="image"/><content:encoded><![CDATA[<img src="https://jankyai.droidgram.com/content/images/2024/07/behold-my-dumb-sh-t-v0-e74or9th1jbd1-1.webp" alt="How many GPUs do you want to cram into your box? Yes."><p>Custom case, or special server cards to fit 4 GPUs into a case? No need, we&apos;ll just squash them in there.</p><p>Congrats to stonedoubt for this Tetris-like feat and great thermal density!</p><figure class="kg-card kg-embed-card"><blockquote class="reddit-embed-bq" style="height:500px">
<a href="https://www.reddit.com/r/LocalLLaMA/comments/1dz81sf/behold_my_dumb_sht/?ref=jankyai.droidgram.com">Behold my dumb sh*t &#x1F602;&#x1F602;&#x1F602;</a><br> by
<a href="https://www.reddit.com/user/stonedoubt/?ref=jankyai.droidgram.com">u/stonedoubt</a> in
<a href="https://www.reddit.com/r/LocalLLaMA/?ref=jankyai.droidgram.com">LocalLLaMA</a>
</blockquote>
<script async src="https://embed.reddit.com/widgets.js" charset="UTF-8"></script></figure>]]></content:encoded></item><item><title><![CDATA[4xV100 SXM Build]]></title><description><![CDATA[<p>This <a href="https://github.com/l4rz/building-a-poor-mans-supercomputer?ref=jankyai.droidgram.com" rel="noreferrer">build</a> is several years old, so the prices quoted are much higher than today&apos;s. I&apos;ve been very interested in V100 SXM builds as more of these come onto the market and prices fall. I&apos;ve not yet pulled the trigger on such a build as prices</p>]]></description><link>https://jankyai.droidgram.com/4xv100-sxm-build/</link><guid isPermaLink="false">668b9b3774c0a20001763fb0</guid><dc:creator><![CDATA[DeltaSqueezer]]></dc:creator><pubDate>Mon, 08 Jul 2024 08:03:39 GMT</pubDate><media:content url="https://jankyai.droidgram.com/content/images/2024/07/c4130-assy-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://jankyai.droidgram.com/content/images/2024/07/c4130-assy-1.jpg" alt="4xV100 SXM Build"><p>This <a href="https://github.com/l4rz/building-a-poor-mans-supercomputer?ref=jankyai.droidgram.com" rel="noreferrer">build</a> is several years old, so the prices quoted are much higher than today&apos;s. I&apos;ve been very interested in V100 SXM builds as more of these come onto the market and prices fall. I&apos;ve not yet pulled the trigger on such a build: prices were still a little too high for my liking, parts are tricky to source, and some of the sources look dubious, but I&apos;ll for sure be keeping my eye on this.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/l4rz/building-a-poor-mans-supercomputer?ref=jankyai.droidgram.com"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - l4rz/building-a-poor-mans-supercomputer: I&#x2019;ve built a 4x V100 box for less than $5,500.</div><div class="kg-bookmark-description">I&#x2019;ve built a 4x V100 box for less than $5,500. 
Contribute to l4rz/building-a-poor-mans-supercomputer development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt="4xV100 SXM Build"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">l4rz</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/988696c41543fc666359bac5bc777fc972733a7323ed8dc78d0f6ee6431c3fe3/l4rz/building-a-poor-mans-supercomputer" alt="4xV100 SXM Build"></div></a></figure><p>Lots of great janky takeaways from this build including the DIY heatsink and discussion on avoiding paying for a $350 precision torque wrench by using your fingers to tighten the screws.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://jankyai.droidgram.com/content/images/2024/07/fabricated-heatsink.jpg" class="kg-image" alt="4xV100 SXM Build" loading="lazy" width="2000" height="1335" srcset="https://jankyai.droidgram.com/content/images/size/w600/2024/07/fabricated-heatsink.jpg 600w, https://jankyai.droidgram.com/content/images/size/w1000/2024/07/fabricated-heatsink.jpg 1000w, https://jankyai.droidgram.com/content/images/size/w1600/2024/07/fabricated-heatsink.jpg 1600w, https://jankyai.droidgram.com/content/images/2024/07/fabricated-heatsink.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Pay $100 for a heatsink? No! We&apos;ll make our own! </span></figcaption></figure>]]></content:encoded></item><item><title><![CDATA[Power limiting RTX 3090 GPU to increase power efficiency]]></title><description><![CDATA[<p>I plotted this chart and thought I&apos;d share it in case it was useful to others. It is the tok/s output at different power limits with a RTX 3090 during single-inferencing. 
While maximum efficiency is achieved around 211W, this reduces output by around 20%</p><p>Running between 260W-280W</p>]]></description><link>https://jankyai.droidgram.com/power-limiting-rtx-3090-gpu-to-increase-power-efficiency/</link><guid isPermaLink="false">667d2cb54b453c0001455662</guid><dc:creator><![CDATA[DeltaSqueezer]]></dc:creator><pubDate>Thu, 27 Jun 2024 09:15:49 GMT</pubDate><media:content url="https://jankyai.droidgram.com/content/images/2024/07/3090-hr.png" medium="image"/><content:encoded><![CDATA[<img src="https://jankyai.droidgram.com/content/images/2024/07/3090-hr.png" alt="Power limiting RTX 3090 GPU to increase power efficiency"><p>I plotted this chart and thought I&apos;d share it in case it was useful to others. It is the tok/s output at different power limits with an RTX 3090 during single-inferencing. While maximum efficiency is achieved around 211W, this reduces output by around 20%.</p><p>Running between 260W-280W gives good energy savings while maintaining nearly maximum output.</p><p>While this gives a good rule of thumb, the actual numbers will vary with the model used, and particularly when batch inferencing instead of single-inferencing.</p><h3 id="why-power-limit-a-gpu">Why power limit a GPU?</h3><p>Why would you voluntarily leave performance on the table when you paid a lot of money for a GPU? There are several reasons:</p><ul><li>The most important reason is that you don&apos;t need to leave a lot of performance on the table: the default power limits on consumer GPUs try to squeeze the last drops of performance out of the GPU even at the expense of much higher power consumption. <br><br>By dropping performance by low single-digit percentage points, you can save double-digit percentage points of power.</li><li>Reducing peak and sustained power consumption means that you will not need as powerful and expensive a PSU to power the GPUs. 
<br><br>In some cases where multiple PSUs would otherwise be required, this can eliminate additional PSUs or allow you to use cheaper, lower-rated PSUs, which saves on costs and reduces complexity.</li></ul><h3 id="code-and-data">Code and Data</h3><p>I had a request to share the data for the chart, so the data and chart-plotting code are below:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# 3090 Power data
growth_data = [(100, 18), (125, 20), (150, 42), (175, 64), (187, 76), (200, 83), (225, 91), (250, 97), (265, 98), (275, 98), (280, 99), (285, 101), (300, 101), (325, 102), (350, 103), (375, 104)]

# Convert data to numpy arrays
x = np.array([t[0] for t in growth_data])
y = np.array([t[1] for t in growth_data])
yo = np.array([t[1]/t[0] for t in growth_data])

# Define the Gompertz function
def gompertz(x, a, b, c):
    return a * np.exp(-b * np.exp(-c * (x)))

def gompertzx(x, a, b, c):
    return a * np.exp(-b * np.exp(-c * (x))) / x

# Initial guess for parameters
p0 = [100, 0.1, 0.01]

# Fit the curve
popt, pcov = curve_fit(gompertz, x, y, p0)
#popt2, pcov2 = curve_fit(gompertzx, x, y, p0)

# Print the parameters of the fitting curve
print(&quot;Fitting parameters:&quot;)
print(&quot;a =&quot;, popt[0])
print(&quot;b =&quot;, popt[1])
print(&quot;c =&quot;, popt[2])

# Calculate the maximum value of y
x_fit = np.linspace(x.min(), x.max(), 10000)
y_fit = gompertz(x_fit, *popt)
yo_fit = gompertzx(x_fit, *popt)
max_yo = np.max(yo_fit)
max_xo = x_fit[np.argmax(yo_fit)]

# Plot the data and the fitted curve
fig, ax1 = plt.subplots()

ax1.plot(x, y, &apos;ko&apos;)
ax1.plot(x_fit, y_fit, &apos;r-&apos;, label=&apos;Gompertz fit&apos;)
ax1.set_xlabel(&apos;Power (Watts)&apos;)
ax1.set_ylabel(&apos;Output (tok/s)&apos;, color=&apos;r&apos;)
ax1.tick_params(&apos;y&apos;, colors=&apos;r&apos;)

ax2 = ax1.twinx()
ax2.plot(x_fit, yo_fit, &apos;b-&apos;, label=&apos;Efficiency&apos;)
ax2.set_ylabel(&apos;tok/s/W&apos;, color=&apos;b&apos;)
ax2.tick_params(&apos;y&apos;, colors=&apos;b&apos;)

ax2.plot(x, gompertzx(x,*popt), &apos;bo&apos;)

# Indicate the maximum value of y
ax2.plot([max_xo, max_xo], [0, max_yo], &apos;k--&apos;, label=&apos;Max Eff.&apos;)
ax2.annotate(f&apos;Max efficiency: {max_xo:.0f}W&apos;, xy=(max_xo, max_yo), xytext=(max_xo+5, 0.25))

fig.tight_layout()
plt.title(&apos;RTX3090 output vs power&apos;)
fig.legend(loc=(0.6,0.2))
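
# Sanity check of the headline claim, using only the fitted curve:
# compare output at the max-efficiency power against output near the
# 375W stock limit (expect a drop of roughly 20%).
reduction = 1 - gompertz(max_xo, *popt) / gompertz(375, *popt)
print(f"Capping at {max_xo:.0f}W costs about {100*reduction:.0f}% of peak output")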

plt.show()</code></pre><figcaption><p><span style="white-space: pre-wrap;">Code generated with the help of LLMs!</span></p></figcaption></figure><figure class="kg-card kg-code-card"><pre><code>Fitting parameters:

a = 104.54941090829679
b = 23.474054254669152
c = 0.022347470077472967</code></pre><figcaption><p><span style="white-space: pre-wrap;">Fitted coefficients</span></p></figcaption></figure><h3 id="idle-power">Idle power</h3><p>Limiting peak and sustained power is just one side of the equation: it can increase efficiency, reduce the initial purchase cost, and make for a simpler, more compact AI server by reducing the number of PSUs required.</p><p>However, there are two other things to consider:</p><ul><li>Controlling idle power consumption; and</li><li>How to power multiple high-performance GPUs in a single server in an efficient way.</li></ul><p>If you&apos;d like to see these articles, subscribe and get alerted when these follow-up articles become available.</p>]]></content:encoded></item></channel></rss>