EE HPC WG: TUE Team

Sustainably supporting science through committed community action


"TUE, a new energy-efficiency metric applied at ORNL's Jaguar" by Michael K Patterson (Intel), Stephen W Poole (ORNL), Chung-Hsing Hsu (ORNL), Don Maxwell (ORNL), William Tschudi (LBNL), Henry Coles (LBNL), David J Martinez (Sandia NL), Natalie Bates (EE HPC WG)

The TUE Team has developed two new metrics, iTUE and TUE, that account for infrastructure elements that are part of the HPC system itself (such as cooling and power distribution).  TUE is an improvement on PUE.  iTUE is not only necessary for calculating TUE; it also stands on its own as a metric that a site can use to improve infrastructure energy efficiency.

 

PUE is easy to understand, and is comparatively easy to use.  It is in common use, so extending its use is thought to be more practical than replacing it.

 

TUE is positioned to give a more accurate representation of the overall efficiency of the data center together with the IT processing equipment (servers) it houses.  Its main benefit is that results stay consistent when air-handling and energy-storage devices are moved along the facility / IT chain, for example from row-based to rack-located, or even into the server node itself.
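The effect can be illustrated with a small numeric sketch.  It assumes the definitions from the published paper (PUE as total site energy over IT equipment energy, iTUE as IT equipment energy over the energy reaching the compute components, and TUE as their product).  The class name, field names, and all kWh figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyBreakdown:
    """Annual energy in kWh. All figures used below are made up."""
    facility_overhead: float  # cooling and power distribution outside the IT equipment
    it_overhead: float        # fans, power supplies, batteries inside the IT equipment
    compute: float            # CPU, memory, network and similar working components

    @property
    def it_total(self) -> float:
        return self.it_overhead + self.compute

    @property
    def site_total(self) -> float:
        return self.facility_overhead + self.it_total

    @property
    def pue(self) -> float:
        return self.site_total / self.it_total

    @property
    def itue(self) -> float:
        return self.it_total / self.compute

    @property
    def tue(self) -> float:
        return self.pue * self.itue


# Same hardware and same total energy; only the accounting location of
# 100 kWh of fan energy differs between the two layouts.
row_based = EnergyBreakdown(facility_overhead=500.0, it_overhead=100.0, compute=1000.0)
rack_based = EnergyBreakdown(facility_overhead=400.0, it_overhead=200.0, compute=1000.0)

for name, layout in (("row-based", row_based), ("rack-located", rack_based)):
    print(f"{name:12s} PUE={layout.pue:.3f} iTUE={layout.itue:.3f} TUE={layout.tue:.3f}")
```

With these made-up numbers, moving 100 kWh of fan energy from the facility into the rack lowers PUE from about 1.455 to about 1.333 although nothing became more efficient, while TUE stays at 1.6 in both layouts.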

 

iTUE is simply a convenient ratio term that extends PUE, the ratio of the input energy of the overall system to the IT energy.  Because the ratio is carried closer to the components actually doing useful IT work, the location of accessory items like fans and batteries becomes irrelevant, which helps prevent erroneous conclusions from the data.  The extension is accomplished by simply multiplying iTUE by the existing PUE ratio to get TUE.
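Written out, the relationship looks as follows.  The symbols are notation chosen here, not taken from this page: E_total is the energy entering the data center, E_IT the energy entering the IT equipment, and E_compute the energy reaching the compute components.  The definitions are assumed to match those in the published paper.

```latex
\begin{align}
\mathrm{PUE}  &= \frac{E_{\mathrm{total}}}{E_{\mathrm{IT}}} \\
\mathrm{iTUE} &= \frac{E_{\mathrm{IT}}}{E_{\mathrm{compute}}} \\
\mathrm{TUE}  &= \mathrm{iTUE} \times \mathrm{PUE}
               = \frac{E_{\mathrm{total}}}{E_{\mathrm{compute}}}
\end{align}
```

The E_IT term cancels in the product, which is why shifting a fan or battery from one side of the IT boundary to the other changes PUE and iTUE individually but leaves TUE unchanged.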

 

The simpler PUE ratio alone is likely sufficient when dealing with many data centers that are very poorly positioned from an efficiency standpoint.  If a data center has a PUE of 2.5 or more, getting more accurate results is probably not the top priority.  When a data center has reached a PUE of 1.75 or better, it starts to become important to dig a little deeper and get a bit more resolution to compare overall solutions.

 

Weaknesses still remain when comparing different locations and different machine generations.  TUE and PUE account poorly for differing weather patterns, which mainly affect the amount of free cooling available to one site versus another and can drastically change the energy needed to cool a data center.  In other words, simply moving a data center from one geographic location to another can have an enormous effect on TUE and PUE without any improvement having been made.  This needs to be understood; it is not necessarily a bad thing, but it certainly clouds data-center-to-data-center comparisons.

 

The other big weakness is that TUE still does not account for the output of the IT equipment.  TUE and iTUE simply measure the input closer to the working components (CPU, memory, network, etc.).  This means that if a newer generation of server cluster produces a lot more useful output with only a little more infrastructure per unit of input, it will still have a poorer TUE.  This weakness is extremely difficult to overcome in practice because the diversity of output types leads to a lack of agreement on an output metric and procedure.  If and when an output metric is adopted, it can be used to extend PUE and TUE further to complete the chain of energy use.
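A rough numeric sketch of this blind spot follows.  It assumes TUE is total site energy divided by the energy reaching the compute components.  The function names, the energy figures, and the "useful work" unit are all hypothetical; no output metric has been agreed on, and the work-per-kWh figure here is only a stand-in for one.

```python
def tue(facility_overhead: float, it_overhead: float, compute: float) -> float:
    """Total site energy divided by the energy reaching the compute components."""
    return (facility_overhead + it_overhead + compute) / compute


def work_per_kwh(useful_work: float, facility_overhead: float,
                 it_overhead: float, compute: float) -> float:
    """Hypothetical output measure: arbitrary work units per kWh of site energy."""
    return useful_work / (facility_overhead + it_overhead + compute)


# Made-up annual energy figures in kWh for two machine generations.
# The newer one carries slightly more infrastructure per unit of compute energy.
old_gen = {"facility_overhead": 400.0, "it_overhead": 100.0, "compute": 1000.0}
new_gen = {"facility_overhead": 400.0, "it_overhead": 150.0, "compute": 1000.0}

old_tue = tue(**old_gen)
new_tue = tue(**new_gen)

# Assume the newer generation delivers three times the useful work (arbitrary units).
old_eff = work_per_kwh(100.0, **old_gen)
new_eff = work_per_kwh(300.0, **new_gen)

print(f"old: TUE={old_tue:.2f}, work/kWh={old_eff:.3f}")
print(f"new: TUE={new_tue:.2f}, work/kWh={new_eff:.3f}")
```

With these numbers the newer machine has the worse TUE (1.55 versus 1.50) even though it delivers roughly 2.9 times as much work per kWh, which is exactly the case TUE cannot distinguish.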

 

Attached below are whitepapers and presentations that provide more information on TUE and iTUE.

 

There is also one published paper on the subject: "TUE, a New Energy-Efficiency Metric Applied at ORNL’s Jaguar".


Related documents

 

Henry Coles

Satoshi Itoh

Natalie Bates

Chung-Hsing Hsu

Anna Maria Bailey

Herbert Huber and Axel Auweter

Michael Patterson

Ghaleb Abdulla

Chung-Hsing Hsu, Don Maxwell, Saeed Ghezawi, Joe Stephenson, Jim Rogers

Chung-Hsing Hsu, Stephen W. Poole, Don Maxwell