This section defines the methods for calculation of power dissipation for thermal and electrical analysis.
There are two major purposes for thermal analysis:
The designer must ensure that no component operates at a temperature higher than the maximum rating of the part, and
Average component operating temperature is a primary determining factor in the long term reliability of the equipment.
An important quibble: parts operating outside their ratings do not fail immediately. When parts operate outside their ratings, their reliability becomes unknown. This is why analysis is so important. It is entirely possible to have units pass temperature testing, and later discover that devices are failing prematurely because some components are operating outside their temperature rating. They just happened to survive the test, somehow. They never should have.
The slang for a test that should have failed, but somehow didn't, is "an accidental success".
So why not simply assume that each part is operating at its worst case dissipation when performing the design and analysis? That would be the easiest, safest thing to do, and it would certainly yield a reliable result. It would also lead to an overly conservative design in terms of heatsinks, fans, size, weight, and other expensive stuff.
Assuming maximum values of power dissipation for each component leads to overdesign for two reasons:
The likelihood of a system in which all components run at their maximum specified dissipation approaches zero quickly as the number of components in the system increases, and,
In most systems, not all components dissipate their worst case power at the same time.
A worst case collection of components and conditions is extremely unlikely. However, it is certainly possible for a single component to be at its maximum dissipation, either continuously or intermittently, during one of the operating conditions. This is true of most designs, in fact.
This article explains analysis methods to combine the facts above. It is not the only way, but it has proved very effective.
The maximum operational temperature of each component is calculated assuming that:
the component is at maximum dissipation, as listed in the device data sheet,
this component is surrounded by components operating at typical power, and
this component is operating during the worst case operating condition (external temperature and operating mode).
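The three assumptions above can be sketched as a simple estimate. This is a minimal illustration, not a method from the original: the function name, the junction-to-ambient thermal resistance model, and all numeric values are assumptions for the sake of the example.

```python
def worst_case_temp(t_ambient_max, p_max, theta_ja, neighbor_rise_typ):
    """Estimate a component's worst-case operating temperature (deg C).

    t_ambient_max     -- worst-case external temperature for the operating mode
    p_max             -- this component's data-sheet maximum dissipation (W)
    theta_ja          -- thermal resistance to ambient (deg C/W), illustrative model
    neighbor_rise_typ -- local ambient rise from neighbors at *typical* power (deg C)
    """
    return t_ambient_max + neighbor_rise_typ + p_max * theta_ja

# Hypothetical numbers: 70 C worst-case ambient, 0.5 W max dissipation,
# 40 C/W to ambient, 8 C rise contributed by neighboring components.
t = worst_case_temp(70.0, 0.5, 40.0, 8.0)  # 70 + 8 + 0.5 * 40 = 98 C
```

The point of the structure is that only the component under analysis uses its maximum power; its neighbors contribute a rise computed from typical power.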
There is a slight modification to this if many components from the same lot are assembled together, such as in a memory array or battery. Components from the same lot may resemble each other quite a bit. In this case, lot test data for the components should be requested.
The reliability of the equipment is calculated assuming:
the typical power of all components, as listed in the device data sheet, and
duty cycles based on the expected operating conditions (e.g., 10% read, 5% write, 85% idle).
Typical power is calculated by summing the typical power numbers from the data sheets under operating conditions specified by the system engineer. Analog circuits use the nominal values during that operating condition. This includes factoring in the approximate operating temperature, since the power consumption of many components depends on it. High dissipation at low temperatures may hurt your battery life, but it is the dissipation at high temperatures that we are considering in this article, since that is the primary factor in equipment reliability. Note that this means there may be several typical operating conditions (transmitting, sleep, encoding, etc.). The operating condition is important: don't forget to get it, or define the operating conditions yourself if you have to.
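Combining the duty cycles from the reliability assumptions with per-mode data-sheet numbers might look like the sketch below. The per-mode wattages are made-up illustrative values; the duty cycles are the ones from the example above.

```python
# Duty cycles for the expected operating conditions (from the example above).
duty = {"read": 0.10, "write": 0.05, "idle": 0.85}

# Hypothetical typical dissipation of one component in each mode (W),
# read from a data sheet at the expected operating temperature.
p_typ = {"read": 0.30, "write": 0.45, "idle": 0.02}

# Duty-cycle-weighted typical power for this component.
typical_power = sum(duty[mode] * p_typ[mode] for mode in duty)
# 0.10*0.30 + 0.05*0.45 + 0.85*0.02 = 0.0695 W
```

The system typical power is then the sum of this quantity over all components, per operating condition.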
The typical power, which is a summation of a lot of numbers, is usually augmented with a safety margin. There are many ways of calculating this margin. The Root-Summed-Squared (RSS) method is one we used a lot; it produces a deviation (additional total power) from the root of the summed squares of the differences between the typical and maximum dissipations of each component. Very spreadsheetable. Monte Carlo is another. Simulators probably have a few modeling methods to choose from. They all produce a number that is greater than the sum of the typicals and less than the sum of the maximums, with a probability distribution. Pick your poison.
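The RSS margin described above fits in a few lines. The component wattages here are illustrative placeholders, not values from the original.

```python
import math

# (typical, maximum) dissipation per component in watts -- illustrative values.
components = [(0.10, 0.25), (0.50, 0.80), (0.05, 0.05), (1.20, 1.50)]

sum_typ = sum(typ for typ, _ in components)
sum_max = sum(mx for _, mx in components)

# RSS margin: root of the summed squared (max - typical) deviations.
rss_margin = math.sqrt(sum((mx - typ) ** 2 for typ, mx in components))
design_power = sum_typ + rss_margin

# As the text notes, the result lands between the two extremes.
assert sum_typ <= design_power <= sum_max
```

In a spreadsheet this is one extra column (the squared deviation) and a square root at the bottom, which is why it was so popular.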
This also predicts a number of outliers - units that are built to the drawings, but fall outside the specifications you have guaranteed. This is especially true in IC design. These could be sold as seconds, placed in the bone pile to support repair, or just sold hoping that the actual customer is not using the product near its required capability (which is generally true - but infuriating when it is not).
Also remember that reliability is a number for a fleet average - a statistically significant number of units. The lower the number of units, the greater the uncertainty of a reliability calculation based on typical power. So a greater safety margin has to be included for small runs. If you are only building a few units, then testing is far more practical, and the typical dissipation is only useful as a comparison against actual temperature measurements.
Maximum power is the maximum dissipation of a component during its worst case operating mode. This will likely be different from the operating mode in the typical example above. This is what is used for the worst case temperature rise. Typically this is assumed to be the maximum power at high temperature operation. Some technologies use more power at low temperatures. Make sure you are using a power at a relevant temperature.
For example: consider a four transistor bang-bang driver, in which any two of the four are on at a time while the driver is in use. The worst case operating condition is that two of them are on. Each of the four is analyzed individually in this condition for temperature rise, but collectively to determine average temperature.
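The individual-versus-collective distinction in the driver example reduces to a small calculation. The per-transistor on-state dissipation below is an assumed placeholder value.

```python
P_ON = 1.2   # W dissipated by one transistor while conducting (illustrative)
N_TOTAL = 4  # transistors in the bang-bang driver
N_ON = 2     # number conducting at any instant while the driver is in use

# Worst case for any single transistor: assume it is one of the two that
# are on, so it sees the full on-state dissipation.
p_worst_individual = P_ON

# Collective view: two of four are on, so the average dissipation per
# transistor (used for the average temperature of the group) is half P_ON.
p_avg_per_transistor = N_ON * P_ON / N_TOTAL
```

Each transistor's worst case temperature rise uses `p_worst_individual`; the group's average temperature uses `p_avg_per_transistor`.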
Don't forget that operational fault conditions apply. A protection circuit only works if it limits the maximum power of the protected components to within their rated maximums. For instance, protection circuits may only dissipate power during faults, so that is the use case you will use to determine the maximum operating temperature those components will experience.
A component may have zero static dissipation, but a very large momentary dissipation. Component temperature rise has a time constant; most transistor data sheets contain curves for single pulse temperature rise. This temperature rise also depends on the heat sink mechanism - a larger heat sink has more thermal mass, which means that more energy is required to change its temperature. A thermal transient analysis may be necessary to determine if a component will survive. For instance, the peak instantaneous power for a film resistor can be 10X its static rating, if the duty cycle is low enough that the static dissipation is not exceeded.
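A first-cut single-pulse check can use the transient thermal impedance curve, Zth(t), from a transistor data sheet. Everything below is a sketch: the curve values are invented for illustration, and the conservative next-longer-point lookup is an assumption, not a standard method from the original.

```python
# Illustrative single-pulse transient thermal impedance, Zth(t) in C/W,
# as (pulse_width_seconds, zth) pairs -- values made up for the sketch.
zth_curve = [(1e-4, 0.05), (1e-3, 0.15), (1e-2, 0.50), (1e-1, 1.50), (1.0, 3.0)]

def zth(pulse_width):
    """Look up Zth, rounding up to the next tabulated pulse width (conservative)."""
    for t, z in zth_curve:
        if pulse_width <= t:
            return z
    return zth_curve[-1][1]  # long pulses approach the steady-state value

def peak_junction_temp(t_case, p_pulse, pulse_width):
    """Peak junction temperature (deg C) after a single power pulse."""
    return t_case + p_pulse * zth(pulse_width)

# A 50 W pulse lasting 1 ms on a 60 C case: 60 + 50 * 0.15 = 67.5 C,
# far below what the 50 W steady-state figure would suggest.
t_peak = peak_junction_temp(60.0, 50.0, 1e-3)
```

This is why a part with negligible static dissipation can still need a transient analysis: the survivability question is about `t_peak`, not average power.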
As a historical note, electric starters were not used on cars for a long time because the general design rules predicted such devices would have to be incredibly large to provide the starting power required. What was overlooked is that even though the peak power of a starter is quite high (it can move your car), the duty cycle is only a few seconds, 10 times a day, at most. Electric motor calculations of the day were developed assuming continuous operation at maximum power - for trains, pumps, machines, etc. A starter motor has to withstand large thermal transients during intermittent operation, not constant operation. The problems involved in building something that can put out two horsepower for a few seconds and then be off for 20 minutes are very different from those of something that runs continuously at two horsepower. A new set of design rules for engine starters evolved, and starters became practically small and straightforward to design.
Rx: if you crank your starter long enough, its reliability will become "unknown" - because automotive starters are not designed for continuous operation.