A2. the math of statistical inference

If we observationally measure a quantity called average energy, then Bayesian inference in the absence of more-specific prior-information tells us that the average will be defined by a Lagrange heat-multiplier that we'll write as 1/kT, which may vary between -∞ and +∞. The analysis also prescribes roles for an exponential Boltzmann-factor e^(-ε_i/kT) and a normalizing partition-function Z in calculating the probability p_i = (1/Z)e^(-ε_i/kT) of states of energy ε_i, along with a wide range of average values and variances.

given one measured-average

The problem of statistical inference in the absence of prior information is to find the set of Ω accessible-state probabilities p_i which is least-committal (i.e. has maximum uncertainty S/k = Σ_i p_i ln[1/p_i]) subject to an average-energy constraint, i.e. that ⟨E⟩ = Σ_i p_i ε_i. Note that if information supporting a non-uniform prior probability-set q_i is also available, the problem becomes a Shannon-Jaynes entropy maximization, i.e. a Kullback-Leibler divergence minimization instead, which basically weights the solution-probabilities given below (and their normalization) with the factor Ωq_i. Hence maximizing entropy (never positive in the Shannon-Jaynes case) is a special case of minimizing subsystem-correlations, as measured by the (never negative) Kullback-Leibler divergence (Gregory1965).

The Lagrange method of undetermined multipliers (not covered in detail here) then finds that states of energy ε_i will have probability:

p_i = e^(-ε_i/kT)/Z,

where 1/kT is the "heat-multiplier" of energy E, and the normalizing partition-function Z ≡ Σ_i e^(-ε_i/kT). After maximization, the uncertainty S/k in natural information units (1 nat = log_2[e] bits ≈ 1.443 bits) becomes:

S/k = Σ_i p_i ln[1/p_i] = ln[Z] + ⟨E⟩/kT,

while derivatives of entropy-uncertainty S/k and ln[Z] with respect to heat/work multipliers yield a variety of useful relationships between those multipliers and measured values for observables and work-parameters as well.
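To make this concrete, here is a minimal numerical sketch (in Python, with hypothetical energy levels and units where k = 1) that builds the least-committal distribution for one chosen value of kT and confirms that the maximized uncertainty satisfies S/k = ln[Z] + ⟨E⟩/kT:

```python
import numpy as np

energies = np.array([0.0, 1.0, 2.0, 3.0])    # hypothetical state energies eps_i (k = 1)
kT = 1.5                                     # chosen reciprocal of the heat-multiplier

boltzmann = np.exp(-energies / kT)           # Boltzmann factors e^(-eps_i/kT)
Z = boltzmann.sum()                          # normalizing partition-function
p = boltzmann / Z                            # p_i = e^(-eps_i/kT)/Z

E_avg = (p * energies).sum()                 # <E> = sum_i p_i eps_i
S_over_k = (p * np.log(1.0 / p)).sum()       # uncertainty S/k in nats

print(S_over_k, np.log(Z) + E_avg / kT)      # both give the maximized S/k
```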

The most important of these might be the relation which shows that 1/kT (reciprocal-temperature, or Garrod's (Garrod1995) coldness) is an energy uncertainty-slope:

1/kT = ∂(S/k)/∂E|_V,

and as such a measure of "stochastic appeal", in that it represents the choice-variety available (#choices = e^(S/k) = 2^(#bits)) to chunks of thermal-energy given an opportunity to either stay or leave.

The analysis also tells us that ⟨E⟩ = -∂ln[Z]/∂(1/kT)|_V, where for both of these partial-derivative expressions the control-variable V (so far held constant) is discussed below.
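As a sanity check under the same assumptions (hypothetical levels, k = 1), the sketch below verifies both derivative relations numerically, ⟨E⟩ = -∂ln[Z]/∂(1/kT)|_V and 1/kT = ∂(S/k)/∂E|_V, using finite differences on the heat-multiplier:

```python
import numpy as np

energies = np.array([0.0, 1.0, 2.0, 3.0])    # hypothetical fixed energy levels (k = 1)

def stats(beta):
    # return (ln[Z], <E>, S/k) for heat-multiplier beta = 1/kT
    w = np.exp(-beta * energies)
    Z = w.sum()
    p = w / Z
    return np.log(Z), (p * energies).sum(), -(p * np.log(p)).sum()

beta, d = 0.7, 1e-5
lnZ_p, E_p, S_p = stats(beta + d)
lnZ_m, E_m, S_m = stats(beta - d)
_, E_avg, _ = stats(beta)

print(E_avg, -(lnZ_p - lnZ_m) / (2 * d))     # <E> = -d ln[Z] / d(1/kT)
print(beta, (S_p - S_m) / (E_p - E_m))       # 1/kT = d(S/k)/d<E> at fixed levels
```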

add one control-variable

If we further postulate a control-variable or work-parameter V whose value may be used to change energy mechanically via the relation P = -dE/dV, the average value of the resulting work-multiplier or volume uncertainty-slope (written here as a free-expansion coefficient P/kT) becomes:

P/kT = ∂(S/k)/∂V|_⟨E⟩.

For chemical systems another such variable may be number of particles (atoms or molecules) N e.g. of a given type, but only one variable is needed here to illustrate the concept.
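A hedged sketch of the volume relation follows, using a toy (hypothetical) set of state energies ε_i(V) = c_i/V^(2/3) so that the levels actually depend on the work-parameter. It computes the average of P_i = -∂ε_i/∂V state by state, and compares P/kT with the equivalent volume-derivative of ln[Z] taken at fixed 1/kT (a standard identity for the same free-expansion coefficient):

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0, 5.0])           # hypothetical level coefficients
kT, V, d = 1.0, 2.0, 1e-6

def levels(vol):
    return c / vol**(2.0 / 3.0)              # toy state energies eps_i(V)

def lnZ(vol):
    return np.log(np.exp(-levels(vol) / kT).sum())

p = np.exp(-levels(V) / kT)
p /= p.sum()                                 # canonical probabilities at volume V

dEdV = (levels(V + d) - levels(V - d)) / (2 * d)   # d(eps_i)/dV for each state
P_avg = (p * (-dEdV)).sum()                        # <P> = sum_i p_i (-d eps_i/dV)

# free-expansion coefficient P/kT as a volume-derivative of ln[Z] at fixed 1/kT
print(P_avg / kT, (lnZ(V + d) - lnZ(V - d)) / (2 * d))
```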

fluctuation constraints

The foregoing relations can be combined to estimate fluctuations and 2nd-moment correlations i.e. variances and covariances as well. In this case for instance we get an estimate of energy variance (i.e. standard-deviation squared) from some derivatives:

σ_E² = ⟨E²⟩ - ⟨E⟩² = -∂⟨E⟩/∂(1/kT)|_V = (kT)² ∂⟨E⟩/∂(kT)|_V,

whose non-negativity means that when kT increases at fixed-volume, average-energy does too regardless of any physics which may (or may not) later be introduced. For the free-expansion coefficient variance we get:

σ_{P/kT}² = -∂(P/kT)/∂V|_⟨E⟩,

whose non-negativity here says that when volume increases at fixed average-energy, pressure decreases. Such equations might also be used e.g. to examine pressure-fluctuations in a sea-shell cavity, when you put it up to your ear in hopes of hearing some waves.
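For the energy-fluctuation relation, a short numerical check (hypothetical levels, k = 1) that the directly computed variance ⟨E²⟩ - ⟨E⟩² matches -∂⟨E⟩/∂(1/kT)|_V:

```python
import numpy as np

energies = np.array([0.0, 1.0, 2.0, 4.0])    # hypothetical fixed energy levels (k = 1)

def moments(beta):
    w = np.exp(-beta * energies)
    p = w / w.sum()
    return (p * energies).sum(), (p * energies**2).sum()

beta, d = 0.8, 1e-5
E_avg, E2_avg = moments(beta)
E_plus, _ = moments(beta + d)
E_minus, _ = moments(beta - d)

variance = E2_avg - E_avg**2                       # <E^2> - <E>^2
print(variance, -(E_plus - E_minus) / (2 * d))     # equals -d<E>/d(1/kT), non-negative
```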

no-work heat capacities

Integral no-work heat-capacities for observed averages (like E/kT) or work-parameters (like PV/kT) are in general the multiplicity-exponent (Fraundorf2003) for the corresponding parameter (here E or V), e.g. the bits of state information lost per 2-fold increase in energy or volume. In other words, for example:

E/kT = ∂(S/k)/∂ln[E]|_V.

Here the multiplicity-exponent for energy E is just the power of energy (i.e. the energy-exponent) to which state-multiplicity W ≡ e^(S/k) is proportional.

Differential no-work heat-capacities like dE/d(kT) are comparable multiplicity-exponents for the corresponding multiplier (or its reciprocal) e.g. in this case the bits of state-information lost per 2-fold increase in absolute temperature kT. In other words:

C_V/k ≡ ∂E/∂(kT)|_V = ∂(S/k)/∂ln[kT]|_V.

In many (so-called quadratic) thermodynamic systems, energy-multiplier heat-capacities are often half-integral because (at least over a range of temperatures) multiplicity is a product of "degrees of freedom" coordinates (like velocity v = √(2E/m) or spring-displacement x = √(2E/k_s)) proportional to the square-root of energy E. Multiplying by Boltzmann's constant of course converts these heat capacities from natural units into historical units (J/K), and further dividing by mass or number of molecules converts those into the traditional "per-quantity" forms.
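The multiplicity-exponent reading of heat capacity can be illustrated with a toy system whose multiplicity is assumed proportional to a fixed power ξ of energy, i.e. S/k = ξ ln[E] + constant (ξ = 3N/2 would be the monatomic ideal gas). Both the integral (E/kT) and differential (dE/d(kT)) no-work heat-capacities then come out equal to ξ:

```python
import numpy as np

xi = 1.5                    # hypothetical constant multiplicity-exponent (e.g. 3N/2 with N = 1)

def S_over_k(E):
    # multiplicity W = e^(S/k) assumed proportional to E^xi
    return xi * np.log(E)

def kT_of(E, d=1e-6):
    # 1/kT is the energy uncertainty-slope d(S/k)/dE, so kT is its reciprocal
    slope = (S_over_k(E + d) - S_over_k(E - d)) / (2 * d)
    return 1.0 / slope

E1, E2 = 2.7, 2.7001
kT1, kT2 = kT_of(E1), kT_of(E2)

print(E1 / kT1, xi)                    # integral no-work heat capacity E/kT = xi
print((E2 - E1) / (kT2 - kT1), xi)     # differential no-work heat capacity dE/d(kT) = xi
```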

changing average-values & uncertainty

Inference then further predicts several types of energy increase. In particular the inferential first-law equation categorizes energy-increases into disordered (stochastic or heat-related δQ_in) and ordered (or work-parameter related δW_in = -δW_out) parts, expressible in terms of individual state energies ε_i and probabilities p_i as:

δE = Σ_i ε_i δp_i + Σ_i p_i δε_i = δQ_in + δW_in = δQ_in + (TδS_irr - PδV).

Note that work-parameter related increases are further broken into correlation-based or informatic (TδS_irr) and mechanical (-PδV) parts. As we discuss later, the former part is non-negative over time in classical thermodynamics although the "irreversible" label used here may be misleading in general.
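A small numerical illustration of the first-law split, assuming a canonical starting distribution over hypothetical levels (k = 1): an arbitrary small change in the p_i and ε_i produces an energy change that separates exactly into an occupancy-driven part Σ_i ε_i δp_i and a level-driven part Σ_i p_i δε_i, and the occupancy-driven part matches kT times the change in S/k to first order:

```python
import numpy as np

rng = np.random.default_rng(0)
energies = np.array([0.0, 1.0, 2.0, 3.0])     # hypothetical state energies (k = 1)
kT = 1.2

p = np.exp(-energies / kT)
p /= p.sum()                                  # canonical starting probabilities

dp = 1e-6 * rng.normal(size=energies.size)    # small arbitrary occupancy changes...
dp -= dp.mean()                               # ...kept normalized (sum of dp = 0)
de = 1e-6 * rng.normal(size=energies.size)    # small arbitrary shifts of the levels

dE_total = ((p + dp) * (energies + de)).sum() - (p * energies).sum()
occupancy_part = (energies * dp).sum()        # sum_i eps_i dp_i
level_part = (p * de).sum()                   # sum_i p_i d(eps_i)
print(dE_total, occupancy_part + level_part)  # agree to first order

S0 = -(p * np.log(p)).sum()
S1 = -((p + dp) * np.log(p + dp)).sum()
print(occupancy_part, kT * (S1 - S0))         # occupancy part = kT d(S/k), to first order
```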

The inferential second-law equation, in information units that depend on k (e.g. in nats for k = 1 or in J/K for k = Boltzmann's constant), focuses on probability-distribution instead of on energy changes and uses the fact that dS/dE = dS/(Σ_i ε_i dp_i) = 1/kT for fixed energy-levels ε_i to give us:

δS = δQ_in/T + δS_irr ≥ δQ_in/T.

Thus if energy were to be conserved on transfer between systems, heat-flows represent the energy-impact of occupancy (i.e. state-probability) changes driven stochastically by temperature (i.e. heat-multiplier) differences. Work flows are volume (i.e. work-parameter) driven energy changes associated with both state-probability and state-energy assignments, so that the former can cancel out the latter e.g. in the case of free-expansion by partition-removal in which the volume is changed but no work is done.

Irreversible entropy-increases are associated only with those volume-driven energy-changes that involve changes in state-probability. For reversible processes in classical thermodynamics, all changes in state-probability assignment are driven stochastically by heat-multiplier differences.

sub-system correlation measures

Finally for comparing two-subsystems, statistical inference provides us with expressions for the dimensionless (i.e. information-unit or KL-divergence) analogs to ensemble free-energy:

F/kT ≡ -ln[Z] = ⟨E⟩/kT - S/k,

and available-work (equal to the engineering-quantity "exergy" after assuming the analog-system Gibbs-Duhem relation i.e. entropy's extensivity) in context of an ambient reservoir with uncertainty-slope 1/kT_o:

ΔB/kT_o = (E - E_o)/kT_o + P_o(V - V_o)/kT_o - (S - S_o)/k ≥ 0.

In applied physics and chemistry these expressions come in handy e.g. for figuring out what transformations may be likely (or at least possible) with the availability-resources at hand.

Put simply the best guess comes from minimizing net-surprisal (KL-divergence) about the equilibrium-state of a single system, while net-surprisal with respect to ambient lets one assess thermodynamic-availability or deviation-from-equilibrium. If the ambient-reference consists of uncorrelated subsystems, then KL-divergence becomes a delocalized measure of mutual-information or more generally multi-information (total correlation) between subsystems (Schneidman2003).

Note from the figure at right, however, that the expression above works just as well for comparing dot-averages on weighted dice as it does for thermal systems. Thus they are all still simply physics-free statistical-inference (i.e. math) at this point.
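To underline that physics-free point, here is a sketch of the weighted-dice case (hypothetical target average, SciPy assumed available for the root-find): the least-committal distribution for a loaded die with a measured dot-average is built from its Lagrange multiplier, and its net surprisal relative to a fair-die ambient is computed in nats:

```python
import numpy as np
from scipy.optimize import brentq             # assumes SciPy is installed

faces = np.arange(1, 7)                       # dot counts on a six-sided die
target_mean = 4.5                             # hypothetical measured dot-average

def mean_dots(lam):
    w = np.exp(-lam * faces)
    return (w * faces).sum() / w.sum()

lam = brentq(lambda l: mean_dots(l) - target_mean, -5.0, 5.0)   # Lagrange multiplier
w = np.exp(-lam * faces)
p = w / w.sum()                               # least-committal loaded-die probabilities

p_fair = np.full(6, 1.0 / 6.0)                # uncorrelated / fair-die ambient reference
I_KL = (p * np.log(p / p_fair)).sum()         # net surprisal in nats
print(p.round(4), round(I_KL, 4))
```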

More generally we might say that the dimensionless KL-divergence or generalized thermodynamic-availability, with respect to a reference-ambient denoted by subscript "o", is simply I_KL ≡ Σ_i p_i ln[p_i/p_oi] = ln[Z_o/Z] + Σ_r(λ_ro - λ_r)E_r = Σ_r λ_ro(E_r - E_ro) - (S - S_o)/k. This may also be seen as a measure of the useful "correlation information" in nats, available to folks in the surrounding world about that system's state.

Here as usual Z is the partition function used to normalize probabilities in the maximization, S is the entropy that is maximized under the R observed-average constraints E_r, and the λ_r are the Lagrange-multipliers (or uncertainty slopes) associated with each of those constraints. So far this is pure statistical inference with no "physics", in that it is designed for guessing state probabilities whenever one has information on average values.

Since for average-energy U the Lagrange multiplier is λ_U = 1/kT, multiplying through by kT_o = 1/λ_Uo gives a quantity with units of energy. For example, if the other average constraint is volume V with Lagrange multiplier the free-expansion coefficient λ_V = P/kT, the engineering expression for exergy follows: ΔB = kT_o I_KL = (U - U_o) + P_o(V - V_o) - T_o(S - S_o).

For systems with constant multiplicity-exponents (generalized heat-capacities) ξ_r, like a monatomic ideal gas, these expressions take the form of a sum of Gibbs inequality functions Θ[x] ≡ x - 1 - ln[x] ≥ 0, namely I_KL = Σ_r ξ_r Θ[E_r/E_ro] where E_r/E_ro = λ_ro/λ_r. Thus for a monatomic ideal gas system whose temperature and volume both differ from ambient values, I_KL = (3N/2)Θ[T/T_o] + NΘ[V/V_o].
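Under the same monatomic-ideal-gas assumptions (with hypothetical ambient and system values), a quick check that the Θ-form of I_KL reproduces the engineering exergy expression ΔB = kT_o I_KL = (U - U_o) + P_o(V - V_o) - T_o(S - S_o):

```python
import numpy as np

k = 1.380649e-23                  # Boltzmann's constant in J/K
N = 1.0e22                        # hypothetical number of atoms
T_o, V_o = 300.0, 1.0e-3          # ambient temperature (K) and volume (m^3)
T,   V   = 450.0, 0.7e-3          # system temperature (K) and volume (m^3)

def Theta(x):
    return x - 1.0 - np.log(x)    # Gibbs inequality function, always >= 0

# dimensionless availability from the constant-multiplicity-exponent (Theta) form
I_KL = N * (1.5 * Theta(T / T_o) + Theta(V / V_o))

# the same quantity from the engineering exergy expression, using ideal-gas relations
U, U_o = 1.5 * N * k * T, 1.5 * N * k * T_o           # U = (3/2) N k T
P_o = N * k * T_o / V_o                               # ambient pressure from P_o V_o = N k T_o
dS = N * k * (1.5 * np.log(T / T_o) + np.log(V / V_o))
exergy = (U - U_o) + P_o * (V - V_o) - T_o * dS       # delta-B in joules

print(exergy, k * T_o * I_KL)     # the two expressions agree
```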

inference recap

The foregoing expressions apply to estimates of any probability-distribution based on knowledge of one or more average-values. The uses of these expressions therefore extend far and wide, e.g. to analog as well as to digital information systems, to model-selection/parameter-estimation in many fields, to data compression and error-correction, to clade-analysis and plagiarism-detection, etc. None of these expressions thus relies, for its correctness, on the underlying physics of thermal systems.

These expressions don't for instance require that energy E or volume V is a quantity that's conserved on sharing between systems, even though we've chosen our notation to remind us of physics applications in which energy is conserved. Changing a variable's status between observable-average and work-parameter opens the door to other "Gibbsian ensembles" (e.g. micro-canonical, pressure, grand) with very little added complication, although this specific example for introducing statistical-inference math already contains more than enough content for an intro-physics class.

To see what these results mean for physical systems in contact with a heat reservoir, a few general physical insights as well as some very specific functional shapes for the state energy assignments and the resulting partition function are needed. 

Related references: