FAQs

Computing Φ (technical): identifying a complex & unfolding its Φ-structure

These FAQs especially build on the content of the postulate pages and the Computing Φ page. If you have a question that is not yet covered by those pages (or the FAQs below), please add or upvote a new question on the comment threads at the bottom of those respective pages.

How do we get a TPM?

How do we obtain the “backward” probabilities needed for cause selectivity?

What is a valid (or "legal") partition?

Why is Φ impossible to calculate for systems larger than a few units?

How do we determine the causal grain at which integrated information (φs) is maximal?

Why do we normalize φs when searching for the minimum partition?

How do we get a TPM?

A transition probability matrix (TPM) is a matrix of numbers, each of which specifies the probability that the units will transition from one (input) state to another (output) state. For example, the inset shows the TPM for a candidate complex AB constituted of two binary units, A and B (uppercase ON and lowercase OFF). It shows the probability that each input state (left column) will transition to the output states (top row). A TPM forms the basis for assessing the cause–effect power and unfolding the full cause–effect structure of the substrate it represents.

A TPM can be derived in two ways: through empirical manipulations or through modeling. In both situations it will only ever provide an approximation of the true transition probabilities: finite empirical experiments are always limited in their precision (assuming non-deterministic substrates), and models are only as good as the assumptions going into them. Still, assuming high-quality repeated experiments (for the empirical approach) and building on our best current knowledge of the substrate (for the modeling approach), we can estimate the causal powers of the substrate.

As illustrated in this slideshow, to get a TPM by the empirical approach, we manipulate the units of the candidate complex (e.g., set them ON or OFF) and observe their states at the next time step (given by the update grain). We repeat these manipulations and observations many times to obtain a satisfactory estimate of the probability of each state transition. The empirical approach is key to isolating true causes and effects (rather than mere correlations). Yet for any system of interest (e.g., a brain), obtaining even a decent estimate of the probabilities this way would take an impractically long time.
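The empirical procedure can be sketched in a few lines, assuming a simulated two-unit substrate in place of a real experiment. The function `hidden_step`, its 0.9 and 0.1 probabilities, and the trial count are hypothetical stand-ins chosen for illustration; they are not taken from the Wiki.

```python
import itertools
import random

# All joint states of two binary units (A, B): 0 = OFF, 1 = ON.
STATES = list(itertools.product([0, 1], repeat=2))

def hidden_step(state, rng):
    """Hypothetical 'true' substrate, unknown to the experimenter.

    Each unit turns ON with probability 0.9 if both units were ON,
    and with probability 0.1 otherwise (illustrative numbers only).
    """
    p_on = 0.9 if state == (1, 1) else 0.1
    return tuple(int(rng.random() < p_on) for _ in state)

def estimate_tpm(step, n_trials, seed=0):
    """Set the system into every input state n_trials times, observe the
    next state, and convert the transition counts into frequencies."""
    rng = random.Random(seed)
    tpm = {}
    for s in STATES:
        counts = {t: 0 for t in STATES}
        for _ in range(n_trials):
            counts[step(s, rng)] += 1
        tpm[s] = {t: c / n_trials for t, c in counts.items()}
    return tpm

tpm = estimate_tpm(hidden_step, n_trials=10_000)
for s in STATES:
    print(s, [round(tpm[s][t], 2) for t in STATES])
```

Every added binary unit doubles the number of input states that must be set and sampled, which is one way to see why this approach does not scale to large substrates.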

Hence, in IIT, we often resort to using TPMs based on the modeling approach. This means working with idealized systems, whose properties approximate those of a real substrate of interest (e.g., a brain constituted of neurons). An example of a model system is iAbcdo, used throughout the Overview section. Of course, this example is much too simple as an approximation for any relevant substrate of consciousness, but it serves as an example that we can exhaustively analyze. For such model systems, TPMs are obtained by explicitly defining the properties of every unit. Thus, we need not do experiments to know the transition probabilities—they can be computed explicitly. We operationally define each unit based on the following:

Possible input states to the unit (left column)
Possible output states of the unit (top row)
The probabilities of all possible output states given every possible input state

In IIT, we often obtain these probabilities using activation functions that approximate the behavior of particular types of neural circuits (e.g., sigmoid or logistic functions).
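As a minimal sketch of such an activation function, the snippet below uses a logistic function of the summed, weighted input. The weights, bias, and gain are hypothetical parameters chosen for illustration, not values used anywhere in the Wiki.

```python
import math

def p_on(inputs, weights, bias=0.0, gain=1.0):
    """Probability that a unit is ON at the next step, given binary inputs.

    Logistic (sigmoid) function of the weighted input sum. All parameters
    are illustrative assumptions.
    """
    drive = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-gain * drive))

weights = [1.0, 1.0]  # hypothetical connection strengths from two input units
bias = -1.0           # hypothetical offset

# One row per input state: P(unit ON) and P(unit OFF).
for state in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    p = p_on(state, weights, bias, gain=2.0)
    print(state, round(p, 3), round(1 - p, 3))
```

Evaluating the function once per input state yields the unit's full set of output probabilities, which is exactly the third item in the list above.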

These probabilities allow us to build a unit TPM. Take, for example, the two units A and B with unit TPMs depicted here. These are the two units assumed to constitute the substrate AB, whose TPM is presented at the start of this FAQ. The unit TPM for A shows that it is likely to be observed in state ON (A) if its input state is set to AB. Unit B is modeled to behave identically to A in that situation. Given that every unit is defined in this way, the substrate TPM of AB as a whole can be obtained by multiplying the unit TPMs (based on the assumption of conditional independence). Thus, by multiplying the probability of A (0.9) with that of B (0.9), we get the probability of AB (0.81, bottom-right value of the AB system TPM at the start).
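The multiplication of unit TPMs can be sketched as follows. Only the 0.9 entries for input state AB come from the text; every other probability, and the names `p_on_A`, `p_on_B`, and `system_tpm`, are hypothetical placeholders.

```python
import itertools

# All joint states of (A, B): 0 = OFF, 1 = ON.
STATES = list(itertools.product([0, 1], repeat=2))

# P(unit is ON at t+1 | input state at t).
# Only the 0.9 values for input (1, 1) are from the text; the rest are made up.
p_on_A = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.9}
p_on_B = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.9}

def system_tpm(unit_p_on):
    """Combine unit TPMs into a state-by-state TPM, assuming the units are
    conditionally independent given the input state."""
    tpm = {}
    for s in STATES:
        row = {}
        for out in STATES:
            p = 1.0
            for p_on, value in zip(unit_p_on, out):
                # Use P(ON) if the unit is ON in the output state, else P(OFF).
                p *= p_on[s] if value == 1 else 1.0 - p_on[s]
            row[out] = p
        tpm[s] = row
    return tpm

tpm_AB = system_tpm([p_on_A, p_on_B])
print(round(tpm_AB[(1, 1)][(1, 1)], 2))  # 0.81
```

Each row of the resulting TPM sums to 1 because each unit's ON and OFF probabilities sum to 1 for every input state.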


It is essential to note that both ways of building a TPM presume the perturbational approach. This may seem obvious for the empirical TPM, where the manipulations and observations are explicit. But how is the model TPM “perturbational” if no perturbations are done? The main reason is that the TPM contains conditional, interventional probabilities that would be obtained using Pearl’s do-calculus. That is, the numbers that populate the TPM are those that would have resulted from a comprehensive perturbational approach applied to a system that the model approximates.

Thus, we explicitly model the TPM in a way that assumes the perturbational approach: we assume that if we could perturb and observe the given system, this is what we would get.

Furthermore, we can check whether the idealized models are reasonable by applying a simplified perturbational approach to the real substrate of interest (e.g., the brain). If the probabilities resulting from properly controlled experiments are similar to those in our model, then we have reason to believe the model is decent, and we may thus make informed inferences about “what it is like” to be that substrate based on the model alone. But if the empirical tests show different probabilities, then the model is insufficient and we should not use it for validating or testing the theory. We should rather use the empirical results to refine our model.


Cite this FAQ

Juel, Bjørn Erik, Jeremiah Hendren, Matteo Grasso, and Giulio Tononi. "FAQ: How do we get a TPM?" IIT Wiki. Center for Sleep and Consciousness, University of Wisconsin–Madison. Last modified June 30, 2024. Available at https://www.iit.wiki/faqs/technical (DOI: https://doi.org/10.5281/zenodo.14160283).

How do we obtain the “backward” probabilities needed for cause selectivity?

To compute intrinsic information, we need to take the perspective of the system in its current state and adjust the informativeness factor by the selectivity factor. On the effect side, we can simply use the probabilities from the effect TPM to calculate both informativeness and selectivity. This is because in the effect TPM, the current state is treated as the input and the effect states as the potential outputs that the system might transition to. One way to think about this is that for the effect side, the probabilities to calculate informativeness and selectivity are both “forward looking”—they are the probability of an output given an input.

However, this is not the case on the cause side. On the cause TPM, the current state is treated as the output of a transition and the cause states as potential inputs that the system might have transitioned from. These are still “forward looking” in the sense that they are probabilities of the output state given the input state [denoted P(output | input)]. And these probabilities are indeed what we need to calculate cause informativeness. However, to capture the selectivity factor, we have to “look backward” from the system in its current state to determine the probability of an input state given the output state [denoted P(input | output)], and we alter the relevant probabilities accordingly.

Here is a simple-minded way of thinking about this: to calculate selectivity on the cause side, we need to make the column corresponding to the current state sum to 1. As shown below, for our example system Abcd, this column sums to 1.06 on the cause TPM, so we have to renormalize it. The reason is that from the perspective of the system in its current state, it must have “come from somewhere” with 100% likelihood. Its cause can be neither overdetermined (>100%) nor underdetermined (<100%). Thus, from the perspective of the current state, we can figuratively say that the system “looks backward” from the current (output) state when “selecting” its cause state.

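To transform the “forward-looking” probabilities into “backward-looking” ones, we apply Bayes’ rule and the law of total probability:

$$P(\text{input} \mid \text{output}) = \frac{P(\text{output} \mid \text{input})\, P(\text{input})}{\sum_{\text{input}'} P(\text{output} \mid \text{input}')\, P(\text{input}')}$$

Next, we assume a uniform distribution over the N possible input states (P(input) = 1/N) [1], which allows us to cancel out P(input) from the equation. Thus, the formula simplifies to this:

$$P(\text{input} \mid \text{output}) = \frac{P(\text{output} \mid \text{input})}{\sum_{\text{input}'} P(\text{output} \mid \text{input}')}$$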

In the figure above, we have applied this formula to the transition from aBcd to Abcd (probability 0.82), which yields a renormalized probability of 0.76. This means that from the intrinsic perspective of the system in its current state (Abcd), there is a 76% chance that it transitioned from aBcd. This value of 0.76 is the selectivity factor used for calculating cause intrinsic information (iic) for cause state aBcd—the one used in the main example throughout the Wiki.
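The renormalization can be sketched as below, assuming a made-up column of forward probabilities with only four input states (the real Abcd example has sixteen). The function name `backward_probabilities` and all numbers are hypothetical and chosen so the arithmetic is easy to follow; they are not the values from the figure.

```python
def backward_probabilities(column, prior=None):
    """Turn P(output | input) for one fixed output state into P(input | output).

    `column` maps each input state to its forward probability of reaching the
    current (output) state. With no prior given, a uniform prior over input
    states is assumed, so the result is the column divided by its sum.
    """
    n = len(column)
    if prior is None:
        prior = {s: 1.0 / n for s in column}
    # Law of total probability: P(output) = sum over inputs of P(output | input) P(input).
    evidence = sum(column[s] * prior[s] for s in column)
    # Bayes' rule for each candidate input state.
    return {s: column[s] * prior[s] / evidence for s in column}

# Hypothetical forward probabilities into the current state (column sums to 1.2).
column = {"s0": 0.1, "s1": 0.6, "s2": 0.3, "s3": 0.2}
backward = backward_probabilities(column)
print({s: round(p, 3) for s, p in backward.items()})
```

With a uniform prior, each backward probability is simply the forward probability divided by the column sum (here 0.6 / 1.2 = 0.5 for `s1`), and the backward probabilities sum to 1.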


Footnotes

[1] If we were to assume a non-uniform distribution (e.g., an empirical distribution), this would impose information about past states that we know as observers (e.g., observing the system over time). However, since we aim to take the intrinsic perspective of the system, we limit ourselves to what the system can “know” about its past states conditional on its current state—which means giving equal unconditional probability to each possible past state.

Cite this FAQ

Juel, Bjørn Erik, Jeremiah Hendren, Matteo Grasso, and Giulio Tononi. "FAQ: How do we obtain the 'backward' probabilities needed for cause selectivity?" IIT Wiki. Center for Sleep and Consciousness, University of Wisconsin–Madison. Last modified June 30, 2024. Available at https://www.iit.wiki/faqs/technical (DOI: https://doi.org/10.5281/zenodo.14160283).

What is a valid (or "legal") partition?

As described in the integration postulate, irreducibility is an integral concept in IIT. Whenever we aim to assess irreducibility of a system, we compare the intact system with a partitioned version of it (likewise for mechanisms, relations, or units). If there is a way to partition it without affecting its causal powers, we say that it is reducible, and it should not be assumed to exist as one. Thus, it is essential to understand how to make valid partitions—not only to systems but also to mechanisms, relations, and units.

System partitions

System partitions are used to assess the irreducibility of a candidate system (to assess its φs value). In the integration postulate, we saw the minimum partition of system Abcd, as reproduced here. Let’s break down how we constructed this partition and what makes it valid (or “legal”).

Briefly, to understand valid system partitions, imagine that all you know about the candidate system is what its units are, but you know nothing about the connectivity among them (panel 1 in the figure below). To construct a legal partition, we first separate all of the units into non-overlapping parts of the system, each of which must contain at least one unit. For the system Abcd, there are 14 ways to do this (can you find them all?). One grouping could be A and b together (Ab) as one part, c as a second part, and d as a third (panel 2).
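For readers who want to check the count, here is a minimal sketch that enumerates the groupings. It assumes that a grouping must contain at least two parts (leaving all units together in one part separates nothing), which is what reproduces the figure of 14. The helper name `set_partitions` is illustrative.

```python
def set_partitions(units):
    """Yield every way to split `units` into non-empty, non-overlapping parts."""
    if not units:
        yield []
        return
    first, rest = units[0], units[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing part in turn...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ...or give `first` a part of its own.
        yield [[first]] + partial

units = ["A", "b", "c", "d"]
# Keep only groupings with at least two parts (assumption stated above).
groupings = [p for p in set_partitions(units) if len(p) >= 2]
print(len(groupings))  # 14
for g in groupings:
    print(" | ".join("".join(part) for part in g))
```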

Next, we must causally separate the parts from one another. To do this, we remove each part’s capacity to “take a difference” from the other parts or “make a difference” to them [1]. That is, for each part, we must “cut” all possible outgoing and/or incoming causal connections. Each cut is “unidirectional”—meaning either outgoing or incoming. But we are also free to apply both cuts to a part to isolate it completely (as with Ab above).

Panel 3 shows four cuts, each indicated by a small orange arrow: all connections to c, from d, and both to and from Ab. These four cuts together make up our partition (which is, indeed, the minimum partition of this sample system used throughout the Wiki). This partition is but one out of twenty-seven valid partitions that could be applied to causally separate these parts (or any three parts) [2].
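The count of twenty-seven can be reproduced with a short sketch, assuming that each of the three parts independently receives one of three cut options (incoming only, outgoing only, or both). The variable names are illustrative.

```python
import itertools

parts = ["Ab", "c", "d"]
directions = ["incoming", "outgoing", "both"]

# One cut option per part: 3 options for each of 3 parts.
partitions = [dict(zip(parts, choice))
              for choice in itertools.product(directions, repeat=len(parts))]
print(len(partitions))  # 27

# The partition described for panel 3: to c, from d, both to and from Ab.
example = {"Ab": "both", "c": "incoming", "d": "outgoing"}
print(example in partitions)  # True
```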

Panel 4 presents the same partition a second time, but with gray arrows showing the causal connections among the units that are affected by one specific cut, namely Ab outgoing. The gray arrows emphasize that a cut always severs not only all actual connections but all possible connections to or from a given part. The single Ab outgoing cut requires severing the causal connections of A → c, A → d, b → c, and b → d. Keep in mind that it doesn’t matter whether there are “real” connections between these units (e.g., as depicted on the substrate model); a given cut always severs all possible relevant causal connections [3].
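As a minimal sketch, the set of possible connections severed by one unidirectional cut can be listed as follows. The function name `severed_connections` is hypothetical.

```python
def severed_connections(part, all_units, direction):
    """List every possible unit-to-unit connection removed by one
    unidirectional cut applied to `part`."""
    outside = [u for u in all_units if u not in part]
    if direction == "outgoing":
        return [(src, dst) for src in part for dst in outside]
    if direction == "incoming":
        return [(src, dst) for src in outside for dst in part]
    raise ValueError("direction must be 'outgoing' or 'incoming'")

units = ["A", "b", "c", "d"]
for src, dst in severed_connections(["A", "b"], units, "outgoing"):
    print(f"{src} -> {dst}")
```

The function never consults the substrate's actual connectivity, mirroring the point that a cut applies to all possible connections across the part's boundary.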

For more, see Marshall, W., Grasso, M., Mayner, W. G., Zaeemzadeh, A., Barbosa, L. S., Chastain, E., ... & Tononi, G. (2023). System Integrated Information. Entropy, 25(2), 334.

Footnotes

[1] Why do we say “or” when we usually insist that causal power requires the ability to “take and make a difference”? The reason is given in the integration postulate page: “The cuts can be unidirectional because, from the intrinsic perspective of a system, a subset of units must be able to interact with the rest of the system in both directions (cause and effect) to be truly a part of it.” Therefore, since a part must have both cause and effect, cutting only one is enough to disintegrate it.

[2] Note, however, that some of the valid partitions will be identical in practice. For example, cutting outgoing connections from Ab has the same consequences as cutting incoming connections to both c and d.

[3] In fact, the process of cutting can be seen as a way to assess the presence of causal interactions between parts in the first place (in a simpleminded sense, cutting lets us figure out where we should even put connection arrows on our substrate model).

Mechanism partitions

Mechanism partitions are used to assess whether a candidate distinction is irreducible (i.e., to assess its φd value). In the same way that system partitions assess the causal power of the system over itself as one system, mechanism partitions assess the causal power the mechanism has over its purviews as one mechanism. Yet there are three key differences in how valid mechanism partitions are made.

First, the mechanism’s inputs (cause purview) and outputs (effect purview) may be constituted of different units than the mechanism itself. This is in contrast to the system’s inputs and outputs, which are always the same set of units that constitute the system itself (by the intrinsicality postulate). Second, partitions are applied to mechanism–purview pairs on the cause side and the effect side independently (unlike system partitions, where a single partition is applied and then assessed on the cause side and effect side in turn). This is necessary because of the first point: the mechanism’s input and output units may differ; thus the possible partitions necessarily differ too. Third, building on the first two points, the process of creating parts is the same as cutting (unlike for system partitions, where we first found parts and then applied cuts that were outgoing and/or incoming). This is because the irreducible causal power we are assessing only goes in one direction: from the mechanism to the effect purview, or from the cause purview to the mechanism. So, since the direction is fixed, when we isolate the parts, we automatically have the relevant cut(s) comprising the partition.

To create a mechanism partition, we separate the mechanism–purview pair into non-overlapping parts. A part usually includes one or more mechanism units and any number of purview units (including zero). The “complete partition” is the exception, in which the entire mechanism is in one part and the entire purview in the other. (This is the only valid partition for 1st-order mechanisms.) Every unit in the mechanism and the purview must appear in exactly one part. For higher-order mechanisms, the highest number of possible parts in a given partition matches the order of the mechanism (e.g., a maximum of three parts for a third-order mechanism). However, the number of possible partitions is usually much higher, as it depends on the number of permutations of mechanism and purview units.
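The rules in this paragraph can be turned into a rough enumeration sketch. It assumes exactly what is stated above: the mechanism is split into two or more non-empty parts, each purview unit is assigned to one of those parts, and the complete partition is added as a special case. The function names are hypothetical, and PyPhi's own enumeration may differ in detail.

```python
import itertools

def set_partitions(items):
    """Yield every way to split `items` into non-empty, non-overlapping blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        yield [[first]] + partial

def mechanism_partitions(mechanism, purview):
    """Enumerate partitions as lists of (mechanism_part, purview_part) pairs,
    following the rules described in the text (a sketch, not PyPhi's code)."""
    # Complete partition: whole mechanism in one part, whole purview in the other.
    result = [[(tuple(mechanism), ()), ((), tuple(purview))]]
    for blocks in set_partitions(list(mechanism)):
        if len(blocks) < 2:
            continue  # a single block would leave the pair unpartitioned
        # Assign every purview unit to exactly one mechanism block.
        for assignment in itertools.product(range(len(blocks)), repeat=len(purview)):
            parts = []
            for i, block in enumerate(blocks):
                pur = tuple(u for u, k in zip(purview, assignment) if k == i)
                parts.append((tuple(block), pur))
            result.append(parts)
    return result

found = mechanism_partitions(["c", "d"], ["a", "d"])
print(len(found))  # 5
for partition in found:
    print(" x ".join(f"{''.join(m) or 'Ø'}/{''.join(z) or 'Ø'}" for m, z in partition))
```

Under these assumptions, a first-order mechanism yields only the complete partition, and the number of parts never exceeds the order of the mechanism.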

Let’s look at the sample distinction a–cd–ad (from our sample system in the composition postulate). The figure below shows all possible partitions on the effect side on the top and the cause side on the bottom. The orange arrows indicate the direction of causal power, whose irreducibility we are assessing (from cause purview to mechanism, and from mechanism to effect purview). The boxes indicate the parts of each partition, and the dashed orange arrows indicate the “cuts” of causal power—which together make up the partition (like for the system partitions above). Since the mechanism is second-order (cd), each partition has two parts at most.

Note the various ways to partition any given mechanism–purview pair. Valid partitions always go “through the mechanism” to emphasize that we are trying to quantify the causal powers specified by the mechanism as one. The one exception is the complete partition (the furthest left), which completely separates the mechanism from its purview [1]. The labels on the top illustrate a way these partitions are notated (especially in PyPhi). For example, the label “c/Ø x d/ad” in the effect partitions is read as “part 1 is mechanism c alone (over the empty set); part 2 is mechanism d over purview ad.”

It is easy to depict an exhaustive list of partitions for first- and second-order mechanisms and purviews. But as soon as we move to third-order, the combinatorics explode. To give a brief teaser, consider distinction ac–bcd–abd, which also featured in the composition postulate. In the figure below, the left-most partitions are shapes we’re already familiar with (the complete partition and two-part partitions), but note the various new ways the three-part partitions can be formed. Within these various partition shapes, if you ran through all the possible permutations of units, you would arrive at all possible partitions.

For more, see Barbosa, L. S., Marshall, W., Albantakis, L., & Tononi, G. (2021). Mechanism integrated information. Entropy, 23(3), 362.

Footnotes

[1] The complete partition is identical to assuming there is no causal interaction between the purview and mechanism, which makes the partitioned repertoire identical to the unconstrained repertoire. Thus, if the complete partition is the minimum partition, the mechanism specifies all of its intrinsic information as one. (Note that a “complete partition” is not possible for systems because connections within parts are always untouched by system partitions.)

Relation partitions (“unbinding”)

Valid relation partitions are simpler to grasp than system or mechanism partitions. To assess the irreducibility of a relation, we don’t use “cuts” but rather we “unbind” the distinctions that constitute it, one at a time, and identify the distinction that contributed least to the overlap (the minimum partition). In other words, to partition a relation we simply remove one of its distinctions.

The example below shows a second-degree relation among distinctions c and bcd. Removing distinction bcd results in the lowest value of φr. (The calculation is explained here.)

One possible misconception is that a second-degree relation would have only one valid partition: by removing one distinction, aren’t we simultaneously removing the other? This is not the case: a relation partition is not “commutative”; it rather removes one of the distinctions from the overlap. In the example, the purview elements of distinction c are identical to the units in the overlap of the relation (ad), whereas for bcd, only 2 of its 4 purview elements contribute to the overlap. So removing c would remove all its PHI (0.87) from the relation. However, as bcd contributes only half of its PHI (0.017/2 = 0.008) to the overlap, removing it is less destructive. (More details are in these slides: Computing Φ – Step 6.3: Calculate φr.)

Unit partitions

When assessing the irreducibility of a unit, we consider it as a system in its own right (constituted of a set of more-“micro” units). Thus, the valid partitions are identical to the valid system partitions (above).

As a final note, recall that for systems, mechanisms, relations, and units alike, the minimum partition (MIP) is always used to quantify irreducibility. This follows the principle of minimal existence and is used to avoid overestimating irreducibility. The MIP is always the partition that makes the least difference, that is, the one that minimizes integrated information.


Cite this FAQ

Juel, Bjørn Erik, Jeremiah Hendren, Matteo Grasso, and Giulio Tononi. "FAQ: What is a valid (or 'legal') partition?" IIT Wiki. Center for Sleep and Consciousness, University of Wisconsin–Madison. Last modified June 30, 2024. Available at https://www.iit.wiki/faqs/technical (DOI: https://doi.org/10.5281/zenodo.14160283).

Why is Φ impossible to calculate for systems larger than a few units?

Coming soon.

How do we determine the causal grain at which integrated information (φs) is maximal?

This FAQ is under development. For the time being, see Marshall, W., Findlay, G., Albantakis, L., & Tononi, G. (2024). Intrinsic Units: Identifying a system’s causal grain. bioRxiv, 2024-09:

“...According to IIT, a substrate of consciousness must be a system of units that is a maximum of intrinsic, irreducible cause–effect power, quantified by integrated information (φs). Moreover, the grain of each unit must be the one— from micro (finer) to macro (coarser)—that maximizes the system’s intrinsic irreducibility (i.e., maximizes φs). The units that maximize φs are called the intrinsic units of the system. This work extends the mathematical framework of IIT 4.0 to assess cause–effect power at different grains and thereby determine a system’s intrinsic units. Using simple, simulated systems, we show that the cause–effect power of a system of macro units can be higher than the cause–effect power of the corresponding micro units. Two examples highlight specific kinds of macro units, and how each kind can increase cause–effect power. The implications of the framework are discussed in the broader context of IIT, including how it provides a foundation for tests and inferences about consciousness.”

Why do we normalize φs when searching for the minimum partition?

Coming soon.
