n-Fold Cartesian Product Formula

This explains a formula used in multidimensional modeling to calculate all combinations of dimension indexes.

The Problem

In multidimensional modeling (MDM), we often need to combine several dimensions and we must make sure to include all possible combinations. This formula calculates those possible combinations and returns an index to each dimension's instances.

What is a Dimension?

In MDM, a dimension is a category of things. In the example we will explore later, we will find the dimensions: Distribution Regions, Market Sectors, Products and Months. In table based modeling (TBM) we put each dimension into its own table. At right is the Regions table.

Each dimension has instances. An instance is a single occurrence of a dimension. Our Regions dimension has the following instances: North, South-East, South-West, East and West. In TBM, each instance is placed in its own row within the dimension's table.

Each instance has attributes. An attribute is something that defines or differentiates an instance. Our Regions dimension has the following attributes: ID, Name, and Unit Delivery Costs.

tblRegions

What does "Multidimensional" mean?

When all instances of two or more dimensions are needed to calculate something, that calculations is multidimensional. At right is a table with two dimensions: Sectors and Products. To calculate the values in the last three columns we need attributes from each Sector and each Product. The results of these calculations are unique (in this case) to each Sector/Product combination. In set theory, we call each unique combination of two dimensions an Ordered Pair. We can see the ordered pairs in the second and third columns labeled Sector and Product respectively. These are indexes pointing to the instance (row) in each dimension's table. The combination in each row is unique, meaning no other row contains the same combination of indexes.

There are eight rows in this table. The number of rows is determined by the product (multiplication) of the number of sectors and number products. In this case, we have 4 sectors and 2 products; thus, the product is 4 x 2 which equals 8. 8 unique ordered pairs represent the finite list of all sectors and products. We call the finite list of all ordered pairs, the Cartesian Product.

tblSPC

How do we create a "Cartesian Product"?

The n-fold Cartesian product formula calculates a column of all indexes for a dimension in a pattern that when combined with another dimension's column of indexes, we end up with unique ordered pairs that form the Cartesian Product. This formula is not limited to an ordered pair which is also known as a 2-tuple. We can use this product over as many dimensions as we like. When we have n-dimensions we refer to each unique combination as an n-tuple; thus, I call the formula the n-fold Cartesian product formula, which is a name I proudly stole from Wikipedia. Here is the formula:

=MOD( QUOTIENT(<TC>-1,<PD>), <CD>) + 1

where:

<TC> = Tuple counter

<PD> = Product of all lower dimension instances

<CD> = Current dimension's instances count

I find the formula a bit baffling and I created it (or at least, I didn't copy it from anyone) so I imagine an explanation is in order. Let us first tackle <TC>, our Tuple Counter.

Tuple Counter

Oddly enough, in database terms each row in a table is referred to a tuple and in TBM (table based modeling) we place each ordered pair (2-tuple) in its own row. Whether we want to think of a tuple in database terms or mathematical terms, the first thing our formula needs is the number of the tuple we are working with. To calculate the row number of a table (thus, the tuple number) we use this formula:

=ROW()-ROW([#Headers])

1st Dimension

Now let us take a closer look at the Sector Product Calcs table, examine the patterns, and figure out how to replicate them. In the last column (Products) we see a pattern:

1, 2, 1, 2, 1, 2, 1, 2

We have two products (Current Dimension instances or <CD>). There keys are 1 and 2. We need a formula to generate 1, then a 2, and then repeat. So when the tuple counter (<TC>) equals 1, we need 1. When <TC> is 2, we need a 2. When <TC> is 3, we need to go back to 1. A way to generate a sequence of two numbers that repeat is to use division and take the remainder. If we divide <TC> (1,2,3,4...) by 2 (<CD>) and take the remainder we get the pattern:

1, 0, 1, 0 ...

Well, at least the pattern repeats every two numbers. We can get closer to our pattern by subtracting 1 from <TC> so it starts it at 0 instead of 1. When we divide 0,1,2,3...(<TC>) by 2 (<CD>), the remainder pattern changes to:

0, 1, 0, 1 ...

We are getting close! If we add one to the result our pattern is:

1, 2, 1, 2 ...

Success! An easy way to get the remainder of a division is to use the MOD() function. Thus our formula at this point is:

=MOD( <TC> - 1, <CD>) + 1

NOTE! We can always use this formula for the first dimension.

2ns Dimension

In the first column (Sector) we see a pattern:

1, 1, 2, 2, 3, 3, 4, 4

There are four sectors with each repeating twice. Why twice? Because we have two products and we need to match each product to each sector. To generate this pattern we can use division again but this time, take the quotient. The quotient is the result of division without the remainder. Thus, if we divide <TC> (1,2,3,4...) by 2 (the number of products) and take the quotient, we get

0, 1, 1, 2 ...

Ugh! It doesn't look close. But wait! If we subtract one again from <TC> the pattern changes to:

0, 0, 1, 1 ...

And if we add one to the result we get:

1, 1, 2, 2 ...

Success! Excel has a QUOTIENT() function. Thus, our formula for the sectors is:

=QUOTIENT(<TC> - 1, <# of Products>) + 1

NOTE! We can always use this formula for the second dimension.

Going 3D+!

The above works great for 2 dimensions. But does it work for 3 dimensions, say, if we add Months before Sectors (see figure at right)?

There are two problems introduced by more than 2 dimensions.

Problem #1 - Counting & QUOTIENT()

With 2 dimensions we used QUOTIENT() to 'slow' incrementing our 2nd dimension's index to once for every set of 1st dimension items. QUOTIENT divided our tuple counter by the number of instances in the 1st dimension. When we go to 3 dimensions we must 'slow' incrementing to once for the product of lower dimensions. We use <PD> to represent the product of lower dimensions; thus, for dimensions higher than the 1st, our QUOTIENT() must be modified like this:

=QUOTIENT( <TC> - 1, <PD>) + 1

Problem #2 - Wrapping & MOD()

In the 2 dimension example, the 1st dimension had to restart its index back to 1 when it tried to increment beyond the number of instances in that dimension (which we labeled <CD> for current dimension). Wrapping is a term used to describe the index restart. No matter how many dimensions we have, the last dimension does not wrap, but all lower ones do.

In our 2 dimensional example we used MOD() and the tuple counter to determine when to wrap our index. The 1st dimension wrapped when the tuple counter reached the number of instances in the 1st dimension. In the 3 dimensional situation, we should shift from using the tuple counter to using the current dimension's counter as described in Problem #1. Thus our MOD() formula changes to:

=MOD( QUOTIENT(<TC> - 1, <PD>), <CD>) + 1

This one formula will work for any dimension with one small change. When used for the 1st dimension, which has no lower dimensions, we replace <PD> with 1.

Is it possible to make it simpler?

I'm so glad I asked that. YES! We can with LAMBDA(). With LAMBDA() we can add a formula that looks like this:

=CrtIdxλ(A1:C1)

Where A1:C1 contains the number of dimension instances. CrtIdxλ requires a horizontal array of dimension instances which can be a contiguous row of cells like A1:C1 or an array of values which we can create using HSTACK() like so:

=CrtIdxλ(HSTACK(Sectors, Products, Months))

From that one formula, placed in one cell, ALL tuples for as many dimensions as we pass to it will #SPILL automagically. We can import this formula from my GitHub gist and load it into name manager using the Advance Formula Editor add in (free) or we can create a name (CrtIdxλ) and copy the formula to it.

CrtIdxλ = LAMBDA(haDimCount,

LET(vaCounter, SEQUENCE(PRODUCT(haDimCount)),

Repetitions, SCAN(1, haDimCount,

LAMBDA(Accumulator, Elements,

Accumulator * Elements)) / haDimCount,

Repetition, QUOTIENT(vaCounter - 1, Repetitions),

1 + MOD(Repetition, haDimCount)

)

Where haDimCount is an horizontal array of dimension element counts

Credit Peter Bartholomew for converting the n-Fold Cartesian Product formula into a LAMBDA().

Conclusion

Admittedly, this formula is complicated, but it is just one standard formula that is used throughout multidimensional modeling... that is... unless we use Power Query, in which case, we don't need it at all. But that's another discussion.

LAMBDA: SumPeriodsλ

Table Based Modeling

Page updated

Google Sites

Report abuse