"Despivotar" os dados (EN)

Data de publicação: Dec 09, 2020 5:11:57 PM

Dados pivotados?

Let's consider an example:

Students have volunteered to take part in activities to raise funds for charity, and we want to analyse the proceeds to see which activities were most successful, which colleges raised the most, and so on.

We may be tempted to lay our data out like this:

It's a nice, useable layout: each student with their own row; columns for each activity; values entered for the amounts raised in each case...

The trouble is, that data's already been pivoted!

What does that actually mean? Well, let's say we wanted to tot up the amount raised by each college or yeargroup. It's going to take an awful lot of messing about because our amounts raised are in multiple columns. Really we want to have the amounts raised in a single "Amount" column. And that means we'll also need a corresponding "Activity" column. And that means we need to do something to our data that might seem counter-intuitive — we need to do this:

This may look odd, particularly the repetition, but it means that each row is a collection of separate data items including all relevant information for each instance of a student doing an activity. And the stuff we want to be able to measure (specifically the amount raised) is in the one easy-to-handle column.

Data structure rules

The rules we talked about regarding lists are doubly true here:

Planning the ideal data structure

Here’s one approach that may help you plan or check your data structure. We’ll continue to use the student fund-raising example.

First, identify the data that provides the individual values that are most important to your analysis — in this case, it’s the amount raised at a specific activity undertaken by a specific student.

Next, surround this with the collection of related attributes:

Also include ‘attributes of attributes’ if you are likely to make use of them. For example, college and year are technically attributes of the student, but since we are storing all the data in one table, these must also be regarded as attributes of the amount, and included with each value.

These attributes should then be the additional columns used in the data set, so each amount has a corresponding activity, student, college and year.

Lowest common denominator data

Pivot tables work by grouping like values together. So it will look at our data-set, spot three rows of David Joneses and start merging them. Which is all well and good if we've only got one David Jones in our student cohort. But perhaps we've got two of them (like the one from The Lower Third and the one from The Monkees). If we don't want these two entities mingling together into some frightening gestalt, we're going to need to provide some means of disambiguation. Perhaps the year and/or college would be sufficient to do that, or maybe we're going to need a unique student ID.

Como despivotar os dados?

Unpivoting in Google Sheets is a bit harder. 

There are a few methods you could use, all of which involve building up a coordinate map of your data and then restructuring it in a new location, including this example using OFFSET, ROUND, MOD, COLUMN and ROW.

However, perhaps the most elegant method is the player0/Ben Collins approach which unpivots a sheet using this beautiful formula:

=ARRAYFORMULA( QUERY( IFERROR( SPLIT( TRIM( TRANSPOSE( SPLIT( TRANSPOSE( QUERY( TRANSPOSE( QUERY( TRANSPOSE( IF( Sheet1!B2:Z<>"", Sheet1!A2:A&"🐒"&Sheet1!B1:1&"🐒"&Sheet1!B2:Z&"🎾", )),, 500000)),, 500000)), "🎾"))), "🐒"), ""), "SELECT * OFFSET 1"))