"Despivotar" os dados (EN)

Data de publicação: Dec 09, 2020 5:11:57 PM

Dados pivotados?

Let's consider an example:

Students have volunteered to take part in activities to raise funds for charity, and we want to analyse the proceeds to see which activities were most successful, which colleges raised the most, and so on.

We may be tempted to lay our data out like this:

It's a nice, useable layout: each student with their own row; columns for each activity; values entered for the amounts raised in each case...

The trouble is, that data's already been pivoted!

What does that actually mean? Well, let's say we wanted to tot up the amount raised by each college or yeargroup. It's going to take an awful lot of messing about because our amounts raised are in multiple columns. Really we want to have the amounts raised in a single "Amount" column. And that means we'll also need a corresponding "Activity" column. And that means we need to do something to our data that might seem counter-intuitive — we need to do this:

This may look odd, particularly the repetition, but it means that each row is a collection of separate data items including all relevant information for each instance of a student doing an activity. And the stuff we want to be able to measure (specifically the amount raised) is in the one easy-to-handle column.

Data structure rules

The rules we talked about regarding lists are doubly true here:

Data should be entered down the page, each item occupying a new row, so that each row contains one instance of the value to be used in calculation (e.g. the amount raised in the above example)
Each column should contain just one type of information (e.g. text, number, date), and each cell should contain just one value
Enter column headings in one row at the top of the list – never use more, and never merge cells for labels
Do not repeat attributes across several columns (e.g. month names) no matter how tempting it might be!
Do not leave whole rows or columns empty (some blank cells are fine)
Stick to one dataset per tab of a spreadsheet file

Planning the ideal data structure

Here’s one approach that may help you plan or check your data structure. We’ll continue to use the student fund-raising example.

First, identify the data that provides the individual values that are most important to your analysis — in this case, it’s the amount raised at a specific activity undertaken by a specific student.

Next, surround this with the collection of related attributes:

Also include ‘attributes of attributes’ if you are likely to make use of them. For example, college and year are technically attributes of the student, but since we are storing all the data in one table, these must also be regarded as attributes of the amount, and included with each value.

These attributes should then be the additional columns used in the data set, so each amount has a corresponding activity, student, college and year.

Lowest common denominator data

Pivot tables work by grouping like values together. So it will look at our data-set, spot three rows of David Joneses and start merging them. Which is all well and good if we've only got one David Jones in our student cohort. But perhaps we've got two of them (like the one from The Lower Third and the one from The Monkees). If we don't want these two entities mingling together into some frightening gestalt, we're going to need to provide some means of disambiguation. Perhaps the year and/or college would be sufficient to do that, or maybe we're going to need a unique student ID.

Como despivotar os dados?

Unpivoting in Google Sheets is a bit harder.

There are a few methods you could use, all of which involve building up a coordinate map of your data and then restructuring it in a new location, including this example using OFFSET, ROUND, MOD, COLUMN and ROW.

However, perhaps the most elegant method is the player0/Ben Collins approach which unpivots a sheet using this beautiful formula:

=ARRAYFORMULA( QUERY( IFERROR( SPLIT( TRIM( TRANSPOSE( SPLIT( TRANSPOSE( QUERY( TRANSPOSE( QUERY( TRANSPOSE( IF( Sheet1!B2:Z<>"", Sheet1!A2:A&"🐒"&Sheet1!B1:1&"🐒"&Sheet1!B2:Z&"🎾", )),, 500000)),, 500000)), "🎾"))), "🐒"), ""), "SELECT * OFFSET 1"))

Make a copy of your list on a new sheet and reduce it to an index column and the columns you need to pivot (in our case we'd have the students column as our index and the activities columns for unpivoting. This effectively gives us a header row, a header column, and our values;
Use the "monkey tennis" formula on a new sheet, making sure that it points to the sheet with your data on (the formula assumes it's called Sheet1);
The formula basically reduces your list to a coordinate set (using some emoji delimiters) and then breaks it apart in its new form. You'll need to add new headers for your resulting unpivoted data;
In our case we need to add back the student attributes of year and college which we could do with a VLOOKUP (or similar formula) back to our original data.