Pivot tables are very useful for summarising data but they do pose some restrictions on how data and related formulas are stored. This article describes a situation where the data layout (XLS 79 KB), chosen for its convenience, was inappropriate for summarising using a pivot table. Read on to find out why and how we got around the problem.
Recently a friend sent us a very simple spreadsheet containing some data that she wanted to chart in summary format. She thought she should be able to summarise it with a pivot table. No matter what she did she couldn’t figure out how to do it. The data looked like this:

(Conditional formatting has been used to automatically colour the cell backgrounds. See the FAQ for a note on how to do this.)
There was a column for each person employed. For each day the person’s status was recorded. This could be worked away (wa), worked at home (wh) or they had the day off (o). The required chart, see below, was to show how many days each person had worked and where for the specified month.

- Yes, we are aware that Clara seems to have a problem (she has only values entered for 16 days) and we also know that some other people have missing entries. We shall come to that later!
- This chart is easily produced from the pivot table shown here. The only problem is that our data does not easily yield such a pivot table!
Our anonymous friend thought that she should be able to get a pivot table like the one shown from her data. When she tried she could get a sensible summary for each person, one at a time, but not one that showed everyone in the same table in an acceptable format. It was then that she asked us for help.
What is the problem here?
When our friend set up the spreadsheet she used a different column for each Person and they each had a Status (wa, wh, or o) for every Date. This suited her purposes as it was convenient for data entry and it was easy to see what each person did on each day. However it essentially makes each person into an independent variable rather than different values or categories of a Person variable. It ignores the fact that a person is actually either an Al or a Bob and can’t be both! It is, in fact, likened to storing your data in a database report format rather than a database table format.
A consequence is that it is not possible to usefully summarise data in this format, in the required form, using a single pivot table. This example for just Al and Bob highlights the problem. It just can’t be transformed so that it looks like the first two rows of the table above. It is like saying that the values for Al don’t necessarily mean the same as those for Bob. Because we know better we can use it to tell that Bob and Al were both off (o) together on 6 days and never worked away on the same day. Mind you we could easily have seen that from the original data. Furthermore, once we introduce more than two people to the pivot table at once it becomes much more complex and we can’t easily graph its data either.
How does it need to be stored in order that it can be summarised with a single pivot table?
If the data is stored in a relational database table format then it is easy to generate the required pivot table. We now have columns for Person with values (Al, Bob, … ), Date and Status (wa, wh, o). For each day we now have multiple records instead of a single record. We now have a record for each person for each day. If the data is normalised in this way, it is a very straightforward operation to generate the pivot table above. We can also make pivot tables to answer questions such as who was working away on a particular day. However, the cost is that the data is no longer stored in the way that the user wishes to see it and, most often, use it!
Summarising the data, in the original format, using other techniques
Back to the original question. How can we summarise the data so that we can draw the graph?
The easiest way is to use COUNTIF formulas. The formula in the selected cell (B51) illustrates its use. B$18:B$48 are Al’s status values and Worked_away is the range name of cell A51 which contains the criteria we want to match. Similar formulas have been created for the rows 52 and 53. Note that we’ve made sure we have left at least one empty row below the data before we put in our formulas. This is essential to make sure list features such as sorting and filtering and pivot tables work properly. We didn’t need to set up the column headings again but it does make it easier to draw the chart.
Missing data
The original dataset has a problem in that it contains empty cells, particularly in Clara’s column. It is tempting to assume these should really be o’s for off. We don’t know the answer! At data entry time it would have been sensible to check for empty cells (filtering is a good way). Comments in cells are useful for documenting odd data features that should really be there.
How general is this problem?
It is not uncommon for spreadsheets to be set out with the values of an attribute used as column (field) labels. In the case we have discussed we have people, one per column, who have one of three statuses for each day. Another similar situation is where a set of keywords are used as the column labels and the values are just yes or no or the equivalent. This is the case in our example below where values of a Hobby attribute are the column labels and ticks are used to represent the yes values.
Another situation where categories as columns are commonly used is when survey data is stored in spreadsheets. Counts of different species of insects on different days or traffic surveys, etc are often stored in this way in a single sheet as in the partially completed example here.
Sometimes there are multiple sheets for different years. That really does make it tricky when summaries are required!
Read more about storing data in different arrangements and the problems that result in the IADIS 2007 Conference Paper: End User Database Management: Trade-offs when Storing Multi-Valued Data in Spreadsheets (PDF 126 KB).