Using Dfunctions in Data Tables

This note describes the use of the D or database family of functions and their application in summarising databases with Data Tables. Such data tables provide a useful way of applying the same formula to a set of different but related criteria. We’ll start with a quick overview of Dfunctions.

Enlarge this imageThe DFnData Tables example spreadsheet (XLS 65 KB) uses data from a survey of first year university students. For ease of analysis we’ve added the Age Group column with a calculation to assign the status of youth (under 19), adult (19 – 24) or mature (25 or over). The entire table, including the headings, has been named AllData to make it easier to refer to the table in formulas. (This may seem strange but it is common to refer to university students who are 25 or older as “mature” or “mature aged” students). 

Hint: There are lots of features in spreadsheets that assume that data is set up just like this. It looks like a database table with a row of field names directly followed by the data.  In spreadsheets a single table has traditionally been called a "database". In Excel 2007 there is a table feature and in Excel 2003 there was a list feature. These features are designed to automate some of the useful database features. Such tables don’t have to be set up in the top corner of a sheet but this is good idea. Always make sure that it is surrounded by space. i.e., Make sure you leave an empty column and row after it.

The database family of functions

These functions, sometimes referred to as the database statistical functions or the Dfunctions, select data to summarise by using separate criteria. We call these criteria tables or criteria ranges but they’re also referred to as advanced criteria (Excel Help) and search criteria (OpenOffice.org). The latter is confusing as it is also used for the expressions entered into the second and any subsequent rows. We shall look at some simple examples of criteria tables first.

The first row of a criteria table normally consists of field names. The other rows are the criteria to be matched. It is a very good idea to copy the field names for criteria tables from the database as they must be spelt exactly the same. This, for example, avoids problems caused by having significant spaces at the end of field names!Enlarge this image


Enlarge this imageThe first example, Criteria1 in  DFnData Tables example spreadsheet (XLS 65 KB), can be used to match all males students aged over 20. Because no restrictions have been placed on other fields such as Height, these could have been omitted from the criteria table resulting in the two column table on the right.

In the second example, Criteria2 in  DFnData Tables example spreadsheet (XLS 65 KB), there are two rows of criteria to be matched.  The students can either be females weighing 55 kgs or less, or male and under 65 kgs.

The rules used in setting up criteria are:

  • criteria in the same row must both or all be matched (logical AND)
  • sets of criteria in different rows are alternatives (logical OR)

For more about criteria tables in Excel search for the help topic “Filter by using advanced criteria”.

Back to the database functions themselves.  These all have the form:

    Dfunction(database, field, criteria)
    Where function can be COUNT, COUNTA, SUM, MAX, MIN, AVERAGE, etc;
    database is the database range, including the field headings;
    field is the field to be summarised (several different formats are permitted but the field name in double quotes is preferred);
    criteria is the range of the separate criteria table.

Enlarge this imageThis spreadsheet fragment from the Database Functions sheet in  DFnData Tables example spreadsheet (XLS 65 KB) has a number of examples of database functions all using AllData as the database and Criteria as the criteria. Rows 15 – 17 illustrate one of the common problems with counting. The DCOUNT function, as with the COUNT function, can only be used to count numbers. DCOUNTA is used to count non-blank cells.

On the subject of counting and empty cells, care needs to be taken when using DCOUNT/ DCOUNTA formulas to count the rows in a database.  If there are missing values in the chosen column then clearly the total won’t be accurate.

Making data tables which use formulas with Dfunctions

Having talked about criteria tables and Dfunctions we are now in a position to talk about data tables. These have been spreadsheet features since the very early days. The most common application of them is for performing efficient “what if” analyses. There is a lot of material in books and on the web about this usage, so we won’t discuss it here.

The application we are going to describe has quite a different purpose and there is much less written about it. The data tables we’re going to use are summary data tables. Here the data table is a device for performing the same calculations with different sets of criteria. Such tables are useful because once set up they behave like conventional formulas. The setting up is the tricky bit as it is just not possible to make the formulas and then to copy them down through the rows in the table, as you would generally do with spreadsheet formulas. This is why they have to be data tables. We shall firstly talk about one variable summary data table and then go on to talk about two variable tables.

One variable summary Data Tables

We shall look first at what we are going to produce and then talk about how to do it in more detail. We start with a simple criteria table such as the one shown here and use it to produce the data table below.

Enlarge this image

In this case Age Group is our variable. We are summarising the database in terms of its three different values.  These are the yellow cells in the table. The purple cells all contain Dfunction  formulas which refer to the separate Age Group criteria table. Note that as set up this criteria table will match all data in the database as no search criterion is specified. The Count of 124, generated by the formula =DCOUNTA(AllData,"First Name",CriteriaDT1), is the number of data rows in the database. CriteriaDT1 is the name of the criteria table above. Similarly the average weight of 68.0 is the average of all the data.

The three rows of turquoise cells contain the results of the data table calculation.  Each row is automatically generated by applying a different search criterion to the purple formulas.  Thus the criteria tables used successively become: 

 

 

Steps for generating a one variable summary Data Table

Enlarge this imageWe shall go through setting up the data table above step by step.

  1. Set up the column of values of the variable and on the same sheet the related but separate criteria table.
  2. Above and immediately to the right of the variable values make the first, of the one or more, Dformulas. At this stage the data table will look like this. Hint: Before you go any further test the formula to make sure it is working properly. If you copy one of the variable values, e.g. youth in this case, into the criteria table the value calculated should change appropriately.
  3. Set up the rest of the Dformulas
  4. Enlarge this imageTo generate the answers first select the range that includes the column of variable values and the formula row and then Data | Table from the menu in Excel 2003 (or earlier versions) and Data | What-if Analysis | Data Table in Excel 2007
  5. Now comes the bit people find tricky. You will be prompted with the (Data) Table dialog box. In this case because we only have Age Group as a table variable we only have one input cell. The variable values were put in a column (they could alternatively have been in a row) so we have a column input cell. This is the single cell, coloured yellow in our example, in the search criteria row of the separate criteria table.
  6. You should now have your completed data table. Add suitable headings, etc.
  7. Enlarge this imageThe completed data table showing the formula in the first result cell.  Note that the formula {=TABLE(,A5)}  is the same in each of the turquoise result cells.  The  {and} are automatically generated and indicate that it is an array formula.  However, this formula must be generated using the Data Table feature and cannot be entered manually.  The  (,A5) is because A5 is the column input cell and there is no row input cell (step 4 above).

Modifying an existing Data Table

Enlarge this imageBecause the result cells of a data table form an array, they cannot be edited individually. If you try to modify just one of the calculated (turquoise cells) you get an error message about changing part of the table. If it is necessary to add extra columns, etc, delete all the result cells and recalculate using the new table range. To delete the result cells select them all first and then press delete. Take care not to include the row of formula cells in your selection.

Two variable summary Data Tables

As the name suggests a two variable data table has a row variable as well as a column variable. Consequently it can’t have a row of formulas but instead it has a single formula located at the intersection of the row and column. Two variable summary data tables refer to a criteria table with columns matching both the row and column variables. In this example the single (DCOUNTA) formula is in cell C9.

Enlarge this image

 

 

 

 

 

 

Enlarge this imageAs you would expect the input cells are cells from the second row of the criteria table.  In this case the row input cell is B5 and the column input cell is A5.

The procedure for making the table is similar to the one variable summary data table. Set up the variables, the criteria table and the single formula first. Select the table range and execute the Data | Table command (or Data | What-if Analysis, Data Table in Excel 2007) specifying the two input cells.

Some other bits and pieces that you may want to know

Data tables versus pivot tables

An alternative way of getting the same information is to use a pivot table. It is quite tricky to get a single pivot table to have the layout of a one variable data table.  Another problem is pivot tables generate numbers rather than creating formulas. Consequently they need to be refreshed, when the underlying data changes, as they do not automatically update. On the other hand pivot tables don’t have the problem described below in “a note of warning”.

Criteria tables again

The criteria table related to a summary data table can have additional search criteria such as the example shown here.  This could be used to generate a one variable data table like the one above except that only female data would be included.  A little thought would suggest that only a single row of search expressions can be used (no OR only AND operations!).

Hiding the formulas

Some people prefer to format the data table formula row or cell so that it is hidden.  This can be done by using the custom format ;;;

A note of warning

There is one situation where data tables summarising databases can give misleading results. It relates to how criteria tables work. Suppose our Age Group field had a category of maturer as well as mature. Then the criteria table would match all the maturer values as well as the mature values leading to inflated counts, etc for the mature row in the data table. This is because there is an implied wildcard in the search criterion, mature is behaving as if it was mature*.  The cure is two enter the variable values in data tables using formulas of the form   =”=mature”.

Read a little more about this in Excel help topic: ‘Filter by using advanced criteria”.

Automatic except tables recalculation

Data tables have been features of spreadsheets for more than 20 years. When they were first implemented, they were widely used because they helped overcome the problems of limited computer memory and slow recalculation speed. If you wish, in a large spreadsheet with a data table, to speed up its recalculation you can still choose the Automatic except tables recalculation option from Tools | Options | Calculation dialog pre-2007 versions or from Formulas | Calculation | Calculations Options in Excel 2007. With this option turned on it is necessary to use the calculate now key [F9] to force data table recalculation. We should probably not tell you about this because if you have a play with it and forget to turn it off again you may wonder why your summary data table is not applying the criteria properly!

Data tables in spreadsheets other than Excel

All the common spreadsheets have data tables and Dfunctions. We believe most can still make summary data tables except for OpenOffice.org Calc. Calc’s data tables are called multiple operations. They don’t appear to be able to be used to generate summary tables referring to criteria tables.

For other examples of data tables summarising databases, you might like to look on the Microsoft website, at http://support.microsoft.com/kb/282851.


Page last updated on: 07/07/2010