Part 1: Validating the Input
Applied Management and Computing Division
Email: Theresa.McLennan@lincoln.ac.nz or Clare.Churcher@lincoln.ac.nz
(Published in the Chartered Accountants Journal of New Zealand, 78, 2, 21-22, March 1999.)
Previous articles (Churcher & McLennan 1998a, 1998b) have discussed the appropriateness of using a spreadsheet or a database for storing complex data as opposed to performing calculations.
The conclusions were that database software provides considerably more control over the accuracy, integrity and versatility of the data. However designing and implementing a database is not a trivial job and pragmatics compel many users to opt for the immediacy provided by a spreadsheet.
Given that many small (and not so small) collections of data will be stored in a spreadsheet it is useful to consider the ways in which inconsistencies and inaccuracies can be minimised. The first of these is to become familiar with the new and comprehensive data validation tools available. As an example we will describe some of the features available in Excel.
Why use Data Validation?
Data validation features can be used to prevent invalid data being entered into spreadsheet cells.
Even more usefully, in combination with spreadsheet auditing, they can be used to check data imported into Excel and existing spreadsheets created in older versions of Excel. Imported data may have come from an accounting package, word-processed document, database or from elsewhere.
Just how accurate is the data in your spreadsheet? You might be surprised, to find just how much misspelt or inconsistent data there really is. For example, a spreadsheet used for four or five years, for our teaching purposes, had references to Christchurh as well as Christchurch. Not too much of a problem in this context - but had we tried to do some statistical analyses by City, or if the data had been more critical, then the consequences may have been serious.
What is Data Validation?
Validation is used to check a value against your specified criteria before it is entered into a spreadsheet cell. If it does not match the criteria then there are user specified options to reject it outright or to warn or inform the user. The criteria can be placed on individual cells, ranges or whole columns.
The table below shows a few records from a list of expenses for building a studio.
|Multi Machinery Ltd
||shower door & glue
|Upper Ricc. Hardware
|Upper Riccarton Hardware
The Data Validation menu option could easily be used here to ensure:
- The expense Category was one of a predetermined list of values: Building, Decorating, Electrical, Labour and so on
- That individual Costs are in a certain range: eg. Between $0 and $2,500
- That the only entries in the GST column are "*"s.
The American spelling of labor in the third row will mean that any statistics or reports based on Category will be incorrect. This could have been easily prevented if the checks above were enabled. The user would be presented with a pulldown list of the possible alternatives, from which to select a value.
Similarly the value in the Costs field can be checked to see if it lies within a pre-described range. The user will be alerted if the cost lies outside the range but can choose to override the warning if appropriate.
Validation can also be used to check the values generated by formulas. This would be particularly useful where a complex formula maps data onto a discrete set of values. Similarly, the values placed in cells by macros can be checked also.
Validating an Existing Spreadsheet
Data validation checks take place when data is typed into a cell. For older spreadsheets converted to Excel 97 and when data is imported into Excel, these checks do not occur automatically. However, once the data has been parsed into an Excel format, checks can be applied. This requires a two step process. In the first step, the required Data Validation checks are placed on the data range. No checking of the existing data occurs at this stage. In the second step the audit toolbar is turned on from the Tools Audit menu. Clicking on the Circle Invalid Data tool highlights any cell violating the validation criteria. The screen dump below shows the result, where our studio costs data has the validation checks described previously. The circled data can be checked and the misspelling corrected. In the case of the cost of $2800, this can be validated as being correct if that is appropriate.
Other Techniques for Validating Existing Data. You may well have data where there are too many possibilities to put into a validation list. A useful technique here might be to generate a list of all the different values in a field. The incorrect values can then be spotted by manually scanning through an alphabetically sorted list and remedied by using search and replace options.
|Roofline Products Ltd
|Upper Ricc Hardware
|Upper Ricc. Hardware
|Upper Riccarton Hardware
How do you generate the list? The old fashioned way uses a criteria table and Advanced Filtering. An easier way is to create a Pivot Table. Make the field you are interested in the column field and the data field can be designated as a count of this field. In the example to the right misspellings and inconsistencies for the Supplier's name have become obvious.
In summary, three techniques have been useful to track down data errors in an existing spreadsheet. These are • Data Validation • The Audit toolbar. • Pivot Tables.
In our next article we will outline further uses for the Auditing features.
Churcher, C., McLennan, T. Are spreadsheets enough? Chartered Accountants Journal of New Zealand, 77, (3), 81-82, 1998.
Churcher, C., McLennan, T. Relational databases. Chartered Accountants Journal of New Zealand, 77, (5), 73-74, 1998.