The CPA Journal
Nov 1992

Analytical procedures that work (November is Computer Month)

by Nicholas J. Mastracchio, Jr., and Josef Schmee

Abstract: Exploratory data analysis (EDA) is an analytical method in which similar data obtained from different sources are statistically analyzed. Irregular or questionable patterns that emerge are then examined further with the aid of computer-generated graphs, diagrams, or plain numerical data. The goal of this approach is to enable those who know the data to investigate patterns within it. Applying EDA in an accounting firm revealed its possible uses in auditing. EDA could direct attention to areas that need investigation and could lead to a better understanding of the audit client and its industry. The approach was also found to increase efficiency in computer-enhanced auditing. In addition, EDA proved useful in generating information that clients can use for management purposes.

The computer can provide a resourceful look at data when applying analytical procedures. The authors explain how this can be done and describe the more popular methods CPAs have used as they enter the computer age.

Many CPAs use computers to perform their tasks faster. Using computers to change the approaches or methods of performing audits, however, is less widespread. The analytical approach presented here uses computer power for more than speed: it offers a better way to examine relationships within account balances and to identify account balances with a greater risk of material misstatement. The ability to detect and interpret patterns in financial information is essential to effective analytical procedures. Auditors are accustomed to looking for relationships and patterns in account balances or other information expected to exhibit a relationship, such as the change in vehicle expenses compared with the change in the number of vehicles owned. In "A Test Of Analytical Procedure Effectiveness" (CPA Journal, June 1992), Pany and Wheeler warn that analytical procedures are at times limited in their ability to detect errors and suggest that more sophisticated techniques may enhance their effectiveness.

An approach that has proven effective in analyzing disparate data in other disciplines is known as exploratory data analysis (EDA). Under this approach, like data from multiple sources are subjected to statistical analysis without any preconceptions of what would be considered unusual or questionable. Based on the relationships among all the data from the various sources, patterns that seem unusual or don't make sense are highlighted or flagged for investigation. The data is explored and displayed on a computer screen using graphs, diagrams, or straight numerical data based on statistical models. An important element is the visual and graphical display of the data; with today's high-quality monitors, color becomes an added tool for identifying data and relationships. EDA's objective is to allow users who know the data to explore patterns within it, on the premise that data from multiple sources can be examined with the expectation of uncovering hidden truths. This objective is compatible with the goal of analytical procedures (identifying potential misstatements), but EDA pursues it in a way that the auditor finds easy and not intimidating to use. The approach lets the auditor see things that stand out graphically and that might otherwise be missed in a simple numerical review of the data.

This type of graphical analysis is not a replacement for the traditional use of statistics in areas such as sampling; it is intended to improve the persuasiveness of the information gained in performing analytical procedures. SAS 56, Analytical Procedures, states that the focus of analytical procedures should be on enhancing the auditor's understanding of the business and identifying areas of risk. EDA serves both purposes.

The Tools of Data Analysis

Statistical packages have changed considerably over the last few years and have become more user-friendly. Newer packages built around the concept of EDA lend themselves to innovative graphical approaches. Rather than performing a series of statistical tests, these programs provide a graphical analysis of the data, including the simultaneous comparison of several aspects of the data, which John Tukey, the originator of EDA, describes as detective work.

Several suitable software programs perform EDA. Data Desk, designed for Macintosh computers, was developed by Paul Velleman and draws on the work of Tukey. A highly recommended program for the PC environment is Systat, which furnishes standard statistics but also incorporates extensive color graphics and allows a great deal of flexibility in exploring data. Systat with Sysgraph requires an IBM PC, XT, or compatible; PC-DOS or MS-DOS 2.0 or later; 640K of RAM; two floppy drives; and a graphics adapter. The list price for Systat with Sysgraph at the time of our study was $795. Data Desk requires at least a Mac Plus with 1 megabyte of RAM, an 800K disk drive, and either a second floppy disk drive or a hard drive. Its list price was $595.

Applying the Concept

A CPA firm recognized the potential benefit of applying EDA to better understand its clients' operations and to conduct more effective and efficient audits. The firm explored how it might apply EDA to an industry in which it specialized, hoping to reduce audit costs while strengthening its competitive position by developing insights into the industry.

A Better Mousetrap

The firm's objectives for using EDA were to:

* Increase audit quality by raising awareness and general understanding of the industry and the client's position within it, by identifying averages, variability, and extreme values;

* Increase the effectiveness of audit planning through an increased ability to identify areas of specific risk; and

* Increase efficiency by replacing other substantive tests with EDA analytical procedures, thereby reducing audit costs.

The Client Base

The firm had a number of clients that made good candidates for EDA. It chose a group of 23 agencies that manage residential programs for the mentally retarded and developmentally disabled to determine whether the comparative information available would enable EDA to work in the audit environment. Firms without this extensive base of information in one industry could perform the same type of analysis by obtaining the data from published industry or other sources; data similar to that used in this study would be available from state authorities through the Freedom of Information Act. The largest agency in the study had sixteen sites, and five agencies had a single site. Many of the agencies were multi-program organizations, and operating supervised sites was part of a larger range of services they provided to the mentally retarded and developmentally disabled.

SAS 56, Analytical Procedures, implies that relationships involving income statement accounts are apt to produce a higher level of assurance from analytical procedures because they are usually more predictable: income statement accounts represent transactions over a period of time, whereas balance sheet accounts represent amounts at a point in time. The firm chose to use expense information for the individual sites, of which there are 82 managed by the 23 agencies. At the detail level there is less chance of compensating variations, and the actual variations are more visible.

Software Selection

The software selected was Data Desk, which the auditors learned to use in a minimum amount of time. Its ease of use enables a computer novice to become comfortable with it in a few hours. The original data was in a spreadsheet file, and loading it into the statistical package was effortless. The CPA does not have to be an expert in statistics to get immediate results; the ability to read charts and graphical plots is sufficient.

Data can be prepared by paraprofessionals, with the analysis conducted by an experienced professional. The audit staff was enthusiastic about using the program, especially after it demonstrated its ability to identify items for examination.

The Specifics

For each site, the data consisted of the bed capacity and the number of days of service provided. The recipients of the service are called clients. Most clients use the service long term, and the sites typically run at a very high rate of occupancy. Occupancy is measured in this industry by total client days.

The financial information was expressed as cost per client per day, which made the data comparable between sites. The underlying data for the 82 sites was used to determine how much insight could be gained by examining the data and the relationships among them.
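To make the conversion concrete, here is a minimal Python sketch of the two measures just described: occupancy from client days and bed capacity, and each expense restated per client day. The `Site` record, its field names, the 365-day period, and the sample figures are hypothetical illustrations, not the firm's data.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One residential site (hypothetical field names)."""
    site_id: int
    agency: str
    bed_capacity: int
    client_days: int  # total days of service provided in the period
    expenses: dict = field(default_factory=dict)  # account name -> dollars for the period


def occupancy_rate(site: Site, days_in_period: int = 365) -> float:
    """Client days as a fraction of the client days the beds could have supplied."""
    return site.client_days / (site.bed_capacity * days_in_period)


def cost_per_client_day(site: Site) -> dict:
    """Restate each expense in dollars per client per day so sites of different size compare."""
    return {account: amount / site.client_days for account, amount in site.expenses.items()}


# Hypothetical six-bed site
example = Site(site_id=1, agency="A", bed_capacity=6, client_days=2100,
               expenses={"support_wages": 84_000.0, "direct_care_wages": 315_000.0})
print(cost_per_client_day(example))       # {'support_wages': 40.0, 'direct_care_wages': 150.0}
print(round(occupancy_rate(example), 3))  # 0.959
```

Dividing by client days rather than by bed capacity keeps the measure tied to service actually delivered, which is why it suits sites that run near full occupancy.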

The expense accounts were selected based on their dollar magnitude and on the need for all sites to incur the expenses, as follows:

* Support wages;

* Direct care wages;

* Clinical wages;

* Fringe benefits;

* Repairs and maintenance.

One tool the audit staff found useful was a series of graphs showing the distribution of each item across the various sites. With these graphs the staff could readily identify not only the sites that were high or low but also the magnitude of the dispersion. Two simple graphs were used for this purpose: the dot plot and the box plot.

The dot plot, a one-axis distribution of the items by site, has the advantage of showing individual amounts and displaying their dispersion. The box plot, a summary approach, gives a general picture of the range of most of the values and then identifies the amounts at the extremes. The box plot consists of three parts. The central box describes where the central fifty percent of the amounts occur: the line inside the box is the median, and the two ends of the box represent approximately the twenty-fifth and seventy-fifth percentiles of the data. The second part of the plot is the whiskers. The range covered by the whiskers represents a probable range of values based on statistical formulas (in Tukey's standard construction, the whiskers reach to the most extreme values lying within 1.5 times the height of the box beyond either end). If the data is normally distributed, the whiskers should contain approximately 99% of all values. The final part of the plot represents the extreme data values, which are plotted individually. A circle indicates a value outside the whiskers that might nevertheless be expected to occur occasionally; a starburst indicates an extreme value that would be difficult to explain as ordinary variability. The box plot does a better job of quickly showing how far a high or low value lies from the rest, and the dot plot is better at showing how uniformly the group of sites experiences costs in a particular area.
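The box plot logic can be sketched in a few lines of Python. This sketch assumes Tukey's conventional fences (1.5 and 3 times the interquartile range beyond the quartiles) for the circle and starburst categories; whether Data Desk used exactly these cut-offs is an assumption, and the cost figures are invented. Under a normal distribution the 1.5 fences enclose roughly 99.3% of values, consistent with the approximately 99% cited above.

```python
import statistics


def box_plot_summary(values, inner=1.5, outer=3.0):
    """Classify values the way a Tukey-style box plot displays them (assumed fences)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the central box
    inner_lo, inner_hi = q1 - inner * iqr, q3 + inner * iqr
    outer_lo, outer_hi = q1 - outer * iqr, q3 + outer * iqr
    inside = [v for v in values if inner_lo <= v <= inner_hi]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # whiskers stop at the most extreme values still inside the inner fences
        "whiskers": (min(inside), max(inside)),
        # plotted as circles: beyond the whiskers but within the outer fences
        "outside": sorted(v for v in values
                          if outer_lo <= v < inner_lo or inner_hi < v <= outer_hi),
        # plotted as starbursts: beyond the outer fences
        "far_out": sorted(v for v in values if v < outer_lo or v > outer_hi),
    }


# Hypothetical support wages per client day for twelve sites
support = [30, 32, 33, 34, 35, 36, 37, 38, 40, 41, 55, 80]
summary = box_plot_summary(support)
print(summary["q1"], summary["median"], summary["q3"])              # 33.75 36.5 40.25
print(summary["whiskers"], summary["outside"], summary["far_out"])  # (30, 41) [55] [80]
```

Here the site at 55 would be drawn as a circle and the site at 80 as a starburst.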

Figure 1 shows the distribution of support wages. Here the box plot alerts us to an unusual distribution, with several milder extreme values and three very extreme values indicated by starbursts.

It is also useful to know how a particular site compares not only with the total group but with the other sites of the same agency. Figure 2 is a bar graph prepared with a spreadsheet program. Sites belonging to the same agency are grouped together, with a space between agencies. The benefit of this graph is that it depicts expenses at the site level and compares them with those of all sites within the same agency, since sites of the same agency would be expected to have similar characteristics. For example, sites 58 to 62 are part of the same organization. Site 62 was not identified as exceptional on the box plot, but this graph shows that not only is it relatively high overall, it is also appreciably higher than the other sites of the same agency.
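A within-agency comparison of the kind Figure 2 supports can be sketched as follows. Under our own assumptions, each site is compared with the average of the other sites in its agency as well as with the all-site average; the 1.25 ratio cut-off, the agency labels, and the figures are hypothetical, loosely echoing the site 62 situation.

```python
from collections import defaultdict
from statistics import mean


def compare_to_agency(sites, threshold=1.25):
    """sites maps site_id -> (agency, cost per client day).

    Returns (site_id, agency, ratio to agency peers, ratio to all sites) for each
    site whose cost exceeds `threshold` times the average of the other sites in
    its own agency. The threshold is a hypothetical cut-off, not a standard.
    """
    by_agency = defaultdict(list)
    for site_id, (agency, cost) in sites.items():
        by_agency[agency].append((site_id, cost))
    overall = mean(cost for _, cost in sites.values())

    flagged = []
    for agency, members in by_agency.items():
        for site_id, cost in members:
            peers = [c for other, c in members if other != site_id]
            if not peers:
                continue  # single-site agency: no internal comparison possible
            peer_avg = mean(peers)
            if cost > threshold * peer_avg:
                flagged.append((site_id, agency,
                                round(cost / peer_avg, 2), round(cost / overall, 2)))
    return flagged


# Hypothetical costs per client day; agency "X" runs sites 58 to 62
sites = {58: ("X", 20), 59: ("X", 22), 60: ("X", 21), 61: ("X", 19), 62: ("X", 30),
         70: ("Y", 28), 71: ("Y", 29)}
print(compare_to_agency(sites))  # [(62, 'X', 1.46, 1.24)]
```

In this made-up data, site 62 runs 46% above its sister sites but only about 24% above the all-site average, the kind of difference a whole-group box plot can miss.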

Plots of one expense against another were also useful. For example, a plot of clinical wages versus direct care wages was enhanced by adding a regression line representing estimated average values. Since clinical and direct care wages are both directly related to the severity of the clients' handicaps, there should be a relationship between them. Amounts that were high in one wage group but not the other were deemed worthy of further investigation. Identifying the sites of each agency on the plot further allows an evaluation of whether an agency's costs are consistently below or above average or are randomly scattered around the average line.
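The regression-line comparison can be sketched with ordinary least squares. This Python sketch fits clinical wages on direct care wages and flags sites whose residual exceeds a chosen number of residual standard deviations; the two-standard-deviation rule and the data are hypothetical choices for illustration, not the firm's criteria.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def flag_off_line(sites, k=2.0):
    """sites maps site_id -> (direct care wages, clinical wages), both per client day.

    Returns the residual (actual minus predicted clinical wages) for every site and
    the ids of sites more than k residual standard deviations from the line.
    """
    ids = list(sites)
    x = [sites[i][0] for i in ids]
    y = [sites[i][1] for i in ids]
    a, b = fit_line(x, y)
    residuals = {i: yi - (a + b * xi) for i, xi, yi in zip(ids, x, y)}
    # residual standard deviation with n - 2 degrees of freedom
    s = math.sqrt(sum(r * r for r in residuals.values()) / (len(ids) - 2))
    flagged = [i for i, r in residuals.items() if abs(r) > k * s]
    return residuals, flagged


# Hypothetical data: clinical wages run about 20% of direct care wages,
# except site 7, which is high in clinical wages relative to direct care.
data = {1: (100, 20), 2: (120, 24), 3: (140, 28), 4: (160, 32),
        5: (180, 36), 6: (200, 40), 7: (150, 50)}
residuals, flagged = flag_off_line(data)
print(flagged)  # [7]
```

A site flagged this way is high in one wage group but not the other, which is the pattern the text singles out for investigation.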

Data Desk made the analysis even more meaningful by identifying the sites of the same agency with the same color and symbol. In Figure 3, A, B, and C indicate three of the 23 agencies displayed; on the computer screen they appeared in different colors. These agencies were chosen for further investigation because of the unusual relationships between the two variables revealed by highlighting each location, one at a time. Note the clustered nature of the "Bs" as opposed to the scatter of the "As" and the "Cs." This indicates that the agency identified by Bs has clinical wages consistently lower than would be predicted from its direct care wages. The agencies represented by As and Cs exhibited a more scattered pattern and were subjected to further tests.
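Whether an agency's points cluster on one side of the line, like the Bs, or scatter around it, like the As and Cs, can be summarized from the residuals. This sketch assumes residuals (actual minus predicted clinical wages) are already grouped by agency and uses a deliberately simple, hypothetical rule: all residuals on one side means consistent, anything else means scattered.

```python
from statistics import mean, pstdev


def agency_pattern(residuals_by_agency):
    """residuals_by_agency maps agency -> list of residuals, one per site.

    Returns agency -> (pattern, mean residual, spread of residuals).
    """
    report = {}
    for agency, res in residuals_by_agency.items():
        if all(r < 0 for r in res):
            pattern = "consistently below"
        elif all(r > 0 for r in res):
            pattern = "consistently above"
        else:
            pattern = "scattered"
        report[agency] = (pattern, round(mean(res), 2), round(pstdev(res), 2))
    return report


# Hypothetical residuals in dollars per client day
report = agency_pattern({
    "A": [4.0, -6.0, 9.0, -3.0],
    "B": [-5.0, -4.5, -5.5],
    "C": [7.0, -8.0, 1.0],
})
for agency, row in report.items():
    print(agency, row)
```

A tight, one-sided cluster suggests an agency-wide characteristic to understand; a scattered pattern points to individual sites that each need their own explanation.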

Finally, the staff was provided with a distribution of all the expenses. For each site, the site's actual expenses were listed next to the average for all sites. This gave the staff a quick picture of where the site stood overall and helped them spot unusual patterns, such as being at the high end for most expenses but low in one or two.
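The side-by-side listing can be sketched as follows. The function places one site's costs beside the all-site averages and, as a hypothetical screening rule, lists the accounts that run against the site's overall tendency (the one or two low accounts at a generally high-cost site, or the reverse). Account names and amounts are invented.

```python
from statistics import mean


def site_profile(costs, site_id):
    """costs maps site_id -> {account: cost per client day}.

    Returns rows of (account, site cost, all-site average) and the accounts that
    go against the site's overall high or low tendency.
    """
    accounts = list(costs[site_id])
    averages = {a: mean(c[a] for c in costs.values()) for a in accounts}
    rows = [(a, costs[site_id][a], round(averages[a], 2)) for a in accounts]
    above = [a for a in accounts if costs[site_id][a] > averages[a]]
    below = [a for a in accounts if costs[site_id][a] < averages[a]]
    if len(above) > len(below):
        exceptions = below  # mostly high, but low here
    elif len(below) > len(above):
        exceptions = above  # mostly low, but high here
    else:
        exceptions = []     # no clear overall tendency
    return rows, exceptions


# Hypothetical costs per client day for three sites
costs = {
    1: {"support": 45, "direct": 160, "clinical": 30, "fringe": 50, "repairs": 2},
    2: {"support": 38, "direct": 140, "clinical": 24, "fringe": 42, "repairs": 6},
    3: {"support": 36, "direct": 135, "clinical": 22, "fringe": 40, "repairs": 7},
}
rows, exceptions = site_profile(costs, 1)
for account, own, avg in rows:
    print(f"{account:<10}{own:>8}{avg:>10}")
print("Runs against the pattern:", exceptions)  # ['repairs']
```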

Did EDA Work?

EDA provided more insight for both the auditor and the client. However, a word of caution is appropriate: on many occasions, the first explanation offered for an extreme value turned out not to be the cause of the variance.

For example, a disparity in food expense was explained quickly. Someone suggested on the spot that dietary restrictions at some of the residences would cause the food cost per client day to rise significantly. However, examination of grocery store register tapes indicated that the expenses could not be explained by dietary reasons but were the result of what appeared to be questionable purchases.

Another examination involved transportation expense. One site had a vehicle to take the clients to a treatment program during the day, and the site was the farthest from the treatment facility of all the sites in the agency. The vehicle was also used for an annual trip to a vacation site five hundred miles away. These circumstances were given as the immediate explanation for high transportation costs. However, odometer readings on repair bills indicated that the vehicles had traveled significantly more miles than the explanation could account for, and further investigation was called for to check for unauthorized use.

The relationships between various expenses were easy to see when one expense was plotted against another. Cases that did not make sense were frequently caused by classification errors and poor allocation methods.

EDA or Not EDA

The results of this test indicate that EDA can be useful in an audit. Its statistical graphics increased the auditor's understanding of the industry and the client. Current professional standards recognize that analytical procedures can be used to reduce other substantive tests, and applying EDA to income statement items can do this. In addition, EDA gained popularity with client management by frequently providing information about characteristics of which the client was not aware. EDA not only can lead to more efficient computer-enhanced auditing but can also serve as a basis for management assistance to clients. It also offers a very distinctive way of setting the firm apart from the competition in a proposal situation.

There was no question about EDA's ability to identify areas to examine and to give the auditor more insight into the client and its industry. EDA worked: the problems it uncovered clearly would not have been revealed by more typical audit routines.

Its efficiency is more subjective. For EDA to be cost-beneficial, the auditor has to be able to accept the reliability of this process and be willing to reduce other substantive tests. As with other analytical tests, the accounts examined may not contain errors, or the errors may not be material.

The firm concluded that if it could reduce substantive tests by three hours per site by using EDA, its use was cost-justified. However, savings depend on the level of expense testing that would otherwise be done and on the willingness to reduce such tests. If the audit plan calls for testing expenses, the ability to identify what to test is helpful. If the audit plan does not call for extensive expense testing, the procedure may not be efficient.
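The firm's three-hours-per-site threshold invites a simple break-even check. The sketch below compares hours of substantive testing avoided with the hours spent preparing and analyzing the EDA data; the function name and the 120-hour EDA effort in the example are hypothetical, and an actual engagement would substitute its own estimates (and weigh paraprofessional against professional time).

```python
def eda_hours_benefit(sites, hours_saved_per_site, eda_hours):
    """Return (net hours saved, break-even hours per site).

    sites: number of sites whose expense testing is reduced
    hours_saved_per_site: substantive-testing hours avoided at each site
    eda_hours: total hours to prepare, run, and review the EDA (an estimate)
    """
    net = sites * hours_saved_per_site - eda_hours
    break_even_per_site = eda_hours / sites
    return net, break_even_per_site


# 82 sites at the firm's three-hour threshold; 120 EDA hours is a hypothetical estimate
net, per_site = eda_hours_benefit(82, 3, 120)
print(net, round(per_site, 2))  # 126 1.46
```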

Because revenues in the test industry are predictable (the number of clients served times daily rates), EDA was not used to test revenues. In other industries, however, that may not be the case, and EDA might be applied effectively.
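Because revenue in this industry follows directly from service volume and rates, an expectation can be computed rather than explored. A minimal sketch, assuming revenue equals client days times a single daily rate and using a hypothetical 2% tolerance:

```python
def revenue_check(client_days, daily_rate, recorded_revenue, tolerance=0.02):
    """Compare recorded revenue with client days times the daily rate.

    Returns (expected revenue, difference, whether the difference exceeds tolerance).
    The tolerance is a hypothetical fraction of expected revenue.
    """
    expected = client_days * daily_rate
    difference = recorded_revenue - expected
    return expected, difference, abs(difference) > tolerance * expected


print(revenue_check(2100, 210.0, 452_000.0))  # (441000.0, 11000.0, True)
```

Sites with several rate classes would need one such product per class, summed.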

Applications For Smaller Client Bases

The test used a larger group of clients than many CPA firms can typically expect to have. However, the approach is generally applicable to ten or more sets of data. The dot plot and box plot would remain useful, and the variability of costs within one organization provides useful information in any circumstance where an entity runs a few similar operations. The percentile distribution illustrated needs at least ten points to be useful. With a smaller database, however, an auditor could predetermine a reasonable range of values and plot actual results against the expected range. The more information, the more reliable the results. EDA can be used with just one client if comparative data can be obtained from outside sources.
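For a small client base, the fallback described above (comparing actual results with a range set in advance) and the ten-point minimum for a percentile distribution can both be sketched briefly. The expected ranges, account names, and figures below are hypothetical; the auditor would set ranges from outside data or prior experience.

```python
def check_against_ranges(actuals, expected_ranges):
    """actuals maps account -> cost per client day; expected_ranges maps account -> (low, high)."""
    result = {}
    for account, value in actuals.items():
        low, high = expected_ranges[account]
        if value < low:
            result[account] = "below range"
        elif value > high:
            result[account] = "above range"
        else:
            result[account] = "within range"
    return result


def percentile_rank(values, x, min_points=10):
    """Percent of values at or below x; refuses to rank against fewer than min_points values."""
    if len(values) < min_points:
        raise ValueError(f"need at least {min_points} data points, got {len(values)}")
    return 100.0 * sum(v <= x for v in values) / len(values)


print(check_against_ranges(
    {"food": 6.10, "transportation": 9.75, "repairs": 1.20},
    {"food": (5.0, 7.0), "transportation": (3.0, 6.0), "repairs": (1.5, 4.0)},
))
print(percentile_rank(list(range(1, 11)), 7))  # 70.0
```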

In addition, firms with a small base of clients in many fields can obtain databases of financial data. Comparable financial information is available from some industry associations; for CPA firms, for example, there is the AICPA MAP Committee survey. Many types of clients are required to report operations to regulatory authorities: school districts, nursing homes, municipalities, and not-for-profit agencies all have comparable financial data available. If the firm has a small base of business in an industry where comparable financial information is available, there is an excellent opportunity for the CPA to establish an understanding of the industry.

Innovation Enhancing Competitiveness

The audit segment of a CPA practice continues to be under cost pressure. Practitioners need to maximize the power of the computer to keep clients and remain competitive; those who do not will continue to operate at marginal survival levels. This study demonstrates that innovative applications of accepted procedures can contribute to more efficient and better audits. EDA allows the auditor to use statistical methods in analytical procedures without needing to become an expert in statistics.


One of the discoveries of this test was that EDA is a very practical way for auditors to distinguish themselves by providing added value to the client. Both the auditor and the client gained better insights into the client's operations and the industry.

CEOs expressed an interest in reviewing the graphics and using them as a management tool; clients do not often take an interest in audit workpapers. In one instance, an executive director was aware of high expenses at a particular site compared with others but had no idea of the magnitude of the variance. Seeing the comparative data graphically and recognizing that the site's expenses were higher than those of a large percentage of all sites examined, the client gained new insights into the reasons for the variation. EDA pointed out several concerns in salaries and in other expenses such as food and transportation.

In many industries, such as group home services, the administrators come from program service careers with little formal training in accounting. Graphical displays of their financial performance compared to other similar agencies were particularly useful to them. Although data in columnar form had been provided to the agencies for years, graphics highlighted the financial performance in a manner far superior to pages of expense data.

CPAs in industry could also benefit by using EDA to compare their companies' performance with that of others in their industry. When contemplating a new site, they could draw on the broader base of data for reasonable expectations of expense levels.

Where the higher levels of expenses did have sound reasons, clients were interested in using the data in discussions with rate setters to point out the uniqueness of the site. Imagine the impression an auditing firm can make on a potential client when it displays comparative performance in explaining the firm's approach to an audit. How much more impressive it is if the firm can point to the potential client's own variances from industry norms.

Nicholas J. Mastracchio, Jr., CPA is a lecturer at the School of Business, State University of New York at Albany and former managing director of CL Marvin & Co., Schenectady and Albany, New York. Josef Schmee, PhD, is professor, Union College, Schenectady, New York.

©2009 The New York State Society of CPAs.