A second look at an old tool: analytical procedures.by Wallace, Wanda A.
Advanced microcomputer technology now permits the use of traditional statistica techniques to assist in analytical procedures for use by auditors and management. A case study of one application shows how it works.
The mathematical technique known as regression analysis has been used for many years in the social and physical sciences to identify and quantify relationship among groups of variables. Regression analysis is also well established in the curriculum of many business schools. However, it has been used sporadically by management and auditors as a planning, testing, and control tool. One reason is the substantial computing power required by sophisticated regression-analysis software that until recently was available only on mainframes or minicomputers. Busy managers and auditors were unlikely to find it practicable to interact wit a company's central computing resource either directly or through the information systems group. Second, many statistical packages were littered with terminology more suited to professional statisticians than to those who wanted to use the technique occasionally. The learning curve for generalists was just too steep.
Things have changed. Today's fast, powerful, portable, and affordable personal computers can handle sophisticated regression analysis packages. User interface have improved significantly. Turnaround time has improved so users can experiment with model building and see the results quickly. It is time to reconsider the possibility of regression analysis as a management and audit tool.
Regression analysis uses a mathematical technique called the "method of least squares" to create an equation that relates one variable to one or more other variables. A simple regression equation involves only one independent variable and takes the form y = a + bx, which provides an estimate of what the variable should be given a known value of the variable x. Multiple regression analysis involves the use of two or more independent variables to predict the value of the dependent variable.
The regression equation is developed from a data set involving multiple observations of the variables. In a time-series regression analysis, the data set will represent observations taken at regular intervals over time, such as monthly or weekly. For example, a natural gas utility might build a model to capture the historical relationship between monthly sales to residential customers and such drivers as the volume of gas sold, the number of residential customers, temperature statistics in the markets served, and typical residentia unit sales prices. In cross- sectional analysis--for example, retailing or branc banking--the data set will represent observations taken from different location at the same point or period in time.
An example of cross-sectional analysis will show how regression analysis can be used in the accounting profession. It is based on an actual experience where regression analysis evolved from an auditing tool to a management tool. To preserve the anonymity of the company, the actual data has been altered.
An Example from the Field
The company is a small regional retailer with approximately 90 locations. Virtually every store has the following principal departments: men and women's clothing, jewelry, sporting goods, housewares, and miscellaneous.
The auditors were interested in improving their ability to identify stores with operating results that were out of line with normal expectations. Management used the results to determine if the stores should be subjected to more detaile review and inquiry, including a possible visit by the audit staff. Historically a rotational plan of store visits was followed. However, since only 15 stores per year were visited, it took at least six years to visit all stores at least once, assuming no new stores were added. The rotation had been augmented by des reviews of each store's operating results. Among other things, the desk reviews concentrated on the gross profit percentage for each store relative to other stores. If an individual store exhibited an atypical gross profit percentage, i might be visited in place of one of the normal rotation plan units. This led to the rotational plan extending over a seven or eight-year period before achievin complete coverage of all units.
The company had no specific requirement that every store be visited by the auditors at the same time. Typically, management negotiated a total audit fee and left it to the auditors to decide which stores to visit.
The auditors thought a cross-sectional regression application across the 90 stores might be a better way to evaluate the risk of significant error in each individual store's operating results. Based on experience, they believed the major risks of error were possible failures to record revenue in the proper accounting period, pilferage of inventories, and the valuation of inventories. These issues were important to management.
The auditors recognized that any of these errors would result in the store's gross profit being incorrect and therefore decided to create a regression model to predict the gross profit for any given store. It was obvious that gross profit would be strongly correlated with sales, so the auditors set out to create a spreadsheet showing sales and gross profit for each store. The first phase of the spreadsheet is shown in the first three columns of Figure 1.
When the auditors prepared a scatter plot of gross profit (on the Y axis) against sales (on the X axis), they could see the strong correlation between gross profit and sales. The scatter plot in Figure 2 highlighted some sizable differences in the gross profit achieved by operating units with comparable levels of sales. For example, store 18 achieved nearly twice the level of gross profit of store 76 on about the same level of sales. Clearly other factors besides sales influence gross profit.
An interesting analogy can be made with astronomy. The planets Uranus and Neptune were discovered in 1781 and 1846 respectively. After their discovery, astronomers noticed their orbits were not entirely as expected. Something else was influencing their orbital behavior. Scientists deduced this could be the result of the gravitational pull of a possible trans-Neptunian planet. The likely position in the sky of such a planet was even calculated, but no planet could be seen using the nineteenth-century telescopes. Eventually, more powerfu telescopes were developed, and in 1930, the planet Pluto was discovered very close to where the original calculations predicted.
Regression analysis involves looking for descriptor variables which can explain the behavior of the dependent variable, just as Pluto explained the movements o Uranus and Neptune.
Fortunately, our auditors didn't have to wait as long as the astronomers to fin additional descriptor variables. Through discussion with management, they came to appreciate that besides sales, the location of the individual stores greatly affected gross profit. Of the 90 stores, 17 were in particularly good locations 17 were in particularly poor locations, and the remaining 56 stores were in relatively average locations. Location is a qualitative, not a quantitative, attribute. To reflect it in the regression model, the auditors created two "dummy" variables, representing two types of location, one good, the other poor A dummy variable involves assigning a value of 0 or 1 to each observation. Typically an observation is assigned a 1 if an attribute applies, and a 0 if it does not.
Note that when modeling, one less than the number of possible categories is the appropriate number of dummy variables. As can be seen from the fourth and fifth columns in Figure 1, the two dummy variables effectively describe the three mutually exclusive qualitative attributes with respect to location: for example store 2 is in an average location, store 50 is in a poor location, and store 90 is in a good location.
In performing further analysis of the data, the range of the sales prompted the audit team to consider the varied physical sizes of the individual stores. Rather than being of standard size, the range of store size varied from 20,000 square feet to almost 100,000 square feet. Since sales per square foot is an important statistic in the retailing industry, and large stores are expected to be capable of generating more gross profit, the team enhanced the model by adding data on the size in square feet for all 90 stores.
It will be noted that the three descriptor variables now in the model, i.e., sales, location, and size, are likely to be inter-related themselves. In particular, location and store size would probably have a significant bearing o store sales. However, descriptor variables do not need to be independent of eac other. At the model building stage, we recommend giving the creative process free reign to identify, and if feasible to obtain data for, all descriptor variables that seem likely to be correlated with the dependent variable. On occasion, the inclusion in a regression model of two closely correlated descriptor variables can give rise to a statistical problem known as multicollinearity. Good regression analysis software will warn the user when multicollinearity is a problem, and the user can then assess its effects and decide which of the redundant variables to prune from the model.
The auditors' objective was to create a model accurate enough to predict a store's gross profit to within $25,000 with a confidence level of 80%. This figure of $25,000 is called the model's precision, and it means that a difference between estimated and recorded values would be deemed acceptable if it did not exceed $25,000. This does not mean that the auditors were necessaril accepting errors of $25,000. Instead they were accepting unexplained difference of up to $25,000 that could be the result of errors in the recorded amount of gross profit and/or store- level business factors not reflected in the regressio model, as well as sampling error that exists whenever a statistical estimation tool is applied.
Unfortunately, the auditors found that the model had a precision of about $40,000 at 80%, which was deemed insufficient to meet the audit planning objective. It would be necessary to enhance the predictive power of the model. This is typical of the iterative process of model- enhancement. The auditors searched model enhancement ideas by discussing with management (who were, by now, increasingly interested) the stores where the model had been ineffective a predicting gross profit. Management pointed out a relatively new policy whereby store hours could vary from nine to 14 hours per day. Since stores with longer hours might be expected to generate more gross profit, the auditors decided to add "hours per day" for each store in the spreadsheet.
The auditors re-ran the regression model and found that it had precision of les than $25,000 at 80%. The regression equation was as follows:
Gross profit = 11.67% of sales + $35,500, if a good location, -$17,900 if a bad location, + $.50 per square foot + $300 per hour opened + $61,407 constant term
It can be seen that each independent variable is linked to gross profit with a coefficient, e.g., $300 per hour opened. The coefficient is the model's prediction of the dollars of change in gross profit per unit change in the independent variable, holding all other variables constant. As the auditors had expected, gross sales was positively correlated with sales, a good location, size of store, and hours opened per day, and was negatively correlated with poo location.
The constant term ($61,407 in this case) "falls out" of the regression algorith as the level of gross profit common to all stores before systematic differences captured by the independent variables in the model are observed. It might be expected that there would be no constant term, since with no sales, square feet and hours, etc. there would be no gross profit. However, none of the stores had zero values for sales, square feet, and hours. Instead, the stores all had positive values within a given range (for example, 20,000 to 100,000 square feet). The regression equation is simply the best fitting relationship for the observations in that range. Specifically, the constant term applies to the relevant range of the data set.
For each store, the software can now estimate the gross profit based on the descriptor variable data for that store. For example, one store had sales of $6.72 million, was well located, was 40,000 square feet in size, and was open 1 hours per day. The regression model projected a gross profit for this store of:
11.67%($6,720,000) + $35,500 + $.50(40,000) + $300(12) + $61,407 = $904,731.
In fact, this store had a recorded gross profit of $895,546. The $9,185 difference between the recorded amount and estimated amounts is known as a residual. But what do we do with it?
The interpretation of store by store residuals is facilitated when the stores are sorted by ascending order of regression estimate, and lines are drawn through not only the regression estimate for each store but also the upper and lower confidence limits for each store's estimate. The confidence interval is the difference between the lower and upper confidence limits. This is shown graphically in Figure 3.
The confidence intervals in Figure 3 were determined at a level of 80%. If a store's recorded gross profit is within its confidence interval, we can say tha it does not differ from the regression estimate by an amount that is statistically significant at 80% confidence. The higher the confidence level, the wider the confidence interval will be. If a recorded value is outside the confidence interval, there is an outlier which is quantified by the difference between the recorded value and the nearest side of the confidence interval.
When the auditors reviewed Figure 3, they observed that seven stores had outliers, and they decided to investigate possible cause for the six largest of these.
The Results of Investigating Outliers
Two of the investigated stores had gross profit amounts above the confidence interval. The first of these proved to have a paperwork problem that resulted i gross profits for that store being overstated by approximately $50,000 or 4%. This accounting error was corrected.
The other store had the good fortune to be located near a major rapid transit station. This store's customers tended to be busy commuters who visited the store with a specific purchase in mind and who were relatively insensitive to price. Consequently, the company had been able to increase margins on selected products without any loss of volume.
Four of the stores investigated had gross profit amounts less than the confidence interval around the regression estimate. Two turned out to be what management called "dumping stores," in reality off-price discount stores. Slow-moving stock (particularly women's clothing) was often shipped to these stores from other locations to sell at reduced prices. This was done so as not to cannibalize sales of newer high-margin products at the other stores. The model had not considered the existence of dumping stores, and consequently thes two stores showed up with outliers.
A third store below the confidence interval had experienced internal theft problems in the jewelry and sporting goods departments. While management was aware of the problem and had recently taken corrective action, they were intrigued that the auditors identified this store as anomalous because the store's overall gross profit percentage approximated the average for all stores in the chain.
The last store below the confidence interval turned out not to have a jewelry department. Jewelry was the highest margin product line, and the absence of jewelry in this store meant that it suffered by comparison with the other stores.
It is important to note that the outlier stores highlighted by regression analysis are not the same set of stores that simple analysis of gross profit percentages would have highlighted. Indeed, several outlier stores had gross profit percentages relatively close to the average for the chain as a whole. By considering not only sales but other factors that should logically be related t the ability to generate gross profit, a more sophisticated expectation of store performance had been developed.
Based on the experience of this regression application, the auditors and management decided to enhance the model for joint use in the future. Among the enhancements considered were specific recognition of dumping stores through dummy variables, and the consideration of product line differences. The product mix question could be addressed by developing separate regression models to predict the gross profit for each product line, or by staying with one model bu adding as a variable the proportion of sales contributed by specific high or lo margin product lines. The former approach would result in more incisive analysis, but at the cost of requiring more input data.
In addition, thought was given to developing cross-sectional models of certain key store-level operating costs, notably shrinkage and compensation and benefit costs. Another possibility would be to develop a model to predict the number of full-time equivalent employees at each store, with a view to investigating stores that might be over or under staffed.
Finally, there was joint auditor-management development of time-series models for the chain as a whole. For example, a model was built to predict monthly payroll expense based on the number of stores, the number of employees, the rat of pay for one or two benchmark categories, and the average number of hours worked.
Reasons for a Second Look
As a result of using regression analysis in their audit, the auditors believed their approach was now much more effectively targeted at locations or months that had above average risk or significant accounting error, and they were able to reduce their work significantly for locations that appeared normal. Management also believed that regression analysis would be a useful weapon in their arsenal to spot stores which were operating in an abnormal manner and thereby facilitate investigation and correction on a timely basis. The ability to use personal computers to experiment and iterate with various combinations o variables was a critical success factor in this process. Based on this experience, we suggest there is an opportunity for greater application of regression analysis in management control systems. Research has demonstrated that the average number of errors identified in audits of financial statements is substantially lower when management scrutinizes financial records using analytical procedures. Such evidence provides strong incentive to give regression analysis a second look.
David A. Scott, CA, is a partner of Price Waterhouse, Toronto, Canada. Wanda A. Wallace, PhD, CPA, CMA, CIA, is the John N. Dalton Professor of Business Administration and Associate Dean for Academic Affairs at the College of Willia and Mary.
The CPA Journal is broadly recognized as an outstanding, technical-refereed publication aimed at public practitioners, management, educators, and other accounting professionals. It is edited by CPAs for CPAs. Our goal is to provide CPAs and other accounting professionals with the information and news to enable them to be successful accountants, managers, and executives in today's practice environments.
©2009 The New York State Society of CPAs. Legal Notices
Visit the new cpajournal.com.