Doing More With Data

July 1999 Issue

THE CPA IN INDUSTRY

DOING MORE WITH DATA

By Simon Petravick

An abundance of data is scattered throughout today's accounting systems. Technological advances and sophisticated analysis techniques allow these bits and bytes to be the building blocks for significant improvements in decision making.

What is needed is a systems architecture to identify strategic information in existing systems and make it available for analysis and inquiries. Through such a program, many companies have found data patterns of strategic significance. For example, a music chain learned that people older than 65 bought many rap and alternative music CDs. These buyers had not changed their tastes for music; they were buying Christmas presents for their grandchildren. A target marketing program to this group increased sales by 37%.

Systems Architecture

Exhibit 1 illustrates the overall operation of the system. There are two primary steps: data acquisition and cleansing, and data analysis.

Data Acquisition and Cleansing. A typical company may find the following conditions: The marketing system contains data about promotions sent to customers. The order entry system contains details about customers' orders. The accounts receivable system contains customers' payment histories. Data about customers is spread across several legacy systems.

Few people compile a complete view of the information, from promotions to final collections, because data must be pulled from multiple sources. The process becomes more complicated if each system stores data in unique ways. For example, marketing refers to the customer by name, order entry is organized by purchase order number, and accounts receivable is keyed to the customer's account number.

To overcome these difficulties, many companies have copied the data decision makers need from the legacy systems to a data repository. Special software automates this repetitious task. During the transfer, the data is "cleansed." Cleansing reconciles the dissimilarities, handles missing values, and integrates the data into the repository in a consistent fashion. In the previous example, one key would be used to identify each customer in the data repository.

In addition to internal data, decision makers often require external data such as average income levels, other demographics, and economic forecasts. One systems manager reported that users ran two-thirds of their queries against information from external sources. To improve access, many companies have found the data repository a convenient place to store such information.

Several different names are commonly used for data repositories, depending upon the type of data stored and their intended use. For example, a data warehouse is a repository that draws data from many systems that will be shared by many users. A data mart contains data drawn from fewer systems that will be used by one or a few functional units. A multidimensional database provides different views of financial data: by product, channel, or time of sale, to name a few.

One advantage of moving data to a single repository is that decision makers can obtain views and analyses of data not previously possible. For example, marketing personnel may be able to better evaluate the effectiveness of promotions when they can easily match information about the promotions sent to customers with the orders received.

A second advantage is time savings. One organization has 200 users running 5,000 queries per month against its data repository. By using the data repository, each person saves one hour per query over the previous data collection efforts. This amounts to a substantial savings of 60,000 hours per year.

A third advantage is that the existing legacy transaction systems will remain intact. Thus, the many people that process transactions on a day-to-day basis do not need any retraining. Furthermore, there is no disruption of business processes caused by the conversion to other transaction processing systems.

One downside to a data repository, however, is the cost. On the low end, an investment of six figures is often reported. A large-scale operation may cost several million dollars. In time, these costs should decrease. Vendors are creating products with lower price tags to support mid-sized companies' needs.

Another downside is that transferring data to a repository will not resolve problems with inefficient legacy systems. In this case, organizations should consider an Enterprise Resource Planning (ERP) system, which is explained in the sidebar.

Data Analysis. Decision makers may reap the following three benefits from data stored consistently in one source:

* Management reporting may be streamlined, because data that was previously spread across multiple systems is now readily available from one location.

* The repository may support online analytical processing (OLAP), also called multidimensional analysis. This technique allows the manipulation and viewing of data according in many dimensions. For example, OLAP allows sales to be analyzed, manipulated, and viewed according to product, location, and time of sale. If additional detail is desired, users then drill down to more specific views such as sales by month, store, or product.

To illustrate, using OLAP a 600-store office supply company was able to substantially improve its return on personal computer sales. OLAP allowed management to calculate gross margin by store and product type. This showed that some stores carried too much slow-moving stock. To eliminate unnecessary inventory and future write-downs, the company reduced its PC assortment from 22 products to 12.

* The repository may support data mining. Data mining uses statistical techniques to uncover answers to questions that are important to the operation of the business. The results of data mining provide insights that are not apparent from viewing specific transactions. For example, a phone company may gain insight into why customers transfer their business to a competitor.

Development and Maintenance

Any large system project involves risk. Therefore, careful planning and management are important. Exhibit 2 summarizes the steps to create a data repository and conduct data analyses.

Systems Analysis. Systems analysis is the creation of a plan or blueprint to guide the project's development. A systems analyst is the architect for the design of a system. The systems analyst should help establish project goals that are consistent with the overall objectives of the business. These goals provide a means of controlling the scope of the project and keeping users' expectations in line with what will be delivered.

Second, a high-level sponsor should be chosen to foster collaboration among those that maintain the separate legacy systems and use the new repository. Many individuals view the creation and dissemination of information as their primary duties within an organization. Their desire to maintain job security may make them reluctant to participate; a high-level sponsor will encourage people to part with their fiefdoms. Additionally, the sponsor will help them realize that by sharing data, individual units will gain access to other data and possibilities for new insights.

Finally, the analyst and management must evaluate the feasibility of the project. Generally, most organizations have criteria that are used to justify new systems. Typically, the evaluation requires answers to the following questions: Do the expected benefits exceed the costs? Can the system be completed within a reasonable time? Do we have, or can we obtain, people with the appropriate skills to complete and operate the system?

Development. The systems analyst next works with future users of the system in order to identify key data needed to make decisions. The analyst also learns how the users will analyze the data. Based on this information, the analyst develops a model, or blueprint, of how the data will be stored in the repository. The analyst designs this model to make access more efficient for the users' intended analyses. When the model optimizes one type of analysis, other types of analyses may be less efficient. In other words, a data repository designed for OLAP may not efficiently support data mining. Systems analysis is an iterative process requiring communication between the end users and the technicians responsible for the project's creation.

Once the data and its intended use are known, the systems analyst must tackle several technical questions: What software will be used to manage the data and conduct the analyses? What hardware will run the software? What network infrastructure is required to deliver the information to end users?

Typically, the software is selected first. Initially, the systems analyst compares the proposed system's requirements to software created by third parties. Many vendors provide products for creation of data repositories and analysis of data. Exhibit 3 lists some of the major vendors and their websites.

There are many advantages to using a third party's products. The software is debugged and technical support is readily available. It can usually be installed in less time than it takes to write a custom program. The major concern is that the product may not exactly fit the specified requirements. If a compromise cannot be reached between the software's capabilities and the project's needs, the project must be reevaluated to see if the benefits still justify a more costly custom development path.

After selecting the hardware and software, data is copied from the legacy systems and external sources, cleansed, and loaded into the repository. Typically, the data is initially stored in its raw, detailed form. When it is known in advance that a particular summary will be used frequently, the system can be programmed to automatically calculate and save it. This will be more efficient than recalculating the summary each time it is requested.

The system should be tested prior to going live to ensure that dirty data has been eradicated. Nothing will bring a project to an end faster than delivering incorrect information to decision makers. Several techniques are available. Queries can be run against the data and compared to predetermined results. Unexplained differences may indicate data problems. Additionally, statistical summaries may spot unusual items. For example, if the repository should contain the post office's two-digit state code for each address, the data can be sorted by state. If both New York and NY are found, the data has not been properly cleansed.

Additionally, those working on development of the system should document its construction and prepare operations manuals. Critical facts should not be stored in only one person's head since that person may not be available at the time of a problem.

Finally, users of the system must be adequately trained. The degree of training depends upon the sophistication of the analysis technique. In general, data mining requires the most training because its success depends upon knowledge of the business domain and the statistical techniques used for analysis. Users must be able to determine whether the findings are valid. Irrelevant results can easily be obtained when questions are asked of billions of bytes of data. For example, data mining has tied stock performance to serial numbers and determined that customer ID numbers are the best predictor of interest in a marketing campaign. Recommending future activities based upon such findings would not be helpful.

Operation. Operation of the system involves three tasks. First, as already discussed, the data will be used in various ways to support decision making. This can range from preparation of management reports to sophisticated techniques such as OLAP and data mining. Exhibit 3 also provides some websites describing these techniques.

Second, to maintain the usefulness of the repository, a plan should be developed for copying new data from the operational systems to the data repository at predetermined times. As previously discussed, the system may also summarize and store the frequently used data, depending upon users' needs.

Finally, operation includes the archiving of some very detailed, very large data. Fortunately, the 8020 rule applies to data: 80% of the queries can be answered by 20% of the data. The greater the size of the database, the longer it takes to load new data and retrieve output. Therefore, it will be useful to develop a schedule for archiving older data. Naturally, when needed, the system can make this data available.

Maintenance. Several events will lead to changes in the repository. This phase of operation is called maintenance. Changes to the legacy systems that feed the repository will require changes to the cleansing process.

Second, a well-designed project will be popular among the users, who will request the addition of new data to the repository. Requests should be evaluated in light of the original plan and available resources. Any requests not consistent with the original plan should be rejected.

Finally, technological advances occur frequently. On at least an annual basis, the capabilities of new hardware and software must be evaluated on the basis of their costs. *

Simon Petravick, PhD, CPA, is an assistant professor of accounting at Bradley University, simonp@bradley.edu.

ENTERPRISE RESOURCE PLANNING SYSTEMS

Conversion to an enterprise resource planning system (ERP) involves the installation of a single system that covers many information processing needs typically handled by separate systems. A fully functional ERP may handle general ledger, accounts receivable, accounts payable, fixed assets, human resource administration, payroll, production planning, materials management, order entry and processing, warehouse management, transportation management, project management, plant maintenance, and customer service. Traditionally directed toward industry giants only, these products are now being marketed atmid-sized companies with sales of $150 million.

There are several advantages to using an ERP. The first is that all information goes in one place. This eliminates inconsistencies and allows each user to view the same data. Also, since these systems handle daily transaction processing, they embed the vendor's understanding of best practices. Therefore, day-to-day operations may be improved. Third, using a system from one vendor avoids the problems inherent with integrating products from different vendors.

One disadvantage of an ERP is that it requires a great deal of configuration. This is a complicated process, often requiring the assistance of expensive consultants. These configuration decisions decide how the system will run and, accordingly, how the business will operate. Second, the ERP defines an entirely new way of doing work. End users must be retrained and organizational changes made. Finally, configuration and conversion is also a time-consuming process. Large-scale projects are often measured in years. *

Editor:
James L. Craig, Jr., CPA
The CPA Journal

The CPA Journal is broadly recognized as an outstanding, technical-refereed publication aimed at public practitioners, management, educators, and other accounting professionals. It is edited by CPAs for CPAs. Our goal is to provide CPAs and other accounting professionals with the information and news to enable them to be successful accountants, managers, and executives in today's practice environments.

Visit the new cpajournal.com.