April 1989

Performance evaluation of government agencies.

by Wallace, Wanda A.

    Abstract- Certified public accountants (CPAs) are increasingly called upon to evaluate the performance of government agencies, using criteria that go beyond the traditional analysis of numbers. In the case of mass transit, a CPA can be called on to make a performance evaluation which will assess the reliability, safety, environment, information, and cost of the system. Accountants can develop more creative definitions and more meaningful reporting mechanisms to replace the inadequate standards currently advocated by the Urban Mass Transportation Administration.

It is increasingly obvious that accounting has outgrown the bounds of debits and credits, and its focus has expanded to nonfinancial information. The Attestation Standards issued in 1986 clearly signaled that CPAs are information specialists who can assist in evaluating diverse criteria related to any assertion capable of reasonably consistent estimation or measurement. Accountants have a potential role in developing, reviewing and attesting to a wide range of information including performance measures of public services. The types of problems which accountants can help address to close the gap between information producers and users are legion. Efforts toward the phenomenon called privatization--which many see as a vehicle for enhancing competition and hence performance--imply a key role for the accountant.

In this article, the CPA assumes the role of a researcher, making a performance evaluation in mass transit. If this is your assignment, you must consider what's really important.

1. Reliability

You are considering whether to use the local transit systems, buses or subways, to get to work. What information might influence your decision? Most likely, of paramount importance is whether the transit system is reliable. Will it get you where you want to go, on time? Part of the answer is a matter of location: is service reasonably convenient to your points of departure and arrival? Another part relates to scheduling: is the claimed schedule fact or fiction? Perfection is unlikely, so the question then arises of how to interpret percentages describing deviations. For example, if the schedule is reported to be 90% on time, what does that mean to a passenger? Are delays clustered in the A.M. or P.M. peak periods? In other words, is service on time during "rush hours"? When delays arise, how long are the best and worst scenarios? What is the possibility of cancellation? In these situations, what are the consequences? Can you get such information for the route you will use?

2. Safety

If you determine that the transit system is reasonably reliable, what else is relevant? Most likely, safety is an important consideration. How many accidents involve mass transit? Do they result in injuries or fatalities? What about crime at bus stops, train platforms, or in the vehicles themselves? Are the locales patrolled by security or police?

3. Environment

If you can get to your destination on time with relative safety, the next concern is likely to be comfort. This is sometimes referred to as the "environment" for traveling. Specifically, how crowded is the vehicle? Will you get a seat? Is the vehicle's condition good? Is the equipment intact and the floors reasonably clean? Is trash rampant? Is the climate comfortable? In the summer, is there air conditioning? Do doors work so that you can get off at your stop? Are announcements audible so that you know when to transfer or leave? Are signs on the front of vehicles clear so you know which one to board? Is the driver courteous?

4. Information

A related concern is whether information services are timely and accurate. If you call to find out how to get somewhere, can you count on being given correct information on which bus or train to catch, how to transfer, and when to expect a pick-up and delivery of passengers? If schedule changes are made, are they announced in a timely manner, with written updates?

5. Cost

The final consideration is likely to be cost of service. What is the fare? What fare reductions are provided for frequent travelers or users at other than peak hours? How does this cost relate to the cost of a "demand response vehicle," i.e., your car? Traffic, parking, and related costs of driving, as well as environmental implications, would be among the factors weighed against the farebox demands.

Performance Evaluation: The GASB Initiative

It seems to make sense that a passenger would be interested in this information about mass transit and should be able to access answers to questions about service and price. A study supported by GASB is devoted to performance evaluation, measures of interest to users and how they can be communicated to users. Mass transit is only one element of a host of services under consideration by the GASB's Service Efforts and Accomplishments Task Force. Mass transit operations illustrate both the types of controversies involved and the challenges posed to accounting practitioners and researchers.

The Current State of the Art

You may not be surprised to discover that answers to these inquiries from a typical mass transit rider are not commonly available. Instead, only about 20% of communities operating mass transit systems include any statistics in their annual public reports, and then the disclosures are usually included in budget statistics aggregated for the total system and focusing on output measures such as number of miles driven. Certain systems do not even maintain the database necessary to generate answers to questions concerning customer service described herein; those maintaining such statistics are reluctant to share them with consumers. Questions of understandability are raised, as well as small number problems arising from disaggregation by route.


Why does this situation exist? Contentions are made that reports based on technical terms would not be understandable to consumers. Transit operators lack consensus on the definition of such terms as road call; industry participants disagree as to what on-schedule means. What is the relative importance of equipment reliability versus service reliability? What constitutes a cancellation? When is the climate sufficiently comfortable? What constitutes crowded conditions? Is it not better to be late on several lightly traveled trains than on a primary peak-period, heavily traveled line? Yet, some transit operators contend, performance measures cannot effectively reflect the propriety of their decisions which weigh factors such as the number of customers inconvenienced.

Small Numbers

The small number problem stems from concerns that percentage calculations involving small numbers will result in nonsensical conclusions. As an example of how certain small numbers would be distortive, as well as of the sentiments of information producers, Table 1 presents an excerpt from correspondence between the president of the New York City Transit Authority and a New York Assemblywoman. The high cost of data collection is implicit in the reference to sampling procedures.
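The concern about small numbers can be made concrete with a short sketch. The Python below is an illustration only, not a method taken from the correspondence in Table 1: the trip counts are hypothetical, and the Wilson score interval is simply one common way to express how uncertain a percentage is. It shows that "90% on time" is far less informative when it rests on 10 observed trips than when it rests on 1,000.

```python
import math

def wilson_interval(on_time, trips, z=1.96):
    """Approximate 95% Wilson score interval for an on-time proportion.

    on_time and trips are counts; z=1.96 corresponds to roughly 95% confidence.
    """
    if trips <= 0:
        raise ValueError("trips must be positive")
    p = on_time / trips
    denom = 1 + z**2 / trips
    center = (p + z**2 / (2 * trips)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trips + z**2 / (4 * trips**2)) / denom
    return center - half_width, center + half_width

# Hypothetical counts: the same 90% rate observed on a small and a large sample.
for on_time, trips in [(9, 10), (900, 1000)]:
    low, high = wilson_interval(on_time, trips)
    print(f"{on_time}/{trips} trips on time: "
          f"plausible range {low:.1%} to {high:.1%}")
```

Reporting such a range next to each by-route percentage is one possible way to disclose disaggregated figures without overstating what a handful of observations can support.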

Suspicions of Users

Interviews with consumer-oriented interest groups, such as the Straphangers Campaign in New York City, have identified users' suspicions as to why the current state of the art is so woefully lacking. The pervasive suspicion relates to the inability of managers to control the requested performance measures. Users understand that if a system is allowed to aggregate statistics across routes or lines, some of which have old equipment and a number of problems (perhaps tied to the nature of the neighborhood serviced), while others have brand new equipment, then on average, the system's performance is likely to look fairly effective. However, as soon as averaging is not permitted, the real picture, somewhat uncontrollable in the short term due to the time involved in capital replacement decisions, begins to emerge.

In New York City, recent subway reports of on-schedule performance by line show a substantial range within the disaggregated statistics: 46.5% for the R line on weekdays to 98.4% for the Franklin Shuttle. The annual average across all routes was 87.9%, which clearly is at variance with the riders' experiences on the R line. While this detail was applauded by user groups, it is the only by-line detail provided. There is no doubt that other details are available internally for management use, but resistance to third-party access is strong.
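How a system-wide average can mask a poorly performing line may be sketched as follows. Only the two line-level rates (46.5% and 98.4%) come from the figures above; the trip counts and the "other lines" row are hypothetical, chosen purely for illustration, and the sketch does not attempt to reproduce the reported 87.9% average.

```python
# On-time rates for the R line and Franklin Shuttle are the reported figures;
# all trip counts and the "other lines" entry are HYPOTHETICAL.
lines = {
    "R line": {"trips": 1000, "on_time_rate": 0.465},
    "Franklin Shuttle": {"trips": 500, "on_time_rate": 0.984},
    "Other lines (hypothetical)": {"trips": 8500, "on_time_rate": 0.92},
}

def system_average(lines):
    """Trip-weighted on-time rate across all lines."""
    total_trips = sum(line["trips"] for line in lines.values())
    on_time_trips = sum(line["trips"] * line["on_time_rate"]
                        for line in lines.values())
    return on_time_trips / total_trips

average = system_average(lines)
print(f"System-wide average: {average:.1%}")
for name, line in lines.items():
    gap = line["on_time_rate"] - average
    print(f"{name}: {line['on_time_rate']:.1%} "
          f"({gap:+.1%} relative to the system average)")
```

Under these assumed volumes the aggregate looks respectable even though riders on one line are late more often than not, which is the pattern user groups suspect.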

The president of the New York Transit Authority has claimed problems arise with interpretability of small numbers, allocation issues of assigning failures to specific routes, and the unwieldy lengths of reports necessary to provide by-line details. Politicians and user groups have challenged the substance of these remarks; lengthy reports currently produced are cited as useless due to their overall system focus.

The Auditors' Perspective

The suspicions of users are borne out, in part, by findings of the auditors. The Inspector General's Office for the New York Transit Authority (MTA), in describing findings related to bus service, notes that the waiting time of passengers at bus stops is a critical dimension of passenger service, yet is not tracked by the information system. Similarly, in describing subway performance, reports note that disaggregated data are relevant to riders but rarely provided.

The accuracy of the data produced by this transit system is challenged by auditors, with comparisons made between auditors' reported statistics from direct observation of subway performance and those generated by the management information system. As one example, observation of 442 scheduled train trips led to the auditors' calculation of 61.89% on-time performance in contrast to the MTA's information system quantifying an 80.1% on-time performance for these same trains. A second test resulted in a 66% to 84% comparison. These are statistically significant differences.
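The claim of statistical significance for the first test can be checked roughly with the figures given. The sketch below is an assumption about method, since the article does not say which test the auditors applied: it treats the information system's 80.1% as a fixed claimed rate and asks how likely an observed 61.89% would be across 442 trips if that claim were true (a one-sample z-test for a proportion). The second test is not covered because its sample size is not reported.

```python
import math

def one_sample_proportion_z(observed_rate, claimed_rate, n):
    """z statistic for an observed proportion against a fixed claimed rate."""
    standard_error = math.sqrt(claimed_rate * (1 - claimed_rate) / n)
    return (observed_rate - claimed_rate) / standard_error

def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Figures from the first audit test: 442 observed trips, 61.89% on time,
# versus 80.1% reported by the information system for the same trains.
z = one_sample_proportion_z(observed_rate=0.6189, claimed_rate=0.801, n=442)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, two-sided p-value = {p:.2e}")
```

The z statistic comes out between -9 and -10, far beyond conventional thresholds, so a gap of this size is very unlikely to be a sampling accident under these assumptions.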

The source of error is cited to be a combination of factors: too many other duties assigned to observers of pull-outs, leading to error at the point of initial data entry; illegibility of written figures, leading to errors in data entry; unwritten procedures and definitions and undocumented changes therein, leading to inconsistency among observers and lack of comparability over time; and intentional misstatement to make performance appear superior to actual results.

The last is attributed to a lack of understanding by employees and the absence of training of employees responsible for information input. An incentive problem is also apparent in unauthorized changes to records by individuals not directly involved at the data entry stage.

The relevance to transit riders of certain performance measures is questioned by the Office of the Inspector General. For example, the concept of mean distance between failures is seen as a car reliability measure, whereas service reliability is the primary consideration of riders, regardless of the source of the problem. Moreover, even the car reliability measures are subject to extensive measurement error due to common practices of attributing problems to a major reason for service breakdowns, regardless of a concurrent maintenance problem (a secondary effect). As an example, if service is late or cut back due to construction on a bridge, then even if a car also happens to require a road call, it will tend to be tracked as late due to the construction. As a result, performance in the controllable realm appears better than it actually is.

Where Do the Accountants Fit In?

An obvious question is how accountants can help in enhancing performance measurement and meeting users' information needs. The answer lies in the definition of accounting as an art of measurement and the role of the CPA as an objective third party that pursues means of capturing the substance of performance, rather than viewpoints of those with vested interests. The nature of disclosure in terms of understandability, consistency, comparability, usefulness, and reliability can be considered by accountants who are accustomed to giving attention to important dimensions of reporting.

Analogies to the Corporate Sector

The apprehensions of information preparers in the public sector are no different from those in the corporate sector. Measurement of nonfinancial information and reporting of disaggregated information have both been controversial issues in the corporate sector. In particular, information preparers have argued that financial statement footnotes are too complex, reserve information too inexact, and line-of-business reporting too subjective due to arbitrary allocation practices. These are the same arguments posed in the mass transit area. Yet, they can be grappled with by accountants in a manner analogous to approaches that are common in the private sector.

A recent development in the corporate sector is the evolution of condensed financial reporting. There is no doubt that the related concern for clarity of presentation is particularly relevant to approaching performance evaluation in the public sector. While the concept of a sophisticated or knowledgeable user may well be reasonable in the corporate sector, the disclosures to users of public services must be directed to the typical citizen, perhaps lacking education beyond high school. This suggests that jargon such as throughput, road calls, and route spacing should be avoided and that clear descriptions of what is being measured should be used.

Control Mechanisms

To facilitate meaningful reporting, mass transit information systems need to be designed and monitored to deter misstatement and uncover material inaccuracies. This need can be addressed by implementation of control systems, including education of employees, sanctions for noncompliance, and clear benefits to honest reporting practices. This, of course, ties to the establishment of an effective control environment. Reviews by CPAs are one means of monitoring the information system to ensure against material misstatements to the public.

The involvement of an objective third party who is responsible to the public may be critical to the evolution of performance measurement because of the current gaming environment which exists in public databases now assembled in the mass transit industry. Specifically, the Sec. 15 filing of financial information and performance measures to the federal Urban Mass Transportation Administration (UMTA) is criticized because many systems believe that certain mass transit systems are cheating in how they report their performance, thereby penalizing truthful reporters.

Criticisms of Sec. 15 Filings

A few examples of the games which have been described in the current Sec. 15 database on mass transit operations will clarify the obstacles to obtaining desirable disclosure practices. Currently, an output measure of transit systems is passenger miles. There is evidence that some systems adjust out service and deadhead miles, i.e., miles necessary to get to the original bus stop of the route, while others do not. This particular measure leads to higher dollar allocations by UMTA to poor reporters, since this is one criterion on which Sec. 9 resource allocations of federal funds are based. Similarly, the number of passengers is calculated using a statistical sampling approach which UMTA defines as requiring a 10% precision at 95% confidence. Although the intent is to sample an entire year of operations, some reporters sample from the month of filings since they don't have the option of going back in time to sample passengers. Some transit operations have been permitted to use this basis to quantify the number of passengers serviced. The result is again a potential advantage to those not following the rules.

Creative Definitions

Then we get into creative definitions. Perhaps the most telling example concerns the reporting of collisions. One system reportedly attributed a collision to the breaking off of a rearview mirror, and used this causal link to justify reporting the event as an equipment failure rather than a collision. Similarly, some only record a road call if it delays a bus or train by more than 10 minutes, while others track all road calls regardless of the time required for servicing.

Recent Action

Recognition of these problems has led to a clarification of the procedures the CPA is expected to apply when involved with Sec. 9 and Sec. 15 filings. A total of 24 procedures have been specified as agreed-upon procedures. The required procedures are directed at controls as well as the approach to data collection. Practitioners believe the recently specified procedures are considerably more stringent than those applied in the past.

Problems in Other Statistics Outside Sec. 15

Inconsistencies over time are another type of problem for the performance measures currently tracked by individual transit systems. One conspicuous illustration comes from the international arena, as reported in British Rail's 1986/87 annual report:

"Improved punctuality has been a priority in the drive for higher service standards. Despite particular problems between Norwich and London, which will be eased with the arrival of electric services, punctuality improved with 77% of trains arriving on time or within five minutes. Research shows that customers are most concerned about arrival within ten minutes of schedule. In line with this, InterCity's punctuality target is being redefined at 90% of trains to arrive on time or within ten minutes."

One can speculate with some confidence that the on-time performance across years will improve if such easing of standards is permitted. Disclosures on performance measures need to highlight clearly changes in definitions. Moreover, consensus on a reasonable "on-time" definition for comparison across systems is essential for meaningful reporting. Currently 3-minute, 5-minute, and 10-minute benchmarks are among those used.
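The effect of the chosen benchmark can be shown with a small sketch. The delay figures below are hypothetical and exist only to illustrate the point: the same set of arrivals yields very different "on-time" percentages under the 3-minute, 5-minute, and 10-minute definitions mentioned above.

```python
def on_time_share(delays_minutes, tolerance_minutes):
    """Share of trips arriving no more than tolerance_minutes late."""
    if not delays_minutes:
        raise ValueError("at least one trip is required")
    on_time = sum(1 for delay in delays_minutes if delay <= tolerance_minutes)
    return on_time / len(delays_minutes)

# Hypothetical arrival delays, in minutes, for ten trips on one route.
delays = [0, 1, 2, 2, 4, 4, 6, 7, 9, 12]

for tolerance in (3, 5, 10):
    share = on_time_share(delays, tolerance)
    print(f"On time within {tolerance} minutes: {share:.0%}")
```

With these assumed delays, identical service is reported as 40%, 60%, or 90% on time depending solely on the tolerance, which is why a change in definition needs to be disclosed alongside the statistic.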

The American Public Transit Association (APTA), an industry trade association for mass transit, set out several years ago to track a customer satisfaction measure. Specifically, the measure selected was the number of complaints. The association soon deduced that those who had superior information systems and tracking devices were being penalized in relation to those with ineffective management systems. The association terminated the reporting of the number of complaints and has not as yet sought an alternative measure to evaluate riders' satisfaction.

The Schism Between Information Producers and Users

The irony of debates on measurement and information production is that redundant resources are being used to generate surrogate measures of performance, which are then widely disseminated and debated in the press. The predictable result is suspicion among information providers and users, and assertions that you cannot believe anything that you read. From the vantage point of the producers, information for management purposes would be dangerous to share with the public, since one must understand the context in which the information is produced and a host of caveats to interpret the outcome measures.

From the vantage point of the users, they have to fill the need for information as best they can. That means culling information from the mass transit authorities, using volunteer reporters and observers to assemble their own database, and using catchy titles on resulting reports, such as "The Good, the Bad, and the Ugly" ranking of various lines, and the similarly telling disaggregated report of "A Tale of Two Subways," both issued by the Straphangers Campaign.

A Lack of Perfection: Should that Preclude Information Flow?

Although theories speak of perfect information, little doubt exists that we live in a world of imperfect information. We joke about weather reports, yet we value forecasts as useful information. Similarly, we chastise companies for accounting procedure selections when distortions of financial positions appear to result, yet we find annual reports relevant to security valuations. It would seem that passengers would find performance statistics on mass transit useful despite some obvious limitations of any service measure. The grassroots attempt to produce the information by groups such as the Straphangers in New York is clear evidence of a demand for performance measures.

Trouble in reconciling information systems', auditors', and user groups' statistics calls for an information measurement arbitrator role in a classic agency/monitoring sense. That arbitrator seems to be the accountant.

The Other Discernible Pattern Directed at Performance

In light of the current problems in performance evaluation disclosures, what have riders done to encourage better service? An interesting trend is toward considering privatization as an option and, at times, forcing the use of the private sector or at least its comparison on a competitive bidding basis. A recent survey by Touche Ross reported that 17% of respondents had privatized some aspect of mass transit. It is commonplace for systems to privatize service for handicapped and elderly using car services or vans.

It is interesting to note that the private sector controlled mass transit well into the 1950s and 1960s and Cleveland had private sector operations into the 1970s. Many point out that the failure to generate a profit led to public sector operations, and others cite failure of service when in private hands. Yet, today many recognize that the real distinction is not public versus private but rather, competition versus the lack thereof as a source of incentives for controlling cost and for improving service.

Public transit costs from 1970 to 1985 have increased 63.8%, inflation adjusted and on a per mile basis; this even exceeds the highly visible increase in medical costs. Moreover, fatality rates per 100 million passenger miles are .458 for public transportation and only .071 for private bus operations. Such statistics suggest a need for competition.*

One result has been legislation recently passed in Colorado. The legislation provides that 20% of bus service vehicle hours must be privatized within a year, using competitive bidding. The intent is to see the results of such privatized services and how they compare to public services. If the private services perform better than the public sector, presumably a larger percentage will be privatized. Yet, participants in the bill's sponsorship predict that the public sector will improve and maintain a fair share of the service. The means of ensuring long-term positive effects on service quality and cost control will be to mandate periodic competitive bidding.

Of course, for such an approach to work, measurement and monitoring practices are necessary. Mass transit experts note that games used to enhance the appearance of public sector performance include imposing the less desirable routes on private sector providers, such as those with a disproportionate number of higher cost bus drivers and restrictions on terminating drivers, making costs less controllable. Similarly, to lower their own competitive bids, some public sector participants have reportedly played allocation games having intercity routes cross-subsidize suburban services. The concern of the Colorado legislators is documented by their inclusion of a performance audit requirement. Table 2 presents this requirement and its intended use by decision makers.

* Wendell Cox, "Public Transit, Competitive Contracting and the Public Ethic" (November 7, 187) and related interview.

Implications for the GASB

Many have expressed the view that GASB was needed foremost in the area of exploring service efforts and accomplishments in the public sector. This viewpoint echoes grassroots organizations' contention that information is demanded, but not presently provided. The GASB has to be careful, just as FASB must, to avoid placing undue weight on information providers' claims of an inability to produce certain information or of excessive costs being related thereto. Loopholes must also be monitored carefully, as standards are set.

Challenges to Researchers and Practitioners

While GASB has taken a critical first step of investigating performance evaluation across a spectrum of government services and having researchers propose a set of useful measures, a number of years will pass before standards are set which require systematic reporting. In the interim, progress will largely depend on both researchers and practitioners addressing the problems and encouraging experimentation among mass transit and similar public sector service providers.

Legislators are interested in whether the treatment of costs, allocation issues, and monitoring mechanisms within new laws is adequate. Those operating mass transit systems are curious as to how to ensure the availability of reasonably accurate information on operating performance. User groups wish to sort fiction from fact as to what can be reasonably measured or estimated to facilitate disaggregated comparisons of transportation options. Auditors recognize the sources of measurement error, shortcomings in current measures, and problems with inducing useful reporting practices. Questions arise as to whether detailed procedures now required for Sec. 15 information will effectively improve information quality. Moreover, should other information be subject to similar review procedures? The research and practice communities, in tandem, could begin to close the schism in the demands for and supply of performance-related information on mass transit, as well as other public services.

