A better way to measure tomorrow's CPA
By Nathan T. Garrett, Robert P. Moffie, and Kevin Sweeney
In Brief
Prior to the May 1997 examination, if fewer than 30% of the candidates achieved a passing grade of 75, points were added to scores until approximately that percentage did. This adjustment was necessary because the old disclosed examination policy forced the use of untested new questions, which led to uneven examination difficulty from one administration to the next.
Under the new nondisclosed examination policy, it is now possible to build a bank of questions of high quality with known characteristics from which to construct new examinations. This policy eliminates the need to have a minimum level of passing grades. However, the question remains as to whether achieving a 75 passing grade assures that a candidate has the minimum competency to be granted a license.
A new procedure has been instituted to attempt to measure that level of competency. It was tested based on the May 1996 examination, used for the first time in the May 1997 examination, and will be kept current based on future examinations.
If you passed the Uniform CPA Examination during the past 20 years, chances are your success can be attributed, at least in part, to the preparation levels of the people who took the exam with you.
Until May 1997, the passing standard for the CPA examination was based on the policy that all candidates who achieved a raw score of 75 would pass; however, if nationally fewer than 30% of the candidates achieved this standard, points were added to candidates' scores so that approximately 30% achieved an adjusted score of 75 or higher. These added points were known as grading adjustment points (GAP). Under this policy it was not uncommon for raw scores as low as 60 to be given GAP to raise the raw passing score to 75. This policy, in effect, passed the top 30% on every examination administration. Consequently, if a group of candidates was particularly well prepared, or particularly ill prepared, the top 30% would pass, regardless.
The 30% policy was derived from historical data that indicated how many times well prepared candidates took the exam before passing it.
The need for such a policy resulted largely from the fact that the questions on each examination were available to candidates, academicians, examination review course providers, and others immediately following each administration. With the old questions in the public domain, the AICPA Board of Examiners (BOE) was forced to prepare a large number of new questions for each subsequent examination. The need to keep the new questions secret before using them made it impractical to pretest them to determine how well they would perform in the examination environment. Therefore, it was not possible to know whether a new examination was harder or easier than a previous one until finding out how the candidates performed. The 30% rule was the BOE's solution to this problem. Grading adjustment points were used to adjust for differences in difficulty between examinations.
Neither the BOE nor the state boards were comfortable with the 30% rule, but nothing could be done about it without discontinuing the policy of disclosing examination questions. Thus, the BOE and the state boards agreed that, effective with the May 1996 examination, examination questions would no longer be disclosed after the examination was given. Under this new policy, it is now possible to build a bank of questions of high quality with known characteristics from which to construct new examinations. This should result in improving each examination's ability to consistently screen out those candidates who do not possess the requisite knowledge to gain entry into the profession.
The nondisclosed examination policy was not a complete answer to the problem of determining which candidates should pass. The objective in establishing the passing standard for any professional licensing examination is to determine that candidates possess the minimum level of competence necessary to be granted a license. Those below that level should be denied entry into the profession to protect the public. The question then becomes, does correctly answering 75% of the content of an examination assure that all candidates of minimum competence will pass? Should the passing score be 85%? Should it be 65%? To answer this question, the BOE decided to undertake a study to find a means of establishing the passing standard. The study was undertaken in close cooperation with a committee appointed by the chairman of the National Association of State Boards of Accountancy (NASBA), and the BOE consulted extensively with testing experts (psychometricians) from the AICPA Examinations Division, the University of Wisconsin at Madison, and the University of Massachusetts at Amherst. The result of this study is the adoption of a method of determining the passing standard. The new standard was employed for the first time to determine who passed the May 1997 examination.
The new procedure is known as a modified Angoff standard setting method. Generally, the Angoff method involves convening a panel of judges familiar with the work of entry level professionals who evaluate each question of each section of an examination. Each panelist's task is to estimate the probability that a "borderline" or "minimally qualified" professional would answer each question correctly.
This new procedure was applied to the May 1996 examination, but the actual passing scores for the May 1996 examination were established in the traditional way. The May 1996 results were used as the starting point for determining the passing standard for May 1997 and will be used for subsequent examinations until such time as new panels are convened to perform new standard-setting studies. The expectation is that the standard-setting studies will be performed periodically to keep the passing standard current.
What follows is a description of the composition of the panels, the procedures they followed, derivation of experimental passing scores based on the May 1996 examination, and the way in which the May 1996 results were used to determine the passing scores for the May 1997 examination.
Panel Composition. Volunteers were solicited to serve on standard-setting panels for the four sections of the CPA examination. Volunteers selected were CPAs with the following qualifications:
* Currently engaged in public practice with at least three but not more than 10 years' experience.
* Spending a minimum of 50% of their time in either attestation or tax.
* Supervising new CPAs in public practice as a part of their duties.
Fifteen volunteers served on five different panels that studied the entire examination. In most instances, the volunteers were assigned to panels based on their stated preferences.
Procedures Followed. The panelists were given extensive training on the purpose of the studies and the process to be used. First, they were given a definition of "the minimally qualified CPA" prepared by the BOE. They were also told that the purpose of the examination, as with all professional licensing examinations, is to determine which of the candidates have the minimum level of competence to be allowed to practice.
Following this overall explanation, the panelists were given the questions and official answers for the specific section of the May 1996 examination to which they were assigned. They were instructed to read each question and assign a minimum pass level (MPL) that represented the percentage of minimally qualified CPAs who would have answered the question correctly. This was done without consultation among the panelists. (Angoff ratings are usually referred to as minimum pass levels, or MPLs, to differentiate them from candidate scores. This reduces confusion when discussing Angoff ratings and candidate scores on the same test.) The individual MPLs were tallied, question by question, and panelists were provided a summary of the range of MPLs they had assigned, the average MPL for each question, and the average of all MPLs for that group of questions. The panelists were then given the statistics showing how all candidates actually performed on the questions they had rated. For each question, the panelists were encouraged to consider the summary information and the statistics on candidate performance and to discuss their rationale for the MPLs they had assigned. They were then instructed to rate each question again. This second round resulted in a smaller spread between the highest and lowest MPLs for almost all of the questions but only a small difference in the average score for the entire group of questions. The second set of MPLs for all questions was tallied to arrive at the "initial passing score" that, in the judgment of the panelists, was needed for the minimally qualified candidate to pass that section of the examination.
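To make the tally concrete, here is a minimal sketch, in Python with made-up ratings and hypothetical function names (nothing here is the BOE's actual software), of how individual Angoff ratings could be summarized into per-question averages and an initial passing score.

```python
# Hypothetical sketch of the Angoff tally described above. Each panelist
# assigns an MPL (0-100) to every question; per-question averages are then
# weighted by point value and summed to form an initial passing score.

def tally_angoff_ratings(ratings):
    """ratings: question id -> list of panelist MPLs (percentages)."""
    summary = {}
    for q, mpls in ratings.items():
        summary[q] = {
            "low": min(mpls),
            "high": max(mpls),
            "average": sum(mpls) / len(mpls),
        }
    return summary

def initial_passing_score(summary, point_values):
    """Weight each question's average MPL by its point value and sum."""
    return sum(summary[q]["average"] / 100 * point_values[q] for q in summary)

# Example with made-up ratings for three one-point questions
ratings = {"Q1": [60, 70, 65], "Q2": [80, 75, 85], "Q3": [50, 55, 45]}
summary = tally_angoff_ratings(ratings)
print(initial_passing_score(summary, {"Q1": 1, "Q2": 1, "Q3": 1}))  # about 1.95 of 3 points
```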
Deriving the Passing Score. Two refinements were made to the initial MPLs. The first was to determine an "adjusted MPL" by averaging the MPLs the panelists assigned only to those questions that had good statistical properties, i.e., questions good enough to be used on a subsequent examination. This refinement is consistent with the BOE's long-standing policy of not reusing questions whose candidate-performance statistics indicate they are too easy or too difficult. For years the questions on the examination have been screened using item statistics that weed out questions that are not diagnostic enough (that is, too easy or too hard) to separate the well-prepared candidate from one who is not.
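A minimal sketch of this first refinement follows, assuming a hypothetical per-question flag for acceptable item statistics (the real screening criteria are the BOE's item statistics, which are not shown here).

```python
# Hypothetical sketch: recompute the MPL using only questions whose item
# statistics qualify them for reuse on a later examination.

def adjusted_mpl(average_mpls, is_statistically_sound):
    """average_mpls: question id -> average panelist MPL (percentage).
    is_statistically_sound: question id -> True if the item's statistics
    are good enough for the question to be reused."""
    kept = [mpl for q, mpl in average_mpls.items() if is_statistically_sound[q]]
    return sum(kept) / len(kept)

# Q3 is dropped as psychometrically weak, so the adjusted MPL is (65 + 80) / 2
print(adjusted_mpl({"Q1": 65, "Q2": 80, "Q3": 50},
                   {"Q1": True, "Q2": True, "Q3": False}))  # 72.5
```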
The decision to use only those questions with good statistical properties is also supported by a 1992 National Academy of Education Panel report, "Setting Performance Standards for Student Achievement." From its observations, the report concluded as follows: "Panelists [tend] to underestimate performance on easy items and overestimate it for hard items. Therefore, rather than setting a consistent expectation for each level, panelists [generate] very different cut points from the easy and hard items."
The second refinement was to reduce the "adjusted MPL" by one Conditional Standard Error of Measurement (CSEM) to arrive at the passing score. There are random factors, referred to as "errors," that may cause a candidate to score higher or lower on an examination than his or her true ability would suggest. Such factors include misreading a question or answer or making a clerical error in recording an answer. The CSEM represents the amount of error associated with examination scores at a particular score level. Specifically, it is the expected standard deviation of the distribution of scores that would result if all candidates with a single level of achievement were to retake the examination repeatedly. Because of this error, which is present in all types of examinations, candidates would be expected to obtain either slightly higher or lower scores on retesting.
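For illustration only (these numbers are hypothetical, not the BOE's): if a panel's adjusted MPL for a section were 70.0 and the CSEM at that score level were 3.0, the passing raw score for that section would be set at 70.0 - 3.0 = 67.0, so that a minimally qualified candidate who happened to score slightly below his or her true ability would not be failed by measurement error alone.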
Developing Passing Scores for the May 1997 Examination. Two variables had to be considered in applying the recommended passing standard to the May 1997 and subsequent examinations. First, the May 1997 examination, and any subsequent examination, could be easier or harder than the May 1996 examination on which the passing standard is based, because most of the questions are different and, for those questions that are the same, the order in which they appear is different. Second, candidates who take subsequent examinations may be better prepared or less well prepared than those who took the May 1996 examination.
The BOE psychometricians used a method called "equating" to determine if the May 1997 examination was more or less difficult than the May 1996 examination and to test for evidence that the candidate pools were significantly different. Examinations are "equated" by embedding questions from earlier examinations into later ones. If the candidate pools are equal in ability, they should perform equally well on the equating questions. If the candidates' performance on the equating questions is the same as it was on the earlier examination, the pools are deemed to be equal in ability and any difference in candidates' performance on the remainder of the examination is assumed to be attributable to a difference in the difficulty of the two examinations. If the candidates perform significantly better or worse on the equating questions, it is assumed the candidate pools were different. From these measurements, the passing standard derived for the May 1996 examination was adjusted mathematically so that it applied to the May 1997 examination.
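As a rough illustration of that logic only, here is a much-simplified "mean equating" sketch with assumed inputs; it is not the BOE's actual equating procedure, which rests on a fuller psychometric model.

```python
# Much-simplified mean-equating sketch (assumed inputs; not the BOE's method).
# Anchor (equating) questions embedded in both forms let us compare the two
# candidate pools. If the pools perform about the same on the anchors, any gap
# on the full forms is treated as a difference in form difficulty, and the
# passing score is shifted by that amount.

def mean(xs):
    return sum(xs) / len(xs)

def equate_passing_score(old_passing_score,
                         old_anchor_scores, new_anchor_scores,
                         old_total_scores, new_total_scores,
                         tolerance=0.5):
    anchor_gap = mean(new_anchor_scores) - mean(old_anchor_scores)
    if abs(anchor_gap) > tolerance:
        # Pools differ in ability; a simple shift is not appropriate here.
        raise ValueError("Candidate pools differ in ability.")
    # Pools judged equal in ability: the total-score gap reflects form difficulty.
    difficulty_shift = mean(new_total_scores) - mean(old_total_scores)
    return old_passing_score + difficulty_shift

# Example: pools perform alike on anchors, but the new form runs 1.2 points
# harder overall, so the raw passing score drops by 1.2 points.
print(equate_passing_score(65.4, [7.1, 6.9], [7.0, 7.0],
                           [72.0, 70.0], [70.4, 69.2]))  # about 64.2
```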
The raw scores derived by the Angoff method became the minimum pass levels (MPLs) for the May 1996 examination. These raw MPLs are the starting point for developing MPLs for subsequent examinations until such time as new Angoff studies are done.
Raw scores on the May 1997 examination were converted to reported grades in a two-step process. First, through use of the equating procedure, the raw score that represented the same amount of ability as the MPL arrived at on the May 1996 examination was determined for each section of the May 1997 examination. That raw score became the raw passing score. Second, all raw scores were adjusted so that the passing scores equalled 75. In theory this was done in the same way the zero temperature reading on the Celsius scale is converted to the 32-degree reading on the Fahrenheit scale. Although the numbers are different, they both represent the freezing point for water. The MPL and the grade of 75 both represent the same level of ability, although the numbers are different.
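The sketch below illustrates such a linear conversion. The article does not give the actual formula, so the second anchor point (a perfect raw score reporting as 100) is an assumption made purely for illustration; the raw passing score of 65.4 is the auditing figure discussed below.

```python
# Hedged sketch of a linear score conversion (the exact formula is not given
# in the article; anchoring a perfect raw score to a reported 100 is assumed).
# Like mapping 0 degrees Celsius to 32 degrees Fahrenheit, the raw passing
# score maps to a reported grade of 75, and other raw scores are rescaled
# around that anchor point.

def reported_grade(raw_score, raw_passing_score, max_raw=100.0,
                   reported_passing=75.0, reported_max=100.0):
    slope = (reported_max - reported_passing) / (max_raw - raw_passing_score)
    return reported_passing + slope * (raw_score - raw_passing_score)

print(reported_grade(65.4, 65.4))  # 75.0 -- a raw score at the MPL reports as 75
print(reported_grade(80.0, 65.4))  # about 85.5
```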
Under this process, the percentage of passing candidates can, and will, fluctuate from examination to examination. There will no longer be a predetermined minimum percentage for passing, as was the case with the former 30% policy. The passing percentages based on the MPLs for the May 1996 examination and based on the equated MPLs for the May 1997 examination are shown in the exhibit.
For subsequent examinations, the percentage passing each section may vary even more. For example, assume the equating process reveals that the May 1996 examination and a subsequent one are equal in difficulty so that a passing score of 65.4 applies to the auditing sections of both examinations. If the candidates on the subsequent examination are, on average, not as well prepared as the 1996 group, and only 20% of them achieve raw scores of 65.4 or above on auditing, under the new procedure only 20% will pass the auditing section. Under the old procedure, adjustment points would have been added to all candidates' scores in auditing so that at least 30% would have passed. Conversely, under the new procedure, if 40% of the candidates on a subsequent examination score 65.4 or higher, all 40% will pass.
Improvements in the quality of the examination dictate that new standard-setting studies be conducted periodically. The BOE was faced with the question of when new studies should be performed. The May 1997 examination was of higher quality because the examination is now nondisclosed. (Questions from the May 1996 examination were the first to be removed from the public domain.) Questions of proven psychometric value can now be retained in the item bank for reuse. Test items that are either too difficult or too easy will be less likely to appear on future examinations. Thus, a higher quality examination will result in a more reliable passing standard because the standard-setting process will be based on a larger group of questions.
Finally, the BOE decided it did not want to wait three or four years before reconsidering the passing scores and agreed they should be kept current through an ongoing series of studies. Furthermore, BOE psychometricians advised that the new procedures need not be applied simultaneously to all sections of a single examination. Consideration was given to the administrative advantages of spreading the process over a four-year span, which would reduce the scheduling demands on AICPA staff and consulting psychometricians and absorb the cost over several fiscal years. For these reasons, it was concluded the new procedures should be applied to one section each year, beginning with the May 1998 accounting and reporting section, which was selected because the corresponding May 1996 section contained a higher percentage of psychometrically weak questions than the other three sections. *
Nathan T. Garrett, JD, CPA, is a past chairman of NASBA, a faculty member at the North Carolina Central University School of Business, and a partner in Cherry, Bekaert & Holland, LLP. Robert P. Moffie, PhD, CPA, is an associate professor of accounting at the North Carolina Central University School of Business. Kevin Sweeney, PhD, is the assistant director of psychometrics for the AICPA.
This article is based on the report of the Examination Passing Standard Subcommittee (EPSS) of the National Association of State Boards of Accountancy (NASBA). Nathan T. Garrett chaired the subcommittee and was the author of its report. The EPSS members were Alvin A. Arens, Michigan State University; Robert C. Ellyson, retired partner of Coopers & Lybrand, LLP; and John K. Simmons, University of Florida. The subcommittee drew heavily on Kevin Sweeney, assistant director of psychometrics for the AICPA; Michael T. Kane, professor at the University of Wisconsin; and Asa L. Hord, chairman of the NASBA Examinations Committee and retired partner of Deloitte & Touche, LLP.