Boosting Productivity With Speech-Recognition Systems

By Kent N. Schneider, JD, CPA, East Tennessee State University, and Dwight Owsen, Long Island University

Computer users who watch “2001: A Space Odyssey” or reruns of the original “Star Trek” television series often fantasize about trading in the keyboard for a microphone. Although speech-recognition technology has not reached this level of sophistication, it definitely has developed to the point of becoming a useful, popular tool, ranging from the high-end systems used in the medical profession to the low-end programs teenagers use to surf the web. For business professionals, however, neither extreme is likely to have much appeal, because of cost and functionality limitations. But mid-range programs, such as Dragon NaturallySpeaking, IBM ViaVoice, and even Microsoft Office XP, are increasingly finding their way onto the desks of financial and accounting professionals.

For many users, the initial appeal of speech-recognition systems is faster data input. With a highly accurate program, most users can dictate at 160 words per minute, which is faster than the average typist. In addition, speech-recognition software permits users to create dictation shorthand for frequently used words and phrases. Consequently, speech recognition can reduce, if not eliminate, manual transcription costs. Taking things a step further, speech-recognition software makes it possible for a mobile user to dictate into a digital recorder while away from the computer, then upload the recording for conversion to text at a later time.
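As a rough illustration of the speed difference, the Python sketch below compares the time needed to enter a document by dictation and by typing. Only the 160-words-per-minute dictation rate comes from the text; the 40-words-per-minute typing rate, the 2,000-word report length, and the function name are assumptions chosen for the example.

```python
def minutes_to_enter(word_count: int, words_per_minute: float) -> float:
    """Return the minutes needed to enter word_count words at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


DICTATION_WPM = 160   # rate cited in the text
TYPING_WPM = 40       # assumed typing rate, for illustration only
REPORT_WORDS = 2000   # hypothetical report length

dictated = minutes_to_enter(REPORT_WORDS, DICTATION_WPM)
typed = minutes_to_enter(REPORT_WORDS, TYPING_WPM)
print(f"Dictated: {dictated:.1f} min, typed: {typed:.1f} min")
```

The comparison ignores the time spent correcting recognition errors, so the real-world advantage of dictation will be smaller than the raw figures suggest.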

Speech-recognition systems can also make computers more accessible to those suffering from injuries or disabilities. For example, speech-recognition software can protect computer users from developing repetitive-strain injuries or carpal tunnel syndrome, and help people afflicted with those ailments to work. Similarly, speech-recognition software can help dyslexics overcome many of their difficulties. Finally, people suffering from vision problems such as presbyopia or macular degeneration may find that speech-recognition systems can make word-processing tasks easier. As a result, these systems can help employers comply with the Americans with Disabilities Act and, perhaps, reduce workers’ compensation claims.

Many users already have adequate computing power. Most speech-recognition programs recommend a Pentium III computer operating at 500 MHz or better. Although Pentium II–based computers can run these programs, most users will find the slower processing speed both noticeable and annoying. Conversely, Pentium 4–level computers make the process much more enjoyable.

In addition to adequate processing power, the computer also needs an input device. Although it is possible to use a freestanding microphone, a headset yields greater accuracy and reliability. The choice of headset will be dictated by the method of connecting the headset microphone to the computer. For computers lacking a universal serial bus (USB) port, the only choice is to plug the headset into the computer’s sound card; higher-quality sound cards yield greater transcription accuracy. For newer computers with USB support, USB microphones are a better choice because removing the sound card as the middle man makes the USB microphone more accurate.

Microsoft Office XP users will be pleased to discover that their software already possesses basic speech-recognition capabilities. (See office.microsoft.com/assistance/2002/articles/oFirstTimeSpeech.aspx.) This can be an inexpensive way to experiment with this technology, and many users find Office XP adequate for simple speech-recognition needs.

For people who want greater capabilities, Dragon NaturallySpeaking offers a wide range of programs for Windows users. Routinely garnering top reviews from the technology media, NaturallySpeaking also offers versions that recognize several languages in addition to English. (See www.scansoft.com/naturallyspeaking/.)

People using alternative operating systems can use IBM’s ViaVoice software, which comes in versions for Windows, Macintosh, and Linux. (See www-3.ibm.com/software/speech/.) For the adventurous Linux user, free speech-recognition software programs are currently under development. (See slashdot.org for the latest versions.)

The training process usually requires the user to dictate text appearing on the screen so the software can link the digitized voice input to the words in its vocabulary files. Obviously, the more time spent “training” the software this way, the greater the recognition accuracy. To expedite the process, some speech-recognition programs will compare vocabulary files with existing documents prepared by the user, then ask the user to dictate new words found in these documents.

Fortunately, the latest generation of software has reduced training requirements from several hours to perhaps 20 or 30 minutes.

Still, the software training process has not reached the point of “speaker independence,” where one can theoretically delegate the software training task to a subordinate. That is, if Heidi trains the program to recognize her speech patterns, the program will not yield accurate results for Trudy or Eric, each of whom must train the software before using it with acceptable results.

Training for the user typically requires a series of adjustments. Although software has progressed to the point where the user no longer must speak in a slow and unnatural manner, articulating each word greatly enhances accuracy. In addition to speaking distinctly, the user must learn program-specific voice commands for turning the microphone on and off, editing text, and correcting errors. Typically, users find the most frequently used commands to be intuitive, and quickly master them. For example, using Dragon NaturallySpeaking, a cut-and-paste operation requires three commands: “select that” to highlight the desired text; “move that forward three words” to move the selected text to the new location; and “insert after” to complete the operation.

People who find this training process tedious because they’re accustomed to using keyboard shortcuts and instinctively entering commands by mouse may choose to use the software for dictation but not for editing tasks. After experimenting, each user instinctively finds his optimal blend of microphone, keyboard, and mouse use. Many users enjoy significant productivity gains using the keyboard and mouse for some functions and the microphone for others.

First and most important, one must have reasonable expectations about the level of accuracy that can be achieved. If an accuracy level of 90% to 95% for dictation is acceptable, the current crop of speech-recognition systems will be satisfactory. For people with dyslexia or repetitive stress injuries, even lower levels of program accuracy will be a godsend.
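To put those accuracy figures in concrete terms, the short Python sketch below estimates how many misrecognized words a user might need to correct. The 1,000-word document length and the function name are illustrative assumptions; only the 90% and 95% accuracy levels come from the discussion above.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Estimate the number of misrecognized words in a dictated document.

    accuracy is the fraction of words recognized correctly (0.0 to 1.0).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(word_count * (1.0 - accuracy))


# Hypothetical 1,000-word memo at the accuracy levels cited in the text.
for accuracy in (0.90, 0.95):
    errors = expected_errors(1000, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors} words to correct")
```

Even at the upper end of that range, a user should expect to fix several dozen words in a document of this length, which is why realistic expectations matter.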

Second, one must have reasonable expectations regarding the application to be operated by voice commands. For word-processing tasks, the products available at the local computer or business supply store will probably suffice because the program, typically, does not have to shift frequently between accepting data and interpreting program commands. However, for users who want to use speech recognition with a spreadsheet or database, constantly switching between program commands and data entry requires adjustments. Using the keyboard and mouse for entering program commands and the microphone for data entry is a viable option for some users. On the other hand, users who want to use the microphone exclusively for spreadsheet or database applications will probably need to purchase more sophisticated (and more expensive) speech-recognition software. Such users will also need to spend more time training the software and themselves.

Third, all users must have reasonable expectations about their willingness to properly train the software and modify their work habits. Again, the more time spent training the software, the higher the level of accuracy achieved. With practice, most users intuitively adapt their dictation cadence to maximize accuracy and productivity.

Finally, the user must be willing to buy the right hardware and software for the job. The authors suggest investing in a high-quality microphone, one better than what comes with the speech-recognition software. If the computer has a high-quality sound card, then the less expensive headsets may be fine. However, if the computer has a free USB slot, spending a bit more for a USB headset is a good idea. The user should carefully follow the instructions for adjusting the microphone and make sure that the microphone is pointed at the corner of the mouth so that it doesn’t pick up the sound of the user breathing. Reducing background noise also will enhance the system’s performance and accuracy.

The CPA Journal is broadly recognized as an outstanding, technical-refereed publication aimed at public practitioners, management, educators, and other accounting professionals. It is edited by CPAs for CPAs. Our goal is to provide CPAs and other accounting professionals with the information and news to enable them to be successful accountants, managers, and executives in today's practice environments.