The CPA Journal
Feb 1995

Making automated speech recognition work. (The CPA & the Computer)

by Beshers, Cliff

    Abstract - Automated speech recognition (ASR) systems allow computers to respond to the human voice. With this technology, a microphone or a headset can be attached to the computer to enable the machine to record and play back sounds, analyze speech, execute commands, run programs and convert speech into text. ASR can help companies reduce their dependence on secretarial staff, speed up document creation and make operations more efficient. In addition, the technology can serve as an adaptive device for disabled employees, as mandated by the Americans with Disabilities Act, and as a means of reducing the risk of workplace injury, including carpal tunnel syndrome and other repetitive stress injuries. The ASR systems now on the market vary in their features and in their hardware and software requirements. Choosing among them usually comes down to three main features: vocabulary, type of speech and speaker dependency.

Acceptance of ASR will be determined by those who drive technology in their businesses and those who would benefit from the technology - businesspeople who previously would not use a computer because of a perceived lack of typing skill or desire; those for whom the adaptive technology compensates for the limited physical ability or motor skills necessary to use a traditional keyboard; and others who need to operate in hands-free environments. For the CPA involved in firm or client automation, or the businessperson looking for the right way to become more efficient, the time has come to consider ASR.

What Is ASR?

ASR lets you attach a microphone or headset to your computer, either directly or indirectly. More than just a computerized tape recorder that captures and plays back sounds, ASR analyzes your speech, executes commands, and runs programs or converts your speech into text in your favorite word processor or other program.

When Did ASR Become Available?

Science fiction has popularized computers that listen and respond to verbal commands and inquiries for more than 25 years. Companies that develop equipment for the disabled and the military have been working with ASR systems for many years. For the disabled, ASR lets those with limited or no use of their hands gain the advantages of using a computer. The military has experimented extensively with voice input for fighter pilots and others who need hands-free use of a computer.

For the business world, however, sound, in general, has just been a plaything. Although Apple Macintosh and Amiga computers had sound capabilities, MS-DOS compatible computers were limited to beeps and buzzes for 10 years.

The year 1991 may be considered a pivotal year for ASR. Microsoft Windows 3.1 began shipping with multimedia features. With the introduction of the 486 chip and availability of large amounts of memory, computers were finally powerful enough to process algorithms quickly enough to substitute for more expensive proprietary systems. Sound boards and microphones started making inroads into the office environment.

These powerful computers are the core of widespread ASR. To analyze spoken commands quickly, the reference patterns they are matched against must be held in RAM (random access memory - the computer's fast active memory). The amount of information captured in even a short stretch of speech is large: 1 megabyte (MB) per minute or more. Techniques that process this information faster or compress it for storage are in development. Compression becomes even more important when the information must be transferred between two locations on a computer network.
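The figure of 1 MB per minute or more can be checked with simple arithmetic. The sketch below assumes uncompressed, single-channel PCM audio; the sampling rates and sample sizes are common sound-card settings chosen for illustration, not figures from any system discussed in this article.

```python
def megabytes_per_minute(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM data rate in megabytes (2**20 bytes) per minute."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 60 / 2**20

# Illustrative settings (assumed, not taken from any product's specifications).
for rate_hz, bits in [(8000, 8), (11025, 16), (22050, 16)]:
    rate = megabytes_per_minute(rate_hz, bits)
    print(f"{rate_hz} Hz, {bits}-bit mono: {rate:.2f} MB per minute")
```

At 16 bits per sample, even the modest 11,025 Hz setting passes 1 MB per minute, which is why storage and network transfer both benefit from compression.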

Why Is ASR Important?

Experts think that today's ways of interacting with computers are archaic. Apple Computer has long decried the obscurity of the PC's command line (C:\>) and the way DOS developers force users to remember how different packages perform similar tasks with different keystrokes. Microsoft Windows, they say, is Apple's vindication. However, sitting down in front of a Mac or a Windows-driven PC is not intuitive in the way that a pencil and paper or a Magna-Doodle is. In 1993, Apple introduced its Newton technology, which uses handwritten input. However, handwriting speed pales in comparison with typing, which in turn pales in comparison with how quickly we can speak. Recognizing this, IBM has announced that it will soon be selling a line of portable computers with built-in ASR.

ASR makes computer use easier than current input devices. Computers are easier to integrate into our lives when we are not tied to keyboard or mouse. Some tasks cannot be done as easily with voice as with a mouse, pen, or digitizer, like continuous drawing. But for speed and accuracy, ASR produces a more satisfying result than current handwriting (pen-based) technology. As the power of computers continues to grow, ASR may become the only viable method of communicating man to machine.

ASR can also automate tasks that technologies like bar-coding cannot. Many business environments, like food processing or hazardous workplaces, benefit from computer input without physical contact and prelabeling. In addition, bar codes and other automatic identification methods fall short in some data-entry situations because of the need for human interaction and decision making.

Introducing ASR can make workers more productive, but only if it is easy to learn. If ASR has a steep learning curve, it won't replace dictation or direct input. The promise of futuristic, almost magical capabilities will not by itself make a more satisfied and productive worker. The user must see the benefit and reap the increased productivity quickly if this is to be the technology of the future.

Features

Not just a product of the future, ASR systems are available now. There are many options, offering different features with varying hardware and usage requirements. The three primary categories in ASR are vocabulary, type of speech, and speaker dependency. In addition, there are different ways speech is analyzed.

Vocabulary. Some systems allow for many words to be recognized at one time (large vocabulary); others can recognize fewer phrases at one time. Better small vocabulary systems offer multiple vocabularies the user can switch between, depending on context. Dictation requires a large vocabulary, while data entry may only need a small list to get the job done.
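Switching between small vocabularies can be pictured with a short sketch. Everything in it is hypothetical: the class name, the two word lists, and the convention that saying a vocabulary's name switches to it. Text strings stand in for the acoustic matching a real recognizer performs.

```python
# Hypothetical word lists; a real product would ship or train its own.
VOCABULARIES = {
    "menu": {"open file", "save file", "print", "data entry"},
    "data entry": {"zero", "one", "two", "three", "four", "five", "six",
                   "seven", "eight", "nine", "next field", "menu"},
}

class SmallVocabularyRecognizer:
    """Accepts only phrases in the active vocabulary (text stands in for audio)."""

    def __init__(self, vocabularies, start):
        self.vocabularies = vocabularies
        self.active = start

    def accept(self, phrase):
        phrase = phrase.lower().strip()
        if phrase not in self.vocabularies[self.active]:
            return None  # not in the active vocabulary, so it is rejected
        if phrase in self.vocabularies:
            self.active = phrase  # the phrase names another vocabulary: switch
        return phrase

recognizer = SmallVocabularyRecognizer(VOCABULARIES, start="menu")
print(recognizer.accept("seven"))       # rejected while the menu list is active
print(recognizer.accept("data entry"))  # accepted, and switches vocabularies
print(recognizer.accept("seven"))       # accepted now
```

Keeping each active list short is the point of the design: the fewer candidates the system must tell apart at one time, the less it has to guess.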

Continuous or discrete speech. Some systems require a distinct break between words or phrases; others allow continuous speech. Generally, large vocabulary systems are discrete speech, and small vocabulary systems can be continuous speech.

Speaker dependent or independent. Some systems must be trained for each speaker; others are speaker independent. Training involves a preliminary dictation session with the system, in which the computer learns how you say certain sounds, words, or phrases. A personal dictation system does not require speaker independence, whereas a public computer-information kiosk does. Low-end systems, even after speaker training, try hard to match a verbal command to something they know, often with ill effects.

Speech systems employ artificial intelligence methods or statistical algorithms to pull the meaning from the sound spoken. Some systems guess based on context, rules of grammar, or the type of work. They also offer methods of correcting mistakes. In dictation, for example, you speak to the computer, and it shows on screen what it thinks you said, with an opportunity to easily correct it or continue on. If you or the machine makes a mistake, you say "oops," and a menu appears with homonyms or similar sounding words for you to choose from. Because the system looks up words in its internal dictionary, a spelling checker is unnecessary (proofreading is still a good idea).

Most speech systems can be used to operate existing applications. A command can trigger a series of events, including adding paragraphs and text formatting. Using "macro substitution," ASR can act as a secretary, putting together a perfect letter, or as an interpreter, looking up words in a foreign dictionary. ASR can make dissimilar applications operate the same. A spoken "End program!" can run the steps to quit Lotus 1-2-3, WordPerfect, or Best Depreciation.
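Macro substitution of this kind amounts to a lookup table: one spoken command, a different keystroke sequence for each application. The sketch below is a minimal illustration. The table layout and function name are hypothetical, and the keystroke sequences are placeholders that have not been verified against any particular version of these programs.

```python
# Hypothetical macro table: spoken command -> application -> keystrokes.
# The keystroke sequences are illustrative placeholders, not verified
# against any particular version of these programs.
MACROS = {
    "end program": {
        "Lotus 1-2-3": ["/", "Q", "Y"],
        "WordPerfect": ["F7", "N", "Y"],
    },
}

def expand_command(spoken, application, macros=MACROS):
    """Return the keystrokes for a spoken command in the given application,
    or None when no macro is defined for that pair."""
    command = spoken.lower().strip(" !.?")
    return macros.get(command, {}).get(application)

print(expand_command("End program!", "Lotus 1-2-3"))
print(expand_command("End program!", "WordPerfect"))
```

The user learns one phrase, and the table absorbs the differences between programs.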

Tasks - How Can ASR Be Used?

The question you will soon face is not whether, but how, you will get involved in ASR. Where can it be useful?

Recent laws make ASR an important issue for many companies. The Americans with Disabilities Act mandates appropriate adaptive devices for your employees to do their jobs. ASR may be a way for a valuable employee to continue to contribute to their company.

Recent statistics show that 12% of the current data-entry work force will get repetitive stress injuries (RSI) like carpal tunnel syndrome. Voice will provide an alternate method of data entry that reduces the chance of injury. The costs of a speech system are dwarfed by those of disability and rehiring. Of course, future RSIs may include laryngitis and sore throats.

Can a computer act as an information clerk for your company? A system can be designed to anticipate "which way is it to ...?," "where is the ...?," or "where can I find ...?" questions. With current technology, this system can support speaker independence and continuous speech, making it very easy to use.

Could your staff work better if their hands were free? ASR systems can be designed to be portable; connect to host systems by radio; withstand factory environments; ...provide audible feedback to the employee during receiving, processing, and inspection; and order entry and inventory taking. ASR allows entry as far away as the voice can reach, and promotes mobility and miniaturization by eliminating keyboard input. Drivers, pilots, illiterate workers, those wearing large gloves in food preparation facilities, freezers, and clean rooms can all use computers where they were impractical before.

Usage Case Study

John Heveron is using ASR to dictate letters, enter electronic mail, and enter comments on billings in a hands-free manner. Here is how he went from dictation machines to ASR.

Heveron & Heveron, CPAs is a small accounting firm in Rochester, NY. As a firm, Heveron & Heveron has been a long-term user of automation. The firm's employees are linked together using a LANtastic (Artisoft, Inc.) network. They use WordPerfect Office (WordPerfect Corporation) for electronic mail, CD-ROM services for research and tax forms, and pcANYWHERE (Symantec Corporation) for remote support of their clients using computerized accounting systems.

Heveron did not use his computer much, preferring dictation and manual methods of communication. He found to his dismay that his staff were spending large portions of their time correcting dictation. In order to better allocate internal resources, he began to review the state of the ASR marketplace.

At the time, Apple and Compaq had just begun shipping microphones with their new PCs, and digital signal processing (DSP) chips were shipping with Apple's AV series computers, which permitted voice command of the Mac. A number of low-end products would perform small tasks, but not do the dictation he wanted.

The dictation systems seemed to be limited to less orthodox operating systems (OS/2) or aimed at the adaptive equipment marketplace. Finally, Heveron was able to find a few product offerings that were feasible, settling on the original ChatterBook developed by Natural Input Technologies, Inc. of Cortland, NY. One of the most exciting new products on the market is the new ChatterBook II speech input notebook computer system from NIT. The ChatterBook II incorporates both the Dragon and the IBM large vocabulary systems, all in an eight-pound Pentium-based system that can be used anywhere.

The price of the system dropped weekly while Heveron considered his purchase. IBM and other competitors announced and shipped numerous product offerings with major performance increases and improvements within a three-month span.

Heveron now has a laptop computer that functions as a personal secretary at home and in the office. ASR has Heveron so excited that his machine has become an almost constant companion.

Why Not Use ASR?

If ASR can reduce your reliance on secretarial staff, speed up the document creation process, and improve operations, are there any reasons to not consider ASR? For now, yes. ASR systems will not replace every other input technology completely.

Preexisting automation may be superior to ASR for the tasks involved. Supermarkets were among the first to implement bar coding. The grocery clerk won't benefit from shouting out the products you buy instead of quietly scanning the bar code. Bar codes also will not disappear because they can be used to track inventory and do other tasks without human intervention. In addition, information on paper is more easily input using scanners. Most aspects of drawings and blueprints are more accurately entered with digitizers, and quick sketches with a stylus. Finally, keyboards and mice are so entrenched that it will take some time to replace them, just as it took a decade for mice and window systems to replace text-only terminals.

In addition, working conditions may make ASR unsuitable, especially when you are working with confidential information in a public area. You can't take notes in a crowded meeting with ASR. You won't work on updating your company's finances on an airplane with ASR. Your banker won't enter your account number or the hotel clerk your room number in public using ASR.

If ASR is so good, how do you get started? By finding the right combination of hardware, software, and implementation assistance for your business. You may already have the ability to do limited ASR, especially if you own one of the AV computers from Apple Computer. Otherwise, you can inexpensively add some ASR capability with off-the-shelf software, or put together a complete system for more sophisticated uses.

Examples

Low-end products provide command and control.

Voice Blaster, from Covox Corporation (503-342-1271), includes headphones to plug into your computer's sound card and software to let you automate repetitive tasks in Windows and DOS applications.

The Microsoft Windows Sound System's "Voice Pilot" also obeys your voice commands, and its "ProofReader" reads information as it is entered or displayed on your computer's monitor; $79.95 list, $289.00 with sound card.

More sophisticated systems can be used for any of the basic uses of ASR:

* Command and control. The simplest use of ASR in many cases, it is the area most low-end systems cover. It is also potentially an extremely sophisticated area. You would want the command "lower the radiation in the nuclear pile by 38%" to provide very careful control.

* Data entry & retrieval. Data-entry systems require special design considerations, based on the user interface. Developers are rapidly building ASR add-ins for FoxPro and other development languages.

* Report generation. Getting information from a computer in an intuitive manner may require continuous speech, built-in graphics tools, and a natural language interface. Natural language lets you ask questions of the system as if you were asking a person, and spares the user from having to learn the database program's technical requirements.

* Document creation. This combines command and control with large vocabulary dictation. Users either enter text into the document or speak commands that allow them to, for example, format text, navigate through the document, and save files.

* Graphics. While it may seem strange to "talk" a picture, telling the computer to "put ball 3 on top of square 5" can be quicker than grabbing the objects with a mouse and moving them. Charts and graphs may be quite natural to create with ASR: "Plot yesterday's gold prices from 1 to 3 p.m."

* Telephone. AT&T is out to replace its operators for collect calls and to use ASR for transcription and language interpretation services.

* Speaker recognition. Being able to identify a person from their voice is useful for security and for distinguishing between multiple speakers. The computerized office assistant of the future will not only answer from any location, it will know to whom it is speaking. This technology is not widely available.

DragonDictate (Dragon Systems, Inc.) is a large vocabulary dictation system that supports a dictation rate of between 35 and 50 words per minute. It is speaker dependent, requires discrete speech, and currently has an active vocabulary of up to 60,000 words. DragonDictate is hands-free: You can operate your computer entirely by voice. It has an extensive macro facility.

DragonDictate runs under DOS, or from a Windows 3.1 or OS/2 DOS prompt. It requires a 486DX-66 or faster processor, with 16 MB of RAM and a half-size speech adapter card. DragonDictate is available with several vocabulary sizes, beginning with the Start Edition with 5,000 words for $695, DragonDictate 30K with 30,000 words for $995, and the recently announced DragonDictate 60K with 60,000 words for $1,995. A Windows version was released in late 1994.

The IBM Personal Dictation System (IPDS) for OS/2 and the Windows version called VoiceType for Windows are large vocabulary dictation systems. Like DragonDictate, they are speaker dependent and require discrete speech. However, they support faster dictation, between 70 and 100 words per minute. These systems use statistics to try to choose correctly between homophones, words that sound the same. They record your dictation session for later review. They have an extensive macro facility. They also generate macros automatically for native OS/2 and Windows applications, so users can directly manipulate menus and dialog boxes.

Currently, they require OS/2 or Windows running on a 486DX-66 or faster processor, with 16 MB of RAM and a half-size speech adapter card or a PCMCIA card. IPDS and VoiceType for Windows have a list price of $998 to $1,095, depending on the hardware interface. Specialized language modules are available for journalism, emergency medicine, and radiology at a cost of $499 each.
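The idea of using statistics to choose between homophones can be shown in miniature. The sketch below is not IBM's method, which this article does not describe in detail. It assumes one simple approach: counting how often each candidate spelling follows the previous word. The word groups and counts are invented for illustration.

```python
# Words that sound alike, grouped together (a tiny hypothetical sample).
HOMOPHONE_GROUPS = [["to", "too", "two"], ["their", "there"]]
GROUP_OF = {word: group for group in HOMOPHONE_GROUPS for word in group}

# Invented counts of how often each word pair appeared in some training text.
PAIR_COUNTS = {
    ("going", "to"): 50,
    ("the", "two"): 12,
    ("in", "their"): 30,
    ("over", "there"): 20,
}

def choose_spelling(previous_word, heard_word):
    """Pick the homophone seen most often after previous_word.

    Words with no known homophones are returned unchanged.
    """
    candidates = GROUP_OF.get(heard_word, [heard_word])
    return max(candidates,
               key=lambda word: PAIR_COUNTS.get((previous_word, word), 0))

print(choose_spelling("going", "two"))   # picks the spelling seen after "going"
print(choose_spelling("over", "their"))  # picks the spelling seen after "over"
```

Commercial systems draw on far more context than a single preceding word, but the principle of letting usage counts break the tie is the same.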

Common Questions

Question: How does ASR function in a noisy environment?

Answer: There are no problems if the ASR system is trained in the noisy environment or a good noise-reducing microphone is used.

Question: How does the user function in a noisy environment?

Answer: You'll need to isolate the users or get the workplace accustomed to working in the midst of noise.

Question: How does the computer know when you are talking to it or talking to someone else?

Answer: You can use hardware switches, headphones combining speech input and telephone capabilities, verbal commands to "wake up" and "go to sleep," or require that the user address the computer "by name" before each computer command.
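The "wake up" and "go to sleep" approach is a two-state switch: while asleep, the system ignores everything except the wake-up phrase. The sketch below is a minimal illustration with a hypothetical class name, and text strings stand in for recognized speech.

```python
class VoiceGate:
    """Passes phrases through as commands only while the system is awake."""

    def __init__(self):
        self.awake = False  # start asleep so stray conversation is ignored

    def hear(self, phrase):
        text = phrase.lower().strip()
        if text == "wake up":
            self.awake = True
            return None
        if text == "go to sleep":
            self.awake = False
            return None
        # Ordinary speech becomes a command only in the awake state.
        return text if self.awake else None

gate = VoiceGate()
for phrase in ["print the report", "wake up", "print the report",
               "go to sleep", "see you at lunch"]:
    print(repr(phrase), "->", gate.hear(phrase))
```

Requiring the user to address the computer by name before each command works the same way, except that the gate closes again after every command.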

Question: Do my 100-word-per-minute secretaries need ASR?

Answer: Maybe not. However, many professionals are dictating documents and passing them to secretaries for proofreading and correction. And, as mentioned before, repetitive stress injuries (RSI) are increasingly common among the office work force. ASR may allow an experienced secretary to stay on despite RSI.

Dramatic Changes Are Coming

Current ASR technology is quite powerful and affordable. Many applications currently exist where the payback on investment is 6-12 months. Macro capabilities make it possible to work with existing software, which reduces the learning curve tremendously.

Most of the high-end computers sold today can be speech enabled. Within two years, large-vocabulary speech recognition will be available on most new machines. As a result, ASR will greatly change the computer workplace, opening up new areas of use and improving existing ones.



The CPA Journal is broadly recognized as an outstanding, technical-refereed publication aimed at public practitioners, management, educators, and other accounting professionals. It is edited by CPAs for CPAs. Our goal is to provide CPAs and other accounting professionals with the information and news to enable them to be successful accountants, managers, and executives in today's practice environments.

©2009 The New York State Society of CPAs.