Checking In on Speech Recognition

A couple of weeks ago, I referenced the video Jon Udell made while testing Dragon NaturallySpeaking. I commented that I was still underwhelmed. I’m still underwhelmed because I tried an earlier version of Dragon in 1998 and wrote an article about it. Here’s an excerpt:

Simple, right? Naturally Speaking claims 95% accuracy. That’s darn swank, until you evaluate what that really means. 95% is one in twenty. Imagine if once every twenty words you had to stop, go back, and tediously re-type the word. You would hardly be a model of efficiency. Furthermore, the microphone (being “high-quality”) tends to record every paper shuffle and neck scratch as “the” or “and”. Consider this last paragraph, as read at normal speed, without corrections, into my computer:

Symbol, right? It keeps once hands free to do other tasks right? NaturallySpeaking claims at 95 percent accuracy. Great, I think, into you value eight what that really means. 95 percent is one in 20. Imagine if the ones every 20 words you had to stop, go back, and tedious the read type the word. He would hardly be a mottled efficiency. Furthermore, the microphone being high-quality tennis record every paper shuffle and crash as the war and.

 

Now, in truth, speech recognition has come a long way since 1998. However, based on Jon’s demo, I still don’t think it’s easy enough to use. In a follow-up entry, Jon references this chart by Richard Sprague. It shows the converging error rates of speech recognition software and humans. Roughly another seven years and we should be in good shape. I’ve pasted the complete text of the original article after the jump.

Say What?

I don’t know about you, but I talk to my computer. Well, mostly I swear at it. In fact, on more stressful days, one might mistake my home office for the cab of an eighteen-wheeler jackknifed on a crumbling overpass. We have more sedate conversations as well. At times I cajole (‘come on, you can download that page’), plead (‘please don’t crash, please don’t crash, please don’t crash’) and deliberate (‘that’s not the way you spell armour, silly’). Up until now, the computer wasn’t listening. It just sat there, smug and inert.

Being a man of the nineties, I’m constantly driven to lacerate myself on the cutting edge of technology. I trotted down to my local mega-store and bought Dragon Naturally Speaking, one of a new batch of voice-recognition programs aimed at the average consumer. The other popular choices include IBM’s (what’s Big Blue doing in this market?) ViaVoice and Lernout and Hauspie’s Voice Xpress. The latter was rated highest in a recent software comparison, but it hasn’t reached Victoria stores yet.

I took it home, opened it up and strapped on the ‘high-quality microphone’ that’s included with the CD. Feeling like a lonely air-traffic controller, I installed the software and began the arduous personalization process. This involved reading twenty-odd minutes of text into the microphone, so that Naturally Speaking could get accustomed to my monotonic reading voice. Notable among the readings were a chapter of 2001: A Space Odyssey and Arthur C. Clarke’s newest novel, 3001. How better to market your book than to make potential buyers read the first three chapters out loud?

Finally, after more Clarke and some (eep) Dave Barry, I could go to work. The program is strikingly simple. Using a bare-bones word-processing format, you speak and Naturally Speaking displays what it thinks it heard. When, inevitably, either party makes a mistake or the computer encounters a new word (there are more than the built-in 230,000 words out there, apparently), you have to teach Naturally Speaking how the word sounds and how to spell it.

Simple, right? Naturally Speaking claims 95% accuracy. That’s darn swank, until you evaluate what that really means. 95% is one in twenty. Imagine if once every twenty words you had to stop, go back, and tediously re-type the word. You would hardly be a model of efficiency. Furthermore, the microphone (being ‘high-quality’) tends to record every paper shuffle and neck scratch as ‘the’ or ‘and’. Consider this last paragraph, as read at normal speed, without corrections, into my computer:

Symbol, right? It keeps once hands free to do other tasks right? NaturallySpeaking claims at 95 percent accuracy. Great, I think, into you value eight what that really means. 95 percent is one in 20. Imagine if the ones every 20 words you had to stop, go back, and tedious the read type the word. He would hardly be a mottled efficiency. Furthermore, the microphone being high-quality tennis record every paper shuffle and crash as the war and.
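For anyone who wants to put a number on that mess, here is a minimal sketch of the usual yardstick, word error rate: the fewest word substitutions, insertions and deletions needed to turn what was heard back into what was said, divided by the number of words actually spoken. The function names (`words`, `word_error_rate`) are made up for illustration, the sample sentence is lifted from the two paragraphs above, and nothing here claims to be how Dragon arrived at its 95% figure.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:['’-][a-z0-9]+)*", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word was dropped
            insertion = current[j - 1] + 1   # extra word was heard
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One sentence from the paragraph as spoken, and as the software heard it.
spoken = "Imagine if once every twenty words you had to stop"
heard = "Imagine if the ones every 20 words you had to stop"
print(f"{word_error_rate(spoken, heard):.0%} of the spoken words needed fixing")
```

Note that this simple version counts ‘20’ for ‘twenty’ as an error, which is harsh; a fairer comparison would normalize numerals before scoring.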

Predictably, I soon eschewed speech-recognition for the good old keyboard. Better, I think, to risk carpal tunnel syndrome and actually accomplish tasks. Besides, my wife thinks it’s a little rude, like excluding her from a conversation in our own living room. Trust me on this one, and wait a couple of years. By then, speech recognition will be better and computers will be getting a little more case-sensitive. It’s difficult to program the common sense that says that ‘tennis’ probably shouldn’t appear where it does.

In this case, there’s no point in keeping up with the Joneses. Especially if all their computer is doing is mistyping their cuss words.

4 comments

  1. Perhaps I should add that I had a calculus exam today. “Mottled efficiency” perfectly describes my feelings towards my performance on the exam.

  2. I’m rather fond of “value eight” myself. Yeah, I’d give speech recognition an 8/10 right now. Cool, but keyboards are still 10/10. 🙂

    That said, for those who can’t use keyboards? It’s definitely improving, and that’s good.

  3. Yet the accomplishments of Dragon Naturally Speaking become downright stunning when you have a person who would otherwise be giving up computing entirely.

    I have a friend who has lost much of the use of her hands due to acute tendonitis. She has now been using Dragon Naturally Speaking for years as her primary input method and she’s beyond happy.
