SpeechRecognition

Strong SpeechRecognition is likely by 2015. (See the notes below for why.)

Applications of Speech Recognition

People imagine that Speech Recognition will be used to create Word documents. This is a limited perspective.

More likely, speech recognition will be used...

 * ...to keep real-time transcripts during conversations.
 * ...to annotate and to comment.
 * ...to instruct and answer computers in a hands-free environment. (while driving; see DrivingCars, though)
 * ...to send instant messages.
 * ...eventually, for most computer interaction; the LinguisticUserInterface

Real-Time Transcripts: Discovering Conversation

Imagine that you are studying biology, in particular [WWW] Mitochondria. You study with a co-learner, by voice, over the Internet. Because the subject is educational, you let the conversation be public. (See [CommunityWiki]OverHear for details.)

A computer program is transcribing your conversation in real-time, and another program is indexing your conversation in real-time.

A few states away, someone else is also studying biology. She performs a search and discovers the conversation you are having. She may leave a note at an information node representing your conversation (see [CommunityWiki]ProjectSpaceNetwork and [CommunityWiki]WikiAsCollage) or, if you are talking at that particular moment, opt to listen in. A small icon lets you know that someone is listening in on the conversation. You may invite her in, or she may knock to request to come in.

This is made possible by SpeechRecognition, but it is not a scenario people think of when they think of speech recognition. Most people imagine that they will be writing Word documents with speech recognition.

Instant Messages

Speaking is intuitive and fast.

Reading is fast, easier on memory, and easier to index.

"Index:" When you read something, it only takes a moment to go back to the beginning. You don't have to say: "Go back to the beginning," or "Wait; what did you just say?" Rather, your eye darts back to the beginning of the sentence. We call this "indexing."

But listening is slow, hard to index, and forgettable.

And writing is slow and requires some intention.

With SpeechRecognition technology, we will not have to choose between one or the other. We will have the best of both.

That is, you will speak to tell someone something, and they will read to understand it. Your microphone will be connected to your instant messenger. When you say, "Jim, how are you doing?", the computer will recognize that you mean to talk with Jim, and will send the text "how are you doing?" to him. (Your identity will likely be recognized by VoiceRecognition.)

Jim may be gone at the moment. But when he returns in 10 minutes, he may say, "Dave, I'm doing fine. Work was a bit wearisome, but otherwise, I'm fine."

(TODO: Our CommunicationMores will be different. We will likely communicate suits or roles, and have different ways of collecting messages, in a more organized fashion.)

You are both speaking, but you are both reading each other's text.
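The addressing step in this exchange, where a leading name picks the recipient and the rest becomes the message, can be sketched roughly as follows. This is only an illustration operating on text that a recognizer has already produced; the contact list, the function name `route_utterance`, and the "Name, message" convention are assumptions made for the example, not part of any real messenger.

```python
import re
from typing import FrozenSet, Optional, Tuple

# Hypothetical contact list; a real messenger would supply its own.
CONTACTS: FrozenSet[str] = frozenset({"jim", "dave"})


def route_utterance(
    transcript: str, contacts: FrozenSet[str] = CONTACTS
) -> Tuple[Optional[str], str]:
    """Split an already-transcribed utterance into (recipient, message).

    Assumes the speaker addresses someone as "Name, message". If the
    leading word is not a known contact, no recipient is returned and
    the whole utterance is kept as the message.
    """
    match = re.match(r"\s*([A-Za-z]+)\s*,\s*(.+)", transcript, flags=re.DOTALL)
    if match and match.group(1).lower() in contacts:
        return match.group(1), match.group(2).strip()
    return None, transcript.strip()


if __name__ == "__main__":
    print(route_utterance("Jim, how are you doing?"))
    print(route_utterance("Well, that was odd."))
```

Checking the leading word against known contacts keeps ordinary sentences that merely start with "Well," or "Anyway," from being sent to nobody in particular.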

Comments

Similarly, when you attach a comment to Slashdot, you will just hold down the spacebar, and speak your mind. Comment attached. The same goes for attaching comments to documents, songs that are playing, or anything you care to comment on.

Real-Time Transcripts: Everywhere and (almost) Always

Recording conversations will be the norm. There will be few arguments about who said or didn't say something at work; it'll all be automatically recorded, like having a court reporter in every room. There will be a searchable, time-indexed, tagged and annotated transcript of everything that is spoken. Everything.

When people have a hard time understanding a concept, because it's being poorly presented, we'll have all the evidence we need. "See, when you explain things this way, it usually takes 3 times longer to explain it than when you explain it this other way."
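As a rough illustration of what "searchable, time-indexed" could mean in practice, here is a minimal sketch of an inverted index over transcribed utterances. It assumes the recognizer already hands over text with a start time and a speaker; the names `Utterance` and `TranscriptIndex` are invented for this example, and tagging and annotation are left out.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

WORD = re.compile(r"[a-z0-9']+")


@dataclass
class Utterance:
    start_seconds: float  # offset from the start of the recording
    speaker: str
    text: str


class TranscriptIndex:
    """Maps each word to the utterances that contain it."""

    def __init__(self) -> None:
        self.utterances: List[Utterance] = []
        self._postings: Dict[str, List[int]] = defaultdict(list)

    def add(self, utterance: Utterance) -> None:
        position = len(self.utterances)
        self.utterances.append(utterance)
        # Index each distinct word once per utterance.
        for word in set(WORD.findall(utterance.text.lower())):
            self._postings[word].append(position)

    def search(self, query: str) -> List[Utterance]:
        """Return utterances containing every query word, in insertion order."""
        words = WORD.findall(query.lower())
        if not words:
            return []
        hits = set(self._postings.get(words[0], []))
        for word in words[1:]:
            hits &= set(self._postings.get(word, []))
        return [self.utterances[i] for i in sorted(hits)]


if __name__ == "__main__":
    index = TranscriptIndex()
    index.add(Utterance(0.0, "Dave", "Did anyone say the release was moved?"))
    index.add(Utterance(4.2, "Jim", "I said the release is still on Friday."))
    for hit in index.search("release friday"):
        print(hit.start_seconds, hit.speaker, hit.text)
```

Because each hit carries its start time and speaker, a search result can point straight back to the moment in the recording where something was said.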

All of this is unlocked when you have SpeechRecognition. SpeechRecognition is no small thing. Do not be one of those people who envision themselves writing Word documents with speech recognition.

Interesting Interactions

2005

Divide speech recognition into two types:

Selective speech recognition is very good, and presently rolling out into corporate phone trees. It's hardly ubiquitous, but it's not rare either. Merely: uncommon, and expanding. [WWW] Philip Greenspun provides instructions online on how a developer can make a voice program today that works with the existing plain-old telephone system.

General speech recognition is getting better, but is still bad. You still have to speak a little slowly and provide some corrections. But the computer is pretty good at recognizing context and at letting you correct it. [WWW] Jon Udell has a Flash video demonstrating what Speech Recognition was capable of as of November 2004. [WWW] (Associated article.)
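One way to see why the selective case is easier: if "selective" is taken to mean choosing among a small, fixed set of expected phrases (an assumption about the term, based on the phone-tree example), then even a sloppy transcription only has to land closest to one menu entry. The sketch below uses Python's standard `difflib` to do that matching on text; the menu entries and the name `pick_option` are made up for illustration, and real systems match on audio rather than on spelled-out words.

```python
import difflib
from typing import Dict, Optional

# Hypothetical phone-tree menu: expected phrase -> action code.
MENU: Dict[str, str] = {
    "flight times": "FLIGHT_TIMES",
    "refill my medication": "REFILL",
    "speak to an agent": "AGENT",
}


def pick_option(
    heard: str, menu: Dict[str, str] = MENU, cutoff: float = 0.6
) -> Optional[str]:
    """Return the action whose phrase is closest to what was heard.

    Returns None when nothing in the menu is similar enough, which is
    where a phone tree would ask the caller to repeat.
    """
    matches = difflib.get_close_matches(
        heard.lower().strip(), list(menu), n=1, cutoff=cutoff
    )
    return menu[matches[0]] if matches else None


if __name__ == "__main__":
    print(pick_option("refil my medicasion"))
    print(pick_option("zzzz"))
```

General recognition has no such menu to fall back on, which is one plausible reason it still needs slower speech and corrections.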

2015

In the IntelDeveloperForum2005Keynote, JustinRattner was blasé about speech-to-text. He said that by 2015, computers will have "strong capabilities" in speech-to-text. Near the end of the keynote, he said (TODO: I can only say: "Something to the effect of") "Absolutely going to happen. No questions." (TODO: relisten, or find transcript.) He seemed bored talking about it. He was far more interested in VideoAnalysis, where your computer knows it's you because it sees you through its camera, and 3DGraphics.

(older stuff)

On MarshallBrain's [WWW] Robots In 2015 page, he writes:

I recommend actually doing this.

  1. call (800) 555-1212, ask for "American Airlines"

  2. call the number you were given, and go to the English flight times listing: (1-1-1 through the phone tree)

  3. ask for info about any flight from your local airport, to some other airport

I was, personally, surprised by the quality of the voice recognition:

I was a little disappointed that I had to go through the initial phone tree (1-1-1), but my girlfriend tells me that there are systems in place that don't. She works in the health industry, and there are some places you call, and they ask: "What do you want to do?" "I want to refill my medication." (And then it goes from there.)

MarshallBrain's prediction doesn't seem far-fetched.

There are other things to note as well:

In LionsTimelineFrom2004, I've listed "Mature LinguisticUserInterface" at around 2018-2022. By that, I mean: "Fluid communication with the computer," where you don't have to think about it.

-- LionKimbro 2005-01-17 01:30:05

-- LionKimbro 2005-02-03 06:19:23

In the IntelDeveloperForum2005Keynote, JustinRattner sounded almost blasé about speech-to-text. He said that by 2015, we will have "strong capabilities" in speech-to-text. I believe he uttered something like, "Absolutely going to happen. No questions," towards the end. (I'd have to relisten to it.) He seemed to be positively bored talking about it. He was far more interested in talking about VideoAnalysis, where your computer knows it's you because it sees you through its camera, and 3DGraphics.

So, when I talk with people, I say, "Speech-to-text. 5-10 years. JustinRattner says so."

(That is, 2010-2015.)

Personally, I imagine he's already seen it.

Again, call American Airlines, or [WWW] observe consumer-grade speech recognition. It would not surprise me if JustinRattner, "[WWW] named Scientist of the Year by R&D Magazine for his leadership in parallel and distributed computer architecture," has seen better, and knows where we're headed.

-- LionKimbro 2005-03-27 07:04:59

I just saw something on how to write your own speech-to-text apps today.

-- LionKimbro 2005-04-15 05:37:34

last edited 2005-08-12 20:06:32 by LionKimbro