
Sunday, July 18, 2010

Google voice gives great oral

Google Voice is something I can't live without -- it's one of the few significant advantages an Android phone has over an iPhone. Thanks to its ability to transcribe my every phone message, I no longer even listen to half of my messages, since many are doctors' offices confirming appointments or other folks just leaving me a phone number to call back. Numbers are very easy for Google Voice to transcribe correctly.

Not so for everything else, especially the stop-and-go banter that fills most phone messages. Listening to my messages while following the Google algorithm's best guess at their content makes me realize how few complete sentences callers actually speak.

Still, it's hard to believe you couldn't gin up an algorithm that could do better than Google Voice does on many of my calls. Of course, that would mean I'd lose Google Voice's unintentional comedy. Witness this surprisingly provocative message from the post office (my emphasis):
Voicemail from: Unknown Caller at 8:47 AM

Yes, Hi Good Morning. This is calling from the post office, the mailman will be. Yeah, Hello Baby. Ohh. Peggy shop on another 20 minutes. Thank you.
Or take this helpful update from my father, who is apparently on ecstasy:
I'm still kind of into the weather that has just come back this evening and and we might. She will want to get some of the like. Here at something like that. So, but I'm pretty well over the sky. I don't know. Spoke to sort of city. There's just a little bit tired fun it so I don't do anything right. Yeah. we should check in to the prices of storm door Of, snirtstorm security guards at Home Depot.
I had never even heard of the word "snirtstorm", which apparently is the combination of snow and a dirt storm, and seems to afflict the northern Midwest. (In fact, it is such an unusual word that it is a great ingredient for Googlewhacking -- I found one with "snirtstorm parabola".)

Seriously, though, I'm no expert on voice recognition algorithms, but I think it's a pretty safe bet that if your algorithm thinks a human has said "snirtstorm", it had better go with its second-best guess instead. Still, I do appreciate the creative capitalization of words in mid-sentence.
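
For the curious, here is a minimal sketch of what "go with the second guess" might look like, assuming the recognizer hands back a ranked list of candidate words with acoustic scores. Everything in it is hypothetical: the candidate list, the scores, the tiny frequency table, and the function name `pick_word` are made up for illustration, and none of it describes how Google Voice actually works.

```python
import math

# Hypothetical word frequencies (occurrences per million words).
# These numbers are invented for illustration only.
WORD_FREQ = {"snowstorm": 5.0, "storm": 40.0, "snirtstorm": 0.0}


def pick_word(candidates, word_freq=WORD_FREQ, floor=0.01, prior_weight=1.0):
    """Pick the candidate with the best combined acoustic + frequency score.

    candidates: list of (word, acoustic_log_score) pairs, higher is better.
    floor: frequency assumed for words that are missing or have zero count.
    prior_weight: how much the frequency prior counts against the audio.
    """
    def score(item):
        word, acoustic = item
        freq = max(word_freq.get(word, 0.0), floor)
        return acoustic + prior_weight * math.log(freq)

    return max(candidates, key=score)[0]


# The audio alone slightly prefers "snirtstorm" (assumed scores).
candidates = [("snirtstorm", -1.0), ("snowstorm", -1.5), ("storm", -5.0)]

print(pick_word(candidates))                    # frequency prior applied
print(pick_word(candidates, prior_weight=0.0))  # audio score only
```

With the prior switched off the sketch sticks with whatever the audio favors; with it on, a word almost nobody says has to sound much more convincing than its rivals to win.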


Blogger Alexis on Thu Jul 22, 06:13:00 PM:
That is hilarious! I love "snirtstorm."

But on the serious point, what Google is trying to do is the Holy Grail of speech recognition: large-vocabulary (free text) speaker-independent recognition. Accuracy for large-vocabulary speaker-dependent recognition is very good (Dragon NaturallySpeaking, which trains to your voice) and accuracy for small-vocabulary speaker-independent recognition is also quite good (telephone systems, command-and-control systems), but combining unlimited text with highly variable humans is a huge challenge. Google does remarkably well in this tough area. So it may be "hard to believe you couldn't gin up an algorithm that could do better than Google Voice does on many of my calls", but it's true. I don't think there's anything better out there for the task they're undertaking right now.
Blogger Ben on Fri Jul 23, 09:56:00 AM:
That makes sense -- considering how inaccurate Dragon can be for me even when I'm speaking slowly and clearly, I shouldn't expect Voice to be that accurate when both the speaker and the speaking style are more unpredictable and noisy.

But there are times when I feel sure I could improve Voice. Most obvious to me are the times when I can tell what the caller actually said just by looking at the transcription... at these times, I think the Google Voice people might do a better job if the word in question were left completely silent or beeped out, because then they would have to develop their contextual prediction and not rely so heavily on the audio.

E.g., the other day, my mom left me a message that began "Hey Dad, It's mom hate doing sweetheart." Leave aside that my name is Ben (known to Google since my account is linked to a Google Profile) and so an ambiguous opening word that sounds like both "Dad" and "Ben" should be resolved in favor of the latter. Is there any question that the word "hate" should be "how you" or "how are you"? That is a correction well within the reach of current research.
Anonymous Tove on Tue Aug 03, 05:25:00 PM:
So funny! It reminds me of the actual stoner, deliciously surreal lyrics at the end of the Beach Boys' "Heroes and Villains":

I've been in this town so long
So long to the city
I'm fit with the stuff
To ride in the rough
And sunny down snuff I'm alright
By the heroes and--
Heroes and villains.
Anonymous Anonymous on Sat Dec 11, 09:08:00 PM:
It could be hate leaving sweetheart.

Google does use contextual prediction. Most of the grayed-out uncertain phrases on my Google Voice messages are extrapolated from one word it got wrong.

I really don't care if a program can figure out how to write a grammatically correct English sentence "inspired" by my voice mail if half the words are wrong. It seems like that's what you would get if you went overboard on contextual prediction.

Anyway, my latest voicemail apparently said "I want to come, pull harder." so I'm going to go pull on something. Bye ;D