Discovery Channel

Lip Reader Combines Audio, Video

Tracy Staedter, Discovery News

March 13, 2007 — A lip-reading computer that could help solve crimes and assist consumers is the goal of a new project at the University of East Anglia in Norwich, England.

When coupled with a speech recognition system, the technology could not only decipher the words of criminals captured on video but also improve voice-activated computers in cars or mobile phones.

"There is interest in using lip-reading for all sorts of human computer interaction, particularly in noisy environments," said Richard Harvey, a senior lecturer in the University's School of Computing Sciences.

"Noisy" can mean that an audio signal is muddled by other sounds, for example from a car radio or a crowd. But it can also mean that a visual signal is fuzzy or unclear.

People overcome such communication obstacles by pulling information from various places — lip movement, facial gestures, body language — to piece together what's being said. But computers designed for speech recognition typically focus on speech alone.

In previous experiments, Harvey and his team found that recognition accuracy improved significantly when a noisy audio signal was augmented with visual information.

For example, some speech sounds that are easily confused in the audio domain — "b" and "v," or "m" and "n" — are distinct in the visual domain. Conversely, some spoken words look identical in the visual domain, for example, "bat" and "pat."
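The way two imperfect signals can cover for each other can be illustrated with a small sketch. The Python below is not the researchers' method; it assumes a simple weighted combination of per-word scores (often called late fusion), and the function name `fuse_scores`, the weight, and all the numbers are made up for illustration. The audio scores cannot separate "bat" from "vat," the visual scores cannot separate "bat" from "pat," yet the combined score favors "bat."

```python
import math

def fuse_scores(audio_probs, visual_probs, audio_weight=0.5):
    """Combine two per-word probability tables by weighted log-linear fusion.

    audio_weight is a hypothetical reliability knob: 1.0 trusts only the
    audio scores, 0.0 trusts only the visual scores.
    """
    fused = {}
    for word in audio_probs:
        log_score = (audio_weight * math.log(audio_probs[word])
                     + (1.0 - audio_weight) * math.log(visual_probs[word]))
        fused[word] = math.exp(log_score)
    # Renormalize so the fused scores sum to 1.
    total = sum(fused.values())
    return {word: score / total for word, score in fused.items()}

# Illustrative, made-up scores (not measured data).
# Audio confuses "b" and "v"; video confuses "b" and "p".
audio_probs = {"bat": 0.45, "vat": 0.45, "pat": 0.10}
visual_probs = {"bat": 0.45, "vat": 0.10, "pat": 0.45}

fused = fuse_scores(audio_probs, visual_probs)
best_word = max(fused, key=fused.get)
print(best_word, fused)
```

In a noisier audio setting, a system built this way could lower `audio_weight` so the visual scores count for more; how to choose that weight is part of what a real system would have to work out.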

The researchers will be working over the next three years to find the best way to combine audio with video.

First they will work with researchers at Surrey University in Guildford to figure out how and when a visual signal goes bad.

Next, they will work on extracting information from the face, particularly the lips. One approach models the shape and color of the lips as they move; another measures the size of the mouth opening.
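As a rough illustration of the second approach, the sketch below computes the width, height, and height-to-width ratio of a mouth opening from four points on the lips. It assumes the lip corners and the top and bottom of the opening have already been located in the image as (x, y) pixel coordinates; the function name `mouth_opening` and the sample coordinates are hypothetical and do not come from the project.

```python
import math

def mouth_opening(left, right, top, bottom):
    """Measure a mouth opening from four (x, y) lip points, in pixels.

    left/right are the lip corners; top/bottom are the midpoints of the
    upper and lower edges of the opening. All four are assumed inputs.
    """
    width = math.dist(left, right)
    height = math.dist(top, bottom)
    if width == 0:
        raise ValueError("lip corners coincide; cannot compute a ratio")
    return {"width": width, "height": height, "aspect_ratio": height / width}

# Made-up coordinates for a single video frame.
frame_features = mouth_opening(left=(0, 0), right=(40, 0),
                               top=(20, -5), bottom=(20, 5))
print(frame_features)
```

Tracking these numbers from frame to frame would give a simple visual signal to pair with the audio, though a practical system would need far more robust measurements.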

Lastly, they will find the best way to match the visual cues from the lips to the words spoken, so that "bat" is indeed recognized as "bat," and not "pat."

"The fact is that it works and gives good results," said Peter Robinson, a professor of computer technology at the University of Cambridge. "There are a number of clever techniques to get from the image...to what goes into the processing and then combine that with the results from the speech analysis."

In three years, said Harvey, the team could have a camera able to recognize simple words and phrases.


Source: Discovery News