Brno University of Technology detecting OOVs in DIRAC project

Honza Cernocky

cernocky@fit.vutbr.cz

SLTC Newsletter, July 2009

In machine recognition, low-probability items are unlikely to be recognized. For example, in automatic speech recognition (ASR), the linguistic message in speech data is coded in a sequence of speech sounds (phonemes). Substrings of phonemes represent words, and sequences of words form phrases. A typical ASR system attempts to find the linguistic message in the phrase. This process relies heavily on prior knowledge in a text-derived language model and a pronunciation lexicon. Unexpected lexical items (words) in the phrase are typically replaced by acoustically acceptable in-vocabulary items. Out-of-vocabulary words (OOVs) are truly the nightmare of large vocabulary speech recognition, as each of them typically causes not one but multiple recognition errors (due to language model mismatch).

Researchers in the Speech@FIT group at Brno University of Technology (BUT, Czech Republic) are focusing their research effort on the detection of OOVs by combining the posterior probabilities generated by strongly and weakly constrained recognizers [Hermansky2007a]. Time intervals where these two recognizers do not agree are candidates to contain OOVs or other events the recognizer did not see in training (e.g., unusual pronunciations, words with short embedded pauses, and unseen noise). The strongly constrained posteriors are generated by a full large vocabulary continuous speech recognizer (LVCSR) with a language model, while the weakly constrained ones are produced by a much simpler phone recognizer. A neural net is trained to produce probabilities of misrecognitions due to OOV content.
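The idea of flagging intervals where the two recognizers disagree can be illustrated with a minimal sketch. It is not the BUT system: in place of the trained neural net it uses a plain per-frame Kullback-Leibler divergence between the two posterior streams, and the function names, the threshold, the minimum interval length, and the toy three-phone posteriors are all assumptions made for illustration.

```python
import math

def frame_kl(p, q, eps=1e-10):
    """KL divergence D(p || q) between two phone posterior vectors.

    Used here as a simple stand-in for the trained neural net
    described in the article (an assumption of this sketch).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def disagreement_intervals(strong, weak, threshold=1.0, min_frames=2):
    """Return (start, end) frame intervals, end exclusive, where the
    strongly and weakly constrained posteriors diverge by more than
    `threshold` for at least `min_frames` consecutive frames.

    `threshold` and `min_frames` are hypothetical tuning values.
    """
    n = min(len(strong), len(weak))
    intervals = []
    start = None
    for t in range(n):
        if frame_kl(strong[t], weak[t]) > threshold:
            if start is None:
                start = t          # a disagreement run begins
        else:
            if start is not None and t - start >= min_frames:
                intervals.append((start, t))
            start = None           # the run (if any) has ended
    if start is not None and n - start >= min_frames:
        intervals.append((start, n))
    return intervals

# Toy data: 6 frames, 3 phone classes (invented for illustration).
# "strong" mimics LVCSR-derived posteriors, "weak" a phone recognizer.
strong = [
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
]
weak = [
    [0.80, 0.10, 0.10],
    [0.85, 0.10, 0.05],
    [0.05, 0.05, 0.90],
    [0.10, 0.05, 0.85],
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
]

print(disagreement_intervals(strong, weak))
```

On this toy input only frames 2 and 3 are flagged, since that is where the two streams put their probability mass on different phones. A real detector would replace the fixed threshold with a classifier trained on labeled misrecognitions, as the article describes.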

Initial work was done during the Johns Hopkins University summer workshop in 2007 [Hermansky2007b, Burget2008] on clean read speech from the Wall Street Journal, but with a very limited recognition dictionary to introduce OOVs. BUT recently moved to a more realistic task and verified the proposed approach on the CallHome database of conversational telephone speech [Kombrink2009]. BUT also developed a visualization tool that makes it possible to "see" the posteriors and to explain some outputs of the system that were previously considered detection errors (see figure below).


Visualization of posteriors from strongly and weakly constrained systems, and OOV detection. This example shows the OOV word "Tripoli" recognized as "triple like".

The research is done under the umbrella of DIRAC (Detection and Identification of Rare Audio-visual Cues), an integrated project sponsored by the EC under the 6th Framework Programme. BUT efforts in DIRAC are led by Prof. Hynek Hermansky, who recently moved from the IDIAP Research Institute in Switzerland to Johns Hopkins University and is also affiliated with BUT.

Further reading

  • [Hermansky2007a] H. Hermansky: Dealing with unexpected words, The Neuromorphic Engineer, Volume 3, Issue 2, March 2007.
  • [Hermansky2007b] H. Hermansky, L. Burget, P. Schwarz, P. Matejka, M. Hannemann, A. Rastrow, C. White, S. Khudanpur, and J. Cernocky: Recovery from Model Inconsistency in Multilingual Speech Recognition, JHU workgroup final report, Baltimore, US, 2007.
  • [Burget2008] L. Burget, P. Schwarz, P. Matejka, M. Hannemann, A. Rastrow, C. White, S. Khudanpur, H. Hermansky, and J. Cernocky: Combination of strongly and weakly constrained recognizers for reliable detection of OOVs, In: Proc. ICASSP 2008, Las Vegas, US.
  • [Kombrink2009] S. Kombrink, L. Burget, P. Matejka, M. Karafiat and H. Hermansky: Posterior-based Out of Vocabulary Word Detection in Telephone Speech, accepted to Interspeech 2009, Brighton, UK.
  • BUT Speech@FIT group: http://speech.fit.vutbr.cz/
  • DIRAC project: http://www.diracproject.org/
  • JHU’07 workgroup "Recovery from Model Inconsistency in Multilingual Speech Recognition": http://www.clsp.jhu.edu/ws2007/groups/rmimsr/
