Filip Jurcicek

SLTC Newsletter, October 2009

The 10th Annual Conference of the International Speech Communication Association (INTERSPEECH 2009), held in Brighton, UK, on 6-10 September 2009, provided researchers with a great opportunity to share recent advances in speech science and technology.

This year, the Interspeech conference organizers received more than 1300 submissions, of which 762 were accepted. The conference had almost 1200 attendees, who could choose from 38 oral sessions, 39 poster sessions, and 10 special sessions. In addition, there were four keynote talks and eight tutorials.

As the theme of the conference was Speech and Intelligence, it offered several events in line with the theme. One of these was the Loebner Prize contest, which was previously advertised in the January issue of the SLTC Newsletter. This year's winner was David Levy with his program "Do-Much-More". Levy first became famous in 1968 for his bet that no computer would win a chess match against him within the next 10 years; in 1978, he won the bet. He had also previously won the Loebner Prize, in 1997.

The conference offered eight tutorials to the attendees. The topics included speech prosody and synthesis, high-dimensional data processing, speech, language and dialect recognition, and statistical dialogue management. Two of the tutorials ("Emerging technologies for silent speech interfaces" and "Emotion Recognition in the Next Generation: an Overview and Recent Development") were also accompanied by a special session in the conference program.

The tutorial on "Emerging technologies for silent speech interfaces" was organized by Tanja Schultz (Karlsruhe University) and Bruce Denby (Universite Pierre et Marie Curie - Paris VI). As the presenters explained, a silent speech interface enables speech communication without any sound. Such interfaces have the potential to be more robust to background noise, because the speech signal is digitized before it is realized as sound. A silent interface also offers extra privacy: since no sound has to be produced, the speech cannot be overheard by bystanders. Later, in a special session on "Silent speech interfaces", researchers presented state-of-the-art techniques in the field.

The tutorial on "Emotion Recognition in the Next Generation: an Overview and Recent Development" by Bjoern Schuller (Munich University of Technology) presented audio-based recognition of emotions such as anger, emphatic, neutral, etc. It was followed by the special session "INTERSPEECH 2009 Emotion Challenge", where researchers presented results obtained on a standard data set made available specifically for this challenge. The corpus consisted of recorded interactions between children and Sony's pet robot AIBO; the children expressed different emotions depending on whether AIBO responded to their commands correctly or not. To explore different areas of emotion recognition, the challenge was divided into three subtasks: the Open Performance Sub-Challenge, the Classifier Sub-Challenge, and the Feature Sub-Challenge.

Chi-Chun Lee, a member of the winning team [1] for the second sub-challenge, says: "We competed in Classifier Sub-Challenge, where we were restricted to use features provided by the organizers to come up with the classifier that can accurately classify each utterance into one of the five emotion categories. The result we obtained was 41.57% unweighted average accuracy, and it was 3.37% absolute (8.82% relative) improvement over the best baseline performance provided by the challenge organizer." Chi-Chun Lee also explains how their work differs: "I think the "ONE" thing that makes this work win this sub-challenge is that, while we were facing with 5-emotion class problems, we have decided to split it hierarchically using multi-level binary classifiers - where there are more sophisticated ways of optimizing the classifiers. Also, instead of conventional approach of trying to classify emotion vs. neutral as the first step, we deliberately made sure that the first level classification should be done on the "easiest" classification task while keeping the highly ambiguous task till the last level."
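The hierarchical strategy Lee describes can be pictured as a small decision tree whose internal nodes are binary classifiers and whose leaves are emotion labels, with the easiest binary split applied first and the most ambiguous one deferred to the last level. The sketch below is purely illustrative: the feature names, thresholds, and class groupings are invented for this example, whereas the winning system used trained statistical classifiers, not hand-written rules.

```python
# Hypothetical sketch of multi-level binary classification for a
# five-class emotion task. Thresholds and groupings are invented;
# this is NOT the winning team's actual system.

def leaf(label):
    """A leaf node carrying a final emotion label."""
    return {"leaf": True, "label": label}

def node(clf, if_true, if_false):
    """An internal node holding a binary decision function."""
    return {"leaf": False, "clf": clf, "true": if_true, "false": if_false}

# Level 1 makes the "easiest" split first (say, high vs low arousal);
# the hardest, most ambiguous split is kept for the last level,
# mirroring the strategy quoted above.
tree = node(
    lambda f: f["energy"] > 0.7,                  # easy split first
    node(lambda f: f["pitch_var"] > 0.5,          # within high arousal
         leaf("anger"), leaf("emphatic")),
    node(lambda f: f["valence"] > 0.5,            # within low arousal
         leaf("positive"),
         node(lambda f: f["energy"] > 0.3,        # hardest split last
              leaf("rest"), leaf("neutral"))),
)

def classify(features, t=tree):
    """Walk the tree from the root until a leaf label is reached."""
    while not t["leaf"]:
        t = t["true"] if t["clf"](features) else t["false"]
    return t["label"]

print(classify({"energy": 0.9, "pitch_var": 0.8, "valence": 0.2}))  # anger
print(classify({"energy": 0.1, "pitch_var": 0.1, "valence": 0.2}))  # neutral
```

The appeal of this design is that each binary node can be tuned independently, so more discriminative features and better-optimized classifiers can be used at each level than in a single flat five-way classifier.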

This year's Interspeech offered four excellent keynote talks. The first was presented by Prof. Sadaoki Furui (Tokyo Institute of Technology), who was also awarded the 2009 ISCA Medal for Scientific Achievement. Prof. Furui's talk summarised his 40 years of research into speech and speaker recognition. The second keynote, "Connecting human and machine learning via probabilistic models of cognition", was presented by Tom Griffiths (UC Berkeley). Interestingly, Griffiths showed that jointly learning phonetic categories and a lexicon can be advantageous compared to learning phonetic categories with a predefined lexicon. Next, Deb Roy (MIT Media Lab) presented a talk entitled "New Horizons in the Study of Language Development", describing pilot efforts to analyse 240,000 hours of audio and video recordings of one child's life. Finally, Mari Ostendorf (University of Washington) talked about rich transcription of speech and the challenges connected with it.

At the closing ceremony of Interspeech 2009, several prizes were awarded and new ISCA Fellows were announced. First, the Christian Benoit Award, valued at 7622 euros, was given to Sascha Fagel (Berlin Institute of Technology) for his project Thea - Talking Heads for Elderly and Alzheimer Patients in Ambient Assisted Living. The prize is sponsored by the Christian Benoit Association in memory of Christian Benoit, a researcher in the field of speech communication. Second, the ISCA Award for the best paper published in the Speech Communication journal in 2006-2008 was awarded to M. Benzeghiba et al. [5]. Third, the prizes for the best student papers were awarded to Thomas Drugman [2], Jonathan Malkin and Amarnag Subramanya [3], and Juraj Simko [4]. Next, six new ISCA Fellows were announced: this year the ISCA Fellow Selection Committee elected Anne Cutler, Wolfgang Hess, Joseph Mariani, Hermann Ney, Roberto Pieraccini, and Elizabeth Shriberg for their significant contributions to the field of speech communication science and technology. Finally, the Interspeech 2009 Emotion Challenge prizes were awarded. In addition, Isabel Trancoso, the ISCA president, announced that she had finally received confirmation from Thomson Reuters (ISI) that the Interspeech proceedings of 2006 and 2007 would be indexed; the following day, she received the same answer from Elsevier's Engineering Index. She will now start the process of indexing the 2008 and 2009 proceedings in these two databases and continue the efforts concerning other major citation indexes.

As Interspeech is an annual event, the next conferences are already in preparation. The 11th Interspeech conference will be held in Makuhari, Japan, 26-30 September 2010; the 12th in Florence, Italy, 27-31 August 2011; and the 13th in Portland, Oregon, USA, 9-13 September 2012.


  1. C. Lee, E. Mower, C. Busso, S. Lee, and S. Narayanan: "Emotion Recognition Using a Hierarchical Binary Decision Tree Approach". Proceedings of Interspeech 2009, Brighton, UK, September 2009.
  2. T. Drugman, G. Wilfart, T. Dutoit: "A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis". Proceedings of Interspeech 2009, Brighton, UK, September 2009.
  3. J. Malkin, A. Subramanya, J. Bilmes: "On the Semi-Supervised Learning of Multi-Layered Perceptrons". Proceedings of Interspeech 2009, Brighton, UK, September 2009.
  4. J. Simko, F. Cummins: "Sequencing of Articulatory Gestures using Cost Optimization". Proceedings of Interspeech 2009, Brighton, UK, September 2009.
  5. M. Benzeghiba, R. De Mori, O. Deroo, S. Dupont, T. Erbes, D. Jouvet, L. Fissore, P. Laface, A. Mertins, C. Ris, R. Rose, V. Tyagi, and C. Wellekens: "Automatic speech recognition and speech variability: A review". Speech Communication, Volume 49, Issues 10-11, October-November 2007, Pages 763-786.
