The Spoken Dialog Challenge
SLTC Newsletter, July 2009
In several fields of speech and language processing, challenges have begun to appear. Each field that has welcomed a challenge has become more focused and has seen a dramatic improvement in accuracy. Challenges bring communities together. They enable techniques to be compared within the same environment. And they are the impetus behind fundamental advances.
We therefore announce the Spoken Dialogue Challenge (SDC) for the spoken dialogue community. After listening to discussions at venues like SigDIAL and the Young Researchers' Roundtable on Spoken Dialogue Systems about the need for better assessment and comparison of work, we have taken on the responsibility of running a Spoken Dialogue Challenge for the whole community, with the aid of a group of advisors who are seasoned researchers in our field.
The goal of SDC is to enable as many research groups in the field as possible to compare results on similar tasks. The field of spoken dialogue research is large, encompassing areas of interest as different as reinforcement learning and error recovery at the theoretical level, and telephone-based services and multiparty, multimodal conversations at the application level. Although a single challenge cannot bring all of these interests together at first, SDC is being created as a framework in which challenge formats can be tried and expanded as experience and community feedback dictate.
We propose the following structure, deliberately offering an initial proposal that the community can discuss and contribute to. We propose to start from a simple spoken dialogue task: building an information presentation system that gives bus information. We have chosen this as the first application because:
- it is clearly defined
- we can easily obtain large numbers of real users (through Let's Go)
- there is a large amount of training data available (over 80,000 calls)
- the task is fairly simple and should be relatively easy to implement
- there is a readily available baseline, which participants can use as a starting point
The participants will create a bus information system. Depending on their background, they may:
- create a system from their in-house architecture
- create a system using an open-source framework such as Olympus 2
- use the Let's Go system and plug in one or more of their own modules, such as an ASR engine
At the outset of the challenge, training data will be provided that teams can use in any way they see fit to train their systems. This data will include speech, and participants will also have access to the back end, a database of bus schedules.
The challenge will consist of three levels of callers interacting progressively with the participating systems. The first level of callers will be a round robin of all of the members of the participating teams. They will be given numbers to call, a well-defined calling period, and scenarios to complete. The second level of callers will be a controlled set of native-speaker undergraduates, who will call under the same conditions as the first level. The third level will be reserved for systems that perform well enough, by some objective measure, to be used with real callers to the Pittsburgh Port Authority. The whole challenge will be overseen by a group of well-known researchers from a variety of sites and research interests.
The design of this first challenge makes it open to researchers interested in areas such as ASR, turn-taking, system architecture, speech synthesis, error handling, confidence measures, dialogue management, adaptation techniques, statistical modeling and lexical entrainment. Some research areas cannot be adequately addressed in the first iteration of this challenge. We intend to expand the challenge in subsequent iterations, based on our first experience and on feedback and suggestions from the spoken dialogue community.
A paper on the Spoken Dialogue Challenge will be presented at SigDIAL, and a presentation and discussions will take place at the Young Researchers' Roundtable this fall. The intention is to finalize the challenge details before the end of 2009 and run the first year's challenge in 2010.
Updates and details of the challenge will be posted on the Spoken Dialogue Challenge webpage.
A mailing list has also been set up for further discussion. To join the list, send a message to email@example.com with the following line in the body of the message
To send messages to the mailing list, email them to firstname.lastname@example.org.
Alan W Black is an associate professor at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He works in the areas of speech synthesis, speech-to-speech translation and spoken dialogue systems.
Maxine Eskenazi is an associate teaching professor at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. She works in the areas of spoken dialogue systems and computer-assisted language learning.