Speech Application Student Contest
K. W. "Bill" Scholz and James A. Larson
SLTC Newsletter, July 2009
AVIOS, the Applied Voice Input/Output Society, is a non-profit foundation dedicated to informing and educating developers of speech applications on best practices for application construction and deployment. In early 2006 we decided to focus this goal on college students by giving them an opportunity to demonstrate their developmental competence to the speech community. The competition has now grown into an annual contest whose winners are substantially remunerated for their efforts, and whose winning applications are posted on our website.
The development of a speech application contest requires multiple distinct activities:
- Platform identification
- Sponsorship recruitment
- Evaluation criteria selection
- Application judging
Platform identification
Ideally, an application development platform suitable for student use must be cost-free to the student, and approachable without an extensive learning curve. Students should be able to focus on intricacies of speech / multi-modal application development without having to master complex multi-layered development environments. After all, contest entries are typically developed from concept through delivery in a single college semester. VoiceXML platforms such as BeVocal / Nuance Café and Voxeo Prophecy, as well as multi-modal platforms such as the Opera Multimodal Browser have proven well suited to meet these requirements. In addition, students with prior Microsoft development experience have successfully developed applications using SAPI and .NET, and even WinMobile5.
Each year we have attempted to extend the palette of available platforms to encourage student exploration. We've recommended sophisticated development environments such as Voxeo's VoiceObjects platform and CMU's RavenClaw/Olympus, and are encouraging the use of AT&T's Speech MashUp. In the future we hope to negotiate no-cost access to a number of other full-featured speech application generation environments.
Sponsorship recruitment
Recruiting sponsors for the contest is fundamental to the success of the program. Sponsors provide not only prizes for winners, but endorsement of their favorite development platform. Endorsement includes 'getting started' instruction, reference documentation, and access to technical support for students who have chosen to use that platform. Sponsoring institutions receive recognition on our website and at our annual conference, and are offered access to student resumes.
Evaluation criteria selection
In order to ensure valid and unbiased selection of winners, we have selected several a priori criteria for evaluating speech / multimodal application quality. Criteria include robustness, usefulness, technical superiority, user friendliness, and innovation; and in an effort to ensure objectivity through quantification, each criterion is summarized using a 5-point Likert scale.
Application judging
Contest applications have been evaluated by speech technology leaders from companies including Microsoft, Nuance, Convergys, and Fonix on the basis of technical superiority, innovation, user-friendliness, and usefulness of each application. Judges typically had a decade or more of experience in the speech technology field and explicit experience in the design, development, deployment, and utilization of speech and multi-modal applications. Each judge possessed sufficient computing resources to access student entries through their intended deployment channel (e.g., a judge must have access to a WinMobile handset if (s)he is to judge a student's WinMobile project). Judges independently evaluate each student project, then the evaluation scores and associated written evaluations are consolidated and ranked. Judges then meet physically or virtually to review the rankings, possibly adjusting ranks to match shared observations, and finally select the winners and runners-up.
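As a rough illustration of the consolidation step described above, the sketch below averages each entry's 5-point Likert scores across judges and criteria and then orders the entries by that average. The averaging rule, the function names, and the sample entries are assumptions made for illustration only; the contest's actual consolidation method is not specified here, and the judges' follow-up meeting may adjust any ranking produced this way.

```python
from statistics import mean

# Criteria named in the article; each is scored on a 5-point Likert scale.
CRITERIA = ("robustness", "usefulness", "technical superiority",
            "user friendliness", "innovation")


def consolidate(scores):
    """Average each entry's Likert scores over all judges and criteria.

    `scores` maps an entry name to a list of per-judge dicts
    (criterion -> integer 1..5). Plain averaging is an assumption;
    the article does not state how scores are combined.
    """
    totals = {}
    for entry, sheets in scores.items():
        for sheet in sheets:
            for criterion in CRITERIA:
                if not 1 <= sheet[criterion] <= 5:
                    raise ValueError(f"{entry}: {criterion} out of range")
        totals[entry] = mean(sheet[c] for sheet in sheets for c in CRITERIA)
    return totals


def rank(scores):
    """Return entry names ordered from highest to lowest consolidated score."""
    totals = consolidate(scores)
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical entries and scores, for illustration only.
example = {
    "entry_a": [dict.fromkeys(CRITERIA, 4), dict.fromkeys(CRITERIA, 5)],
    "entry_b": [dict.fromkeys(CRITERIA, 3), dict.fromkeys(CRITERIA, 4)],
}
print(rank(example))
```

A ranking like this would serve only as the starting point for the judges' review; the written evaluations are not captured by the numbers.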
Noteworthy Results
In the past three years, nearly 60 students from 10 academic institutions in 4 countries have participated in the contest. Winning applications included the following:
- An appointment manager for a clinic using dialog to make, confirm or cancel appointments. The caller could also review personal data, or get general information such as hours and location. The interface was easy to use, and facilitated a task which is common to patients of a clinic.
- A recipe reader and organizer that used voice and video to describe selected recipes in detail, including options to explore characteristics of each ingredient.
- A set of children's games all of which were robust, easy to use, and complete. The games, which involved counting, adding, feelings, days of the week, and the seasons of the year, were enthusiastically received by 4-year-olds. The cleverly-designed voice interface uses prompts to cue the child to the task and to appropriately narrow the task to enable good performance with the child's voice.
- A meal calorie counter that accepted voice input of common fast-food items and provided a calorie count of the selected meal. Well-designed prompts appropriately narrowed the task for the speech recognition system.
- The most technically sophisticated and ambitious application used Windows Mobile and GPS in a client-server architecture. It supported a navigation tool that accepted a spoken city name and brought up the Google map for that location, supporting voice commands to change the map scale and to move in various directions.
- The most innovative multimodal application was a Visual Flight Rules (VFR) communications tutorial which provided textual training material along with a graphic aircraft control panel, and proposed typical communication tasks which a student-pilot would speak to an Airport Flight Controller.
Contest Evaluation
The contest proved effective in meeting our goals by fostering creative thinking in the use of speech technology and in the development of speech and multimodal applications. For the past three years, the contest has exposed students to a variety of speech technologies, including:
- VoiceXML from BeVocal, Loquendo, Voice Objects, I6NET, and Voxeo
- Speech APIs from Microsoft, AT&T Research, and Google
- Mashups involving speech from Cepstral and AT&T Research
- XHTML plus Voice from Opera
The above corporate sponsors have provided prizes to contest winners, such as software packages, popular hardware, and monetary awards, including airfare and lodging for attending the Voice Search Conference. In addition to experiences with commercial products, students were also able to use university prototypes, including CMU's RavenClaw/Olympus and MIT's WAMI.
An unanticipated benefit of the contest is to help students learn more about the corporate environment and to help the corporate sponsors identify emerging talent. Comments from participating students indicate that the contest has been a success: "The contest was a great chance for me to gain some in-depth knowledge." "I was very satisfied with learning to write voice applications." "There is a notable difference between theory and practice in speech recognition", and from one sponsor, "Wow! I want to contact that student!"
For more information about past contest entries and winners, the current contest, and future contests, please see the AVIOS web site at www.avios.com.
K. W. "Bill" Scholz is President, AVIOS.
James A. Larson is an Adjunct Professor in the Computer Science Department of Portland State University.

