Interview: Speech Interfaces for Low-Literate Users
SLTC Newsletter, August 2013
Information and communication technologies for development (ICT4D for short) are software tools designed for societies in the developing world. A common feature of developing societies is the lack of a literate population, which can prevent people from taking full advantage of today’s modern technologies. Agha Ali Raza and his colleagues have developed a speech-based technology that has reached over 165,000 people in the developing world.
Many disadvantaged users in developing countries have a low or non-existent level of literacy. There is a general fear that today’s technologies leave these users woefully behind, as many of these technologies (for example, tablet computers and smartphones) assume the user can read. Enter the ICT4D community: a worldwide group of researchers focused on designing technologies specifically for societies in developing countries. Not only does this community need to consider problems like deployment in places with outdated infrastructure, it must also take into account the cultural and socioeconomic impact of the technologies it intends to deploy. How would someone living in a developing country use the Internet, for example? Why would they use it? How can we make it easy to access if the user is illiterate?
A joint project between researchers at Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania, USA and at Lahore University of Management Sciences (LUMS) in Lahore, Pakistan aims to design information technology specifically for a developing society. They deployed a speech application, called "Polly", in Pakistan that allows users to access entertainment and information just by using voice over a simple (not smart) phone. Their work, recently published at the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), has received critical acclaim, including a Best Paper award at the conference. SLTC interviewed two of the lead researchers on the project, Agha Ali Raza, a doctoral student in the Language Technologies Institute at CMU, and Roni Rosenfeld, a Professor in the School of Computer Science at CMU.
SLTC: Can you tell me about the goals of ICT4D (Information and Communication Technologies for Development)? How does Polly relate to ICT4D?
The goal of ICT4D research is to find ways for Information and Communication Technology to aid socio-economic development around the world. Polly is designed to spread speech-based information access among low-literate mobile phone users. It employs simple entertainment to motivate its users to train themselves in the use of Interactive Voice Response (IVR) systems and viral spread to advertise these services.
SLTC: What is Polly and where does it operate?
Polly is a telephone-based speech service that allows its users to make a short recording of their voice; modify it using a variety of funny sound effects; and forward the resulting recording to friends. This simple voice game serves as a conduit for introducing more core development-related services to its users and for training them in the use of IVR systems. It has operated for over a year in Pakistan and has been recently deployed in India as well. In Pakistan it successfully spread the use of a job-audio-browsing service among some 165,000 users.
SLTC: How did you get Polly to go viral? After going viral, what were next steps for the project?
Polly provides attractive and simple-to-understand entertainment, for free. We believe that these factors played a pivotal role in its viral spread among our target audience, where there is a lot of pent-up demand for entertainment. After going viral, our next steps were to advertise an audio-job-browsing service, and to test our users’ sensitivity to cellular charges (air-time credits).
SLTC: Who were your target users?
Low to middle socioeconomic status (SES) mobile phone users in Pakistan.
SLTC: Is Polly just an entertainment tool?
No, the entertainment part is used to help the service spread virally, and to train people in the use of IVR systems. The ultimate goal of Polly is to disseminate many different types of voice-based information services to users of mobile phones in poor countries around the world. Many such users are illiterate or low-literate, and voice-based communication in their preferred language over their simple mobile phones is our best hope of serving their information needs.
SLTC: Congratulations on the CHI Best Paper Award! What was the study you investigated in that work? How did you use Polly?
In the paper, we demonstrated a dramatic viral spread of Polly to some 165,000 people all over Pakistan (and even beyond). We also:
- Tried different strategies for shifting some of the cellular airtime costs to the users, and studied, via randomized controlled trials, the effect of these strategies on users’ behaviors.
- Demonstrated how a useful information service (audio-browsable job opportunities) can be delivered via the viral conduit of entertainment. These job ads have been listened to more than 386,000 times.
- Analyzed users’ pattern of using Polly over time, and their entertainment choices.
SLTC: What do you think CHI's recognition of your work means for the ICT4D Community? The Speech and Language Processing community?
It highlights the potential of virally spreading speech-based information services, and their potential value to illiterate people throughout the world. We hope it will attract others in the speech processing community to create many such services in diverse countries and a variety of languages. We will be very pleased to facilitate such efforts.
SLTC: How did you measure Polly's growth? Who was using it?
We measured Polly’s growth by the number of people who used it, the number of times they called it, and the number of times they accessed our information services (the job opportunities). Polly was used overwhelmingly by low- and mid-SES Pakistanis, most of them young men in their 20s and 30s.
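As an illustration only, the three growth measures mentioned above (distinct users, calls, and accesses to the job service) could be tallied from a call log along the following lines. This is a minimal sketch, not the project's actual tooling: the log format, the event names `call` and `job_ad`, and the function `growth_metrics` are all assumptions made for this example.

```python
from collections import Counter


def growth_metrics(log):
    """Tally usage from a hypothetical log of (caller_id, event) pairs.

    Assumed event names (not from the Polly system itself):
      "call"   - the caller phoned the service
      "job_ad" - the caller listened to a job advertisement
    """
    callers = {caller for caller, _ in log}          # distinct users
    events = Counter(event for _, event in log)      # count per event type
    return {
        "unique_users": len(callers),
        "calls": events["call"],
        "job_ad_listens": events["job_ad"],
    }


# Made-up sample data, for illustration only.
sample_log = [
    ("A", "call"),
    ("A", "job_ad"),
    ("B", "call"),
    ("A", "call"),
]

print(growth_metrics(sample_log))
```

Counting distinct callers separately from total calls keeps reach (how many people) apart from engagement (how often they return), which is the distinction the answer above draws.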
SLTC: What were the greatest challenges?
Like most technology projects in developing countries, the greatest challenges were logistic in nature: e.g. procuring the 30 telephone lines and setting them up properly for smooth operation; dealing with infrastructure failures of various types, managing overloads on the system, etc.
SLTC: What did you find when Polly went "live" in Pakistan?
We found that it spread exceedingly fast, even faster than we had hoped for. We launched a pilot test in 2011 using a single phone line. That single line was overwhelmed within one week. So we shut it down, and re-launched in 2012 using a bank of 30 telephone lines. It took 10 days for these lines to saturate at full capacity. When our telecom partner fixed a bug in their system two weeks later, our capacity doubled overnight but was again saturated within days. It seems there is tremendous pent-up demand in developing countries for these types of entertainment and information services.
SLTC: Were there any particularly useful lessons you learned from deploying a dialogue system in a developing nation?
Our two main lessons were:
- Logistics (e.g. telephony setup, system maintenance, power failures, system crashes) are the hardest part of this type of project.
- There is tremendous pent-up demand for entertainment and information among our target population.
SLTC: What kind of impact do you foresee this work having on the dialogue community? On developing countries?
We hope that our success will inspire many young researchers and developers in the dialogue community to develop and deploy speech-based services for poor people throughout the world.
SLTC: Finally, where do you see this work headed? What services are next?
The biggest impact will be achieved when many developers try many types of information services in many countries and in many different languages. For now, we have been focusing on expanding the connection between entertainment, job opportunities, and job and skill training. We have been hard at work interfacing Polly to existing private and government-affiliated job services in Pakistan and India.
For more details, please visit: http://www.cs.cmu.edu/~Polly/
Thanks to Ali and Roni for their time! We look forward to seeing how their work continues to impact developing societies.
- Agha Ali Raza, Farhan Ul Haq, Zain Tariq, Mansoor Pervaiz, Samia Razaq, Umar Saif, and Roni Rosenfeld, Job Opportunities through Entertainment: Virally Spread Speech-Based Services for Low-Literate Users, in Proceedings of the 2013 ACM SIGCHI Conference on Human Factors in Computing Systems, 2013.
- Agha Ali Raza, Mansoor Pervaiz, Christina Milo, Samia Razaq, Guy Alster, Jahanzeb Sherwani, Umar Saif, and Roni Rosenfeld, Viral Entertainment as a Vehicle for Disseminating Speech-Based Services to Low-Literate Users, in Proceedings of the Fifth International Conference on Information and Communication Technologies and Development (ICTD '12), 2012.
If you have comments, corrections, or additions to this article, please contact the author: Matthew Marge, mrma...@cs.cmu.edu.
Matthew Marge is a doctoral student in the Language Technologies Institute at Carnegie Mellon University. His interests are spoken dialogue systems, human-robot interaction, and crowdsourcing for natural language research.