Researchers developing way to synthesize speech with a brain-machine interface
Matthew Marge
SLTC Newsletter, February 2010
One of the great challenges facing people with near-total paralysis is the inability to communicate effectively with others. This problem is often due to a lack of control over the articulators that produce speech. Within the past decade, researchers in neurology have developed "brain-machine interfaces" - or BMIs - that allow people to communicate with only their thoughts. Advances in this field are now allowing a paralyzed patient to control a real-time speech synthesizer using a BMI that reads the neural signals related to speech production.
Until recently, the most advanced communication technology for paralyzed patients was mental control of typing keys - at a rate of about one word per minute. Keys are selected when a brain-machine interface reads a patient's neural signals from the motor cortex of the brain. This method, although beneficial, is too slow for patients to hold live conversations with others, such as family members or caretakers. Dr. Frank Guenther, Professor of Cognitive and Neural Systems at Boston University, and his collaborators are taking the next step for BMIs - reading a paralyzed patient's thoughts and converting them to speech using a speech synthesizer [1].
The patient in this study (male, 26) suffers from "locked-in syndrome," which is almost complete paralysis with cognitive abilities left intact. The only voluntary control that the patient has is his ability to blink. For Guenther and his group, the focus was to locate the areas in the brain that control speech and its associated articulatory motor controls. They determined the precise areas that relate to the production of formant frequencies, the resonant frequencies of the vocal tract that distinguish one speech sound from another. Their primary goal for this study was to determine when the patient was attempting to produce vowels with his thoughts - a goal that could potentially be achieved just by learning about the brain signals that control formant frequencies in speech.
Guenther and his colleagues used an existing model, the DIVA model, to determine where to place electrodes in the brain [2]. Once the electrodes are implanted, the BMI reads neural signals wirelessly: when the patient's brain produces signals that correlate with the production of formant frequencies, an FM receiver picks up the signals transmitted (after amplification) by the implanted electrodes. These signals are then digitized by a recording system, which converts them into "spikes". The spikes are then processed into the first and second formant frequencies, F1 and F2. This information is passed as input to a speech synthesizer, which provides almost immediate speech-based feedback to the patient.
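The step from spikes to formant frequencies can be pictured with a small sketch. The article does not describe the decoder itself, so the Python below simply assumes a linear mapping from per-unit firing rates to F1 and F2; the function names, the 50 ms bin width, and the weights are all hypothetical values chosen for illustration, not the researchers' parameters.

```python
from typing import List, Sequence, Tuple

BIN_SECONDS = 0.05  # hypothetical bin width, not taken from the study


def firing_rates(spike_times: Sequence[Sequence[float]],
                 t_start: float,
                 bin_s: float = BIN_SECONDS) -> List[float]:
    """Count each unit's spikes in [t_start, t_start + bin_s), in spikes/second."""
    t_end = t_start + bin_s
    return [sum(1 for t in unit if t_start <= t < t_end) / bin_s
            for unit in spike_times]


def decode_formants(rates: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> Tuple[float, float]:
    """Assumed linear decoder: map firing rates to (F1, F2) in Hz."""
    f1 = bias[0] + sum(w * r for w, r in zip(weights[0], rates))
    f2 = bias[1] + sum(w * r for w, r in zip(weights[1], rates))
    return f1, f2


if __name__ == "__main__":
    # Two hypothetical units with spike times in seconds.
    spikes = [[0.01, 0.02, 0.04], [0.03, 0.07]]
    rates = firing_rates(spikes, t_start=0.0)
    # Made-up weights (Hz per spike/second) and baseline formants (Hz).
    weights = [[2.0, -1.0], [10.0, 5.0]]
    bias = [400.0, 1000.0]
    print(decode_formants(rates, weights, bias))
```

In a real-time loop, a pair like this would be recomputed for every new bin and handed to the synthesizer, which is what keeps the audio feedback close to immediate.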
The system's accuracy at determining the patient's intended vowels increased with practice. In practice sessions, the patient was instructed to listen to an example of the vowel he should produce (e.g., the "e" sound in "heat"), then think about speaking that vowel. The delay between reading the patient's neural signals and the synthesizer's response was almost imperceptible - 50 ms. Over a period of 25 sessions, the patient's accuracy with BMI-based control of the synthesizer improved dramatically - from 45% to 70%.
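The article does not say how a trial was scored. One plausible reading, sketched below as an assumption, is that a trial counts as correct when the decoded (F1, F2) pair lands closest to the intended vowel's target. The vowel set and target values are approximate textbook averages for adult male speakers, not the study's targets, and the helper names are hypothetical.

```python
import math
from typing import Iterable, Tuple

# Approximate textbook formant targets in Hz (F1, F2); illustrative only.
VOWEL_TARGETS_HZ = {
    "i": (270.0, 2290.0),  # as in "heat"
    "a": (730.0, 1090.0),  # as in "hot"
    "u": (300.0, 870.0),   # as in "hoot"
}


def nearest_vowel(f1: float, f2: float) -> str:
    """Return the vowel whose target is closest in the F1/F2 plane."""
    return min(
        VOWEL_TARGETS_HZ,
        key=lambda v: math.hypot(f1 - VOWEL_TARGETS_HZ[v][0],
                                 f2 - VOWEL_TARGETS_HZ[v][1]),
    )


def accuracy(trials: Iterable[Tuple[str, float, float]]) -> float:
    """Fraction of (intended_vowel, f1, f2) trials decoded as intended."""
    trials = list(trials)
    hits = sum(1 for intended, f1, f2 in trials
               if nearest_vowel(f1, f2) == intended)
    return hits / len(trials)


if __name__ == "__main__":
    # Made-up trials: (intended vowel, decoded F1, decoded F2).
    session = [("i", 300.0, 2200.0), ("a", 700.0, 1100.0),
               ("u", 350.0, 2000.0), ("u", 320.0, 900.0)]
    print(accuracy(session))
```

Plain distance in Hz lets F2 dominate because its range is much wider than that of F1; a perceptual scale or per-axis normalization would be a reasonable refinement.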
Their key finding was that electrodes implanted in the brain can potentially be a reliable way for patients to communicate using a real-time speech synthesizer. This could greatly increase the pace of conversation that paralyzed patients can have with others. This research can also help us understand how the areas of the brain associated with speech motor control transmit information. The system is the first of its kind to use an implanted electrode that transmits wirelessly - no major hardware beyond an FM receiver and a computer is needed. Guenther and his colleagues believe that their results could be improved by implanting more electrodes in the speech areas of the brain.
While vowel production by the patient has become fairly accurate with practice (70%), open questions remain when it comes to producing consonants. Guenther and his colleagues suggest one novel approach that holds promise when combined with their current work - reading the neural signals that control the motor pathways to the articulatory speech muscles. Although this strategy has potential, it needs to be tested with people who have motor control of the speech articulators. They believe that this approach will be similar to what has been done with BMIs for mouse cursor control. Despite the serious challenges that lie ahead for these researchers, the end goal is worth their efforts - those without the ability to speak may one day be able to hold real-time conversations with only their thoughts.
For more information, please see:
- [1] Guenther FH, Brumberg JS, Wright EJ, Nieto-Castanon A, Tourville JA, et al. (2009). A Wireless Brain-Machine Interface for Real-Time Speech Synthesis. PLoS ONE 4(12): e8218. doi:10.1371/journal.pone.0008218
- [2] Guenther FH, Ghosh SS, Tourville JA (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language 96: 280-301.
- Frank Guenther's speech lab at Boston University.
If you have comments, corrections, or additions to this article, please contact the author: Matthew Marge, mrma...@cs.cmu.edu.



