Speech and Language Technology as a Hearing Aid

Svetlana Stoyanchev

SLTC Newsletter, July 2009

This article gives an overview of speech and multimodal technology used to help people with hearing disabilities. The first approach uses visualization and natural language processing techniques to generate sign language automatically. The second, the cochlear implant, involves an invasive surgical procedure that enables deaf people to gain the ability to hear.

Visualization of American Sign Language

Current state-of-the-art systems translate English into Sign English. Sign English is a word-for-word translation from English in which each word in a sentence is replaced with its sign. American Sign Language (ASL), on the other hand, is a natural language in its own right, with its own grammar, lexicon, and linguistic structure. ASL is a visual language: it uses eye gaze and facial expressions, as well as the hands, to convey meaning. In addition to the lexicon, ASL has “classifier predicates”, which consist of semantically meaningful handshapes. The handshape depends on the type of entity described: whether it is a motion, a surface, a position, etc. ASL also requires spatial reasoning: 3D hand movement paths play an important role, and signers use the 3D space around them, along with the spatial orientation of their hands, to refer to the objects they describe.
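The word-for-word substitution that defines Sign English can be illustrated with a minimal Python sketch. The lexicon, the sign labels, and the fingerspelling fallback for unknown words are hypothetical and chosen only for illustration; they are not taken from any of the systems discussed here.

```python
# Hypothetical toy lexicon: English word -> label of the sign to display.
SIGN_LEXICON = {
    "the": "THE",
    "cat": "CAT",
    "is": "IS",
    "under": "UNDER",
    "table": "TABLE",
}


def fingerspell(word):
    """Assumed fallback: spell a word letter by letter when no sign is listed."""
    return "FS:" + "-".join(word.upper())


def to_sign_english(sentence, lexicon=SIGN_LEXICON):
    """Replace each English word with its sign, keeping English word order."""
    signs = []
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        if not word:
            continue
        signs.append(lexicon.get(word) or fingerspell(word))
    return signs


print(to_sign_english("The cat is under the table."))
```

A lookup of this kind keeps English word order and has no notion of space, which is why it cannot produce ASL: an ASL signer would typically convey the same scene with classifier predicates placed in 3D space rather than with one sign per English word.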

Automatic translation of speech or text into sign language animation would significantly increase the accessibility of information for people with hearing disabilities. Machine translation is a rapidly developing field of language research, and because ASL is a natural language, machine translation techniques can be applied to it. However, generating ASL poses an additional challenge of visualization and spatial reasoning that is not present in spoken languages. Researchers at CUNY address the automatic generation of ASL [1] and present the first system that generates classifier predicates.

The researchers performed both subjective and objective human evaluation of the generated ASL animations. In the subjective evaluation, participants were asked to judge the understandability, grammaticality, and naturalness of the generated expressions. In the objective evaluation, participants' comprehension of the automatically generated expressions was measured.

The participants consistently judged the generated ASL to be more grammatical and understandable than the generated Sign English. The comprehension task also showed that participants correctly understood an ASL utterance in more than 80% of cases, while comprehension of Sign English was around 60%. These results suggest great potential for automatic translation into visualized ASL.

Cochlear Implant Technology

Another technological solution for the hearing impaired is the cochlear prosthesis, or cochlear implant. A surgeon inserts the prosthesis into the cochlea (the spiral-shaped cavity of the inner ear that contains nerve endings essential for hearing). This technology combines scientific knowledge from biology and signal processing: it involves an invasive procedure on the patient and relies on signal processing techniques for analyzing sound and transmitting the information to the brain.

The normal hearing process is remarkably complex. Sound waves undergo a series of transformations before reaching the brain:

  1. Sound waves are picked up by the outer ear.
  2. The middle ear converts them to mechanical vibrations.
  3. The inner ear converts the mechanical vibrations to vibrations of the fluids inside the cochlea.
  4. The resulting pressure variations displace a membrane (the basilar membrane).
  5. Hair cells attached to the membrane bend according to the membrane displacement.
  6. Bending of the hairs releases an electrochemical substance that causes neurons to fire. These neurons transmit acoustic information to the brain.

The most common cause of deafness is damage to the hair cells. A cochlear prosthesis bypasses the normal hearing mechanism (steps 1-5) and electrically stimulates the auditory neurons directly. The technical challenge for cochlear implant researchers is to stimulate the auditory neurons in a way that conveys meaningful information, such as the amplitude and frequency of the acoustic signal, to the brain. A cochlear implant consists of a microphone, a signal processor, a transmission system, and an electrode or electrode array inserted by a surgeon into the cochlea.

The strategies that cochlear implants use to transmit the signal depend largely on the underlying signal processing technique. Examples include compressed-analogue processing, continuous interleaved sampling, and extraction of speech features such as the fundamental frequency and formants.
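To give a feel for one of these strategies, the following Python sketch illustrates the general idea usually described for continuous interleaved sampling: the sound is split into frequency bands, the envelope of each band is extracted and compressed, and the electrodes are then pulsed one at a time so that no two are active simultaneously. It is a simplified illustration, not the processing of any actual device. The filter design, the compression constant, the frame length, and all function names are assumptions made for this example.

```python
import math


def bandpass(signal, center_hz, fs, q=4.0):
    """Second-order band-pass filter (standard biquad form, unity peak gain)."""
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def envelope(signal, fs, cutoff_hz=200.0):
    """Full-wave rectification followed by a one-pole low-pass filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for x in signal:
        state = a * state + (1.0 - a) * abs(x)
        env.append(state)
    return env


def compress(value, c=100.0):
    """Logarithmic compression of an envelope value into [0, 1] (assumed mapping)."""
    v = min(max(value, 0.0), 1.0)
    return math.log(1.0 + c * v) / math.log(1.0 + c)


def cis_pulses(signal, fs, centers_hz, frame_len):
    """Return (sample_index, channel, amplitude) pulses, one channel at a time."""
    envs = [envelope(bandpass(signal, f, fs), fs) for f in centers_hz]
    n_ch = len(centers_hz)
    slot = frame_len // n_ch  # each channel gets its own time slot in a frame
    pulses = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        for ch in range(n_ch):
            t = start + ch * slot
            pulses.append((t, ch, compress(envs[ch][t])))
    return pulses


# Example: a 1 kHz tone should mostly drive the channel centered at 1 kHz.
fs = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(3200)]
pulses = cis_pulses(tone, fs, centers_hz=[500, 1000, 2000, 4000], frame_len=16)
```

In a real implant the pulse amplitudes would additionally be mapped to each patient's electrical dynamic range, a step this sketch leaves out.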

Assessing implant performance is difficult because the cause of deafness and the patient's individual characteristics both affect how well the implant works. Depending on these factors, some people gain a significant improvement in hearing. Simulations of speech perception through cochlear implants are available at http://www.utdallas.edu/~loizou/cimplants/cdemos.htm

References

  • [1] Huenerfauth, M. et al. "Evaluation of American Sign Language Generation by Native ASL Signers," ACM Transactions on Accessible Computing, 2008.
  • [2] Loizou, P. "Mimicking the human ear," IEEE Signal Processing Magazine, vol. 15, no. 5, pp. 101-130, 1998.
