IEEE JSTSP Special Issue on Deep Multimodal Speech Enhancement and Separation

Manuscript Due: 30 September 2024
Publication Date: May 2025

Scope

Voice is the modality humans most commonly use to communicate and to integrate socially. Recent technological advances have driven the development of a wide range of voice-related applications in the information and communications technology market. However, noise, reverberation, and interfering speech impair effective communication, both between humans and between humans and machines, and degrade the performance of the associated voice-enabled services. To address the formidable speech-in-noise challenge, speech enhancement (SE) and speech separation (SS) techniques are commonly employed as front-end speech processing units that handle distortions in the input signal and provide more intelligible speech for automatic speech recognition (ASR), speech synthesis, and dialogue systems. Advances in artificial intelligence (AI) and machine learning, particularly deep neural networks, have led to remarkable improvements in SE and SS solutions. A growing number of researchers have extended these methods by utilising other modalities as auxiliary inputs to the main speech processing task, thereby drawing additional information from heterogeneous signals. In particular, multimodal SE and SS systems have been shown to deliver improved performance in challenging noisy environments by augmenting the conventional speech modality with complementary information from multi-sensory inputs, such as video, noise type, signal-to-noise ratio (SNR), bone-conducted speech (vibrations), speaker information, text, electromyography, and electromagnetic midsagittal articulometer (EMMA) data. Various integration schemes, including early and late fusion, cross-attention mechanisms, and self-supervised learning algorithms, have also been explored successfully.

Topics

This timely special issue aims to collate the latest advances in multimodal SE and SS systems that exploit both conventional and unconventional modalities to further improve state-of-the-art performance on benchmark problems. We particularly welcome submissions on novel deep-neural-network-based algorithms and architectures, including new feature processing methods for multimodal and cross-modal speech processing. We also encourage submissions that address practical issues related to multimodal data recording, energy-efficient system design, and real-time, low-latency solutions, such as those for assistive hearing and speech communication applications.

Research topics of interest for this special issue relate to open problems that need to be addressed. These include, but are not limited to, the following:

Novel acoustic features and architectures for multi-modal SE (MM-SE) and multi-modal SS (MM-SS) solutions.
The integration of multiple data acquisition devices for multimodal learning and novel learning algorithms robust to imperfect data.
Few-shot/zero-shot learning and adaptation algorithms for MM-SE and MM-SS systems with a small amount of training and adaptation data.
Self-supervised and unsupervised learning techniques for MM-SE and MM-SS systems.
Adversarial learning for MM-SE and MM-SS.
Large language model-based generative approaches for MM-SE and MM-SS.
Low-delay, low-power, low-complexity MM-SE and MM-SS models.
Approaches that effectively reduce model size and inference cost without degrading the quality and intelligibility of the processed speech.
Novel objective functions, including psychoacoustically and perceptually motivated loss functions, for MM-SE and MM-SS.
Holistic evaluation metrics for MM-SE and MM-SS systems.
Real-world applications and use cases of MM-SE and MM-SS, including human-human and human-machine communication.
Challenges and solutions in integrating MM-SE and MM-SS into existing systems.

We encourage submissions that not only propose novel approaches but also substantiate their findings with rigorous evaluations, including on real-world datasets. Studies that provide insights into the challenges involved and the impact of MM-SE and MM-SS on end users are particularly welcome.

Submission Guidelines

Manuscripts should be original and should not have been previously published or currently under consideration for publication elsewhere. All submissions will be peer-reviewed according to the IEEE Signal Processing Society review process. Authors should prepare their manuscripts according to the Instructions for Authors available from the Signal Processing Society website.

Follow the instructions given on the IEEE JSTSP webpage to submit manuscripts.

Important Dates

Manuscript Submission Deadline: 30 September 2024
First Review Due: 15 December 2024
Revised Manuscript Due: 15 January 2025
Second Review Due: 15 February 2025
Final Decision: 28 February 2025

Guest Editors

For further information, please contact the guest editors at:

Amir Hussain, Edinburgh Napier University, UK (Lead GE)
Yu Tsao, Academia Sinica, Taiwan (co-Lead GE)
John H.L. Hansen, University of Texas at Dallas, USA
Naomi Harte, Trinity College Dublin, Ireland
Shinji Watanabe, Carnegie Mellon University, USA
Isabel Trancoso, Instituto Superior Técnico, IST, Univ. Lisbon, Portugal
Shixiong Zhang, Tencent AI Lab, USA

© Copyright 2024 IEEE – All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.