Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz
Reference:
Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals (Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz), at INTERSPEECH 2020 - 21st Annual Conference of the International Speech Communication Association, September 2020
Bibtex Entry:
@inproceedings{diener2020towards,
title = {Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from
Electromyographic Signals},
author = {Diener, Lorenz and Amiriparian, Shahin and Botelho, Catarina and Scheck, Kevin and
Küster, Dennis and Trancoso, Isabel and Schuller, Björn W. and Schultz, Tanja},
year = 2020,
month = sep,
booktitle = {{INTERSPEECH} 2020 - 21st Annual Conference of the International Speech
Communication Association},
video = {https://www.youtube.com/watch?v=sy7MeEmEusY},
doi = {10.21437/interspeech.2020-2848},
abstract = {Silent Computational Paralinguistics (SCP) - the assessment of speaker states and
traits from non-audibly spoken communication - has rarely been targeted in the rich body of
either Computational Paralinguistics or Silent Speech Processing. Here, we provide first steps
towards this challenging but potentially highly rewarding endeavour: Paralinguistics can enrich
spoken language interfaces, while Silent Speech Processing enables confidential and unobtrusive
spoken communication for everybody, including mute speakers. We approach SCP by using
speech-related biosignals stemming from facial muscle activities captured by surface
electromyography (EMG). To demonstrate the feasibility of SCP, we select one speaker trait
(speaker identity) and one speaker state (speaking mode). We introduce two promising strategies
for SCP: (1) deriving paralinguistic speaker information directly from EMG of silently produced
speech versus (2) first converting EMG into an audible speech signal followed by conventional
computational paralinguistic methods. We compare traditional feature extraction and decision
making approaches to more recent deep representation and transfer learning by convolutional and
recurrent neural networks, using openly available EMG data. We find that paralinguistics can be
assessed not only from acoustic speech but also from silent speech captured by EMG.},
url = {https://halcy.de/cites/pdf/diener2020towards.pdf},
}