Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals
Reference:
Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals (Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz), at INTERSPEECH 2020 - 21st Annual Conference of the International Speech Communication Association, September 2020
Bibtex Entry:
@inproceedings{diener2020towards,
  title        = {Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from
    Electromyographic Signals},
  author       = {Diener, Lorenz and Amiriparian, Shahin and Botelho, Catarina and Scheck, Kevin and
    Küster, Dennis and Trancoso, Isabel and Schuller, Björn W. and Schultz, Tanja},
  year         = 2020,
  month        = sep,
  booktitle    = {{INTERSPEECH} 2020 - 21st Annual Conference of the International Speech
    Communication Association},
  video        = {https://www.youtube.com/watch?v=sy7MeEmEusY},
  doi          = {10.21437/interspeech.2020-2848},
  abstract     = {Silent Computational Paralinguistics (SCP) - the assessment of speaker states and
    traits from non-audibly spoken communication - has rarely been targeted in the rich body of
    either Computational Paralinguistics or Silent Speech Processing. Here, we provide first steps
    towards this challenging but potentially highly rewarding endeavour: Paralinguistics can enrich
    spoken language interfaces, while Silent Speech Processing enables confidential and unobtrusive
    spoken communication for everybody, including mute speakers. We approach SCP by using
    speech-related biosignals stemming from facial muscle activities captured by surface
    electromyography (EMG). To demonstrate the feasibility of SCP, we select one speaker trait
    (speaker identity) and one speaker state (speaking mode). We introduce two promising strategies
    for SCP: (1) deriving paralinguistic speaker information directly from EMG of silently produced
    speech versus (2) first converting EMG into an audible speech signal followed by conventional
    computational paralinguistic methods. We compare traditional feature extraction and decision
    making approaches to more recent deep representation and transfer learning by convolutional and
    recurrent neural networks, using openly available EMG data. We find that paralinguistics can be
    assessed not only from acoustic speech but also from silent speech captured by EMG.},
  url          = {https://halcy.de/cites/pdf/diener2020towards.pdf},
}