Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn W. Schuller, Tanja Schultz, Alberto Abad, Isabel Trancoso
Reference:
Toward Silent Paralinguistics: Speech-to-EMG - Retrieving Articulatory Muscle Activity from Speech (Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn W. Schuller, Tanja Schultz, Alberto Abad, Isabel Trancoso), at INTERSPEECH 2020 - 21st Annual Conference of the International Speech Communication Association, September 2020
BibTeX Entry:
@inproceedings{botelho2020silent,
  title     = {Toward Silent Paralinguistics: Speech-to-EMG - Retrieving Articulatory
               Muscle Activity from Speech},
  author    = {Botelho, Catarina and Diener, Lorenz and Küster, Dennis and Scheck, Kevin
               and Amiriparian, Shahin and Schuller, Björn W. and Schultz, Tanja and
               Abad, Alberto and Trancoso, Isabel},
  year      = 2020,
  month     = sep,
  booktitle = {{INTERSPEECH} 2020 - 21st Annual Conference of the International Speech
               Communication Association},
  doi       = {10.21437/Interspeech.2020-2926},
  abstract  = {Electromyographic (EMG) signals recorded during speech production encode
               information on articulatory muscle activity and also on the facial
               expression of emotion, thus representing a speech-related biosignal with
               strong potential for paralinguistic applications. In this work, we
               estimate the electrical activity of the muscles responsible for speech
               articulation directly from the speech signal. To this end, we first
               perform a neural conversion of speech features into electromyographic
               time domain features, and then attempt to retrieve the original EMG
               signal from the time domain features. We propose a feed forward neural
               network to address the first step of the problem (speech features to EMG
               features) and a neural network composed of a convolutional block and a
               bidirectional long short-term memory block to address the second problem
               (true EMG features to EMG signal). We observe that four out of the five
               originally proposed time domain features can be estimated reasonably well
               from the speech signal. Further, the five time domain features are able
               to predict the original speech-related EMG signal with a concordance
               correlation coefficient of 0.663. We further compare our results with the
               ones achieved on the inverse problem of generating acoustic speech
               features from EMG features.},
  url       = {https://halcy.de/cites/pdf/botelho2020silent.pdf},
}
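
The abstract above describes a two-stage pipeline: a feed-forward network that maps speech features to the five EMG time-domain (TD) features, followed by a network with a convolutional block and a bidirectional LSTM that maps those TD features back to the EMG signal, evaluated with the concordance correlation coefficient (CCC). The sketch below is a minimal, hypothetical PyTorch illustration of that structure, not the authors' implementation; all layer sizes, feature dimensions, and the samples-per-frame value are assumptions made for the example.

import torch
import torch.nn as nn


class SpeechToEMGFeatures(nn.Module):
    """Stage 1 (assumed layout): feed-forward net mapping per-frame speech
    features to the five EMG time-domain (TD) features."""

    def __init__(self, speech_dim=39, emg_td_dim=5, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(speech_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, emg_td_dim),
        )

    def forward(self, x):           # x: (batch, frames, speech_dim)
        return self.net(x)          # -> (batch, frames, emg_td_dim)


class EMGFeaturesToSignal(nn.Module):
    """Stage 2 (assumed layout): convolutional block followed by a
    bidirectional LSTM, mapping TD features to EMG waveform samples."""

    def __init__(self, emg_td_dim=5, conv_channels=64, lstm_hidden=128,
                 samples_per_frame=160):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(emg_td_dim, conv_channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(conv_channels, conv_channels, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.blstm = nn.LSTM(conv_channels, lstm_hidden,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_hidden, samples_per_frame)

    def forward(self, feats):                 # feats: (batch, frames, emg_td_dim)
        h = self.conv(feats.transpose(1, 2))  # -> (batch, conv_channels, frames)
        h, _ = self.blstm(h.transpose(1, 2))  # -> (batch, frames, 2 * lstm_hidden)
        return self.out(h)                    # -> (batch, frames, samples_per_frame)


def concordance_cc(pred, target):
    """Concordance correlation coefficient, the metric reported in the abstract."""
    pred, target = pred.flatten(), target.flatten()
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(unbiased=False), target.var(unbiased=False)
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    return 2 * cov / (var_p + var_t + (mu_p - mu_t) ** 2)


# Illustrative shapes only: one utterance, 100 frames, 39-dim speech features.
speech = torch.randn(1, 100, 39)
td_feats = SpeechToEMGFeatures()(speech)   # (1, 100, 5)
emg = EMGFeaturesToSignal()(td_feats)      # (1, 100, 160)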