Towards Closed-Loop Speech Synthesis from Stereotactic EEG: A Unit Selection Approach
Reference:
Towards Closed-Loop Speech Synthesis from Stereotactic EEG: A Unit Selection Approach (Miguel Angrick, Maarten Ottenhoff, Lorenz Diener, Darius Ivucic, Gabriel Ivucic, Sophocles Goulis, Albert J. Colon, Louis Wagner, Dean J. Krusienski, Pieter L. Kubben, Tanja Schultz, Christian Herff), at ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022
Bibtex Entry:
@inproceedings{angrick2022towards,
  title        = {Towards Closed-Loop Speech Synthesis from Stereotactic EEG: A Unit Selection
    Approach},
  author       = {Angrick, Miguel and Ottenhoff, Maarten and Diener, Lorenz and Ivucic, Darius and
    Ivucic, Gabriel and Goulis, Sophocles and Colon, Albert J. and Wagner, Louis and Krusienski,
    Dean J. and Kubben, Pieter L. and Schultz, Tanja and Herff, Christian},
  year         = 2022,
  month        = may,
  booktitle    = {{ICASSP} 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal
    Processing},
  pages        = {1296--1300},
  doi          = {10.1109/ICASSP43922.2022.9747300},
  issn         = {2379-190X},
  abstract     = {Neurological disorders can severely impact speech communication. Recently, neural
    speech prostheses have been proposed that reconstruct intelligible speech from neural signals
    recorded superficially on the cortex. Thus far, it has been unclear whether similar
    reconstruction is feasible from deeper brain structures, and whether audible speech can be
    directly synthesized from these reconstructions with low latency, as required for a practical
    speech neuroprosthetic. The present study aims to address both challenges. First, we implement a
    low-latency unit-selection-based synthesizer that converts neural signals into audible speech.
    Second, we evaluate our approach on open-loop recordings from 5 patients implanted with
    stereotactic depth electrodes who conducted a read-aloud task of Dutch utterances. We achieve
    correlation coefficients significantly higher than chance level of up to 0.6 and an average
    computational cost of 6.6 ms for each 10 ms frame. While the current reconstructed utterances
    are not intelligible, our results indicate promising decoding and run-time capabilities that are
    suitable for investigations of speech processes in closed-loop experiments.},
  url          = {https://halcy.de/cites/pdf/angrick2022towards.pdf},
}