Publications
2021
- SUPERB: Speech processing Universal PERformance Benchmark. arXiv preprint arXiv:2105.01051, 2021
- Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
- Recent developments on ESPnet toolkit boosted by Conformer. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
- Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, 2021
- ESPnet-SE: end-to-end speech enhancement and separation toolkit designed for ASR integration. In IEEE Spoken Language Technology Workshop (SLT), 2021
- Investigation of end-to-end speaker-attributed ASR for continuous multi-talker recordings. In IEEE Spoken Language Technology Workshop (SLT), 2021
2020
- The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. arXiv preprint arXiv:2012.13006, 2020
- Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals. Advances in Neural Information Processing Systems, 2020
- End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. Proc. Interspeech, 2020
- End-to-end multi-speaker speech recognition with transformer. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
- CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings. arXiv preprint arXiv:2004.09249, 2020
- Improving end-to-end single-channel multi-talker speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020
- End-to-End ASR with Adaptive Span Self-Attention. Proc. Interspeech, 2020
2019
- MIMO-Speech: End-to-end multi-channel multi-speaker speech recognition. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019
- End-to-end monaural multi-speaker ASR system without pretraining. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
- Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System. Conference of the International Speech Communication Association (InterSpeech), 2019
2018
- Single-channel multi-talker speech recognition with permutation invariant training. Speech Communication, 2018
- Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks. Conference of the International Speech Communication Association (InterSpeech), 2018
- Adaptive permutation invariant training with auxiliary information for monaural multi-talker speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
- Past review, current progress, and challenges ahead on the cocktail party problem. Frontiers of Information Technology & Electronic Engineering, 2018
2017
- Recognizing Multi-Talker Speech with Permutation Invariant Training. Conference of the International Speech Communication Association (InterSpeech), 2017
2016
- Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC. In Conference of the International Speech Communication Association (InterSpeech), 2016