Sound

Authors and titles for April 2021

Total of 229 entries (showing 1-50)

[1] arXiv:2104.00355 [pdf, other]: Title: Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux

Comments: In Proceedings of Interspeech 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2104.00437 [pdf, other]: Title: Enriched Music Representations with Multiple Cross-modal Contrastive Learning

Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov

Comments: Accepted for publication to IEEE Signal Processing Letters

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[3] arXiv:2104.00513 [pdf, other]: Title: Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-yi Lee, Lei Xie

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[4] arXiv:2104.00528 [pdf, other]: Title: OutlierNets: Highly Compact Deep Autoencoder Network Architectures for On-Device Acoustic Anomaly Detection

Saad Abbasi, Mahmoud Famouri, Mohammad Javad Shafiee, Alexander Wong

Comments: 7 pages

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[5] arXiv:2104.00705 [pdf, other]: Title: Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling

Qing He, Zhiping Xiu, Thilo Koehler, Jilong Wu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6] arXiv:2104.00732 [pdf, other]: Title: Out of a hundred trials, how many errors does your speaker verifier make?

Niko Brümmer, Luciana Ferrer, Albert Swart

Comments: Submitted to Interspeech 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[7] arXiv:2104.01027 [pdf, other]: Title: Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:2104.01160 [pdf, other]: Title: PhyAug: Physics-Directed Data Augmentation for Deep Sensing Model Transfer in Cyber-Physical Systems

Wenjie Luo, Zhenyu Yan, Qun Song, Rui Tan

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[9] arXiv:2104.01161 [pdf, other]: Title: An Audio-Based Deep Learning Framework For BBC Television Programme Classification

Lam Pham, Chris Baume, Qiuqiang Kong, Tassadaq Hussain, Wenwu Wang, Mark Plumbley

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:2104.01271 [pdf, other]: Title: PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification

Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Comments: Accepted to Interspeech 2021

Journal-ref: Proc. Interspeech 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[11] arXiv:2104.01304 [pdf, other]: Title: Diarization of Legal Proceedings. Identifying and Transcribing Judicial Speech from Recorded Court Audio

Jeffrey Tumminia, Amanda Kuznecov, Sophia Tsilerides, Ilana Weinstein, Brian McFee, Michael Picheny, Aaron R. Kaufman

Comments: Under review for InterSpeech 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2104.01444 [pdf, other]: Title: Mixture of orthogonal sequences made from extended time-stretched pulses enables measurement of involuntary voice fundamental frequency response to pitch perturbation

Hideki Kawahara, Toshie Matsui, Kohei Yatabe, Ken-Ichi Sakakibara, Minoru Tsuzaki, Masanori Morise, Toshio Irino

Comments: 5 pages, 9 figures, submitted to Interspeech 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[13] arXiv:2104.01778 [pdf, other]: Title: AST: Audio Spectrogram Transformer

Yuan Gong, Yu-An Chung, James Glass

Comments: Accepted at Interspeech 2021. Code at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[14] arXiv:2104.01807 [pdf, other]: Title: StarGAN-based Emotional Voice Conversion for Japanese Phrases

Asuka Moritani, Ryo Ozaki, Shoki Sakamoto, Hirokazu Kameoka, Tadahiro Taniguchi

Comments: Submitted to Interspeech 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[15] arXiv:2104.01978 [pdf, other]: Title: Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition

Haoqi Li, Yelin Kim, Cheng-Hao Kuo, Shrikanth Narayanan

Comments: Paper accepted by INTERSPEECH 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[16] arXiv:2104.02005 [pdf, other]: Title: Uncertainty-Aware COVID-19 Detection from Imbalanced Sound Data

Tong Xia, Jing Han, Lorena Qendro, Ting Dang, Cecilia Mascolo

Comments: Accepted by INTERSPEECH 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:2104.02109 [pdf, other]: Title: Streaming Multi-talker Speech Recognition with Joint Speaker Identification

Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

Comments: 5 pages, 2 figures, submitted to Interspeech 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[18] arXiv:2104.02207 [pdf, other]: Title: Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

Comments: Proc. of Interspeech 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[19] arXiv:2104.02232 [pdf, other]: Title: Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios

Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

Comments: Submitted to Interspeech 2021 (under review)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20] arXiv:2104.02306 [pdf, other]: Title: Binary Neural Network for Speaker Verification

Tinglong Zhu, Xiaoyi Qin, Ming Li

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2104.02309 [pdf, other]: Title: MuSLCAT: Multi-Scale Multi-Level Convolutional Attention Transformer for Discriminative Music Modeling on Raw Waveforms

Kai Middlebrook, Shyam Sudhakaran, David Guy Brizan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22] arXiv:2104.02387 [pdf, other]: Title: Towards Consistent Hybrid HMM Acoustic Modeling

Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2104.02477 [pdf, other]: Title: COVID-19 Detection in Cough, Breath and Speech using Deep Transfer Learning and Bottleneck Features

Madhurananda Pahar, Marisa Klopper, Robin Warren, Thomas Niesler

Journal-ref: Computers in Biology and Medicine, 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2104.02535 [pdf, other]: Title: Optimal Transport-based Adaptation in Dysarthric Speech Tasks

Rosanna Turrisi, Leonardo Badino

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[25] arXiv:2104.02558 [pdf, other]: Title: Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[26] arXiv:2104.02868 [pdf, other]: Title: Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR

Xian Shi, Pan Zhou, Wei Chen, Lei Xie

Comments: Submitted to ASRU 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2104.03204 [pdf, other]: Title: Learning robust speech representation with an articulatory-regularized variational autoencoder

Marc-Antoine Georges, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[28] arXiv:2104.03502 [pdf, other]: Title: Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

Leonardo Pepino, Pablo Riera, Luciana Ferrer

Comments: 5 pages, 2 figures. Submitted to Interspeech 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29] arXiv:2104.03521 [pdf, other]: Title: Towards Multi-Scale Style Control for Expressive Speech Synthesis

Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng

Comments: 5 pages, 4 figures, submitted to INTERSPEECH 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[30] arXiv:2104.03538 [pdf, other]: Title: MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

Comments: Accepted by Interspeech 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[31] arXiv:2104.03587 [pdf, other]: Title: WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[32] arXiv:2104.03603 [pdf, other]: Title: AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

Comments: Accepted by Interspeech 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2104.03617 [pdf, html, other]: Title: Half-Truth: A Partially Fake Audio Detection Dataset

Jiangyan Yi, Ye Bai, Jianhua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu

Comments: accepted by Interspeech 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[34] arXiv:2104.03838 [pdf, other]: Title: Speech Denoising Without Clean Training Data: A Noise2Noise Approach

Madhav Mahesh Kashyap, Anuj Tambwekar, Krishnamoorthy Manohara, S Natarajan

Comments: Published in Interspeech 2021 (see this https URL). 5 pages, 2 figures, 1 table

Journal-ref: Proc. Interspeech 2021, 2716-2720

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:2104.03876 [pdf, other]: Title: SerumRNN: Step by Step Audio VST Effect Programming

Christopher Mitcheltree, Hideki Koike

Comments: Audio samples of the system can be listened to at this http URL

Journal-ref: 10th International Conference on Artificial Intelligence in Music, Sound, Art, and Design (EvoMUSART 2021), Seville, Spain

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36] arXiv:2104.04050 [pdf, other]: Title: Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features

Mahsa Elyasi, Gaurav Bharaj

Comments: 5

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG)
[37] arXiv:2104.04111 [pdf, other]: Title: Generalized Spoofing Detection Inspired from Audio Generation Artifacts

Yang Gao, Tyler Vuong, Mahsa Elyasi, Gaurav Bharaj, Rita Singh

Comments: Camera ready version. Accepted by INTERSPEECH 2021

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[38] arXiv:2104.04143 [pdf, other]: Title: Heaps' Law and Vocabulary Richness in the History of Classical Music Harmony

Marc Serra-Peralta, Joan Serrà, Álvaro Corral

Comments: 12 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Physics and Society (physics.soc-ph)
[39] arXiv:2104.04325 [pdf, other]: Title: Joint Online Multichannel Acoustic Echo Cancellation, Speech Dereverberation and Source Separation

Yueyue Na, Ziteng Wang, Zhang Liu, Biao Tian, Qiang Fu

Comments: submitted to INTERSPEECH 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:2104.04598 [pdf, other]: Title: Cross-Modal learning for Audio-Visual Video Parsing

Jatin Lamba, Abhishek, Jayaprakash Akula, Rishabh Dabral, Preethi Jyothi, Ganesh Ramakrishnan

Comments: Work accepted at Interspeech 2021

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[41] arXiv:2104.04668 [pdf, other]: Title: Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN

Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

Comments: Submitted to INTERSPEECH 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42] arXiv:2104.04702 [pdf, other]: Title: Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR

Fan Yu, Haoneng Luo, Pengcheng Guo, Yuhao Liang, Zhuoyuan Yao, Lei Xie, Yingying Gao, Leijing Hou, Shilei Zhang

Comments: 5 pages, 4 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2104.05657 [pdf, other]: Title: End-to-End Mandarin Tone Classification with Short Term Context Information

Jiyang Tang, Ming Li

Comments: Accepted by APSIPA ASC 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44] arXiv:2104.05784 [pdf, other]: Title: Extremely Low Footprint End-to-End ASR System for Smart Device

Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin

Comments: 5 pages, 2 figures, accepted by INTERSPEECH 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45] arXiv:2104.06004 [pdf, other]: Title: Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Lexical Information Fusion

Ziang Zhou, Yanze Xu, Ming Li

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[46] arXiv:2104.06074 [pdf, other]: Title: NoiseVC: Towards High Quality Zero-Shot Voice Conversion

Shijun Wang, Damian Borth

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47] arXiv:2104.06162 [pdf, other]: Title: Visually Informed Binaural Audio Generation without Binaural Audios

Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin

Comments: Accepted by CVPR 2021. Code, models, and demo video are available on our webpage: this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[48] arXiv:2104.06517 [pdf, other]: Title: Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition

Eunjeong Koh, Shlomo Dubnov

Comments: AAAI Workshop on Affective Content Analysis 2021 Camera Ready Version

Journal-ref: AAAI 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[49] arXiv:2104.06607 [pdf, other]: Title: Revisiting the Onsets and Frames Model with Additive Attention

Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans

Comments: Accepted in IJCNN 2021 Special Session S04. this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2104.06666 [pdf, other]: Title: End-to-end Keyword Spotting using Neural Architecture Search and Quantization

David Peter, Wolfgang Roth, Franz Pernkopf

Comments: arXiv admin note: text overlap with arXiv:2012.10138

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
