Showing 1–50 of 83 results for author: Mao, M

  1. arXiv:2506.08400  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks

    Authors: Luel Hagos Beyene, Vivek Verma, Min Ma, Jesujoba O. Alabi, Fabian David Schmidt, Joyce Nakatumba-Nabende, David Ifeoluwa Adelani

    Abstract: Large Language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark to evaluate the pe…

    Submitted 24 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: working paper

  2. arXiv:2504.20944  [pdf, other]

    cs.LG eess.SP

    Deep Learning Characterizes Depression and Suicidal Ideation from Eye Movements

    Authors: Kleanthis Avramidis, Woojae Jeong, Aditya Kommineni, Sudarsana R. Kadiri, Marcus Ma, Colin McDaniel, Myzelle Hughes, Thomas McGee, Elsi Kaiser, Dani Byrd, Assal Habibi, B. Rael Cahn, Idan A. Blank, Kristina Lerman, Takfarinas Medani, Richard M. Leahy, Shrikanth Narayanan

    Abstract: Identifying physiological and behavioral markers for mental health conditions is a longstanding challenge in psychiatry. Depression and suicidal ideation, in particular, lack objective biomarkers, with screening and diagnosis primarily relying on self-reports and clinical interviews. Here, we investigate eye tracking as a potential marker modality for screening purposes. Eye movements are directly…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Preprint. 12 pages, 5 figures

  3. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  4. arXiv:2503.09489  [pdf, ps, other]

    cs.IT eess.SP

    Optimal ISAC Beamforming Structure and Efficient Algorithms for Sum Rate and CRLB Balancing

    Authors: Tianyu Fang, Mengyuan Ma, Markku Juntti, Nir Shlezinger, A. Lee Swindlehurst, Nhan Thanh Nguyen

    Abstract: Integrated sensing and communications (ISAC) has emerged as a promising paradigm to unify wireless communications and radar sensing, enabling efficient spectrum and hardware utilization. A core challenge with realizing the gains of ISAC stems from the unique challenges of dual purpose beamforming design due to the highly non-convex nature of key performance metrics such as sum rate for communicati…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: journal version of our previous work, submitted for possible publication

  5. Lifted Frequency-Domain Identification of Closed-Loop Multirate Systems: Applied to Dual-Stage Actuator Hard Disk Drives

    Authors: Max van Haren, Masahiro Mae, Lennart Blanken, Tom Oomen

    Abstract: Frequency-domain representations are crucial for the design and performance evaluation of controllers in multirate systems, specifically to address intersample performance. The aim of this paper is to develop an effective frequency-domain system identification technique for closed-loop multirate systems using solely slow-rate output measurements. By indirect identification of multivariable time-in…

    Submitted 9 April, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Journal ref: Mechatronics, 108:103311 (2025)

  6. arXiv:2502.07243  [pdf, other]

    cs.SD cs.AI eess.AS

    Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

    Authors: Xueyao Zhang, Xiaohui Zhang, Kainan Peng, Zhenyu Tang, Vimal Manohar, Yingru Liu, Jeff Hwang, Dangna Li, Yuhao Wang, Julian Chan, Yuan Huang, Zhizheng Wu, Mingbo Ma

    Abstract: The imitation of voice, targeted on specific speech attributes such as timbre and speaking style, is crucial in speech generation. However, existing methods rely heavily on annotated data, and struggle with effectively disentangling timbre and style, leading to challenges in achieving controllable generation, especially in zero-shot scenarios. To address these issues, we propose Vevo, a versatile…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted by ICLR 2025

  7. arXiv:2501.13130  [pdf, other]

    eess.IV

    A Novel Scene Coupling Semantic Mask Network for Remote Sensing Image Segmentation

    Authors: Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Renxiang Guan, Tingfeng Hong, Mengjiao Zhao, Mengting Ma, Jiangtao Nie, Zhenhong Du, Siyang Song, Wei Zhang

    Abstract: As a common method in the field of computer vision, spatial attention mechanism has been widely used in semantic segmentation of remote sensing images due to its outstanding long-range dependency modeling capability. However, remote sensing images are usually characterized by complex backgrounds and large intra-class variance that would degrade their analysis performance. While vanilla spatial att…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: Accepted by ISPRS Journal of Photogrammetry and Remote Sensing

  8. arXiv:2412.13967  [pdf, ps, other]

    eess.SP

    THz Channels for Short-Range Mobile Networks: Multipath Clusters and Human Body Shadowing

    Authors: Minseok Kim, Jun-ichi Takada, Minghe Mao, Che Chia Kang, Xin Du, Anirban Ghosh

    Abstract: The THz band (0.1-10 THz) is emerging as a crucial enabler for sixth-generation (6G) mobile communication systems, overcoming the limitations of current technologies and unlocking new opportunities for low-latency and ultra-high-speed communications by utilizing several tens of GHz transmission bandwidths. However, extremely high spreading losses and other interaction losses pose significant chall…

    Submitted 18 December, 2024; originally announced December 2024.

  9. arXiv:2412.13365  [pdf, other]

    cs.AI cs.HC eess.SY

    Quantitative Predictive Monitoring and Control for Safe Human-Machine Interaction

    Authors: Shuyang Dong, Meiyi Ma, Josephine Lamp, Sebastian Elbaum, Matthew B. Dwyer, Lu Feng

    Abstract: There is a growing trend toward AI systems interacting with humans to revolutionize a range of application domains such as healthcare and transportation. However, unsafe human-machine interaction can lead to catastrophic failures. We propose a novel approach that predicts future states by accounting for the uncertainty of human interaction, monitors whether predictions satisfy or violate safety re…

    Submitted 17 December, 2024; originally announced December 2024.

  10. arXiv:2410.17709  [pdf, other]

    eess.SY cs.DC

    Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure

    Authors: Chaoyun Zhang, Randolph Yao, Si Qin, Ze Li, Shekhar Agrawal, Binit R. Mishra, Tri Tran, Minghua Ma, Qingwei Lin, Murali Chintalapati, Dongmei Zhang

    Abstract: The presence of unhealthy nodes in cloud infrastructure signals the potential failure of machines, which can significantly impact the availability and reliability of cloud services, resulting in negative customer experiences. Effectively addressing unhealthy node mitigation is therefore vital for sustaining cloud system performance. This paper introduces Deoxys, a causal inference engine tailored…

    Submitted 23 October, 2024; originally announced October 2024.

  11. arXiv:2409.17644  [pdf, ps, other]

    eess.SP

    Model-Based Machine Learning for Max-Min Fairness Beamforming Design in JCAS Systems

    Authors: Mengyuan Ma, Tianyu Fang, Nir Shlezinger, A. L. Swindlehurst, Markku Juntti, Nhan Nguyen

    Abstract: Joint communications and sensing (JCAS) is expected to be a crucial technology for future wireless systems. This paper investigates beamforming design for a multi-user multi-target JCAS system to ensure fairness and balance between communications and sensing performance. We jointly optimize the transmit and receive beamformers to maximize the weighted sum of the minimum communications rate and sen…

    Submitted 26 November, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: 5 pages, 5 figures

  12. arXiv:2409.17638  [pdf, ps, other]

    eess.SP

    Digital and Hybrid Precoding Designs in Massive MIMO with Low-Resolution ADCs

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, A. Lee Swindlehurst, Markku Juntti

    Abstract: Low-resolution analog-to-digital converters (ADCs) have emerged as an efficient solution for massive multiple-input multiple-output (MIMO) systems to reap high data rates with reasonable power consumption and hardware complexity. In this paper, we study precoding designs for digital, fully connected (FC) hybrid, and partially connected (PC) hybrid beamforming architectures in massive MIMO systems…

    Submitted 11 February, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: 5 pages, 7 figures

  13. arXiv:2409.04447  [pdf, other]

    cs.SD cs.AI eess.AS

    Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples

    Authors: Qi Fan, Yutong Li, Yi Xin, Xinyu Cheng, Guanglai Gao, Miao Ma

    Abstract: The Multimodal Emotion Recognition challenge MER2024 focuses on recognizing emotions using audio, language, and visual signals. In this paper, we present our submission solutions for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI), which tackles the issue of limited annotated data in emotion recognition. Firstly, to address the class imbalance, we adopt an oversampling strategy. Secondly…

    Submitted 23 August, 2024; originally announced September 2024.

    Comments: Accepted by ACM MM Workshop 2024

  14. arXiv:2408.12239  [pdf, other]

    eess.SP

    Fast Burst-Sparsity Learning Approach for Massive MIMO-OTFS Channel Estimation

    Authors: Ming Ma, Jisheng Dai, Xue-Qin Jiang

    Abstract: Accurate channel estimation in orthogonal time frequency space (OTFS) systems with massive multiple-input multiple-output (MIMO) configurations is challenging due to high-dimensional sparse representation (SR). Existing methods often face performance degradation and/or high computational complexity. To address these issues and exploit intricate channel sparsity structure, this letter first leverag…

    Submitted 27 January, 2025; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  15. arXiv:2408.11837  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy

    Authors: Hanchen David Wang, Nibraas Khan, Anna Chen, Nilanjan Sarkar, Pamela Wisniewski, Meiyi Ma

    Abstract: Recent global estimates suggest that as many as 2.41 billion individuals have health conditions that would benefit from rehabilitation services. Home-based Physical Therapy (PT) faces significant challenges in providing interactive feedback and meaningful observation for therapists and patients. To fill this gap, we present MicroXercise, which integrates micro-motion analysis with wearable sensors…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE/ACM CHASE 2024

  16. arXiv:2408.06227  [pdf]

    cs.CL cs.AI cs.SD eess.AS

    FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks

    Authors: Min Ma, Yuma Koizumi, Shigeki Karita, Heiga Zen, Jason Riesa, Haruko Ishikawa, Michiel Bacchiani

    Abstract: This paper introduces FLEURS-R, a speech restoration applied version of the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) corpus. FLEURS-R maintains an N-way parallel speech corpus in 102 languages as FLEURS, with improved audio quality and fidelity by applying the speech restoration model Miipher. The aim of FLEURS-R is to advance speech technology in more languages…

    Submitted 12 August, 2024; originally announced August 2024.

    Journal ref: INTERSPEECH 2024

  17. Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection

    Authors: Zhenchun Lei, Hui Yan, Changhong Liu, Minglei Ma, Yingen Yang

    Abstract: The automatic speaker verification system is sometimes vulnerable to various spoofing attacks. The 2-class Gaussian Mixture Model classifier for genuine and spoofed speech is usually used as the baseline for spoofing detection. However, the GMM classifier does not separately consider the scores of feature frames on each Gaussian component. In addition, the GMM accumulates the scores on all frames…

    Submitted 8 July, 2024; originally announced July 2024.

  18. arXiv:2407.04408  [pdf, ps, other]

    eess.SP

    Hybrid Receiver Design for Massive MIMO-OFDM with Low-Resolution ADCs and Oversampling

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, Markku Juntti

    Abstract: Low-resolution analog-to-digital converters (ADCs) and hybrid beamforming have emerged as efficient solutions to reduce power consumption with satisfactory spectral efficiency (SE) in massive multiple-input multiple-output (MIMO) systems. In this paper, we investigate the performance of a hybrid receiver in massive MIMO orthogonal frequency-division multiplexing (OFDM) uplink systems with low-reso…

    Submitted 21 January, 2025; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures, to appear in WCNC 2025

  19. arXiv:2407.03796  [pdf, ps, other]

    eess.SP

    Joint Beamforming Design and Bit Allocation in Massive MIMO with Resolution-Adaptive ADCs

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, Markku Juntti

    Abstract: Low-resolution analog-to-digital converters (ADCs) have emerged as a promising technology for reducing power consumption and complexity in massive multiple-input multiple-output (MIMO) systems while maintaining satisfactory spectral and energy efficiencies (SE/EE). In this work, we first identify the essential properties of optimal quantization and leverage them to derive a closed-form approximati…

    Submitted 5 May, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: 15 pages, 13 figures

  20. GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection

    Authors: Zhenchun Lei, Hui Yan, Changhong Liu, Yong Zhou, Minglei Ma

    Abstract: Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose the GMM-ResNet2 for synthesis speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. Firstly, the different order GMMs have different capabilities to form smooth approximations to the feature distribution, and multiple GMMs are used to extract multi-scal…

    Submitted 2 July, 2024; originally announced July 2024.

  21. arXiv:2405.18739  [pdf, other]

    cs.NI eess.SP

    FlocOff: Data Heterogeneity Resilient Federated Learning with Communication-Efficient Edge Offloading

    Authors: Mulei Ma, Chenyu Gong, Liekang Zeng, Yang Yang, Liantao Wu

    Abstract: Federated Learning (FL) has emerged as a fundamental learning paradigm to harness massive data scattered at geo-distributed edge devices in a privacy-preserving way. Given the heterogeneous deployment of edge devices, however, their data are usually Non-IID, introducing significant challenges to FL including degraded training accuracy, intensive communication costs, and high computing complexity.…

    Submitted 28 May, 2024; originally announced May 2024.

  22. arXiv:2404.06674  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

    Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

    Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  23. arXiv:2404.04904  [pdf, other]

    cs.SD cs.AI eess.AS

    Cross-Domain Audio Deepfake Detection: Dataset and Analysis

    Authors: Yuang Li, Min Zhang, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Hao Yang

    Abstract: Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-…

    Submitted 20 September, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  24. arXiv:2403.02039  [pdf, other]

    eess.SY

    A Frequency-Domain Approach for Enhanced Performance and Task Flexibility in Finite-Time ILC

    Authors: Max van Haren, Kentaro Tsurumoto, Masahiro Mae, Lennart Blanken, Wataru Ohnishi, Tom Oomen

    Abstract: Iterative learning control (ILC) is capable of improving the tracking performance of repetitive control systems by utilizing data from past iterations. The aim of this paper is to achieve both task flexibility, which is often achieved by ILC with basis functions, and the performance of frequency-domain ILC, with an intuitive design procedure. The cost function of norm-optimal ILC is determined tha…

    Submitted 4 March, 2024; originally announced March 2024.

  25. arXiv:2402.12482  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech

    Authors: Adam Sabra, Cyprian Wronka, Michelle Mao, Samer Hijazi

    Abstract: As more speech technologies rely on a supervised deep learning approach with clean speech as the ground truth, a methodology to onboard said speech at scale is needed. However, this approach needs to minimize the dependency on human listening and annotation, only requiring a human-in-the-loop when needed. In this paper, we address this issue by outlining Speech Enhancement-based Curation Pipeline…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

  26. arXiv:2311.07613   

    eess.SY cs.LG math.DS

    A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements

    Authors: Mason Ma, Jiajie Wu, Chase Post, Tony Shi, Jingang Yi, Tony Schmitz, Hong Wang

    Abstract: This study presents a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements. Existing data-driven control methods that use machine learning for system identification cannot effectively cope with highly noisy measurements, resulting in unstable control performance. To address this challenge, the present study extends current physics-info…

    Submitted 22 March, 2025; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: We completely redesigned and rewrote this paper. It will be a completely different paper with different title, author list, and content

  27. arXiv:2311.00332  [pdf, other]

    q-bio.TO cs.CV eess.IV

    SDF4CHD: Generative Modeling of Cardiac Anatomies with Congenital Heart Defects

    Authors: Fanwei Kong, Sascha Stocker, Perry S. Choi, Michael Ma, Daniel B. Ennis, Alison Marsden

    Abstract: Congenital heart disease (CHD) encompasses a spectrum of cardiovascular structural abnormalities, often requiring customized treatment plans for individual patients. Computational modeling and analysis of these unique cardiac anatomies can improve diagnosis and treatment planning and may ultimately lead to improved outcomes. Deep learning (DL) methods have demonstrated the potential to enable effi…

    Submitted 8 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  28. arXiv:2310.15407  [pdf, ps, other]

    eess.SY eess.SP

    Finite-Time Adaptive Fuzzy Tracking Control for Nonlinear State Constrained Pure-Feedback Systems

    Authors: Ju Wu, Tong Wang, Min Ma

    Abstract: This paper investigates the finite-time adaptive fuzzy tracking control problem for a class of pure-feedback system with full-state constraints. With the help of Mean-Value Theorem, the pure-feedback nonlinear system is transformed into strict-feedback case. By employing finite-time-stable like function and state transformation for output tracking error, the output tracking error converges to a pr…

    Submitted 28 December, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: typos checked and corrected in 'Introduction'

  29. arXiv:2310.08804  [pdf, other]

    eess.SP

    Spiking Semantic Communication for Feature Transmission with HARQ

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

    Abstract: In Collaborative Intelligence (CI), the Artificial Intelligence (AI) model is divided between the edge and the cloud, with intermediate features being sent from the edge to the cloud for inference. Several deep learning-based Semantic Communication (SC) models have been proposed to reduce feature transmission overhead and mitigate channel noise interference. Previous research has demonstrated that…

    Submitted 12 October, 2023; originally announced October 2023.

  30. arXiv:2309.10567  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multimodal Modeling For Spoken Language Identification

    Authors: Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

    Abstract: Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however in the case of video data there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI,…

    Submitted 19 September, 2023; originally announced September 2023.

  31. arXiv:2308.01317  [pdf]

    cs.CV eess.IV

    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

    Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, et al. (3 additional authors not shown)

    Abstract: In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR ach…

    Submitted 7 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  32. arXiv:2308.00393  [pdf, other]

    cs.LG eess.SP

    A Survey of Time Series Anomaly Detection Methods in the AIOps Domain

    Authors: Zhenyu Zhong, Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei

    Abstract: Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs) as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection…

    Submitted 1 August, 2023; originally announced August 2023.

  33. arXiv:2307.10982  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    MASR: Multi-label Aware Speech Representation

    Authors: Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth

    Abstract: In the recent years, speech representation learning is constructed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone, while ignoring the side-information that is often available for a given speech recording. In this paper, we propose MASR, a Multi-label Aware Speech Representation learning framework, which addresses the aforementioned limitations. MASR enables th…

    Submitted 25 September, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted at ASRU 2023

  34. arXiv:2307.04327  [pdf]

    cs.RO eess.SY

    Legal Decision-making for Highway Automated Driving

    Authors: Xiaohan Ma, Wenhao Yu, Chengxiang Zhao, Changjun Wang, Wenhui Zhou, Guangming Zhao, Mingyue Ma, Weida Wang, Lin Yang, Rui Mu, Hong Wang, Jun Li

    Abstract: Compliance with traffic laws is a fundamental requirement for human drivers on the road, and autonomous vehicles must adhere to traffic laws as well. However, current autonomous vehicles prioritize safety and collision avoidance primarily in their decision-making and planning, which will lead to misunderstandings and distrust from human drivers and may even result in accidents in mixed traffic flo…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 14 pages, 17 figures

  35. Analysis of Oversampling in Uplink Massive MIMO-OFDM with Low-Resolution ADCs

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, Markku Juntti

    Abstract: Low-resolution analog-to-digital converters (ADCs) have emerged as an efficient solution for massive multiple-input multiple-output (MIMO) systems to reap high data rates with reasonable power consumption and hardware complexity. In this paper, we analyze the performance of oversampling in uplink massive MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) systems with low-resolution ADCs.…

    Submitted 9 November, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Appeared in IEEE SPAWC2023. This version corrects some symbol typos

    Journal ref: 2023 IEEE 24th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

  36. arXiv:2306.10232   

    cs.NI eess.SP

    Multi-Task Offloading via Graph Neural Networks in Heterogeneous Multi-access Edge Computing

    Authors: Mulei Ma

    Abstract: In the rapidly evolving field of Heterogeneous Multi-access Edge Computing (HMEC), efficient task offloading plays a pivotal role in optimizing system throughput and resource utilization. However, existing task offloading methods often fall short of adequately modeling the dependency topology relationships between offloaded tasks, which limits their effectiveness in capturing the complex interdepe…

    Submitted 30 May, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Insufficient completion, there are some errors in the current version

  37. arXiv:2306.04374  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Label Aware Speech Representation Learning For Language Identification

    Authors: Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

    Abstract: Speech representation learning approaches for non-semantic tasks such as language recognition have either explored supervised embedding extraction methods using a classifier model or self-supervised representation learning approaches using raw data. In this paper, we propose a novel framework of combining self-supervised representation learning with the language label information for the pre-train…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted at Interspeech 2023

  38. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  39. Beam Squint Analysis and Mitigation via Hybrid Beamforming Design in THz Communications

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Markku Juntti

    Abstract: We investigate the beam squint effect in uniform planar arrays (UPAs) and propose an efficient hybrid beamforming (HBF) design to mitigate the beam squint in multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems operating at terahertz band. We first analyze the array gain and derive the closed-form beam squint ratio that characterizes the severity of the bea…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 6 pages, 7 figures, to appear in IEEE ICC 2023

  40. arXiv:2303.03470  [pdf, other]

    cs.CR eess.SY

    What Would Trojans Do? Exploiting Partial-Information Vulnerabilities in Autonomous Vehicle Sensing

    Authors: R. Spencer Hallyburton, Qingzhao Zhang, Z. Morley Mao, Michael Reiter, Miroslav Pajic

    Abstract: Safety-critical sensors in autonomous vehicles (AVs) form an essential part of the vehicle's trusted computing base (TCB), yet they are highly susceptible to attacks. Alarmingly, Tier 1 manufacturers have already exposed vulnerabilities to attacks introducing Trojans that can stealthily alter sensor outputs. We analyze the feasible capability and safety-critical outcomes of an attack on sensing at…

    Submitted 13 March, 2025; v1 submitted 6 March, 2023; originally announced March 2023.

  41. arXiv:2303.01723  [pdf, other]

    cs.IT cs.AI eess.SP

    AI-Empowered Hybrid MIMO Beamforming

    Authors: Nir Shlezinger, Mengyuan Ma, Ortal Lavi, Nhan Thanh Nguyen, Yonina C. Eldar, Markku Juntti

    Abstract: Hybrid multiple-input multiple-output (MIMO) is an attractive technology for realizing extreme massive MIMO systems envisioned for future wireless communications in a scalable and power-efficient manner. However, the fact that hybrid MIMO systems implement part of their beamforming in analog and part in digital makes the optimization of their beampattern notably more challenging compared with conv…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: This work has been submitted to the IEEE for possible publication

  42. arXiv:2302.12041  [pdf, other]

    cs.IT eess.SP

    Deep Unfolding Hybrid Beamforming Designs for THz Massive MIMO Systems

    Authors: Nhan Thanh Nguyen, Mengyuan Ma, Nir Shlezinger, Yonina C. Eldar, A. L. Swindlehurst, Markku Juntti

    Abstract: Hybrid beamforming (HBF) is a key enabler for wideband terahertz (THz) massive multiple-input multiple-output (mMIMO) communications systems. A core challenge with designing HBF systems stems from the fact their application often involves a non-convex, highly complex optimization of large dimensions. In this paper, we propose HBF schemes that leverage data to enable efficient designs for both the…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: This paper has been submitted to IEEE Transactions on Signal Processing

  43. arXiv:2212.05751  [pdf, other]

    eess.AS

    Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

    Authors: Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang

    Abstract: The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and data augmentation. Previous methods rely on reference utterances in the inference phase or are unable to preserve speaker identity. To address these issues, we pr…

    Submitted 10 August, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted by INTERSPEECH 2023

  44. arXiv:2210.06890  [pdf, ps, other]

    eess.SP

    Switch-based Hybrid Beamforming Transceiver Design for Wideband Communications with Beam Squint

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Markku Juntti

    Abstract: Hybrid beamforming (HBF) transceiver architectures based on frequency-independent phase shifters (PS-HBF) are sensitive to the phases and physical directions with limited capability to compensate for the detrimental effects of the beam squint. Motivated by the fact that switches are phase-independent and more power/cost efficient than PSs, we consider the switch-based HBF (SW-HBF) for wideband lar…

    Submitted 26 September, 2024; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 16 pages, 20 figures

  45. arXiv:2210.06836  [pdf, other]

    eess.SP

    SNN-SC: A Spiking Semantic Communication Framework for Collaborative Intelligence

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

    Abstract: Collaborative Intelligence (CI) has emerged as a promising framework for deploying Artificial Intelligence (AI) models on resource-constrained edge devices. In CI, the AI model is partitioned between the edge device and the cloud, with intermediate features transmitted from the edge sub-model to the cloud sub-model to complete the inference task. However, reducing feature transmission overhead whi…

    Submitted 22 November, 2024; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted for publication in the IEEE Transactions on Vehicular Technology

  46. arXiv:2210.06747  [pdf, other]

    eess.IV cs.CV

    DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation

    Authors: Lizhi Bai, Jun Yang, Chunqi Tian, Yaoru Sun, Maoyu Mao, Yanjun Xu, Weirong Xu

    Abstract: Combining RGB images and the corresponding depth maps in semantic segmentation proves the effectiveness in the past few years. Existing RGB-D modal fusion methods either lack the non-linear feature fusion ability or treat both modal images equally, regardless of the intrinsic distribution gap or information loss. Here we find that depth maps are suitable to provide intrinsic fine-grained patterns…

    Submitted 13 October, 2022; originally announced October 2022.

  47. arXiv:2208.02792  [pdf]

    cs.RO eess.SY

    A Cooperative Perception Environment for Traffic Operations and Control

    Authors: Hanlin Chen, Brian Liu, Xumiao Zhang, Feng Qian, Z. Morley Mao, Yiheng Feng

    Abstract: Existing data collection methods for traffic operations and control usually rely on infrastructure-based loop detectors or probe vehicle trajectories. Connected and automated vehicles (CAVs) not only can report data about themselves but also can provide the status of all detected surrounding vehicles. Integration of perception data from multiple CAVs as well as infrastructure sensors (e.g., LiDAR)…

    Submitted 4 August, 2022; originally announced August 2022.

  48. Constellation Design for Deep Joint Source-Channel Coding

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

    Abstract: Deep learning-based joint source-channel coding (JSCC) has shown excellent performance in image and feature transmission. However, the output values of the JSCC encoder are continuous, which makes the constellation of modulation complex and dense. It is hard and expensive to design radio frequency chains for transmitting such full-resolution constellation points. In this paper, two methods of mapp…

    Submitted 7 June, 2022; originally announced June 2022.

  49. arXiv:2205.12446  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

    Authors: Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna

    Abstract: We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Languag…

    Submitted 24 May, 2022; originally announced May 2022.

  50. arXiv:2205.03524  [pdf, other]

    eess.IV cs.CV

    Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

    Authors: Xiaoqian Xu, Pengxu Wei, Weikai Chen, Mingzhi Mao, Liang Lin, Guanbin Li

    Abstract: Due to the sophisticated imaging process, an identical scene captured by different cameras could exhibit distinct imaging patterns, introducing distinct proficiency among the super-resolution (SR) models trained on images from different devices. In this paper, we investigate a novel and practical task coded cross-device SR, which strives to adapt a real-world SR model trained on the paired images…

    Submitted 6 May, 2022; originally announced May 2022.