
Showing 1–50 of 93 results for author: Yu, M

Searching in archive eess.
  1. arXiv:2506.07374

    eess.SY

    Extended Version of "Distributed Adaptive Resilient Consensus Control for Uncertain Nonlinear Multiagent Systems Against Deception Attacks"

    Authors: Mengze Yu, Wei Wang, Jiaqi Yan

    Abstract: This paper studies the distributed resilient consensus problem for a class of uncertain nonlinear multiagent systems susceptible to deception attacks. The attacks invade both the sensor and actuator channels of each agent. A specific class of Nussbaum functions is adopted to handle the multiple unknown control directions incurred by the attacks. Additionally, a general form of these Nussbaum functions is provided…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 7 pages, 6 figures. Submitted to IEEE Control Systems Letters

  2. arXiv:2504.20854

    cs.NI cs.AI cs.DC eess.SY

    Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

    Authors: Jinsun Yoo, ChonLam Lao, Lianjie Cao, Bob Lantz, Minlan Yu, Tushar Krishna, Puneet Sharma

    Abstract: This paper lays the foundation for Genie, a testing framework that captures the impact of real hardware network behavior on ML workload performance, without requiring expensive GPUs. Genie uses CPU-initiated traffic over a hardware testbed to emulate GPU-to-GPU communication, and adapts the ASTRA-sim simulator to model the interaction between the network and the ML workload.

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Presented as a poster at NSDI 2025

  3. arXiv:2504.13624

    eess.SP

    PV-VLM: A Multimodal Vision-Language Approach Incorporating Sky Images for Intra-Hour Photovoltaic Power Forecasting

    Authors: Huapeng Lin, Miao Yu

    Abstract: The rapid proliferation of solar energy has significantly expedited the integration of photovoltaic (PV) systems into contemporary power grids. Because cloud dynamics frequently induce rapid fluctuations in solar irradiance, accurate intra-hour forecasting is critical for ensuring grid stability and facilitating effective energy management. To leverage complementary temporal, textual,…

    Submitted 18 April, 2025; originally announced April 2025.

  4. arXiv:2503.14154

    cs.CV cs.MM eess.IV

    RBFIM: Perceptual Quality Assessment for Compressed Point Clouds Using Radial Basis Function Interpolation

    Authors: Zhang Chen, Shuai Wan, Siyu Ren, Fuzheng Yang, Mengting Yu, Junhui Hou

    Abstract: One of the main challenges in point cloud compression (PCC) is how to evaluate the perceived distortion so that the codec can be optimized for perceptual quality. Current standard practices in PCC highlight a primary issue: while single-feature metrics are widely used to assess compression distortion, the classic method of searching point-to-point nearest neighbors frequently fails to adequately b…

    Submitted 18 March, 2025; originally announced March 2025.

  5. arXiv:2503.12936

    eess.AS

    FNSE-SBGAN: Far-field Speech Enhancement with Schrodinger Bridge and Generative Adversarial Networks

    Authors: Tong Lei, Qinwen Hu, Ziyao Lin, Andong Li, Rilin Chen, Meng Yu, Dong Yu, Jing Lu

    Abstract: The prevailing method for neural speech enhancement predominantly utilizes fully-supervised deep learning with simulated pairs of far-field noisy-reverberant speech and clean speech. Nonetheless, these models frequently demonstrate restricted generalizability to mixtures recorded in real-world conditions. To address this issue, this study investigates training enhancement models directly on real m…

    Submitted 15 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: 13 pages, 6 figures

  6. arXiv:2503.06216

    eess.SP

    A Novel Distributed PV Power Forecasting Approach Based on Time-LLM

    Authors: Huapeng Lin, Miao Yu

    Abstract: Distributed photovoltaic (DPV) systems are essential for advancing renewable energy applications and achieving energy independence. Accurate DPV power forecasting can optimize power system planning and scheduling while significantly reducing energy loss, thus enhancing overall system efficiency and reliability. However, solar energy's intermittent nature and DPV systems' spatial distribution creat…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 23 pages, 8 figures

  7. arXiv:2502.14145

    cs.CL eess.AS

    LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems

    Authors: Hao Zhang, Weiwei Li, Rilin Chen, Vinay Kothapally, Meng Yu, Dong Yu

    Abstract: Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD pred…

    Submitted 24 February, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: In submission to INTERSPEECH 2025

  8. arXiv:2412.16773

    stat.ML cs.LG eess.SP q-bio.NC

    Fast Multi-Group Gaussian Process Factor Models

    Authors: Evren Gokcen, Anna I. Jasper, Adam Kohn, Christian K. Machens, Byron M. Yu

    Abstract: Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian process factor models to characterize multi-population int…

    Submitted 21 December, 2024; originally announced December 2024.

  9. Multimodal 3D Brain Tumor Segmentation with Adversarial Training and Conditional Random Field

    Authors: Lan Jiang, Yuchao Zheng, Miao Yu, Haiqing Zhang, Fatemah Aladwani, Alessandro Perelli

    Abstract: Accurate brain tumor segmentation remains a challenging task due to the structural complexity and large individual variability of gliomas. Leveraging the pre-eminent detail resilience of CRF and the spatial feature extraction capacity of V-net, we propose a multimodal 3D Volume Generative Adversarial Network (3D-vGAN) for precise segmentation. The model utilizes Pseudo-3D for V-net improvement, adds condi…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: 13 pages, 7 figures, Annual Conference on Medical Image Understanding and Analysis (MIUA) 2024

    MSC Class: 15-11

    ACM Class: I.4.6; I.5.4

    Journal ref: Medical Image Understanding and Analysis (MIUA), Lecture Notes in Computer Science, Springer, vol. 14859, 2024

  10. arXiv:2410.13720

    cs.CV cs.AI cs.LG eess.IV

    Movie Gen: A Cast of Media Foundation Models

    Authors: Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, et al. (63 additional authors not shown)

    Abstract: We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization,…

    Submitted 26 February, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

  11. arXiv:2410.02714

    eess.IV cs.CV cs.LG

    AlzhiNet: Traversing from 2DCNN to 3DCNN, Towards Early Detection and Diagnosis of Alzheimer's Disease

    Authors: Romoke Grace Akindele, Samuel Adebayo, Paul Shekonya Kanda, Ming Yu

    Abstract: Alzheimer's disease (AD) is a progressive neurodegenerative disorder with increasing prevalence among the aging population, necessitating early and accurate diagnosis for effective disease management. In this study, we present a novel hybrid deep learning framework that integrates both 2D Convolutional Neural Networks (2D-CNN) and 3D Convolutional Neural Networks (3D-CNN), along with a custom loss…

    Submitted 3 October, 2024; originally announced October 2024.

  12. arXiv:2410.01150

    eess.AS cs.SD

    Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

    Authors: Hsin-Tien Chiang, Hao Zhang, Yong Xu, Meng Yu, Dong Yu

    Abstract: In challenging environments with significant noise and reverberation, traditional speech enhancement (SE) methods often lead to over-suppressed speech, creating artifacts during listening and harming downstream task performance. To overcome these limitations, we propose a novel approach called Restorative SE (RestSE), which combines a lightweight SE module with a generative codec module to progre…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Paper in submission

  13. arXiv:2410.00997

    eess.SP q-bio.TO

    A novel ultrasonic device for monitoring implant condition

    Authors: Amirhossein Yazdkhasti, Sophie Lloyd, Joseph H. Schwab, Miao Yu, Hamid Ghaednia

    Abstract: Every year, more than 2.3 million joint replacements are performed worldwide. Around 10% of these replacements fail, resulting in revisions at a cost of $8 billion per year. In particular, patients younger than 55 years of age face higher risks of failure due to greater demand on their joints. The long-term failure of joint replacements, such as implant loosening, significantly decreases the life expe…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 12 Pages, 8 figures, 1 table

  14. arXiv:2409.07556

    eess.AS cs.SD

    SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

    Authors: Helin Wang, Meng Yu, Jiarui Hai, Chen Chen, Yuchen Hu, Rilin Chen, Najim Dehak, Dong Yu

    Abstract: In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis. SSR-Speech is built on a Transformer decoder and incorporates classifier-free guidance to enhance the stability of the generation process. A watermark Encodec is proposed to embed frame-level watermarks into the edited re…

    Submitted 1 January, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: ICASSP 2025

  15. Performance Assessment of Feature Detection Methods for 2-D FS Sonar Imagery

    Authors: Hitesh Kyatham, Shahriar Negahdaripour, Michael Xu, Xiaomin Lin, Miao Yu, Yiannis Aloimonos

    Abstract: Underwater robot perception is crucial in scientific subsea exploration and commercial operations. The key challenges include non-uniform lighting and poor visibility in turbid environments. High-frequency forward-look sonar cameras address these issues by providing high-resolution imagery at a maximum range of tens of meters, despite complexities posed by a high degree of speckle noise and a lack of…

    Submitted 11 September, 2024; originally announced September 2024.

    Journal ref: OCEANS 2024 - Halifax

  16. arXiv:2409.06954

    eess.AS

    Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

    Authors: Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu

    Abstract: Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage netw…

    Submitted 16 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  17. arXiv:2408.09315

    eess.IV cs.CV

    Unpaired Volumetric Harmonization of Brain MRI with Conditional Latent Diffusion

    Authors: Mengqi Wu, Minhui Yu, Shuaiming Jing, Pew-Thian Yap, Zhengwu Zhang, Mingxia Liu

    Abstract: Multi-site structural MRI is increasingly used in neuroimaging studies to diversify subject cohorts. However, combining MR images acquired from various sites/centers may introduce site-related non-biological variations. Retrospective image harmonization helps address this issue, but current methods usually perform harmonization on pre-extracted hand-crafted radiomic features, limiting downstream a…

    Submitted 17 August, 2024; originally announced August 2024.

  18. arXiv:2406.19043

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Cheng Ouyang, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinical gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is expected to enable time-efficient and patient-friendly imaging, which in turn requires advanced image reconstruction approaches to recover h…

    Submitted 16 January, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 23 pages, 3 figures, 2 tables

  19. SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

    Authors: Zhihang Sun, Andong Li, Rilin Chen, Hao Zhang, Meng Yu, Yi Zhou, Dong Yu

    Abstract: The proliferation of deep neural networks has spawned rapid development in acoustic echo cancellation and noise suppression, and many prior methods with promising performance have been proposed. Nevertheless, they rarely consider deployment generality across different processing scenarios, such as edge devices and cloud processing. To this end, this paper proposes a general model, t…

    Submitted 24 January, 2025; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 8 pages, Accepted to SLT 2024

    Journal ref: 2024 IEEE Spoken Language Technology Workshop (SLT), pp. 317-324, 2024

  20. arXiv:2406.09589

    eess.AS

    Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

    Authors: Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur

    Abstract: In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge. Traditional approaches often rely on microphone array configurations and the information of the target speaker's location or voiceprint. This study introduces the Solo Spatial Feature (S…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024

  21. arXiv:2405.19213

    eess.SY cs.AI cs.LG cs.NI

    EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge

    Authors: ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu

    Abstract: Traditional ML inference is evolving toward modeless inference, which abstracts the complexity of model selection from users, allowing the system to automatically choose the most appropriate model for each request based on accuracy and resource requirements. While prior studies have focused on modeless inference within data centers, this paper tackles the pressing need for cost-efficient modeless…

    Submitted 14 January, 2025; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 12 pages

  22. A Valuation Framework for Customers Impacted by Extreme Temperature-Related Outages

    Authors: Min Gyung Yu, Monish Mukherjee, Shiva Poudela, Sadie R. Bender, Sarmad Hanif, Trevor D. Hardy, Hayden M. Reeve

    Abstract: Extreme temperature outages can lead to not just economic losses but also various non-energy impacts (NEI) due to significant degradation of indoor operating conditions caused by service disruptions. However, existing resilience assessment approaches lack specificity for extreme temperature conditions. They often overlook temperature-related mortality and neglect the customer characteristics and g…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: Appl. Energy 368 (2024) 123450

  23. arXiv:2405.02504

    eess.IV cs.CV

    Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI

    Authors: Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu

    Abstract: Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain f…

    Submitted 11 November, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  24. arXiv:2403.05901

    cs.ET eess.SY

    Unleashing the Power of T1-cells in SFQ Arithmetic Circuits

    Authors: Rassul Bairamkulov, Mingfei Yu, Giovanni De Micheli

    Abstract: Rapid single-flux quantum (RSFQ), a leading cryogenic superconductive electronics (SCE) technology, offers extremely low power dissipation and high speed. However, implementing RSFQ systems at VLSI complexity faces challenges, such as substantial area overhead from gate-level pipelining and path balancing, exacerbated by RSFQ's limited layout density. T1 flip-flop (T1-FF) is an RSFQ logic cell ope…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: To appear at the 2024 ACM/IEEE Design Automation and Test in Europe, Valencia, Spain, 25-27 March 2024. 2 pages, 1 figure, 1 table

  25. arXiv:2401.06650

    eess.SY

    LMI-based robust model predictive control for a quarter car with series active variable geometry suspension

    Authors: Zilin Feng, Anastasis Georgiou, Simos A. Evangelou, Min Yu, Imad M Jaimoukha, Daniele Dini

    Abstract: This paper proposes a robust model predictive control-based solution for the recently introduced series active variable geometry suspension (SAVGS) to improve the ride comfort and road holding of a quarter car. In order to close the gap between the nonlinear multi-body SAVGS model and its linear equivalent, a new uncertain system characterization is proposed that captures unmodeled dynamics, param…

    Submitted 29 January, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 13 pages, 11 figures, 2 tables, IEEE Transactions on Control Systems Technology

  26. arXiv:2311.14316

    eess.SP cs.AI

    Windformer:Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction

    Authors: Xuewei Li, Zewen Shang, Zhiqiang Liu, Jian Yu, Wei Xiong, Mei Yu

    Abstract: Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes, whether this dynamic in…

    Submitted 24 November, 2023; originally announced November 2023.

  27. arXiv:2311.13075

    eess.AS

    Deep Audio Zooming: Beamwidth-Controllable Neural Beamformer

    Authors: Meng Yu, Dong Yu

    Abstract: Audio zooming, a signal processing technique, enables selective focusing and enhancement of sound signals from a specified region, attenuating others. While traditional beamforming and neural beamforming techniques, centered on creating a directional array, necessitate the designation of a singular target direction, they often overlook the concept of a field of view (FOV), which defines an angular…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 6 pages, 5 figures

  28. arXiv:2311.08271

    cs.LG cs.IT cs.NI eess.SP

    Mobility-Induced Graph Learning for WiFi Positioning

    Authors: Kyuwon Han, Seung Min Yu, Seong-Lyun Kim, Seung-Woo Ko

    Abstract: Smartphone-based user mobility tracking can be effective in locating a user, but the unpredictable error caused by the low specifications of built-in inertial measurement units (IMUs) rules out its standalone use and demands integration with another positioning technique, such as WiFi positioning. This paper aims to propose a novel integration technique using a graph neural network ca…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: submitted to a possible IEEE journal

  29. arXiv:2311.05477

    eess.IV cs.CV cs.LG

    Using ResNet to Utilize 4-class T2-FLAIR Slice Classification Based on the Cholinergic Pathways Hyperintensities Scale for Pathological Aging

    Authors: Wei-Chun Kevin Tsai, Yi-Chien Liu, Ming-Chun Yu, Chia-Ju Chou, Sui-Hing Yan, Yang-Teng Fan, Yan-Hsiang Huang, Yen-Ling Chiu, Yi-Fang Chuang, Ran-Zan Wang, Yao-Chia Shih

    Abstract: The Cholinergic Pathways Hyperintensities Scale (CHIPS) is a visual rating scale used to assess the extent of cholinergic white matter hyperintensities in T2-FLAIR images, serving as an indicator of dementia severity. However, the manual selection of four specific slices for rating throughout the entire brain is a time-consuming process. Our goal was to develop a deep learning-based model capable…

    Submitted 11 September, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: 8 pages, 2 figures, 2 tables

  30. arXiv:2310.13177

    eess.SY

    Enhancing Building Energy Efficiency through Advanced Sizing and Dispatch Methods for Energy Storage

    Authors: Min Gyung Yu, Xu Ma, Bowen Huang, Karthik Devaprasad, Fredericka Brown, Di Wu

    Abstract: Energy storage and electrification of buildings hold great potential for future decarbonized energy systems. However, there are several technical and economic barriers that prevent large-scale adoption and integration of energy storage in buildings. These barriers include integration with building control systems, high capital costs, and the necessity to identify and quantify value streams for dif…

    Submitted 19 October, 2023; originally announced October 2023.

  31. arXiv:2310.03608

    eess.IV cs.CV

    How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

    Authors: Menghan Yu, Sourabh Kulhare, Courosh Mehanian, Charles B Delahunt, Daniel E Shea, Zohreh Laverriere, Ishan Shah, Matthew P Horning

    Abstract: Acquiring large quantities of data and annotations is known to be effective for developing high-performing deep learning models, but is difficult and expensive to do in the healthcare context. Adding synthetic training data using generative models offers a low-cost method to deal effectively with the data scarcity challenge, and can also address data imbalance and patient privacy issues. In this s…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted at Simulation and Synthesis in Medical Imaging (SASHIMI)

  32. arXiv:2309.16049

    eess.AS cs.SD eess.SP

    Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

    Authors: Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu

    Abstract: Acoustic howling suppression (AHS) is a critical challenge in audio communication systems. In this paper, we propose a novel approach that leverages the power of neural networks (NN) to enhance the performance of traditional Kalman filter algorithms for AHS. Specifically, our method involves the integration of NN modules into the Kalman filter, enabling refinement of the reference signal, a key factor in e…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Paper in submission

  33. arXiv:2309.16048

    eess.AS cs.SD eess.SP

    Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

    Authors: Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu

    Abstract: In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process. This framework integrates a neural network (NN) module into the closed-loop system during training with signals generated recursively on the fly to closely mimic the streaming process of acoustic howling suppression (AHS). The propose…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Paper in submission

  34. arXiv:2309.09028

    eess.AS cs.SD

    Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

    Authors: Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

    Abstract: Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing. Existing deep-learning-based enhancement methods often struggle to effectively remove background noise and reverberation in real-world scenarios, hampering listening experiences. To address these challenges, we propose a novel approach that uses pre-trained generative methods to resynth…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: Paper in submission

  35. arXiv:2307.03668

    eess.SP

    Using electrical impedance spectroscopy to identify equivalent circuit models of lubricated contacts with complex geometry: in-situ application to mini traction machine

    Authors: Min Yu, Jie Zhang, Arndt Joedicke, Tom Reddyhoff

    Abstract: Electrical contact resistance or capacitance, as measured across a lubricated contact, has been used in tribometers and partially reflects the lubrication condition. In contrast, the electrical impedance provides rich magnitude and phase information, which can be interpreted using equivalent circuit models, enabling more comprehensive measurements, including the variation of lubricant film thickn…

    Submitted 7 July, 2023; originally announced July 2023.

  36. arXiv:2305.02583

    eess.AS cs.SD

    Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

    Authors: Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

    Abstract: Deep learning has been recently introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage thei…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Submitted to INTERSPEECH 2023. arXiv admin note: text overlap with arXiv:2302.09252

  37. arXiv:2305.01637

    eess.AS cs.SD

    Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings

    Authors: Hao Zhang, Meng Yu, Dong Yu

    Abstract: Hybrid meetings have become increasingly necessary in the post-COVID period and have also brought new challenges for solving audio-related problems. In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes their joint suppression difficult. This paper proposes a deep learning approach to tackle this problem by formulating a recurrent feedback suppressi…

    Submitted 4 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  38. arXiv:2302.13273

    cs.SD cs.MM eess.AS

    Two-Stream Joint-Training for Speaker Independent Acoustic-to-Articulatory Inversion

    Authors: Jianrong Wang, Jinyu Liu, Li Liu, Xuewei Li, Mei Yu, Jie Gao, Qiang Fang

    Abstract: Acoustic-to-articulatory inversion (AAI) aims to estimate the parameters of articulators from speech audio. Two common challenges in AAI are limited data and unsatisfactory performance in speaker-independent scenarios. Most current works focus on extracting features directly from speech and ignore the importance of phoneme information, which may limit the performance of AA…

    Submitted 26 February, 2023; originally announced February 2023.

  39. arXiv:2302.09252

    eess.AS cs.SD

    Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression

    Authors: Hao Zhang, Meng Yu, Dong Yu

    Abstract: In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it. Deep AHS is trained in a teacher-forcing manner, which converts the recurrent howling suppression process into an instantaneous speech separation process to simplify the problem and accelerate the model training. The proposed method utili…

    Submitted 17 August, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted for publication in 2023 ICASSP

  40. arXiv:2301.12363

    eess.AS cs.SD

    NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation

    Authors: Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang

    Abstract: The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges. However, the inability to model nonlinearity and the need to tune control parameters impose limitations on such adaptive filtering algorithms. In this paper, we integrate the frequency domain Kalman filter (FDKF) and deep neural networks…

    Submitted 26 December, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: The algorithm was renamed because its previous name conflicted with the existing KalmanNet algorithm proposed by Revach et al. (arXiv:2107.10043). Accepted by ASRU 2023

  41. arXiv:2212.12810

    eess.IV cs.CV

    Hybrid Representation Learning for Cognitive Diagnosis in Late-Life Depression Over 5 Years with Structural MRI

    Authors: Lintao Zhang, Lihong Wang, Minhui Yu, Rong Wu, David C. Steffens, Guy G. Potter, Mingxia Liu

    Abstract: Late-life depression (LLD) is a highly prevalent mood disorder occurring in older adults and is frequently accompanied by cognitive impairment (CI). Studies have shown that LLD may increase the risk of Alzheimer's disease (AD). However, the heterogeneity of presentation of geriatric depression suggests that multiple biological mechanisms may underlie it. Current biological research on LLD progress…

    Submitted 24 December, 2022; originally announced December 2022.

  42. arXiv:2212.03997  [pdf, other]

    eess.SY

    Analyzing At-Scale Distribution Grid Response to Extreme Temperatures

    Authors: Sarmad Hanif, Monish Mukherjee, Shiva Poudel, Rohit A Jinsiwale, Min Gyung Yu, Trevor Hardy, Hayden Reeve

    Abstract: Threats against power grids continue to increase as extreme weather conditions and natural disasters (extreme events) become more frequent. Hence, there is a need for simulation and modeling of power grids, especially distribution systems, that reflect realistic conditions during extreme events. This paper presents a modeling and simulation platform for electric distribution grids whi…

    Submitted 7 December, 2022; originally announced December 2022.

  43. arXiv:2211.12590  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Deep Neural Mel-Subband Beamformer for In-car Speech Separation

    Authors: Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

    Abstract: While current deep learning (DL)-based beamforming techniques have proven effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently, which results in higher computational costs and inference times, making them unsuitable for real-world use. In this paper, we propose a DL-based mel-subband spatio-temporal beamformer to perform speech separation…

    Submitted 11 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  44. arXiv:2211.10023  [pdf, other]

    cs.CV cs.LG eess.IV

    LiSnowNet: Real-time Snow Removal for LiDAR Point Cloud

    Authors: Ming-Yuan Yu, Ram Vasudevan, Matthew Johnson-Roberson

    Abstract: LiDARs have been widely adopted in modern self-driving vehicles, providing 3D information about the scene and surrounding objects. However, adverse weather conditions still pose significant challenges to LiDARs, since point clouds captured during snowfall can easily be corrupted. The resulting noisy point clouds degrade downstream tasks such as mapping. Existing works in de-noising point clouds corru…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: The paper has been accepted for the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

  45. arXiv:2210.17014  [pdf]

    eess.SY physics.optics

    Parametrically driven inertial sensing in chip-scale optomechanical cavities at the thermodynamical limits with extended dynamic range

    Authors: Jaime Gonzalo Flor Flores, Talha Yerebakan, Wenting Wang, Mingbin Yu, Dim-Lee Kwong, Andrey Matsko, Chee Wei Wong

    Abstract: Recent scientific and technological advances have enabled the detection of gravitational waves, autonomous driving, and the proposal of a communications network on the Moon (Lunar Internet or LunaNet). These efforts are based on the measurement of minute displacements and correspondingly the forces or fields transduction, which translate to acceleration, velocity, and position determination for na…

    Submitted 30 October, 2022; originally announced October 2022.

  46. arXiv:2209.07302  [pdf, other]

    cs.SD eess.AS

    MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement

    Authors: Jianrong Wang, Xiaomin Li, Xuewei Li, Mei Yu, Qiang Fang, Li Liu

    Abstract: Speech enhancement improves speech quality and promotes the performance of various downstream tasks. However, most current speech enhancement work has been devoted to improving the performance of downstream automatic speech recognition (ASR), and only a relatively small amount of work has focused on the automatic speaker verification (ASV) task. In this work, we propose MVNet, which consists of a memory ass…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: ICONIP 2022

  47. arXiv:2206.06145  [pdf]

    q-bio.MN eess.SY

    Identification of cancer-keeping genes as therapeutic targets by finding network control hubs

    Authors: Xizhe Zhang, Chunyu Pan, Xinru Wei, Meng Yu, Shuangjie Liu, Jun An, Jieping Yang, Baojun Wei, Wenjun Hao, Yang Yao, Yuyan Zhu, Weixiong Zhang

    Abstract: Finding cancer driver genes has been a focal theme of cancer research and clinical studies. One of the recent approaches is based on network structural controllability that focuses on finding a control scheme and driver genes that can steer the cell from an arbitrary state to a designated state. While theoretically sound, this approach is impractical for many reasons, e.g., the control scheme is o…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Contact the corresponding authors for supplementary material

  48. arXiv:2205.10401  [pdf, other]

    eess.AS cs.SD

    NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

    Authors: Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

    Abstract: Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication as well as in front-end speech enhancement for recognition when the loudspeaker is playing back. In this paper, we present an all-deep-learning framework that implicitly estimates the second order statistics of echo/noise and target speech, and jointly solves echo and noise suppression th…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  49. arXiv:2205.00434  [pdf, other]

    cs.CV eess.IV

    Reinforced Swin-Convs Transformer for Underwater Image Enhancement

    Authors: Tingdi Ren, Haiyong Xu, Gangyi Jiang, Mei Yu, Ting Luo

    Abstract: Underwater Image Enhancement (UIE) technology aims to tackle the challenge of restoring underwater images degraded by light absorption and scattering. To address these problems, a novel U-Net based Reinforced Swin-Convs Transformer for the Underwater Image Enhancement method (URSCT-UIE) is proposed. Specifically, given the deficiency of a U-Net based on pure convolutions, we embedded the Swin Trans…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: Submitted to NeurIPS 2022

  50. arXiv:2203.17068  [pdf, other]

    eess.AS cs.SD

    EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

    Authors: Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

    Abstract: In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models, speaker counting with encoder-decoder based attractors (EDA), and speech separation using Conv-TasNet. In addition, we propose a multiple 1x1 convoluti…

    Submitted 15 December, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted in SLT 2022