Showing 1–14 of 14 results for author: Bao, X

Searching in archive eess.
  1. arXiv:2504.02061  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Aligned Better, Listen Better for Audio-Visual Large Language Models

    Authors: Yuxin Guo, Shuailei Ma, Shijie Ma, Xiaoyi Bao, Chen-Wei Xie, Kecheng Zheng, Tingyu Weng, Siyang Sun, Yun Zheng, Wei Zou

    Abstract: Audio is essential for multimodal video understanding. On the one hand, video inherently contains audio, which supplies complementary information to vision. Besides, video large language models (Video-LLMs) can encounter many audio-centric settings. However, existing Video-LLMs and Audio-Visual Large Language Models (AV-LLMs) exhibit deficiencies in exploiting audio information, leading to weak un…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted to ICLR 2025

  2. arXiv:2501.08518  [pdf, other]

    cs.HC cs.AI eess.SP q-bio.QM

    Easing Seasickness through Attention Redirection with a Mindfulness-Based Brain–Computer Interface

    Authors: Xiaoyu Bao, Kailin Xu, Jiawei Zhu, Haiyun Huang, Kangning Li, Qiyun Huang, Yuanqing Li

    Abstract: Seasickness is a prevalent issue that adversely impacts both passenger experiences and the operational efficiency of maritime crews. While techniques that redirect attention have proven effective in alleviating motion sickness symptoms in terrestrial environments, applying similar strategies to manage seasickness poses unique challenges due to the prolonged and intense motion environment associate…

    Submitted 14 January, 2025; originally announced January 2025.

  3. arXiv:2412.04746  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS

    Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance

    Authors: Xuchan Bao, Judith Yue Li, Zhong Yi Wan, Kun Su, Timo Denk, Joonseok Lee, Dima Kuzmin, Fei Sha

    Abstract: Modern music retrieval systems often rely on fixed representations of user preferences, limiting their ability to capture users' diverse and uncertain retrieval needs. To address this limitation, we introduce Diff4Steer, a novel generative retrieval framework that employs lightweight diffusion models to synthesize diverse seed embeddings from user queries that represent potential directions for mu…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: NeurIPS 2024 Creative AI Track

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

  4. arXiv:2408.00381  [pdf, other]

    cs.IT eess.SY

    Statistical AoI Guarantee Optimization for Supporting xURLLC in ISAC-enabled V2I Networks

    Authors: Yanxi Zhang, Mingwu Yao, Qinghai Yang, Dongqi Yan, Xu Zhang, Xu Bao, Muyu Mei

    Abstract: This paper addresses the critical challenge of supporting next-generation ultra-reliable and low-latency communication (xURLLC) within integrated sensing and communication (ISAC)-enabled vehicle-to-infrastructure (V2I) networks. We incorporate channel evaluation and retransmission mechanisms for real-time reliability enhancement. Using stochastic network calculus (SNC), we establish a theoretical…

    Submitted 1 August, 2024; originally announced August 2024.

  5. arXiv:2308.12231  [pdf, other]

    eess.IV cs.CV

    SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation

    Authors: Qing Xu, Wenwei Kuang, Zeyu Zhang, Xueyao Bao, Haoran Chen, Wenting Duan

    Abstract: Image segmentation plays an essential role in nuclei image analysis. Recently, the segment anything model has made a significant breakthrough in such tasks. However, the current model exists two major issues for cell segmentation: (1) the image encoder of the segment anything model involves a large number of parameters. Retraining or even fine-tuning the model still requires expensive computationa…

    Submitted 23 August, 2023; originally announced August 2023.

  6. Improving COVID-19 CT Classification of CNNs by Learning Parameter-Efficient Representation

    Authors: Yujia Xu, Hak-Keung Lam, Guangyu Jia, Jian Jiang, Junkai Liao, Xinqi Bao

    Abstract: COVID-19 pandemic continues to spread rapidly over the world and causes a tremendous crisis in global human health and the economy. Its early detection and diagnosis are crucial for controlling the further spread. Many deep learning-based methods have been proposed to assist clinicians in automatic COVID-19 diagnosis based on computed tomography imaging. However, challenges still remain, including…

    Submitted 9 August, 2022; originally announced August 2022.

  7. arXiv:2208.03128  [pdf, other]

    eess.SP cs.SD eess.AS

    Time-Frequency Distributions of Heart Sound Signals: A Comparative Study using Convolutional Neural Networks

    Authors: Xinqi Bao, Yujia Xu, Hak-Keung Lam, Mohamed Trabelsi, Ines Chihi, Lilia Sidhom, Ernest N. Kamavuako

    Abstract: Time-Frequency Distributions (TFDs) support the heart sound characterisation and classification in early cardiac screening. However, despite the frequent use of TFDs in signal analysis, no study comprehensively compared their performances on deep learning for automatic diagnosis. Furthermore, the combination of signal processing methods as inputs for Convolutional Neural Networks (CNNs) has been p…

    Submitted 5 August, 2022; originally announced August 2022.

  8. arXiv:2203.08406  [pdf, ps, other]

    cs.IT eess.SP

    Levenberg-Marquardt Method Based Cooperative Source Localization in SIMO Molecular Communication via Diffusion Systems

    Authors: Yuqi Miao, Wence Zhang, Xu Bao

    Abstract: Molecular communication underpins nano-scale communications in nanotechnology. The combination of multinanomachines to form nano-networks is one of the main enabling methods. Due to the importance of source localization in establishing nano-networks, this paper proposes a cooperative source localization method for Molecular Communication via Diffusion (MCvD) systems using multiple spherical absorp…

    Submitted 16 March, 2022; originally announced March 2022.

  9. arXiv:2110.13670  [pdf, other]

    eess.IV cs.CV

    W-Net: A Two-Stage Convolutional Network for Nucleus Detection in Histopathology Image

    Authors: Anyu Mao, Jialun Wu, Xinrui Bao, Zeyu Gao, Tieliang Gong, Chen Li

    Abstract: Pathological diagnosis is the gold standard for cancer diagnosis, but it is labor-intensive, in which tasks such as cell detection, classification, and counting are particularly prominent. A common solution for automating these tasks is using nucleus segmentation technology. However, it is hard to train a robust nucleus segmentation model, due to several challenging problems, the nucleus adhesion,…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: BIBM 2021 accepted, including 8 pages, 3 figures

  10. arXiv:2110.13652  [pdf, other]

    eess.IV cs.CV cs.LG

    A Precision Diagnostic Framework of Renal Cell Carcinoma on Whole-Slide Images using Deep Learning

    Authors: Jialun Wu, Haichuan Zhang, Zeyu Gao, Xinrui Bao, Tieliang Gong, Chunbao Wang, Chen Li

    Abstract: Diagnostic pathology, which is the basis and gold standard of cancer diagnosis, provides essential information on the prognosis of the disease and vital evidence for clinical treatment. Tumor region detection, subtype and grade classification are the fundamental diagnostic indicators for renal cell carcinoma (RCC) in whole-slide images (WSIs). However, pathological diagnosis is subjective, differe…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: BIBM 2021 accepted, 9 pages including reference, 3 figures and 1 table

  11. arXiv:2004.08063  [pdf, other]

    eess.SP

    Outage Analysis for Intelligent Reflecting Surface Assisted Vehicular Communication Networks

    Authors: Jue Wang, Wence Zhang, Xu Bao, Tiecheng Song, Cunhua Pan

    Abstract: Vehicular communication is an important application of the fifth generation of mobile communication systems (5G). Due to its low cost and energy efficiency, intelligent reflecting surface (IRS) has been envisioned as a promising technique that can enhance the coverage performance significantly by passive beamforming. In this paper, we analyze the outage probability performance in IRS-assisted vehi…

    Submitted 20 April, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

  12. arXiv:1904.06659  [pdf]

    eess.SP physics.optics

    Computational distributed fiber-optic sensing

    Authors: Da-Peng Zhou, Wei Peng, Liang Chen, Xiaoyi Bao

    Abstract: Ghost imaging allows image reconstruction by correlation measurements between a light beam that interacts with the object without spatial resolution and a spatially resolved light beam that never interacts with the object. The two light beams are copies of each other. Its computational version removes the requirement of a spatially resolved detector when the light intensity pattern is pre-known. H…

    Submitted 14 April, 2019; originally announced April 2019.

    Comments: 10 pages, 5 figures

  13. arXiv:1811.09620  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

    Authors: Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse

    Abstract: In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having…

    Submitted 22 October, 2023; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: 17 pages, published as a conference paper at ICLR 2019

    Journal ref: ICLR 2019

  14. arXiv:1610.06283  [pdf, other]

    cs.RO cs.LG cs.NE eess.SY

    Deep Neural Networks for Improved, Impromptu Trajectory Tracking of Quadrotors

    Authors: Qiyang Li, Jingxing Qian, Zining Zhu, Xuchan Bao, Mohamed K. Helwa, Angela P. Schoellig

    Abstract: Trajectory tracking control for quadrotors is important for applications ranging from surveying and inspection, to film making. However, designing and tuning classical controllers, such as proportional-integral-derivative (PID) controllers, to achieve high tracking precision can be time-consuming and difficult, due to hidden dynamics and other non-idealities. The Deep Neural Network (DNN), with it…

    Submitted 19 July, 2017; v1 submitted 20 October, 2016; originally announced October 2016.

    Comments: 7 pages, 8 figures. Accepted final version. To appear in the Proc. of the 2017 IEEE International Conference on Robotics and Automation