Showing 1–50 of 128 results for author: Xue, T

  1. arXiv:2505.22685  [pdf, other]

    eess.IV cs.AI cs.CV

    DeepMultiConnectome: Deep Multi-Task Prediction of Structural Connectomes Directly from Diffusion MRI Tractography

    Authors: Marcus J. Vroemen, Yuqian Chen, Yui Lo, Tengfei Xu, Weidong Cai, Fan Zhang, Josien P. W. Pluim, Lauren J. O'Donnell

    Abstract: Diffusion MRI (dMRI) tractography enables in vivo mapping of brain structural connections, but traditional connectome generation is time-consuming and requires gray matter parcellation, posing challenges for large-scale studies. We introduce DeepMultiConnectome, a deep-learning model that predicts structural connectomes directly from tractography, bypassing the need for gray matter parcellation wh…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 15 pages, 5 figures, 5 tables

  2. arXiv:2505.21138  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis

    Authors: Tianyi Xu, Hongjie Chen, Wang Qing, Lv Hang, Jian Kang, Li Jie, Zhennan Lin, Yongxiang Li, Xie Lei

    Abstract: Large-scale training corpora have significantly improved the performance of ASR models. Unfortunately, due to the relative scarcity of data, Chinese accents and dialects remain a challenge for most ASR models. Recent advancements in self-supervised learning have shown that self-supervised pre-training, combined with large language models (LLM), can effectively enhance ASR performance in low-resou…

    Submitted 27 May, 2025; originally announced May 2025.

  3. arXiv:2505.10786  [pdf, ps, other]

    eess.SP cs.HC

    Bridging BCI and Communications: A MIMO Framework for EEG-to-ECoG Wireless Channel Modeling

    Authors: Jiaheng Wang, Zhenyu Wang, Tianheng Xu, Yuan Si, Ang Li, Ting Zhou, Xi Zhao, Honglin Hu

    Abstract: As a method of connecting the human brain and external devices, brain-computer interfaces (BCIs) are receiving extensive research attention. Recently, the integration of communication theory with BCI has emerged as a popular trend, offering potential to enhance system performance and shape next-generation communications. A key challenge in this field is modeling the brain wireless communication channel…

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2504.20653  [pdf, other]

    cs.SE eess.SY

    ComplexVCoder: An LLM-Driven Framework for Systematic Generation of Complex Verilog Code

    Authors: Jian Zuo, Junzhe Liu, Xianyong Wang, Yicheng Liu, Navya Goli, Tong Xu, Hao Zhang, Umamaheswara Rao Tida, Zhenge Jia, Mengying Zhao

    Abstract: Recent advances have demonstrated the promising capabilities of large language models (LLMs) in generating register-transfer level (RTL) code, such as Verilog. However, existing LLM-based frameworks still face significant challenges in accurately handling the complexity of real-world RTL designs, particularly those that are large-scale and involve multi-level module instantiations. To address this…

    Submitted 29 April, 2025; originally announced April 2025.

  5. arXiv:2504.17912  [pdf, ps, other]

    cs.SD eess.AS eess.SP

    STNet: Prediction of Underwater Sound Speed Profiles with An Advanced Semi-Transformer Neural Network

    Authors: Wei Huang, Jiajun Lu, Hao Zhang, Tianhe Xu

    Abstract: Real time acquisition of accurate underwater sound velocity profile (SSP) is crucial for tracking the propagation trajectory of underwater acoustic signals, making it play a key role in ocean communication positioning. SSPs can be directly measured by instruments or inverted leveraging sound field data. Although measurement techniques provide a good accuracy, they are constrained by limited spatia…

    Submitted 24 April, 2025; originally announced April 2025.

  6. arXiv:2504.02402  [pdf, other]

    cs.SD cs.AI eess.AS

    EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling

    Authors: Hao Yin, Shi Guo, Xu Jia, Xudong XU, Lu Zhang, Si Liu, Dong Wang, Huchuan Lu, Tianfan Xue

    Abstract: When sound waves hit an object, they induce vibrations that produce high-frequency and subtle visual changes, which can be used for recovering the sound. Early studies always encounter trade-offs related to sampling rate, bandwidth, field of view, and the simplicity of the optical path. Recent advances in event camera hardware show good potential for its application in visual sound recovery, becau…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Our project page: https://yyzq1.github.io/EvMic/

  7. arXiv:2504.01375  [pdf, other]

    eess.SP

    Simultaneous Pre-compensation for Bandwidth Limitation and Fiber Dispersion in Cost-Sensitive IM/DD Transmission Systems

    Authors: Zhe Zhao, Aiying Yang, Xiaoqian Huang, Peng Guo, Shuhua Zhao, Tianjia Xu, Wenkai Wan, Tianwai Bo, Zhongwei Tan, Yi Dong, Yaojun Qiao

    Abstract: We propose a pre-compensation scheme for bandwidth limitation and fiber dispersion (pre-BL-EDC) based on the modified Gerchberg-Saxton (GS) algorithm. Experimental results demonstrate 1.0/1.0/2.0 dB gains compared to modified GS pre-EDC for 20/28/32 Gbit/s bandwidth-limited systems.

    Submitted 2 April, 2025; originally announced April 2025.

  8. arXiv:2503.21401  [pdf, other]

    cs.RO cs.LG eess.SY

    AcL: Action Learner for Fault-Tolerant Quadruped Locomotion Control

    Authors: Tianyu Xu, Yaoyu Cheng, Pinxi Shen, Lin Zhao

    Abstract: Quadrupedal robots can learn versatile locomotion skills but remain vulnerable when one or more joints lose power. In contrast, dogs and cats can adopt limping gaits when injured, demonstrating their remarkable ability to adapt to physical conditions. Inspired by such adaptability, this paper presents Action Learner (AcL), a novel teacher-student reinforcement learning framework that enables quadr…

    Submitted 28 March, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  9. arXiv:2503.19292  [pdf, other]

    eess.IV cs.AI cs.CV

    Adaptive Wavelet Filters as Practical Texture Feature Amplifiers for Parkinson's Disease Screening in OCT

    Authors: Xiaoqing Zhang, Hanfeng Shi, Xiangyu Li, Haili Ye, Tao Xu, Na Li, Yan Hu, Fan Lv, Jiangfan Chen, Jiang Liu

    Abstract: Parkinson's disease (PD) is a prevalent neurodegenerative disorder globally. The eye's retina is an extension of the brain and has great potential in PD screening. Recent studies have suggested that texture features extracted from retinal layers can be adopted as biomarkers for PD diagnosis under optical coherence tomography (OCT) images. Frequency domain learning techniques can enhance the featur…

    Submitted 24 March, 2025; originally announced March 2025.

  10. arXiv:2503.17564  [pdf, other]

    eess.IV cs.CV cs.LG

    ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology

    Authors: Vishwesh Ramanathan, Tony Xu, Pushpak Pati, Faruk Ahmed, Maged Goubran, Anne L. Martel

    Abstract: Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, working with these models is challenging, with iss…

    Submitted 21 March, 2025; originally announced March 2025.

  11. arXiv:2503.12783  [pdf, other]

    cs.CV eess.IV

    Mixed-granularity Implicit Representation for Continuous Hyperspectral Compressive Reconstruction

    Authors: Jianan Li, Huan Chen, Wangcai Zhao, Rui Chen, Tingfa Xu

    Abstract: Hyperspectral Images (HSIs) are crucial across numerous fields but are hindered by the long acquisition times associated with traditional spectrometers. The Coded Aperture Snapshot Spectral Imaging (CASSI) system mitigates this issue through a compression technique that accelerates the acquisition process. However, reconstructing HSIs from compressed data presents challenges due to fixed spatial a…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: Accepted by TNNLS

  12. arXiv:2503.12482  [pdf, other]

    eess.SP

    Fuzzy Clustering for Low-Complexity Time Domain Chromatic Dispersion Compensation Scheme in Coherent Optical Fiber Communication Systems

    Authors: Wenkai Wan, Aiying Yang, Peng Guo, Zhe Zhao, Tianjia Xu, Jinxuan Wu, Zhiheng Liu

    Abstract: Chromatic dispersion compensation (CDC), implemented in either the time-domain or frequency-domain, is crucial for enhancing power efficiency in the digital signal processing of modern optical fiber communication systems. Developing low-complexity CDC schemes is essential for hardware implementation, particularly for high-speed and long-haul optical fiber communication systems. In this work, we prop…

    Submitted 16 March, 2025; originally announced March 2025.

  13. arXiv:2503.00616  [pdf, other]

    eess.SP

    Net-Zero Integrated Sensing and Communication in Backscatter Systems

    Authors: Yu Zhang, Tongyang Xu, Christos Masouros, Zhu Han

    Abstract: Future wireless networks, targeted at improving spectral and energy efficiency, are expected to simultaneously provide sensing functionality and support low-power communications. This paper proposes a novel net-zero integrated sensing and communication (ISAC) model for backscatter systems, including an access point (AP), a net-zero device, and a user receiver. We fully utilize the backscatter mech…

    Submitted 1 March, 2025; originally announced March 2025.

  14. arXiv:2503.00602  [pdf, other]

    eess.SP eess.SY

    Zero-Power Backscatter Sensing and Communication Proof-of-Concept

    Authors: Yu Zhang, Xiaoyu Shi, Tongyang Xu

    Abstract: In this paper, we present an experimental setup to evaluate the performance of a radio frequency identification (RFID)-based integrated sensing and communication (ISAC) system. We focus on both the communication and sensing capabilities of the system. Our experiments evaluate the system's performance in various channel fading scenarios and with different substrate materials, including wood, plasti…

    Submitted 1 March, 2025; originally announced March 2025.

  15. arXiv:2502.16188  [pdf]

    eess.SY

    Pseudo-Measurement Enhancement in Power Distribution Systems

    Authors: Tao Xu, Kaiqi Wang, Jiadong Zhang, Ji Qiao, Zixuan Zhao, Hong Zhu, Kai Sun

    Abstract: With the rapid development of smart distribution networks (DNs), the integrity and accuracy of grid measurement data are crucial to the safety and stability of the entire system. However, the quality of the user power consumption data cannot be guaranteed during the collection and transmission process. To this end, this paper proposes a low-rank tensor completion model based on CANDECOMP/PARAFAC d…

    Submitted 22 February, 2025; originally announced February 2025.

    Journal ref: IEEE PES General Meeting 2025

  16. arXiv:2502.06939  [pdf]

    eess.IV cs.CV cs.LG

    Generalizable automated ischaemic stroke lesion segmentation with vision transformers

    Authors: Chris Foulon, Robert Gray, James K. Ruffle, Jonathan Best, Tianbo Xu, Henry Watkins, Jane Rondina, Guilherme Pombo, Dominic Giles, Paul Wright, Marcela Ovando-Tellez, H. Rolf Jäger, Jorge Cardoso, Sebastien Ourselin, Geraint Rees, Parashkev Nachev

    Abstract: Ischaemic stroke, a leading cause of death and disability, critically relies on neuroimaging for characterising the anatomical pattern of injury. Diffusion-weighted imaging (DWI) provides the highest expressivity in ischemic stroke but poses substantial challenges for automated lesion segmentation: susceptibility artefacts, morphological heterogeneity, age-related comorbidities, time-dependent sig…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 29 pages, 7 figures, 2 tables, 1 supplementary table, 2 supplementary figures

  17. arXiv:2502.06171  [pdf]

    eess.IV cs.CV

    A Data-Efficient Pan-Tumor Foundation Model for Oncology CT Interpretation

    Authors: Wenhui Lei, Hanyu Chen, Zitian Zhang, Luyang Luo, Qiong Xiao, Yannian Gu, Peng Gao, Yankai Jiang, Ci Wang, Guangtao Wu, Tongjia Xu, Yingjie Zhang, Xiaofan Zhang, Pranav Rajpurkar, Shaoting Zhang, Zhenning Wang

    Abstract: Artificial intelligence-assisted imaging analysis has made substantial strides in tumor diagnosis and management. Here we present PASTA, a pan-tumor CT foundation model that achieves state-of-the-art performance on 45 of 46 representative oncology tasks -- including lesion segmentation, tumor detection in plain CT, tumor staging, survival prediction, structured report generation, and cross-modalit…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 57 pages, 7 figures

  18. arXiv:2501.18418  [pdf, other]

    eess.IV cs.CV

    Task-based Regularization in Penalized Least-Squares for Binary Signal Detection Tasks in Medical Image Denoising

    Authors: Wentao Chen, Tianming Xu, Weimin Zhou

    Abstract: Image denoising algorithms have been extensively investigated for medical imaging. To perform image denoising, penalized least-squares (PLS) problems can be designed and solved, in which the penalty term encodes prior knowledge of the object being imaged. Sparsity-promoting penalties, such as total variation (TV), have been a popular choice for regularizing image denoising problems. However, such…

    Submitted 31 January, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: SPIE Medical Imaging 2025

  19. arXiv:2501.15743  [pdf, other]

    eess.IV cs.CV

    Z-Stack Scanning can Improve AI Detection of Mitosis: A Case Study of Meningiomas

    Authors: Hongyan Gu, Ellie Onstott, Wenzhong Yan, Tengyou Xu, Ruolin Wang, Zida Wu, Xiang 'Anthony' Chen, Mohammad Haeri

    Abstract: Z-stack scanning is an emerging whole slide imaging technology that captures multiple focal planes along the z-axis of a glass slide. Because z-stacking can offer enhanced depth information compared to single-layer whole slide imaging, this technology can be particularly useful in analyzing small-scale histopathological patterns. However, its actual clinical impact remains debated with mi…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: To appear at the 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)

  20. arXiv:2501.13306  [pdf, other]

    cs.SD cs.CL eess.AS

    OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

    Authors: Xuelong Geng, Kun Wei, Qijie Shao, Shuiyun Liu, Zhennan Lin, Zhixian Zhao, Guojian Li, Wenjie Tian, Peikun Chen, Yangze Li, Pengcheng Guo, Mingchen Shao, Shuiyuan Wang, Yuang Cao, Chengyou Wang, Tianyi Xu, Yuhang Dai, Xinfa Zhu, Yue Li, Li Zhang, Lei Xie

    Abstract: Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions. However, most advanced SULMs are developed by the industry, leveraging large-scale datasets and computational resources that are not readily available to the academic community. Moreover…

    Submitted 16 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: OSUM Technical Report v2. The experimental results reported herein differ from those in v1 because new data were added and training ran for more steps.

  21. arXiv:2501.11755  [pdf]

    eess.IV cs.CV

    A generalizable 3D framework and model for self-supervised learning in medical imaging

    Authors: Tony Xu, Sepehr Hosseini, Chris Anderson, Anthony Rinaldi, Rahul G. Krishnan, Anne L. Martel, Maged Goubran

    Abstract: Current self-supervised learning methods for 3D medical imaging rely on simple pretext formulations and organ- or modality-specific datasets, limiting their generalizability and scalability. We present 3DINO, a cutting-edge SSL method adapted to 3D datasets, and use it to pretrain 3DINO-ViT: a general-purpose medical imaging model, on an exceptionally large, multimodal, and multi-organ dataset of…

    Submitted 20 January, 2025; originally announced January 2025.

  22. arXiv:2501.03880  [pdf, other]

    eess.IV cs.CV cs.LG

    SELMA3D challenge: Self-supervised learning for 3D light-sheet microscopy image segmentation

    Authors: Ying Chen, Rami Al-Maskari, Izabela Horvath, Mayar Ali, Luciano Hoher, Kaiyuan Yang, Zengming Lin, Zhiwei Zhai, Mengzhe Shen, Dejin Xun, Yi Wang, Tony Xu, Maged Goubran, Yunheng Wu, Kensaku Mori, Johannes C. Paetzold, Ali Erturk

    Abstract: Recent innovations in light sheet microscopy, paired with developments in tissue clearing techniques, enable the 3D imaging of large mammalian tissues with cellular resolution. Combined with the progress in large-scale data analysis, driven by deep learning, these innovations empower researchers to rapidly investigate the morphological and functional properties of diverse biological samples. Segme…

    Submitted 12 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: 2nd version

  23. arXiv:2412.10822  [pdf]

    eess.SY

    Automated Driving with Evolution Capability: A Reinforcement Learning Method with Monotonic Performance Enhancement

    Authors: Jia Hu, Xuerun Yan, Tian Xu, Haoran Wang

    Abstract: Reinforcement Learning (RL) offers a promising solution to enable evolutionary automated driving. However, the conventional RL method is always concerned with risk performance. The updated policy may not obtain a performance enhancement, even leading to performance deterioration. To address this challenge, this research proposes a High Confidence Policy Improvement Reinforcement Learning-based (HC…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: 24 pages, 16 figures

  24. arXiv:2412.09856  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

    Authors: Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai

    Abstract: Text-to-video generation enhances content creation but is highly computationally intensive: The computational cost of Diffusion Transformers (DiTs) scales quadratically in the number of pixels. This makes minute-length video generation extremely expensive, limiting most existing models to generating videos of only 10-20 seconds length. We propose a Linear-complexity text-to-video Generation (LinGe…

    Submitted 24 May, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

  25. arXiv:2412.01425  [pdf, other]

    cs.SD cs.AI eess.AS

    Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio

    Authors: Xinrui Yan, Jiangyan Yi, Jianhua Tao, Yujie Chen, Hao Gu, Guanjun Li, Junzuo Zhou, Yong Ren, Tao Xu

    Abstract: Open environment oriented open set model attribution of deepfake audio is an emerging research topic, aiming to identify the generation models of deepfake audio. Most previous work requires manually setting a rejection threshold for unknown classes to compare with predicted probabilities. However, models often overfit training instances and generate overly confident predictions. Moreover, threshol…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: Accepted by ISCSLP 2024

  26. arXiv:2411.12653  [pdf, ps, other]

    eess.SY stat.ML

    Smart Predict-then-Optimize Method with Dependent Data: Risk Bounds and Calibration of Autoregression

    Authors: Jixian Liu, Tao Xu, Jianping He, Chongrong Fang

    Abstract: The predict-then-optimize (PTO) framework is indispensable for addressing practical stochastic decision-making tasks. It consists of two crucial steps: initially predicting unknown parameters of an optimization model and subsequently solving the problem based on these predictions. Elmachtoub and Grigas [1] introduced the Smart Predict-then-Optimize (SPO) loss for the framework, which gauges the de…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 10 pages

  27. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, :, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  28. arXiv:2410.13720  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Movie Gen: A Cast of Media Foundation Models

    Authors: Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, et al. (63 additional authors not shown)

    Abstract: We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization,…

    Submitted 26 February, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

  29. arXiv:2409.18783  [pdf, other]

    eess.IV cs.CV

    DualDn: Dual-domain Denoising via Differentiable ISP

    Authors: Ruikang Li, Yujin Wang, Shiqi Chen, Fan Zhang, Jinwei Gu, Tianfan Xue

    Abstract: Image denoising is a critical component in a camera's Image Signal Processing (ISP) pipeline. There are two typical ways to inject a denoiser into the ISP pipeline: applying a denoiser directly to captured raw frames (raw domain) or to the ISP's output sRGB images (sRGB domain). However, both approaches have their limitations. Residual noise from raw-domain denoising can be amplified by the subseq…

    Submitted 4 November, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

    Comments: Accepted at ECCV 2024, Project page: https://openimaginglab.github.io/DualDn/

  30. arXiv:2409.17996  [pdf, other]

    eess.IV cs.CV cs.LG

    PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging

    Authors: Xin Cai, Zhiyuan You, Hailong Zhang, Wentao Liu, Jinwei Gu, Tianfan Xue

    Abstract: Lensless cameras offer significant advantages in size, weight, and cost compared to traditional lens-based systems. Without a focusing lens, lensless cameras rely on computational algorithms to recover the scenes from multiplexed measurements. However, current algorithms struggle with inaccurate forward imaging models and insufficient priors to reconstruct high-quality images. To overcome these li…

    Submitted 7 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: NeurIPS 2024 Spotlight

  31. arXiv:2409.03005  [pdf, other]

    cs.RO cs.LG eess.SY

    PIETRA: Physics-Informed Evidential Learning for Traversing Out-of-Distribution Terrain

    Authors: Xiaoyi Cai, James Queeney, Tong Xu, Aniket Datar, Chenhui Pan, Max Miller, Ashton Flather, Philip R. Osteen, Nicholas Roy, Xuesu Xiao, Jonathan P. How

    Abstract: Self-supervised learning is a powerful approach for developing traversability models for off-road navigation, but these models often struggle with inputs unseen during training. Existing methods utilize techniques like evidential deep learning to quantify model uncertainty, helping to identify and avoid out-of-distribution terrain. However, always avoiding out-of-distribution terrain can be overly…

    Submitted 23 December, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: To appear in RA-L. Video: https://youtu.be/OTnNZ96oJRk

  32. arXiv:2408.10680  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper

    Authors: Tianyi Xu, Kaixun Huang, Pengcheng Guo, Yu Zhou, Longtao Huang, Hui Xue, Lei Xie

    Abstract: Pre-trained multilingual speech foundation models, like Whisper, have shown impressive performance across different languages. However, adapting these models to new or specific languages is computationally extensive and faces catastrophic forgetting problems. Addressing these issues, our study investigates strategies to enhance the model on new languages in the absence of original training data, w…

    Submitted 20 August, 2024; originally announced August 2024.

  33. arXiv:2408.06776  [pdf, other]

    eess.SY cs.AI

    Robust Deep Reinforcement Learning for Inverter-based Volt-Var Control in Partially Observable Distribution Networks

    Authors: Qiong Liu, Ye Guo, Tong Xu

    Abstract: Inverter-based volt-var control is studied in this paper. One key issue in DRL-based approaches is the limited measurement deployment in active distribution networks, which leads to problems of a partially observable state and unknown reward. To address those problems, this paper proposes a robust DRL approach with a conservative critic and a surrogate reward. The conservative critic utilizes the…

    Submitted 13 August, 2024; originally announced August 2024.

  34. arXiv:2408.02074  [pdf]

    eess.IV cs.AI cs.CV

    Applying Conditional Generative Adversarial Networks for Imaging Diagnosis

    Authors: Haowei Yang, Yuxiang Hu, Shuyao He, Ting Xu, Jiajie Yuan, Xingxin Gu

    Abstract: This study introduces an innovative application of Conditional Generative Adversarial Networks (C-GAN) integrated with Stacked Hourglass Networks (SHGN) aimed at enhancing image segmentation, particularly in the challenging environment of medical imaging. We address the problem of overfitting, common in deep learning models applied to complex imaging datasets, by augmenting data through rotation a…

    Submitted 17 July, 2024; originally announced August 2024.

  35. Content-driven Magnitude-Derivative Spectrum Complementary Learning for Hyperspectral Image Classification

    Authors: Huiyan Bai, Tingfa Xu, Huan Chen, Peifu Liu, Jianan Li

    Abstract: Extracting discriminative information from complex spectral details in hyperspectral image (HSI) for HSI classification is pivotal. While current prevailing methods rely on spectral magnitude features, they could cause confusion in certain classes, resulting in misclassification and decreased accuracy. We find that the derivative spectrum proves more adept at capturing concealed information, there…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by TGRS

  36. arXiv:2407.07667  [pdf, other]

    cs.CV eess.IV

    VEnhancer: Generative Space-Time Enhancement for Video Generation

    Authors: Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

    Abstract: We present VEnhancer, a generative space-time enhancement framework that improves the existing text-to-video results by adding more details in spatial domain and synthetic detailed motion in temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffu…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: technical report

  37. arXiv:2407.01530  [pdf, other]

    eess.IV cs.CV

    xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart

    Authors: Tianrun Chen, Chaotao Ding, Lanyun Zhu, Tao Xu, Deyi Ji, Yan Wang, Ying Zang, Zejian Li

    Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet structured deep learning neural network that leverages Vision-LSTM (…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  38. arXiv:2406.18548  [pdf]

    eess.IV cs.CV

    Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

    Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

    Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is a…

    Submitted 23 May, 2024; originally announced June 2024.

  39. arXiv:2406.04776  [pdf, ps, other]

    eess.SP cs.AI

    OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications

    Authors: Tongyang Xu, Shuangyang Li, Jinhong Yuan

    Abstract: Traditional communications focus on regular and orthogonal signal waveforms for simplified signal processing and improved spectral efficiency. In contrast, the next-generation communications would aim for irregular and non-orthogonal signal waveforms to introduce new capabilities. This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc b…

    Submitted 7 June, 2024; originally announced June 2024.

  40. Optimal Reference Nodes Deployment for Positioning Seafloor Anchor Nodes

    Authors: Wei Huang, Pengfei Wu, Tianhe Xu, Hao Zhang, Kaitao Meng

    Abstract: Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenging task. Traditional anchor node positioning typically uses cross or circular shapes, however, how to optimize the…

    Submitted 23 May, 2024; originally announced May 2024.

    Journal ref: IEEE Internet of Things Journal, 2024

  41. arXiv:2405.07685  [pdf, other]

    eess.SY

    Edge Computing for IoT: Novel Insights from a Comparative Analysis of Access Control Models

    Authors: Tao Xue, Ying Zhang, Yanbin Wang, Wenbo Wang, Shuailou Li, Haibin Zhang

    Abstract: IoT edge computing positions computing resources closer to the data sources to reduce the latency, relieve the bandwidth pressure on the cloud, and enhance data security. Nevertheless, data security in IoT edge computing still faces critical threats (e.g., data breaches). Access control is fundamental for mitigating these threats. However, IoT edge computing introduces notable challenges for achie…

    Submitted 9 September, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  42. arXiv:2405.02801  [pdf, other]

    cs.SD cs.AI eess.AS

    Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

    Authors: Jiajun Li, Tianze Xu, Xuesong Chen, Xinrui Yao, Shuchang Liu

    Abstract: In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the creation of music, images, and other artistic forms across a wide range of industries. However, current models for image- and video-to-music synthesis struggle to capture the nuanced emotions and atmosphere conveyed by visual content. To fill this gap, we propose Mozart's Touch, a multi-modal music gene…

    Submitted 25 November, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures, submitted to AIGC 2024

  43. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu…

    Submitted 4 November, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  44. arXiv:2404.04848  [pdf, other]

    eess.IV cs.AI cs.CV

    Task-Aware Encoder Control for Deep Video Compression

    Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

    Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an…

    Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  45. arXiv:2403.08551  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

    Authors: Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang

    Abstract: Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation an…

    Submitted 9 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://xingtongge.github.io/GaussianImage-page/ Code: https://github.com/Xinjie-Q/GaussianImage

  46. arXiv:2403.08505  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

    Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

    Abstract: Existing learning-based stereo image codecs adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compressi…

    Submitted 8 February, 2025; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI 2025

  47. arXiv:2402.18152  [pdf, other]

    eess.IV cs.AI cs.CV

    Boosting Neural Representations for Videos with a Conditional Decoder

    Authors: Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang

    Abstract: Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting frame…

    Submitted 16 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024

  48. arXiv:2402.00535  [pdf, ps, other]

    eess.SP

    A Low-Cost Multi-Band Waveform Security Framework in Resource-Constrained Communications

    Authors: Tongyang Xu, Zhongxiang Wei, Tianhua Xu, Gan Zheng

    Abstract: Traditional physical layer secure beamforming is achieved via precoding before signal transmission using channel state information (CSI). However, imperfect CSI will compromise the performance with imperfect beamforming and potential information leakage. In addition, multiple RF chains and antennas are needed to support the narrow beam generation, which complicates hardware implementation and is n…

    Submitted 1 February, 2024; originally announced February 2024.

  49. arXiv:2401.10070  [pdf, other]

    cs.CL cs.SD eess.AS

    Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks

    Authors: Yichao Du, Zhirui Zhang, Linan Yue, Xu Huang, Yuqing Zhang, Tong Xu, Linli Xu, Enhong Chen

    Abstract: To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the commonly used FL approach (i.e., FedAvg) in S2T tasks typically suffers from extensive communication overhead due to multi-round interactions based on the wh…

    Submitted 18 January, 2025; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  50. arXiv:2401.08920  [pdf, other]

    eess.IV cs.CV

    Idempotence and Perceptual Image Compression

    Authors: Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

    Abstract: Idempotence is the stability of an image codec to re-compression. At first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec. Based on this newfound equivalence, we prop…

    Submitted 30 January, 2025; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ICLR 2024