
Showing 1–50 of 155 results for author: Guo, H

Searching in archive eess.
  1. arXiv:2507.03559  [pdf]

    cs.CV eess.IV

    Predicting Asphalt Pavement Friction Using Texture-Based Image Indicator

    Authors: Bingjie Lu, Zhengyang Lu, Yijiashun Qi, Hanzhe Guo, Tianyao Sun, Zunduo Zhao

    Abstract: Pavement skid resistance is of vital importance for road safety. The objective of this study is to propose and validate a texture-based image indicator to predict pavement friction. This index enables pavement friction to be measured easily and inexpensively using digital images. Three different types of asphalt surfaces (dense-graded asphalt mix, open-graded friction course, and chip seal) were ev… ▽ More

    Submitted 4 July, 2025; originally announced July 2025.

  2. arXiv:2506.20282  [pdf, ps, other]

    eess.IV cs.CV

    Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration

    Authors: Jiaxing Huang, Heng Guo, Le Lu, Fan Yang, Minfeng Xu, Ge Yang, Wei Luo

    Abstract: Osteoporosis, characterized by reduced bone mineral density (BMD) and compromised bone microstructure, increases fracture risk in aging populations. While dual-energy X-ray absorptiometry (DXA) is the clinical standard for BMD assessment, its limited accessibility hinders diagnosis in resource-limited regions. Opportunistic computed tomography (CT) analysis has emerged as a promising alternative f… ▽ More

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  3. arXiv:2506.20231  [pdf, ps, other]

    eess.SP

    Sensing-Aware Transmit Waveform/Receive Filter Design for OFDM-MBS Systems

    Authors: Xinghe Li, Kainan Cheng, Hongzhi Guo, Huiyong Li, Ziyang Cheng

    Abstract: In this letter, we study the problem of cooperative sensing design for an orthogonal frequency division multiplexing (OFDM) multiple base stations (MBS) system. We consider a practical scenario where the base stations (BSs) exploit certain subcarriers to realize a sensing function. Since the high sidelobe level (SLL) of OFDM waveforms degrades radar detection for weak targets, and the cross-correl… ▽ More

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

  4. arXiv:2506.08534  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    DCD: A Semantic Segmentation Model for Fetal Ultrasound Four-Chamber View

    Authors: Donglian Li, Hui Guo, Minglang Chen, Huizhen Chen, Jialing Chen, Bocheng Liang, Pengchen Liang, Ying Tan

    Abstract: Accurate segmentation of anatomical structures in the apical four-chamber (A4C) view of fetal echocardiography is essential for early diagnosis and prenatal evaluation of congenital heart disease (CHD). However, precise segmentation remains challenging due to ultrasound artifacts, speckle noise, anatomical variability, and boundary ambiguity across different gestational stages. To reduce the workl… ▽ More

    Submitted 10 June, 2025; originally announced June 2025.

  5. arXiv:2505.23821  [pdf, ps, other]

    cs.CR cs.SD eess.AS

    SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking

    Authors: Lingfeng Yao, Chenpei Huang, Shengyao Wang, Junpei Xue, Hanqing Guo, Jiang Liu, Xun Chen, Miao Pan

    Abstract: With the surge of social media, maliciously tampered public speeches, especially those from influential figures, have seriously affected social stability and public trust. Existing speech tampering detection methods remain insufficient: they either rely on external reference data or fail to be both sensitive to attacks and robust to benign operations, such as compression and resampling. To tackle… ▽ More

    Submitted 1 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  6. arXiv:2505.23036  [pdf, ps, other]

    cs.SD eess.AS

    AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition

    Authors: Yuhang Dai, He Wang, Xingchen Li, Zihan Zhang, Shuiyuan Wang, Lei Xie, Xin Xu, Hongxiao Guo, Shaoji Zhang, Hui Bu, Wei Chen

    Abstract: This paper delineates AISHELL-5, the first open-source in-car multi-channel multi-speaker Mandarin automatic speech recognition (ASR) dataset. AISHELL-5 includes two parts: (1) over 100 hours of multi-channel speech data recorded in an electric vehicle across more than 60 real driving scenarios. This audio data consists of four far-field speech signals captured by microphones located on each car do… ▽ More

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 5 pages, 1 figure, 3 tables, accepted by InterSpeech 2025

  7. arXiv:2505.20769  [pdf, ps, other]

    eess.SY

    Physics-Informed Neural Network for Cross-Domain Predictive Control of Tapered Amplifier Thermal Stabilization

    Authors: Yanpei Shi, Bo Feng, Yuxin Zhong, Haochen Guo, Bangcheng Han, Rui Feng

    Abstract: Thermally induced laser noise poses a critical limitation to the sensitivity of quantum sensor arrays employing ultra-stable amplified lasers, primarily stemming from nonlinear gain-temperature coupling effects in tapered amplifiers (TAs). To address this challenge, we present a robust intelligent control strategy that synergistically integrates an encoder-decoder physics-informed gated recurrent… ▽ More

    Submitted 27 May, 2025; originally announced May 2025.

  8. arXiv:2505.15972  [pdf, other]

    math.OC eess.SY

    Extremum Seeking for PDE Systems using Physics-Informed Neural Networks

    Authors: Haojin Guo, Zongyi Guo, Jianguo Guo, Tiago Roux Oliveira

    Abstract: Extremum Seeking (ES) is an effective real-time optimization method for PDE systems in cascade with nonlinear quadratic maps. To address PDEs in the feedback loop, a boundary control law and a re-design of the additive probing signal are mandatory. The latter, commonly called "trajectory generation" or "motion planning," involves designing perturbation signals that anticipate their propagation thr… ▽ More

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 23 pages, 16 figures

  9. arXiv:2505.14730  [pdf]

    q-bio.QM cs.CV eess.IV

    Predicting Neo-Adjuvant Chemotherapy Response in Triple-Negative Breast Cancer Using Pre-Treatment Histopathologic Images

    Authors: Hikmat Khan, Ziyu Su, Huina Zhang, Yihong Wang, Bohan Ning, Shi Wei, Hua Guo, Zaibo Li, Muhammad Khalid Khan Niazi

    Abstract: Triple-negative breast cancer (TNBC) is an aggressive subtype defined by the lack of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression, resulting in limited targeted treatment options. Neoadjuvant chemotherapy (NACT) is the standard treatment for early-stage TNBC, with pathologic complete response (pCR) serving as a key prognostic ma… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

  10. arXiv:2505.10933  [pdf, ps, other]

    eess.SP cs.IT

    Cross-layer Integrated Sensing and Communication: A Joint Industrial and Academic Perspective

    Authors: Henk Wymeersch, Nuutti Tervo, Stefan Wänstedt, Sharief Saleh, Joerg Ahlendorf, Ozgur Akgul, Vasileios Tsekenis, Sokratis Barmpounakis, Liping Bai, Martin Beale, Rafael Berkvens, Nabeel Nisar Bhat, Hui Chen, Shrayan Das, Claude Desset, Antonio de la Oliva, Prajnamaya Dass, Jeroen Famaey, Hamed Farhadi, Gerhard P. Fettweis, Yu Ge, Hao Guo, Rreze Halili, Katsuyuki Haneda, Abdur Rahman Mohamed Ismail , et al. (18 additional authors not shown)

    Abstract: Integrated sensing and communication (ISAC) enables radio systems to simultaneously sense and communicate with their environment. This paper, developed within the Hexa-X-II project funded by the European Union, presents a comprehensive cross-layer vision for ISAC in 6G networks, integrating insights from physical-layer design, hardware architectures, AI-driven intelligence, and protocol-level inno… ▽ More

    Submitted 16 May, 2025; originally announced May 2025.

  11. arXiv:2504.20777  [pdf, other]

    eess.SP

    Bayesian Deep End-to-End Learning for MIMO-OFDM System with Delay-Domain Sparse Precoder

    Authors: Nilesh Kumar Jha, Huayan Guo, Vincent K. N. Lau

    Abstract: This paper introduces a novel precoder design aimed at reducing pilot overhead for effective channel estimation in multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) applications utilizing high-order modulation. We propose an innovative demodulation reference signal scheme that achieves up to an 8x reduction in overhead by implementing a delay-domain sparsity con… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 13 pages, 15 figures

  12. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou , et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ… ▽ More

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  13. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang , et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  14. arXiv:2504.07758  [pdf, other]

    cs.CV eess.IV

    PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution

    Authors: Shuangfan Zhou, Chu Zhou, Youwei Lyu, Heng Guo, Zhanyu Ma, Boxin Shi, Imari Sato

    Abstract: Polarization cameras can capture multiple polarized images with different polarizer angles in a single shot, bringing convenience to polarization-based downstream tasks. However, their direct outputs are color-polarization filter array (CPFA) raw images, requiring demosaicing to reconstruct full-resolution, full-color polarized images; unfortunately, this necessary step introduces artifacts that m… ▽ More

    Submitted 22 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  15. arXiv:2504.06400  [pdf, other]

    eess.SP

    Panoptic: True Joint mmWave Communication and Sensing with Compressive Sidelobe Forming

    Authors: Heyu Guo, Ruiyi Shen, Florian Kosterhon, Yasaman Ghasempour

    Abstract: The integration of communication and sensing functions within mmWave systems has gained attention due to the potential for enhanced passive sensing and improved communication reliability. State-of-the-art techniques separate these two functions in frequency, use of hardware, or time, i.e., sending known preambles for channel sensing or unknown symbols for communications. In this paper, we introduc… ▽ More

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Submitted to IEEE Journal on Selected Areas in Communications

  16. arXiv:2504.00678  [pdf, other]

    eess.SP

    RapidPD: Rapid Human and Pet Presence Detection System for Smart Vehicles via Wi-Fi

    Authors: Hancheng Guo, Zhen Chen, Mo Huang, Xiu Yin Zhang

    Abstract: Heatstroke and life-threatening incidents resulting from the retention of children and animals in vehicles pose a critical global safety issue. Current presence detection solutions often require specialized hardware or suffer from detection delays that do not meet safety standards. To tackle this issue, by re-modeling channel state information (CSI) with theoretical analysis of path propagation, t… ▽ More

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 12 pages, 13 figures, 3 tables

  17. arXiv:2503.20499  [pdf, other]

    cs.SD eess.AS

    FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System

    Authors: Hao-Han Guo, Yao Hu, Fei-Yu Shen, Xu Tang, Yi-Chen Wu, Feng-Long Xie, Kun Xie

    Abstract: In this work, we upgrade FireRedTTS to a new version, FireRedTTS-1S, a high-quality streaming foundation text-to-speech system. FireRedTTS-1S achieves streaming speech generation via two steps: text-to-semantic decoding and semantic-to-acoustic decoding. In text-to-semantic decoding, a semantic-aware speech tokenizer converts the speech signal into semantic tokens, which can be synthesized from th… ▽ More

    Submitted 26 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  18. arXiv:2503.15390  [pdf, other]

    eess.IV cs.CV

    FedSCA: Federated Tuning with Similarity-guided Collaborative Aggregation for Heterogeneous Medical Image Segmentation

    Authors: Yumin Zhang, Yan Gao, Haoran Duan, Hanqing Guo, Tejal Shah, Rajiv Ranjan, Bo Wei

    Abstract: Transformer-based foundation models (FMs) have recently demonstrated remarkable performance in medical image segmentation. However, scaling these models is challenging due to the limited size of medical image datasets within isolated hospitals, where data centralization is restricted due to privacy concerns. These constraints, combined with the data-intensive nature of FMs, hinder their broader ap… ▽ More

    Submitted 19 March, 2025; originally announced March 2025.

  19. arXiv:2503.12698  [pdf, other]

    eess.IV cs.CV

    A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT

    Authors: Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Dandan Zheng, Heng Guo, Puyang Wang, Ke Yan, Yirui Wang, Qinji Yu, Zi Li, Minfeng Xu, Jianfeng Zhang, Haoshen Li, Jia Ge, Tsung-Ying Ho, Bing-Shen Huang, Tashan Ai, Kuaile Zhao, Na Shen, Qifeng Wang, Yun Bian, Tingyu Wu, Peng Du, Hua Zhang, Feng-Ming Kong , et al. (9 additional authors not shown)

    Abstract: Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, there is no such fully annotated CT dataset with all anatomies delineated for training because of the exceptionally high manual cost, the need for specialized… ▽ More

    Submitted 16 March, 2025; originally announced March 2025.

  20. arXiv:2503.10047  [pdf, other]

    eess.IV cs.CV

    Dual-domain Modulation Network for Lightweight Image Super-Resolution

    Authors: Wenjie Li, Heng Guo, Yuefeng Hou, Guangwei Gao, Zhanyu Ma

    Abstract: Lightweight image super-resolution (SR) aims to reconstruct high-resolution images from low-resolution images with limited computational costs. We find existing frequency-based SR methods cannot balance the reconstruction of overall structures and high-frequency parts. Meanwhile, these methods are inefficient for handling frequency features and unsuitable for lightweight SR. In this paper, we show… ▽ More

    Submitted 13 March, 2025; originally announced March 2025.

  21. arXiv:2503.09789  [pdf, other]

    eess.SP

    Model-Agnostic Uncertainty Quantification for Fast NFC Tag Identification using RF Fingerprinting

    Authors: Dickson Akuoko Sarpong, Adam Kamrath, Rohit Bhusal, Hongzhi Guo

    Abstract: Near Field Communication (NFC) is widely used in security applications such as door access systems and ID cards. However, clone attacks can replicate digital information, enabling unauthorized access. RF fingerprinting offers a robust defense by extracting unique physical-layer features from NFC cards that cannot be cloned. While RF fingerprinting has been extensively applied to Internet of Things… ▽ More

    Submitted 12 March, 2025; originally announced March 2025.

  22. arXiv:2503.06879  [pdf, other]

    eess.SY

    Reinforcement Learning Based Symbolic Regression for Load Modeling

    Authors: Ding Lin, Han Guo, Jianhui Wang, Meng Yue, Tianqiao Zhao

    Abstract: With the increasing penetration of renewable energy sources, growing demand variability, and evolving grid control strategies, accurate and efficient load modeling has become a critical yet challenging task. Traditional methods, such as fixed-form parametric models and data-driven approaches, often struggle to balance accuracy, computational efficiency, and interpretability. This paper introduces… ▽ More

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 9 pages

  23. arXiv:2503.06697  [pdf, other]

    eess.SY

    Diffusion Model Based Probabilistic Day-ahead Load Forecasting

    Authors: Ding Lin, Han Guo, Jianhui Wang

    Abstract: Accurate probabilistic load forecasting is crucial for maintaining the safety and stability of power systems. However, the mainstream approach, multi-step prediction, suffers from cumulative errors and latency issues, which limits its effectiveness in probabilistic day-ahead load forecasting (PDALF). To overcome these challenges, we introduce DALNet, a novel denoising diffusion model design… ▽ More

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 10 pages

  24. arXiv:2503.06412  [pdf, other]

    cs.RO cs.MA eess.SY

    Vision-Based Cooperative MAV-Capturing-MAV

    Authors: Canlun Zheng, Yize Mi, Hanqing Guo, Huaben Chen, Shiyu Zhao

    Abstract: MAV-capturing-MAV (MCM) is one of the few effective methods for physically countering misused or malicious MAVs. This paper presents a vision-based cooperative MCM system, where multiple pursuer MAVs equipped with onboard vision systems detect, localize, and pursue a target MAV. To enhance robustness, a distributed state estimation and control framework enables the pursuer MAVs to autonomously coor… ▽ More

    Submitted 8 March, 2025; originally announced March 2025.

  25. arXiv:2503.00455  [pdf, other]

    cs.SD cs.AI cs.MA cs.MM eess.AS

    PodAgent: A Comprehensive Framework for Podcast Generation

    Authors: Yujia Xiao, Lei He, Haohan Guo, Fenglong Xie, Tan Lee

    Abstract: Existing automatic audio generation methods struggle to generate podcast-like audio programs effectively. The key challenges lie in in-depth content generation and in appropriate, expressive voice production. This paper proposes PodAgent, a comprehensive framework for creating audio programs. PodAgent 1) generates informative topic-discussion content by designing a Host-Guest-Writer multi-ag… ▽ More

    Submitted 1 March, 2025; originally announced March 2025.

  26. arXiv:2502.16584  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Audio-FLAN: A Preliminary Release

    Authors: Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learnin… ▽ More

    Submitted 23 February, 2025; originally announced February 2025.

  27. arXiv:2502.09826  [pdf, other]

    eess.SY

    Safe Reinforcement Learning-based Control for Hydrogen Diesel Dual-Fuel Engines

    Authors: Vasu Sharma, Alexander Winkler, Armin Norouzi, Jakob Andert, David Gordon, Hongsheng Guo

    Abstract: The urgent energy transition requirements towards a sustainable future stretch across various industries and are a significant challenge facing humanity. Hydrogen promises a clean, carbon-free future, with the opportunity to integrate with existing solutions in the transportation sector. However, adding hydrogen to existing technologies such as diesel engines requires additional modeling effort. R… ▽ More

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: This work has been submitted to IFAC for possible publication

  28. arXiv:2502.08857  [pdf, other]

    eess.AS

    ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech

    Authors: Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, You Zhang, Soumi Maiti, Florian Lux, Nicolas Müller, Wangyou Zhang, Chengzhe Sun, Shuwei Hou, Siwei Lyu, Sébastien Le Maguer , et al. (4 additional authors not shown)

    Abstract: ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake attacks as well as the design of detection solutions. We introduce the ASVspoof 5 database which is generated in a crowdsourced fashion from data collected in diverse acoustic conditions (cf. studio-quality data for earlier ASVspoof databases) and from ~2,000 speakers (cf. ~100 earlier… ▽ More

    Submitted 24 April, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: Database link: https://zenodo.org/records/14498691, Database mirror link: https://huggingface.co/datasets/jungjee/asvspoof5, ASVspoof 5 Challenge Workshop Proceeding: https://www.isca-archive.org/asvspoof_2024/index.html

  29. arXiv:2502.08089  [pdf, other]

    cs.RO eess.SY

    A Cooperative Bearing-Rate Approach for Observability-Enhanced Target Motion Estimation

    Authors: Canlun Zheng, Hanqing Guo, Shiyu Zhao

    Abstract: Vision-based target motion estimation is a fundamental problem in many robotic tasks. The existing methods have the limitation of low observability and, hence, face challenges in tracking highly maneuverable targets. Motivated by the aerial target pursuit task where a target may maneuver in 3D space, this paper studies how to further enhance observability by incorporating the bearing rate i… ▽ More

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: Accepted by ICRA 2025

  30. arXiv:2501.16014  [pdf, other]

    eess.IV

    Spatial-Angular Representation Learning for High-Fidelity Continuous Super-Resolution in Diffusion MRI

    Authors: Ruoyou Wu, Jian Cheng, Cheng Li, Juan Zou, Wenxin Fan, Hua Guo, Yong Liang, Shanshan Wang

    Abstract: Diffusion magnetic resonance imaging (dMRI) often suffers from low spatial and angular resolution due to inherent limitations in imaging hardware and system noise, adversely affecting the accurate estimation of microstructural parameters with fine anatomical details. Deep learning-based super-resolution techniques have shown promise in enhancing dMRI resolution without increasing acquisition time.… ▽ More

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: 10 pages, 6 figures

  31. arXiv:2501.15368  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang , et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip… ▽ More

    Submitted 25 January, 2025; originally announced January 2025.

  32. arXiv:2412.18619  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM eess.AS

    Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

    Authors: Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee , et al. (2 additional authors not shown)

    Abstract: Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable success. As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks f… ▽ More

    Submitted 29 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: 69 pages, 18 figures, repo at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction

  33. arXiv:2412.08577  [pdf, other]

    cs.SD cs.MM eess.AS

    Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

    Authors: Hongming Guo, Ruibo Fu, Yizhong Geng, Shuai Liu, Shuchen Shi, Tao Wang, Chunyu Qiang, Chenxing Li, Ya Li, Zhengqi Wen, Yukun Liu, Xuefei Liu

    Abstract: Text-to-audio (TTA) models are capable of generating diverse audio from textual prompts. However, most mainstream TTA models, which predominantly rely on Mel-spectrograms, still face challenges in producing audio with rich content. The intricate details and texture required in Mel-spectrograms for such audio often surpass the models' capacity, leading to outputs that are blurred or lack coherence. I… ▽ More

    Submitted 11 December, 2024; originally announced December 2024.

  34. arXiv:2412.06666  [pdf]

    eess.IV cs.CV physics.med-ph

    Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset

    Authors: Shanshan Wang, Shoujun Yu, Jian Cheng, Sen Jia, Changjun Tie, Jiayu Zhu, Haohao Peng, Yijing Dong, Jianzhong He, Fan Zhang, Yaowen Xing, Xiuqin Jia, Qi Yang, Qiyuan Tian, Hua Guo, Guobin Li, Hairong Zheng

    Abstract: Diffusion magnetic resonance imaging (dMRI) provides critical insights into the microstructural and connectional organization of the human brain. However, the availability of high-field, open-access datasets that include raw k-space data for advanced research remains limited. To address this gap, we introduce Diff5T, the first comprehensive 5.0 Tesla diffusion MRI dataset focusing on the human brain… ▽ More

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: 19 pages, 4 figures, 1 table

  35. arXiv:2412.02815  [pdf, other]

    eess.SP

    Near-Field Measurement System for the Upper Mid-Band

    Authors: Ali Rasteh, Raghavendra Palayam Hari, Hao Guo, Marco Mezzavilla, Sundeep Rangan

    Abstract: The upper mid-band (or FR3, spanning 6-24 GHz) is a crucial frequency range for next-generation mobile networks, offering a favorable balance between coverage and spectrum efficiency. From another perspective, systems operating in the near field in both indoor and outdoor environments can support line-of-sight multiple input multiple output (MIMO) communications and be beneficial f… ▽ More

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: Accepted and presented at the 58th Asilomar Conference on Signals, Systems, and Computers

  36. arXiv:2412.01168  [pdf, other]

    cs.RO eess.SY

    On the Surprising Effectiveness of Spectrum Clipping in Learning Stable Linear Dynamics

    Authors: Hanyao Guo, Yunhai Han, Harish Ravichandar

    Abstract: When learning stable linear dynamical systems from data, three important properties are desirable: i) predictive accuracy, ii) provable stability, and iii) computational efficiency. Unconstrained minimization of reconstruction errors leads to high accuracy and efficiency but cannot guarantee stability. Existing methods to enforce stability often preserve accuracy, but do so only at the cost of inc… ▽ More

    Submitted 17 May, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  37. Computation-power Coupled Modeling for IDCs and Collaborative Optimization in ADNs

    Authors: Chuyi Li, Kedi Zheng, Hongye Guo, Chongqing Kang, Qixin Chen

    Abstract: The batch and online workload of Internet data centers (IDCs) offer temporal and spatial scheduling flexibility. Given that power generation costs vary over time and location, harnessing the flexibility of IDCs' energy consumption through workload regulation can optimize the power flow within the system. This paper focuses on multi-geographically distributed IDCs managed by an Internet service com… ▽ More

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Smart Grid. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Journal ref: IEEE Transactions on Smart Grid, VOL. 15, NO. 3, MAY 2024

  38. arXiv:2411.15269  [pdf, other]

    eess.IV cs.CV cs.LG

    MambaIRv2: Attentive State Space Restoration

    Authors: Hang Guo, Yong Guo, Yaohua Zha, Yulun Zhang, Wenbo Li, Tao Dai, Shu-Tao Xia, Yawei Li

    Abstract: The Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global reception and computational efficiency. However, the inherent causal modeling limitation of Mamba, where each token depends solely on its predecessors in the scanned sequence, restricts the full utilization of pixels across the image and thus presents new challenges in image restoration… ▽ More

    Submitted 10 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: Accepted by CVPR2025

  39. A Data-Driven Pool Strategy for Price-Makers Under Imperfect Information

    Authors: Kedi Zheng, Hongye Guo, Qixin Chen

    Abstract: This paper studies the pool strategy for price-makers under imperfect information. In this setting, market participants cannot obtain essential transmission parameters of the power system. Thus, price-makers should estimate the market results with respect to their offer curves using available historical information. The linear programming model of economic dispatch is analyzed with the theory of… ▽ More

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Power Systems. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Journal ref: IEEE Transactions on Power Systems, vol. 38, no. 1, pp. 278-289, Jan. 2023

  40. arXiv:2411.12888  [pdf, other]

    cs.IT eess.SP

    An Experimental Multi-Band Channel Characterization in the Upper Mid-Band

    Authors: Roberto Bomfin, Ahmad Bazzi, Hao Guo, Hyeongtaek Lee, Marco Mezzavilla, Sundeep Rangan, Junil Choi, Marwa Chafii

    Abstract: The following paper provides a multi-band channel measurement analysis on the frequency range (FR)3. This study focuses on the FR3 low frequencies 6.5 GHz and 8.75 GHz with a setup tailored to the context of integrated sensing and communication (ISAC), where the data are collected with and without the presence of a target. A method based on multiple signal classification (MUSIC) is used to refine…

    Submitted 19 November, 2024; originally announced November 2024.

  41. arXiv:2411.11110  [pdf, other]

    eess.IV cs.CV

    Retinal Vessel Segmentation via Neuron Programming

    Authors: Tingting Wu, Ruyi Min, Peixuan Song, Hengtao Guo, Tieyong Zeng, Feng-Lei Fan

    Abstract: The accurate segmentation of retinal blood vessels plays a crucial role in the early diagnosis and treatment of various ophthalmic diseases. Designing a network model for this task requires meticulous tuning and extensive experimentation to handle the tiny and intertwined morphology of retinal blood vessels. To tackle this challenge, Neural Architecture Search (NAS) methods are developed to fully…

    Submitted 17 November, 2024; originally announced November 2024.

  42. arXiv:2411.06540  [pdf, other]

    eess.AS

    Debatts: Zero-Shot Debating Text-to-Speech Synthesis

    Authors: Yiqiao Huang, Yuancheng Wang, Jiaqi Li, Haotian Guo, Haorui He, Shunsi Zhang, Zhizheng Wu

    Abstract: In debating, rebuttal is one of the most critical stages, where a speaker addresses the arguments presented by the opposing side. During this process, the speaker synthesizes their own persuasive articulation given the context from the opposing side. This work proposes a novel zero-shot text-to-speech synthesis system for rebuttal, namely Debatts. Debatts takes two speech prompts, one from the opp…

    Submitted 4 December, 2024; v1 submitted 10 November, 2024; originally announced November 2024.

  43. arXiv:2410.14620  [pdf, other]

    cs.IT eess.SP

    Site-Specific Outdoor Propagation Assessment and Ray-Tracing Analysis for Wireless Digital Twins

    Authors: Morteza Ghaderi Aram, Hao Guo, Mingsheng Yin, Tommy Svensson

    Abstract: Digital twinning is becoming increasingly vital in the design and real-time control of future wireless networks by providing precise cost-effective simulations, predictive insights, and real-time data integration. This paper explores the application of digital twinning in optimizing wireless communication systems within urban environments, where building arrangements can critically impact network…

    Submitted 18 October, 2024; originally announced October 2024.

  44. arXiv:2410.11180  [pdf, other]

    cs.LG eess.SY

    Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

    Authors: Jinyu Liu, Hongye Guo, Yun Li, Qinghu Tang, Fuquan Huang, Tunan Chen, Haiwang Zhong, Qixin Chen

    Abstract: Over the past decade, bidding in power markets has attracted widespread attention. Reinforcement Learning (RL) has been widely used for power market bidding as a powerful AI tool to make decisions under real-world uncertainties. However, current RL methods mostly employ low dimensional bids, which significantly diverge from the N price-power pairs commonly used in the current power markets. The N-…

    Submitted 14 October, 2024; originally announced October 2024.

  45. arXiv:2409.11630  [pdf, other]

    cs.SD eess.AS

    Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation

    Authors: Haohan Guo, Fenglong Xie, Dongchao Yang, Xixin Wu, Helen Meng

    Abstract: The neural codec language model (CLM) has demonstrated remarkable performance in text-to-speech (TTS) synthesis. However, troubled by "recency bias", CLM lacks sufficient attention to coarse-grained information at a higher temporal scale, often producing unnatural or even unintelligible speech. This work proposes CoFi-Speech, a coarse-to-fine CLM-TTS approach, employing multi-scale speech coding…

    Submitted 17 September, 2024; originally announced September 2024.

  46. arXiv:2409.10072  [pdf, other]

    cs.SD eess.AS

    Speaker Contrastive Learning for Source Speaker Tracing

    Authors: Qing Wang, Hongmei Guo, Jian Kang, Mengjie Du, Jie Li, Xiao-Lei Zhang, Lei Xie

    Abstract: As a form of biometric authentication technology, the security of speaker verification systems is of utmost importance. However, SV systems are inherently vulnerable to various types of attacks that can compromise their accuracy and reliability. One such attack is voice conversion, which modifies a person's speech to sound like another person by altering various vocal characteristics. This poses a…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 7 pages, 2 figures, accepted by SLT

  47. arXiv:2409.03283  [pdf, other]

    cs.SD eess.AS

    FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

    Authors: Hao-Han Guo, Yao Hu, Kun Liu, Fei-Yu Shen, Xu Tang, Yi-Chen Wu, Feng-Long Xie, Kun Xie, Kai-Tuo Xu

    Abstract: This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the growing demands for personalized and diverse generative speech applications. The framework comprises three parts: data processing, foundation system, and downstream applications. First, we comprehensively present our data processing pipeline, which transforms massive raw audio into a large-scale high-quality TTS data…

    Submitted 11 April, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

  48. arXiv:2409.00933  [pdf, other]

    cs.SD eess.AS

    SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

    Authors: Haohan Guo, Fenglong Xie, Kun Xie, Dongchao Yang, Dake Guo, Xixin Wu, Helen Meng

    Abstract: The long speech sequence has been troubling language models (LM) based TTS approaches in terms of modeling complexity and efficiency. This work proposes SoCodec, a semantic-ordered multi-stream speech codec, to address this issue. It compresses speech into a shorter, multi-stream discrete semantic sequence with multiple tokens at each frame. Meanwhile, the ordered product quantization is proposed…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT 2024

  49. arXiv:2409.00750  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

    Authors: Yuancheng Wang, Haoyue Zhan, Liwei Liu, Ruihong Zeng, Haotian Guo, Jiachen Zheng, Qiang Zhang, Xueyao Zhang, Shunsi Zhang, Zhizheng Wu

    Abstract: The recent large-scale text-to-speech (TTS) systems are usually grouped as autoregressive and non-autoregressive systems. The autoregressive systems implicitly model duration but exhibit certain deficiencies in robustness and lack of duration controllability. Non-autoregressive systems require explicit alignment information between text and speech during training and predict durations for linguist…

    Submitted 20 October, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

  50. arXiv:2408.14585   

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of…

    Submitted 17 February, 2025; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: We request to withdraw our paper from arXiv due to unresolved author disagreements about the data interpretation and study conclusions. To maintain scientific integrity, we believe withdrawing the paper is necessary. We regret any confusion caused