
Showing 1–50 of 64 results for author: Cao, L

Searching in archive eess.
  1. arXiv:2506.22790  [pdf, ps, other]

    eess.IV cs.CV cs.MM

    ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge

    Authors: Yixu Chen, Bowen Chen, Hai Wei, Alan C. Bovik, Baojun Li, Wei Sun, Linhan Cao, Kang Fu, Dandan Zhu, Jun Jia, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Dounia Hammou, Fei Yin, Rafal Mantiuk, Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon

    Abstract: This paper reports on the IEEE International Conference on Multimedia & Expo (ICME) 2025 Grand Challenge on Generalizable HDR and SDR Video Quality Measurement. With the rapid development of video technology, especially High Dynamic Range (HDR) and Standard Dynamic Range (SDR) content, the need for robust and generalizable Video Quality Assessment (VQA) methods has become increasingly pressing. Existin…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: ICME 2025 Grand Challenges

  2. arXiv:2504.20854  [pdf, other]

    cs.NI cs.AI cs.DC eess.SY

    Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

    Authors: Jinsun Yoo, ChonLam Lao, Lianjie Cao, Bob Lantz, Minlan Yu, Tushar Krishna, Puneet Sharma

    Abstract: This paper lays the foundation for Genie, a testing framework that captures the impact of real hardware network behavior on ML workload performance without requiring expensive GPUs. Genie uses CPU-initiated traffic over a hardware testbed to emulate GPU-to-GPU communication, and adapts the ASTRA-sim simulator to model the interaction between the network and the ML workload.

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Presented as a poster at NSDI '25

  3. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  4. arXiv:2504.08504  [pdf, other]

    eess.SP

    STF-GCN: A Multi-Domain Graph Convolution Network Method for Automatic Modulation Recognition via Adaptive Correlation

    Authors: Mingyuan Shao, Zhengqiu Fu, Dingzhao Li, Fuqing Zhang, Yilin Cai, Shaohua Hong, Lin Cao, Yuan Peng, Jie Qi

    Abstract: Automatic Modulation Recognition (AMR) is an essential part of dynamic spectrum allocation in Intelligent Transportation Systems (ITS). However, current deep learning-based AMR (DL-AMR) methods struggle to extract discriminative and robust features at low signal-to-noise ratios (SNRs), where the representation of modulation symbols is heavily corrupted by noise. Furthermore, current research on…

    Submitted 11 April, 2025; originally announced April 2025.

  5. arXiv:2502.13972  [pdf, other]

    eess.SP cs.AI cs.LG

    IncepFormerNet: A multi-scale multi-head attention network for SSVEP classification

    Authors: Yan Huang, Yongru Chen, Lei Cao, Yongnian Cao, Xuechun Yang, Yilin Dong, Tianyu Liu

    Abstract: In recent years, deep learning (DL) models have shown outstanding performance in EEG classification tasks, particularly in Steady-State Visually Evoked Potential (SSVEP)-based Brain-Computer Interface (BCI) systems. DL methods have been successfully applied to SSVEP-BCI. This study proposes a new model called IncepFormerNet, a hybrid of the Inception and Transformer architectures. IncepForm…

    Submitted 4 February, 2025; originally announced February 2025.

  6. arXiv:2501.11062  [pdf, other]

    eess.SP

    Design and Prototyping of Filtering Active STAR-RIS with Adjustable Power Splitting

    Authors: Rongguang Song, Haifan Yin, Xilong Pei, Lin Cao, Taorui Yang, Xue Ren, Yuanwei Liu

    Abstract: Reconfigurable Intelligent Surfaces (RISs) have emerged as a transformative technology for next-generation wireless communication systems, offering unprecedented control over electromagnetic wave propagation. In particular, Simultaneously Transmitting and Reflecting RISs (STAR-RISs) have garnered significant attention due to their full-space coverage. This paper presents an active STAR-RIS, which…

    Submitted 19 January, 2025; originally announced January 2025.

  7. arXiv:2410.12866  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS q-bio.NC

    Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings

    Authors: Di Wu, Siyuan Li, Chen Feng, Lu Cao, Yue Zhang, Jie Yang, Mohamad Sawan

    Abstract: Recent advancements in brain-computer interfaces (BCIs) have enabled the decoding of lexical tones from intracranial recordings, offering the potential to restore the communication abilities of speech-impaired tonal language speakers. However, data heterogeneity induced by both physiological and instrumental factors poses a significant challenge for unified invasive brain tone decoding. Traditiona…

    Submitted 18 February, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

    Comments: ICLR2025 Poster (Preprint V2)

  8. arXiv:2409.09557  [pdf]

    cs.RO eess.SY

    Adaptable, shape-conforming robotic endoscope

    Authors: Jiayang Du, Lin Cao, Sanja Dogramazi

    Abstract: This paper introduces a size-adaptable robotic endoscope design that aims to improve the efficiency and comfort of colonoscopy. The proposed robotic endoscope combines an expansion mechanism with an external drive system, allowing it to adjust its shape to different pipe diameters and thereby improving stability and propulsive force. As an actuator in the…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: 15 pages with 10 figures. Subj-class: robotic colonoscope. This manuscript has been submitted to other journals and is currently under review. Another manuscript borrowed some of the results of this manuscript, so it is necessary to cite the reference.

  9. arXiv:2409.00749  [pdf, other]

    cs.CV eess.IV

    Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency

    Authors: Wei Sun, Weixia Zhang, Yuqin Cao, Linhan Cao, Jun Jia, Zijian Chen, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai

    Abstract: UHD images, typically with resolutions equal to or higher than 4K, pose a significant challenge for efficient image quality assessment (IQA) algorithms: using full-resolution images as inputs leads to overwhelming computational complexity, while commonly used pre-processing methods like resizing or cropping may cause substantial loss of detail. To address this problem, we design a multi-branch…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: The proposed model won first prize in the ECCV AIM 2024 Pushing the Boundaries of Blind Photo Quality Assessment Challenge

  10. arXiv:2408.04273  [pdf, other]

    eess.IV cs.CV

    SG-JND: Semantic-Guided Just Noticeable Distortion Predictor For Image Compression

    Authors: Linhan Cao, Wei Sun, Xiongkuo Min, Jun Jia, Zicheng Zhang, Zijian Chen, Yucheng Zhu, Lizhou Liu, Qiubo Chen, Jing Chen, Guangtao Zhai

    Abstract: Just noticeable distortion (JND), the threshold of distortion in an image that is minimally perceptible to the human visual system (HVS), is crucial for image compression algorithms to achieve a trade-off between transmission bit rate and image quality. However, traditional JND prediction methods rely only on pixel-level or sub-band level features, lacking the ability to capture the i…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ICIP 2024

  11. Multi-scale Restoration of Missing Data in Optical Time-series Images with Masked Spatial-Temporal Attention Network

    Authors: Zaiyan Zhang, Jining Yan, Yuanqi Liang, Jiaxin Feng, Haixu He, Li Cao

    Abstract: Remote sensing images often suffer from substantial data loss due to factors such as thick cloud cover and sensor limitations. Existing methods for imputing missing values in remote sensing images fail to fully exploit spatiotemporal auxiliary information, which restricts the accuracy of their reconstructions. To address this issue, this paper proposes a novel deep learning-based approach called M…

    Submitted 18 November, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.05913  [pdf, other]

    cs.NI eess.SP

    Revisiting Multi-User Downlink in IEEE 802.11ax: A Designers Guide to MU-MIMO

    Authors: Liu Cao, Lyutianyang Zhang, Sumit Roy, Sian Jin

    Abstract: Downlink (DL) Multi-User (MU) Multiple Input Multiple Output (MU-MIMO) is a key technology that allows multiple concurrent data transmissions from an Access Point (AP) to a selected subset of clients for higher network efficiency in IEEE 802.11ax. However, the DL MU-MIMO feature is typically turned off by default in AP vendors' products; that is, turning on DL MU-MIMO may not help inc…

    Submitted 19 August, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. 7 pages, 6 figures, magazine paper

  13. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  14. arXiv:2405.08745  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ…

    Submitted 14 May, 2024; originally announced May 2024.

  15. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on KVQ, a dataset collected from the popular short-form video platform Kuaishou/Kwai. The KVQ database is divided into three parts: 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  16. arXiv:2404.11278  [pdf, other]

    physics.ins-det eess.IV

    Study on the static detection of ICF target based on muonic X-ray sphere encoded imaging

    Authors: Dikai Li, Jian Yu, Qian Chen, Ziming Li, Chunhui Zhang, Xiangyu Wan, Zhibing He, Leifeng Cao

    Abstract: Muon Induced X-ray Emission (MIXE) was discovered by Chinese physicist Zhang Wenyu as early as 1947, and it enables non-destructive elemental analysis inside samples. Research has shown that, through encoding holes, MIXE imaging can retain the high efficiency of direct imaging while benefiting from the low noise of pinhole imaging. The related technology significantly improves the counting rate while ma…

    Submitted 5 November, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  17. arXiv:2404.01164  [pdf, ps, other]

    eess.SY

    Unified Predefined-time Stability Conditions of Nonlinear Systems with Lyapunov Analysis

    Authors: Bing Xiao, Haichao Zhang, Shijie Zhao, Lu Cao

    Abstract: This brief gives a set of unified Lyapunov stability conditions that guarantee the predefined-time/finite-time stability of dynamical systems. The derived Lyapunov theorem for autonomous systems establishes equivalence with existing theorems on predefined-time/finite-time stability. The findings proposed herein develop a nonsingular sliding mode control framework for an Euler-Lagrange system to an…

    Submitted 1 April, 2024; originally announced April 2024.

  18. arXiv:2312.11460  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

    Authors: Junfeng Long, Zirui Wang, Quanyi Li, Jiawei Gao, Liu Cao, Jiangmiao Pang

    Abstract: Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots can only provide partial and noisy observations, making the estimation particularly challenging, especially for external states like terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introdu…

    Submitted 1 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: One hour of training yields a quadruped robot capable of traversing any terrain under any disturbances in the open world. Project Page: https://github.com/OpenRobotLab/HIMLoco

  19. arXiv:2311.03679  [pdf, other]

    cs.CV eess.IV

    Unsupervised convolutional neural network fusion approach for change detection in remote sensing images

    Authors: Weidong Yan, Pei Yan, Li Cao

    Abstract: With the rapid development of deep learning, a variety of deep learning-based change detection methods have emerged in recent years. However, these methods usually require a large number of training samples to train the network model, which makes them very expensive. In this paper, we introduce a completely unsupervised shallow convolutional neural network (USCNN) fusion approach for change detection.…

    Submitted 6 November, 2023; originally announced November 2023.

  20. arXiv:2311.02447  [pdf, other]

    cs.IT eess.SP

    Quantized-but-uncoded Distributed Detection (QDD) with Unreliable Reporting Channels

    Authors: Lei Cao, Ramanarayanan Viswanathan

    Abstract: Distributed detection primarily centers around two approaches: Unquantized Distributed Detection (UDD), where each sensor reports its complete observation to the fusion center (FC), and quantized-and-Coded DD (CDD), where each sensor first partitions the observation space and then reports to the FC a codeword. In this paper, we introduce Quantized-but-uncoded DD (QDD), where each sensor, after qua…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: 11 pages, 8 figures, submitted to IEEE T-IT

  21. arXiv:2310.16137  [pdf, other]

    cs.IT eess.SP

    Codebook-based Uplink Transmission Enhancement in 5G Advanced: Sub-band Precoding

    Authors: Liu Cao, Yahia Shabara, Parisa Cheraghi

    Abstract: The transformative enhancements of fifth-generation (5G) mobile devices bring new challenges in achieving better uplink (UL) performance. Particularly, in codebook-based transmission, wide-band (WB) precoding and the legacy UL codebook may become the main bottlenecks to more efficient data transmission. In this paper, we investigate the codebook-based UL single-layer transmission performanc…

    Submitted 29 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: This work has been accepted by IEEE VCC 2023. 5 pages, 7 figures

  22. arXiv:2310.05368  [pdf, other]

    cs.AI cs.MA cs.SD eess.AS

    Measuring Acoustics with Collaborative Multiple Agents

    Authors: Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun

    Abstract: As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by set…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Main paper (9 pages and 5 figures and 2 tables) and appendix (16 pages and 13 figures and 10 tables). Accepted for publication by IJCAI 2023

  23. arXiv:2309.16680  [pdf, other]

    cs.NI eess.SY

    Semi-Persistent Scheduling in NR Sidelink Mode 2: MAC Packet Reception Ratio Model and ns-3 Validation

    Authors: Liu Cao, Sumit Roy, Collin Brady

    Abstract: 5G New Radio (NR) Sidelink (SL) has demonstrated promising capability for infrastructure-less cellular coverage. Understanding the fundamentals of the NR SL channel access mechanism, Semi-Persistent Scheduling (SPS), which is specified by the 3rd Generation Partnership Project (3GPP), is necessary to enhance the NR SL Packet Reception Ratio (PRR). However, most existing works fail to account…

    Submitted 19 August, 2024; v1 submitted 26 July, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. 13 pages, 22 figures

  24. arXiv:2309.09843  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Instruction-Following Speech Recognition

    Authors: Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

    Abstract: Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions. With the advent of Large Language Models (LLMs) in speech processing, more organic, text-prompt-based interactions have become possible. However, the mechanisms behind these models' speech understanding and "reasoning" capabilities remai…

    Submitted 18 September, 2023; originally announced September 2023.

  25. arXiv:2308.03263  [pdf, other]

    eess.SP

    Prototyping and real-world field trials of RIS-aided wireless communications

    Authors: Xilong Pei, Haifan Yin, Li Tan, Lin Cao, Taorui Yang

    Abstract: Reconfigurable intelligent surface (RIS) is a promising technology that has the potential to change the way we interact with the wireless propagation environment. In this paper, we design and fabricate an RIS system that can be used in the fifth generation (5G) mobile communication networks. We also propose a practical two-step spatial-oversampling codebook algorithm for the beamforming of RIS, wh…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 10 pages, 21 figures

  26. arXiv:2307.02297  [pdf, other]

    eess.SP

    RIS with insufficient phase shifting capability: Modeling, beamforming, and experimental validations

    Authors: Lin Cao, Haifan Yin, Li Tan, Xilong Pei

    Abstract: Most research works on reconfigurable intelligent surfaces (RIS) rely on idealized models of the reflection coefficients, i.e., uniform reflection amplitude for any phase and sufficient phase shifting capability. In practice, however, such models are oversimplified. This paper introduces a realistic reflection coefficient model for RIS based on measurements. The reflection coefficients are modeled…

    Submitted 16 April, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: 13 pages, 11 figures

  27. arXiv:2303.12693  [pdf, other]

    eess.SY cs.AI

    Resilient Output Containment Control of Heterogeneous Multiagent Systems Against Composite Attacks: A Digital Twin Approach

    Authors: Yukang Cui, Lingbo Cao, Michael V. Basin, Jun Shen, Tingwen Huang, Xin Gong

    Abstract: This paper studies the distributed resilient output containment control of heterogeneous multiagent systems against composite attacks, including denial-of-service (DoS) attacks, false-data injection (FDI) attacks, camouflage attacks, and actuation attacks. Inspired by digital twins, a twin layer (TL) with higher security and privacy is used to decouple the above problem into two tasks: defense pr…

    Submitted 22 March, 2023; originally announced March 2023.

  28. arXiv:2303.02938  [pdf, other]

    eess.SP

    RIS-aided Wireless Communications: Can RIS Beat Metal Plate?

    Authors: Jiangfeng Hu, Haifan Yin, Li Tan, Lin Cao, Xilong Pei

    Abstract: Reconfigurable Intelligent Surface (RIS) has recently been regarded as a paradigm-shifting technology beyond 5G for its flexibility in smartly adjusting the response to impinging electromagnetic (EM) waves. Usually, RIS can be implemented by properly reconfiguring the adjustable parameters of each RIS unit to align the signal phase on the receiver side. And it is believed that the phase align…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 5 pages, 5 figures

  29. Learning Informative Representation for Fairness-aware Multivariate Time-series Forecasting: A Group-based Perspective

    Authors: Hui He, Qi Zhang, Shoujin Wang, Kun Yi, Zhendong Niu, Longbing Cao

    Abstract: Performance unfairness among variables widely exists in multivariate time series (MTS) forecasting models, since such models may attend to, or be biased toward, certain (advantaged) variables. Addressing this unfairness problem is important for attending equally to all variables and avoiding vulnerable model biases/risks. However, fair MTS forecasting is challenging and has been less studied in the literature. To b…

    Submitted 23 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 13 pages, 5 figures, accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

    MSC Class: 68Txx ACM Class: I.2.6

  30. arXiv:2301.02784  [pdf, other]

    eess.SY

    Active Fault Isolation for Discrete Event Systems

    Authors: Lin Cao, Shaolong Shu, Feng Lin

    Abstract: In practice, we can not only disable some events, but also enforce the occurrence of some events prior to the occurrence of other events by external control. In this paper, we combine these two control mechanisms to synthesize a more powerful supervisor. Here our control goal is to design an isolation supervisor which ensures that, in the closed-loop system, faults are isolatable in the sense that after…

    Submitted 7 January, 2023; originally announced January 2023.

    MSC Class: 93B99 ACM Class: G.2; H.4

  31. arXiv:2301.00656  [pdf, other]

    eess.AS cs.CL cs.LG

    TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR

    Authors: Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu

    Abstract: Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse. We propose TriNet, which introduces a novel triple-branch architecture for preventing collapse and stabilizing the pre-training. TriNet learns the SSL latent embedding space and incorporates it into a higher-level space for predicting pseudo target vectors generated by a frozen te…

    Submitted 14 March, 2023; v1 submitted 12 December, 2022; originally announced January 2023.

    Comments: Accepted by ICASSP 2023

  32. arXiv:2210.13740  [pdf, other]

    cs.NI eess.SY

    Latency-aware End-to-end Multi-path Data Transmission for URLLC Services

    Authors: Liu Cao, Abbas Kiani, Amanda Xiang, Kaippallimalil John, Tony Saboorian

    Abstract: 5th Generation Mobile Communication Technology (5G) utilizes the Access Traffic Steering, Switching, and Splitting (ATSSS) rule to enable multi-path data transmission, which is currently being standardized. Recently, the 3rd Generation Partnership Project (3GPP) SA1 and SA2 have been working on the multi-path solution for possible improvement from different perspectives. However, the existing 3GPP…

    Submitted 21 October, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: This work has been submitted to the IEEE for possible publication. 5 pages, 6 figures

  33. arXiv:2210.01353  [pdf, other]

    cs.SD cs.AI eess.AS

    Pay Self-Attention to Audio-Visual Navigation

    Authors: Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang

    Abstract: Audio-visual embodied navigation, a hot research topic, aims to train a robot to reach an audio target using egocentric visual (from the sensors mounted on the robot) and audio (emitted from the target) input. The audio-visual information fusion strategy is naturally important to navigation performance, but state-of-the-art methods still simply concatenate the visual and audio features,…

    Submitted 5 October, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Main paper (10 pages and 7 figures) and appendix (21 figures and 4 tables). Accepted for publication by BMVC 2022. For data and code, see https://yyf17.github.io/FSAAVN/index.html

  34. arXiv:2209.02944  [pdf, other]

    cs.IT eess.SP

    Architecture-Algorithmic Trade-offs in Multi-path Channel Estimation for mmWAVE Systems

    Authors: Lyutianyang Zhang, Sumit Roy, Liu Cao

    Abstract: 5G mmWave massive MIMO systems are likely to be deployed in dense urban scenarios, where increasing network capacity is the primary objective. A key component in mmWave transceiver design is channel estimation, which is challenging due to the very large signal bandwidths (order of GHz) implying significant resolved spatial multipath, coupled with a large number of Tx/Rx antennas for large-scale MIMO. This…

    Submitted 7 September, 2022; originally announced September 2022.

  35. arXiv:2206.12046  [pdf, other]

    cs.CV cs.LG eess.IV

    Bilateral Network with Channel Splitting Network and Transformer for Thermal Image Super-Resolution

    Authors: Bo Yan, Leilei Cao, Fengliang Qi, Hongbin Wang

    Abstract: In recent years, the Thermal Image Super-Resolution (TISR) problem has become an attractive research topic. TISR can be used in a wide range of fields, including military, medical, agricultural and animal ecology. Due to the success of the PBVS-2020 and PBVS-2021 workshop challenges, TISR results keep improving and attract more researchers to sign up for the PBVS-2022 challenge. In this paper,…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: The second-place solution for the CVPR2022 PBVS-TISR challenge

  36. Multi-Access Point Coordination for Next-Gen Wi-Fi Networks Aided by Deep Reinforcement Learning

    Authors: Lyutianyang Zhang, Hao Yin, Sumit Roy, Liu Cao

    Abstract: Wi-Fi in the enterprise - characterized by overlapping Wi-Fi cells - constitutes the design challenge for next-generation networks. Standardization for recently started IEEE 802.11be (Wi-Fi 7) Working Groups has focused on significant medium access control layer changes that emphasize the role of the access point (AP) in radio resource management (RRM) for coordinating channel access due to the hi…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: To appear in IEEE Systems Journal. 12 pages, 13 figures

  37. arXiv:2205.10897  [pdf, other]

    eess.SP

    Efficient PHY Layer Abstraction under Imperfect Channel Estimation

    Authors: Liu Cao, Lyutianyang Zhang, Sian Jin, Sumit Roy

    Abstract: As most existing work investigates PHY layer abstraction under the assumption of perfect channel estimation, it may become unreliable if channel estimation error exists in a real communication system. This letter improves an efficient PHY layer method, EESM-log-SGN PHY layer abstraction, by considering the presence of channel estimation error. We develop two methods for implementing the EE…

    Submitted 8 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: Submitted to IEEE Wireless Communications Letters. 5 pages, 7 figures

  38. arXiv:2204.12736  [pdf]

    cs.CV cs.LG eess.IV

    A Multi-Head Convolutional Neural Network With Multi-path Attention improves Image Denoising

    Authors: Jiahong Zhang, Meijun Qu, Ye Wang, Lihong Cao

    Abstract: Recently, convolutional neural networks (CNNs) and attention mechanisms have been widely used in image denoising and achieved satisfactory performance. However, previous works mostly use a single head to receive the noisy image, limiting the richness of extracted features. Therefore, a novel CNN with multiple heads (MH) named MHCNN is proposed in this paper, whose heads will receive the input…

    Submitted 3 November, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

  39. arXiv:2204.06746  [pdf, other]

    eess.IV cs.CV

    Information fusion approach for biomass estimation in a plateau mountainous forest using a synergistic system comprising UAS-based digital camera and LiDAR

    Authors: Rong Huang, Wei Yao, Zhong Xu, Lin Cao, Xin Shen

    Abstract: Forest land plays a vital role in global climate, ecosystems, farming and human living environments. Therefore, forest biomass estimation methods are necessary to monitor changes in the forest structure and function, which are key data in natural resources research. Although accurate forest biomass measurements are important in forest inventory and assessments, high-density measurements that invol…

    Submitted 14 April, 2022; originally announced April 2022.

  40. arXiv:2203.02507  [pdf]

    eess.IV physics.optics

    Parallel Fourier Ptychography reconstruction

    Authors: Guocheng Zhou, Shaohui Zhang, Yao Hu, Lei Cao, Yong Huang, Qun Hao

    Abstract: Fourier ptychography has attracted wide attention for its large space-bandwidth product and quantitative phase measurement. It is a typical computational imaging technique in which the imaging hardware and reconstruction algorithms are optimized simultaneously. The data redundancy and inverse problem algorithms are the sources of FPM's excellent performance. But at the sa…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: 12 pages with 11 figures

  41. arXiv:2203.00008  [pdf]

    physics.med-ph eess.IV physics.optics

    Learned end-to-end high-resolution lensless fiber imaging toward intraoperative real-time cancer diagnosis

    Authors: Jiachen Wu, Tijue Wang, Ortrud Uckermann, Roberta Galli, Gabriele Schackert, Liangcai Cao, Jürgen Czarske, Robert Kuschmierz

    Abstract: Endomicroscopy is indispensable for minimally invasive diagnostics in clinical practice. For optical keyhole monitoring of surgical interventions, high-resolution fiber endoscopic imaging is considered to be very promising, especially in combination with label-free imaging techniques to realize in vivo diagnosis. However, the inherent honeycomb-artifacts of coherent fiber bundles (CFB) reduce the…

    Submitted 28 February, 2022; originally announced March 2022.

  42. arXiv:2202.10239  [pdf, other]

    physics.optics eess.IV

    Fourier ptychography multi-parameter neural network with composite physical priori optimization

    Authors: Delong Yang, Shaohui Zhang, Chuanjian Zheng, Guocheng Zhou, Lei Cao, Yao Hu, Qun Hao

    Abstract: Fourier ptychography microscopy (FP) is a recently developed computational imaging approach for microscopic super-resolution imaging. By sequentially turning on each light-emitting diode (LED) at a different position on the LED array and acquiring the corresponding images that contain different spatial frequency components, high spatial resolution and quantitative phase imaging can be achieve…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 13 pages, 12 figures, solving inverse problem of computational imaging by neural network

  43. arXiv:2112.12055  [pdf]

    physics.optics eess.IV physics.bio-ph q-bio.QM

    Quantitative phase imaging through an ultra-thin lensless fiber endoscope

    Authors: Jiawei Sun, Jiachen Wu, Song Wu, Liangcai Cao, Ruchi Goswami, Salvatore Girardo, Jochen Guck, Nektarios Koukourakis, Juergen W. Czarske

    Abstract: Quantitative phase imaging (QPI) is a label-free technique providing both morphology and quantitative biophysical information in biomedicine. However, applying such a powerful technique to in vivo pathological diagnosis remains challenging. Multi-core fiber bundles (MCFs) enable ultra-thin probes for in vivo imaging, but current MCF imaging techniques are limited to amplitude imaging modalities. W…

    Submitted 6 July, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: 16 pages, 6 figures

  44. arXiv:2111.12758  [pdf]

    physics.optics cs.AI eess.IV physics.bio-ph

    Lensless multicore-fiber microendoscope for real-time tailored light field generation with phase encoder neural network (CoreNet)

    Authors: Jiawei Sun, Jiachen Wu, Nektarios Koukourakis, Robert Kuschmierz, Liangcai Cao, Juergen Czarske

    Abstract: The generation of tailored light with multi-core fiber (MCF) lensless microendoscopes is widely used in biomedicine. However, the computer-generated holograms (CGHs) used for such applications are typically generated by iterative algorithms, which demand high computation effort, limiting advanced applications like in vivo optogenetic stimulation and fiber-optic cell manipulation. The random and di…

    Submitted 24 November, 2021; originally announced November 2021.

  45. arXiv:2110.03841  [pdf, ps, other]

    eess.AS cs.CL

    Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition

    Authors: Zhiyun Lu, Yanwei Pan, Thibault Doutre, Parisa Haghani, Liangliang Cao, Rohit Prabhavalkar, Chao Zhang, Trevor Strohman

    Abstract: End-to-end models have achieved state-of-the-art results on several automatic speech recognition tasks. However, they perform poorly when evaluated on long-form data, e.g., minutes-long conversational telephony audio. One reason the model fails on long-form speech is that it has only seen short utterances during training. In this paper we study the effect of training utterance length on the word e…

    Submitted 1 April, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: submitted to INTERSPEECH 2022

  46. arXiv:2110.03327  [pdf, other]

    eess.AS cs.LG

    Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

    Authors: Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

    Abstract: As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions,…

    Submitted 2 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted as a conference paper at ICASSP 2022

  47. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional author not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  48. arXiv:2109.05496  [pdf]

    eess.IV cs.CV

    A Complex Constrained Total Variation Image Denoising Algorithm with Application to Phase Retrieval

    Authors: Yunhui Gao, Liangcai Cao

    Abstract: This paper considers the constrained total variation (TV) denoising problem for complex-valued images. We extend the definition of TV seminorms from real-valued images to complex-valued ones. In particular, we introduce two types of complex TV in both isotropic and anisotropic forms. To solve the constrained denoising problem, we adopt a dual approach and derive an accelerated gradient…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: 11 pages, 7 figures

  49. arXiv:2104.14346  [pdf, other]

    cs.CL cs.SD eess.AS

    Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

    Authors: Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

    Abstract: Streaming end-to-end automatic speech recognition (ASR) systems are widely used in everyday applications that require transcribing speech to text in real-time. Their minimal latency makes them suitable for such tasks. Unlike their non-streaming counterparts, streaming models are constrained to be causal with no future context and suffer from higher word error rates (WER). To improve streaming mode…

    Submitted 25 April, 2021; originally announced April 2021.

  50. arXiv:2104.12870  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

    Authors: David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw

    Abstract: Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems. Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR. In those studies, word confidence by itself does not model deletions, and utterance confidence does not take advantage of word-level training signals. This paper proposes to joi…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021