
Showing 1–32 of 32 results for author: Tan, W

Searching in archive eess.
  1. arXiv:2503.23052  [pdf, other]

    eess.IV

    ShiftLIC: Lightweight Learned Image Compression with Spatial-Channel Shift Operations

    Authors: Youneng Bao, Wen Tan, Chuanmin Jia, Mu Li, Yongsheng Liang, Yonghong Tian

    Abstract: Learned Image Compression (LIC) has attracted considerable attention due to its outstanding rate-distortion (R-D) performance and flexibility. However, the substantial computational cost poses challenges for practical deployment. The issue of feature redundancy in LIC is rarely addressed. Our findings indicate that many features within the LIC backbone network exhibit similarities. This paper…

    Submitted 29 March, 2025; originally announced March 2025.

  2. arXiv:2502.20311  [pdf, other]

    cs.LG cs.SD eess.AS

    Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications

    Authors: Marcus Yu Zhe Wee, Justin Juin Hng Wong, Lynus Lim, Joe Yu Wei Tan, Prannaya Gupta, Dillion Lim, En Hao Tew, Aloysius Keng Siew Han, Yong Zhi Lim

    Abstract: Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle with transcription accuracy for Southeast Asian-accented (SEA-accented) speech, particularly in noisy ATC environments. This study presents the development of…

    Submitted 27 February, 2025; originally announced February 2025.

  3. arXiv:2502.04399  [pdf, other]

    cs.LG cs.AI eess.SY

    Online Location Planning for AI-Defined Vehicles: Optimizing Joint Tasks of Order Serving and Spatio-Temporal Heterogeneous Model Fine-Tuning

    Authors: Bokeng Zheng, Bo Rao, Tianxiang Zhu, Chee Wei Tan, Jingpu Duan, Zhi Zhou, Xu Chen, Xiaoxi Zhang

    Abstract: Advances in artificial intelligence (AI), including foundation models (FMs), are increasingly transforming human society, with smart cities driving the evolution of urban living. Meanwhile, vehicle crowdsensing (VCS) has emerged as a key enabler, leveraging vehicles' mobility and sensor-equipped capabilities. In particular, ride-hailing vehicles can effectively facilitate flexible data collection and…

    Submitted 6 February, 2025; originally announced February 2025.

  4. arXiv:2501.08809  [pdf, other]

    cs.SD cs.AI eess.AS

    XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

    Authors: Sida Tian, Can Zhang, Wei Yuan, Wei Tan, Wenjie Zhu

    Abstract: In recent years, remarkable advancements in artificial intelligence-generated content (AIGC) have been achieved in the fields of image synthesis and text generation, generating content comparable to that produced by humans. However, the quality of AI-generated music has not yet reached this standard, primarily due to the challenge of effectively controlling musical emotions and ensuring high-quali…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: accepted by TMM

  5. arXiv:2501.01108  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

    Authors: Haina Zhu, Yizhi Zhou, Hangting Chen, Jianwei Yu, Ziyang Ma, Rongzhi Gu, Yi Luo, Wei Tan, Xie Chen

    Abstract: Recent years have witnessed the success of foundation models pre-trained with self-supervised learning (SSL) in various music informatics understanding tasks, including music tagging, instrument classification, key detection, and more. In this paper, we propose a self-supervised music representation learning model for music understanding. Distinguished from previous studies adopting random project…

    Submitted 3 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

  6. arXiv:2412.13786  [pdf, other]

    eess.AS cs.SD

    SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

    Authors: Chenyu Yang, Shuai Wang, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Yaoxun Xu, Yizhi Zhou, Haina Zhu, Haizhou Li

    Abstract: The emergence of novel generative modeling paradigms, particularly audio language models, has significantly advanced the field of song generation. Although state-of-the-art models are capable of synthesizing both vocals and accompaniment tracks up to several minutes long concurrently, research about partial adjustments or editing of existing songs is still underexplored, which allows for more flex…

    Submitted 28 January, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI2025

  7. arXiv:2410.00168  [pdf, other]

    cs.CL cs.SD eess.AS

    SSR: Alignment-Aware Modality Connector for Speech Language Models

    Authors: Weiting Tan, Hirofumi Inaguma, Ning Dong, Paden Tomasello, Xutai Ma

    Abstract: Fusing speech into a pre-trained language model (SpeechLM) usually suffers from inefficient encoding of long-form speech and catastrophic forgetting of the pre-trained text modality. We propose SSR-Connector (Segmented Speech Representation Connector) for better modality fusion. Leveraging speech-text alignments, our approach segments and compresses speech features to match the granularity of text embed…

    Submitted 17 May, 2025; v1 submitted 30 September, 2024; originally announced October 2024.

    Comments: IWSLT 2025

  8. arXiv:2409.13216  [pdf, other]

    cs.SD eess.AS

    MuCodec: Ultra Low-Bitrate Music Codec

    Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Shun Lei, Zhiwei Lin, Zhiyong Wu

    Abstract: Music codecs are a vital aspect of audio codec research, and ultra low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on modeling semantic or acoustic information cannot effectively reconstruct music with both vocals and backgrounds. To address this issue, we propose MuCod…

    Submitted 28 September, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

  9. arXiv:2407.04840  [pdf]

    eess.SY

    Analysis of Dead Reckoning Accuracy in Swarm Robotics System

    Authors: Weihang Tan, Timothy Anglea, Yongqiang Wang

    Abstract: The objective of this paper is to determine the position of a single mobile robot in a swarm using dead reckoning techniques. We investigate the accuracy of navigation by using this process. The paper begins with the research background and social importance. Then, the specific experimental setup and analysis of experimental results are presented. Finally, the results are detailed and some potenti…

    Submitted 5 July, 2024; originally announced July 2024.

  10. arXiv:2403.06700  [pdf, other]

    eess.IV

    Enhancing Adversarial Training with Prior Knowledge Distillation for Robust Image Compression

    Authors: Zhi Cao, Youneng Bao, Fanyang Meng, Chao Li, Wen Tan, Genhong Wang, Yongsheng Liang

    Abstract: Deep neural network-based image compression (NIC) has achieved excellent performance, but NIC method models have been shown to be susceptible to backdoor attacks. Adversarial training has been validated in image compression models as a common method to enhance model robustness. However, the improvement effect of adversarial training on model robustness is limited. In this paper, we propose a prior…

    Submitted 15 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  11. arXiv:2403.03809  [pdf, ps, other]

    eess.SP

    Variational Bayesian Learning based Joint Localization and Path Loss Exponent with Distance-dependent Noise in Wireless Sensor Network

    Authors: Yunfei Li, Yiting Luo, Weiqiang Tan, Chunguo Li, Shaodan Ma, Guanghua Yang

    Abstract: This paper focuses on the challenge of jointly optimizing location and path loss exponent (PLE) in distance-dependent noise. Departing from the conventional independent noise model used in localization and path loss exponent estimation problems, we consider a more realistic model incorporating distance-dependent noise variance, as revealed in recent theoretical analyses and experimental results. T…

    Submitted 20 July, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  12. arXiv:2402.01172  [pdf, other]

    cs.CL cs.SD eess.AS

    Streaming Sequence Transduction through Dynamic Compression

    Authors: Weiting Tan, Yunmo Chen, Tongfei Chen, Guanghui Qin, Haoran Xu, Heidi C. Zhang, Benjamin Van Durme, Philipp Koehn

    Abstract: We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrat…

    Submitted 21 May, 2025; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: IWSLT 2025

  13. arXiv:2304.13583  [pdf, other]

    eess.IV cs.CV

    Multi-Modality Deep Network for Extreme Learned Image Compression

    Authors: Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen

    Abstract: Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blur and severe semantics loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior i…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 13 pages, 14 figures, accepted by AAAI 2023

  14. arXiv:2303.06032  [pdf, other]

    cs.LG cs.CR cs.CV eess.IV

    Exploring Adversarial Attacks on Neural Networks: An Explainable Approach

    Authors: Justus Renkhoff, Wenkai Tan, Alvaro Velasquez, William Yichen Wang, Yongxin Liu, Jian Wang, Shuteng Niu, Lejla Begic Fazlic, Guido Dartmann, Houbing Song

    Abstract: Deep Learning (DL) is being applied in various domains, especially in safety-critical applications such as autonomous driving. Consequently, it is of great significance to ensure the robustness of these methods and thus counteract uncertain behaviors caused by adversarial attacks. In this paper, we use gradient heatmaps to analyze the response characteristics of the VGG-16 model when the input ima…

    Submitted 8 March, 2023; originally announced March 2023.

  15. arXiv:2209.13112  [pdf, other]

    eess.AS cs.SD

    Automated Sex Classification of Children's Voices and Changes in Differentiating Factors with Age

    Authors: Fuling Chen, Roberto Togneri, Murray Maybery, Diana Weiting Tan

    Abstract: Sex classification of children's voices allows for an investigation of the development of secondary sex characteristics, which has been a key interest in the field of speech analysis. This research investigated a broad range of acoustic features from scripted and spontaneous speech and applied a hierarchical clustering-based machine learning model to distinguish the sex of children aged between 5 a…

    Submitted 26 September, 2022; originally announced September 2022.

  16. arXiv:2207.06617  [pdf, other]

    eess.IV cs.CV

    Perception-Oriented Stereo Image Super-Resolution

    Authors: Chenxi Ma, Bo Yan, Weimin Tan, Xuhao Jiang

    Abstract: Recent studies of deep learning based stereo image super-resolution (StereoSR) have promoted the development of StereoSR. However, existing StereoSR models mainly concentrate on improving quantitative evaluation metrics and neglect the visual quality of super-resolved stereo images. To improve the perceptual performance, this paper proposes the first perception-oriented stereo image super-resoluti…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 9 pages, 10 figures, ACM MM 2021

  17. arXiv:2206.14861  [pdf, other]

    eess.IV cs.CV

    Two-Stage COVID19 Classification Using BERT Features

    Authors: Weijun Tan, Qi Yao, Jingfeng Liu

    Abstract: We propose an automatic COVID-19 diagnosis framework from lung CT-scan slice images using double BERT feature extraction. In the first BERT feature extraction, a 3D-CNN is first used to extract CNN internal feature maps. Instead of using the global average pooling, a late BERT temporal pooling is used to aggregate the temporal information in these feature maps, followed by a classification layer.…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2106.14403

  18. arXiv:2206.11599  [pdf, other]

    eess.IV cs.CV

    Universal Learned Image Compression With Low Computational Cost

    Authors: Bowen Li, Yao Xin, Youneng Bao, Fanyang Meng, Yongsheng Liang, Wen Tan

    Abstract: Recently, learned image compression methods have developed rapidly and exhibited excellent rate-distortion performance when compared to traditional standards, such as JPEG, JPEG2000 and BPG. However, the learning-based methods suffer from high computational costs, which is not beneficial for deployment on devices with limited resources. To this end, we propose shift-addition parallel modules (SAPM…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 5 pages

  19. arXiv:2205.02939  [pdf]

    eess.SY

    Modelling Pre-fatigue, Low-velocity Impact and Fatigue behaviours of Composite Helicopter Tail Structures under Multipoint Coordinated Loading Spectrum

    Authors: Zheng-Qiang Cheng, Wei Tan, Jun-Jiang Xiong

    Abstract: This paper aims to numerically study the pre-fatigue, low-velocity impact (LVI) and fatigue progressive damage behaviours of a full-scale composite helicopter tail structure under multipoint coordinated loading spectrum. First, a fatigue progressive damage model (PDM) incorporating multiaxial fatigue residual strength degradation rule, fatigue failure criteria based on fatigue residual strength co…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 43 pages, 16 figures

  20. arXiv:2203.02158  [pdf, other]

    eess.IV cs.AI cs.CV

    Transformations in Learned Image Compression from a Modulation Perspective

    Authors: Youneng Bao, Fangyang Meng, Wen Tan, Chao Li, Yonghong Tian, Yongsheng Liang

    Abstract: In this paper, a unified transformation method in learned image compression (LIC) is proposed from the perspective of modulation. Firstly, the quantization in LIC is considered as a generalized channel with additive uniform noise. Moreover, the LIC is interpreted as a particular communication system according to the consistency in structures and optimization objectives. Thus, the technology of comm…

    Submitted 12 March, 2024; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: 10 pages, 8 figures

  21. arXiv:2202.04595  [pdf, other]

    eess.IV cs.CV cs.LG

    Exploring Structural Sparsity in Neural Image Compression

    Authors: Shanzhi Yin, Chao Li, Wen Tan, Youneng Bao, Yongsheng Liang, Wei Liu

    Abstract: Neural image compression methods have matched or outperformed traditional methods (such as JPEG, BPG, WebP). However, their sophisticated network structures with cascaded convolution layers impose a heavy computational burden for practical deployment. In this paper, we explore the structural sparsity in neural image compression network to obtain real-time acceleration without any specialized hardware design…

    Submitted 11 March, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 6 pages, 5 figures, submitted to ICIP 2022

    MSC Class: 68U10(primary); 94A08 68T07(secondary) ACM Class: I.2.6; I.4.2

  22. arXiv:2108.07148  [pdf, other]

    eess.IV cs.CV

    Data Augmentation and CNN Classification For Automatic COVID-19 Diagnosis From CT-Scan Images On Small Dataset

    Authors: Weijun Tan, Hongwei Guo

    Abstract: We present an automatic COVID-19 diagnosis framework from lung CT images. The focus is on signal processing and classification on small datasets, with effort put into exploring data preparation and augmentation to improve the generalization capability of the 2D CNN classification models. We propose a unique and effective data augmentation method using multiple Hounsfield Unit (HU) normalizati…

    Submitted 30 September, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

    Journal ref: IEEE ICMLA 2021

  23. arXiv:2106.14403  [pdf, other]

    eess.IV cs.CV

    A 3D CNN Network with BERT For Automatic COVID-19 Diagnosis From CT-Scan Images

    Authors: Weijun Tan, Jingfeng Liu

    Abstract: We present an automatic COVID-19 diagnosis framework from lung CT-scan slice images. In this framework, the slice images of a CT-scan volume are first preprocessed using segmentation techniques to filter out images of closed lung, and to remove the useless background. Then a resampling method is used to select one or multiple sets of a fixed number of slice images for training and validation. A 3…

    Submitted 4 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Journal ref: 2021 ICCV Workshops

  24. arXiv:2010.09268  [pdf, other]

    eess.SP cs.LG

    DeepWiPHY: Deep Learning-based Receiver Design and Dataset for IEEE 802.11ax Systems

    Authors: Yi Zhang, Akash Doshi, Rob Liston, Wai-tian Tan, Xiaoqing Zhu, Jeffrey G. Andrews, Robert W. Heath

    Abstract: In this work, we develop DeepWiPHY, a deep learning-based architecture to replace the channel estimation, common phase error (CPE) correction, sampling rate offset (SRO) correction, and equalization modules of IEEE 802.11ax based orthogonal frequency division multiplexing (OFDM) receivers. We first train DeepWiPHY with a synthetic dataset, which is generated using representative indoor channel mod…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Journal paper (16 pages and 12 figures) to appear in IEEE Transactions on Wireless Communications

  25. arXiv:2007.12560  [pdf]

    eess.SY cs.LG

    Adaptive Energy Management for Real Driving Conditions via Transfer Reinforcement Learning

    Authors: Teng Liu, Wenhao Tan, Xiaolin Tang, Jiaxin Chen, Dongpu Cao

    Abstract: This article proposes a transfer reinforcement learning (RL) based adaptive energy management approach for a hybrid electric vehicle (HEV) with parallel topology. This approach is bi-level. The upper level characterizes how to transform the Q-value tables in the RL framework via driving cycle transformation (DCT). Specifically, transition probability matrices (TPMs) of power request are computed for diff…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: 10 pages, 10 figures

  26. arXiv:2007.10880  [pdf]

    eess.SP eess.SY

    Driving Conditions-Driven Energy Management for Hybrid Electric Vehicles: A Review

    Authors: Teng Liu, Wenhao Tan, Xiaolin Tang, Jinwei Zhang, Yang Xing, Dongpu Cao

    Abstract: Motivated by concerns about transportation fuel consumption and global air pollution, industrial engineers and academic researchers have made many efforts to construct more efficient and environment-friendly vehicles. Hybrid electric vehicles (HEVs) are the representative ones because they can satisfy the power demand by coordinating energy supplements among different energy storage devices. To ach…

    Submitted 1 August, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: 42 pages, 12 figures, 6 tables

  27. arXiv:2007.08690  [pdf]

    eess.SP cs.AI cs.LG

    Transfer Deep Reinforcement Learning-enabled Energy Management Strategy for Hybrid Tracked Vehicle

    Authors: Xiaowei Guo, Teng Liu, Bangbei Tang, Xiaolin Tang, Jinwei Zhang, Wenhao Tan, Shufeng Jin

    Abstract: This paper proposes an adaptive energy management strategy for hybrid electric vehicles by combining deep reinforcement learning (DRL) and transfer learning (TL). This work aims to address the long training time of DRL. First, an optimization control modeling of a hybrid tracked vehicle is built, wherein the elaborate powertrain components are introduced. Then, a bi-level control fram…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: 11 pages, 11 figures

  28. arXiv:2007.08337  [pdf]

    eess.SP cs.LG

    Transferred Energy Management Strategies for Hybrid Electric Vehicles Based on Driving Conditions Recognition

    Authors: Teng Liu, Xiaolin Tang, Jiaxin Chen, Hong Wang, Wenhao Tan, Yalian Yang

    Abstract: Energy management strategies (EMSs) are the most significant components in hybrid electric vehicles (HEVs) because they decide the potential of energy conservation and emission reduction. This work presents a transferred EMS for a parallel HEV via combining the reinforcement learning method and driving conditions recognition. First, the Markov decision process (MDP) and the transition probability…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: 6 pages, 5 figures

  29. arXiv:2004.11839  [pdf, other]

    eess.SP cs.LG stat.ML

    Detecting Driver's Distraction using Long-term Recurrent Convolutional Network

    Authors: Chang Wei Tan, Mahsa Salehi, Geoffrey Mackellar

    Abstract: In this study we demonstrate a novel Brain Computer Interface (BCI) approach to detect driver distraction events to improve road safety. We use a commercial wireless headset that generates EEG signals from the brain. We collected real EEG signals from participants who undertook a 40-minute driving simulation and were required to perform different tasks while driving. These signals are segmented in…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: 3 pages, 2 figures

  30. arXiv:1905.04709  [pdf, ps, other]

    cs.MM cs.IT cs.SD eess.AS

    Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder

    Authors: Gang Min, Changqing Zhang, Xiongwei Zhang, Wei Tan

    Abstract: Inspired by the success of deep neural networks (DNNs) in speech processing, this paper presents Deep Vocoder, a direct end-to-end low bit rate speech compression method with deep autoencoder (DAE). In Deep Vocoder, DAE is used for extracting the latent representing features (LRFs) of speech, which are then efficiently quantized by an analysis-by-synthesis vector quantization (AbS VQ) method. AbS…

    Submitted 14 May, 2019; v1 submitted 12 May, 2019; originally announced May 2019.

  31. arXiv:1811.08762  [pdf]

    cs.HC eess.SY

    Tablet-based Information System for Commercial Air-craft: Onboard Context-Sensitive Information System (OCSIS)

    Authors: Guy Andre Boy, Wei Tan

    Abstract: Pilots currently use paper-based documentation and electronic systems to help them perform procedures to ensure safety, efficiency and comfort on commercial aircraft. Management of interconnections among paper-based operational documents can be a challenge for pilots, especially when time pressure is high in normal, abnormal, and emergency situations. This dissertation is a contribution to the de…

    Submitted 21 November, 2018; originally announced November 2018.

    Journal ref: Proceedings of HCI International 2018, Jul 2018, Orlando, United States

  32. arXiv:1610.09606  [pdf, other]

    eess.SY

    Impedance control of a cable-driven series elastic actuator with the 2-DOF control structure

    Authors: Wulin Zou, Zhuo Yang, Wen Tan, Meng Wang, Jingtai Liu, Ningbo Yu

    Abstract: Series elastic actuators (SEAs) are increasingly important in physical human-robot interaction (HRI) due to their inherent safety and compliance. Cable-driven SEAs also allow flexible installation and remote torque transmission. However, there are still challenges for the impedance control of cable-driven SEAs, such as the reduced bandwidth caused by the elastic component, and the performance ba…

    Submitted 30 October, 2016; originally announced October 2016.

    Comments: 6 pages, IROS2016, Accepted