Skip to main content

Showing 1–50 of 227 results for author: Lin, J

Searching in archive eess. Search in all archives.
.
  1. arXiv:2505.09558  [pdf, other

    eess.AS cs.AI cs.LG cs.MM cs.SD

    WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

    Authors: Shengpeng Ji, Tianle Liang, Yangzhuo Li, Jialong Zuo, Minghui Fang, Jinzheng He, Yifu Chen, Zhengqing Liu, Ziyue Jiang, Xize Cheng, Siqi Zheng, Jin Xu, Junyang Lin, Zhou Zhao

    Abstract: End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models' conversational performance has largely been overlooked. This is primarily due to the intelligent chatbots convey a wealth of non-textual information which cannot be easily measured using text-based language models like ChatGPT.… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2505.06291  [pdf, ps, other

    eess.SP cs.CE cs.HC cs.LG

    ALFEE: Adaptive Large Foundation Model for EEG Representation

    Authors: Wei Xiong, Junming Lin, Jiangtong Li, Jie Li, Changjun Jiang

    Abstract: While foundation models excel in text, image, and video domains, the critical biological signals, particularly electroencephalography(EEG), remain underexplored. EEG benefits neurological research with its high temporal resolution, operational practicality, and safety profile. However, low signal-to-noise ratio, inter-subject variability, and cross-paradigm differences hinder the generalization of… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 17pages, 17 figures

  3. arXiv:2504.20108  [pdf, other

    cs.LG eess.IV eess.SP

    Swapped Logit Distillation via Bi-level Teacher Alignment

    Authors: Stephen Ekaputra Limantoro, Jhe-Hao Lin, Chih-Yu Wang, Yi-Lung Tsai, Hong-Han Shuai, Ching-Chun Huang, Wen-Huang Cheng

    Abstract: Knowledge distillation (KD) compresses the network capacity by transferring knowledge from a large (teacher) network to a smaller one (student). It has been mainstream that the teacher directly transfers knowledge to the student with its original distribution, which can possibly lead to incorrect predictions. In this article, we propose a logit-based distillation via swapped logit processing, name… ▽ More

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Accepted to Multimedia Systems 2025

  4. arXiv:2504.12169  [pdf, other

    cs.CV eess.IV

    Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline

    Authors: Joanne Lin, Crispian Morris, Ruirui Lin, Fan Zhang, David Bull, Nantheera Anantrasirichai

    Abstract: Low-light conditions pose significant challenges for both human and machine annotation. This in turn has led to a lack of research into machine understanding for low-light images and (in particular) videos. A common approach is to apply annotations obtained from high quality datasets to synthetically created low light versions. In addition, these approaches are often limited through the use of unr… ▽ More

    Submitted 16 April, 2025; originally announced April 2025.

  5. arXiv:2504.02628  [pdf, other

    eess.IV cs.CV

    Towards Computation- and Communication-efficient Computational Pathology

    Authors: Chu Han, Bingchao Zhao, Jiatai Lin, Shanshan Lyu, Longfei Wang, Tianpeng Deng, Cheng Lu, Changhong Liang, Hannah Y. Wen, Xiaojing Guo, Zhenwei Shi, Zaiyi Liu

    Abstract: Despite the impressive performance across a wide range of applications, current computational pathology models face significant diagnostic efficiency challenges due to their reliance on high-magnification whole-slide image analysis. This limitation severely compromises their clinical utility, especially in time-sensitive diagnostic scenarios and situations requiring efficient data transfer. To add… ▽ More

    Submitted 3 April, 2025; originally announced April 2025.

  6. arXiv:2503.20215  [pdf, other

    cs.CL cs.CV cs.SD eess.AS

    Qwen2.5-Omni Technical Report

    Authors: Jin Xu, Zhifang Guo, Jinzheng He, Hangrui Hu, Ting He, Shuai Bai, Keqin Chen, Jialin Wang, Yang Fan, Kai Dang, Bin Zhang, Xiong Wang, Yunfei Chu, Junyang Lin

    Abstract: In this report, we present Qwen2.5-Omni, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. To enable the streaming of multimodal information inputs, both audio and visual encoders utilize a block-wise processing approach. To synchronize the timest… ▽ More

    Submitted 26 March, 2025; originally announced March 2025.

  7. arXiv:2503.12589  [pdf, other

    cs.SD eess.AS

    Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation

    Authors: Wupeng Wang, Zexu Pan, Jingru Lin, Shuai Wang, Haizhou Li

    Abstract: Speech separation seeks to isolate individual speech signals from a multi-talk speech mixture. Despite much progress, a system well-trained on synthetic data often experiences performance degradation on out-of-domain data, such as real-world speech mixtures. To address this, we introduce a novel context-aware, two-stage training scheme for speech separation models. In this training scheme, the con… ▽ More

    Submitted 16 March, 2025; originally announced March 2025.

  8. arXiv:2503.06945  [pdf, other

    eess.IV cs.CV

    Dynamic Cross-Modal Feature Interaction Network for Hyperspectral and LiDAR Data Classification

    Authors: Junyan Lin, Feng Gap, Lin Qi, Junyu Dong, Qian Du, Xinbo Gao

    Abstract: Hyperspectral image (HSI) and LiDAR data joint classification is a challenging task. Existing multi-source remote sensing data classification methods often rely on human-designed frameworks for feature extraction, which heavily depend on expert knowledge. To address these limitations, we propose a novel Dynamic Cross-Modal Feature Interaction Network (DCMNet), the first framework leveraging a dyna… ▽ More

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted by IEEE TGRS 2025

  9. arXiv:2503.02769  [pdf, other

    cs.SD cs.CL cs.HC eess.AS

    InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training

    Authors: Dingdong Wang, Jin Xu, Ruihang Chu, Zhifang Guo, Xiong Wang, Jincenzi Wu, Dongchao Yang, Shengpeng Ji, Junyang Lin

    Abstract: Recent advancements in speech large language models (SpeechLLMs) have attracted considerable attention. Nonetheless, current methods exhibit suboptimal performance in adhering to speech instructions. Notably, the intelligence of models significantly diminishes when processing speech-form input as compared to direct text-form input. Prior work has attempted to mitigate this semantic inconsistency b… ▽ More

    Submitted 4 March, 2025; originally announced March 2025.

  10. arXiv:2502.20018  [pdf, other

    cs.RO cs.CV eess.IV

    Multi-Keypoint Affordance Representation for Functional Dexterous Grasping

    Authors: Fan Yang, Dongsheng Luo, Wenrui Chen, Jiacheng Lin, Junjie Cai, Kailun Yang, Zhiyong Li, Yaonan Wang

    Abstract: Functional dexterous grasping requires precise hand-object interaction, going beyond simple gripping. Existing affordance-based methods primarily predict coarse interaction regions and cannot directly constrain the grasping posture, leading to a disconnection between visual perception and manipulation. To address this issue, we propose a multi-keypoint affordance representation for functional dext… ▽ More

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: The source code and demo videos will be publicly available at https://github.com/PopeyePxx/MKA

  11. arXiv:2502.19692  [pdf

    eess.IV cs.CV

    A Residual Multi-task Network for Joint Classification and Regression in Medical Imaging

    Authors: Junji Lin, Yi Zhang, Yunyue Pan, Yuli Chen, Chengchang Pan, Honggang Qi

    Abstract: Detection and classification of pulmonary nodules is a challenge in medical image analysis due to the variety of shapes and sizes of nodules and their high concealment. Despite the success of traditional deep learning methods in image classification, deep networks still struggle to perfectly capture subtle changes in lung nodule detection. Therefore, we propose a residual multi-task network (Res-M… ▽ More

    Submitted 26 February, 2025; originally announced February 2025.

  12. arXiv:2502.11946  [pdf, other

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu , et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu… ▽ More

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  13. arXiv:2502.00376  [pdf, other

    cs.LG cs.HC eess.SP

    SSRepL-ADHD: Adaptive Complex Representation Learning Framework for ADHD Detection from Visual Attention Tasks

    Authors: Abdul Rehman, Ilona Heldal, Jerry Chun-Wei Lin

    Abstract: Self Supervised Representation Learning (SSRepL) can capture meaningful and robust representations of the Attention Deficit Hyperactivity Disorder (ADHD) data and have the potential to improve the model's performance on also downstream different types of Neurodevelopmental disorder (NDD) detection. In this paper, a novel SSRepL and Transfer Learning (TL)-based framework that incorporates a Long Sh… ▽ More

    Submitted 1 February, 2025; originally announced February 2025.

    Journal ref: 2024 IEEE International Conference on Big Data (BigData)

  14. arXiv:2501.13250  [pdf, other

    eess.AS cs.SD

    Generative Data Augmentation Challenge: Synthesis of Room Acoustics for Speaker Distance Estimation

    Authors: Jackie Lin, Georg Götz, Hermes Sampedro Llopis, Haukur Hafsteinsson, Steinar Guðjónsson, Daniel Gert Nielsen, Finnur Pind, Paris Smaragdis, Dinesh Manocha, John Hershey, Trausti Kristjansson, Minje Kim

    Abstract: This paper describes the synthesis of the room acoustics challenge as a part of the generative data augmentation workshop at ICASSP 2025. The challenge defines a unique generative task that is designed to improve the quantity and diversity of the room impulse responses dataset so that it can be used for spatially sensitive downstream tasks: speaker distance estimation. The challenge identifies the… ▽ More

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted to the Workshop on Generative Data Augmentation at ICASSP 2025. Challenge website: https://sites.google.com/view/genda2025

  15. arXiv:2501.08853  [pdf, other

    eess.SY

    Achieving Stability and Optimality: Control Strategy for a Wind Turbine Supplying an Electrolyzer in the Islanded Storage-less Microgrid

    Authors: Bosen Yang, Kang Ma, Jin Lin, Mingjun Zhang, QiweiDuan, Zhendong Ji, Zhi Liu, Yonghua Song

    Abstract: Wind power generation supplying electrolyzers in islanded microgrids is an essential technical pathway for green hydrogen production, attracting growing attention in the transition towards net zero carbon emissions. Both academia and industry widely recognize that islanded AC microgrids normally rely on battery energy storage systems (BESSs) for grid-forming functions. However, the high cost of BE… ▽ More

    Submitted 15 January, 2025; originally announced January 2025.

  16. arXiv:2501.04996  [pdf

    eess.IV cs.CV

    A CT Image Classification Network Framework for Lung Tumors Based on Pre-trained MobileNetV2 Model and Transfer learning, And Its Application and Market Analysis in the Medical field

    Authors: Ziyang Gao, Yong Tian, Shih-Chi Lin, Junghua Lin

    Abstract: In the medical field, accurate diagnosis of lung cancer is crucial for treatment. Traditional manual analysis methods have significant limitations in terms of accuracy and efficiency. To address this issue, this paper proposes a deep learning network framework based on the pre-trained MobileNetV2 model, initialized with weights from the ImageNet-1K dataset (version 2). The last layer of the model… ▽ More

    Submitted 9 January, 2025; originally announced January 2025.

  17. arXiv:2501.02303  [pdf, other

    cs.RO eess.SP

    Design and Benchmarking of A Multi-Modality Sensor for Robotic Manipulation with GAN-Based Cross-Modality Interpretation

    Authors: Dandan Zhang, Wen Fan, Jialin Lin, Haoran Li, Qingzheng Cong, Weiru Liu, Nathan F. Lepora, Shan Luo

    Abstract: In this paper, we present the design and benchmark of an innovative sensor, ViTacTip, which fulfills the demand for advanced multi-modal sensing in a compact design. A notable feature of ViTacTip is its transparent skin, which incorporates a `see-through-skin' mechanism. This mechanism aims at capturing detailed object features upon contact, significantly improving both vision-based and proximity… ▽ More

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: Accepted by IEEE Transactions on Robotics

  18. arXiv:2412.18619  [pdf, other

    cs.CL cs.AI cs.CV cs.LG cs.MM eess.AS

    Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

    Authors: Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee , et al. (2 additional authors not shown)

    Abstract: Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable success. As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks f… ▽ More

    Submitted 29 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: 69 papes, 18 figures, repo at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction

  19. arXiv:2412.18158  [pdf, other

    cs.CV eess.IV

    Semantics Disentanglement and Composition for Versatile Codec toward both Human-eye Perception and Machine Vision Task

    Authors: Jinming Liu, Yuntao Wei, Junyan Lin, Shengyang Zhao, Heming Sun, Zhibo Chen, Wenjun Zeng, Xin Jin

    Abstract: While learned image compression methods have achieved impressive results in either human visual perception or machine vision tasks, they are often specialized only for one domain. This drawback limits their versatility and generalizability across scenarios and also requires retraining to adapt to new applications-a process that adds significant complexity and cost in real-world scenarios. In this… ▽ More

    Submitted 23 December, 2024; originally announced December 2024.

  20. arXiv:2412.04861  [pdf, other

    cs.LG eess.SP

    MSECG: Incorporating Mamba for Robust and Efficient ECG Super-Resolution

    Authors: Jie Lin, I Chiu, Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Ping-Cheng Yeh, Yu Tsao

    Abstract: Electrocardiogram (ECG) signals play a crucial role in diagnosing cardiovascular diseases. To reduce power consumption in wearable or portable devices used for long-term ECG monitoring, super-resolution (SR) techniques have been developed, enabling these devices to collect and transmit signals at a lower sampling rate. In this study, we propose MSECG, a compact neural network model designed for EC… ▽ More

    Submitted 6 December, 2024; originally announced December 2024.

    Comments: 5 pages, 3 figures

  21. arXiv:2411.05361  [pdf, other

    cs.CL eess.AS

    Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

    Authors: Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter-Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo , et al. (53 additional authors not shown)

    Abstract: Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluati… ▽ More

    Submitted 8 November, 2024; originally announced November 2024.

  22. arXiv:2410.21276  [pdf, other

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, :, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander MÄ…dry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis , et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil… ▽ More

    Submitted 25 October, 2024; originally announced October 2024.

  23. arXiv:2410.03811  [pdf

    cs.HC eess.SY

    Enhanced Digital Twin for Human-Centric and Integrated Lighting Asset Management in Public Libraries: From Corrective to Predictive Maintenance

    Authors: Jing Lin, Jingchun Shen

    Abstract: Lighting asset management in public libraries has traditionally been reactive, focusing on corrective maintenance, addressing issues only when failures occur. Although standards now encourage preventive measures, such as incorporating a maintenance factor, the broader goal of human centric, sustainable lighting systems requires a shift toward predictive maintenance strategies. This study introduce… ▽ More

    Submitted 4 October, 2024; originally announced October 2024.

  24. arXiv:2409.19283  [pdf, other

    eess.AS cs.SD

    Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

    Authors: Wenrui Liu, Zhifang Guo, Jin Xu, Yuanjun Lv, Yunfei Chu, Zhou Zhao, Junyang Lin

    Abstract: Building upon advancements in Large Language Models (LLMs), the field of audio processing has seen increased interest in training audio generation tasks with discrete audio token sequences. However, directly discretizing audio by neural audio codecs often results in sequences that fundamentally differ from text sequences. Unlike text, where text token sequences are deterministic, discrete audio to… ▽ More

    Submitted 4 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

    Comments: e.g.: 15 pages, 4 figures

  25. arXiv:2409.14619  [pdf, other

    cs.SD eess.AS

    SongTrans: An unified song transcription and alignment method for lyrics and notes

    Authors: Siwei Wu, Jinzheng He, Ruibin Yuan, Haojie Wei, Xipin Wei, Chenghua Lin, Jin Xu, Junyang Lin

    Abstract: The quantity of processed data is crucial for advancing the field of singing voice synthesis. While there are tools available for lyric or note transcription tasks, they all need pre-processed data which is relatively time-consuming (e.g., vocal and accompaniment separation). Besides, most of these tools are designed to address a single task and struggle with aligning lyrics and notes (i.e., ident… ▽ More

    Submitted 10 October, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

  26. arXiv:2409.11494  [pdf, other

    eess.AS cs.SD

    M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

    Authors: Yufeng Yang, Desh Raj, Ju Lin, Niko Moritz, Junteng Jia, Gil Keren, Egor Lakomkin, Yiteng Huang, Jacob Donley, Jay Mahadeokar, Ozlem Kalinli

    Abstract: The growing popularity of multi-channel wearable devices, such as smart glasses, has led to a surge of applications such as targeted speech recognition and enhanced hearing. However, current approaches to solve these tasks use independently trained models, which may not benefit from large amounts of unlabeled data. In this paper, we propose M-BEST-RQ, the first multi-channel speech foundation mode… ▽ More

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: In submission to IEEE ICASSP 2025

  27. arXiv:2409.05086  [pdf, other

    math.OC eess.SY

    Exploring the Optimal Size of Grid-forming Energy Storage in an Off-grid Renewable P2H System under Multi-timescale Energy Management

    Authors: Jie Zhu, Yiwei Qiu, Yangjun Zeng, Yi Zhou, Shi Chen, Tianlei Zang, Buxiang Zhou, Zhipeng Yu, Jin Lin

    Abstract: Utility-scale off-grid renewable power-to-hydrogen systems (OReP2HSs) typically include photovoltaic plants, wind turbines, electrolyzers (ELs), and energy storage systems. As an island system, OReP2HS requires at least one component, generally the battery energy storage system (BESS), that operates for grid-forming control to provide frequency and voltage references and regulate them through tran… ▽ More

    Submitted 8 September, 2024; originally announced September 2024.

  28. arXiv:2409.02050  [pdf, other

    cs.CL cs.SD eess.AS

    Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model

    Authors: Hukai Huang, Jiayan Lin, Kaidi Wang, Yishuang Li, Wenhao Guan, Lin Li, Qingyang Hong

    Abstract: Due to the inherent difficulty in modeling phonetic similarities across different languages, code-switching speech recognition presents a formidable challenge. This study proposes a Collaborative-MoE, a Mixture of Experts (MoE) model that leverages a collaborative mechanism among expert groups. Initially, a preceding routing network explicitly learns Language Identification (LID) tasks and selects… ▽ More

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by IEEE SLT 2024

  29. arXiv:2408.12080  [pdf, other

    eess.SP cs.AI cs.NI

    Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning

    Authors: Max J. L. Lee, Ju Lin, Li-Ta Hsu

    Abstract: We propose a feasibility study for real-time automated data standardization leveraging Large Language Models (LLMs) to enhance seamless positioning systems in IoT environments. By integrating and standardizing heterogeneous sensor data from smartphones, IoT devices, and dedicated systems such as Ultra-Wideband (UWB), our study ensures data compatibility and improves positioning accuracy using the… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted at IPIN 2024. To be published in IEEE Xplore

  30. arXiv:2408.05777  [pdf, other

    cs.CV cs.AI eess.IV

    Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

    Authors: Hannuo Zhang, Huihui Li, Jiarui Lin, Yujie Zhang, Jianghua Fan, Hang Liu

    Abstract: Optical remote sensing and Synthetic Aperture Radar(SAR) remote sensing are crucial for earth observation, offering complementary capabilities. While optical sensors provide high-quality images, they are limited by weather and lighting conditions. In contrast, SAR sensors can operate effectively under adverse conditions. This letter proposes a GAN-based SAR-to-optical image translation method name… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  31. Precoding Based Downlink OAM-MIMO Communications with Rate Splitting

    Authors: Ruirui Chen, Jinyang Lin, Beibei Zhang, Yu Ding, Keyue Xu

    Abstract: Orbital angular momentum (OAM) and rate splitting (RS) are the potential key techniques for the future wireless communications. As a new orthogonal resource, OAM can achieve the multifold increase of spectrum efficiency to relieve the scarcity of the spectrum resource, but how to enhance the privacy performance imposes crucial challenge for OAM communications. RS technique divides the information… ▽ More

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Journal ref: IEEE TRANSACTIONS ON BROADCASTING, VOL. 69, NO. 4, DECEMBER 2023

  32. Cooperative Orbital Angular Momentum Wireless Communications

    Authors: Ruirui Chen, Wenchi Cheng, Jinyang Lin, Liping Liang

    Abstract: Orbital angular momentum (OAM) mode multiplexing has the potential to achieve high spectrum-efficiency communications at the same time and frequency by using orthogonal mode resource. However, the vortex wave hollow divergence characteristic results in the requirement of the large-scale receive antenna, which makes users hardly receive the OAM signal by size-limited equipment. To promote the OAM a… ▽ More

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 73, NO. 1, JANUARY 2024

  33. arXiv:2407.15358  [pdf, other

    eess.IV

    PRIME: Blind Multispectral Unmixing Using Virtual Quantum Prism and Convex Geometry

    Authors: Chia-Hsiang Lin, Jhao-Ting Lin

    Abstract: Multispectral unmixing (MU) is critical due to the inevitable mixed pixel phenomenon caused by the limited spatial resolution of typical multispectral images in remote sensing. However, MU mathematically corresponds to the underdetermined blind source separation problem, thus highly challenging, preventing researchers from tackling it. Previous MU works all ignore the underdetermined issue, and me… ▽ More

    Submitted 2 February, 2025; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: 16 pages, 10 figures

  34. arXiv:2407.12963  [pdf, other

    eess.IV

    Edge Projection-Based Adaptive View Selection for Cone-Beam CT

    Authors: Jingsong Lin, Singanallur Venkatakrishnan, Gregery Buzzard, Amir Koushyar Ziabari, Charles Bouman

    Abstract: Industrial cone-beam X-ray computed tomography (CT) scans of additively manufactured components produce a 3D reconstruction from projection measurements acquired at multiple predetermined rotation angles of the component about a single axis. Typically, a large number of projections are required to achieve a high-quality reconstruction, a process that can span several hours or days depending on the… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Submitted to 2024 Asilomar Conference on Signals, Systems, and Computers

  35. arXiv:2407.11578  [pdf, other

    cs.CV eess.IV

    UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction

    Authors: Zeyu Wang, Zecheng Hao, Jingyu Lin, Yuchao Feng, Yufei Guo

    Abstract: This study introduces a novel Remote Sensing (RS) Urban Prediction (UP) task focused on future urban planning, which aims to forecast urban layouts by utilizing information from existing urban layouts and planned change maps. To address the proposed RS UP task, we propose UP-Diff, which leverages a Latent Diffusion Model (LDM) to capture positionaware embeddings of pre-change urban layouts and pla… ▽ More

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  36. arXiv:2407.10759  [pdf, other

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, codes and scripts will be opensoursed soon

  37. ecVoice: Audio Text Extraction and Optimization of Video Based on Idioms Similarity Replacement

    Authors: Jinwei Lin

    Abstract: The Text Extraction of the Audio from the Video plays an important role in multimedia editing and processing. As a popular open source toolkit, Whisper performs fast in human voice recognition. However, the recognition performance is dependent on the computing resource, which makes the low computing memory running Whisper become difficult. Our paper presents an available solution to extract the hu… ▽ More

    Submitted 20 May, 2024; originally announced July 2024.

    Comments: APSIPA ASC 2023

  38. arXiv:2407.07397  [pdf, other

    cs.SD eess.AS

    SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness

    Authors: Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a prevalent chronic breathing disorder caused by upper airway obstruction. Previous studies advanced OSAHS evaluation through machine learning-based systems trained on sleep snoring or speech signal datasets. However, constructing datasets for training a precise and rapid OSAHS evaluation system poses a challenge, since 1) it is time-consuming t… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  39. arXiv:2407.02826  [pdf, other

    eess.AS

    SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

    Authors: Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li

    Abstract: It was shown that pre-trained models with self-supervised learning (SSL) techniques are effective in various downstream speech tasks. However, most such models are trained on single-speaker speech data, limiting their effectiveness in mixture speech. This motivates us to explore pre-training on mixture speech. This work presents SA-WavLM, a novel pre-trained model for mixture speech. Specifically,… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  40. arXiv:2407.01536  [pdf, other

    eess.SY

    Beyond Profit: A Multi-Objective Framework for Electric Vehicle Charging Station Operations

    Authors: Shuoyao Wang, Jiawei Lin

    Abstract: This paper explores the pricing and scheduling strategies of the electric vehicle charging stations in response to the rising demand for cleaner transportation. Most of the existing methods focus on maximizing the energy efficiency or the charging station profit, however, the reputation of EVs is also a key factor for the long-term charging station operations. To address these gaps, we propose a n… ▽ More

    Submitted 12 March, 2024; originally announced July 2024.

    Comments: Accepted By VTC24-Spring

  41. arXiv:2406.14869  [pdf, other

    eess.SP

    Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy

    Authors: Jiayan Gan, Zhixing Du, Qiang Li, Huaizong Shao, Jingran Lin, Ye Pan, Zhongyi Wen, Shafei Wang

    Abstract: While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms h… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  42. arXiv:2406.09989  [pdf, other

    q-bio.NC eess.SY

    Suppressing seizure via optimal electrical stimulation to the hub of epileptic brain network

    Authors: Zhichao Liang, Guanyi Zhao, Yinuo Zhang, Weiting Sun, Jingzhe Lin, Jialin Wang, Quanying Liu

    Abstract: The electrical stimulation to the seizure onset zone (SOZ) serves as an efficient approach to seizure suppression. Recently, seizure dynamics have gained widespread attendance in its network propagation mechanisms. Compared with the direct stimulation to SOZ, other brain network-level approaches that can effectively suppress epileptic seizures remain under-explored. In this study, we introduce a p… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  43. arXiv:2406.07437  [pdf, other

    cs.SD eess.AS

    Graph-based multi-Feature fusion method for speech emotion recognition

    Authors: Xueyu Liu, Jie Lin, Chao Wang

    Abstract: Exploring proper way to conduct multi-speech feature fusion for cross-corpus speech emotion recognition is crucial as different speech features could provide complementary cues reflecting human emotion status. While most previous approaches only extract a single speech feature for emotion recognition, existing fusion methods such as concatenation, parallel connection, and splicing ignore heterogen… ▽ More

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 25 pages,4 figures

  44. arXiv:2406.01245  [pdf, other

    eess.IV

    Sparse Focus Network for Multi-Source Remote Sensing Data Classification

    Authors: Xuepeng Jin, Junyan Lin, Feng Gao, Lin Qi, Yang Zhou

    Abstract: Multi-source remote sensing data classification has emerged as a prominent research topic with the advancement of various sensors. Existing multi-source data classification methods are susceptible to irrelevant information interference during multi-source feature extraction and fusion. To solve this issue, we propose a sparse focus network for multi-source data classification. Sparse attention is… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE IGARSS 2024

  45. arXiv:2406.01235  [pdf, other

    eess.IV

    Boosting Spatial-Spectral Masked Auto-Encoder Through Mining Redundant Spectra for HSI-SAR/LiDAR Classification

    Authors: Junyan Lin, Xuepeng Jin, Feng Gao, Junyu Dong, Hui Yu

    Abstract: Although recent masked image modeling (MIM)-based HSI-LiDAR/SAR classification methods have gradually recognized the importance of the spectral information, they have not adequately addressed the redundancy among different spectra, resulting in information leakage during the pretraining stage. This issue directly impairs the representation ability of the model. To tackle the problem, we propose a… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IGARSS 2024

  46. arXiv:2406.00421  [pdf

    eess.SY

    Modal Analysis of Power System with High CIG Penetration Based on Impedance Models

    Authors: Le Zheng, Jiajie Zheng, Jiajian Lin, Chongru Liu

    Abstract: This paper explores the modal analysis of power systems with high Converter-Interfaced Generation (CIG) penetration utilizing an impedance-based modeling approach. Traditional modal analysis based on the state-space model (MASS) requires comprehensive control structures and parameters of each system element, a challenging prerequisite as converters increasingly integrate into power systems and the… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  47. arXiv:2405.20693  [pdf, other

    eess.IV cs.CV

    R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

    Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover… ▽ More

    Submitted 27 October, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted to NeurIPS 2024. Project page: https://github.com/Ruyi-Zha/r2_gaussian

  48. arXiv:2405.13661  [pdf, ps, other

    cs.SD eess.AS

    Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

    Authors: Hong Zhang, Jie Lin, Shengxuan Chen

    Abstract: Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  49. arXiv:2405.13636  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Mamba: Pretrained Audio State Space Model For Audio Tagging

    Authors: Jiaju Lin, Haoxuan Hu

    Abstract: Audio tagging is an important task of mapping audio samples to their corresponding categories. Recently endeavours that exploit transformer models in this field have achieved great success. However, the quadratic self-attention cost limits the scaling of audio transformer models and further constrains the development of more universal audio models. In this paper, we attempt to solve this problem b… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  50. arXiv:2405.10570  [pdf

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features… ▽ More

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables