
Showing 1–50 of 50 results for author: Dong, S

Searching in archive eess.
  1. arXiv:2507.01326

    eess.IV cs.CV

    Structure and Smoothness Constrained Dual Networks for MR Bias Field Correction

    Authors: Dong Liang, Xingyu Qiu, Yuzhen Li, Wei Wang, Kuanquan Wang, Suyu Dong, Gongning Luo

    Abstract: MR imaging techniques are of great benefit to disease diagnosis. However, due to the limitation of MR devices, significant intensity inhomogeneity often exists in imaging results, which impedes both qualitative and quantitative medical analysis. Recently, several unsupervised deep learning-based models have been proposed for MR image improvement. However, these models merely concentrate on global…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 11 pages, 3 figures, accepted by MICCAI

    Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention, 2025

  2. arXiv:2506.12537

    cs.CL cs.AI eess.AS

    Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction

    Authors: Xiaoran Fan, Zhichao Sun, Yangfan Gao, Jingfei Xiong, Hang Yan, Yifei Cao, Jiajun Sun, Shuo Li, Zhihao Zhang, Zhiheng Xi, Yuhao Zhou, Senjie Jin, Changhao Jiang, Junjie Ye, Ming Zhang, Rui Zheng, Zhenhua Han, Yunke Zhang, Demei Yan, Shaokang Dong, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the impact of key components (i.e., speech tokenizers, speech heads, and speaker modeling) on the performance of LLM-centric SLMs. We…

    Submitted 14 June, 2025; originally announced June 2025.

  3. arXiv:2503.07471

    eess.SP

    Utilizing High Sampling Rate ADCs for Cost Efficient MIMO Radios

    Authors: Agrim Gupta, Shenggang Dong, Mehmet Mert Sahin, Younghan Nam, Frederik J. Harris, Dinesh Bharadia

    Abstract: In the past decade, >1 Gsps ADCs have become commonplace and are used in many modern 5G base station chips. A major driving force behind this adoption is the benefits of digital up/down-conversion and improved digital filtering. Recent works have also advocated for utilizing this high sampling bandwidth to fit in multiple MIMO streams, and reduce the number of ADCs required to build MIMO base-st…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 8 pages, 12 figures

  4. arXiv:2502.00139

    cs.IT eess.SP

    Beamforming with Joint Phase and Time Array: System Design, Prototyping and Performance

    Authors: Jianhua Mo, Ahmad AlAmmouri, Shenggang Dong, Younghan Nam, Won-Suk Choi, Gary Xu, Jianzhong, Zhan

    Abstract: Joint phase-time arrays (JPTA) is a new mmWave radio frequency front-end architecture constructed by appending time-delay elements to phase shifters for analog beamforming. JPTA allows the mmWave base station (BS) to form multiple frequency-dependent beams with a single RF chain, exploiting the extra degrees of freedom the time-delay elements offer. Without requiring extra power-hungry RF chains…

    Submitted 31 January, 2025; originally announced February 2025.

    Comments: Presented at Asilomar Conference on Signals, Systems, and Computers 2024

  5. arXiv:2501.16327

    cs.CL cs.SD eess.AS

    LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

    Authors: Heting Gao, Hang Shao, Xiong Wang, Chaofan Qiu, Yunhang Shen, Siqi Cai, Yuchen Shi, Zihan Xu, Zuwei Long, Yike Zhang, Shaoqi Dong, Chaoyou Fu, Ke Li, Long Ma, Xing Sun

    Abstract: The film Her features Samantha, a sophisticated AI audio agent who is capable of understanding both linguistic and paralinguistic information in human speech and delivering real-time responses that are natural, informative and sensitive to emotional subtleties. Moving one step toward a more sophisticated audio agent, building on recent advances in end-to-end (E2E) speech systems, we propose LUCY, an E2E s…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: Demo Link: https://github.com/VITA-MLLM/LUCY

  6. arXiv:2501.15588

    eess.IV cs.CV

    Tumor Detection, Segmentation and Classification Challenge on Automated 3D Breast Ultrasound: The TDSC-ABUS Challenge

    Authors: Gongning Luo, Mingwang Xu, Hongyu Chen, Xinjie Liang, Xing Tao, Dong Ni, Hyunsu Jeong, Chulhong Kim, Raphael Stock, Michael Baumgartner, Yannick Kirchhoff, Maximilian Rokuss, Klaus Maier-Hein, Zhikai Yang, Tianyu Fan, Nicolas Boutry, Dmitry Tereshchenko, Arthur Moine, Maximilien Charmetant, Jan Sauer, Hao Du, Xiang-Hui Bai, Vipul Pai Raikar, Ricardo Montoya-del-Angel, Robert Marti, et al. (12 additional authors not shown)

    Abstract: Breast cancer is one of the most common causes of death among women worldwide. Early detection helps in reducing the number of deaths. Automated 3D Breast Ultrasound (ABUS) is a newer approach for breast screening, which has many advantages over handheld mammography such as safety, speed, and higher detection rate of breast cancer. Tumor detection, segmentation, and classification are key componen…

    Submitted 26 January, 2025; originally announced January 2025.

  7. arXiv:2412.13365

    cs.AI cs.HC eess.SY

    Quantitative Predictive Monitoring and Control for Safe Human-Machine Interaction

    Authors: Shuyang Dong, Meiyi Ma, Josephine Lamp, Sebastian Elbaum, Matthew B. Dwyer, Lu Feng

    Abstract: There is a growing trend toward AI systems interacting with humans to revolutionize a range of application domains such as healthcare and transportation. However, unsafe human-machine interaction can lead to catastrophic failures. We propose a novel approach that predicts future states by accounting for the uncertainty of human interaction, monitors whether predictions satisfy or violate safety re…

    Submitted 17 December, 2024; originally announced December 2024.

  8. arXiv:2410.19288

    eess.IV cs.CV cs.LG

    A Flow-based Truncated Denoising Diffusion Model for Super-resolution Magnetic Resonance Spectroscopic Imaging

    Authors: Siyuan Dong, Zhuotong Cai, Gilbert Hangel, Wolfgang Bogner, Georg Widhalm, Yaqing Huang, Qinghao Liang, Chenyu You, Chathura Kumaragamage, Robert K. Fulbright, Amit Mahajan, Amin Karbasi, John A. Onofrey, Robin A. de Graaf, James S. Duncan

    Abstract: Magnetic Resonance Spectroscopic Imaging (MRSI) is a non-invasive imaging technique for studying metabolism and has become a crucial tool for understanding neurological diseases, cancers and diabetes. High spatial resolution MRSI is needed to characterize lesions, but in practice MRSI is acquired at low resolution due to time and sensitivity restrictions caused by the low metabolite concentrations…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: Accepted by Medical Image Analysis (MedIA)

    Journal ref: Medical Image Analysis (2024): 103358

  9. arXiv:2410.12142

    cs.RO eess.SY

    Design Space Exploration of Embedded SoC Architectures for Real-Time Optimal Control

    Authors: Kris Shengjun Dong, Dima Nikiforov, Widyadewi Soedarmadji, Minh Nguyen, Christopher Fletcher, Yakun Sophia Shao

    Abstract: Empowering resource-limited robots to execute computationally intensive tasks such as locomotion and manipulation is challenging. This project provides a comprehensive design space exploration to determine optimal hardware computation architectures suitable for model-based control algorithms. We profile and optimize representative architectural designs across general-purpose scalar, vector process…

    Submitted 24 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: This submission has been withdrawn following further internal review and discussions with collaborators, as it was determined that the current version does not meet our intended standards, and will not be updated further. This decision aligns with internal changes and agreements that were finalized post-submission

  10. arXiv:2408.07866

    eess.SY

    Certifiable Reachability Learning Using a New Lipschitz Continuous Value Function

    Authors: Jingqi Li, Donggun Lee, Jaewon Lee, Kris Shengjun Dong, Somayeh Sojoudi, Claire Tomlin

    Abstract: We propose a new reachability learning framework for high-dimensional nonlinear systems, focusing on reach-avoid problems. These problems require computing the reach-avoid set, which ensures that all its elements can safely reach a target set despite disturbances within pre-specified bounds. Our framework has two main parts: offline learning of a newly designed reach-avoid value function, and post-…

    Submitted 15 February, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

  11. arXiv:2406.09664

    cs.SD eess.AS

    Frequency-mix Knowledge Distillation for Fake Speech Detection

    Authors: Cunhang Fan, Shunbo Dong, Jun Xue, Yujie Chen, Jiangyan Yi, Zhao Lv

    Abstract: In telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  12. arXiv:2406.06086

    cs.SD eess.AS

    RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

    Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

    Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  13. arXiv:2405.08423

    eess.IV cs.CV

    NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

    Authors: Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

    Abstract: Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co…

    Submitted 14 May, 2024; originally announced May 2024.

  14. arXiv:2402.18076

    eess.SY

    Online Ecological Gearshift Strategy via Neural Network with Soft-Argmax Operator

    Authors: Xi Luo, Shiying Dong, Jinlong Hong, Bingzhao Gao, Hong Chen

    Abstract: This paper presents a neural network optimizer with soft-argmax operator to achieve an ecological gearshift strategy in real-time. The strategy is reformulated as the mixed-integer model predictive control (MIMPC) problem to minimize energy consumption. Then the outer convexification is introduced to transform integer variables into relaxed binary controls. To approximate binary solutions properly…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 6 pages, 5 figures, submitted to 8th IFAC Conference on Nonlinear Model Predictive Control

  15. arXiv:2402.03048

    cs.MA cs.LG eess.SY

    Cooperative Learning with Gaussian Processes for Euler-Lagrange Systems Tracking Control under Switching Topologies

    Authors: Zewen Yang, Songbo Dong, Armin Lederer, Xiaobing Dai, Siyu Chen, Stefan Sosnowski, Georges Hattab, Sandra Hirche

    Abstract: This work presents an innovative learning-based approach to tackle the tracking control problem of Euler-Lagrange multi-agent systems with partially unknown dynamics operating under switching communication topologies. The approach leverages a correlation-aware cooperative algorithm framework built upon Gaussian process regression, which adeptly captures inter-agent correlations for uncertainty pre…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 8 pages

  16. arXiv:2311.06579

    cs.RO eess.SY

    Five-Tiered Route Planner for Multi-AUV Accessing Fixed Nodes in Uncertain Ocean Environments

    Authors: Jiaxin Zhang, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong

    Abstract: This article introduces a five-tiered route planner for accessing multiple nodes with multiple autonomous underwater vehicles (AUVs) that enables efficient task completion in stochastic ocean environments. First, the pre-planning tier solves the single-AUV routing problem to find the optimal giant route (GR), estimates the number of required AUVs based on GR segmentation, and allocates nodes for e…

    Submitted 11 November, 2023; originally announced November 2023.

  17. arXiv:2310.01163

    cs.RO eess.SY

    Trust-Aware Motion Planning for Human-Robot Collaboration under Distribution Temporal Logic Specifications

    Authors: Pian Yu, Shuyang Dong, Shili Sheng, Lu Feng, Marta Kwiatkowska

    Abstract: Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic that involve human trust. Since human trust in robots is not observable, we adopt the widely used partially observable Markov decision process (POMDP) framework…

    Submitted 2 October, 2023; originally announced October 2023.

  18. arXiv:2309.04100

    eess.IV cs.LG physics.med-ph

    Preserved Edge Convolutional Neural Network for Sensitivity Enhancement of Deuterium Metabolic Imaging (DMI)

    Authors: Siyuan Dong, Henk M. De Feyter, Monique A. Thomas, Robin A. de Graaf, James S. Duncan

    Abstract: Purpose: Common to most MRSI techniques, the spatial resolution and the minimal scan duration of Deuterium Metabolic Imaging (DMI) are limited by the achievable SNR. This work presents a deep learning method for sensitivity enhancement of DMI. Methods: A convolutional neural network (CNN) was designed to estimate the ²H-labeled metabolite concentrations from low SNR and distorted DMI FIDs. The C…

    Submitted 13 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

  19. arXiv:2308.15930

    cs.CL cs.LG cs.SD eess.AS

    LLaSM: Large Language and Speech Model

    Authors: Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

    Abstract: Multi-modal large language models have garnered significant interest recently. However, most works focus on vision-language multi-modal models providing strong capabilities in following vision-and-language instructions. However, we claim that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to f…

    Submitted 16 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  20. arXiv:2308.04013

    eess.SY cs.IT

    Distributed Target Tracking with Fading Channels over Underwater Wireless Sensor Networks

    Authors: Miaoyi Tang, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong

    Abstract: This paper investigates the problem of distributed target tracking via underwater wireless sensor networks (UWSNs) with fading channels. The degradation of signal quality due to wireless channel fading can significantly impact network reliability and subsequently reduce the tracking accuracy. To address this issue, we propose a modified distributed unscented Kalman filter (DUKF) named DUKF-Fc, whi…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 12 pages, 6 figures, 6 tables

  21. arXiv:2306.15389

    cs.SD cs.LG eess.AS

    Multi-perspective Information Fusion Res2Net with RandomSpecmix for Fake Speech Detection

    Authors: Shunbo Dong, Jun Xue, Cunhang Fan, Kang Zhu, Yujie Chen, Zhao Lv

    Abstract: In this paper, we propose the multi-perspective information fusion (MPIF) Res2Net with random Specmix for fake speech detection (FSD). The main purpose of this system is to improve the model's ability to learn precise forgery information for the FSD task in low-quality scenarios. The task of random Specmix, a data augmentation, is to improve the generalization ability of the model and enhance the mode…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted by DADA2023

  22. arXiv:2306.02231

    cs.CL cs.AI cs.LG eess.SY

    Fine-Tuning Language Models with Advantage-Induced Policy Alignment

    Authors: Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences. Among the plethora of RLHF techniques, proximal policy optimization (PPO) is one of the most widely used methods. Despite its popularity, however, PPO may suffer from mode collapse, instability, and poor sample efficiency. We show that these issues can be…

    Submitted 2 November, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

  23. arXiv:2305.18771

    eess.IV cs.CV cs.LG stat.ML

    SFCNeXt: a simple fully convolutional network for effective brain age estimation with small sample size

    Authors: Yu Fu, Yanyan Huang, Shunjie Dong, Yalin Wang, Tianbai Yu, Meng Niu, Cheng Zhuo

    Abstract: Deep neural networks (DNN) have been designed to predict the chronological age of a healthy brain from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as a valuable biomarker for the early detection of development-related or aging-related disorders. Recent DNN models for brain age estimations usually rely too much on large sample sizes and complex network s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: This paper has been accepted by IEEE ISBI 2023

  24. Picking Up Quantization Steps for Compressed Image Classification

    Authors: Li Ma, Peixi Peng, Guangyao Chen, Yifan Zhao, Siwei Dong, Yonghong Tian

    Abstract: The sensitivity of deep neural networks to compressed images hinders their usage in many real applications, which means classification networks may fail just after taking a screenshot and saving it as a compressed file. In this paper, we argue that neglected disposable coding parameters stored in compressed files could be picked up to reduce the sensitivity of deep neural networks to compressed im…

    Submitted 20 April, 2023; originally announced April 2023.

    Journal ref: in IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 4, pp. 1884-1898, April 2023

  25. arXiv:2304.03708

    eess.IV cs.CV

    Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge

    Authors: Gongning Luo, Kuanquan Wang, Jun Liu, Shuo Li, Xinjie Liang, Xiangyu Li, Shaowei Gan, Wei Wang, Suyu Dong, Wenyi Wang, Pengxin Yu, Enyou Liu, Hongrong Wei, Na Wang, Jia Guo, Huiqi Li, Zhao Zhang, Ziwei Zhao, Na Gao, Nan An, Ashkan Pakzad, Bojidar Rangelov, Jiaqi Dou, Song Tian, Zeyu Liu, et al. (5 additional authors not shown)

    Abstract: Efficient automatic segmentation of multi-level (i.e. main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Besides, there is no public large-scale dataset focused on PA segmentation, which makes it highly challengi…

    Submitted 9 August, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  26. arXiv:2303.13463

    cs.CL eess.AS

    W2KPE: Keyphrase Extraction with Word-Word Relation

    Authors: Wen Cheng, Shichen Dong, Wei Wang

    Abstract: This paper describes our submission to ICASSP 2023 MUG Challenge Track 4, Keyphrase Extraction, which aims to extract keyphrases most relevant to the conference theme from conference materials. We model the challenge as a single-class Named Entity Recognition task and develop techniques for better performance on the challenge: For the data preprocessing, we encode the split keyphrases after word…

    Submitted 22 March, 2023; originally announced March 2023.

  27. arXiv:2302.13893

    eess.SY

    Electric Vehicle Sales Forecasting Model Considering Green Premium: A Chinese Market-based Perspective

    Authors: Zhi Li, Hang Fan, Shuyan Dong

    Abstract: "Green Premiums", meaning the difference in cost between emissions-emitting technology and zero-emissions or emissions-reducing technology, are significant for renewable energy technologies addressing the climate change challenge facing the world in this century. China's Electric Vehicle (EV) industry is the first to cross the green premium into the commercialization stage, prompting its…

    Submitted 27 February, 2023; originally announced February 2023.

  28. RDFNet: Regional Dynamic FISTA-Net for Spectral Snapshot Compressive Imaging

    Authors: Shiyun Zhou, Tingfa Xu, Shaocong Dong, Jianan Li

    Abstract: Deep convolutional neural networks have recently shown promising results in compressive spectral reconstruction. Previous methods, however, usually adopt a single mapping function for sparse representation. Considering that different regions have distinct characteristics, it is desirable to apply various mapping functions to adjust different regions' transformations dynamically. With this in mind,…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: IEEE Transactions on Computational Imaging

  29. arXiv:2301.05781

    eess.SY

    Analysis of November 21, 2021, Kauaʻi Island Power System 18-20 Hz Oscillations

    Authors: Shuan Dong, Bin Wang, Jin Tan, Cameron J. Kruse, Brad W. Rockwell, Anderson Hoke

    Abstract: This letter discusses the 18-20 Hz oscillation event at 05:30 am on November 21, 2021, in Kauaʻi's power system following the trip of an oil power plant. As far as the authors are aware, this is the first report of a transmission system-wide subsynchronous oscillation driven by inverter-based resources (though the system in question is relatively small). In this letter, we leverage two data-based…

    Submitted 10 February, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

  30. arXiv:2211.08402

    cs.CL cs.LG cs.SD eess.AS

    Introducing Semantics into Speech Encoders

    Authors: Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-yi Lee, Yizhou Sun, Wei Wang

    Abstract: Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 11 pages, 3 figures

  31. arXiv:2209.09413

    eess.SY

    A Unified Analytical Method to Quantify Three Types of Fast Frequency Response from Inverter-based Resources

    Authors: Shuan Dong, Xin Fang, Jin Tan, Ningchao Gao, Xiaofan Cui, Anderson Hoke

    Abstract: With more inverter-based resources (IBRs), our power systems have lower frequency nadirs following N-1 contingencies, and undesired under-frequency load shedding (UFLS) can occur. To address this challenge, IBRs can be programmed to provide at least three types of fast frequency response (FFR), e.g., step response, proportional response (P/f droop response), and derivative response (synthetic iner…

    Submitted 25 August, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

  32. arXiv:2207.10181

    eess.IV cs.CV cs.LG

    Flow-based Visual Quality Enhancer for Super-resolution Magnetic Resonance Spectroscopic Imaging

    Authors: Siyuan Dong, Gilbert Hangel, Eric Z. Chen, Shanhui Sun, Wolfgang Bogner, Georg Widhalm, Chenyu You, John A. Onofrey, Robin de Graaf, James S. Duncan

    Abstract: Magnetic Resonance Spectroscopic Imaging (MRSI) is an essential tool for quantifying metabolites in the body, but the low spatial resolution limits its clinical applications. Deep learning-based super-resolution methods provided promising results for improving the spatial resolution of MRSI, but the super-resolved images are often blurry compared to the experimentally-acquired high-resolution imag…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted by DGM4MICCAI 2022

  33. arXiv:2206.08984

    eess.IV cs.CV cs.LG

    Multi-scale Super-resolution Magnetic Resonance Spectroscopic Imaging with Adjustable Sharpness

    Authors: Siyuan Dong, Gilbert Hangel, Wolfgang Bogner, Georg Widhalm, Karl Rössler, Siegfried Trattnig, Chenyu You, Robin de Graaf, John Onofrey, James Duncan

    Abstract: Magnetic Resonance Spectroscopic Imaging (MRSI) is a valuable tool for studying metabolic activities in the human body, but the current applications are limited to low spatial resolutions. The existing deep learning-based MRSI super-resolution methods require training a separate network for each upscaling factor, which is time-consuming and memory inefficient. We tackle this multi-scale super-reso…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted by MICCAI 2022

  34. arXiv:2206.04682

    eess.IV cs.CV cs.LG

    RT-DNAS: Real-time Constrained Differentiable Neural Architecture Search for 3D Cardiac Cine MRI Segmentation

    Authors: Qing Lu, Xiaowei Xu, Shunjie Dong, Cong Hao, Lei Yang, Cheng Zhuo, Yiyu Shi

    Abstract: Accurately segmenting temporal frames of cine magnetic resonance imaging (MRI) is a crucial step in various real-time MRI guided cardiac interventions. To achieve fast and accurate visual assistance, there are strict requirements on the maximum latency and minimum throughput of the segmentation framework. State-of-the-art neural networks on this task are mostly hand-crafted to satisfy these constr…

    Submitted 13 June, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  35. arXiv:2206.02838

    eess.IV cs.CV cs.LG

    Invertible Sharpening Network for MRI Reconstruction Enhancement

    Authors: Siyuan Dong, Eric Z. Chen, Lin Zhao, Xiao Chen, Yikang Liu, Terrence Chen, Shanhui Sun

    Abstract: High-quality MRI reconstruction plays a critical role in clinical applications. Deep learning-based methods have achieved promising results on MRI reconstruction. However, most state-of-the-art methods were designed to optimize the evaluation metrics commonly used for natural images, such as PSNR and SSIM, whereas the visual quality is not primarily pursued. Compared to the fully-sampled images, t…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted by MICCAI 2022

  36. arXiv:2206.01369

    cs.CV cs.AI cs.LG eess.IV

    Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

    Authors: Chenyu You, Jinlin Xiang, Kun Su, Xiaoran Zhang, Siyuan Dong, John Onofrey, Lawrence Staib, James S. Duncan

    Abstract: Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, whi…

    Submitted 30 July, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  37. arXiv:2205.02758

    physics.soc-ph eess.SY

    Quantitative Measures for Integrating Resilience into Transportation Planning Practice: Study in Texas

    Authors: Cheng-Chun Lee, Akhil Rajput, Chia-Wei Hsu, Chao Fan, Faxi Yuan, Shangjia Dong, Amir Esmalian, Hamed Farahmand, Flavia Ioana Patrascu, Chia-Fu Liu, Bo Li, Junwei Ma, Ali Mostafavi

    Abstract: The objective of this study is to propose a system-level framework with quantitative measures to assess the resilience of road networks. The framework proposed in this paper can help transportation agencies incorporate resilience considerations into project development proactively and to understand the resilience performance of current road networks effectively. This study identified and implement…

    Submitted 5 May, 2022; v1 submitted 4 April, 2022; originally announced May 2022.

  38. arXiv:2203.06849

    cs.CL cs.SD eess.AS

    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

    Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards in…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

  39. arXiv:2203.04911

    cs.CL cs.SD eess.AS

    DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

    Authors: Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-wen Yang, Hsuan-Jui Chen, Shuyan Dong, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

    Abstract: Spoken Question Answering (SQA) is to find the answer from a spoken document given a question, which is crucial for personal assistants when replying to the queries from the users. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained with massive annotated data that are time and cost-prohibitive to collect for low-resourced languages…

    Submitted 21 June, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  40. arXiv:2202.06548  [pdf, other]

    eess.IV cs.LG

    A resource-efficient deep learning framework for low-dose brain PET image reconstruction and analysis

    Authors: Yu Fu, Shunjie Dong, Yi Liao, Le Xue, Yuanfan Xu, Feng Li, Qianqian Yang, Tianbai Yu, Mei Tian, Cheng Zhuo

    Abstract: 18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing the low-dose PET (L-PET) images to the high-quality full-dose PET (F-PET) ones is an effective way that both…

    Submitted 14 February, 2022; originally announced February 2022.

  41. arXiv:2201.10737  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Class-Aware Adversarial Transformers for Medical Image Segmentation

    Authors: Chenyu You, Ruihan Zhao, Fenglin Liu, Siyuan Dong, Sandeep Chinchali, Ufuk Topcu, Lawrence Staib, James S. Duncan

    Abstract: Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to the naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale f…

    Submitted 15 December, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

  42. arXiv:2112.05144  [pdf]

    cs.CV eess.IV

    Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing

    Authors: Wujie Zhou, Shaohua Dong, Caie Xu, Yaguan Qian

    Abstract: RGB thermal scene parsing has recently attracted increasing research interest in the field of computer vision. However, most existing methods fail to perform good boundary extraction for prediction maps and cannot fully use high level features. In addition, these methods simply fuse the features from RGB and thermal modalities but are unable to obtain comprehensive fused features. To address these…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI 2022

  43. arXiv:2105.01051  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    SUPERB: Speech processing Universal PERformance Benchmark

    Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge…

    Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

  44. arXiv:2102.11099  [pdf, other]

    eess.IV cs.CV

    RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection

    Authors: Shunjie Dong, Qianqian Yang, Yu Fu, Mei Tian, Cheng Zhuo

    Abstract: The novel 2019 Coronavirus (COVID-19) infection has spread world widely and is currently a major healthcare challenge around the world. Chest Computed Tomography (CT) and X-ray images have been well recognized to be two effective techniques for clinical COVID-19 disease diagnoses. Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is pref…

    Submitted 22 February, 2021; originally announced February 2021.

  45. arXiv:2007.06341  [pdf, other]

    eess.IV cs.CV

    DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation

    Authors: Shunjie Dong, Jinlong Zhao, Maojun Zhang, Zhengxue Shi, Jianing Deng, Yiyu Shi, Mei Tian, Cheng Zhuo

    Abstract: Automatic segmentation of cardiac magnetic resonance imaging (MRI) facilitates efficient and accurate volume measurement in clinical applications. However, due to anisotropic resolution and ambiguous border (e.g., right ventricular endocardium), existing methods suffer from the degradation of accuracy and robustness in 3D cardiac MRI video segmentation. In this paper, we propose a novel Deformable…

    Submitted 13 July, 2020; originally announced July 2020.

  46. arXiv:2006.09201  [pdf, other]

    eess.SP cs.LG stat.ML

    A Hybrid Deep Learning Model for Predictive Flood Warning and Situation Awareness using Channel Network Sensors Data

    Authors: Shangjia Dong, Tianbo Yu, Hamed Farahmand, Ali Mostafavi

    Abstract: The objective of this study is to create and test a hybrid deep learning model, FastGRNN-FCN (Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network-Fully Convolutional Network), for urban flood prediction and situation awareness using channel network sensors data. The study used Harris County, Texas as the testbed, and obtained channel sensor data from three historical flood events (e.g.,…

    Submitted 8 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  47. arXiv:2004.11848  [pdf]

    cs.CV cs.LG eess.IV stat.ML

    Deep learning for smart fish farming: applications, opportunities and challenges

    Authors: Xinting Yang, Song Zhang, Jintao Liu, Qinfeng Gao, Shuanglin Dong, Chao Zhou

    Abstract: With the rapid emergence of deep learning (DL) technology, it has been successfully used in various fields including aquaculture. This change can create new opportunities and a series of challenges for information and data processing in smart fish farming. This paper focuses on the applications of DL in aquaculture, including live fish identification, species classification, behavioral analysis, f…

    Submitted 30 June, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: 43 pages, 7 figures

    Journal ref: Reviews in Aquaculture, 2020

  48. arXiv:2002.03236  [pdf, other]

    cs.RO eess.SY

    Tactile Dexterity: Manipulation Primitives with Tactile Feedback

    Authors: Francois R. Hogan, Jose Ballester, Siyuan Dong, Alberto Rodriguez

    Abstract: This paper develops closed-loop tactile controllers for dexterous robotic manipulation with a dual-palm robotic system. Tactile dexterity is an approach to dexterous manipulation that plans for robot/object interactions that render interpretable tactile information for control. We divide the role of tactile control into two goals: 1) control the contact state between the end-effector and the objec…

    Submitted 30 April, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

  49. arXiv:1910.02860  [pdf, other]

    cs.RO eess.IV eess.SY

    Cable Manipulation with a Tactile-Reactive Gripper

    Authors: Yu She, Shaoxiong Wang, Siyuan Dong, Neha Sunil, Alberto Rodriguez, Edward Adelson

    Abstract: Cables are complex, high dimensional, and dynamic objects. Standard approaches to manipulate them often rely on conservative strategies that involve long series of very slow and incremental deformations, or various mechanical fixtures such as clamps, pins or rings. We are interested in manipulating freely moving cables, in real time, with a pair of robotic grippers, and with no added mechanical co…

    Submitted 23 June, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: Accepted to RSS 2020

  50. arXiv:1907.08769  [pdf, other]

    eess.IV cs.CV cs.MM

    A Retina-inspired Sampling Method for Visual Texture Reconstruction

    Authors: Lin Zhu, Siwei Dong, Tiejun Huang, Yonghong Tian

    Abstract: Conventional frame-based camera is not able to meet the demand of rapid reaction for real-time applications, while the emerging dynamic vision sensor (DVS) can realize high speed capturing for moving objects. However, to achieve visual texture reconstruction, DVS need extra information apart from the output spikes. This paper introduces a fovea-like sampling method inspired by the neuron signal pr…

    Submitted 20 July, 2019; originally announced July 2019.

    Comments: Published in ICME 2019