Showing 1–40 of 40 results for author: Yin, J

Searching in archive eess.

  1. arXiv:2507.05451 [pdf]

    eess.IV cs.CV eess.SP

    Self-supervised Deep Learning for Denoising in Ultrasound Microvascular Imaging

    Authors: Lijie Huang, Jingyi Yin, Jingke Zhang, U-Wai Lok, Ryan M. DeRuiter, Jieyang Jin, Kate M. Knoll, Kendra E. Petersen, James D. Krier, Xiang-yang Zhu, Gina K. Hesley, Kathryn A. Robinson, Andrew J. Bentall, Thomas D. Atwell, Andrew D. Rule, Lilach O. Lerman, Shigao Chen, Chengwu Huang

    Abstract: Ultrasound microvascular imaging (UMI) is often hindered by low signal-to-noise ratio (SNR), especially in contrast-free or deep tissue scenarios, which impairs subsequent vascular quantification and reliable disease diagnosis. To address this challenge, we propose Half-Angle-to-Half-Angle (HA2HA), a self-supervised denoising framework specifically designed for UMI. HA2HA constructs training pairs…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 12 pages, 10 figures. Supplementary materials are available at https://zenodo.org/records/15832003

  2. arXiv:2505.09616 [pdf, other]

    cs.SD cs.AI eess.AS

    SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech

    Authors: Yuqi Li, Yuanzhong Zheng, Zhongtian Guo, Yaoxuan Wang, Jianjun Yin, Haojun Fei

    Abstract: This paper presents SpecWav-Attack, an adversarial model for detecting speakers in anonymized speech. It leverages Wav2Vec2 for feature extraction and incorporates spectrogram resizing and incremental training for improved performance. Evaluated on librispeech-dev and librispeech-test, SpecWav-Attack outperforms conventional attacks, revealing vulnerabilities in anonymized speech systems and empha…

    Submitted 10 January, 2025; originally announced May 2025.

    Comments: 2 pages,3 figures,1 chart

    MSC Class: I.2.0

  3. arXiv:2503.02046 [pdf, other]

    eess.AS cs.SD

    CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge

    Authors: Jun Yin, Marian Verhelst

    Abstract: Robust sound source localization for environments with noise and reverberation are increasingly exploiting deep neural networks fed with various acoustic features. Yet, state-of-the-art research mainly focuses on optimizing algorithmic accuracy, resulting in huge models preventing edge-device deployment. The edge, however, urges for real-time low-footprint acoustic reasoning for applications such…

    Submitted 3 March, 2025; originally announced March 2025.

    Journal ref: ACM Transactions on Embedded Computing Systems, 2023, 22(3): 1-27

  4. A strictly predefined-time convergent and anti-noise fractional-order zeroing neural network for solving time-variant quadratic programming in kinematic robot control

    Authors: Yi Yang, Xiao Li, Xuchen Wang, Mei Liu, Junwei Yin, Weibing Li, Richard M. Voyles, Xin Ma

    Abstract: This paper proposes a strictly predefined-time convergent and anti-noise fractional-order zeroing neural network (SPTC-AN-FOZNN) model, meticulously designed for addressing time-variant quadratic programming (TVQP) problems. This model marks the first variable-gain ZNN to collectively manifest strictly predefined-time convergence and noise resilience, specifically tailored for kinematic motion con…

    Submitted 22 February, 2025; originally announced March 2025.

    Comments: 13 pages, 10 figures; as accepted for publication

    Journal ref: Neural Networks (2025) 107279

  5. arXiv:2502.15006 [pdf, ps, other]

    cs.RO cs.AI eess.SY

    Safe Beyond the Horizon: Efficient Sampling-based MPC with Neural Control Barrier Functions

    Authors: Ji Yin, Oswin So, Eric Yang Yu, Chuchu Fan, Panagiotis Tsiotras

    Abstract: A common problem when using model predictive control (MPC) in practice is the satisfaction of safety specifications beyond the prediction horizon. While theoretical works have shown that safety can be guaranteed by enforcing a suitable terminal set constraint or a sufficiently long prediction horizon, these techniques are difficult to apply and thus are rarely used by practitioners, especially in…

    Submitted 8 July, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted by RSS 2025

  6. arXiv:2412.03749 [pdf]

    physics.med-ph eess.SP physics.bio-ph

    Electrically functionalized body surface for deep-tissue bioelectrical recording

    Authors: Dehui Zhang, Yucheng Zhang, Dong Xu, Shaolei Wang, Kaidong Wang, Boxuan Zhou, Yansong Ling, Yang Liu, Qingyu Cui, Junyi Yin, Enbo Zhu, Xun Zhao, Chengzhang Wan, Jun Chen, Tzung K. Hsiai, Yu Huang, Xiangfeng Duan

    Abstract: Directly probing deep tissue activities from body surfaces offers a noninvasive approach to monitoring essential physiological processes [1-3]. However, this method is technically challenged by rapid signal attenuation toward the body surface and confounding motion artifacts [4-6] primarily due to excessive contact impedance and mechanical mismatch with conventional electrodes. Herein, by formulating an…

    Submitted 4 December, 2024; originally announced December 2024.

  7. arXiv:2409.12139 [pdf, other]

    cs.SD cs.AI eess.AS

    Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

    Authors: Sijing Chen, Yuan Feng, Laipeng He, Tianwei He, Wendi He, Yanni Hu, Bin Lin, Yiting Lin, Yu Pan, Pengfei Tan, Chengwei Tian, Chen Wang, Zhicheng Wang, Ruoye Xie, Jixun Yao, Quanlei Yan, Yuguang Yang, Jianhao Ye, Jingjing Yin, Yanzhen Yu, Huimin Zhang, Xiang Zhang, Guangcheng Zhao, Hongbin Zhou, Pengpeng Zou

    Abstract: With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-…

    Submitted 23 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: Technical Report; 18 pages; typos corrected, references added, demo url modified, author name modified;

  8. arXiv:2409.09910 [pdf]

    eess.IV

    Self-Supervised Elimination of Non-Independent Noise in Hyperspectral Imaging

    Authors: Guangrui Ding, Chang Liu, Jiaze Yin, Xinyan Teng, Yuying Tan, Hongjian He, Haonan Lin, Lei Tian, Ji-Xin Cheng

    Abstract: Hyperspectral imaging has been widely used for spectral and spatial identification of target molecules, yet often contaminated by sophisticated noise. Current denoising methods generally rely on independent and identically distributed noise statistics, showing corrupted performance for non-independent noise removal. Here, we demonstrate Self-supervised PErmutation Noise2noise Denoising (SPEND), a…

    Submitted 15 September, 2024; originally announced September 2024.

  9. arXiv:2409.06196 [pdf, other]

    cs.SD cs.LG eess.AS

    MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection

    Authors: Zehao Wang, Haobo Yue, Zhicheng Zhang, Da Mu, Jin Tang, Jianqin Yin

    Abstract: Sound Event Detection (SED) plays a vital role in comprehending and perceiving acoustic scenes. Previous methods have demonstrated impressive capabilities. However, they are deficient in learning features of complex scenes from heterogeneous dataset. In this paper, we introduce a novel dual-branch architecture named Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event…

    Submitted 11 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: Submit to Icassp2025

  10. arXiv:2409.00565 [pdf, other]

    cs.LG cs.CV eess.SP

    Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging

    Authors: Yangfan Deng, Hamad Albidah, Ahmed Dallal, Jijun Yin, Zhi-Hong Mao

    Abstract: Sleep is crucial for human health, and EEG signals play a significant role in sleep research. Due to the high-dimensional nature of EEG signal data sequences, data visualization and clustering of different sleep stages have been challenges. To address these issues, we propose a two-stage hierarchical and explainable feature selection framework by incorporating a feature selection algorithm to impr…

    Submitted 31 August, 2024; originally announced September 2024.

  11. arXiv:2408.05057 [pdf, other]

    cs.SD cs.AI eess.AS

    SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation

    Authors: Da Mu, Zhicheng Zhang, Haobo Yue, Zehao Wang, Jin Tang, Jianqin Yin

    Abstract: In the Sound Event Localization and Detection (SELD) task, Transformer-based models have demonstrated impressive capabilities. However, the quadratic complexity of the Transformer's self-attention mechanism results in computational inefficiencies. In this paper, we propose a network architecture for SELD called SELD-Mamba, which utilizes Mamba, a selective state-space model. We adopt the Event-Ind…

    Submitted 9 August, 2024; originally announced August 2024.

  12. arXiv:2407.14894 [pdf, other]

    eess.SY

    A Holistic Optimization Framework for Energy Efficient UAV-assisted Fog Computing: Attitude Control, Trajectory Planning and Task Assignment

    Authors: Shuaijun Liu, Jinqiu Du, Yaxin Zheng, Jiaying Yin, Yuhui Deng, Jingjin Wu

    Abstract: Unmanned Aerial Vehicles (UAVs) have significantly enhanced fog computing by acting as both flexible computation platforms and communication mobile relays. In this paper, we propose a holistic framework that jointly optimizes the total latency and energy consumption for UAV-assisted fog computing in a three-dimensional spatial domain with varying terrain elevations and dynamic task generations. Ou…

    Submitted 5 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  13. arXiv:2407.11333 [pdf, other]

    cs.RO cs.SD eess.AS

    Disentangled Acoustic Fields For Multimodal Physical Scene Understanding

    Authors: Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan

    Abstract: We study the problem of multimodal physical scene understanding, where an embodied agent needs to find fallen objects by inferring object properties, direction, and distance of an impact sound source. Previous works adopt feed-forward neural networks to directly regress the variables from sound, leading to poor generalization and domain adaptation issues. In this paper, we illustrate that learning…

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2406.07918 [pdf, other]

    eess.IV

    Micro-expression recognition based on depth map to point cloud

    Authors: Ren Zhang, Jianqin Yin, Chao Qi, Zehao Wang, Zhicheng Zhang, Yonghao Dang

    Abstract: Micro-expressions are nonverbal facial expressions that reveal the covert emotions of individuals, making the micro-expression recognition task receive widespread attention. However, the micro-expression recognition task is challenging due to the subtle facial motion and brevity in duration. Many 2D image-based methods have been developed in recent years to recognize MEs effectively, but, these ap…

    Submitted 12 June, 2024; originally announced June 2024.

  15. ACCO: Automated Causal CNN Scheduling Optimizer for Real-Time Edge Accelerators

    Authors: Jun Yin, Linyan Mei, Andre Guntoro, Marian Verhelst

    Abstract: Spatio-Temporal Convolutional Neural Networks (ST-CNN) allow extending CNN capabilities from image processing to consecutive temporal-pattern recognition. Generally, state-of-the-art (SotA) ST-CNNs inflate the feature maps and weights from well-known CNN backbones to represent the additional time dimension. However, edge computing applications would suffer tremendously from such large computation…

    Submitted 11 June, 2024; originally announced June 2024.

    Journal ref: 2023 IEEE 41st International Conference on Computer Design (ICCD), Washington, DC, USA, 2023, pp. 391-398

  16. arXiv:2404.14712 [pdf, other]

    physics.ao-ph cs.AI cs.DC eess.IV physics.geo-ph

    ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

    Authors: Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong-Youl Choi, Ashwin Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash

    Abstract: Earth system predictability is challenged by the complexity of environmental dynamics and the multitude of variables involved. Current AI foundation models, although advanced by leveraging large and heterogeneous data, are often constrained by their size and data integration, limiting their effectiveness in addressing the full range of Earth system prediction challenges. To overcome these limitati…

    Submitted 19 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  17. arXiv:2401.04976 [pdf, other]

    eess.AS cs.SD

    Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection

    Authors: Haobo Yue, Zhicheng Zhang, Da Mu, Yonghao Dang, Jianqin Yin, Jin Tang

    Abstract: Recently, 2D convolution has been found unqualified in sound event detection (SED). It enforces translation equivariance on sound events along frequency axis, which is not a shift-invariant dimension. To address this issue, dynamic convolution is used to model the frequency dependency of sound events. In this paper, we proposed the first full-dynamic method named full-frequency dynamic convolution…

    Submitted 21 August, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted by ICPR2024

  18. arXiv:2401.02046 [pdf, other]

    eess.AS cs.SD

    CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

    Authors: Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

    Abstract: Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method th…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: accepted by ASRU 2023

  19. arXiv:2312.15863 [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

    Authors: Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, Jiangjin Yin

    Abstract: Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on the environmental perception by processing the observation at t…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024, full paper with oral presentation). Cover our preliminary study: arXiv:2212.14538

  20. arXiv:2309.16247 [pdf, other]

    eess.AS cs.SD

    PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

    Authors: Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

    Abstract: Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts. However, SA-ASR poses unique challenges due to factors such as speaker overlap, speaker variability, background noise, and reverberation. In this study, we propose PP-MeT system, a real-world personal…

    Submitted 28 September, 2023; originally announced September 2023.

  21. arXiv:2309.09262 [pdf, other]

    eess.AS cs.SD

    PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

    Authors: Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to control the conversion process, which leads to limitations in style diversity or falls short in terms of the intuitive and interpretability of style representation…

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  22. arXiv:2309.03686 [pdf, other]

    eess.IV cs.CV

    MS-UNet-v2: Adaptive Denoising Method and Training Strategy for Medical Image Segmentation with Small Training Data

    Authors: Haoyuan Chen, Yufei Han, Pin Xu, Yanyi Li, Kuan Li, Jianping Yin

    Abstract: Models based on U-like structures have improved the performance of medical image segmentation. However, the single-layer decoder structure of U-Net is too "thin" to exploit enough information, resulting in large semantic differences between the encoder and decoder parts. Things get worse if the number of training sets of data is not sufficiently large, which is common in medical image processing t…

    Submitted 7 September, 2023; originally announced September 2023.

  23. arXiv:2309.02124 [pdf, other]

    cs.LG eess.SP

    Exploiting Spatial-temporal Data for Sleep Stage Classification via Hypergraph Learning

    Authors: Yuze Liu, Ziming Zhao, Tiehua Zhang, Kang Wang, Xin Chen, Xiaowei Huang, Jun Yin, Zhishu Shen

    Abstract: Sleep stage classification is crucial for detecting patients' health conditions. Existing models, which mainly use Convolutional Neural Networks (CNN) for modelling Euclidean data and Graph Convolution Networks (GNN) for modelling non-Euclidean data, are unable to consider the heterogeneity and interactivity of multimodal data as well as the spatial-temporal correlation simultaneously, which hinde…

    Submitted 5 September, 2023; originally announced September 2023.

  24. arXiv:2308.04025 [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition

    Authors: Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, Jingjing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao

    Abstract: Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in wild world. While current studies primarily focus on recognition and generalization abilities, our research pioneers an investigation into the reliability of SER methods in the presence of semantic data shifts and explores how to exert fine-gra…

    Submitted 22 March, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 12 pages

  25. arXiv:2303.08636 [pdf, other]

    eess.AS cs.SD

    HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism

    Authors: Yuguang Yang, Yu Pan, Jingjing Yin, Jiangyu Han, Lei Ma, Heng Lu

    Abstract: SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR). However, its inference speed suffers from the quadratic complexity of softmax-attention (SA). In addition, limited by the large convolution kernel size, the local modeling ability of SqueezeFormer is insufficient. In this paper, we propose a novel method HybridFormer to improve SqueezeFormer in a fast an…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP2023

  26. arXiv:2302.12186 [pdf, other]

    cs.CV eess.IV

    RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

    Authors: Jingxia Jiang, Jinbin Bai, Yun Liu, Junjie Yin, Sixiang Chen, Tian Ye, Erkang Chen

    Abstract: Underwater images typically experience mixed degradations of brightness and structure caused by the absorption and scattering of light by suspended particles. To address this issue, we propose a Real-time Spatial and Frequency Domains Modulation Network (RSFDM-Net) for the efficient enhancement of colors and details in underwater images. Specifically, our proposed conditional network is designed w…

    Submitted 23 February, 2023; originally announced February 2023.

  27. arXiv:2302.11719 [pdf, other]

    cs.RO eess.SY

    Shield Model Predictive Path Integral: A Computationally Efficient Robust MPC Approach Using Control Barrier Functions

    Authors: Ji Yin, Charles Dawson, Chuchu Fan, Panagiotis Tsiotras

    Abstract: Model Predictive Path Integral (MPPI) control is a type of sampling-based model predictive control that simulates thousands of trajectories and uses these trajectories to synthesize optimal controls on-the-fly. In practice, however, MPPI encounters problems limiting its application. For instance, it has been observed that MPPI tends to make poor decisions if unmodeled dynamics or environmental dis…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: 8 pages, 7 figures. Submitted to RA-L for review

  28. arXiv:2301.12808 [pdf, other]

    eess.AS

    Real-Time Acoustic Perception for Automotive Applications

    Authors: Jun Yin, Stefano Damiano, Marian Verhelst, Toon van Waterschoot, Andre Guntoro

    Abstract: In recent years the automotive industry has been strongly promoting the development of smart cars, equipped with multi-modal sensors to gather information about the surroundings, in order to aid human drivers or make autonomous decisions. While the focus has mostly been on visual sensors, also acoustic events are crucial to detect situations that require a change in the driving behavior, such as a…

    Submitted 30 January, 2023; originally announced January 2023.

  29. arXiv:2212.02099 [pdf, other]

    eess.AS cs.SD

    LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition

    Authors: Yuguang Yang, Yu Pan, Jingjing Yin, Heng Lu

    Abstract: This paper proposes a Learnable Multiplicative absolute position Embedding based Conformer (LMEC). It contains a kernelized linear attention (LA) module called LMLA to solve the time-consuming problem for long sequence speech recognition as well as an alternative to the FFN structure. First, the ELU function is adopted as the kernel function of our proposed LA module. Second, we propose a novel Le…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: NCMMSC2022

  30. arXiv:2209.12842 [pdf, other]

    cs.RO eess.SY

    Risk-Aware Model Predictive Path Integral Control Using Conditional Value-at-Risk

    Authors: Ji Yin, Zhiyuan Zhang, Panagiotis Tsiotras

    Abstract: In this paper, we present a novel Model Predictive Control method for autonomous robots subject to arbitrary forms of uncertainty. The proposed Risk-Aware Model Predictive Path Integral (RA-MPPI) control utilizes the Conditional Value-at-Risk (CVaR) measure to generate optimal control actions for safety-critical robotic applications. Different from most existing Stochastic MPCs and CVaR optimizati…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 7 pages, 7 figures

  31. arXiv:2209.01996 [pdf, other]

    cs.SD cs.CL eess.AS

    Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation

    Authors: Peining Zhang, Junliang Guo, Linli Xu, Mu You, Junming Yin

    Abstract: We consider a novel task of automatically generating text descriptions of music. Compared with other well-established text generation tasks such as image caption, the scarcity of well-paired music and text datasets makes it a much more challenging task. In this paper, we exploit the crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text de…

    Submitted 5 September, 2022; originally announced September 2022.

  32. arXiv:2205.04326 [pdf, other]

    eess.IV cs.CV cs.LG

    Deeply Supervised Skin Lesions Diagnosis with Stage and Branch Attention

    Authors: Wei Dai, Rui Liu, Tianyi Wu, Min Wang, Jianqin Yin, Jun Liu

    Abstract: Accurate and unbiased examinations of skin lesions are critical for the early diagnosis and treatment of skin diseases. Visual features of skin lesions vary significantly because the images are collected from patients with different lesion colours and morphologies by using dissimilar imaging equipment. Recent studies have reported that ensembled convolutional neural networks (CNNs) are practical t…

    Submitted 23 August, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: 11 pages, 9 figures

  33. arXiv:2202.04855 [pdf, ps, other]

    eess.AS cs.CL cs.SD

    The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

    Authors: Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee

    Abstract: We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge. These techniques are designed to handle multi-speaker conversations in real-world meeting scenarios with high speaker-overlap ratios and under heavy reverberan…

    Submitted 10 February, 2022; originally announced February 2022.

  34. arXiv:2105.08630 [pdf, other]

    eess.IV cs.CV cs.LG

    Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, et al. (13 additional authors not shown)

    Abstract: Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based d…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: text overlap with arXiv:2105.07809

  35. arXiv:2102.02620 [pdf, other]

    eess.SY

    Electricity-gas integrated energy system optimal operation in typical scenario of coal district considering hydrogen heavy trucks

    Authors: Junjie Yin, Jianhua Wang, Jun You

    Abstract: The coal industry contributes significantly to the social economy, but the emission of greenhouse gases puts huge pressure on the environment in the process of mining, transportation, and power generation. In the integrated energy system (IES), the current research about the power-to-gas (P2G) technology mainly focuses on the injection of hydrogen generated from renewable energy electrolyzed water…

    Submitted 7 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: 10 pages, 7 figures

  36. arXiv:2012.10732 [pdf, other]

    eess.AS cs.SD

    DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

    Authors: Huixiang Huang, Renjie Wu, Jingbiao Huang, Jucai Lin, Jun Yin

    Abstract: Generative adversarial network (GAN) still exists some problems in dealing with speech enhancement (SE) task. Some GAN-based systems adopt the same structure from Pixel-to-Pixel directly without special optimization. The importance of the generator network has not been fully explored. Other related researches change the generator network but operate in the time-frequency domain, which ignores the…

    Submitted 7 March, 2021; v1 submitted 19 December, 2020; originally announced December 2020.

  37. arXiv:2008.11988 [pdf, other]

    cs.CV cs.LG eess.IV

    Cloze Test Helps: Effective Video Anomaly Detection via Learning to Complete Video Events

    Authors: Guang Yu, Siqi Wang, Zhiping Cai, En Zhu, Chuanfu Xu, Jianping Yin, Marius Kloft

    Abstract: As a vital topic in media content interpretation, video anomaly detection (VAD) has made fruitful progress via deep neural network (DNN). However, existing methods usually follow a reconstruction or frame prediction routine. They suffer from two gaps: (1) They cannot localize video activities in a both precise and comprehensive manner. (2) They lack sufficient abilities to utilize high-level seman…

    Submitted 27 August, 2020; originally announced August 2020.

    Comments: To be published as an oral paper in Proceedings of the 28th ACM International Conference on Multimedia (ACM MM '20). 9 pages, 7 figures

  38. arXiv:2002.12135 [pdf, other]

    cs.LG eess.SP stat.ML

    Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting

    Authors: Qiquan Shi, Jiaming Yin, Jiajun Cai, Andrzej Cichocki, Tatsuya Yokota, Lei Chen, Mingxuan Yuan, Jia Zeng

    Abstract: This work proposes a novel approach for multiple time series forecasting. At first, multi-way delay embedding transform (MDT) is employed to represent time series as low-rank block Hankel tensors (BHT). Then, the higher-order tensors are projected to compressed core tensors by applying Tucker decomposition. At the same time, the generalized tensor Autoregressive Integrated Moving Average (ARIMA) i…

    Submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted by AAAI 2020

  39. arXiv:2002.11806 [pdf, other]

    cs.IT eess.SP

    Massive MIMO Asymptotics for Ray-Based Propagation Channels

    Authors: Shuang Li, Peter Smith, Pawel Dmochowski, Harsh Tataria, Michail Matthaiou, Jingwei Yin

    Abstract: Favorable propagation (FP) and channel hardening (CH) are desired properties in massive multiple-input multiple-output (MIMO) systems. To date, these properties have primarily been analyzed for classical statistical channel models, or ray-based models with very specific angular parameters and distributions. This paper presents a thorough mathematical analysis of the asymptotic sy…

    Submitted 26 February, 2020; originally announced February 2020.

  40. arXiv:1906.00884 [pdf, other]

    cs.CV eess.IV

    Fashion Editing with Adversarial Parsing Learning

    Authors: Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin

    Abstract: Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value. Existing works often treat it as a general inpainting task and do not fully leverage the semantic structural information in fashion images. Moreover, they directly utilize conventional convolution and normalization layers to re…

    Submitted 28 September, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 22 pages, 18 figures