Showing 1–50 of 72 results for author: Shen, P

Searching in archive cs.
  1. arXiv:2506.22803  [pdf, ps, other]

    cs.CV cs.HC cs.LG

    Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding

    Authors: Nuoye Xiong, Anqi Dong, Ning Wang, Cong Hua, Guangming Zhu, Mei Lin, Peiyi Shen, Liang Zhang

    Abstract: Recent advances in deep learning have led to increasingly complex models with deeper layers and more parameters, reducing interpretability and making their decisions harder to understand. While many methods explain black-box reasoning, most lack effective interventions or only operate at sample-level without modifying the model itself. To address this, we propose the Concept Bottleneck Model for E…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  2. arXiv:2506.12577  [pdf, ps, other]

    cs.CL

    OneEval: Benchmarking LLM Knowledge-intensive Reasoning over Diverse Knowledge Bases

    Authors: Yongrui Chen, Zhiqiang Liu, Jing Yu, Lin Ren, Nan Hu, Xinbang Dai, Jiajun Liu, Jiazhen Kang, Shenyu Zhang, Xinda Wang, Keyan Ding, Pengfei Shen, Haolei Zhu, Hongjie Deng, Yisong Wang, Tongtong Wu, Sheng Bi, Wen Zhang, Tianxing Wu, Qiu Ji, Haofen Wang, Wenliang Chen, Huajun Chen, Guilin Qi

    Abstract: Large Language Models (LLMs) have demonstrated substantial progress on reasoning tasks involving unstructured text, yet their capabilities significantly deteriorate when reasoning requires integrating structured external knowledge such as knowledge graphs, code snippets, or formal logic. This limitation is partly due to the absence of benchmarks capable of systematically evaluating LLM performance…

    Submitted 14 June, 2025; originally announced June 2025.

  3. arXiv:2505.13079  [pdf, ps, other]

    eess.AS cs.AI

    Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Transferring linguistic knowledge from a pretrained language model (PLM) to acoustic feature learning has proven effective in enhancing end-to-end automatic speech recognition (E2E-ASR). However, aligning representations between linguistic and acoustic modalities remains a challenge due to inherent modality gaps. Optimal transport (OT) has shown promise in mitigating these gaps by minimizing the W…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: To appear in Interspeech 2025

  4. arXiv:2505.05114  [pdf, other]

    eess.AS cs.SD

    Listen to Extract: Onset-Prompted Target Speaker Extraction

    Authors: Pengjie Shen, Kangrui Chen, Shulin He, Pengru Chen, Shuqi Yuan, He Kong, Xueliang Zhang, Zhong-Qiu Wang

    Abstract: We propose $\textit{listen to extract}$ (LExt), a highly effective yet extremely simple algorithm for monaural target speaker extraction (TSE). Given an enrollment utterance of a target speaker, LExt aims at extracting the target speaker from the speaker's mixed speech with other speakers. For each mixture, LExt concatenates an enrollment utterance of the target speaker to the mixture signal at…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: in submission

  5. arXiv:2504.17028  [pdf, ps, other]

    cs.LG cs.AI physics.ao-ph

    Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU

    Authors: Iman Khadir, Shane Stevenson, Henry Li, Kyle Krick, Abram Burrows, David Hall, Stan Posey, Samuel S. P. Shen

    Abstract: This paper demonstrates the feasibility of democratizing AI-driven global weather forecasting models among university research groups by leveraging Graphics Processing Units (GPUs) and freely available AI models, such as NVIDIA's FourCastNetv2. FourCastNetv2 is NVIDIA's advanced neural network for weather prediction and is trained on a 73-channel subset of the European Centre for Medium-Range W…

    Submitted 20 June, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

    Comments: 12 pages, 8 figures

    MSC Class: 86-04; 86-08; 86-10; 86-11

  6. arXiv:2504.11750  [pdf, other]

    cs.DC cs.AI cs.AR cs.PF

    Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures

    Authors: Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen

    Abstract: Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-GPU coupled architectures is crucial for optimization. This paper presents an in-depth analysis of LLM inference behavior on loosely-coupled (PCIe A100/H100) and closely-coupled (GH200) systems. We ana…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: Accepted for ISPASS 2025

  7. arXiv:2503.21401  [pdf, other]

    cs.RO cs.LG eess.SY

    AcL: Action Learner for Fault-Tolerant Quadruped Locomotion Control

    Authors: Tianyu Xu, Yaoyu Cheng, Pinxi Shen, Lin Zhao

    Abstract: Quadrupedal robots can learn versatile locomotion skills but remain vulnerable when one or more joints lose power. In contrast, dogs and cats can adopt limping gaits when injured, demonstrating their remarkable ability to adapt to physical conditions. Inspired by such adaptability, this paper presents Action Learner (AcL), a novel teacher-student reinforcement learning framework that enables quadr…

    Submitted 28 March, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  8. arXiv:2502.15264  [pdf, other]

    cs.CL cs.SD eess.AS

    Retrieval-Augmented Speech Recognition Approach for Domain Challenges

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: Speech recognition systems often face challenges due to domain mismatch, particularly in real-world applications where domain-specific data is unavailable because of data accessibility and confidentiality constraints. Inspired by Retrieval-Augmented Generation (RAG) techniques for large language models (LLMs), this paper introduces an LLM-based retrieval-augmented speech recognition method that inc…

    Submitted 21 February, 2025; originally announced February 2025.

  9. arXiv:2501.16388  [pdf, other]

    cs.LG stat.AP

    Development and Validation of a Dynamic Kidney Failure Prediction Model based on Deep Learning: A Real-World Study with External Validation

    Authors: Jingying Ma, Jinwei Wang, Lanlan Lu, Yexiang Sun, Mengling Feng, Peng Shen, Zhiqin Jiang, Shenda Hong, Luxia Zhang

    Abstract: Background: Chronic kidney disease (CKD), a progressive disease with high morbidity and mortality, has become a significant global public health problem. At present, most of the models used for predicting the progression of CKD are static models. We aim to develop a dynamic kidney failure prediction model based on deep learning (KFDeep) for CKD patients, utilizing all available data on common clin…

    Submitted 25 January, 2025; originally announced January 2025.

  10. arXiv:2412.19002  [pdf, other]

    cs.AR cs.AI

    Tempus Core: Area-Power Efficient Temporal-Unary Convolution Core for Low-Precision Edge DLAs

    Authors: Prabhu Vellaisamy, Harideep Nair, Thomas Kang, Yichen Ni, Haoyang Fan, Bin Qi, Jeff Chen, Shawn Blanton, John Paul Shen

    Abstract: The increasing complexity of deep neural networks (DNNs) poses significant challenges for edge inference deployment due to resource and power constraints of edge devices. Recent works on unary-based matrix multiplication hardware aim to leverage data sparsity and low-precision values to enhance hardware efficiency. However, the adoption and integration of such unary hardware into commercial deep l…

    Submitted 25 December, 2024; originally announced December 2024.

    Comments: Accepted in DATE 2025

  11. TNNGen: Automated Design of Neuromorphic Sensory Processing Units for Time-Series Clustering

    Authors: Prabhu Vellaisamy, Harideep Nair, Vamsikrishna Ratnakaram, Dhruv Gupta, John Paul Shen

    Abstract: Temporal Neural Networks (TNNs), a special class of spiking neural networks, draw inspiration from the neocortex in utilizing spike-timings for information processing. Recent works proposed a microarchitecture framework and custom macro suite for designing highly energy-efficient application-specific TNNs. These recent works rely on manual hardware design, a labor-intensive and time-consuming proc…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: Published in IEEE Transactions on Circuits and Systems II: Express Briefs, May 2024

  12. tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI

    Authors: Harideep Nair, Prabhu Vellaisamy, Albert Chen, Joseph Finn, Anna Li, Manav Trivedi, John Paul Shen

    Abstract: General matrix multiplication (GEMM) is a ubiquitous computing kernel/algorithm for data processing in diverse applications, including artificial intelligence (AI) and deep learning (DL). The recent shift towards edge computing has inspired GEMM architectures based on unary computing, which are predominantly stochastic and rate-coded systems. This paper proposes a novel GEMM architecture based on temp…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: Published in 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, USA, 2023

  13. tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit

    Authors: Prabhu Vellaisamy, Harideep Nair, Joseph Finn, Manav Trivedi, Albert Chen, Anna Li, Tsung-Han Lin, Perry Wang, Shawn Blanton, John Paul Shen

    Abstract: General Matrix Multiplication (GEMM) is a ubiquitous compute kernel in deep learning (DL). To support energy-efficient edge-native processing, new GEMM hardware units have been proposed that operate on unary encoded bitstreams using much simpler hardware. Most unary approaches thus far focus on rate-based unary encoding of values and perform stochastic approximate computation. This work presents t…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: Published in 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

  14. arXiv:2412.02942  [pdf, other]

    cs.AI

    STDCformer: A Transformer-Based Model with a Spatial-Temporal Causal De-Confounding Strategy for Crowd Flow Prediction

    Authors: Silu He, Peng Shen, Pingzhen Xu, Qinyao Luo, Haifeng Li

    Abstract: Existing works typically treat spatial-temporal prediction as the task of learning a function $F$ to transform historical observations to future observations. We further decompose this cross-time transformation into three processes: (1) Encoding ($E$): learning the intrinsic representation of observations, (2) Cross-Time Mapping ($M$): transforming past representations into future representations,…

    Submitted 3 December, 2024; originally announced December 2024.

  15. arXiv:2409.06368  [pdf, other]

    cs.GR

    Fiber-level Woven Fabric Capture from a Single Photo

    Authors: Zixuan Li, Pengfei Shen, Hanxiao Sun, Zibo Zhang, Yu Guo, Ligang Liu, Ling-Qi Yan, Steve Marschner, Milos Hasan, Beibei Wang

    Abstract: Accurately rendering the appearance of fabrics is challenging, due to their complex 3D microstructures and specialized optical properties. If we model the geometry and optics of fabrics down to the fiber level, we can achieve unprecedented rendering realism, but this raises the difficulty of authoring or capturing the fiber-level assets. Existing approaches can obtain fiber-level geometry with spe…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file

  16. arXiv:2409.02239  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Transferring linguistic knowledge from a pretrained language model (PLM) to an acoustic model has been shown to greatly improve the performance of automatic speech recognition (ASR). However, due to the heterogeneous feature distributions in cross-modalities, designing an effective model for feature alignment and knowledge transfer between linguistic and acoustic sequences remains a challenging ta…

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted to IEEE SLT 2024

  17. arXiv:2406.13399  [pdf, other]

    cs.AI

    VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

    Authors: Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia

    Abstract: The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Service (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substanti…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: to be published in IEEE ICWS 2024

  18. arXiv:2405.15750  [pdf, other]

    cs.CL cs.AI cs.LG

    Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence

    Authors: Abhinav Patil, Jaap Jumelet, Yu Ying Chiu, Andy Lapastora, Peter Shen, Lexie Wang, Clevis Willrich, Shane Steinert-Threlkeld

    Abstract: This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpor…

    Submitted 6 August, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Forthcoming in Transactions of the Association for Computational Linguistics (TACL). This is a pre-MIT Press publication version. For code and trained models, see http://github.com/CLMBRs/corpus-filtering

  19. arXiv:2405.11844  [pdf]

    cs.AR cs.ET

    NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors

    Authors: Harideep Nair, William Leyman, Agastya Sampath, Quinn Jacobson, John Paul Shen

    Abstract: Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC's ability to store, pred…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted and Presented at Neuro-Inspired Computational Elements (NICE) Conference, La Jolla, CA. 2024

  20. arXiv:2404.15312  [pdf, other]

    eess.SP cs.CV

    Realtime Person Identification via Gait Analysis

    Authors: Shanmuga Venkatachalam, Harideep Nair, Prabhu Vellaisamy, Yongqi Zhou, Ziad Youssfi, John Paul Shen

    Abstract: Each person has a unique gait, i.e., walking style, that can be used as a biometric for personal identification. Recent works have demonstrated effective gait recognition using deep neural networks; however, most of these works predominantly focus on classification accuracy rather than model efficiency. In order to perform gait recognition using wearable devices on the edge, it is imperative to dev…

    Submitted 2 April, 2024; originally announced April 2024.

  21. arXiv:2404.03648  [pdf, other]

    cs.CL

    AutoWebGLM: A Large Language Model-based Web Navigating Agent

    Authors: Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang

    Abstract: Large language models (LLMs) have fueled many intelligent web agents, but most existing ones perform far from satisfactorily in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data, (2) versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web. In light of these challenges, we develop the open AutoWebGLM based on ChatGLM3-…

    Submitted 12 October, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to KDD 2024

  22. Commercial Evaluation of Zero-Skipping MAC Design for Bit Sparsity Exploitation in DL Inference

    Authors: Harideep Nair, Prabhu Vellaisamy, Tsung-Han Lin, Perry Wang, Shawn Blanton, John Paul Shen

    Abstract: General Matrix Multiply (GEMM) units, consisting of multiply-accumulate (MAC) arrays, perform the bulk of the computation in deep learning (DL). Recent work has proposed a novel MAC design, Bit-Pragmatic (PRA), capable of dynamically exploiting bit sparsity. This work presents OzMAC (Omit-zero-MAC), a modified re-implementation of PRA, but extends beyond earlier works by performing rigorous post-synth…

    Submitted 2 January, 2025; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Pre-print version of the publication in VLSI-SoC 2024

  23. arXiv:2402.08423  [pdf, other]

    cs.AI

    Vehicle Behavior Prediction by Episodic-Memory Implanted NDT

    Authors: Peining Shen, Jianwu Fang, Hongkai Yu, Jianru Xue

    Abstract: In autonomous driving, predicting the behavior (turning left, stopping, etc.) of target vehicles is crucial for the self-driving vehicle to make safe decisions and avoid accidents. Existing deep learning-based methods have shown excellent and accurate performance, but the black-box nature makes it untrustworthy to apply them in practical use. In this work, we explore the interpretability of behavi…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted by ICRA2024

  24. arXiv:2402.03822  [pdf, other]

    cs.AI cs.CL cs.LG

    RevOrder: A Novel Method for Enhanced Arithmetic in Language Models

    Authors: Si Shen, Peijun Shen, Danhao Zhu

    Abstract: This paper presents RevOrder, a novel technique aimed at improving arithmetic operations in large language models (LLMs) by reversing the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks. Our method significantly reduces the Count of Sequential Intermediate Digits (CSID) to $\mathcal{O}(1)$, a new metric we introduce to assess equation complexity. Thro…

    Submitted 23 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  25. AscDAMs: Advanced SLAM-based channel detection and mapping system

    Authors: Tengfei Wang, Fucheng Lu, Jintao Qin, Taosheng Huang, Hui Kong, Ping Shen

    Abstract: Obtaining high-resolution, accurate channel topography and deposit conditions is the prior challenge for the study of channelized debris flow. Currently, widely used mapping technologies including satellite imaging and drone photogrammetry struggle to precisely observe channel interior conditions of mountainous long-deep gullies, particularly those in the Wenchuan Earthquake region. SLAM is an emerg…

    Submitted 24 January, 2024; originally announced January 2024.

  26. arXiv:2401.05461  [pdf]

    cs.HC cs.AI cs.LG

    The two-way knowledge interaction interface between humans and neural networks

    Authors: Zhanliang He, Nuoye Xiong, Hongsheng Li, Peiyi Shen, Guangming Zhu, Liang Zhang

    Abstract: Although neural networks (NN) have been widely applied in various fields and generally outperform humans, they still lack interpretability to a certain extent, and humans are unable to intuitively understand the decision logic of NN. This also hinders the knowledge interaction between humans and NN, preventing humans from getting involved to give direct guidance when NN's decisions go wrong. While…

    Submitted 10 January, 2024; originally announced January 2024.

  27. arXiv:2312.10964  [pdf, other]

    cs.CL cs.SD eess.AS

    Generative linguistic representation for spoken language identification

    Authors: Peng Shen, Xuguang Lu, Hisashi Kawai

    Abstract: Effective extraction and application of linguistic features are central to the enhancement of spoken Language IDentification (LID) performance. With the success of recent large models, such as GPT and Whisper, the potential to leverage such pre-trained models for extracting linguistic features for LID tasks has become a promising area of research. In this paper, we explore the utilization of the d…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by IEEE ASRU2023

  28. arXiv:2312.10959  [pdf, other]

    cs.SD cs.CL eess.AS

    Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: Multi-talker overlapped speech recognition remains a significant challenge, requiring not only speech recognition but also speaker diarization tasks to be addressed. In this paper, to better address these tasks, we first introduce speaker labels into an autoregressive transformer-based speech recognition model to support multi-speaker overlapped speech recognition. Then, to improve speaker diariza…

    Submitted 18 December, 2023; originally announced December 2023.

  29. arXiv:2311.01003  [pdf, other]

    eess.SY cs.RO

    Minimum Snap Trajectory Generation and Control for an Under-actuated Flapping Wing Aerial Vehicle

    Authors: Chen Qian, Rui Chen, Peiyao Shen, Yongchun Fang, Jifu Yan, Tiefeng Li

    Abstract: This paper presents both the trajectory generation and tracking control strategies for an underactuated flapping wing aerial vehicle (FWAV). First, the FWAV dynamics are analyzed from a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system, and d…

    Submitted 2 November, 2023; originally announced November 2023.

  30. arXiv:2310.13471  [pdf, ps, other]

    eess.AS cs.SD

    Neural domain alignment for spoken language recognition based on optimal transport

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Domain shift poses a significant challenge in cross-domain spoken language recognition (SLR) by reducing its effectiveness. Unsupervised domain adaptation (UDA) algorithms have been explored to address domain shifts in SLR without relying on class labels in the target domain. One successful UDA approach focuses on learning domain-invariant representations to align feature distributions between dom…

    Submitted 20 October, 2023; originally announced October 2023.

  31. arXiv:2309.16093  [pdf, ps, other]

    eess.AS cs.SD

    Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Due to the modality discrepancy between textual and acoustic modeling, efficiently transferring linguistic knowledge from a pretrained language model (PLM) to acoustic encoding for automatic speech recognition (ASR) still remains a challenging task. In this study, we propose a cross-modality knowledge transfer (CMKT) learning framework in a temporal connectionist temporal classification (CTC) base…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  32. arXiv:2309.13650  [pdf, ps, other]

    eess.AS cs.SD

    Cross-modal Alignment with Optimal Transport for CTC-based ASR

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Temporal connectionist temporal classification (CTC)-based automatic speech recognition (ASR) is one of the most successful end-to-end (E2E) ASR frameworks. However, due to the token independence assumption in decoding, an external language model (LM) is required, which destroys its fast parallel decoding property. Several studies have been proposed to transfer linguistic knowledge from a pretraine…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE ASRU 2023

  33. arXiv:2309.10832  [pdf, ps, other]

    cs.SD eess.AS

    Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding

    Authors: Jiahui Pan, Pengjie Shen, Hui Zhang, Xueliang Zhang

    Abstract: Multi-channel speech enhancement extracts speech using multiple microphones that capture spatial cues. Effectively utilizing directional information is key for multi-channel enhancement. Deep learning shows great potential on multi-channel speech enhancement and often takes short-time Fourier Transform (STFT) as inputs directly. To fully leverage the spatial information, we introduce a method usin…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2309.10393

  34. arXiv:2305.11651  [pdf, other]

    cs.IT cs.MA cs.PF eess.SY

    Channel Cycle Time: A New Measure of Short-term Fairness

    Authors: Pengfei Shen, Yulin Shao, Haoyuan Pan, Lu Lu, Yonina C. Eldar

    Abstract: This paper puts forth a new metric, dubbed channel cycle time (CCT), to measure the short-term fairness of communication networks. CCT characterizes the average duration between two consecutive successful transmissions of a user, during which all other users successfully accessed the channel at least once. In contrast to existing short-term fairness measures, CCT provides more comprehensive insigh…

    Submitted 14 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  35. arXiv:2209.07313  [pdf, other]

    eess.IV cs.CV

    HarDNet-DFUS: An Enhanced Harmonically-Connected Network for Diabetic Foot Ulcer Image Segmentation and Colonoscopy Polyp Segmentation

    Authors: Ting-Yu Liao, Ching-Hui Yang, Yu-Wen Lo, Kuan-Ying Lai, Po-Huai Shen, Youn-Long Lin

    Abstract: We present a neural network architecture for medical image segmentation of diabetic foot ulcers and colonoscopy polyps. Diabetic foot ulcers are caused by neuropathic and vascular complications of diabetes mellitus. In order to provide a proper diagnosis and treatment, wound care professionals need to extract accurate morphological features from the foot wounds. Using computer-aided systems is a p…

    Submitted 15 September, 2022; originally announced September 2022.

  36. arXiv:2207.14578  [pdf, other]

    cs.CL cs.SD eess.AS

    Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, compared to character-based modeling units, pronunciation-based modeling units could improve the sharing of modeling units in model training but meet homophone problems. In this study, we propose to use a novel pronunciation-aware unique character encoding for building E2E RNN-T-based Mandarin ASR systems. The proposed encodin…

    Submitted 29 July, 2022; originally announced July 2022.

  37. arXiv:2207.06309  [pdf, other]

    cs.IT cs.NI eess.SY

    Dynamic gNodeB Sleep Control for Energy-Conserving 5G Radio Access Network

    Authors: Pengfei Shen, Yulin Shao, Qi Cao, Lu Lu

    Abstract: 5G radio access network (RAN) is consuming much more energy than legacy RAN due to the denser deployments of gNodeBs (gNBs) and higher single-gNB power consumption. In an effort to achieve an energy-conserving RAN, this paper develops a dynamic on-off switching paradigm, where the ON/OFF states of gNBs can be dynamically configured according to the evolvements of the associated users. We formulate…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: Keywords: Base station sleep control, 5G, radio access network, Markov decision process, greedy policy, index policy

  38. arXiv:2206.03061  [pdf, other]

    cs.CV

    Spatial Parsing and Dynamic Temporal Pooling networks for Human-Object Interaction detection

    Authors: Hongsheng Li, Guangming Zhu, Wu Zhen, Lan Ni, Peiyi Shen, Liang Zhang, Ning Wang, Cong Hua

    Abstract: The key to Human-Object Interaction (HOI) recognition is to infer the relationship between humans and objects. Recently, HOI detection in images has made significant progress. However, there is still room for improvement in video HOI detection performance. Existing one-stage methods use well-designed end-to-end networks to detect a video segment and directly predict an in…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted by IJCNN2022

  39. arXiv:2205.14248  [pdf, other]

    cs.ET cs.AR cs.NE

    Towards a Design Framework for TNN-Based Neuromorphic Sensory Processing Units

    Authors: Prabhu Vellaisamy, John Paul Shen

    Abstract: Temporal Neural Networks (TNNs) are spiking neural networks that exhibit brain-like sensory processing with high energy efficiency. This work presents the ongoing research towards developing a custom design framework for designing efficient application-specific TNN-based Neuromorphic Sensory Processing Units (NSPUs). This paper examines previous works on NSPU designs for UCR time-series clustering…

    Submitted 27 May, 2022; originally announced May 2022.

  40. arXiv:2205.07410  [pdf, other]

    cs.AR cs.ET cs.LG cs.NE

    TNN7: A Custom Macro Suite for Implementing Highly Optimized Designs of Neuromorphic TNNs

    Authors: Harideep Nair, Prabhu Vellaisamy, Santha Bhasuthkar, John Paul Shen

    Abstract: Temporal Neural Networks (TNNs), inspired by the mammalian neocortex, exhibit energy-efficient online sensory processing capabilities. Recent works have proposed a microarchitecture framework for implementing TNNs and demonstrated competitive performance on vision and time-series applications. Building on these previous works, this work proposes TNN7, a suite of nine highly optimized custom macr…

    Submitted 25 May, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: To be published in ISVLSI 2022

  41. arXiv:2204.03888  [pdf, other]

    cs.CL cs.SD eess.AS

    Transducer-based language embedding for spoken language identification

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: The acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features that lack the usage of explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefi…

    Submitted 29 July, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: This paper was accepted by Interspeech 2022

  42. arXiv:2203.17036  [pdf, ps, other]

    eess.AS cs.CL

    Partial Coupling of Optimal Transport for Spoken Language Identification

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: In order to reduce domain discrepancy to improve the performance of cross-domain spoken language identification (SLID) system, as an unsupervised domain adaptation (UDA) method, we have proposed a joint distribution alignment (JDA) model based on optimal transport (OT). A discrepancy measurement based on OT was adopted for JDA between training and test data sets. In our previous study, it was supp…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: This work was submitted to INTERSPEECH 2022

  43. arXiv:2201.01778  [pdf, other]

    quant-ph cond-mat.dis-nn cond-mat.mes-hall cs.AI cs.CV

    Quantum Capsule Networks

    Authors: Zidu Liu, Pei-Xin Shen, Weikang Li, L. -M. Duan, Dong-Ling Deng

    Abstract: Capsule networks, which incorporate the paradigms of connectionism and symbolism, have brought fresh insights into artificial intelligence. The capsule, as the building block of capsule networks, is a group of neurons represented by a vector to encode different features of an entity. The information is extracted hierarchically through capsule layers via routing algorithms. Here, we introduce a qua…

    Submitted 5 December, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: 7 pages (main text) + 8 pages (supplementary information), 8 figures

    Journal ref: Quantum Sci. Technol. 8 015016 (2022)

  44. arXiv:2201.00443  [pdf, other]

    cs.CV

    Scene Graph Generation: A Comprehensive Survey

    Authors: Guangming Zhu, Liang Zhang, Youliang Jiang, Yixuan Dang, Haoran Hou, Peiyi Shen, Mingtao Feng, Xia Zhao, Qiguang Miao, Syed Afaq Ali Shah, Mohammed Bennamoun

    Abstract: Deep learning techniques have led to remarkable breakthroughs in the field of generic object detection and have spawned a lot of scene-understanding tasks in recent years. Scene graph has been the focus of research because of its powerful semantic representation and applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semanti…

    Submitted 22 June, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

    Comments: Submitted to TPAMI

  45. arXiv:2109.06310  [pdf, other]

    cs.LG stat.ML

    State Relevance for Off-Policy Evaluation

    Authors: Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez

    Abstract: Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces varianc…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: ICML 2021

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9537-9546, 2021

  46. arXiv:2108.09541  [pdf, other]

    cs.LG

    Rotation Equivariant Operators for Machine Learning on Scalar and Vector Fields

    Authors: Paul Shen, Michael Herbst, Venkat Viswanathan

    Abstract: We develop theory and software for rotation equivariant operators on scalar and vector fields, with diverse applications in simulation, optimization and machine learning. Rotation equivariance (covariance) means all fields in the system rotate together, implying spatially invariant dynamics that preserve symmetry. Extending the convolution theorems of linear time invariant systems, we theorize tha…

    Submitted 4 August, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

  47. arXiv:2108.08633  [pdf, other]

    cs.CV cs.MM

    Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

    Authors: Ning Wang, Guangming Zhu, Liang Zhang, Peiyi Shen, Hongsheng Li, Cong Hua

    Abstract: For a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue for understanding the contextual information presented in the video. With the effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies. It i…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: ACM MM Oral paper

  48. arXiv:2108.04542  [pdf, other]

    cs.AI cs.CV

    TrUMAn: Trope Understanding in Movies and Animations

    Authors: Hung-Ting Su, Po-Wei Shen, Bing-Chen Tsai, Wen-Feng Cheng, Ke-Jyun Wang, Winston H. Hsu

    Abstract: Understanding and comprehending video content is crucial for many real-world applications such as search and recommendation systems. While recent progress of deep learning has boosted performance on various tasks using visual cues, deep cognition to reason intentions, motivation, or causality remains challenging. Existing datasets that aim to examine video reasoning capability focus on visual sign…

    Submitted 21 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: CIKM 2021. The first two authors contributed equally to this work

  49. arXiv:2106.12864  [pdf, other]

    eess.IV cs.CV cs.LG

    A Systematic Collection of Medical Image Datasets for Deep Learning

    Authors: Johann Li, Guangming Zhu, Cong Hua, Mingtao Feng, Basheer Bennamoun, Ping Li, Xiaoyuan Lu, Juan Song, Peiyi Shen, Xu Xu, Lin Mei, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun

    Abstract: The astounding success made by artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analy…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: This paper has been submitted to one journal

  50. arXiv:2106.05519  [pdf, other]

    cs.CV

    Consistent Instance False Positive Improves Fairness in Face Recognition

    Authors: Xingkun Xu, Yuge Huang, Pengcheng Shen, Shaoxin Li, Jilin Li, Feiyue Huang, Yong Li, Zhen Cui

    Abstract: Demographic bias is a significant challenge in practical face recognition systems. Existing methods heavily rely on accurate demographic annotations. However, such annotations are usually unavailable in real scenarios. Moreover, these methods are typically designed for a specific demographic group and are not general enough. In this paper, we propose a false positive rate penalty loss, which mitig…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: CVPR2021