Showing 1–50 of 68 results for author: Kang, L

Searching in archive cs.
  1. arXiv:2505.01077 [pdf]

    cs.NE

    Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM

    Authors: Lei Zhao, Ling Kang, Quan Guo

    Abstract: With the advent of artificial intelligence (AI), many researchers are attempting to extract structured information from document-level biomedical literature by fine-tuning large language models (LLMs). However, they face significant challenges such as the need for expensive hardware, like high-performance GPUs and the high labor costs associated with annotating training datasets, especially in bio…

    Submitted 2 May, 2025; originally announced May 2025.

  2. arXiv:2504.16101 [pdf, other]

    eess.SP cs.AI cs.LG

    xLSTM-ECG: Multi-label ECG Classification via Feature Fusion with xLSTM

    Authors: Lei Kang, Xuanshuo Fu, Javier Vazquez-Corral, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Cardiovascular diseases (CVDs) remain the leading cause of mortality worldwide, highlighting the critical need for efficient and accurate diagnostic tools. Electrocardiograms (ECGs) are indispensable in diagnosing various heart conditions; however, their manual interpretation is time-consuming and error-prone. In this paper, we propose xLSTM-ECG, a novel approach that leverages an extended Long Sh…

    Submitted 14 April, 2025; originally announced April 2025.

  3. arXiv:2504.09265 [pdf, other]

    cs.LG cs.CL cs.CV

    Mixture of Group Experts for Learning Invariant Representations

    Authors: Lei Kang, Jia Li, Mi Tian, Hua Huang

    Abstract: Sparsely activated Mixture-of-Experts (MoE) models effectively increase the number of parameters while maintaining consistent computational costs per token. However, vanilla MoE models often suffer from limited diversity and specialization among experts, constraining their performance and scalability, especially as the number of experts increases. In this paper, we present a novel perspective on v…

    Submitted 12 April, 2025; originally announced April 2025.

  4. arXiv:2504.08616 [pdf, other]

    cs.CV

    Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition

    Authors: Lei Kang, Xuanshuo Fu, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Handwritten Text Recognition (HTR) is essential for document analysis and digitization. However, handwritten data often contains user-identifiable information, such as unique handwriting styles and personal lexicon choices, which can compromise privacy and erode trust in AI services. Legislation like the "right to be forgotten" underscores the necessity for methods that can expunge sensitive inf…

    Submitted 11 April, 2025; originally announced April 2025.

  5. arXiv:2504.04549 [pdf, other]

    cs.CV

    Opening the black box of deep learning: Validating the statistical association between explainable artificial intelligence (XAI) and clinical domain knowledge in fundus image-based glaucoma diagnosis

    Authors: Han Yuan, Lican Kang, Yong Li

    Abstract: While deep learning has exhibited remarkable predictive capabilities in various medical image tasks, its inherent black-box nature has hindered its widespread implementation in real-world healthcare settings. Our objective is to unveil the decision-making processes of deep learning models in the context of glaucoma classification by employing several Class Activation Map (CAM) techniques to genera…

    Submitted 6 April, 2025; originally announced April 2025.

  6. arXiv:2504.03158 [pdf, other]

    stat.ML cs.LG

    Accelerating Particle-based Energetic Variational Inference

    Authors: Xuelian Bao, Lulu Kang, Chun Liu, Yiwei Wang

    Abstract: In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates the EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. Unlike EVI-Im, which employs the implicit Euler method to solve variational-preserving particle dynamics for minimizin…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 21 pages, 5 figures, 2 tables

    MSC Class: 62G05; 65K10; 65L05

  7. arXiv:2503.16408 [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

    Authors: Yiran Qin, Li Kang, Xiufeng Song, Zhenfei Yin, Xiaohong Liu, Xihui Liu, Ruimao Zhang, Lei Bai

    Abstract: Designing effective embodied multi-agent systems is critical for solving complex real-world tasks across domains. Due to the complexity of multi-agent embodied systems, existing methods fail to automatically generate safe and efficient training data for such systems. To this end, we propose the concept of compositional constraints for embodied multi-agent systems, addressing the challenges arising…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Project page: https://iranqin.github.io/robofactory/

  8. arXiv:2502.20676 [pdf, other]

    cs.CV

    SciceVPR: Stable Cross-Image Correlation Enhanced Model for Visual Place Recognition

    Authors: Shanshan Wan, Yingmei Wei, Lai Kang, Tianrui Shen, Haixuan Wang, Yee-Hong Yang

    Abstract: Visual Place Recognition (VPR) is a major challenge for robotics and autonomous systems, with the goal of predicting the location of an image based solely on its visual features. State-of-the-art (SOTA) models extract global descriptors using the powerful foundation model DINOv2 as backbone. These models either explore the cross-image correlation or propose a time-consuming two-stage re-ranking st…

    Submitted 27 February, 2025; originally announced February 2025.

  9. arXiv:2502.16617 [pdf, other]

    cs.LG stat.ME stat.ML

    Optimal Kernel Learning for Gaussian Process Models with High-Dimensional Input

    Authors: Lulu Kang, Minshen Xu

    Abstract: Gaussian process (GP) regression is a popular surrogate modeling tool for computer simulations in engineering and scientific domains. However, it often struggles with high computational costs and low prediction accuracy when the simulation involves too many input variables. For some simulation models, the outputs may only be significantly influenced by a small subset of the input variables, referr…

    Submitted 23 February, 2025; originally announced February 2025.

    Comments: 30 pages, 7 tables, 8 figures

    ACM Class: G.3

  10. arXiv:2501.07953 [pdf, other]

    cs.CV eess.IV

    Robust Hyperspectral Image Panshapring via Sparse Spatial-Spectral Representation

    Authors: Chia-Ming Lee, Yu-Fan Lin, Li-Wei Kang, Chih-Chung Hsu

    Abstract: High-resolution hyperspectral imaging plays a crucial role in various remote sensing applications, yet its acquisition often faces fundamental limitations due to hardware constraints. This paper introduces S³RNet, a novel framework for hyperspectral image pansharpening that effectively combines low-resolution hyperspectral images (LRHSI) with high-resolution multispectral images (HRMSI) throu…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: Submitted to IGARSS 2025

  11. arXiv:2501.04665 [pdf, other]

    eess.IV cs.CV

    HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion

    Authors: Chia-Ming Lee, Yu-Fan Lin, Yu-Hao Ho, Li-Wei Kang, Chih-Chung Hsu

    Abstract: Hyperspectral image (HSI) fusion addresses the challenge of reconstructing High-Resolution HSIs (HR-HSIs) from High-Resolution Multispectral images (HR-MSIs) and Low-Resolution HSIs (LR-HSIs), a critical task given the high costs and hardware limitations associated with acquiring high-quality HSIs. While existing methods leverage spatial and spectral relationships, they often suffer from limited r…

    Submitted 14 January, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

    Comments: Submitted to IGARSS 2025

  12. arXiv:2412.03092 [pdf, other]

    cs.CL cs.AI cs.LG

    Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

    Authors: Peiyan Zhang, Haibo Jin, Leyang Hu, Xinnuo Li, Liying Kang, Man Luo, Yangqiu Song, Haohan Wang

    Abstract: Recent advancements in large language models (LLMs) have significantly enhanced the ability of LLM-based systems to perform complex tasks through natural language processing and tool interaction. However, optimizing these LLM-based systems for specific tasks remains challenging, often requiring manual interventions like prompt engineering and hyperparameter tuning. Existing automatic optimization…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: 20 pages, 2 figures

    ACM Class: I.2.7; I.2.8

  13. arXiv:2411.03730 [pdf, other]

    cs.LG cs.CR cs.CV

    NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA

    Authors: Marlon Tobaben, Mohamed Ali Souibgui, Rubèn Tito, Khanh Nguyen, Raouf Kerkouche, Kangsoo Jung, Joonas Jälkö, Lei Kang, Andrey Barsky, Vincent Poulain d'Andecy, Aurélie Joseph, Aashiq Muhamed, Kevin Kuo, Virginia Smith, Yusuke Yamasaki, Takumi Fukami, Kenta Niwa, Iifan Tyou, Hiro Ishii, Rio Yokota, Ragul N, Rintu Kutum, Josep Llados, Ernest Valveny, Antti Honkela, et al. (2 additional authors not shown)

    Abstract: The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: 27 pages, 6 figures

  14. arXiv:2408.07259 [pdf, other]

    cs.CV cs.AI

    GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models

    Authors: Lei Kang, Fei Yang, Kai Wang, Mohamed Ali Souibgui, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Fonts are integral to creative endeavors, design processes, and artistic productions. The appropriate selection of a font can significantly enhance artwork and endow advertisements with a higher level of expressivity. Despite the availability of numerous diverse font designs online, traditional retrieval-based methods for font selection are increasingly being supplanted by generation-based approac…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted to ECAI2024

  15. arXiv:2406.19666 [pdf, other]

    cs.CV eess.IV

    CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion

    Authors: Chih-Chung Hsu, Chih-Chien Ni, Chia-Ming Lee, Li-Wei Kang

    Abstract: Hyperspectral imaging, capturing detailed spectral information for each pixel, is pivotal in diverse scientific and industrial applications. Yet, the acquisition of high-resolution (HR) hyperspectral images (HSIs) often needs to be addressed due to the hardware limitations of existing imaging systems. A prevalent workaround involves capturing both a high-resolution multispectral image (HR-MSI) and…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to TIP 2024

  16. GPT4Rec: Graph Prompt Tuning for Streaming Recommendation

    Authors: Peiyan Zhang, Yuchen Yan, Xi Zhang, Liying Kang, Chaozhuo Li, Feiran Huang, Senzhang Wang, Sunghun Kim

    Abstract: In the realm of personalized recommender systems, the challenge of adapting to evolving user preferences and the continuous influx of new users and items is paramount. Conventional models, typically reliant on a static training-test approach, struggle to keep pace with these dynamic demands. Streaming recommendation, particularly through continual graph learning, has emerged as a novel solution. H…

    Submitted 11 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by SIGIR 2024

    ACM Class: H.3.3

  17. arXiv:2405.18724 [pdf, other]

    q-bio.QM cs.AI cs.LG

    Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction

    Authors: Linjia Kang, Songhua Zhou, Shuyan Fang, Shichao Liu

    Abstract: Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook that real-world molecules typically exhibit multiple property labels with complex correlations. To this end, we propose a novel framework, HiPM, which stands for hierarchical prompted molecular representation learning framework. HiPM leverages task-aware prompts to enhance the differential…

    Submitted 11 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  18. arXiv:2405.12915 [pdf, other]

    cs.CL

    G-DIG: Towards Gradient-based Diverse and High-quality Instruction Data Selection for Machine Translation

    Authors: Xingyuan Pan, Luyang Huang, Liyan Kang, Zhicheng Liu, Yu Lu, Shanbo Cheng

    Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities in general scenarios. Instruction finetuning empowers them to align with humans in various tasks. Nevertheless, the Diversity and Quality of the instruction data remain two main challenges for instruction finetuning. With regard to this, in this paper, we propose a novel gradient-based method to automatically select high-quality a…

    Submitted 7 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 main conference

  19. arXiv:2405.12684 [pdf, other]

    stat.ML cs.LG

    Model Free Prediction with Uncertainty Assessment

    Authors: Yuling Jiao, Lican Kang, Jin Liu, Heng Peng, Heng Zuo

    Abstract: Deep nonparametric regression, characterized by the utilization of deep neural networks to learn target functions, has emerged as a focus of research attention in recent years. Despite considerable progress in understanding convergence rates, the absence of asymptotic properties hinders rigorous statistical inference. To address this gap, we propose a novel framework that transforms the deep estim…

    Submitted 31 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  20. arXiv:2404.19031 [pdf, other]

    cs.CV cs.AI

    Machine Unlearning for Document Classification

    Authors: Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted to ICDAR2024

  21. arXiv:2404.19024 [pdf, other]

    cs.CV

    Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism

    Authors: Lei Kang, Rubèn Tito, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Documents are 2-dimensional carriers of written communication, and as such their interpretation requires a multi-modal approach where textual and visual information are efficiently combined. Document Visual Question Answering (Document VQA), due to this multi-modal nature, has garnered significant interest from both the document understanding and natural language processing communities. The state-…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted to ICDAR2024

  22. arXiv:2404.13309 [pdf, ps, other]

    stat.ML cs.LG

    Latent Schrödinger Bridge Diffusion Model for Generative Learning

    Authors: Yuling Jiao, Lican Kang, Huazhen Lin, Jin Liu, Heng Zuo

    Abstract: This paper aims to conduct a comprehensive theoretical analysis of current diffusion models. We introduce a novel generative learning methodology utilizing the Schrödinger bridge diffusion model in latent space as the framework for theoretical exploration in this domain. Our approach commences with the pre-training of an encoder-decoder architecture using data originating from a distribution tha…

    Submitted 22 December, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  23. arXiv:2404.11041 [pdf, other]

    cs.AI cs.LG

    On the Empirical Complexity of Reasoning and Planning in LLMs

    Authors: Liwei Kang, Zirui Zhao, David Hsu, Wee Sun Lee

    Abstract: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs), but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimente…

    Submitted 17 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  24. arXiv:2404.02754 [pdf, ps, other]

    cs.LG

    Continual Learning of Numerous Tasks from Long-tail Distributions

    Authors: Liwei Kang, Wee Sun Lee

    Abstract: Continual learning, an important aspect of artificial intelligence and machine learning research, focuses on developing models that learn and adapt to new tasks while retaining previously acquired knowledge. Existing continual learning algorithms usually involve a small number of tasks with uniform sizes and may not accurately represent real-world learning scenarios. In this paper, we investigate…

    Submitted 3 April, 2024; originally announced April 2024.

  25. arXiv:2403.14718 [pdf, other]

    cs.LG cs.DC

    FedSR: A Semi-Decentralized Federated Learning Algorithm for Non-IIDness in IoT System

    Authors: Jianjun Huang, Lixin Ye, Li Kang

    Abstract: In the Industrial Internet of Things (IoT), a large amount of data will be generated every day. Due to privacy and security issues, it is difficult to collect all these data together to train deep learning models, thus the federated learning, a distributed machine learning paradigm that protects data privacy, has been widely used in IoT. However, in practical federated learning, the data distribut…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 11 pages, 10 figures

  26. arXiv:2402.16240 [pdf, other]

    cs.IR cs.SI

    High-Frequency-aware Hierarchical Contrastive Selective Coding for Representation Learning on Text-attributed Graphs

    Authors: Peiyan Zhang, Chaozhuo Li, Liying Kang, Feiran Huang, Senzhang Wang, Xing Xie, Sunghun Kim

    Abstract: We investigate node representation learning on text-attributed graphs (TAGs), where nodes are associated with text information. Although recent studies on graph neural networks (GNNs) and pretrained language models (PLMs) have exhibited their power in encoding network and text signals, respectively, less attention has been paid to delicately coupling these two types of models on TAGs. Specifically…

    Submitted 19 April, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW 2024

    ACM Class: H.3.3

  27. arXiv:2402.15561 [pdf, other]

    cs.LG cs.CY

    Fair Multivariate Adaptive Regression Splines for Ensuring Equity and Transparency

    Authors: Parian Haghighat, Denisa Gándara, Lulu Kang, Hadis Anahideh

    Abstract: Predictive analytics is widely used in various domains, including education, to inform decision-making and improve outcomes. However, many predictive models are proprietary and inaccessible for evaluation or modification by researchers and practitioners, limiting their accountability and ethical design. Moreover, predictive models are often opaque and incomprehensible to the officials who use them…

    Submitted 23 February, 2024; originally announced February 2024.

    Journal ref: The 38th Annual AAAI Conference on Artificial Intelligence, 2024

  28. arXiv:2402.03526 [pdf, other]

    cs.CV

    nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model

    Authors: Haifan Gong, Luoyao Kang, Yitao Wang, Xiang Wan, Haofeng Li

    Abstract: In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with their local receptive fields, and Transformers have a heavy computational load when applied to high-dimens…

    Submitted 10 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Code is available at https://github.com/lhaof/nnMamba

  29. arXiv:2312.10108 [pdf, other]

    cs.CV cs.AI cs.LG

    Privacy-Aware Document Visual Question Answering

    Authors: Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Joonas Jälkö, Vincent Poulain D'Andecy, Aurelie Joseph, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

    Abstract: Document Visual Question Answering (DocVQA) has quickly grown into a central task of document understanding. But despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees. In this work, we explore privacy in the domain of DocVQA for the first time, highlighting privacy issues in state of the art multi-modal LLM…

    Submitted 2 September, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 35 pages, 12 figures, accepted for publication at the 18th International Conference on Document Analysis and Recognition, ICDAR 2024

  30. arXiv:2310.14158 [pdf, other]

    cs.CV

    Visual-Attribute Prompt Learning for Progressive Mild Cognitive Impairment Prediction

    Authors: Luoyao Kang, Haifan Gong, Xiang Wan, Haofeng Li

    Abstract: Deep learning (DL) has been used in the automatic diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD) with brain imaging data. However, previous methods have not fully exploited the relation between brain image and clinical information that is widely adopted by experts in practice. To exploit the heterogeneous features from imaging and tabular data simultaneously, we propose…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: MICCAI 2023, released code: https://github.com/lhaof/VAPL

  31. arXiv:2310.07109 [pdf, other]

    cs.SE

    SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning

    Authors: Xueqi Yang, Mariusz Jakubowski, Li Kang, Haojie Yu, Tim Menzies

    Abstract: As software projects rapidly evolve, software artifacts become more complex and defects behind get harder to identify. The emerging Transformer-based approaches, though achieving remarkable performance, struggle with long code sequences due to their self-attention mechanism, which scales quadratically with the sequence length. This paper introduces SparseCoder, an innovative approach incorporating…

    Submitted 11 September, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: 34 pages, 9 figures, pre-print

  32. arXiv:2306.11465 [pdf]

    cs.RO cs.AI cs.LG eess.SY

    Safe, Efficient, Comfort, and Energy-saving Automated Driving through Roundabout Based on Deep Reinforcement Learning

    Authors: Henan Yuan, Penghui Li, Bart van Arem, Liujiang Kang, Yongqi Dong

    Abstract: Traffic scenarios in roundabouts pose substantial complexity for automated driving. Manually mapping all possible scenarios into a state space is labor-intensive and challenging. Deep reinforcement learning (DRL) with its ability to learn from interacting with the environment emerges as a promising solution for training such automated driving models. This study explores, employs, and implements va…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: 6 pages, 3 figures, under review by the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

  33. arXiv:2305.18326 [pdf, other]

    cs.CV cs.AI

    BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

    Authors: Liyan Kang, Luyang Huang, Ningxin Peng, Peihao Zhu, Zewei Sun, Shanbo Cheng, Mingxuan Wang, Degen Huang, Jinsong Su

    Abstract: We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of multi-modality machine translation. Compared with the widely used How2 and VaTeX datasets, BigVideo is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of videos. We also introduce two deliberately designed test sets to verify the necessity of visual information: Amb…

    Submitted 3 July, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Findings

  34. arXiv:2305.06273 [pdf, other]

    cs.CL cs.SD eess.AS

    Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup

    Authors: Lei Kang, Lichao Zhang, Dazhi Jiang

    Abstract: Speech Emotion Recognition (SER) is to recognize human emotions in a natural verbal interaction scenario with machines, which is considered as a challenging problem due to the ambiguous human emotions. Despite the recent progress in SER, state-of-the-art models struggle to achieve a satisfactory performance. We propose a self-attention based method with combined use of label-adaptive mixup and cen…

    Submitted 7 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2023

  35. arXiv:2304.11446 [pdf, other]

    cs.CV cs.AI

    Fast Diffusion Probabilistic Model Sampling through the lens of Backward Error Analysis

    Authors: Yansong Gao, Zhihong Pan, Xin Zhou, Le Kang, Pratik Chaudhari

    Abstract: Denoising diffusion probabilistic models (DDPMs) are a class of powerful generative models. The past few years have witnessed the great success of DDPMs in generating high-fidelity samples. A significant limitation of the DDPMs is the slow sampling procedure. DDPMs generally need hundreds or thousands of sequential function evaluations (steps) of neural networks to generate a sample. This paper ai…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: text overlap with arXiv:2101.12176 by other authors

  36. arXiv:2303.06747 [pdf, other]

    cs.CV eess.IV

    Raising The Limit Of Image Rescaling Using Auxiliary Encoding

    Authors: Chenzhong Yin, Zhihong Pan, Xin Zhou, Le Kang, Paul Bogdan

    Abstract: Normalizing flow models using invertible neural networks (INN) have been widely investigated for successful generative image super-resolution (SR) by learning the transformation between the normal distribution of latent variable $z$ and the conditional distribution of high-resolution (HR) images given a low-resolution (LR) input. Recently, image rescaling models like IRN utilize the bidirectional n…

    Submitted 12 March, 2023; originally announced March 2023.

  37. SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering

    Authors: Mingchen Li, Liqi Kang, Yi Xiong, Yu Guang Wang, Guisheng Fan, Pan Tan, Liang Hong

    Abstract: Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model to predict the fitness for protein mutants by leveraging both sequence and structure information, and exploiting attention mech…

    Submitted 28 December, 2022; originally announced January 2023.

    Journal ref: Journal of Cheminformatics (2023) 15:12

  38. arXiv:2210.12315 [pdf, other]

    cs.CV

    Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model

    Authors: Zhiyuan Ren, Zhihong Pan, Xin Zhou, Le Kang

    Abstract: We propose a simple and novel method for generating 3D human motion from complex natural language sentences, which describe different velocity, direction and composition of all kinds of actions. Different from existing methods that use classical generative architecture, we apply the Denoising Diffusion Probabilistic Model to this task, synthesizing diverse motion results under the guidance of text…

    Submitted 14 April, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted by ICASSP 2023

  39. SoccerNet 2022 Challenges Results

    Authors: Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, et al. (69 additional authors not shown)

    Abstract: The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on det…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at ACM MMSports 2022

  40. arXiv:2207.11455 [pdf, other]

    cs.CV

    UC-OWOD: Unknown-Classified Open World Object Detection

    Authors: Zhiheng Wu, Yue Lu, Xingyu Chen, Zhengxing Wu, Liwen Kang, Junzhi Yu

    Abstract: Open World Object Detection (OWOD) is a challenging computer vision problem that requires detecting unknown objects and gradually learning the identified unknown classes. However, it cannot distinguish unknown instances as multiple unknown classes. In this work, we propose a novel OWOD problem called Unknown-Classified Open World Object Detection (UC-OWOD). UC-OWOD aims to detect unknown instances…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  41. arXiv:2205.06361 [pdf, other]

    cs.CR cs.AR

    Building A Trusted Execution Environment for In-Storage Computing

    Authors: Yuqi Xue, Luyi Kang, Weiwei Jia, Xiaohao Wang, Jongryool Kim, Changhwan Youn, Myeong Joon Kang, Hyung Jin Lim, Bruce Jacob, Jian Huang

    Abstract: In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviating the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them consider security as the priority for in-storage computing. Specifically, since modern SSD controller…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: Extended abstract for IceClave. arXiv admin note: substantial text overlap with arXiv:2109.03373

  42. arXiv:2204.11351  [pdf, other

    cs.LG cs.AI

    An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models

    Authors: Han Yuan, Mingxuan Liu, Lican Kang, Chenkui Miao, Ying Wu

    Abstract: Nowadays, the interpretation of why a machine learning (ML) model makes certain inferences is as crucial as the accuracy of such inferences. Some ML models like the decision tree possess inherent interpretability that can be directly comprehended by humans. Others like artificial neural networks (ANN), however, rely on external methods to uncover the deduction mechanism. SHapley Additive exPlanati…

    Submitted 9 April, 2023; v1 submitted 24 April, 2022; originally announced April 2022.
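    To make the role of background data in SHAP concrete: a minimal sketch, assuming an interventional value function in which features outside a coalition are filled in from each background row and the model outputs are averaged. It enumerates all coalitions exactly, so it is only practical for a handful of features, and it is neither the SHAP library's implementation nor the authors' experimental setup; `shapley_values` and `linear_model` are hypothetical names used for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   background: Sequence[Sequence[float]]) -> List[float]:
    """Exact Shapley values of model at x, using a background dataset to
    fill in the features that are absent from a coalition."""
    n = len(x)

    def coalition_value(coalition: FrozenSet[int]) -> float:
        # v(S): mean model output when features in S are taken from x and
        # all other features are taken from each background row in turn.
        total = 0.0
        for row in background:
            z = [x[j] if j in coalition else row[j] for j in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (coalition_value(s | {i}) - coalition_value(s))
        phi.append(contribution)
    return phi


def linear_model(z: Sequence[float]) -> float:
    # Hypothetical model used only for illustration.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    # Two different background sets give two different attributions for the same x.
    print(shapley_values(linear_model, x, [[0.0, 0.0, 0.0], [0.0, 2.0, 2.0]]))
    print(shapley_values(linear_model, x, [[0.0, 0.0, 0.0]]))
```

    For a linear model the value reduces to w_i (x_i minus the background mean of feature i), so the two background sets in the demo give [2, -1, 1] and [2, -2, 1.5] for the same input: a small-scale illustration of why attributions can shift with the choice and size of the background set.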

  43. SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos

    Authors: Anthony Cioppa, Silvio Giancola, Adrien Deliege, Le Kang, Xin Zhou, Zhiyu Cheng, Bernard Ghanem, Marc Van Droogenbroeck

    Abstract: Tracking objects in soccer videos is extremely important for gathering both player and team statistics, whether it is to estimate the total distance run, the ball possession or the team formation. Video processing can help automate the extraction of this information without the need for any invasive sensor, making it applicable to any team in any stadium. Yet, the availability of datasets to train lear…

    Submitted 20 April, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: Paper accepted for the CVsports workshop at CVPR 2022. This document contains 8 pages + references

  44. Content and Style Aware Generation of Text-line Images for Handwriting Recognition

    Authors: Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas

    Abstract: Handwritten Text Recognition has achieved impressive performance in public benchmarks. However, due to the high inter- and intra-class variability between handwriting styles, such recognizers need to be trained using huge volumes of manually labeled training data. To alleviate this labor-consuming problem, synthetic data produced with TrueType fonts has often been used in the training loop to g…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted to TPAMI

  45. arXiv:2203.15187  [pdf, other

    cs.CV

    ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

    Authors: Bo He, Xitong Yang, Le Kang, Zhiyu Cheng, Xin Zhou, Abhinav Shrivastava

    Abstract: Weakly-supervised temporal action localization aims to recognize and localize action segments in untrimmed videos given only video-level action labels for training. Without the boundary information of action segments, existing methods mostly rely on multiple instance learning (MIL), where the predictions of unlabeled instances (i.e., video snippets) are supervised by classifying labeled bags (i.e.…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022
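    The MIL setup described in the ASM-Loc abstract, where snippet predictions are supervised only through a video-level label, can be sketched with top-k pooling, a common generic choice in weakly-supervised localization. This is an assumption for illustration, not ASM-Loc's action-aware segment modeling; `topk_mil_pool`, `bce_with_logits`, and the value of `k` are hypothetical.

```python
import math
from typing import List, Sequence


def topk_mil_pool(snippet_logits: Sequence[Sequence[float]], k: int) -> List[float]:
    """Pool a T x C matrix of snippet-level class logits into C video-level
    logits by averaging the k largest snippet logits of each class."""
    num_snippets = len(snippet_logits)
    num_classes = len(snippet_logits[0])
    k = max(1, min(k, num_snippets))
    pooled = []
    for c in range(num_classes):
        column = sorted((row[c] for row in snippet_logits), reverse=True)
        pooled.append(sum(column[:k]) / k)
    return pooled


def bce_with_logits(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy between video-level logits and 0/1 labels,
    written in the numerically stable max(z, 0) - z*y + log(1 + e^{-|z|}) form."""
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)


if __name__ == "__main__":
    # 3 snippets x 2 classes; only the video-level label [1, 0] is known.
    logits = [[1.0, 0.0], [3.0, 2.0], [2.0, 5.0]]
    video_logits = topk_mil_pool(logits, k=2)  # [2.5, 3.5]
    print(video_logits, bce_with_logits(video_logits, [1, 0]))
```

    The bag (video) is classified from its most confident instances (snippets), so gradients from the video-level loss reach only the top-k snippets of each class; this is the weak supervision signal that methods built on MIL refine.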

  46. arXiv:2111.10806  [pdf, other

    stat.ML cs.LG stat.CO

    A Data-Driven Line Search Rule for Support Recovery in High-dimensional Data Analysis

    Authors: Peili Li, Yuling Jiao, Xiliang Lu, Lican Kang

    Abstract: In this work, we consider algorithms for (nonlinear) regression problems with an $\ell_0$ penalty. Existing algorithms for $\ell_0$-based optimization problems are often run with a fixed step size, and selecting an appropriate step size depends on the restricted strong convexity and smoothness of the loss function, which are difficult to compute in practice. In…

    Submitted 21 November, 2021; originally announced November 2021.
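    The step-size issue raised in this abstract can be illustrated with iterative hard thresholding (IHT) for $\ell_0$-constrained least squares, where the step is found by backtracking on a quadratic upper bound rather than fixed from restricted strong convexity/smoothness constants. This is a generic backtracking rule swapped in for illustration, not the data-driven line search proposed in the paper; it uses a sparsity level `s` in place of a penalty weight, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def loss_and_grad(X: Matrix, y: Sequence[float], beta: Sequence[float]) -> Tuple[float, List[float]]:
    """Least-squares loss (1/2n)||X beta - y||^2 and its gradient."""
    n, p = len(y), len(beta)
    residual = [sum(xij * bj for xij, bj in zip(row, beta)) - yi for row, yi in zip(X, y)]
    loss = sum(r * r for r in residual) / (2 * n)
    grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(p)]
    return loss, grad


def hard_threshold(v: Sequence[float], s: int) -> List[float]:
    """Keep the s entries of largest magnitude and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def iht_backtracking(X: Matrix, y: Sequence[float], s: int, step0: float = 1.0,
                     shrink: float = 0.5, max_iter: int = 500, tol: float = 1e-12) -> List[float]:
    """IHT where each step size is found by backtracking instead of being
    fixed in advance from (unknown) convexity/smoothness constants."""
    beta = [0.0] * len(X[0])
    loss, grad = loss_and_grad(X, y, beta)
    for _ in range(max_iter):
        step = step0
        while True:
            cand = hard_threshold([b - step * g for b, g in zip(beta, grad)], s)
            diff = [c - b for c, b in zip(cand, beta)]
            cand_loss, cand_grad = loss_and_grad(X, y, cand)
            # Accept the step once the quadratic upper bound with curvature 1/step holds.
            bound = loss + sum(g * d for g, d in zip(grad, diff)) + sum(d * d for d in diff) / (2 * step)
            if cand_loss <= bound or step < 1e-12:
                break
            step *= shrink
        converged = loss - cand_loss < tol
        beta, loss, grad = cand, cand_loss, cand_grad
        if converged:
            break
    return beta
```

    For example, with `X` the 4×4 identity, `y = [3, 0, -2, 0.1]` and `s = 2`, the iterates approach `[3, 0, -2, 0]`; starting from an oversized `step0=100`, the inner loop simply halves the step until the bound holds, so no smoothness constant has to be supplied.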

  47. arXiv:2111.10722  [pdf, other

    stat.ML cs.LG stat.CO

    A Deterministic Sampling Method via Maximum Mean Discrepancy Flow with Adaptive Kernel

    Authors: Yindong Chen, Yiwei Wang, Lulu Kang, Chun Liu

    Abstract: We propose a novel deterministic sampling method to approximate a target distribution $\rho^*$ by minimizing the kernel discrepancy, also known as the Maximum Mean Discrepancy (MMD). By employing the general energetic variational inference framework (Wang et al., 2021), we convert the problem of minimizing MMD to solving a dynamic ODE system of the particles. We adopt the implicit Euler numeri…

    Submitted 11 March, 2025; v1 submitted 20 November, 2021; originally announced November 2021.

    Comments: 30 pages, 10 figures
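    As a rough illustration of minimizing MMD over particles: a minimal sketch in one dimension with a fixed-bandwidth Gaussian kernel `h` and explicit Euler steps. The paper instead uses the energetic variational inference framework, an implicit Euler scheme, and an adaptive kernel, so this is a simplified stand-in; `mmd_squared`, `mmd_flow_step`, `h`, and `tau` are hypothetical names.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: float, b: float, h: float) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * h * h))


def mmd_squared(xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Biased (V-statistic) estimate of MMD^2 between two 1-D samples."""
    n, m = len(xs), len(ys)
    kxx = sum(gaussian_kernel(a, b, h) for a in xs for b in xs) / (n * n)
    kxy = sum(gaussian_kernel(a, b, h) for a in xs for b in ys) / (n * m)
    kyy = sum(gaussian_kernel(a, b, h) for a in ys for b in ys) / (m * m)
    return kxx - 2.0 * kxy + kyy


def mmd_flow_step(xs: Sequence[float], ys: Sequence[float], h: float, tau: float) -> List[float]:
    """One explicit Euler step moving particles xs down the gradient of MMD^2."""
    n, m = len(xs), len(ys)
    new_xs = []
    for xi in xs:
        # Repulsion between particles keeps them spread out ...
        repel = sum((xi - xj) / (h * h) * gaussian_kernel(xi, xj, h) for xj in xs) / n
        # ... while attraction pulls them toward the target samples.
        attract = sum((xi - yj) / (h * h) * gaussian_kernel(xi, yj, h) for yj in ys) / m
        new_xs.append(xi + tau * (repel - attract))
    return new_xs
```

    Each step applies x_i ← x_i + τ v_i, where v_i equals −(n/2) times the gradient of the MMD² estimate with respect to x_i; for a small enough τ every step lowers the estimate, and the particle set stays deterministic because no random noise is injected.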

  48. IceClave: A Trusted Execution Environment for In-Storage Computing

    Authors: Luyi Kang, Yuqi Xue, Weiwei Jia, Xiaohao Wang, Jongryool Kim, Changhwan Youn, Myeong Joon Kang, Hyung Jin Lim, Bruce Jacob, Jian Huang

    Abstract: In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviating the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them treat in-storage security as a first-class citizen. Specifically, since modern SSD controllers do not h…

    Submitted 7 September, 2021; originally announced September 2021.

    Comments: 11 pages. Accepted to MICRO'21

  49. arXiv:2107.09728  [pdf, other

    cs.LG q-bio.QM

    Machine Learning Approaches to Automated Flow Cytometry Diagnosis of Chronic Lymphocytic Leukemia

    Authors: Akum S. Kang, Loveleen C. Kang, Stephen M. Mastorides, Philip R. Foulis, Lauren A. DeLand, Robert P. Seifert, Andrew A. Borkowski

    Abstract: Flow cytometry is a technique that measures multiple fluorescence and light scatter-associated parameters from individual cells as they flow in single file through an excitation light source. These cells are labeled with antibodies to detect various antigens, and the fluorescence signals reflect antigen expression. Interpretation of the multiparameter flow cytometry data is laborious, time-consuming…

    Submitted 22 July, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: 4 pp

  50. arXiv:2107.04766  [pdf, ps, other

    stat.CO cs.LG

    Convergence Analysis of Schrödinger-Föllmer Sampler without Convexity

    Authors: Yuling Jiao, Lican Kang, Yanyan Liu, Youzhou Zhou

    Abstract: The Schrödinger-Föllmer sampler (SFS) is a novel and efficient approach for sampling from possibly unnormalized distributions without ergodicity. SFS is based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion process $$\mathrm{d} X_{t}=-\nabla U\left(X_t, t\right) \mathrm{d} t+\mathrm{d} B_{t}, \quad t \in[0,1],\quad X_0=0$$ on the unit interval, which transports the degenerate dis…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: text overlap with arXiv:2106.10880
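    The Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion on the unit interval can be sketched as follows, assuming the drift −∇U(x, t) is available as a one-dimensional callable. How SFS constructs that drift from the (possibly unnormalized) target is not shown; `euler_maruyama_unit_interval`, `sample`, and `num_steps` are hypothetical names.

```python
import math
import random
from typing import Callable, List

Drift = Callable[[float, float], float]  # (x, t) -> -grad U(x, t), supplied by the caller


def euler_maruyama_unit_interval(drift: Drift, num_steps: int, rng: random.Random) -> float:
    """Simulate dX_t = drift(X_t, t) dt + dB_t on [0, 1] from X_0 = 0 and return X_1."""
    h = 1.0 / num_steps
    x = 0.0
    for k in range(num_steps):
        t_k = k * h
        # X_{k+1} = X_k + h * drift(X_k, t_k) + sqrt(h) * xi_k,  xi_k ~ N(0, 1)
        x += h * drift(x, t_k) + math.sqrt(h) * rng.gauss(0.0, 1.0)
    return x


def sample(drift: Drift, num_samples: int, num_steps: int = 100, seed: int = 0) -> List[float]:
    """Draw num_samples independent approximate samples of X_1."""
    rng = random.Random(seed)
    return [euler_maruyama_unit_interval(drift, num_steps, rng) for _ in range(num_samples)]


if __name__ == "__main__":
    # With zero drift, X_1 is Brownian motion at time 1, i.e. N(0, 1).
    xs = sample(lambda x, t: 0.0, num_samples=5000, num_steps=20)
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / len(xs)
    print(round(mean, 3), round(var, 3))
```

    With K steps of size h = 1/K and t_k = k h, the update is $X_{k+1} = X_k - h\nabla U(X_k, t_k) + \sqrt{h}\,\xi_k$ with $\xi_k \sim N(0, 1)$. The zero-drift demo is only a sanity check: the sample mean and variance of X_1 should be close to 0 and 1.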