
Showing 1–50 of 149 results for author: Yin, L

Searching in archive cs.
  1. arXiv:2505.10183  [pdf, other]

    cs.DC cs.AI

    KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems

    Authors: Jieke Lin, Wanyu Wang, Longxiang Yin, Yinhe Han

    Abstract: Embodied Artificial Intelligence (AI) systems, such as autonomous robots and intelligent vehicles, are increasingly reliant on diverse heterogeneous accelerators (e.g., GPGPUs, NPUs, FPGAs) to meet stringent real-time processing and energy-efficiency demands. However, the proliferation of vendor-specific proprietary communication libraries creates significant interoperability barriers, hindering s…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures. Jieke Lin and Wanyu Wang contributed equally to this work

  2. arXiv:2505.10013  [pdf, ps, other]

    cs.CL

    DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs

    Authors: Lake Yin, Fan Huang

    Abstract: As Large Language Models (LLMs) have risen in prominence over the past few years, there has been concern over the potential biases in LLMs inherited from the training data. Previous studies have examined how LLMs exhibit implicit bias, such as when response generation changes when different social contexts are introduced. We argue that this implicit bias is not only an ethical, but also a technica…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 7 pages, 1 figure

  3. arXiv:2505.02078  [pdf, ps, other]

    cs.CL cs.AI

    LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning

    Authors: Joy Lim Jia Yin, Daniel Zhang-Li, Jifan Yu, Haoxuan Li, Shangqing Tu, Yuanchun Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li, Bin Xu

    Abstract: Evaluating the quality of slide-based multimedia instruction is challenging. Existing methods like manual assessment, reference-based metrics, and large language model evaluators face limitations in scalability, context capture, or bias. In this paper, we introduce LecEval, an automated metric grounded in Mayer's Cognitive Theory of Multimedia Learning, to evaluate multimodal knowledge acquisition…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures

  4. arXiv:2504.17033  [pdf, ps, other]

    cs.DS

    Breaking the Sorting Barrier for Directed Single-Source Shortest Paths

    Authors: Ran Duan, Jiayi Mao, Xiao Mao, Xinkai Shu, Longhui Yin

    Abstract: We give a deterministic $O(m\log^{2/3}n)$-time algorithm for single-source shortest paths (SSSP) on directed graphs with real non-negative edge weights in the comparison-addition model. This is the first result to break the $O(m+n\log n)$ time bound of Dijkstra's algorithm on sparse graphs, showing that Dijkstra's algorithm is not optimal for SSSP.

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 17 pages
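
For context on the baseline named in this abstract, here is a minimal sketch of textbook Dijkstra in Python. It is not the algorithm from the paper. It uses the standard-library binary heap, so it runs in $O((m+n)\log n)$ rather than the $O(m+n\log n)$ bound quoted in the abstract, which needs a Fibonacci-heap-style priority queue. The function name `dijkstra` and the adjacency-dict input format are assumptions made for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source.

    graph is a hypothetical adjacency dict: node -> list of (neighbor, weight),
    with all weights non-negative.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


example = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0)], "b": []}
print(dijkstra(example, "s"))
```

Each vertex is extracted from the heap in order of distance, which is the sorting step the paper's title refers to.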

  5. arXiv:2504.07554  [pdf, other]

    cs.RO

    Efficient Swept Volume-Based Trajectory Generation for Arbitrary-Shaped Ground Robot Navigation

    Authors: Yisheng Li, Longji Yin, Yixi Cai, Jianheng Liu, Haotian Li, Fu Zhang

    Abstract: Navigating an arbitrary-shaped ground robot safely in cluttered environments remains a challenging problem. The existing trajectory planners that account for the robot's physical geometry severely suffer from the intractable runtime. To achieve both computational efficiency and Continuous Collision Avoidance (CCA) of arbitrary-shaped ground robot planning, we proposed a novel coarse-to-fine naviga…

    Submitted 10 April, 2025; originally announced April 2025.

  6. arXiv:2504.04422  [pdf, other]

    cs.CR cs.SE

    LeakGuard: Detecting Memory Leaks Accurately and Scalably

    Authors: Hongliang Liang, Luming Yin, Guohao Wu, Yuxiang Li, Qiuping Yi, Lei Wang

    Abstract: Memory leaks are prevalent in various real-world software projects, thereby leading to serious attacks like denial-of-service. Though prior methods for detecting memory leaks made significant advance, they often suffer from low accuracy and weak scalability for testing large and complex programs. In this paper we present LeakGuard, a memory leak detection tool which provides satisfactory balance o…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: 21 pages, 5 figures, conference paper on memory leak detection

  7. arXiv:2503.18282  [pdf, other]

    cs.CV

    TrackID3x3: A Dataset and Algorithm for Multi-Player Tracking with Identification and Pose Estimation in 3x3 Basketball Full-court Videos

    Authors: Kazuhiro Yamada, Li Yin, Qingrui Hu, Ning Ding, Shunsuke Iwashita, Jun Ichikawa, Kiwamu Kotani, Calvin Yeung, Keisuke Fujii

    Abstract: Multi-object tracking, player identification, and pose estimation are fundamental components of sports analytics, essential for analyzing player movements, performance, and tactical strategies. However, existing datasets and methodologies primarily target mainstream team sports such as soccer and conventional 5-on-5 basketball, often overlooking scenarios involving fixed-camera setups commonly use…

    Submitted 23 March, 2025; originally announced March 2025.

  8. arXiv:2503.14906  [pdf, other]

    eess.IV cs.CV

    FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis

    Authors: Yaofei Duan, Tao Tan, Zhiyuan Zhu, Yuhao Huang, Yuanji Zhang, Rui Gao, Patrick Cheong-Iao Pang, Xinru Gao, Guowei Tao, Xiang Cong, Zhou Li, Lianying Liang, Guangzhi He, Linliang Yin, Xuedong Deng, Xin Yang, Dong Ni

    Abstract: Fetal ultrasound (US) examinations require the acquisition of multiple planes, each providing unique diagnostic information to evaluate fetal development and screening for congenital anomalies. However, obtaining a comprehensive, multi-plane annotated fetal US dataset remains challenging, particularly for rare or complex anomalies owing to their low incidence and numerous subtypes. This poses diff…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 18 pages, 10 figures

  9. arXiv:2503.08308  [pdf, other]

    cs.AI

    Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework

    Authors: Zhuo Zhi, Chen Feng, Adam Daneshmend, Mine Orlu, Andreas Demosthenous, Lu Yin, Da Li, Ziquan Liu, Miguel R. D. Rodrigues

    Abstract: Multimodal large language models (MLLMs) show promise in tasks like visual question answering (VQA) but still face challenges in multimodal reasoning. Recent works adapt agentic frameworks or chain-of-thought (CoT) reasoning to improve performance. However, CoT-based multimodal reasoning often demands costly data annotation and fine-tuning, while agentic approaches relying on external tools risk i…

    Submitted 11 March, 2025; originally announced March 2025.

  10. arXiv:2502.17429  [pdf, other]

    cs.CV

    CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation

    Authors: Vishal Thengane, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Lu Yin, Xiatian Zhu, Salman Khan

    Abstract: While 3D instance segmentation has made significant progress, current methods struggle to address realistic scenarios where new categories emerge over time with natural class imbalance. This limitation stems from existing datasets, which typically feature few well-balanced classes. Although few datasets include unbalanced class annotations, they lack the diverse incremental scenarios necessary for…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: Code: https://github.com/vgthengane/CLIMB3D

  11. arXiv:2502.14888  [pdf, other]

    cs.CV cs.AI

    The Multi-Faceted Monosemanticity in Multimodal Representations

    Authors: Hanqi Yan, Xiangxiang Cui, Lu Yin, Paul Pu Liang, Yulan He, Yifei Wang

    Abstract: In this paper, we leverage recent advancements in feature monosemanticity to extract interpretable features from deep multimodal models, offering a data-driven understanding of modality gaps. Specifically, we investigate CLIP (Contrastive Language-Image Pretraining), a prominent visual-language representation model trained on extensive image-text pairs. Building upon interpretability tools develop…

    Submitted 16 February, 2025; originally announced February 2025.

  12. arXiv:2502.12594  [pdf, other]

    cs.CL

    PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery

    Authors: Bowei He, Lihao Yin, Hui-Ling Zhen, Xiaokun Zhang, Mingxuan Yuan, Chen Ma

    Abstract: Model pruning is an effective approach for compressing large language models. However, this process often leads to significant degradation of model capabilities. While post-training techniques such as instruction tuning are commonly employed to recover model performance, existing methods often overlook the uneven deterioration of model capabilities and incur high computational costs. Moreover, som…

    Submitted 18 February, 2025; originally announced February 2025.

  13. arXiv:2502.06892  [pdf, other]

    cs.LG cs.AI

    Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks

    Authors: Bowei He, Lihao Yin, Hui-Ling Zhen, Jianping Zhang, Lanqing Hong, Mingxuan Yuan, Chen Ma

    Abstract: The widespread deployment of pre-trained language models (PLMs) has exposed them to textual backdoor attacks, particularly those planted during the pre-training stage. These attacks pose significant risks to high-reliability applications, as they can stealthily affect multiple downstream tasks. While certifying robustness against such threats is crucial, existing defenses struggle with the high-di…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted by ICLR 2025

  14. arXiv:2502.05795  [pdf, other]

    cs.LG cs.AI

    The Curse of Depth in Large Language Models

    Authors: Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu

    Abstract: In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) where nearly half of the layers are less effective than expected. We first confirm the wide existence of this phenomenon across the most popular families of LLMs such as Llama, Mistral, DeepSeek, and Qwen. Our analysis, theoretically and em…

    Submitted 9 February, 2025; originally announced February 2025.

  15. arXiv:2501.18277  [pdf, other]

    cs.LG

    Sebra: Debiasing Through Self-Guided Bias Ranking

    Authors: Adarsh Kappiyath, Abhra Chaudhuri, Ajay Jaiswal, Ziquan Liu, Yunpeng Li, Xiatian Zhu, Lu Yin

    Abstract: Ranking samples by fine-grained estimates of spuriosity (the degree to which spurious cues are present) has recently been shown to significantly benefit bias mitigation, over the traditional binary biased-vs-unbiased partitioning of train sets. However, this spuriosity ranking comes with the requirement of human supervision. In this paper, we propose a debiasing framework based on our nov…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: Accepted to ICLR 2025

  16. arXiv:2501.16673  [pdf, other]

    cs.CL

    LLM-AutoDiff: Auto-Differentiate Any LLM Workflow

    Authors: Li Yin, Zhangyang Wang

    Abstract: Large Language Models (LLMs) have reshaped natural language processing, powering applications from multi-hop retrieval and question answering to autonomous agent workflows. Yet, prompt engineering -- the task of crafting textual inputs to effectively direct LLMs -- remains difficult and labor-intensive, particularly for complex pipelines that combine multiple LLM calls with functional operations l…

    Submitted 30 January, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  17. arXiv:2501.14312  [pdf, other]

    cs.DC cs.LG

    Locality-aware Fair Scheduling in LLM Serving

    Authors: Shiyi Cao, Yichuan Wang, Ziming Mao, Pin-Lun Hsu, Liangsheng Yin, Tian Xia, Dacheng Li, Shu Liu, Yineng Zhang, Yang Zhou, Ying Sheng, Joseph Gonzalez, Ion Stoica

    Abstract: Large language model (LLM) inference workload dominates a wide variety of modern AI applications, ranging from multi-turn conversation to document analysis. Balancing fairness and efficiency is critical for managing diverse client workloads with varying prefix patterns. Unfortunately, existing fair scheduling algorithms for LLM serving, such as Virtual Token Counter (VTC), fail to take prefix loca…

    Submitted 24 January, 2025; originally announced January 2025.

  18. UNet--: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections

    Authors: Lingxiao Yin, Wei Tao, Dongyue Zhao, Tadayuki Ito, Kinya Osa, Masami Kato, Tse-Wei Chen

    Abstract: U-Net models with encoder, decoder, and skip-connections components have demonstrated effectiveness in a variety of vision tasks. The skip-connections transmit fine-grained information from the encoder to the decoder. It is necessary to maintain the feature maps used by the skip-connections in memory before the decoding stage. Therefore, they are not friendly to devices with limited resource. In t…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 17 pages, 7 figures, accepted by ACCV2024

    Journal ref: Computer Vision - ACCV 2024, volume 15478, 185-201

  19. arXiv:2412.16202  [pdf, other]

    cs.CV cs.LG

    Aspect-Based Few-Shot Learning

    Authors: Tim van Engeland, Lu Yin, Vlado Menkovski

    Abstract: We generalize the formulation of few-shot learning by introducing the concept of an aspect. In the traditional formulation of few-shot learning, there is an underlying assumption that a single "true" label defines the content of each data point. This label serves as a basis for the comparison between the query object and the objects in the support set. However, when a human expert is asked to exec…

    Submitted 16 December, 2024; originally announced December 2024.

  20. arXiv:2412.13795  [pdf, other]

    cs.LG cs.AI

    Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

    Authors: Pengxiang Li, Lu Yin, Shiwei Liu

    Abstract: Large Language Models (LLMs) have achieved remarkable success, yet recent findings reveal that their deeper layers often contribute minimally and can be pruned without affecting overall performance. While some view this as an opportunity for model compression, we identify it as a training shortfall rooted in the widespread use of Pre-Layer Normalization (Pre-LN). We demonstrate that Pre-LN, common…

    Submitted 18 December, 2024; originally announced December 2024.

  21. arXiv:2412.06258  [pdf, other]

    cs.CV

    Enhanced Multi-Object Tracking Using Pose-based Virtual Markers in 3x3 Basketball

    Authors: Li Yin, Calvin Yeung, Qingrui Hu, Jun Ichikawa, Hirotsugu Azechi, Susumu Takahashi, Keisuke Fujii

    Abstract: Multi-object tracking (MOT) is crucial for various multi-agent analyses such as evaluating team sports tactics and player movements and performance. While pedestrian tracking has advanced with Tracking-by-Detection MOT, team sports like basketball pose unique challenges. These challenges include players' unpredictable movements, frequent close interactions, and visual similarities that complicate…

    Submitted 9 December, 2024; originally announced December 2024.

  22. arXiv:2412.00069  [pdf, other]

    cs.LG cs.CL

    Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning

    Authors: Mingyu Cao, Gen Li, Jie Ji, Jiaqi Zhang, Xiaolong Ma, Shiwei Liu, Lu Yin

    Abstract: Mixture-of-Experts (MoE) has garnered significant attention for its ability to scale up neural networks while utilizing the same or even fewer active parameters. However, MoE does not alleviate the massive memory requirements of networks, which limits their practicality in real-world applications, especially in the era of large language models (LLMs). While recent work explores the possibility of…

    Submitted 16 February, 2025; v1 submitted 25 November, 2024; originally announced December 2024.

  23. arXiv:2411.13545  [pdf, other]

    cs.CV

    Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning

    Authors: Andy Li, Aiden Durrant, Milan Markovic, Tianjin Huang, Souvik Kundu, Tianlong Chen, Lu Yin, Georgios Leontidis

    Abstract: Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks, crucial for deploying models on memory and power-constrained devices. While recent sparse learning methods have shown promising performance up to moderate sparsity levels such as 95% and 98%, accuracy quickly deteriorates when pushing sparsities to ext…

    Submitted 10 March, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: V3: moderate revisions and overall improvements, 12 pages, 6 figures, 5 tables, including supplementary material

  24. arXiv:2411.10609  [pdf, other]

    cs.CY cs.SI

    Labeled Datasets for Research on Information Operations

    Authors: Ozgur Can Seckin, Manita Pote, Alexander Nwala, Lake Yin, Luca Luceri, Alessandro Flammini, Filippo Menczer

    Abstract: Social media platforms have become a hub for political activities and discussions, democratizing participation in these endeavors. However, they have also become an incubator for manipulation campaigns, like information operations (IOs). Some social media platforms have released datasets related to such IOs originating from different countries. However, we lack comprehensive control data that can…

    Submitted 19 November, 2024; v1 submitted 15 November, 2024; originally announced November 2024.

    Comments: 7 pages, 2 figures, 1 table

  25. arXiv:2411.07711  [pdf, other]

    cs.LG cs.RO

    OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework

    Authors: Jiaxi Li, Lu Yin, Xilu Wang

    Abstract: The integration of Large Language Models (LLMs) into autonomous driving systems offers promising enhancements in environmental understanding and decision-making. However, the substantial computational demands of deploying LLMs locally on vehicles render this approach unfeasible for real-world automotive applications. To address this challenge, we introduce OWLed, the Outlier-Weighed Layerwise Prun…

    Submitted 27 November, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  26. arXiv:2411.06229  [pdf, other]

    cs.AI

    Multimodal Contrastive Learning of Urban Space Representations from POI Data

    Authors: Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Lu Yin, Junyuan Liu

    Abstract: Existing methods for learning urban space representations from Point-of-Interest (POI) data face several limitations, including issues with geographical delineation, inadequate spatial information modelling, underutilisation of POI semantic attributes, and computational inefficiencies. To address these issues, we propose CaLLiPer (Contrastive Language-Location Pre-training), a novel representation…

    Submitted 9 November, 2024; originally announced November 2024.

    Comments: 19 pages, 5 figures, 7 tables

  27. arXiv:2411.02442  [pdf, other]

    cs.CL cs.AI cs.IR

    TODO: Enhancing LLM Alignment with Ternary Preferences

    Authors: Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang

    Abstract: Aligning large language models (LLMs) with human intent is critical for enhancing their performance across a variety of tasks. Standard alignment techniques, such as Direct Preference Optimization (DPO), often rely on the binary Bradley-Terry (BT) model, which can struggle to capture the complexities of human preferences -- particularly in the presence of noisy or inconsistent labels and frequent…

    Submitted 28 March, 2025; v1 submitted 2 November, 2024; originally announced November 2024.

    Comments: Accepted to ICLR 2025

  28. arXiv:2410.07771  [pdf, other]

    cs.SD cs.AI cs.CL cs.CV eess.AS

    Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models

    Authors: Adriana Fernandez-Lopez, Shiwei Liu, Lu Yin, Stavros Petridis, Maja Pantic

    Abstract: This paper investigates the under-explored area of low-rank weight training for large-scale Conformer-based speech recognition models from scratch. Our study demonstrates the viability of this training paradigm for such models, yielding several notable findings. Firstly, we discover that applying a low-rank structure exclusively to the attention modules can unexpectedly enhance performance, even w…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Submitted to ICASSP 2025

  29. arXiv:2410.07461  [pdf, other]

    cs.CL

    Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning

    Authors: Abhinav Bandari, Lu Yin, Cheng-Yu Hsieh, Ajay Kumar Jaiswal, Tianlong Chen, Li Shen, Ranjay Krishna, Shiwei Liu

    Abstract: Network pruning has emerged as a potential solution to make LLMs cheaper to deploy. However, existing LLM pruning approaches universally rely on the C4 dataset as the calibration data for calculating pruning scores, leaving its optimality unexplored. In this study, we evaluate the choice of calibration data on LLM pruning, across a wide range of datasets that are most commonly used in LLM training…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024

  30. arXiv:2410.05970  [pdf, other]

    cs.CV cs.AI cs.CL

    PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

    Authors: Xudong Xie, Hao Yan, Liang Yin, Yang Liu, Jing Ding, Minghui Liao, Yuliang Liu, Wei Chen, Xiang Bai

    Abstract: Multimodal document understanding is a challenging task to process and comprehend large amounts of textual and visual information. Recent advances in Large Language Models (LLMs) have significantly improved the performance of this task. However, existing methods typically focus on either plain text or a limited number of document images, struggling to handle long PDF documents with interleaved tex…

    Submitted 20 January, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

  31. arXiv:2410.05733  [pdf, other]

    cs.LG cs.CR

    Private and Communication-Efficient Federated Learning based on Differentially Private Sketches

    Authors: Meifan Zhang, Zhanhong Xie, Lihua Yin

    Abstract: Federated learning (FL) faces two primary challenges: the risk of privacy leakage due to parameter sharing and communication inefficiencies. To address these challenges, we propose DPSFL, a federated learning method that utilizes differentially private sketches. DPSFL compresses the local gradients of each client using a count sketch, thereby improving communication efficiency, while adding noise…

    Submitted 9 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.
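
To illustrate the compression step this abstract mentions, here is a minimal sketch of a plain count sketch applied to a gradient vector. It is not DPSFL itself: the noise-adding step is cut off in the abstract and is left out, and the names `make_hashes`, `compress`, and `estimate` are hypothetical. For simplicity the hash functions are stored as explicit random tables, where a practical implementation would typically use compact pairwise-independent hash functions.

```python
import random
from statistics import median


def make_hashes(dim, rows, width, seed=0):
    """Draw, for each sketch row, a bucket and a +/-1 sign for every coordinate."""
    rng = random.Random(seed)
    buckets = [[rng.randrange(width) for _ in range(dim)] for _ in range(rows)]
    signs = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(rows)]
    return buckets, signs


def compress(vec, buckets, signs, width):
    """Fold a length-dim vector into a rows x width table of signed sums."""
    table = [[0.0] * width for _ in buckets]
    for r, (row_buckets, row_signs) in enumerate(zip(buckets, signs)):
        for i, x in enumerate(vec):
            table[r][row_buckets[i]] += row_signs[i] * x
    return table


def estimate(table, buckets, signs, i):
    """Estimate coordinate i as the median of its signed bucket values."""
    return median(signs[r][i] * table[r][buckets[r][i]] for r in range(len(table)))


# A 1000-dimensional gradient is sent as a 5 x 50 table instead.
buckets, signs = make_hashes(dim=1000, rows=5, width=50)
gradient = [0.0] * 1000
gradient[7] = 3.0
sketch = compress(gradient, buckets, signs, width=50)
print(estimate(sketch, buckets, signs, 7))
```

The table is a linear function of the input, so sketches built with the same hashes can be summed entry by entry to obtain the sketch of the summed vectors. Whether DPSFL aggregates client updates this way is not stated in the visible part of the abstract.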

  32. arXiv:2410.05465  [pdf, other]

    cs.AI cs.LG

    On the Expressive Power of Tree-Structured Probabilistic Circuits

    Authors: Lang Yin, Han Zhao

    Abstract: Probabilistic circuits (PCs) have emerged as a powerful framework to compactly represent probability distributions for efficient and exact probabilistic inference. It has been shown that PCs with a general directed acyclic graph (DAG) structure can be understood as a mixture of exponentially (in its height) many components, each of which is a product distribution over univariate marginals. However…

    Submitted 24 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: This paper was accepted to NeurIPS 2024. This version uses a more accurate terminology for a complexity class, and adds a preliminary paragraph on relevant complexity classes

  33. arXiv:2409.17798  [pdf, other]

    cs.RO

    Swarm-LIO2: Decentralized, Efficient LiDAR-inertial Odometry for UAV Swarms

    Authors: Fangcheng Zhu, Yunfan Ren, Longji Yin, Fanze Kong, Qingbo Liu, Ruize Xue, Wenyi Liu, Yixi Cai, Guozheng Lu, Haotian Li, Fu Zhang

    Abstract: Aerial swarm systems possess immense potential in various aspects, such as cooperative exploration, target tracking, search and rescue. Efficient, accurate self and mutual state estimation are the critical preconditions for completing these swarm tasks, which remain challenging research topics. This paper proposes Swarm-LIO2: a fully decentralized, plug-and-play, computationally efficient, and ban…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 23 Pages

  34. arXiv:2409.09196  [pdf, other]

    cs.CV cs.LG

    Are Sparse Neural Networks Better Hard Sample Learners?

    Authors: Qiao Xiao, Boqian Wu, Lu Yin, Christopher Neil Gadzinski, Tianjin Huang, Mykola Pechenizkiy, Decebal Constantin Mocanu

    Abstract: While deep learning has demonstrated impressive progress, it remains a daunting challenge to learn from hard samples as these samples are usually noisy and intricate. These hard samples play a crucial role in the optimal performance of deep neural networks. Most research on Sparse Neural Networks (SNNs) has focused on standard training data, leaving gaps in understanding their effectiveness on com…

    Submitted 27 December, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted at British Machine Vision Conference (BMVC 2024)

  35. arXiv:2409.07372  [pdf, other]

    cs.CL cs.AI cs.HC

    Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination

    Authors: Daniel Zhang-Li, Zheyuan Zhang, Jifan Yu, Joy Lim Jia Yin, Shangqing Tu, Linlu Gong, Haohua Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li

    Abstract: The vast pre-existing slides serve as rich and important materials to carry lecture knowledge. However, effectively leveraging lecture slides to serve students is difficult due to the multi-modal nature of slide content and the heterogeneous teaching actions. We study the problem of discovering effective designs that convert a slide into an interactive lecture. We develop Slide2Lecture, a tuning-f…

    Submitted 11 September, 2024; originally announced September 2024.

  36. arXiv:2409.00651  [pdf, other]

    nlin.CD cs.CY cs.LG q-bio.QM

    Adapting Physics-Informed Neural Networks for Bifurcation Detection in Ecological Migration Models

    Authors: Lujie Yin, Xing Lv

    Abstract: In this study, we explore the application of Physics-Informed Neural Networks (PINNs) to the analysis of bifurcation phenomena in ecological migration models. By integrating the fundamental principles of diffusion-advection-reaction equations with deep learning techniques, we address the complexities of species migration dynamics, particularly focusing on the detection and analysis of Hopf bifurca…

    Submitted 1 September, 2024; originally announced September 2024.

  37. arXiv:2408.07364  [pdf, other]

    cs.LG

    Robust Active Learning (RoAL): Countering Dynamic Adversaries in Active Learning with Elastic Weight Consolidation

    Authors: Ricky Maulana Fajri, Yulong Pei, Lu Yin, Mykola Pechenizkiy

    Abstract: Despite significant advancements in active learning and adversarial attacks, the intersection of these two fields remains underexplored, particularly in developing robust active learning frameworks against dynamic adversarial threats. The challenge of developing robust active learning frameworks under dynamic adversarial attacks is critical, as these attacks can lead to catastrophic forgetting wit…

    Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  38. arXiv:2407.11239  [pdf, other]

    cs.LG

    From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

    Authors: Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

    Abstract: Modern Large Language Models (LLMs) are composed of matrices with billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage. Being significantly large, such matrices can often be expressed in low-rank format with potential to relax resource requirements. Unlike prior works which focus on developing novel matrix decomposition algo…

    Submitted 15 July, 2024; originally announced July 2024.

  39. arXiv:2407.08296  [pdf, other]

    cs.LG

    Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

    Authors: Zhenyu Zhang, Ajay Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

    Abstract: Training Large Language Models (LLMs) is memory-intensive due to the large number of parameters and associated optimization states. GaLore, a recent method, reduces memory usage by projecting weight gradients into a low-rank subspace without compromising performance. However, GaLore relies on time-consuming Singular Value Decomposition (SVD) operations to identify the subspace, and the frequent su…

    Submitted 11 July, 2024; originally announced July 2024.

  40. arXiv:2406.18884  [pdf, other]

    cs.AI

    Sequential three-way group decision-making for double hierarchy hesitant fuzzy linguistic term set

    Authors: Nanfang Luo, Qinghua Zhang, Qin Xie, Yutai Wang, Longjun Yin, Guoyin Wang

    Abstract: Group decision-making (GDM) characterized by complexity and uncertainty is an essential part of various life scenarios. Most existing researches lack tools to fuse information quickly and interpret decision results for partially formed decisions. This limitation is particularly noticeable when there is a need to improve the efficiency of GDM. To address this issue, a novel multi-level sequential t…

    Submitted 27 June, 2024; originally announced June 2024.

  41. arXiv:2406.18373  [pdf, other]

    cs.CL cs.SD eess.AS

    Dynamic Data Pruning for Automatic Speech Recognition

    Authors: Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

    Abstract: The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  42. arXiv:2406.17614  [pdf, other]

    cs.CV cs.MM

    MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization

    Authors: Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Lu Yin, Qiao Xiao, Stavros Petridis, Shiwei Liu, Maja Pantic

    Abstract: Pre-trained models have been a foundational approach in speech recognition, albeit with associated additional costs. In this study, we propose a regularization technique that facilitates the training of visual and audio-visual speech recognition models (VSR and AVSR) from scratch. This approach, abbreviated as MSRS (Multimodal Speech Recognition from Scratch), introduces a sparse regulari…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  43. arXiv:2405.19850  [pdf, other]

    cs.AI

    Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models

    Authors: Yuxiao Luo, Zhongcai Cao, Xin Jin, Kang Liu, Ling Yin

    Abstract: Understanding human mobility patterns is essential for various applications, from urban planning to public safety. The individual trajectory such as mobile phone location data, while rich in spatio-temporal information, often lacks semantic detail, limiting its utility for in-depth mobility analysis. Existing methods can infer basic routine activity sequences from this data, lacking depth in under…

    Submitted 30 May, 2024; originally announced May 2024.

  44. arXiv:2405.18380  [pdf, other]

    cs.LG cs.AI cs.CL

    OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

    Authors: Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu

    Abstract: The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we p…

    Submitted 12 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  45. arXiv:2405.11419  [pdf, other]

    cs.DB cs.CR

    Sketches-based join size estimation under local differential privacy

    Authors: Meifan Zhang, Xin Liu, Lihua Yin

    Abstract: Join size estimation on sensitive data poses a risk of privacy leakage. Local differential privacy (LDP) is a solution to preserve privacy while collecting sensitive data, but it introduces significant noise when dealing with sensitive join attributes that have large domains. Employing probabilistic structures such as sketches is a way to handle large domains, but it leads to hash-collision errors…

    Submitted 18 May, 2024; originally announced May 2024.

  46. arXiv:2405.04781  [pdf, other]

    cs.CL

    CourseGPT-zh: an Educational Large Language Model Based on Knowledge Distillation Incorporating Prompt Optimization

    Authors: Zheyan Qu, Lu Yin, Zitong Yu, Wenbo Wang, Xing Zhang

    Abstract: Large language models (LLMs) have demonstrated astonishing capabilities in natural language processing (NLP) tasks, sparking interest in their application to professional domains with higher specialized requirements. However, restricted access to closed-source LLMs via APIs and the difficulty in collecting massive high-quality datasets pose obstacles to the development of large language models in…

    Submitted 7 May, 2024; originally announced May 2024.

  47. arXiv:2404.16522  [pdf, other]

    eess.IV cs.LG

    A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography

    Authors: Bo Peng, Xiaofeng Li, Xinyu Li, Zhenghan Wang, Hui Deng, Xiaoxian Luo, Lixue Yin, Hongmei Zhang

    Abstract: Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classif…

    Submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.03865  [pdf, other]

    cs.CL cs.LG

    FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

    Authors: Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

    Abstract: Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent achieving remarkable success in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges for autoregressive token-by-token generation. To mitigate computation overload incurred during generation, several early-exit and layer…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.01382

  49. arXiv:2402.14276  [pdf, other]

    eess.SP cs.IT

    Bispectrum Unbiasing for Dilation-Invariant Multi-reference Alignment

    Authors: Liping Yin, Anna Little, Matthew Hirn

    Abstract: Motivated by modern data applications such as cryo-electron microscopy, the goal of classic multi-reference alignment (MRA) is to recover an unknown signal $f: \mathbb{R} \to \mathbb{R}$ from many observations that have been randomly translated and corrupted by additive noise. We consider a generalization of classic MRA where signals are also corrupted by a random scale change, i.e. dilation. We p…

    Submitted 21 February, 2024; originally announced February 2024.

  50. arXiv:2402.11903  [pdf, other]

    cs.CL cs.AI

    DiLA: Enhancing LLM Tool Learning with Differential Logic Layer

    Authors: Yu Zhang, Hui-Ling Zhen, Zehua Pei, Yingzhao Lian, Lihao Yin, Mingxuan Yuan, Bei Yu

    Abstract: Considering the challenges faced by large language models (LLMs) in logical reasoning and planning, prior efforts have sought to augment LLMs with access to external solvers. While progress has been made on simple reasoning problems, solving classical constraint satisfaction problems, such as the Boolean Satisfiability Problem (SAT) and Graph Coloring Problem (GCP), remains difficult for off-the-s…

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.12295 by other authors