Showing 1–50 of 855 results for author: Li, A

Searching in archive cs.
  1. arXiv:2505.09388

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2505.07184

    cs.CL

    Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs

    Authors: Yifan Wei, Xiaoyan Yu, Tengfei Pan, Angsheng Li, Li Du

    Abstract: Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora, yet their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research, where high factual precision is required. While synthetic data provides a promising avenue for augmenting domain knowledge, existing methods frequently generate redundant sample…

    Submitted 11 May, 2025; originally announced May 2025.

  3. arXiv:2505.06630

    cs.CL cs.AI

    Dynamic Domain Information Modulation Algorithm for Multi-domain Sentiment Analysis

    Authors: Chunyi Yue, Ang Li

    Abstract: Multi-domain sentiment classification aims to mitigate the poor model performance caused by the scarcity of labeled data in a single domain by utilizing labeled data from various domains. A series of models that jointly train domain classifiers and sentiment classifiers have demonstrated their advantages, because domain classification helps generate necessary information for sentiment classification. I…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 17 pages, 5 figures, 3 tables

    ACM Class: I.2.7

  4. Economic Analysis and Optimization of Energy Storage Configuration for Park Power Systems Based on Random Forest and Genetic Algorithm

    Authors: Yanghui Song, Aoqi Li, Lilei Huo

    Abstract: This study aims to analyze the economic performance of various parks under different conditions, particularly focusing on the operational costs and power load balancing before and after the deployment of energy storage systems. Firstly, the economic performance of the parks without energy storage was analyzed using a random forest model. Taking Park A as an example, it was found that the cost had…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 8 pages, 8 figures, International Journal of New Developments in Engineering and Society ISSN 2522-3488 Vol. 8, Issue 4: 22-29

  5. arXiv:2505.04558

    cs.LG cs.AI

    Purity Law for Generalizable Neural TSP Solvers

    Authors: Wenzhao Liu, Haoran Li, Congying Han, Zicheng Zhang, Anqi Li, Tiande Guo

    Abstract: Achieving generalization in neural approaches across different scales and distributions remains a significant challenge for the Traveling Salesman Problem (TSP). A key obstacle is that neural networks often fail to learn robust principles for identifying universal patterns and deriving optimal solutions from diverse instances. In this paper, we first uncover Purity Law (PuLa), a fundamental struct…

    Submitted 10 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  6. arXiv:2505.01003

    cs.CV

    3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer

    Authors: Kamel Aouaidjia, Aofan Li, Wenhao Zhang, Chongsheng Zhang

    Abstract: Nowadays, Transformers and Graph Convolutional Networks (GCNs) are the prevailing techniques for 3D human pose estimation. However, Transformer-based methods either ignore the spatial neighborhood relationships between the joints when used for skeleton representations or disregard the local temporal patterns of the local joint movements in skeleton sequence modeling, while GCN-based methods often…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 16 pages, 9 figures, 7 tables

  7. arXiv:2504.20972

    cs.CL

    SetKE: Knowledge Editing for Knowledge Elements Overlap

    Authors: Yifan Wei, Xiaoyan Yu, Ran Song, Hao Peng, Angsheng Li

    Abstract: Large Language Models (LLMs) excel in tasks such as retrieval and question answering but require updates to incorporate new knowledge and reduce inaccuracies and hallucinations. Traditional updating methods, like fine-tuning and incremental learning, face challenges such as overfitting and high computational costs. Knowledge Editing (KE) provides a promising alternative but often overlooks the Kno…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: The CR version will be updated subsequently

    Journal ref: IJCAI 2025

  8. arXiv:2504.19507

    cs.IT

    From Freshness to Effectiveness: Goal-Oriented Sampling for Remote Decision Making

    Authors: Aimin Li, Shaohua Wu, Gary C. F. Lee, Sumei Sun

    Abstract: Data freshness, measured by Age of Information (AoI), is highly relevant in networked applications such as Vehicle to Everything (V2X), smart health systems, and Industrial Internet of Things (IIoT). Yet, freshness alone does not equate to informativeness. In decision-critical settings, some stale data may prove more valuable than fresh updates. To explore this nuance, we move beyond AoI-centric p…

    Submitted 5 May, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

    Comments: 35 pages. Submitted to the IEEE Transactions on Information Theory

  9. arXiv:2504.18451

    cs.LG

    Enhancing Strawberry Yield Forecasting with Backcasted IoT Sensor Data and Machine Learning

    Authors: Tewodros Alemu Ayall, Andy Li, Matthew Beddows, Milan Markovic, Georgios Leontidis

    Abstract: Due to rapid population growth globally, digitally-enabled agricultural sectors are crucial for sustainable food production and making informed decisions about resource management for farmers and various stakeholders. The deployment of Internet of Things (IoT) technologies that collect real-time observations of various environmental (e.g., temperature, humidity, etc.) and operational factors (e.g.…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 20 pages, 11 figures

  10. arXiv:2504.17116

    quant-ph cs.AR

    OneAdapt: Adaptive Compilation for Resource-Constrained Photonic One-Way Quantum Computing

    Authors: Hezi Zhang, Jixuan Ruan, Dean Tullsen, Yufei Ding, Ang Li, Travis S. Humble

    Abstract: Measurement-based quantum computing (MBQC), a.k.a. one-way quantum computing (1WQC), is a universal quantum computing model, which is particularly well-suited for photonic platforms. In this model, computation is driven by measurements on an entangled state, which serves as an intermediate representation (IR) between program and hardware. However, compilers on previous IRs lack the adaptability t…

    Submitted 23 April, 2025; originally announced April 2025.

  11. arXiv:2504.16601

    cs.CL cs.AI

    Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study

    Authors: Andy Li, Wei Zhou, Rashina Hoda, Chris Bain, Peter Poon

    Abstract: This study evaluates how well large language models (LLMs) and traditional machine translation (MT) tools translate medical consultation summaries from English into Arabic, Chinese, and Vietnamese. It assesses both patient-friendly and clinician-focused texts using standard automated metrics. Results showed that traditional MT tools generally performed better, especially for complex texts, while…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 8 pages, 2 tables and 1 Figure

  12. arXiv:2504.16520

    cs.CV q-bio.NC

    A Few-Shot Metric Learning Method with Dual-Channel Attention for Cross-Modal Same-Neuron Identification

    Authors: Wenwei Li, Liyi Cai, Wu Chen, Anan Li

    Abstract: In neuroscience research, achieving single-neuron matching across different imaging modalities is critical for understanding the relationship between neuronal structure and function. However, modality gaps and limited annotations present significant challenges. We propose a few-shot metric learning method with a dual-channel attention mechanism and a pretrained vision transformer to enable robust…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 23 pages, 9 figures, submitted to arXiv for public access

  13. arXiv:2504.15513

    cs.CV

    InstaRevive: One-Step Image Enhancement via Dynamic Score Matching

    Authors: Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen, Jie Zhou, Jiwen Lu

    Abstract: Image enhancement finds wide-ranging applications in real-world scenarios due to complex environments and the inherent limitations of imaging devices. Recent diffusion-based methods yield promising outcomes but necessitate prolonged and computationally intensive iterative sampling. In response, we propose InstaRevive, a straightforward yet powerful image enhancement framework that employs score-ba…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by ICLR 2025

  14. arXiv:2504.14470

    cs.CV

    Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis

    Authors: Jingjing Ren, Wenbo Li, Zhongdao Wang, Haoze Sun, Bangzhen Liu, Haoyu Chen, Jiaqi Xu, Aoxue Li, Shifeng Zhang, Bin Shao, Yong Guo, Lei Zhu

    Abstract: Demand for 2K video synthesis is rising with increasing consumer expectations for ultra-clear visuals. While diffusion transformers (DiTs) have demonstrated remarkable capabilities in high-quality video generation, scaling them to 2K resolution remains computationally prohibitive due to quadratic growth in memory and processing costs. In this work, we propose Turbo2K, an efficient and practical fr…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Webpage at https://jingjingrenabc.github.io/turbo2k/

  15. arXiv:2504.14391

    cs.CV

    How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?

    Authors: Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder, Angela Zhang, Ben Athiwaratkun, Shuaiwen Leon Song, David Ouyang, James Zou

    Abstract: Publicly available biomedical videos, such as those on YouTube, serve as valuable educational resources for medical students. Unlike standard machine learning datasets, these videos are designed for human learners, often mixing medical imagery with narration, explanatory diagrams, and contextual framing. In this work, we investigate whether such pedagogically rich, yet non-standardized and heterog…

    Submitted 19 April, 2025; originally announced April 2025.

  16. arXiv:2504.13914

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  17. arXiv:2504.12523

    cs.CL cs.AI cs.LG

    Memorization vs. Reasoning: Updating LLMs with New Knowledge

    Authors: Aochong Oliver Li, Tanya Goyal

    Abstract: Large language models (LLMs) encode vast amounts of pre-trained knowledge in their parameters, but updating them as real-world information evolves remains a challenge. Existing methodologies and benchmarks primarily target entity substitutions, failing to capture the full breadth of complex real-world dynamics. In this paper, we introduce Knowledge Update Playground (KUP), an automatic pipeline fo…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 9 pages, 3 figures

  18. arXiv:2504.12276

    cs.CV

    The Tenth NTIRE 2025 Image Denoising Challenge Report

    Authors: Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han, Hakjae Jeon, Jia Li, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Jingyu Ma, Zhijuan Huang, Huiyuan Fu, Hongyuan Yu, Boqi Zhang, Jiawei Shi, Heng Zhang, Huadong Ma, Deepak Kumar Tyagi, et al. (69 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent ad…

    Submitted 16 April, 2025; originally announced April 2025.

  19. arXiv:2504.10859

    cs.CG cs.RO

    A Sublinear Algorithm for Path Feasibility Among Rectangular Obstacles

    Authors: Alex Fan, Alicia Li, Arul Kolla, Jason Gonzalez

    Abstract: The problem of finding a path between two points while avoiding obstacles is critical in robotic path planning. We focus on the feasibility problem: determining whether such a path exists. We model the robot as a query-specific rectangular object capable of moving parallel to its sides. The obstacles are axis-aligned, rectangular, and may overlap. Most previous works only consider nondisjoint rect…

    Submitted 15 April, 2025; originally announced April 2025.

  20. arXiv:2504.10369

    cs.AR cs.AI cs.LG cs.PL

    SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning

    Authors: Yiting Wang, Wanghao Ye, Ping Guo, Yexiao He, Ziyao Wang, Yexiao He, Bowei Tian, Shwai He, Guoheng Sun, Zheyu Shen, Sihan Chen, Ankur Srivastava, Qingfu Zhang, Gang Qu, Ang Li

    Abstract: Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis. Manual rewriting, guided by synthesis feedback, can yield high-quality results but is time-consuming and error-prone. Most existing compiler-based approaches have difficulty handling complex design constraints. Large Language Model (LLM…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 16 pages, 8 figures, 7 tables. Under Review

  21. arXiv:2504.09058

    cs.AI cs.CL

    Towards Stepwise Domain Knowledge-Driven Reasoning Optimization and Reflection Improvement

    Authors: Chengyuan Liu, Shihang Wang, Lizhi Qing, Kaisong Song, Junjie Cao, Jun Lin, Ji Zhang, Ang Li, Kun Kuang, Fei Wu

    Abstract: Recently, stepwise supervision on Chain of Thoughts (CoTs) presents an enhancement on the logical reasoning tasks such as coding and math, with the help of Monte Carlo Tree Search (MCTS). However, its contribution to tasks requiring domain-specific expertise and knowledge remains unexplored. Motivated by the interest, we identify several potential challenges of vanilla MCTS within this context, an…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Under review

  22. arXiv:2504.08979

    cs.DB cs.HC

    A Formalism and Library for Database Visualization

    Authors: Eugene Wu, Xiang Yu Tuang, Antonio Li, Vareesh Bainwala

    Abstract: Existing data visualization formalisms are restricted to single-table inputs, which makes existing visualization grammars like Vega-lite or ggplot2 tedious to use, overly complex in their APIs, and unsound when visualizing multi-table data. This paper presents the first visualization formalism to support databases as input -- in other words, *database visualization*. A visualization specification is…

    Submitted 11 April, 2025; originally announced April 2025.

  23. arXiv:2504.07453

    cs.NE

    Probability Estimation and Scheduling Optimization for Battery Swap Stations via LRU-Enhanced Genetic Algorithm and Dual-Factor Decision System

    Authors: Anzhen Li, Shufan Qing, Xiaochang Li, Rui Mao, Mingchen Feng

    Abstract: To address the challenges of limited Battery Swap Stations datasets, high operational costs, and fluctuating user charging demand, this research proposes a probability estimation model based on charging pile data and constructs nine scenario-specific battery swap demand datasets. In addition, this study combines Least Recently Used strategy with Genetic Algorithm and incorporates a guided search m…

    Submitted 1 May, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: 14 pages, accepted by ICIC 2025 Oral

  24. arXiv:2504.07418

    cs.CV

    ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement

    Authors: Anning Hu, Ang Li, Xirui Jin, Danping Zou

    Abstract: We introduce ThermoStereoRT, a real-time thermal stereo matching method designed for all-weather conditions that recovers disparity from two rectified thermal stereo images, envisioning applications such as night-time drone surveillance or under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scal…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 7 pages, 6 figures, 4 tables. Accepted to IEEE ICRA 2025. This is the preprint version

    Journal ref: IEEE International Conference on Robotics and Automation (ICRA), 2025

  25. arXiv:2504.07048

    cs.CR cs.ET

    Context Switching for Secure Multi-programming of Near-Term Quantum Computers

    Authors: Avinash Kumar, Meng Wang, Chenxu Liu, Ang Li, Prashant J. Nair, Poulami Das

    Abstract: Multi-programming quantum computers improve device utilization and throughput. However, crosstalk from concurrent two-qubit CNOT gates poses security risks, compromising the fidelity and output of co-running victim programs. We design Zero Knowledge Tampering Attacks (ZKTAs), using which attackers can exploit crosstalk without knowledge of the hardware error profile. ZKTAs can alter victim program…

    Submitted 17 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  26. arXiv:2504.06939

    cs.SE

    FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks

    Authors: Dekun Dai, MingWei Liu, Anji Li, Jialun Cao, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng

    Abstract: Code repair is a fundamental task in software development, facilitating efficient bug resolution and software maintenance. Although large language models (LLMs) have demonstrated considerable potential in automated code repair, their ability to comprehend and effectively leverage diverse types of feedback remains insufficiently understood. To bridge this gap, we introduce FeedbackEval, a systemati…

    Submitted 9 April, 2025; originally announced April 2025.

  27. arXiv:2504.05536

    cs.DC cs.DB

    dpBento: Benchmarking DPUs for Data Processing

    Authors: Jiasheng Hu, Chihan Cui, Anna Li, Raahil Vora, Yuanfan Chen, Philip A. Bernstein, Jialin Li, Qizhen Zhang

    Abstract: Data processing units (DPUs, SoC-based SmartNICs) are emerging data center hardware that provide opportunities to address cloud data processing challenges. Their onboard compute, memory, network, and auxiliary storage can be leveraged to offload a variety of data processing tasks. Although recent work shows promising benefits of DPU offloading for specific operations, a comprehensive view of the i…

    Submitted 7 April, 2025; originally announced April 2025.

    ACM Class: H.2.4; C.2.4

  28. arXiv:2504.03932

    cs.CL

    YaleNLP @ PerAnsSumm 2025: Multi-Perspective Integration via Mixture-of-Agents for Enhanced Healthcare QA Summarization

    Authors: Dongsuk Jang, Alan Li, Arman Cohan

    Abstract: Automated summarization of healthcare community question-answering forums is challenging due to diverse perspectives presented across multiple user responses to each question. The PerAnsSumm Shared Task was therefore proposed to tackle this challenge by identifying perspectives from different answers and then generating a comprehensive answer to the question. In this study, we address the PerAnsSu…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Paper accepted at CL4HEALTH @ NAACL 2025: Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics

  29. arXiv:2504.02605

    cs.SE cs.AI cs.CL

    Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

    Authors: Daoguang Zan, Zhirong Huang, Wei Liu, Hanwu Chen, Linhao Zhang, Shulin Xin, Lu Chen, Qi Liu, Xiaojian Zhong, Aoyan Li, Siyao Liu, Yongsheng Xiao, Liangqiang Chen, Yuyu Zhang, Jing Su, Tianyu Liu, Rui Long, Kai Shen, Liang Xiang

    Abstract: The task of issue resolving is to modify a codebase to generate a patch that addresses a given issue. However, existing benchmarks, such as SWE-bench, focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across diverse software ecosystems. To address this, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering Jav…

    Submitted 3 April, 2025; originally announced April 2025.

  30. arXiv:2504.00906

    cs.AI cs.CL cs.CV cs.LG

    Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

    Authors: Saaket Agashe, Kyle Wong, Vincent Tu, Jiachen Yang, Ang Li, Xin Eric Wang

    Abstract: Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries. However, current agents face significant challenges: imprecise grounding of GUI elements, difficulties with long-horizon task planning, and performanc…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 18 pages, 13 figures, 8 tables

  31. arXiv:2504.00899

    cs.CY cs.AI cs.LG

    Role and Use of Race in AI/ML Models Related to Health

    Authors: Martin C. Were, Ang Li, Bradley A. Malin, Zhijun Yin, Joseph R. Coco, Benjamin X. Collins, Ellen Wright Clayton, Laurie L. Novak, Rachele Hendricks-Sturrup, Abiodun Oluyomi, Shilo Anders, Chao Yan

    Abstract: The role and use of race within health-related artificial intelligence and machine learning (AI/ML) models has sparked increasing attention and controversy. Despite the complexity and breadth of related issues, a robust and holistic framework to guide stakeholders in their examination and resolution remains lacking. This perspective provides a broad-based, systematic, and cross-cutting landscape a…

    Submitted 1 April, 2025; originally announced April 2025.

  32. arXiv:2504.00342

    cs.RO cs.LG

    Aligning Diffusion Model with Problem Constraints for Trajectory Optimization

    Authors: Anjian Li, Ryne Beeson

    Abstract: Diffusion models have recently emerged as effective generative frameworks for trajectory optimization, capable of producing high-quality and diverse solutions. However, training these models in a purely data-driven manner without explicit incorporation of constraint information often leads to violations of critical constraints, such as goal-reaching, collision avoidance, and adherence to system dy…

    Submitted 31 March, 2025; originally announced April 2025.

  33. arXiv:2503.23397

    cs.DB cs.DS cs.PF

    FB$^+$-tree: A Memory-Optimized B$^+$-tree with Latch-Free Update

    Authors: Yuan Chen, Ao Li, Wenhai Li, Lingfeng Deng

    Abstract: B$^+$-trees are prevalent in traditional database systems due to their versatility and balanced structure. While binary search is typically utilized for branch operations, it may lead to inefficient cache utilization in main-memory scenarios. In contrast, trie-based index structures drive branch operations through prefix matching. While these structures generally produce fewer cache misses and are…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 14 pages, 17 figures

  34. arXiv:2503.22720

    cs.LG cs.AI

    Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models

    Authors: Bowei Tian, Xuntao Lyu, Meng Liu, Hongyi Wang, Ang Li

    Abstract: Representation Engineering (RepE) has emerged as a powerful paradigm for enhancing AI transparency by focusing on high-level representations rather than individual neurons or circuits. It has proven effective in improving interpretability and control, showing that representations can emerge, propagate, and shape final model outputs in large language models (LLMs). However, in Vision-Language Model…

    Submitted 25 March, 2025; originally announced March 2025.

  35. arXiv:2503.20197

    cs.SE

    Enhancing the Robustness of LLM-Generated Code: Empirical Study and Framework

    Authors: Zike Li, Mingwei Liu, Anji Li, Kaifeng He, Yanlin Wang, Xin Peng, Zibin Zheng

    Abstract: Ensuring the robustness of code generated by large language models (LLMs) is crucial for real-world reliability. However, existing evaluations predominantly focus on correctness, often neglecting key robustness concerns such as missing input validation and insufficient error handling. In this paper, we present the first empirical study on the robustness of LLM-generated code. We introduce novel ro…

    Submitted 1 April, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: 10 pages

  36. arXiv:2503.18478

    cs.CV

    Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding

    Authors: Xiangrui Liu, Yan Shu, Zheng Liu, Ao Li, Yang Tian, Bo Zhao

    Abstract: Despite advanced token compression techniques, existing multimodal large language models (MLLMs) still struggle with hour-long video understanding. In this work, we propose Video-XL-Pro, an efficient method for extremely long video understanding, built upon Reconstructive Compression of Tokens (ReCoT), a learnable module that leverages self-supervised learning to generate comprehensive and compact…

    Submitted 27 April, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  37. arXiv:2503.16527

    cs.CL cs.AI cs.CY cs.SI

    LLM Generated Persona is a Promise with a Catch

    Authors: Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng

    Abstract: The use of large language models (LLMs) to simulate human behavior has gained significant attention, particularly through personas that approximate individual characteristics. Persona-based simulations hold promise for transforming disciplines that rely on population-level feedback, including social science, economic analysis, marketing research, and business operations. Traditional methods to col…

    Submitted 17 March, 2025; originally announced March 2025.

  38. arXiv:2503.16416

    cs.AI cs.CL cs.LG

    Survey on Evaluation of LLM-based Agents

    Authors: Asaf Yehudai, Lilach Eden, Alan Li, Guy Uziel, Yilun Zhao, Roy Bar-Haim, Arman Cohan, Michal Shmueli-Scheuer

    Abstract: The emergence of LLM-based agents represents a paradigm shift in AI, enabling autonomous systems to plan, reason, use tools, and maintain memory while interacting with dynamic environments. This paper provides the first comprehensive survey of evaluation methodologies for these increasingly capable agents. We systematically analyze evaluation benchmarks and frameworks across four critical dimensio…

    Submitted 20 March, 2025; originally announced March 2025.

  39. arXiv:2503.15475

    cs.CV

    Cube: A Roblox View of 3D Intelligence

    Authors: Foundation AI Team, Kiran Bhat, Nishchaie Khanna, Karun Channa, Tinghui Zhou, Yiheng Zhu, Xiaoxia Sun, Charles Shang, Anirudh Sudarshan, Maurice Chu, Daiqing Li, Kangle Deng, Jean-Philippe Fauconnier, Tijmen Verhulsdonck, Maneesh Agrawala, Kayvon Fatahalian, Alexander Weiss, Christian Reiser, Ravi Kiran Chirravuri, Ravali Kandur, Alejandro Pelaez, Akash Garg, Michael Palleschi, Jessica Wang, Skylar Litz, et al. (22 additional authors not shown)

    Abstract: Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities in the domains of text, images, audio and video. Our goal at Roblox is to build such a foundation model for 3D intelligence, a model that can support developers in producing all aspects of a Roblox experience, from generating 3D objects and scenes to rigging characters for animation…

    Submitted 14 April, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Our code and model weights can be found at: https://github.com/Roblox/cube

  40. arXiv:2503.14932

    cs.CR cs.DC cs.LG

    Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices

    Authors: Ziyao Wang, Yexiao He, Zheyu Shen, Yu Li, Guoheng Sun, Myungjin Lee, Ang Li

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable abilities in various natural language processing tasks. However, adapting these models to specialized domains using private datasets stored on resource-constrained edge devices, such as smartphones and personal computers, remains challenging due to significant privacy concerns and limited computational resources. Existing m…

    Submitted 19 March, 2025; originally announced March 2025.

  41. arXiv:2503.13873

    cs.IT

    Joint Transmission and Control in a Goal-oriented NOMA Network

    Authors: Kunpeng Liu, Shaohua Wu, Aimin Li, Qinyu Zhang

    Abstract: Goal-oriented communication shifts the focus from merely delivering timely information to maximizing decision-making effectiveness by prioritizing the transmission of high-value information. In this context, we introduce the Goal-oriented Tensor (GoT), a novel closed-loop metric designed to directly quantify the ultimate utility in Goal-oriented systems, capturing how effectively the transmitted i…

    Submitted 17 March, 2025; originally announced March 2025.

  42. arXiv:2503.12233  [pdf, ps, other]

    cs.IT eess.SP

    Robust Full-Space Physical Layer Security for STAR-RIS-Aided Wireless Networks: Eavesdropper with Uncertain Location and Channel

    Authors: Han Xiao, Xiaoyan Hu, Ang Li, Wenjie Wang, Kun Yang

    Abstract: A robust full-space physical layer security (PLS) transmission scheme is proposed in this paper considering the full-space wiretapping challenge of wireless networks supported by simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). Different from the existing schemes, the proposed PLS scheme takes account of the uncertainty on the eavesdropper's position within t…

    Submitted 15 March, 2025; originally announced March 2025.

  43. arXiv:2503.11908  [pdf, other]

    cs.DM cs.AI

    Revisiting FastMap: New Applications

    Authors: Ang Li

    Abstract: FastMap was first introduced in the Data Mining community for generating Euclidean embeddings of complex objects. In this dissertation, we first present FastMap to generate Euclidean embeddings of graphs in near-linear time: The pairwise Euclidean distances approximate a desired graph-based distance function on the vertices. We then apply the graph version of FastMap to efficiently solve various g…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: PhD dissertation

  44. arXiv:2503.11251  [pdf, other]

    cs.CV cs.CL

    Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

    Authors: Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu, Aojie Li, Xianfang Zeng, Xinhao Zhang, Gang Yu, Yuhe Yin, Qiling Wu, Wen Sun, Kang An, Xin Han, Deshan Sun, Wei Ji, Bizhu Huang, Brian Li, Chenfei Wu, Guanzhe Huang, Huixin Xiong, et al. (29 additional authors not shown)

    Abstract: We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results de…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages

  45. arXiv:2503.08229  [pdf, other]

    cs.CV

    Modeling Variants of Prompts for Vision-Language Models

    Authors: Ao Li, Zongfang Liu, Xinhua Li, Jinghui Zhang, Pengwei Wang, Hu Wang

    Abstract: Large pre-trained vision-language models (VLMs) offer a promising approach to leveraging human language for enhancing downstream tasks. However, VLMs such as CLIP face a significant limitation: their performance is highly sensitive to prompt template design. Although prompt learning methods can address the sensitivity issue by replacing natural language prompts with learnable ones, they are incomprehe…

    Submitted 12 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 10 pages

  46. arXiv:2503.07862  [pdf, other]

    cs.CL

    cantnlp@DravidianLangTech2025: A Bag-of-Sounds Approach to Multimodal Hate Speech Detection

    Authors: Sidney Wong, Andrew Li

    Abstract: This paper presents the systems and results for the Multimodal Social Media Data Analysis in Dravidian Languages (MSMDA-DL) shared task at the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (DravidianLangTech-2025). We took a 'bag-of-sounds' approach by training our hate speech detection system on the speech (audio) data using transformed Mel spectrogram measur…

    Submitted 16 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted at the Fifth Workshop on Speech and Language Technologies for Dravidian Languages

  47. arXiv:2503.07367  [pdf, other]

    cs.CV

    LEGO-Motion: Learning-Enhanced Grids with Occupancy Instance Modeling for Class-Agnostic Motion Prediction

    Authors: Kangan Qian, Jinyu Miao, Ziang Luo, Zheng Fu, Jinchen Li, Yining Shi, Yunlong Wang, Kun Jiang, Mengmeng Yang, Diange Yang

    Abstract: Accurate and reliable spatial and motion information plays a pivotal role in autonomous driving systems. However, object-level perception models struggle with handling open scenario categories and lack precise intrinsic geometry. On the other hand, occupancy-based class-agnostic methods excel in representing scenes but fail to ensure physics consistency and ignore the importance of interactions be…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 8 pages, 4 figures

  48. arXiv:2503.05066  [pdf, other]

    cs.LG cs.AI cs.CL

    Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

    Authors: Shwai He, Weilin Cai, Jiayi Huang, Ang Li

    Abstract: The Mixture of Experts (MoE) is an effective architecture for scaling large language models by leveraging sparse expert activation, optimizing the trade-off between performance and efficiency. However, under expert parallelism, MoE suffers from inference inefficiencies due to imbalanced token-to-expert assignment, where some experts are overloaded while others remain underutilized. This imbalance…

    Submitted 6 March, 2025; originally announced March 2025.

  49. arXiv:2503.04030  [pdf, other]

    cs.CV

    Self-Supervised Large Scale Point Cloud Completion for Archaeological Site Restoration

    Authors: Aocheng Li, James R. Zimmer-Dauphinee, Rajesh Kalyanam, Ian Lindsay, Parker VanValkenburgh, Steven Wernke, Daniel Aliaga

    Abstract: Point cloud completion helps restore partial incomplete point clouds suffering occlusions. Current self-supervised methods fail to give high fidelity completion for large objects with missing surfaces and unbalanced distribution of available points. In this paper, we present a novel method for restoring large-scale point clouds with limited and imbalanced ground-truth. Using rough boundary annotat…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  50. arXiv:2503.01215  [pdf, other]

    cs.LG stat.ML

    Architectural and Inferential Inductive Biases For Exchangeable Sequence Modeling

    Authors: Daksh Mittal, Ang Li, Tzu-Ching Yen, Daniel Guetta, Hongseok Namkoong

    Abstract: Autoregressive models have emerged as a powerful framework for modeling exchangeable sequences - i.i.d. observations when conditioned on some latent factor - enabling direct modeling of uncertainty from missing data (rather than a latent). Motivated by the critical role posterior inference plays as a subroutine in decision-making (e.g., active learning, bandits), we study the inferential and archi…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 35 Pages, 20 Figures