Showing 1–50 of 3,308 results for author: Zhou, Y

Searching in archive cs.
  1. arXiv:2505.10471  [pdf, other]

    cs.SI

    Scalable Approximate Biclique Counting over Large Bipartite Graphs

    Authors: Jingbang Chen, Weinuo Li, Yingli Zhou, Hangrui Zhou, Qiuyang Mang, Can Wang, Yixiang Fang, Chenhao Ma

    Abstract: Counting $(p,q)$-bicliques in bipartite graphs is crucial for a variety of applications, from recommendation systems to cohesive subgraph analysis. Yet, it remains computationally challenging due to the combinatorial explosion to exactly count the $(p,q)$-bicliques. In many scenarios, e.g., graph kernel methods, however, exact counts are not strictly required. To design a scalable and high-quality…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10322  [pdf, other]

    cs.LG math.OC

    Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework

    Authors: Yijie Zhou, Shi Pu

    Abstract: Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds and unpredictable communication delays. This paper introduces a refined model of Asynchronous Decentralized Stochastic Gradient Descent (ADSGD) under practical assum…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09986  [pdf, other]

    cs.CV eess.IV

    High Quality Underwater Image Compression with Adaptive Correction and Codebook-based Augmentation

    Authors: Yimin Zhou, Yichong Xia, Sicheng Pan, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the increasing exploration and exploitation of the underwater world, underwater images have become a critical medium for human interaction with marine environments, driving extensive research into their efficient transmission and storage. However, contemporary underwater image compression algorithms fail to fully leverage the unique characteristics distinguishing underwater scenes from terres…

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2505.09172  [pdf, ps, other]

    cs.AR eess.SP

    Automated SAR ADC Sizing Using Analytical Equations

    Authors: Zhongyi Li, Zhuofu Tao, Yanze Zhou, Yichen Shi, Zhiping Yu, Ting-Jung Lin, Lei He

    Abstract: Conventional analog and mixed-signal (AMS) circuit designs heavily rely on manual effort, which is time-consuming and labor-intensive. This paper presents a fully automated design methodology for Successive Approximation Register (SAR) Analog-to-Digital Converters (ADCs) from performance specifications to complete transistor sizing. To tackle the high-dimensional sizing problem, we propose a dual…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.09087  [pdf]

    q-bio.BM cs.LG

    A Comparative Review of RNA Language Models

    Authors: He Wang, Yikun Zhang, Jie Chen, Jian Zhan, Yaoqi Zhou

    Abstract: Given usefulness of protein language models (LMs) in structure and functional inference, RNA LMs have received increased attentions in the last few years. However, these RNA models are often not compared against the same standard. Here, we divided RNA LMs into three classes (pretrained on multiple RNA types (especially noncoding RNAs), specific-purpose RNAs, and LMs that unify RNA with DNA or prot…

    Submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.09074  [pdf, other]

    cs.RO

    Deployable and Generalizable Motion Prediction: Taxonomy, Open Challenges and Future Directions

    Authors: Letian Wang, Marc-Antoine Lavoie, Sandro Papais, Barza Nisar, Yuxiao Chen, Wenhao Ding, Boris Ivanovic, Hao Shao, Abulikemu Abuduweili, Evan Cook, Yang Zhou, Peter Karkus, Jiachen Li, Changliu Liu, Marco Pavone, Steven Waslander

    Abstract: Motion prediction, the anticipation of future agent states or scene evolution, is rooted in human cognition, bridging perception and decision-making. It enables intelligent systems, such as robots and self-driving cars, to act safely in dynamic, human-involved environments, and informs broader time-series reasoning challenges. With advances in methods, representations, and datasets, the field has…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Initial draft, 162 pages, 40 figures, 13 tables

  7. arXiv:2505.08944  [pdf, ps, other]

    cs.DC

    Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony

    Authors: Shaoyu Wang, Guangrong He, Geon-Woo Kim, Yanqi Zhou, Seo Jin Park

    Abstract: Mixture-of-Experts (MoE) architectures offer the promise of larger model capacity without the prohibitive costs of fully dense designs. However, in real-world inference serving, load skew across experts often leads to suboptimal device utilization and excessive synchronization overheads. This paper introduces Asynchronous Expert Parallelism (AEP), a new paradigm that decouples layer execution from…

    Submitted 13 May, 2025; originally announced May 2025.

  8. arXiv:2505.08854  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Generative AI for Autonomous Driving: Frontiers and Opportunities

    Authors: Yuping Wang, Shuo Xing, Cui Can, Renjie Li, Hongyuan Hua, Kexin Tian, Zhaobin Mo, Xiangbo Gao, Keshu Wu, Sulong Zhou, Hengxu You, Juntong Peng, Junge Zhang, Zehao Wang, Rui Song, Mingxuan Yan, Walter Zimmer, Xingcheng Zhou, Peiran Li, Zhaohan Lu, Chia-Ju Chen, Yue Huang, Ryan A. Rossi, Lichao Sun, Hongkai Yu, et al. (22 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, partic…

    Submitted 13 May, 2025; originally announced May 2025.

  9. arXiv:2505.08438  [pdf, other]

    cs.CV cs.AI

    A Survey of 3D Reconstruction with Event Cameras: From Event-based Geometry to Neural 3D Rendering

    Authors: Chuanzhi Xu, Haoxian Zhou, Langyi Chen, Haodong Chen, Ying Zhou, Vera Chung, Qiang Qu

    Abstract: Event cameras have emerged as promising sensors for 3D reconstruction due to their ability to capture per-pixel brightness changes asynchronously. Unlike conventional frame-based cameras, they produce sparse and temporally rich data streams, which enable more accurate 3D reconstruction and open up the possibility of performing reconstruction in extreme environments such as high-speed motion, low l…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 35 pages, 12 figures, 11 tables

  10. arXiv:2505.08414  [pdf]

    eess.IV cs.CV

    An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

    Authors: Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, Ecosse Lamoureux, Seang Mei Saw, Vinay Nangia, Songhomitra Panda-Jonas, Jie Xu, Ya Xing Wang, et al. (6 additional authors not shown)

    Abstract: Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.07916  [pdf, ps, other]

    eess.AS cs.SD

    MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

    Authors: Bowen Zhang, Congchao Guo, Geng Yang, Hang Yu, Haozhe Zhang, Heidi Lei, Jialong Mai, Junjie Yan, Kaiyue Yang, Mingqi Yang, Peikai Huang, Ruiyang Jin, Sitan Jiang, Weihua Cheng, Yawei Li, Yichen Xiao, Yiying Zhou, Yongmao Zhang, Yuan Lu, Yucen He

    Abstract: We introduce MiniMax-Speech, an autoregressive Transformer-based Text-to-Speech (TTS) model that generates high-quality speech. A key innovation is our learnable speaker encoder, which extracts timbre features from a reference audio without requiring its transcription. This enables MiniMax-Speech to produce highly expressive speech with timbre consistent with the reference in a zero-shot manner, w…

    Submitted 12 May, 2025; originally announced May 2025.

  12. arXiv:2505.07850  [pdf, other]

    cs.CL cs.AI cs.CY

    A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas

    Authors: Pranav Narayanan Venkit, Jiayi Li, Yingfan Zhou, Sarah Rajtmajer, Shomir Wilson

    Abstract: As LLMs (large language models) are increasingly used to generate synthetic personas particularly in data-limited domains such as health, privacy, and HCI, it becomes necessary to understand how these narratives represent identity, especially that of minority communities. In this paper, we audit synthetic personas generated by 3 LLMs (GPT4o, Gemini 1.5 Pro, Deepseek 2.5) through the lens of repres…

    Submitted 7 May, 2025; originally announced May 2025.

  13. arXiv:2505.07849  [pdf, ps, other]

    cs.SE cs.AI cs.IR

    SweRank: Software Issue Localization with Code Ranking

    Authors: Revanth Gangi Reddy, Tarun Suresh, JaeHyeok Doo, Ye Liu, Xuan Phi Nguyen, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Heng Ji, Shafiq Joty

    Abstract: Software issue localization, the task of identifying the precise code locations (files, classes, or functions) relevant to a natural language issue description (e.g., bug report, feature request), is a critical yet time-consuming aspect of software development. While recent LLM-based agentic approaches demonstrate promise, they often incur significant latency and cost due to complex multi-step rea…

    Submitted 7 May, 2025; originally announced May 2025.

  14. arXiv:2505.07431  [pdf, ps, other]

    cs.IR

    Diffusion-driven SpatioTemporal Graph KANsformer for Medical Examination Recommendation

    Authors: Jianan Li, Yangtao Zhou, Zhifu Zhao, Qinglan Huang, Jian Qi, Xiao He, Hua Chu, Fu Li

    Abstract: Recommendation systems in AI-based medical diagnostics and treatment constitute a critical component of AI in healthcare. Although some studies have explored this area and made notable progress, healthcare recommendation systems remain in their nascent stage. And these researches mainly target the treatment process such as drug or disease recommendations. In addition to the treatment process, the…

    Submitted 12 May, 2025; originally announced May 2025.

  15. arXiv:2505.07263  [pdf, other]

    cs.CV

    Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

    Authors: Xiaokun Wang, Chris, Jiangbo Pei, Wei Shen, Yi Peng, Yunzhuo Hao, Weijie Qiu, Ai Jian, Tianyidan Xie, Xuchen Song, Yang Liu, Yahui Zhou

    Abstract: We propose Skywork-VL Reward, a multimodal reward model that provides reward signals for both multimodal understanding and reasoning tasks. Our technical approach comprises two key components: First, we construct a large-scale multimodal preference dataset that covers a wide range of tasks and scenarios, with responses collected from both standard vision-language models (VLMs) and advanced VLM rea…

    Submitted 12 May, 2025; originally announced May 2025.

  16. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  17. arXiv:2505.06998  [pdf, ps, other]

    cs.SI physics.soc-ph

    Assessing the Robustness and Reducibility of Multiplex Networks with Embedding-Aided Interlayer Similarities

    Authors: Haoran Nan, Senquan Wang, Chun Ouyang, Yanchen Zhou, Weiwei Gu

    Abstract: The study of interlayer similarity of multiplex networks helps to understand the intrinsic structure of complex systems, revealing how changes in one layer can propagate and affect others, thus enabling broad implications for transportation, social, and biological systems. Existing algorithms that measure similarity between network layers typically encode only partial information, which limits the…

    Submitted 11 May, 2025; originally announced May 2025.

  18. arXiv:2505.06706  [pdf, ps, other]

    cs.AI

    Bi-level Mean Field: Dynamic Grouping for Large-Scale MARL

    Authors: Yuxuan Zheng, Yihe Zhou, Feiyang Xu, Mingli Song, Shunyu Liu

    Abstract: Large-scale Multi-Agent Reinforcement Learning (MARL) often suffers from the curse of dimensionality, as the exponential growth in agent interactions significantly increases computational complexity and impedes learning efficiency. To mitigate this, existing efforts that rely on Mean Field (MF) simplify the interaction landscape by approximating neighboring agents as a single mean agent, thus redu…

    Submitted 10 May, 2025; originally announced May 2025.

  19. arXiv:2505.06496  [pdf, ps, other]

    cs.CL cs.AI

    xGen-small Technical Report

    Authors: Erik Nijkamp, Bo Pang, Egor Pakhomov, Akash Gokul, Jin Qu, Silvio Savarese, Yingbo Zhou, Caiming Xiong

    Abstract: We introduce xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context applications. Our vertically integrated pipeline unites domain-balanced, frequency-aware data curation; multi-stage pre-training with quality annealing and length extension to 128k tokens; and targeted post-training via supervised fine-tuning, preference learning, and online reinforcement learning.…

    Submitted 9 May, 2025; originally announced May 2025.

  20. arXiv:2505.06120  [pdf, ps, other]

    cs.CL cs.HC

    LLMs Get Lost In Multi-Turn Conversation

    Authors: Philippe Laban, Hiroaki Hayashi, Yingbo Zhou, Jennifer Neville

    Abstract: Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions,…

    Submitted 9 May, 2025; originally announced May 2025.

  21. arXiv:2505.06055  [pdf, ps, other]

    cs.CV

    Towards Better Cephalometric Landmark Detection with Diffusion Data Generation

    Authors: Dongqian Guo, Wencheng Han, Pang Lyu, Yuxi Zhou, Jianbing Shen

    Abstract: Cephalometric landmark detection is essential for orthodontic diagnostics and treatment planning. Nevertheless, the scarcity of samples in data collection and the extensive effort required for manual annotation have significantly impeded the availability of diverse datasets. This limitation has restricted the effectiveness of deep learning-based detection methods, particularly those based on large…

    Submitted 9 May, 2025; originally announced May 2025.

  22. arXiv:2505.05950  [pdf, ps, other]

    cs.LG

    FloE: On-the-Fly MoE Inference on Memory-constrained GPU

    Authors: Yuxin Zhou, Zheng Li, Jun Zhang, Jue Wang, Yiping Wang, Zhongle Xie, Ke Chen, Lidan Shou

    Abstract: With the widespread adoption of Mixture-of-Experts (MoE) models, there is a growing demand for efficient inference on memory-constrained devices. While offloading expert parameters to CPU memory and loading activated experts on demand has emerged as a potential solution, the large size of activated experts overburdens the limited PCIe bandwidth, hindering the effectiveness in latency-sensitive sce…

    Submitted 11 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  23. arXiv:2505.05916  [pdf, ps, other]

    cs.LG cs.AI

    IRNN: Innovation-driven Recurrent Neural Network for Time-Series Data Modeling and Prediction

    Authors: Yifan Zhou, Yibo Wang, Chao Shang

    Abstract: Many real-world datasets are time series that are sequentially collected and contain rich temporal information. Thus, a common interest in practice is to capture dynamics of time series and predict their future evolutions. To this end, the recurrent neural network (RNN) has been a prevalent and effective machine learning option, which admits a nonlinear state-space model representation. Motivated…

    Submitted 9 May, 2025; originally announced May 2025.

  24. arXiv:2505.05870  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Towards Facial Image Compression with Consistency Preserving Diffusion Prior

    Authors: Yimin Zhou, Yichong Xia, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the widespread application of facial image data across various domains, the efficient storage and transmission of facial images has garnered significant attention. However, the existing learned face image compression methods often produce unsatisfactory reconstructed image quality at low bit rates. Simply adapting diffusion-based compression methods to facial compression tasks results in reco…

    Submitted 9 May, 2025; originally announced May 2025.

  25. arXiv:2505.05768  [pdf, other]

    eess.IV cs.AI cs.CV

    Predicting Diabetic Macular Edema Treatment Responses Using OCT: Dataset and Methods of APTOS Competition

    Authors: Weiyi Zhang, Peranut Chotcomwongse, Yinwen Li, Pusheng Xu, Ruijie Yao, Lianhao Zhou, Yuxuan Zhou, Hui Feng, Qiping Zhou, Xinyue Wang, Shoujin Huang, Zihao Jin, Florence H. T. Chung, Shujun Wang, Yalin Zheng, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

    Abstract: Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 42 pages, 5 tables, 12 figures, challenge report

  26. arXiv:2505.05763  [pdf]

    cs.LG cs.CL

    BMMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection

    Authors: Yize Zhou, Jie Zhang, Meijie Wang, Lun Yu

    Abstract: Academic misconduct detection in biomedical research remains challenging due to algorithmic narrowness in existing methods and fragmented analytical pipelines. We present BMMDetect, a multimodal deep learning framework that integrates journal metadata (SJR, institutional data), semantic embeddings (PubMedBERT), and GPT-4o-mined textual attributes (methodological statistics, data anomalies) for hol…

    Submitted 8 May, 2025; originally announced May 2025.

  27. arXiv:2505.05533  [pdf, other]

    cs.LG cs.AI

    Rethinking Graph Contrastive Learning through Relative Similarity Preservation

    Authors: Zhiyuan Ning, Pengfei Wang, Ziyue Qiao, Pengyang Wang, Yuanchun Zhou

    Abstract: Graph contrastive learning (GCL) has achieved remarkable success by following the computer vision paradigm of preserving absolute similarity between augmented views. However, this approach faces fundamental challenges in graphs due to their discrete, non-Euclidean nature -- view generation often breaks semantic validity and similarity verification becomes unreliable. Through analyzing 11 real-worl…

    Submitted 12 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI2025; full version including appendix

  28. arXiv:2505.04999  [pdf, other]

    cs.RO cs.AI cs.LG

    CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations

    Authors: Anthony Liang, Pavel Czempin, Matthew Hong, Yutai Zhou, Erdem Biyik, Stephen Tu

    Abstract: Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations, which fundamentally limits the scale of training data. A promising approach to address this bottleneck is to harness the abundance of unlabeled observations-e.g., from video demonstrations-to learn latent action labels in an unsupervised way. However, we find that exis…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Latent Action Models, Self-supervised Pretraining, Learning from Videos

  29. arXiv:2505.04989  [pdf, other]

    cs.RO

    CPP-DIP: Multi-objective Coverage Path Planning for MAVs in Dispersed and Irregular Plantations

    Authors: Weijie Kuang, Hann Woei Ho, Ye Zhou

    Abstract: Coverage Path Planning (CPP) is vital in precision agriculture to improve efficiency and resource utilization. In irregular and dispersed plantations, traditional grid-based CPP often causes redundant coverage over non-vegetated areas, leading to waste and pollution. To overcome these limitations, we propose CPP-DIP, a multi-objective CPP framework designed for Micro Air Vehicles (MAVs). The frame…

    Submitted 8 May, 2025; originally announced May 2025.

  30. arXiv:2505.04889  [pdf, other]

    cs.LG cs.CR

    FedRE: Robust and Effective Federated Learning with Privacy Preference

    Authors: Tianzhe Xiao, Yichen Li, Yu Zhou, Yining Qi, Yi Liu, Wei Wang, Haozhao Wang, Yi Wang, Ruixuan Li

    Abstract: Despite Federated Learning (FL) employing gradient aggregation at the server for distributed training to prevent the privacy leakage of raw data, private information can still be divulged through the analysis of uploaded gradients from clients. Substantial efforts have been made to integrate local differential privacy (LDP) into the system to achieve a strict privacy guarantee. However, existing m…

    Submitted 7 May, 2025; originally announced May 2025.

  31. arXiv:2505.04877  [pdf, other]

    cs.CV cs.AI

    Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning

    Authors: Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu

    Abstract: Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural network by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive search for quantization policies on large-scale datasets. To resolve this issue, we introduce a novel approach that first searches for quantization policies on s…

    Submitted 7 May, 2025; originally announced May 2025.

  32. arXiv:2505.04788  [pdf, ps, other]

    cs.CV

    Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World

    Authors: Bangyan Liao, Zhenjun Zhao, Haoang Li, Yi Zhou, Yingping Zeng, Hao Li, Peidong Liu

    Abstract: Determining the vanishing points (VPs) in a Manhattan world, as a fundamental task in many 3D vision applications, consists of jointly inferring the line-VP association and locating each VP. Existing methods are, however, either sub-optimal solvers or pursuing global optimality at a significant cost of computing time. In contrast to prior works, we introduce convex relaxation techniques to solve t…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Accepted to CVPR 2025 as Award Candidate & Oral Presentation. The first two authors contributed equally to this work. Code: https://github.com/WU-CVGL/GlobustVP

  33. arXiv:2505.04622  [pdf, other]

    cs.GR cs.CV

    PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

    Authors: Jingwen Ye, Yuze He, Yanning Zhou, Yiqin Zhu, Kaiwen Xiao, Yong-Jin Liu, Wei Yang, Xiao Han

    Abstract: Shape primitive abstraction, which decomposes complex 3D shapes into simple geometric elements, plays a crucial role in human visual cognition and has broad applications in computer vision and graphics. While recent advances in 3D content generation have shown remarkable progress, existing primitive abstraction methods either rely on geometric optimization with limited semantic understanding or le…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025. 14 pages, 15 figures

  34. arXiv:2505.04620  [pdf, other]

    cs.CV

    On Path to Multimodal Generalist: General-Level and General-Bench

    Authors: Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, et al. (7 additional authors not shown)

    Abstract: The Multimodal Large Language Model (MLLM) is currently experiencing rapid growth, driven by the advanced capabilities of LLMs. Unlike earlier specialists, existing MLLMs are evolving towards a Multimodal Generalist paradigm. Initially limited to understanding multiple modalities, these models have advanced to not only comprehend but also generate across modalities. Their capabilities have expande…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: ICML'25, 305 pages, 115 tables, 177 figures, project page: https://generalist.top/

  35. arXiv:2505.04512  [pdf, other]

    cs.CV

    HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

    Authors: Teng Hu, Zhentao Yu, Zhengguang Zhou, Sen Liang, Yuan Zhou, Qin Lin, Qinglin Lu

    Abstract: Customized video generation aims to produce videos featuring specific subjects under flexible user-defined conditions, yet existing methods often struggle with identity consistency and limited input modalities. In this paper, we propose HunyuanCustom, a multi-modal customized video generation framework that emphasizes subject consistency while supporting image, audio, video, and text conditions. B…

    Submitted 8 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  36. arXiv:2505.03802  [pdf, other]

    cs.LG cs.AI

    Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth

    Authors: Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin

    Abstract: QLoRA effectively combines low-bit quantization and LoRA to achieve memory-friendly fine-tuning for large language models (LLM). Recently, methods based on SVD for continuous update iterations to initialize LoRA matrices to accommodate quantization errors have generally failed to consistently improve performance. Dynamic mixed precision is a natural idea for continuously improving the fine-tuning…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 24 pages, 6 figures

  37. arXiv:2505.03683  [pdf, other]

    cs.SE

    Moral Testing of Autonomous Driving Systems

    Authors: Wenbing Tang, Mingfei Cheng, Yuan Zhou, Yang Liu

    Abstract: Autonomous Driving System (ADS) testing plays a crucial role in their development, with the current focus primarily on functional and safety testing. However, evaluating the non-functional morality of ADSs, particularly their decision-making capabilities in unavoidable collision scenarios, is equally important to ensure the systems' trustworthiness and public acceptance. Unfortunately, testing ADS…

    Submitted 6 May, 2025; originally announced May 2025.

  38. arXiv:2505.03507  [pdf, ps, other]

    cs.CV

    Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking

    Authors: Shenglan Li, Rui Yao, Yong Zhou, Hancheng Zhu, Kunyang Sun, Bing Liu, Zhiwen Shao, Jiaqi Zhao

    Abstract: To reduce the reliance on large-scale annotations, self-supervised RGB-T tracking approaches have garnered significant attention. However, the omission of the object region by erroneous pseudo-label or the introduction of background noise affects the efficiency of modality fusion, while pseudo-label noise triggered by similar object noise can further affect the tracking performance. In this paper,…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025)

  39. arXiv:2505.03185  [pdf, other]

    cs.HC

    Behavioral Sensing and Intervention Paradigm: A Review of Closed-Loop Approaches for Ingestion Health

    Authors: Jun Fang, Yanuo Zhou, Ka I Chan, Jiajin Li, Zeyi Sun, Zhengnan Li, Zicong Fu, Hongjing Piao, Haodong Xu, Yuanchun Shi, Yuntao Wang

    Abstract: Ingestive behavior plays a critical role in health, yet many existing interventions remain limited to static guidance or manual self-tracking. With the increasing integration of sensors and perceptual computing, recent systems have begun to support closed-loop interventions that dynamically sense user behavior and provide feedback during or around ingestion episodes. In this survey, we review 136…

    Submitted 14 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  40. arXiv:2505.02625  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

    Authors: Qingkai Fang, Yan Zhou, Shoutao Guo, Shaolei Zhang, Yang Feng

    Abstract: Real-time, intelligent, and natural speech interaction is an essential part of the next-generation human-computer interaction. Recent advancements have showcased the potential of building intelligent spoken chatbots based on large language models (LLMs). In this paper, we introduce LLaMA-Omni 2, a series of speech language models (SpeechLMs) ranging from 0.5B to 14B parameters, capable of achievin…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Preprint. Project: https://github.com/ictnlp/LLaMA-Omni2

  41. arXiv:2505.02365  [pdf, other]

    cs.CV

    Quaternion Multi-focus Color Image Fusion

    Authors: Weihua Yang, Yicong Zhou

    Abstract: Multi-focus color image fusion refers to integrating multiple partially focused color images to create a single all-in-focus color image. However, existing methods struggle with complex real-world scenarios due to limitations in handling color information and intricate textures. To address these challenges, this paper proposes a quaternion multi-focus color image fusion framework to perform high-q…

    Submitted 5 May, 2025; originally announced May 2025.

  42. arXiv:2505.02364  [pdf, other]

    cs.CV

    Quaternion Infrared Visible Image Fusion

    Authors: Weihua Yang, Yicong Zhou

    Abstract: Visible images provide rich details and color information only under well-lighted conditions while infrared images effectively highlight thermal targets under challenging conditions such as low visibility and adverse weather. Infrared-visible image fusion aims to integrate complementary information from infrared and visible images to generate a high-quality fused image. Existing methods exhibit cr…

    Submitted 5 May, 2025; originally announced May 2025.

  43. arXiv:2505.02325  [pdf, other]

    cs.CV

    TeDA: Boosting Vision-Lanuage Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment

    Authors: Zhichuan Wang, Yang Zhou, Jinhai Xiang, Yulong Wang, Xinwei He

    Abstract: Learning discriminative 3D representations that generalize well to unknown testing categories is an emerging requirement for many real-world 3D applications. Existing well-established methods often struggle to attain this goal due to insufficient 3D training data from broader concepts. Meanwhile, pre-trained large vision-language models (e.g., CLIP) have shown remarkable zero-shot generalization c…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted by ICMR 2025

  44. arXiv:2505.02016  [pdf, ps, other]

    cs.AR

    ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA

    Authors: Zhengyuan Shi, Zeju Li, Chengyu Ma, Yunhao Zhou, Ziyang Zheng, Jiawei Liu, Hongyang Pan, Lingfeng Zhou, Kezhi Li, Jiaying Zhu, Lingwei Yan, Zhiqiang He, Chenhao Xue, Wentao Jiang, Fan Yang, Guangyu Sun, Xiaoyan Yang, Gang Chen, Chuan Shi, Zhufei Chu, Jun Yang, Qiang Xu

    Abstract: We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on…

    Submitted 4 May, 2025; originally announced May 2025.

  45. arXiv:2505.01589  [pdf, other]

    cs.RO

    Phasing Through the Flames: Rapid Motion Planning with the AGHF PDE for Arbitrary Objective Functions and Constraints

    Authors: Challen Enninful Adu, César E. Ramos Chuquiure, Yutong Zhou, Pearl Lin, Ruikai Yang, Bohao Zhang, Shubham Singh, Ram Vasudevan

    Abstract: The generation of optimal trajectories for high-dimensional robotic systems under constraints remains computationally challenging due to the need to simultaneously satisfy dynamic feasibility, input limits, and task-specific objectives while searching over high-dimensional spaces. Recent approaches using the Affine Geometric Heat Flow (AGHF) Partial Differential Equation (PDE) have demonstrated pr…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 15 pages, 5 figures

  46. arXiv:2505.01489  [pdf, other]

    cs.LG cs.CR

    Machine Learning for Cyber-Attack Identification from Traffic Flows

    Authors: Yujing Zhou, Marc L. Jacquet, Robel Dawit, Skyler Fabre, Dev Sarawat, Faheem Khan, Madison Newell, Yongxin Liu, Dahai Liu, Hongyun Chen, Jian Wang, Huihui Wang

    Abstract: This paper presents our simulation of cyber-attacks and detection strategies on the traffic control system in Daytona Beach, FL, using Raspberry Pi virtual machines and the OPNSense firewall, along with traffic dynamics from SUMO and exploitation via the Metasploit framework. We try to answer the research question: are we able to identify cyber attacks by only analyzing traffic flow patterns? In…

    Submitted 2 May, 2025; originally announced May 2025.

  47. arXiv:2505.01488  [pdf, other]

    cs.LG cs.CR

    Explainable Machine Learning for Cyberattack Identification from Traffic Flows

    Authors: Yujing Zhou, Marc L. Jacquet, Robel Dawit, Skyler Fabre, Dev Sarawat, Faheem Khan, Madison Newell, Yongxin Liu, Dahai Liu, Hongyun Chen, Jian Wang, Huihui Wang

    Abstract: The increasing automation of traffic management systems has made them prime targets for cyberattacks, disrupting urban mobility and public safety. Traditional network-layer defenses are often inaccessible to transportation agencies, necessitating a machine learning-based approach that relies solely on traffic flow data. In this study, we simulate cyberattacks in a semi-realistic environment, using…

    Submitted 2 May, 2025; originally announced May 2025.

  48. arXiv:2505.00973  [pdf, ps, other]

    cs.LG math.OC

    A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems

    Authors: Xin Chen, Yuze Chen, Yuan Zhou

    Abstract: We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become progressively refined over time, and seeks decisions that are competitive with the hindsight optimal under all possible realizations of both parameters and predi…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 64 pages, 1 figure

  49. arXiv:2505.00753  [pdf, other]

    cs.CL cs.LG

    A Survey on Large Language Model based Human-Agent Systems

    Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Yuwei Cao, Dongyuan Li, Renhe Jiang, Philip S. Yu

    Abstract: Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world app…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers

  50. arXiv:2505.00364  [pdf, other]

    cs.LG

    From GNNs to Trees: Multi-Granular Interpretability for Graph Neural Networks

    Authors: Jie Yang, Yuwen Wang, Kaixuan Chen, Tongya Zheng, Yihe Zhou, Zhenbang Xiao, Ji Cao, Mingli Song, Shunyu Liu

    Abstract: Interpretable Graph Neural Networks (GNNs) aim to reveal the underlying reasoning behind model predictions, attributing their decisions to specific subgraphs that are informative. However, existing subgraph-based interpretable methods suffer from an overemphasis on local structure, potentially overlooking long-range dependencies within the entire graphs. Although recent efforts that rely on graph…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted by ICLR 2025