Showing 1–50 of 7,722 results for author: Li, J

Searching in archive cs.
  1. arXiv:2507.01884

    cs.CV

    Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification

    Authors: Kunlun Xu, Fan Zhuo, Jiangmeng Li, Xu Zou, Jiahuan Zhou

    Abstract: Current lifelong person re-identification (LReID) methods predominantly rely on fully labeled data streams. However, in real-world scenarios where annotation resources are limited, a vast amount of unlabeled data coexists with scarce labeled samples, leading to the Semi-Supervised LReID (Semi-LReID) problem where LReID methods suffer severe performance degradation. Existing LReID methods, even whe…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  2. arXiv:2507.01535

    cs.CV

    TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking

    Authors: Bingxi Liu, Calvin Chen, Junhao Li, Guyang Yu, Haoqian Song, Xuchen Liu, Jinqiang Cui, Hong Zhang

    Abstract: The Vision Transformer (ViT) model has long struggled with the challenge of quadratic complexity, a limitation that becomes especially critical in unmanned aerial vehicle (UAV) tracking systems, where data must be processed in real time. In this study, we explore the recently proposed State-Space Model, Mamba, leveraging its computational efficiency and capability for long-sequence modeling to eff…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 12 pages

  3. arXiv:2507.01439

    cs.CV

    TurboReg: TurboClique for Robust and Efficient Point Cloud Registration

    Authors: Shaocheng Yan, Pengcheng Shi, Zhenjun Zhao, Kaixin Wang, Kuang Cao, Ji Wu, Jiayuan Li

    Abstract: Robust estimation is essential in correspondence-based Point Cloud Registration (PCR). Existing methods using maximal clique search in compatibility graphs achieve high recall but suffer from exponential time complexity, limiting their use in time-sensitive applications. To address this challenge, we propose a fast and robust estimator, TurboReg, built upon a novel lightweight clique, TurboClique,…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: ICCV-2025 Accepted Paper

  4. arXiv:2507.01057

    cs.LG physics.flu-dyn

    Loop2Net: Data-Driven Generation and Optimization of Airfoil CFD Meshes from Sparse Boundary Coordinates

    Authors: Lushun Fan, Yuqin Xia, Jun Li, Karl Jenkins

    Abstract: In this study, an innovative intelligent optimization system for mesh quality is proposed, which is based on a deep convolutional neural network architecture, to achieve mesh generation and optimization. The core of the study is the Loop2Net generator and loss function; it predicts the mesh based on the given wing coordinates. The model's performance is continuously optimised by two key loss f…

    Submitted 28 June, 2025; originally announced July 2025.

  5. arXiv:2507.01006

    cs.CV cs.AI cs.LG

    GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    Authors: GLM-V Team: Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, et al. (54 additional authors not shown)

    Abstract: We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the fi…

    Submitted 2 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

  6. arXiv:2507.00444

    cs.ET

    DiffCkt: A Diffusion Model-Based Hybrid Neural Network Framework for Automatic Transistor-Level Generation of Analog Circuits

    Authors: Chengjie Liu, Jiajia Li, Yabing Feng, Wenhao Huang, Weiyu Chen, Yuan Du, Jun Yang, Li Du

    Abstract: Analog circuit design consists of the pre-layout and layout phases. Among them, the pre-layout phase directly decides the final circuit performance, but heavily depends on experienced engineers to do manual design according to specific application scenarios. To overcome these challenges and automate the analog circuit pre-layout design phase, we introduce DiffCkt: a diffusion model-based hybrid ne…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCAD2025

  7. arXiv:2507.00439

    cs.CL

    Beyond Sociodemographic Prompting: Using Supervision to Align LLMs with Human Response Distributions

    Authors: Gauri Kambhatla, Sanjana Gautam, Angela Zhang, Alex Liu, Ravi Srinivasan, Junyi Jessy Li, Matthew Lease

    Abstract: The ability to accurately predict how different population groups would answer subjective questions would have great value. In this work, we show that use of relatively simple supervision can greatly improve language model alignment with diverse population groups, as measured over three datasets spanning various topics. Beyond evaluating average performance, we also report how alignment varies acr…

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2507.00435

    cs.RO cs.AI cs.CV

    RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

    Authors: Yi Ru Wang, Carter Ung, Grant Tannert, Jiafei Duan, Josephine Li, Amy Le, Rishabh Oswal, Markus Grotz, Wilbert Pumacay, Yuquan Deng, Ranjay Krishna, Dieter Fox, Siddhartha Srinivasa

    Abstract: We present RoboEval, a simulation benchmark and structured evaluation framework designed to reveal the limitations of current bimanual manipulation policies. While prior benchmarks report only binary task success, we show that such metrics often conceal critical weaknesses in policy behavior -- such as poor coordination, slipping during grasping, or asymmetric arm usage. RoboEval introduces a suit…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Project page: https://robo-eval.github.io

  9. arXiv:2507.00419

    physics.geo-ph cs.AI

    Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding

    Authors: Yimin Dou, Xinming Wu, Nathan L Bangs, Harpreet Singh Sethi, Jintao Li, Hang Gao, Zhixiang Guo

    Abstract: Understanding Earth's subsurface is critical for energy transition, natural hazard mitigation, and planetary science. Yet subsurface analysis remains fragmented, with separate models required for structural interpretation, stratigraphic analysis, geobody segmentation, and property modeling, each tightly coupled to specific data distributions and task formulations. We introduce the Geological Everyt…

    Submitted 1 July, 2025; originally announced July 2025.

  10. arXiv:2507.00411

    cs.LG

    Diffusion Disambiguation Models for Partial Label Learning

    Authors: Jinfu Fan, Xiaohui Zhong, Kangrui Ren, Jiangnan Li, Linqing Huang

    Abstract: Learning from ambiguous labels is a long-standing problem in practical machine learning applications. The purpose of partial label learning (PLL) is to identify the ground-truth label from a set of candidate labels associated with a given instance. Inspired by the remarkable performance of diffusion models in various generation tasks, this paper explores their potential to denoise ambiguous…

    Submitted 30 June, 2025; originally announced July 2025.

  11. arXiv:2506.23692

    cs.AI

    Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models

    Authors: Boyuan Zheng, Zerui Fang, Zhe Xu, Rui Wang, Yiwen Chen, Cunshi Wang, Mengwei Qu, Lei Lei, Zhen Feng, Yan Liu, Yuyang Li, Mingzhou Tan, Jiaji Wu, Jianwei Shuai, Jia Li, Fangfu Ye

    Abstract: While AI for Science (AI4S) serves as an analytical tool in the current research paradigm, it doesn't solve its core inefficiency. We propose "Agent for Science" (Agent4S), the use of LLM-driven agents to automate the entire research workflow, as the true Fifth Scientific Paradigm. This paper introduces a five-level classification for Agent4S, outlining a clear roadmap from simple task automation to…

    Submitted 30 June, 2025; originally announced June 2025.

  12. arXiv:2506.23577

    cs.CV

    StackCLIP: Clustering-Driven Stacked Prompt in Zero-Shot Industrial Anomaly Detection

    Authors: Yanning Hou, Yanran Ruan, Junfa Li, Shanshan Wang, Jianfeng Qiu, Ke Xu

    Abstract: Enhancing the alignment between text and image features in the CLIP model is a critical challenge in zero-shot industrial anomaly detection tasks. Recent studies predominantly utilize specific category prompts during pretraining, which can cause overfitting to the training categories and limit model generalization. To address this, we propose a method that transforms category names through multica…

    Submitted 30 June, 2025; originally announced June 2025.

  13. arXiv:2506.23543

    cs.CV

    Pyramidal Patchification Flow for Visual Generation

    Authors: Hui Li, Baoyou Chen, Liwei Zhang, Jiaye Li, Jingdong Wang, Siyu Zhu

    Abstract: Diffusion transformers (DiTs) adopt Patchify, mapping patch representations to token representations through linear projections, to adjust the number of tokens input to DiT blocks and thus the computation cost. Instead of a single patch size for all the timesteps, we introduce a Pyramidal Patchification Flow (PPFlow) approach: Large patch sizes are used for high noise timesteps and small patch siz…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 10 pages, 9 figures

  14. arXiv:2506.23493

    cs.NI eess.SP

    Securing the Sky: Integrated Satellite-UAV Physical Layer Security for Low-Altitude Wireless Networks

    Authors: Jiahui Li, Geng Sun, Xiaoyu Sun, Fang Mei, Jingjing Wang, Xiangwang Hou, Daxin Tian, Victor C. M. Leung

    Abstract: Low-altitude wireless networks (LAWNs) have garnered significant attention in the forthcoming 6G networks. In LAWNs, satellites with wide coverage and unmanned aerial vehicles (UAVs) with flexible mobility can complement each other to form integrated satellite-UAV networks, providing ubiquitous and high-speed connectivity for low-altitude operations. However, the higher line-of-sight probability i…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper has been submitted to IEEE Wireless Communications

  15. arXiv:2506.23488

    cs.NI

    Generative AI-enhanced Low-Altitude UAV-Mounted Stacked Intelligent Metasurfaces

    Authors: Geng Sun, Mingzhe Fan, Lei Zhang, Hongyang Pan, Jiahui Li, Chuang Zhang, Linyao Li, Changyuan Zhao, Chau Yuen

    Abstract: Wireless communication systems face significant challenges in meeting the increasing demands for higher data rates and more reliable connectivity in complex environments. Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for realizing wave-domain signal processing, with mobile SIMs offering superior communication performance compared to their fixed counterparts. In thi…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper has been already submitted to TCCN

  16. arXiv:2506.23351

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  17. arXiv:2506.23292

    cs.CV

    DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios

    Authors: Changtao Miao, Yi Zhang, Weize Gao, Man Luo, Weiwei Feng, Zhiya Tan, Jianshu Li, Ajian Liu, Yunfeng Diao, Qi Chu, Tao Gong, Zhe Li, Weibin Yao, Joey Tianyi Zhou

    Abstract: Recent advances in AIGC have exacerbated the misuse of malicious deepfake content, making the development of reliable deepfake detection methods an essential means to address this challenge. Although existing deepfake detection models demonstrate outstanding performance in detection metrics, most methods only provide simple binary classification results, lacking interpretability. In critical domai…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper is a preliminary version, with an extended and comprehensive version currently under development

  18. Synergizing Implicit and Explicit User Interests: A Multi-Embedding Retrieval Framework at Pinterest

    Authors: Zhibo Fan, Hongtao Lin, Haoyu Chen, Bowen Deng, Hedi Xia, Yuke Yan, James Li

    Abstract: Industrial recommendation systems are typically composed of multiple stages, including retrieval, ranking, and blending. The retrieval stage plays a critical role in generating a high-recall set of candidate items that covers a wide range of diverse user interests. Effectively covering the diverse and long-tail user interests within this stage poses a significant challenge: traditional two-tower m…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: KDD 2025

  19. arXiv:2506.22920

    cs.AI

    Improving Rationality in the Reasoning Process of Language Models through Self-playing Game

    Authors: Pinzheng Wang, Juntao Li, Zecheng Tang, Haijia Gui, Min Zhang

    Abstract: Large language models (LLMs) have demonstrated considerable reasoning abilities in various tasks such as mathematics and coding. However, recent studies indicate that even the best models lack true comprehension of their reasoning processes. In this paper, we explore how self-play can enhance the rationality of models in the reasoning process without supervision from humans or superior models. We…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025

  20. arXiv:2506.22880

    cs.CV cs.AI

    Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder

    Authors: Dang Jisheng, Wu Xudong, Wang Bimei, Lv Ning, Chen Jiayu, Jingwen Zhao, Yichu Liu, Jizhao Liu, Juncheng Li, Teng Wang

    Abstract: Existing video segmenter and grounder approaches, exemplified by Sa2VA, directly fuse features within segmentation models. This often results in an undesirable entanglement of dynamic visual information and static semantics, thereby degrading segmentation accuracy. To systematically mitigate this issue, we propose DeSa2VA, a decoupling-enhanced prompting scheme integrating text pre-training and a…

    Submitted 28 June, 2025; originally announced June 2025.

  21. arXiv:2506.22736

    cs.CV

    UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments

    Authors: Dayong Su, Yafei Zhang, Huafeng Li, Jinxing Li, Yu Liu

    Abstract: Current multimodal medical image fusion typically assumes that source images are of high quality and perfectly aligned at the pixel level. Its effectiveness heavily relies on these conditions and often deteriorates when handling misaligned or degraded medical images. To address this, we propose UniFuse, a general fusion framework. By embedding a degradation-aware prompt learning module, UniFuse se…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV2025

  22. arXiv:2506.22732

    cs.LG eess.SP stat.ML

    Robust Tensor Completion via Gradient Tensor Nuclear L1-L2 Norm for Traffic Data Recovery

    Authors: Hao Shu, Jicheng Li, Tianyv Lei, Lijun Sun

    Abstract: In real-world scenarios, spatiotemporal traffic data frequently experiences dual degradation from missing values and noise caused by sensor malfunctions and communication failures. Therefore, effective data recovery methods are essential to ensure the reliability of downstream data-driven applications. While classical tensor completion methods have been widely adopted, they are incapable of modeli…

    Submitted 27 June, 2025; originally announced June 2025.

  23. arXiv:2506.22516

    cs.CL cs.AI cs.NE q-bio.NC

    Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis

    Authors: Jingkai Li

    Abstract: Integrated Information Theory (IIT) provides a quantitative framework for explaining consciousness phenomenon, positing that conscious systems comprise elements integrated through causal properties. We apply IIT 3.0 and 4.0 -- the latest iterations of this framework -- to sequences of Large Language Model (LLM) representations, analyzing data derived from existing Theory of Mind (ToM) test results…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Published as a journal paper at: https://doi.org/10.1016/j.nlp.2025.100163

    Journal ref: Natural Language Processing Journal 12C (2025) 100163

  24. arXiv:2506.22462

    eess.SP cs.AI cs.CY cs.HC

    Privacy-aware IoT Fall Detection Services For Aging in Place

    Authors: Abdallah Lakhdari, Jiajie Li, Amani Abusafia, Athman Bouguettaya

    Abstract: Fall detection is critical to support the growing elderly population, projected to reach 2.1 billion by 2050. However, existing methods often face data scarcity challenges or compromise privacy. We propose a novel IoT-based Fall Detection as a Service (FDaaS) framework to assist the elderly in living independently and safely by accurately detecting falls. We design a service-oriented architecture…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 11 pages, 12 figures, This paper is accepted in the 2025 IEEE International Conference on Web Services (ICWS 2025)

  25. arXiv:2506.22037

    cs.SE

    KARMA Approach supporting Development Process Reconstruction in Model-based Systems Engineering

    Authors: Jiawei Li, Zan Liang, Guoxin Wang, Jinzhi Lu, Yan Yan, Shouxuan Wu, Hao Wang

    Abstract: Model reconstruction is a method used to drive the development of complex system development processes in model-based systems engineering. Currently, during the iterative design process of a system, there is a lack of an effective method to manage changes in development requirements, such as development cycle requirements and cost requirements, and to realize the reconstruction of the system devel…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 12 pages, 9 figures, submitted to the 15th international Complex Systems Design & Management (CSD&M) conference

  26. arXiv:2506.21734

    cs.AI cs.LG

    Hierarchical Reasoning Model

    Authors: Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori

    Abstract: Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose th…

    Submitted 26 June, 2025; originally announced June 2025.

  27. arXiv:2506.21682

    cs.CL

    Do We Really Need GNNs with Explicit Structural Modeling? MLPs Suffice for Language Model Representations

    Authors: Li Zhou, Hao Jiang, Junjie Li, Zefeng Zhao, Feng Jiang, Wenyu Chen, Haizhou Li

    Abstract: Explicit structural information has been proven to be encoded by Graph Neural Networks (GNNs), serving as auxiliary knowledge to enhance model capabilities and improve performance in downstream NLP tasks. However, recent studies indicate that GNNs fail to fully utilize structural information, whereas Multi-Layer Perceptrons (MLPs), despite lacking the message-passing mechanisms inherent to GNNs, e…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Graph Neural Networks, Multi-Layer Perceptrons, Explicit Structural Modeling, Probing Classifier

  28. arXiv:2506.21555

    cs.CL cs.SD eess.AS

    Efficient Multilingual ASR Finetuning via LoRA Language Experts

    Authors: Jiahong Li, Yiwen Shao, Jianheng Zhuo, Chenda Li, Liliang Tang, Dong Yu, Yanmin Qian

    Abstract: Recent advancements in deep learning have significantly enhanced multilingual automatic speech recognition (ASR) due to the development of advanced model architectures and available large-scale multilingual datasets. Despite that, multilingual ASR still suffers from the curse of multilinguality in that different languages tend to interfere with each other, making it difficult for the ASR model to…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted in Interspeech 2025

  29. arXiv:2506.21343

    cs.LG

    DynamicBench: Evaluating Real-Time Report Generation in Large Language Models

    Authors: Jingyao Li, Hao Sun, Zile Qiao, Yong Jiang, Pengjun Xie, Fei Huang, Hong Xu, Jiaya Jia

    Abstract: Traditional benchmarks for large language models (LLMs) typically rely on static evaluations through storytelling or opinion expression, which fail to capture the dynamic requirements of real-time information processing in contemporary applications. To address this limitation, we present DynamicBench, a benchmark designed to evaluate the proficiency of LLMs in storing and processing up-to-the-minu…

    Submitted 26 June, 2025; originally announced June 2025.

  30. arXiv:2506.21033

    cs.DC

    BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services

    Authors: Zhaojiacheng Zhou, Hongze Liu, Shijing Yuan, Hanning Zhang, Jiong Lou, Chentao Wu, Jie Li

    Abstract: The hallucination problem of Large Language Models (LLMs) has increasingly drawn attention. Augmenting LLMs with external knowledge is a promising solution to address this issue. However, due to privacy and security concerns, a vast amount of downstream task-related knowledge remains dispersed and isolated across various "silos," making it difficult to access. To bridge this knowledge gap, we prop…

    Submitted 26 June, 2025; originally announced June 2025.

  31. arXiv:2506.20876

    cs.CL

    Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine

    Authors: Sebastian Joseph, Lily Chen, Barry Wei, Michael Mackert, Iain J. Marshall, Paul Pu Liang, Ramez Kouzy, Byron C. Wallace, Junyi Jessy Li

    Abstract: Technological progress has led to concrete advancements in tasks that were regarded as challenging, such as automatic fact-checking. Interest in adopting these systems for public health and medicine has grown due to the high-stakes nature of medical decisions and challenges in critically appraising a vast and diverse medical literature. Evidence-based medicine connects to every individual, and yet…

    Submitted 28 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Flattened Figure 1 PDF for compatibility with Mac Preview

  32. arXiv:2506.20875

    cs.GR cs.CV

    3DGH: 3D Head Generation with Composable Hair and Face

    Authors: Chengan He, Junxuan Li, Tobias Kirschstein, Artem Sevastopolsky, Shunsuke Saito, Qingyang Tan, Javier Romero, Chen Cao, Holly Rushmeier, Giljoo Nam

    Abstract: We present 3DGH, an unconditional generative model for 3D human heads with composable hair and face components. Unlike previous work that entangles the modeling of hair and face, we propose to separate them using a novel data representation with template-based 3D Gaussian Splatting, in which deformable hair geometry is introduced to capture the geometric variations across different hairstyles. Bas…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted to SIGGRAPH 2025. Project page: https://c-he.github.io/projects/3dgh/

  33. arXiv:2506.20748

    cs.HC cs.AI

    Exploring the Effects of Chatbot Anthropomorphism and Human Empathy on Human Prosocial Behavior Toward Chatbots

    Authors: Jingshu Li, Zicheng Zhu, Renwen Zhang, Yi-Chieh Lee

    Abstract: Chatbots are increasingly integrated into people's lives and are widely used to help people. Recently, there has also been growing interest in the reverse direction (humans help chatbots) due to a wide range of benefits including better chatbot performance, human well-being, and collaborative outcomes. However, little research has explored the factors that motivate people to help chatbots. To addres…

    Submitted 25 June, 2025; originally announced June 2025.

  34. arXiv:2506.20624

    cs.PL quant-ph

    PhasePoly: An Optimization Framework for Phase Polynomials in Quantum Circuits

    Authors: Zihan Chen, Henry Chen, Yuwei Jin, Minghao Guo, Enhyeok Jang, Jiakang Li, Caitlin Chan, Won Woo Ro, Eddy Z. Zhang

    Abstract: Quantum computing has transformative computational power to make classically intractable computing feasible. As the algorithms that achieve practical quantum advantage are beyond manual tuning, quantum circuit optimization has become extremely important and integrated into today's quantum software stack. This paper focuses on a critical type of quantum circuit optimization -- phase-polynomial opti…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 14 pages, 12 figures

  35. arXiv:2506.20590

    cs.CV

    WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration

    Authors: Chaojun Ni, Jie Li, Haoyun Li, Hengyu Liu, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Boyuan Wang, Chenxin Li, Guan Huang, Wenjun Mei

    Abstract: Interactive 3D scene generation from a single image has gained significant attention due to its potential to create immersive virtual worlds. However, a key challenge in current 3D generation methods is the limited explorability, which cannot render high-quality images during larger maneuvers beyond the original viewpoint, particularly when attempting to move forward into unseen areas. To address…

    Submitted 25 June, 2025; originally announced June 2025.

  36. arXiv:2506.20488

    cs.CR cs.NI

    Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions

    Authors: Shuo Yang, Xinran Zheng, Jinfeng Xu, Jinze Li, Danyang Song, Zheyu Chen, Edith C. H. Ngai

    Abstract: The rapid advancement of 6G wireless networks, IoT, and edge computing has significantly expanded the cyberattack surface, necessitating more intelligent and adaptive vulnerability detection mechanisms. Traditional security methods, while foundational, struggle with zero-day exploits, adversarial threats, and context-dependent vulnerabilities in highly dynamic network environments. Generative AI (…

    Submitted 25 June, 2025; originally announced June 2025.

  37. arXiv:2506.20463

    cs.HC cs.CY

    Analyzing Security and Privacy Challenges in Generative AI Usage Guidelines for Higher Education

    Authors: Bei Yi Ng, Jiarui Li, Xinyuan Tong, Kevin Ye, Gauthami Yenne, Varun Chandrasekaran, Jingjie Li

    Abstract: Educators and learners worldwide are embracing the rise of Generative Artificial Intelligence (GenAI) as it reshapes higher education. However, GenAI also raises significant privacy and security concerns, as models and privacy-sensitive user data, such as student records, may be misused by service providers. Unfortunately, end-users often have little awareness of or control over how these models o…

    Submitted 25 June, 2025; originally announced June 2025.

  38. arXiv:2506.20134

    cs.CV

    From 2D to 3D Cognition: A Brief Survey of General World Models

    Authors: Ningwei Xie, Zizi Tian, Lei Yang, Xiao-Ping Zhang, Meng Guo, Jie Li

    Abstract: World models have garnered increasing attention in the development of artificial general intelligence (AGI), serving as computational frameworks for learning representations of the external world and forecasting future states. While early efforts focused on 2D visual perception and simulation, recent 3D-aware generative world models have demonstrated the ability to synthesize geometrically consist…

    Submitted 25 June, 2025; originally announced June 2025.

  39. arXiv:2506.19960

    physics.chem-ph cs.AI stat.ML

    An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking

    Authors: Adam Foster, Zeno Schätzle, P. Bernát Szabó, Lixue Cheng, Jonas Köhler, Gino Cassella, Nicholas Gao, Jiawei Li, Frank Noé, Jan Hermann

    Abstract: Reliable description of bond breaking remains a major challenge for quantum chemistry due to the multireferential character of the electronic structure in dissociating species. Multireferential methods in particular suffer from large computational cost, which under the normal paradigm has to be paid anew for each system at a full price, ignoring commonalities in electronic structure across molecul…

    Submitted 24 June, 2025; originally announced June 2025.

  40. arXiv:2506.19848

    cs.CV cs.CL

    ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

    Authors: Long Xing, Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jinsong Li, Shuangrui Ding, Weiming Zhang, Nenghai Yu, Jiaqi Wang, Feng Wu, Dahua Lin

    Abstract: This paper presents ScaleCap, an inference-time scalable image captioning strategy that generates comprehensive and detailed image captions. The key challenges of high-quality image captioning lie in the inherent biases of LVLMs: multimodal bias resulting in imbalanced descriptive granularity, offering detailed accounts of some elements while merely skimming over others; linguistic bias leading to…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Code is available at https://github.com/Cooperx521/ScaleCap

  41. arXiv:2506.19022  [pdf, ps, other]

    cs.CV

    Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation

    Authors: Jinlong Li, Dong Zhao, Qi Zang, Zequn Jie, Lin Ma, Nicu Sebe

    Abstract: Continual Test Time Adaptation (CTTA) is a task that requires a source pre-trained model to continually adapt to new scenarios with changing target distributions. Existing CTTA methods primarily focus on mitigating the challenges of catastrophic forgetting and error accumulation. Though there have been emerging methods based on forgetting adaptation with parameter-efficient fine-tuning, they still… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

  42. arXiv:2506.18951  [pdf, ps, other]

    cs.DB cs.AI

    SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

    Authors: Jinyang Li, Xiaolong Li, Ge Qu, Per Jacobsson, Bowen Qin, Binyuan Hui, Shuzheng Si, Nan Huo, Xiaohan Xu, Yue Zhang, Ziwei Tang, Yuanshuai Li, Florensia Widjaja, Xintong Zhu, Feige Zhou, Yongfeng Huang, Yannis Papakonstantinou, Fatma Ozcan, Chenhao Ma, Reynold Cheng

    Abstract: Resolution of complex SQL issues persists as a significant bottleneck in real-world database applications. Current Large Language Models (LLMs), while adept at text-to-SQL translation, have not been rigorously evaluated on the more challenging task of debugging SQL issues. To address this gap, we introduce BIRD-CRITIC, a new SQL issue debugging benchmark comprising 530 PostgreSQL tasks (BIRD-CRITI… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 26 pages, 9 figures

  43. arXiv:2506.18940  [pdf, ps, other]

    q-bio.GN cs.AI

    eccDNAMamba: A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis

    Authors: Zhenke Liu, Jien Li, Ziqi Zhang

    Abstract: Extrachromosomal circular DNA (eccDNA) plays key regulatory roles and contributes to oncogene overexpression in cancer through high-copy amplification and long-range interactions. Despite advances in modeling, no pre-trained models currently support full-length circular eccDNA for downstream analysis. Existing genomic models are either limited to single-nucleotide resolution or hindered by the ine… ▽ More

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025 Generative AI and Biology (GenBio) Workshop

  44. arXiv:2506.18879  [pdf, ps, other]

    cs.CL cs.AI

    CommVQ: Commutative Vector Quantization for KV Cache Compression

    Authors: Junyan Li, Yang Zhang, Muhammad Yusuf Hassan, Talha Chafekar, Tianle Cai, Zhile Ren, Pengsheng Guo, Foroozan Karimzadeh, Colorado Reed, Chong Wang, Chuang Gan

    Abstract: Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context grows. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. We first introduce additive quantization with a lightweight encoder and co… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: ICML 2025 poster

  45. arXiv:2506.18841  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

    Authors: Yuhao Wu, Yushi Bai, Zhiqiang Hu, Roy Ka-Wei Lee, Juanzi Li

    Abstract: Ultra-long generation by large language models (LLMs) is a widely demanded scenario, yet it remains a significant challenge due to their maximum generation length limit and overall quality degradation as sequence length increases. Previous approaches, exemplified by LongWriter, typically rely on "teaching", which involves supervised fine-tuning (SFT) on synthetic long-form outputs. However, this… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

  46. arXiv:2506.18716  [pdf, ps, other]

    cs.LG cs.CL

    Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation

    Authors: Jie Li, Shifei Ding, Lili Guo, Xuan Li

    Abstract: Emotion Recognition in Conversation (ERC) aims to detect the emotions of individual utterances within a conversation. Generating efficient and modality-specific representations for each utterance remains a significant challenge. Previous studies have proposed various models to integrate features extracted using different modality-specific encoders. However, they neglect the varying contributions o… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted by IJCAI2025

  47. arXiv:2506.18696  [pdf, ps, other]

    cs.LG

    SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding

    Authors: Yuchang Zhu, Jintang Li, Huizhe Zhang, Liang Chen, Zibin Zheng

    Abstract: Individual fairness (IF) in graph neural networks (GNNs), which emphasizes that similar individuals should receive similar outcomes from GNNs, has been a critical issue. Despite its importance, research in this area has been largely unexplored in terms of (1) a clear understanding of what induces individual unfairness in GNNs and (2) a comprehensive consideration of identifying similar ind… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Under review

  48. arXiv:2506.18682  [pdf, ps, other]

    cs.CV cs.AI

    Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios

    Authors: Imad Ali Shah, Jiarong Li, Tim Brophy, Martin Glavin, Edward Jones, Enda Ward, Brian Deegan

    Abstract: Recent advances in autonomous driving (AD) have highlighted the potential of Hyperspectral Imaging (HSI) for enhanced environmental perception, particularly in challenging weather and lighting conditions. However, efficiently processing its high-dimensional spectral data remains a significant challenge. This paper introduces a Multi-scale Spectral Attention Module (MSAM) that enhances spectral fea… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

  49. arXiv:2506.18679  [pdf, ps, other]

    cs.CV

    MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation

    Authors: Ruicheng Zhang, Yu Sun, Zeyu Zhang, Jinai Li, Xiaofan Liu, Au Hoi Fan, Haowei Guo, Puxin Yan

    Abstract: We introduce MARL-MambaContour, the first contour-based medical image segmentation framework based on Multi-Agent Reinforcement Learning (MARL). Our approach reframes segmentation as a multi-agent cooperation task focused on generating topologically consistent object-level contours, addressing the limitations of traditional pixel-based methods which could lack topological constraints and holistic st… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

  50. arXiv:2506.18671  [pdf, ps, other]

    cs.SD cs.CV cs.GR eess.AS

    TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography

    Authors: Yuqin Dai, Wanlu Zhu, Ronghui Li, Xiu Li, Zhenyu Zhang, Jun Li, Jian Yang

    Abstract: Music-driven dance generation has garnered significant attention due to its wide range of industrial applications, particularly in the creation of group choreography. During the group dance generation process, however, most existing methods still face three primary issues: multi-dancer collisions, single-dancer foot sliding and abrupt swapping in the generation of long group dance. In this paper,… ▽ More

    Submitted 26 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.