Showing 1–50 of 1,472 results for author: Zhou, S

Searching in archive cs.
  1. arXiv:2507.04630  [pdf, ps, other]

    cs.CV

    Learn 3D VQA Better with Active Selection and Reannotation

    Authors: Shengli Zhou, Yang Liu, Feng Zheng

    Abstract: 3D Visual Question Answering (3D VQA) is crucial for enabling models to perceive the physical world and perform spatial reasoning. In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead models when training on the entire dataset. While other text generation tasks can mitigate this issue by learning on large-scale datasets, the scarcity of 3D scen…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM MM 2025

  2. arXiv:2507.04289  [pdf, ps, other]

    cs.CV cs.AI

    M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding

    Authors: Shenxi Liu, Kan Li, Mingyang Zhao, Yuhang Tian, Bin Li, Shoujun Zhou, Hongliang Li, Fuxia Yang

    Abstract: With the rapid progress of artificial intelligence (AI) in multi-modal understanding, there is increasing potential for video comprehension technologies to support professional domains such as medical education. However, existing benchmarks suffer from two primary limitations: (1) Linguistic Singularity: they are largely confined to English, neglecting the need for multilingual resources; and (2)…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: 19 pages, 8 figures, 7 tables

  3. arXiv:2507.03304  [pdf, ps, other]

    cs.CV

    Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

    Authors: Hai Huang, Yan Xia, Sashuai Zhou, Hanting Wang, Shulei Wang, Zhou Zhao

    Abstract: Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains. Although existing DG techniques, such as data manipulation, learning strategies, and representation learning, have shown significant progress, they predominantly address single-modal data. With the emergence of numerous multi-modal dataset…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  4. arXiv:2507.02665  [pdf, ps, other]

    cs.SE

    Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?

    Authors: Timo Kehrer, Robert Haines, Guido Juckeland, Shurui Zhou, David E. Bernholdt

    Abstract: Anecdotal evidence suggests that Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) often use different terminologies for similar concepts, creating communication challenges. To better understand these divergences, we have started investigating how SE fundamentals from the SER community are interpreted within the RSE community, identifying aligned concepts, knowledge ga…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Early access journal version: T. Kehrer, R. Haines, G. Juckeland, S. Zhou and D. E. Bernholdt, "Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?," in Computing in Science & Engineering, doi: 10.1109/MCSE.2025.3557236

  5. arXiv:2507.01800  [pdf, ps, other]

    cs.CV cs.MM

    HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision

    Authors: Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng

    Abstract: 3D Visual Question-Answering (3D VQA) is pivotal for models to perceive the physical world and perform spatial reasoning. Answer-centric supervision is a commonly used training method for 3D VQA models. Many models that utilize this strategy have achieved promising results in 3D VQA tasks. However, the answer-centric approach only supervises the final output of models and allows models to develop…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: ICANN 2025

  6. arXiv:2506.23301  [pdf, ps, other]

    cs.IT eess.SP

    Parallax QAMA: Novel Downlink Multiple Access for MISO Systems with Simple Receivers

    Authors: Jie Huang, Ming Zhao, Shengli Zhou, Ling Qiu, Jinkang Zhu

    Abstract: In this paper, we propose a novel downlink multiple access system with a multi-antenna transmitter and two single-antenna receivers, inspired by the underlying principles of hierarchical quadrature amplitude modulation (H-QAM) based multiple access (QAMA) and space-division multiple access (SDMA). In the proposed scheme, coded bits from two users are split and assigned to one shared symbol and two…

    Submitted 29 June, 2025; originally announced June 2025.

  7. arXiv:2506.23132  [pdf, ps, other]

    cs.CV

    Dare to Plagiarize? Plagiarized Painting Recognition and Retrieval

    Authors: Sophie Zhou, Shu Kong

    Abstract: Art plagiarism detection plays a crucial role in protecting artists' copyrights and intellectual property, yet it remains a challenging problem in forensic analysis. In this paper, we address the task of recognizing plagiarized paintings and explaining the detected plagiarisms by retrieving visually similar authentic artworks. To support this study, we construct a dataset by collecting painting pho…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: to appear at AVSS'25

  8. arXiv:2506.22608  [pdf, ps, other]

    cs.DS

    On Fine-Grained Distinct Element Estimation

    Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas, David P. Woodruff, Samson Zhou

    Abstract: We study the problem of distributed distinct element estimation, where $\alpha$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $\Theta\left(\alpha\log n+\frac{\alpha}{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  9. arXiv:2506.21967  [pdf, ps, other]

    cs.CL cs.LG

    More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents

    Authors: Weimin Xiong, Ke Wang, Yifan Song, Hanchao Liu, Sai Zhou, Wei Peng, Sujian Li

    Abstract: Current evaluations of tool-integrated LLM agents typically focus on end-to-end tool-usage evaluation while neglecting their stability. This limits their real-world applicability, as various internal or external factors can cause agents to crash or behave abnormally. Our research addresses this by investigating whether agents are vulnerable to errors throughout the entire tool invocation process,…

    Submitted 27 June, 2025; originally announced June 2025.

  10. arXiv:2506.21915  [pdf]

    cs.NE math.OC

    An Effective Two-Phase Genetic Algorithm for Solving the Resource Constrained Project Scheduling Problem (RCPSP)

    Authors: D. Sun, S. Zhou

    Abstract: This note presents a simple and effective variation of genetic algorithm (GA) for solving RCPSP, denoted as 2-Phase Genetic Algorithm (2PGA). The 2PGA implements GA parent selection in two phases: Phase-1 includes the best current solutions in the parent pool, and Phase-2 excludes the best current solutions from the parent pool. The 2PGA carries out the GA evolution by alternating the two phases i…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 12 pages

    MSC Class: 90-08

  11. arXiv:2506.21619  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

    Authors: Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, Jingchen Shu

    Abstract: Large-scale text-to-speech (TTS) models are typically categorized into autoregressive and non-autoregressive systems. Although autoregressive systems exhibit certain advantages in speech naturalness, their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech. This is a key limitation in applications such as video dubbing that require strict…

    Submitted 23 June, 2025; originally announced June 2025.

  12. arXiv:2506.21001  [pdf, ps, other]

    cs.CV

    Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology

    Authors: Qiuyi Qi, Xin Li, Ming Kong, Zikang Xu, Bingdi Chen, Qiang Zhu, S Kevin Zhou

    Abstract: Challenges such as the lack of high-quality annotations, long-tailed data distributions, and inconsistent staining styles pose significant obstacles to training neural networks to detect abnormal cells in cytopathology robustly. This paper proposes a style-aligned image composition (SAIC) method that composes high-fidelity and style-preserved pathological images to enhance the effectiveness and ro…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: MIDL 2025 Oral

  13. arXiv:2506.19742  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    NeRF-based CBCT Reconstruction needs Normalization and Initialization

    Authors: Zhuowei Xu, Han Li, Dai Sun, Zhicheng Li, Yujia Li, Qingpeng Kong, Zhiwei Cheng, Nassir Navab, S. Kevin Zhou

    Abstract: Cone Beam Computed Tomography (CBCT) is widely used in medical imaging. However, the limited number and intensity of X-ray projections make reconstruction an ill-posed problem with severe artifacts. NeRF-based methods have achieved great success in this task. However, they suffer from a local-global training mismatch between their two key components: the hash encoder and the neural network. Specif…

    Submitted 24 June, 2025; originally announced June 2025.

  14. arXiv:2506.19651  [pdf, ps, other]

    cs.CV cs.LG cs.PF

    PEVLM: Parallel Encoding for Vision-Language Models

    Authors: Letian Kang, Shixian Luo, Yiqiang Li, Xiaoyang Yu, Shenxuan Zhou, Yong Wu

    Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities in multimodal understanding and generation tasks. However, their application to long video understanding remains hindered by the quadratic complexity of standard attention mechanisms. In this work, we introduce PEVLM, a fine-tuning-free parallel encoding method designed to enhance the prefilling efficiency of VLMs in long…

    Submitted 7 July, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

  15. arXiv:2506.18930  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Reinforcement Learning-Based Dynamic Grouping for Tubular Structure Tracking

    Authors: Chong Di, Shuwang Zhou, Da Chen, Jean-Marie Mirebeau, Minglei Shu, Laurent D. Cohen

    Abstract: The computation of minimal paths for the applications in tracking tubular structures such as blood vessels and roads is challenged by complex morphologies and environmental variations. Existing approaches can be roughly categorized into two research lines: the point-wise based models and the segment-wise based models. Although segment-wise approaches have obtained promising results in many scenari…

    Submitted 21 June, 2025; originally announced June 2025.

  16. arXiv:2506.18897  [pdf, ps, other]

    cs.RO cs.AI

    MinD: Unified Visual Imagination and Control via Hierarchical World Models

    Authors: Xiaowei Chi, Kuangzhi Ge, Jiaming Liu, Siyuan Zhou, Peidong Jia, Zichen He, Yuzhen Liu, Tingguang Li, Lei Han, Sirui Han, Shanghang Zhang, Yike Guo

    Abstract: Video generation models (VGMs) offer a promising pathway for unified world modeling in robotics by integrating simulation, prediction, and manipulation. However, their practical application remains limited due to (1) slow generation speed, which limits real-time interaction, and (2) poor consistency between imagined videos and executable actions. To address these challenges, we propose Manipulate i…

    Submitted 23 June, 2025; originally announced June 2025.

  17. arXiv:2506.18851  [pdf, ps, other]

    cs.CV

    Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset

    Authors: Zhuowei Chen, Bingchuan Li, Tianxiang Ma, Lijie Liu, Mingcong Liu, Yi Zhang, Gen Li, Xinghui Li, Siyu Zhou, Qian He, Xinglong Wu

    Abstract: Subject-to-video generation has witnessed substantial progress in recent years. However, existing models still face significant challenges in faithfully following textual instructions. This limitation, commonly known as the copy-paste problem, arises from the widely used in-pair training paradigm. This approach inherently entangles subject identity with background and contextual attributes by samp…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project page: https://phantom-video.github.io/Phantom-Data/

  18. arXiv:2506.18034  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster

    Authors: Fenghe Tang, Wenxin Ma, Zhiyang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou

    Abstract: With the advancement of Large Language Model (LLM) for natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within the CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly,…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025. Code: https://github.com/FengheTan9/LLM4Seg

  19. arXiv:2506.18019  [pdf, ps, other]

    cs.AI

    Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities

    Authors: Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu

    Abstract: AI agents have experienced a paradigm shift, from early dominance by reinforcement learning (RL) to the rise of agents powered by large language models (LLMs), and now further advancing towards a synergistic fusion of RL and LLM capabilities. This progression has endowed AI agents with increasingly strong abilities. Despite these advances, to accomplish complex real-world tasks, agents are require…

    Submitted 4 July, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: 20 pages, 7 figures

  20. arXiv:2506.17784  [pdf, ps, other]

    cs.AI

    AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction

    Authors: Song Wang, Zhen Tan, Zihan Chen, Shuang Zhou, Tianlong Chen, Jundong Li

    Abstract: Recent progress in large language model (LLM)-based multi-agent collaboration highlights the power of structured communication in enabling collective intelligence. However, existing methods largely rely on static or graph-based inter-agent topologies, lacking the potential adaptability and flexibility in communication. In this work, we propose a new framework that rethinks multi-agent coordination…

    Submitted 21 June, 2025; originally announced June 2025.

  21. arXiv:2506.16716  [pdf, ps, other]

    cs.HC

    V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos

    Authors: Qixin Wang, Songtao Zhou, Zeyu Jin, Chenglin Guo, Shikun Sun, Xiaoyu Qin

    Abstract: Automatic video commentary systems are widely used on multimedia social media platforms to extract factual information about video content. However, current systems may overlook essential para-linguistic cues, including emotion and attitude, which are critical for fully conveying the meaning of visual content. The absence of these cues can limit user understanding or, in some cases, distort the vi…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted by IJCNN 2025

  22. arXiv:2506.16661  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Private Training & Data Generation by Clustering Embeddings

    Authors: Felix Zhou, Samson Zhou, Vahab Mirrokni, Alessandro Epasto, Vincent Cohen-Addad

    Abstract: Deep neural networks often use large, high-quality datasets to achieve high performance on many machine learning tasks. When training involves potentially sensitive data, this process can raise privacy concerns, as large models have been shown to unintentionally memorize and reveal sensitive information, including reconstructing entire training samples. Differential privacy (DP) provides a robust…

    Submitted 19 June, 2025; originally announced June 2025.

  23. arXiv:2506.16201  [pdf, ps, other]

    cs.RO cs.CV

    FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation

    Authors: Sen Wang, Le Wang, Sanping Zhou, Jingyi Tian, Jiayi Li, Haowen Sun, Wei Tang

    Abstract: Robotic manipulation in high-precision tasks is essential for numerous industrial and real-world applications where accuracy and speed are required. Yet current diffusion-based policy learning methods generally suffer from low computational efficiency due to the iterative denoising process during inference. Moreover, these methods do not fully explore the potential of generative models for enhanci…

    Submitted 19 June, 2025; originally announced June 2025.

  24. arXiv:2506.16114  [pdf, ps, other]

    cs.IR cs.AI

    GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks

    Authors: Yejing Wang, Shengyu Zhou, Jinyu Lu, Qidong Liu, Xinhang Li, Wenlin Zhang, Feng Li, Pengjie Wang, Jian Xu, Bo Zheng, Xiangyu Zhao

    Abstract: Generative recommendations (GR), which usually include item tokenizers and generative Large Language Models (LLMs), have demonstrated remarkable success across a wide range of scenarios. The majority of existing research efforts primarily concentrate on developing powerful item tokenizers or advancing LLM decoding strategies to attain superior performance. However, the critical fine-tuning step in…

    Submitted 19 June, 2025; originally announced June 2025.

  25. arXiv:2506.16082  [pdf, ps, other]

    cs.CV

    PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning

    Authors: Yizhe Li, Sanping Zhou, Zheng Qin, Le Wang

    Abstract: Dense video captioning is a challenging task that aims to localize and caption multiple events in an untrimmed video. Recent studies mainly follow the transformer-based architecture to jointly perform the two sub-tasks, i.e., event localization and caption generation, in an end-to-end manner. Based on the general philosophy of detection transformer, these methods implicitly learn the event locatio…

    Submitted 19 June, 2025; originally announced June 2025.

  26. arXiv:2506.15712  [pdf, ps, other]

    cs.LG cs.AI

    BatteryBERT for Realistic Battery Fault Detection Using Point-Masked Signal Modeling

    Authors: Songqi Zhou, Ruixue Liu, Yixing Wang, Jia Lu, Benben Jiang

    Abstract: Accurate fault detection in lithium-ion batteries is essential for the safe and reliable operation of electric vehicles and energy storage systems. However, existing methods often struggle to capture complex temporal dependencies and cannot fully leverage abundant unlabeled data. Although large language models (LLMs) exhibit strong representation capabilities, their architectures are not directly…

    Submitted 31 May, 2025; originally announced June 2025.

  27. arXiv:2506.15672  [pdf, ps, other]

    cs.AI cs.MA

    SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

    Authors: Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp

    Abstract: The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic sys…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 41 pages

  28. arXiv:2506.15120  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Advancing Loss Functions in Recommender Systems: A Comparative Study with a Rényi Divergence-Based Solution

    Authors: Shengjia Zhang, Jiawei Chen, Changdong Li, Sheng Zhou, Qihao Shi, Yan Feng, Chun Chen, Can Wang

    Abstract: Loss functions play a pivotal role in optimizing recommendation models. Among various loss functions, Softmax Loss (SL) and Cosine Contrastive Loss (CCL) are particularly effective. Their theoretical connections and differences warrant in-depth exploration. This work conducts comprehensive analyses of these losses, yielding significant insights: 1) Common strengths -- both can be viewed as augment…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: AAAI 2025

  29. arXiv:2506.14477  [pdf, ps, other]

    cs.AI

    GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies

    Authors: Jingqi Yang, Zhilong Song, Jiawei Chen, Mingli Song, Sheng Zhou, linjun sun, Xiaogang Ouyang, Chun Chen, Can Wang

    Abstract: The development of high-quality datasets is crucial for benchmarking and advancing research in Graphical User Interface (GUI) agents. Despite their importance, existing datasets are often constructed under idealized conditions, overlooking the diverse anomalies frequently encountered in real-world deployments. To address this limitation, we introduce GUI-Robust, a novel dataset designed for compre…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 10 pages, 4 figures, submitted to NIPS 2025

  30. arXiv:2506.13301  [pdf, ps, other]

    cs.CV

    AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing

    Authors: Biao Yang, Muqi Huang, Yuhui Zhang, Yun Xiong, Kun Zhou, Xi Chen, Shiyang Zhou, Huishuai Bao, Chuan Li, Feng Shi, Hualei Liu

    Abstract: Traditional point-based image editing methods rely on iterative latent optimization or geometric transformations, which are either inefficient in their processing or fail to capture the semantic relationships within the image. These methods often overlook the powerful yet underutilized image editing capabilities inherent in pre-trained diffusion models. In this work, we propose a novel one-step po…

    Submitted 16 June, 2025; originally announced June 2025.

  31. arXiv:2506.13137  [pdf, ps, other]

    cs.IT eess.SP

    On secure UAV-aided ISCC systems

    Authors: Hongjiang Lei, Congke Jiang, Ki-Hong Park, Mohamed A. Aboulhassan, Sen Zhou, Gaofeng Pan

    Abstract: Integrated communication and sensing, which can make full use of the limited spectrum resources to perform communication and sensing tasks simultaneously, is an up-and-coming technology in wireless communication networks. In this work, we investigate the secrecy performance of an uncrewed aerial vehicle (UAV)-assisted secure integrated communication, sensing, and computing system, where the UAV se…

    Submitted 27 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures, submitted to IEEE Journal for review

  32. arXiv:2506.12808  [pdf, ps, other]

    cs.CV

    Leveraging MIMIC Datasets for Better Digital Health: A Review on Open Problems, Progress Highlights, and Future Promises

    Authors: Afifa Khaled, Mohammed Sabir, Rizwan Qureshi, Camillo Maria Caruso, Valerio Guarrasi, Suncheng Xiang, S Kevin Zhou

    Abstract: The Medical Information Mart for Intensive Care (MIMIC) datasets have become the Kernel of Digital Health Research by providing freely accessible, deidentified records from tens of thousands of critical care admissions, enabling a broad spectrum of applications in clinical decision support, outcome prediction, and healthcare analytics. Although numerous studies and surveys have explored the predic…

    Submitted 15 June, 2025; originally announced June 2025.

  33. arXiv:2506.12568  [pdf, ps, other]

    cs.CV cs.AI

    MVP-CBM:Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification

    Authors: Chunjiang Wang, Kun Zhang, Yandong Liu, Zhiyang He, Xiaodong Tao, S. Kevin Zhou

    Abstract: The concept bottleneck model (CBM), as a technique improving interpretability via linking predictions to human-understandable concepts, makes high-risk and life-critical medical image classification credible. Typically, existing CBM methods associate the final layer of visual encoders with concepts to explain the model's predictions. However, we empirically discover the phenomenon of concept prefe…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 7 pages, 6 figures

    Journal ref: IJCAI2025

  34. arXiv:2506.12349  [pdf, ps, other]

    cs.CY cs.AI cs.CL

    Information Suppression in Large Language Models: Auditing, Quantifying, and Characterizing Censorship in DeepSeek

    Authors: Peiran Qiu, Siyi Zhou, Emilio Ferrara

    Abstract: This study examines information suppression mechanisms in DeepSeek, an open-source large language model (LLM) developed in China. We propose an auditing framework and use it to analyze the model's responses to 646 politically sensitive prompts by comparing its final output with intermediate chain-of-thought (CoT) reasoning. Our audit unveils evidence of semantic-level information suppression in De…

    Submitted 14 June, 2025; originally announced June 2025.

  35. arXiv:2506.12287  [pdf, ps, other]

    cs.DS

    Relative Error Fair Clustering in the Weak-Strong Oracle Model

    Authors: Vladimir Braverman, Prathamesh Dharangutte, Shaofeng H. -C. Jiang, Hoai-An Nguyen, Chen Wang, Yubo Zhang, Samson Zhou

    Abstract: We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering on $n$ input points with a minimum number of strong oracle queries. This models the increasing…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  36. arXiv:2506.12119  [pdf, ps, other]

    cs.CL cs.AI

    Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?

    Authors: Houyi Li, Ka Man Lo, Ziqi Wang, Zili Wang, Wenzhen Zheng, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

    Abstract: Mixture-of-Experts (MoE) language models dramatically expand model capacity and achieve remarkable performance without increasing per-token compute. However, can MoEs surpass dense architectures under strictly equal resource constraints - that is, when the total parameter count, training compute, and data budget are identical? This question remains under-explored despite its significant practical…

    Submitted 13 June, 2025; originally announced June 2025.

  37. arXiv:2506.11928  [pdf, ps, other]

    cs.SE cs.AI cs.CL cs.LG

    LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

    Authors: Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie

    Abstract: Recent reports claim that large language models (LLMs) now outperform elite humans in competitive programming. Drawing on knowledge from a group of medalists in international algorithmic contests, we revisit this claim, examining how LLMs differ from human experts and where limitations still remain. We introduce LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI tha…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Project Page at https://livecodebenchpro.com/

  38. arXiv:2506.10972  [pdf, ps, other]

    cs.LG cs.AI

    Farseer: A Refined Scaling Law in Large Language Models

    Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

    Abstract: Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing…

    Submitted 14 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 34

    ACM Class: I.2

  39. arXiv:2506.09427  [pdf, other]

    cs.CV cs.AI

    A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation

    Authors: Yukang Feng, Jianwen Sun, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yifan Chang, Sizhuo Zhou, Shenglin Zhang, Yu Dai, Kaipeng Zhang

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have significantly improved multimodal understanding and generation. However, these models still struggle to generate tightly interleaved image-text outputs, primarily due to the limited scale, quality and instructional richness of current training datasets. To address this, we introduce InterSyn, a large-scale multimodal dataset constructed us…

    Submitted 11 June, 2025; originally announced June 2025.

  40. arXiv:2506.09422  [pdf, ps, other]

    cs.RO cs.LG

    Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation

    Authors: Ye Niu, Sanping Zhou, Yizhe Li, Ye Den, Le Wang

    Abstract: In many complex scenarios, robotic manipulation relies on generative models to estimate the distribution of multiple successful actions. As the diffusion model has better training robustness than other generative models, it performs well in imitation learning through successful robot demonstrations. However, the diffusion-based policy methods typically require significant time to iteratively denoi…

    Submitted 11 June, 2025; originally announced June 2025.

  41. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  42. arXiv:2506.07865  [pdf, ps, other]

    cs.CV cs.AI cs.CE cs.LG cs.RO

    FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity

    Authors: Jinxi Li, Ziyang Song, Siyuan Zhou, Bo Yang

    Abstract: In this paper, we aim to model 3D scene geometry, appearance, and the underlying physics purely from multi-view videos. By applying various governing PDEs as PINN losses or incorporating physics simulation into neural networks, existing works often fail to learn complex physical motions at boundaries or require object priors such as masks or types. In this paper, we propose FreeGave to learn the p…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: CVPR 2025. Code and data are available at: https://github.com/vLAR-group/FreeGave

  43. arXiv:2506.07581  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning

    Authors: Tan Chen, Jintao Yan, Yuxuan Sun, Sheng Zhou, Zhisheng Niu

    Abstract: Federated learning (FL) is a promising paradigm for multiple devices to cooperatively train a model. When applied in wireless networks, two issues consistently affect the performance of FL, i.e., data heterogeneity of devices and limited bandwidth. Many papers have investigated device scheduling strategies considering the two issues. However, most of them recognize data heterogeneity as a property…

    Submitted 9 June, 2025; originally announced June 2025.

  44. arXiv:2506.07328  [pdf, ps, other]

    cs.LG

    Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification

    Authors: Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

    Abstract: Asynchronous Federated Learning (AFL) enables distributed model training across multiple mobile devices, allowing each device to independently update its local model without waiting for others. However, device mobility introduces intermittent connectivity, which necessitates gradient sparsification and leads to model staleness, jointly affecting AFL convergence. This paper develops a theoretical m…

    Submitted 8 June, 2025; originally announced June 2025.

  45. arXiv:2506.06821  [pdf, ps, other]

    cs.CL cs.AI cs.SE

    Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems

    Authors: Yuhan Cao, Zian Chen, Kun Quan, Ziliang Zhang, Yu Wang, Xiaoning Dong, Yeqi Feng, Guanzhong He, Jingcheng Huang, Jianhao Li, Yixuan Tan, Jiafu Tang, Yilin Tang, Junlei Wu, Qianyu Xiao, Can Zheng, Shouchen Zhou, Yuxiang Zhu, Yiming Huang, Tian Xie, Tianxing He

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, capable of tackling complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test case generation remains largely unexplored. We investigate this problem from the perspective of competition-level programming (CP) programs and propose TCGBench, a…

    Submitted 10 June, 2025; v1 submitted 7 June, 2025; originally announced June 2025.

    Comments: 37 pages, 22 figures

  46. arXiv:2506.06199  [pdf, ps, other]

    cs.RO cs.CV

    3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model

    Authors: Hongyan Zhi, Peihao Chen, Siyuan Zhou, Yubo Dong, Quanxi Wu, Lei Han, Mingkui Tan

    Abstract: Manipulation has long been a challenging task for robots, while humans can effortlessly perform complex interactions with objects, such as hanging a cup on the mug rack. A key reason is the lack of a large and uniform dataset for teaching robots manipulation skills. Current robot datasets often record robot action in different action spaces within a simple scene. This hinders the robot to learn a…

    Submitted 6 June, 2025; originally announced June 2025.

  47. arXiv:2506.05985  [pdf, ps, other]

    cs.LG cs.RO

    Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning

    Authors: Yuheng Lei, Sitong Mao, Shunbo Zhou, Hongyuan Zhang, Xuelong Li, Ping Luo

    Abstract: A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the contex…

    Submitted 6 June, 2025; originally announced June 2025.

  48. arXiv:2506.05902  [pdf, ps, other]

    cs.LG physics.soc-ph

    A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics

    Authors: Shirui Zhou, Jiying Yan, Junfang Tian, Tao Wang, Yongfu Li, Shiquan Zhong

    Abstract: A fundamental challenge in car-following modeling lies in accurately representing the multi-scale complexity of driving behaviors, particularly the intra-driver heterogeneity where a single driver's actions fluctuate dynamically under varying conditions. While existing models, both conventional and data-driven, address behavioral heterogeneity to some extent, they often emphasize inter-driver hete…

    Submitted 6 June, 2025; originally announced June 2025.

  49. arXiv:2506.05864  [pdf, ps, other]

    cs.CV

    CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy

    Authors: Jiakai Zhang, Shouchen Zhou, Haizhao Dai, Xinhang Liu, Peihao Wang, Zhiwen Fan, Yuan Pei, Jingyi Yu

    Abstract: Pose estimation from unordered images is fundamental for 3D reconstruction, robotics, and scientific imaging. Recent geometric foundation models, such as DUSt3R, enable end-to-end dense 3D reconstruction but remain underexplored in scientific imaging fields like cryo-electron microscopy (cryo-EM) for near-atomic protein reconstruction. In cryo-EM, pose estimation and 3D reconstruction from unorder…

    Submitted 6 June, 2025; originally announced June 2025.

  50. arXiv:2506.05495  [pdf, ps, other]

    cs.DS cs.LG

    Learning-Augmented Hierarchical Clustering

    Authors: Vladimir Braverman, Jon C. Ergun, Chen Wang, Samson Zhou

    Abstract: Hierarchical clustering (HC) is an important data analysis technique in which the goal is to recursively partition a dataset into a tree-like structure while grouping together similar data points at each level of granularity. Unfortunately, for many of the proposed HC objectives, there exist strong barriers to approximation algorithms with the hardness of approximation. Thus, we consider the probl…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: ICML 2025; abstract shortened for arxiv requirements