Showing 1–50 of 2,550 results for author: Wang, Q

Searching in archive cs.
  1. arXiv:2507.04725

    cs.CV

    Unleashing the Power of Neural Collapse: Consistent Supervised-Unsupervised Alignment for Generalized Category Discovery

    Authors: Jizhou Han, Shaokun Wang, Yuhang He, Chenhao Ding, Qiang Wang, Xinyuan Gao, SongLin Dong, Yihong Gong

    Abstract: Generalized Category Discovery (GCD) focuses on classifying known categories while simultaneously discovering novel categories from unlabeled data. However, previous GCD methods face challenges due to inconsistent optimization objectives and category confusion. This leads to feature overlap and ultimately hinders performance on novel categories. To address these issues, we propose the Neural Colla…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2507.04632

    cs.AI cs.LG

    Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

    Authors: Yun Qu, Qi Cheems Wang, Yixiu Mao, Vincent Tao Hu, Xiangyang Ji

    Abstract: Recent advances have witnessed the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance, resulting in high computational costs due to the need for frequent prompt evaluations under intensive LLM interactions and repeated policy…

    Submitted 6 July, 2025; originally announced July 2025.

  3. Multimodal image registration for effective thermographic fever screening

    Authors: C. Y. N. Dwith, Pejhman Ghassemi, Joshua Pfefer, Jon Casamento, Quanzeng Wang

    Abstract: Fever screening based on infrared thermographs (IRTs) is a viable mass screening approach during infectious disease pandemics, such as Ebola and SARS, for temperature monitoring in public places like hospitals and airports. IRTs have been found to be powerful, quick and non-invasive methods to detect elevated temperatures. Moreover, regions medially adjacent to the inner canthi (called the canthi regio…

    Submitted 29 June, 2025; originally announced July 2025.

    Journal ref: Proceedings Volume 10057, Multimodal Biomedical Imaging XII 100570S, 2017

  4. arXiv:2507.02908

    cs.LG cs.AI

    Hyperbolic Kernel Graph Neural Networks for Neurocognitive Decline Analysis from Multimodal Brain Imaging

    Authors: Meimei Yang, Yongheng Sun, Qianqian Wang, Andrea Bozoki, Maureen Kohi, Mingxia Liu

    Abstract: Multimodal neuroimages, such as diffusion tensor imaging (DTI) and resting-state functional MRI (fMRI), offer complementary perspectives on brain activities by capturing structural or functional interactions among brain regions. While existing studies suggest that fusing these multimodal data helps detect abnormal brain activity caused by neurocognitive decline, they are generally implemented in E…

    Submitted 24 June, 2025; originally announced July 2025.

    Comments: 14 pages, 5 figures, 7 tables

  5. arXiv:2507.01975

    cs.LG cs.AI physics.flu-dyn

    Learnable-Differentiable Finite Volume Solver for Accelerated Simulation of Flows

    Authors: Mengtao Yan, Qi Wang, Haining Wang, Ruizhi Chengze, Yi Zhang, Hongsheng Liu, Zidong Wang, Fan Yu, Qi Qi, Hao Sun

    Abstract: Simulation of fluid flows is crucial for modeling physical phenomena like meteorology, aerodynamics, and biomedicine. Classical numerical solvers often require fine spatiotemporal grids to satisfy stability, consistency, and convergence conditions, leading to substantial computational costs. Although machine learning methods have demonstrated better efficiency, they typically suffer from issues of interpre…

    Submitted 23 June, 2025; originally announced July 2025.

    Comments: 19 pages, 12 figures, accepted at KDD 2025 (ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

  6. arXiv:2507.01949

    cs.CV

    Kwai Keye-VL Technical Report

    Authors: Kwai Keye Team, Biao Yang, Bin Wen, Changyi Liu, Chenglong Chu, Chengru Song, Chongling Rao, Chuan Yi, Da Li, Dunju Zang, Fan Yang, Guorui Zhou, Hao Peng, Haojie Ding, Jiaming Huang, Jiangxia Cao, Jiankang Chen, Jingyun Hua, Jin Ouyang, Kaibing Chen, Kaiyu Jiang, Kaiyu Tang, Kun Gai, Shengnan Zhang, Siyang Mao, et al. (35 additional authors not shown)

    Abstract: While Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities on static images, they often fall short in comprehending dynamic, information-dense short-form videos, a dominant medium in today's digital landscape. To bridge this gap, we introduce Kwai Keye-VL, an 8-billion-parameter multimodal foundation model engineered for leading-edge performance in short-video unde…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Technical Report: https://github.com/Kwai-Keye/Keye

  7. arXiv:2507.00477

    cs.IR

    Read the Docs Before Rewriting: Equip Rewriter with Domain Knowledge via Continual Pre-training

    Authors: Qi Wang, Yixuan Cao, Yifan Liu, Jiangtao Zhao, Ping Luo

    Abstract: A Retrieval-Augmented Generation (RAG)-based question-answering (QA) system enhances a large language model's knowledge by retrieving relevant documents based on user queries. Discrepancies between user queries and document phrasings often necessitate query rewriting. However, in specialized domains, the rewriter model may struggle due to limited domain-specific knowledge. To resolve this, we prop…

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2507.00454

    cs.CV cs.AI

    ATSTrack: Enhancing Visual-Language Tracking by Aligning Temporal and Spatial Scales

    Authors: Yihao Zhen, Qiang Wang, Yu Qiao, Liangqiong Qu, Huijie Fan

    Abstract: A main challenge of Visual-Language Tracking (VLT) is the misalignment between visual inputs and language descriptions caused by target movement. Previous trackers have explored many effective feature modification methods to preserve more aligned features. However, an important yet unexplored factor ultimately hinders their capability: the inherent differences in the temporal and spatial…

    Submitted 1 July, 2025; originally announced July 2025.

  9. arXiv:2507.00430

    cs.CV

    MFH: Marrying Frequency Domain with Handwritten Mathematical Expression Recognition

    Authors: Huanxin Yang, Qiwen Wang

    Abstract: Handwritten mathematical expression recognition (HMER) suffers from complex formula structures and character layouts in sequence prediction. In this paper, we incorporate frequency domain analysis into HMER and propose a method that marries frequency domain with HMER (MFH), leveraging the discrete cosine transform (DCT). We emphasize the structural analysis assistance of frequency information for…

    Submitted 1 July, 2025; originally announced July 2025.

  10. arXiv:2507.00273

    cs.RO

    Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation

    Authors: Yusuke Tanaka, Alvin Zhu, Quanyou Wang, Dennis Hong

    Abstract: Reinforcement learning (RL) has enabled significant advances in humanoid robot locomotion, yet most learning frameworks do not account for mechanical intelligence embedded in parallel actuation mechanisms due to limitations in simulator support for closed kinematic chains. This omission can lead to inaccurate motion modeling and suboptimal policies, particularly for robots with high actuation comp…

    Submitted 30 June, 2025; originally announced July 2025.

  11. arXiv:2506.24123

    cs.CV

    Calligrapher: Freestyle Text Image Customization

    Authors: Yue Ma, Qingyan Bai, Hao Ouyang, Ka Leong Cheng, Qiuyu Wang, Hongyu Liu, Zichen Liu, Haofan Wang, Jingye Chen, Yujun Shen, Qifeng Chen

    Abstract: We introduce Calligrapher, a novel diffusion-based framework that innovatively integrates advanced text customization with artistic typography for digital calligraphy and design applications. Addressing the challenges of precise style control and data dependency in typographic customization, our framework incorporates three key technical contributions. First, we develop a self-distillation mechani…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Project page: https://calligrapher2025.github.io/Calligrapher Code: https://github.com/Calligrapher2025/Calligrapher

  12. arXiv:2506.23979

    cs.CL

    TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation

    Authors: Renren Jin, Tianhao Shen, Xinwei Wu, Dan Shi, Haoran Sun, Wuwei Huang, Quandong Wang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

    Abstract: Conducting supervised fine-tuning and preference fine-tuning on large language models (LLMs) requires high-quality datasets to improve their ability to follow instructions and align with human preferences and values. However, constructing such datasets is resource-intensive, and most available datasets for supervised and preference fine-tuning are in English. To address these challenges, we propos…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 33 pages, 15 tables, 11 figures

  13. arXiv:2506.23650

    quant-ph cs.IT

    Optimal Quantum Algorithm for Estimating Fidelity to a Pure State

    Authors: Wang Fang, Qisheng Wang

    Abstract: We present an optimal quantum algorithm for fidelity estimation between two quantum states when one of them is pure. In particular, the (square root) fidelity of a mixed state to a pure state can be estimated to within additive error $\varepsilon$ by using $\Theta(1/\varepsilon)$ queries to their state-preparation circuits, achieving a quadratic speedup over the folklore $O(1/\varepsilon^2)$. Our appro…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 14 pages. To appear in ESA 2025

  14. arXiv:2506.23603

    cs.CR cs.AI

    SoK: Semantic Privacy in Large Language Models

    Authors: Baihe Ma, Yanna Jiang, Xu Wang, Guangshen Yu, Qin Wang, Caijun Sun, Chen Li, Xuelei Qi, Ying He, Wei Ni, Ren Ping Liu

    Abstract: As Large Language Models (LLMs) are increasingly deployed in sensitive domains, traditional data privacy measures prove inadequate for protecting information that is implicit, contextual, or inferable - what we define as semantic privacy. This Systematization of Knowledge (SoK) introduces a lifecycle-centric framework to analyze how semantic privacy risks emerge across input processing, pretrainin…

    Submitted 30 June, 2025; originally announced June 2025.

  15. arXiv:2506.22727

    cs.CR

    Convergent Privacy Framework with Contractive GNN Layers for Multi-hop Aggregations

    Authors: Yu Zheng, Chenang Li, Zhou Li, Qingsong Wang

    Abstract: Differential privacy (DP) has been integrated into graph neural networks (GNNs) to protect sensitive structural information, e.g., edges, nodes, and associated features across various applications. A common approach is to perturb the message-passing process, which forms the core of most GNN architectures. However, existing methods typically incur a privacy cost that grows linearly with the number…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 23 pages

  16. arXiv:2506.22065

    cs.CV

    MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation

    Authors: Dechao Meng, Steven Xiao, Xindi Zhang, Guangyuan Wang, Peng Zhang, Qi Wang, Bang Zhang, Liefeng Bo

    Abstract: Audio-driven portrait animation, which synthesizes realistic videos from reference images using audio signals, faces significant challenges in real-time generation of high-fidelity, temporally coherent animations. While recent diffusion-based methods improve generation quality by integrating audio into denoising processes, their reliance on frame-by-frame UNet architectures introduces prohibitive…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 8 pages, 6 figures

  17. arXiv:2506.21932

    math.NA cs.CE cs.PF

    StructMG: A Fast and Scalable Structured Algebraic Multigrid

    Authors: Yi Zong, Peinan Yu, Haopeng Huang, Zhengding Hu, Xinliang Wang, Qin Wang, Chensong Zhang, Xiaowen Xu, Jian Sun, Yongxiao Zhou, Wei Xue

    Abstract: Parallel multigrid methods are widely used as preconditioners in solving large-scale sparse linear systems. However, current multigrid libraries still fall short of satisfactory performance for structured grid problems regarding speed and scalability. Based on the classical 'multigrid seesaw', we derive three necessary principles for an efficient structured multigrid, which guide our design and implemen…

    Submitted 27 June, 2025; originally announced June 2025.

  18. arXiv:2506.21458

    cs.AI cs.CL cs.CV

    Spatial Mental Modeling from Limited Views

    Authors: Baiqiao Yin, Qineng Wang, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Manling Li, Jiajun Wu, Li Fei-Fei

    Abstract: Can Vision Language Models (VLMs) imagine the full scene from just a few views, like humans do? Humans form spatial mental models, internal representations of unseen space, to reason about layout, perspective, and motion. Our new MindCube benchmark with 21,154 questions across 3,268 images exposes this critical gap, where existing VLMs exhibit near-random performance. Using MindCube, we systematic…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Preprint version

  19. arXiv:2506.20178

    cs.CL cs.AI cs.LG

    COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees

    Authors: Zhiyuan Wang, Jinhao Duan, Qingni Wang, Xiaofeng Zhu, Tianlong Chen, Xiaoshuang Shi, Kaidi Xu

    Abstract: Uncertainty quantification (UQ) for foundation models is essential to identify and mitigate potential hallucinations in automatically generated text. However, heuristic UQ approaches lack formal guarantees for key metrics such as the false discovery rate (FDR) in selective prediction. Previous work adopts the split conformal prediction (SCP) framework to ensure desired coverage of admissible answe…

    Submitted 25 June, 2025; originally announced June 2025.

  20. arXiv:2506.19998

    cs.CL

    Doc2Agent: Scalable Generation of Tool-Using Agents from API Documentation

    Authors: Xinyi Ni, Haonan Jian, Qiuyang Wang, Vedanshi Chetan Shah, Pengyu Hong

    Abstract: REST APIs play important roles in enriching the action space of web agents, yet most API-based agents rely on curated and uniform toolsets that do not reflect the complexity of real-world APIs. Building tool-using agents for arbitrary domains remains a major challenge, as it requires reading unstructured API documentation, testing APIs and inferring correct parameters. We propose Doc2Agent, a scal…

    Submitted 24 June, 2025; originally announced June 2025.

  21. arXiv:2506.19558

    cs.LG cs.CV

    ConCM: Consistency-Driven Calibration and Matching for Few-Shot Class-Incremental Learning

    Authors: QinZhe Wang, Zixuan Chen, Keke Huang, Xiu Su, Chunhua Yang, Chang Xu

    Abstract: Few-Shot Class-Incremental Learning (FSCIL) requires models to adapt to novel classes with limited supervision while preserving learned knowledge. Existing prospective learning-based space construction methods reserve space to accommodate novel classes. However, prototype deviation and structure fixity limit the expressiveness of the embedding space. In contrast to fixed space reservation, we expl…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 9 pages, 5 figures (excluding the appendix)

    MSC Class: 68T40 ACM Class: I.2.6; I.4.9

  22. arXiv:2506.18717

    cs.CE cs.AI

    A Study of Dynamic Stock Relationship Modeling and S&P500 Price Forecasting Based on Differential Graph Transformer

    Authors: Linyue Hu, Qi Wang

    Abstract: Stock price prediction is vital for investment decisions and risk management, yet remains challenging due to markets' nonlinear dynamics and time-varying inter-stock correlations. Traditional static-correlation models fail to capture evolving stock relationships. To address this, we propose a Differential Graph Transformer (DGT) framework for dynamic relationship modeling and price prediction. Our…

    Submitted 23 June, 2025; originally announced June 2025.

  23. arXiv:2506.18575

    cs.CV

    2D Triangle Splatting for Direct Differentiable Mesh Training

    Authors: Kaifeng Sheng, Zheng Zhou, Yingliang Peng, Qianwei Wang

    Abstract: Differentiable rendering with 3D Gaussian primitives has emerged as a powerful method for reconstructing high-fidelity 3D scenes from multi-view images. While it offers improvements over NeRF-based methods, this representation still encounters challenges with rendering speed and advanced rendering effects, such as relighting and shadow rendering, compared to mesh-based models. In this paper, we pr…

    Submitted 26 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: 13 pages, 8 figures

  24. arXiv:2506.18013

    cs.DB cs.DS

    Dual-Hierarchy Labelling: Scaling Up Distance Queries on Dynamic Road Networks

    Authors: Muhammad Farhan, Henning Koehler, Qing Wang

    Abstract: Computing the shortest-path distance between any two given vertices in road networks is an important problem. A tremendous amount of research has been conducted to address this problem, most of which is limited to static road networks. Since road networks undergo various real-time traffic conditions, there is a pressing need to address this problem for dynamic road networks. Existing state-of-the…

    Submitted 22 June, 2025; originally announced June 2025.

  25. arXiv:2506.17728

    cs.CL cs.AI

    KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation

    Authors: Dalong Zhang, Jun Xu, Jun Zhou, Lei Liang, Lin Yuan, Ling Zhong, Mengshu Sun, Peilong Zhao, QiWei Wang, Xiaorui Wang, Xinkai Du, YangYang Hou, Yu Ao, ZhaoYang Wang, Zhengke Gui, ZhiYing Yi, Zhongpu Bo, Haofen Wang, Huajun Chen

    Abstract: In this paper, we introduce KAG-Thinker, which upgrades KAG to a multi-turn interactive thinking and deep reasoning framework powered by a dedicated parameter-light large language model (LLM). Our approach constructs a structured thinking process for solving complex problems, enhancing the logical coherence and contextual consistency of the reasoning process in question-answering (Q&A) tasks on…

    Submitted 30 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

  26. arXiv:2506.17539

    cs.SE

    Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing

    Authors: Sidong Feng, Changhao Du, Huaxiao Liu, Qingnan Wang, Zhengwei Lv, Mengfei Wang, Chunyang Chen

    Abstract: The growing dependence on mobile phones and their apps has made multi-user interactive features, like chat calls, live streaming, and video conferencing, indispensable for bridging the gaps in social connectivity caused by physical and situational barriers. However, automating these interactive features for testing is fraught with challenges, owing to their inherent need for timely, dynamic, and c…

    Submitted 23 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted to International Conference on Software Engineering (ICSE 2026). arXiv admin note: substantial text overlap with arXiv:2504.15474

  27. arXiv:2506.16735

    cs.CV eess.IV

    3DeepRep: 3D Deep Low-rank Tensor Representation for Hyperspectral Image Inpainting

    Authors: Yunshan Li, Wenwu Gong, Qianqian Wang, Chao Wang, Lili Yang

    Abstract: Recent approaches based on transform-based tensor nuclear norm (TNN) have demonstrated notable effectiveness in hyperspectral image (HSI) inpainting by leveraging low-rank structures in latent representations. Recent developments incorporate deep transforms to improve low-rank tensor representation; however, existing approaches typically restrict the transform to the spectral mode, neglecting low-…

    Submitted 20 June, 2025; originally announced June 2025.

  28. arXiv:2506.16716

    cs.HC

    V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos

    Authors: Qixin Wang, Songtao Zhou, Zeyu Jin, Chenglin Guo, Shikun Sun, Xiaoyu Qin

    Abstract: Automatic video commentary systems are widely used on multimedia social media platforms to extract factual information about video content. However, current systems may overlook essential para-linguistic cues, including emotion and attitude, which are critical for fully conveying the meaning of visual content. The absence of these cues can limit user understanding or, in some cases, distort the vi…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted by IJCNN 2025

  29. arXiv:2506.15704

    cs.LG cs.AI cs.CL

    Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding

    Authors: Feiyu Yao, Qian Wang

    Abstract: As large language models (LLMs) continue to support increasingly longer contexts, the memory demand for key-value (KV) caches during decoding grows rapidly, becoming a critical bottleneck in both GPU memory capacity and PCIe bandwidth. Sparse attention mechanisms alleviate this issue by computing attention weights only for selected key-value pairs. However, their indexing computation typically req…

    Submitted 29 May, 2025; originally announced June 2025.

  30. arXiv:2506.15451

    cs.CL

    AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Yin Cai, Hao Shen, Xingzhou Chen, Qingyi Wang, Jialin Li, Xiaoran Shi, Haoran Guo, Wenxuan Huang, Hongwei Feng, Yanghua Xiao, Zheyu Ye, Yao Hu, Shaosheng Cao

    Abstract: Large language model based multi-agent systems have demonstrated significant potential in social simulation and complex task resolution domains. However, current frameworks face critical challenges in system architecture design, cross-domain generalizability, and performance guarantees, particularly as task complexity and number of agents increase. We introduce AgentGroupChat-V2, a novel framewo…

    Submitted 18 June, 2025; originally announced June 2025.

  31. arXiv:2506.15242

    cs.CV

    RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories

    Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng

    Abstract: Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have emerged as powerful tools for 3D reconstruction and SLAM tasks. However, their performance depends heavily on accurate camera pose priors. Existing approaches attempt to address this issue by introducing external constraints but fall short of achieving satisfactory accuracy, particularly when camera trajectories are complex. In th…

    Submitted 24 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: IROS 2025

  32. arXiv:2506.14549

    cs.CV

    DreamLight: Towards Harmonious and Consistent Image Relighting

    Authors: Yong Liu, Wenpeng Xiao, Qianqian Wang, Junlin Chen, Shiyin Wang, Yitong Wang, Xinglong Wu, Yansong Tang

    Abstract: We introduce a model named DreamLight for universal image relighting in this work, which can seamlessly composite subjects into a new background while maintaining aesthetic uniformity in terms of lighting and color tone. The background can be specified by natural images (image-based relighting) or generated from unlimited text prompts (text-based relighting). Existing studies primarily focus on im…

    Submitted 17 June, 2025; originally announced June 2025.

  33. arXiv:2506.13695

    cs.IR

    OneRec Technical Report

    Authors: Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang, et al. (40 additional authors not shown)

    Abstract: Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimizat…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Authors are listed alphabetically by their first name

  34. arXiv:2506.13585

    cs.CL cs.LG

    MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Authors: MiniMax, Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, et al. (103 additional authors not shown)

    Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

  35. arXiv:2506.13492

    cs.CV

    GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field

    Authors: Chengrui Zhang, Maizhen Ning, Zihao Zhou, Jie Sun, Kaizhu Huang, Qiufeng Wang

    Abstract: Plane Geometry Diagram Synthesis has been a crucial task in computer graphics, with applications ranging from educational tools to AI-driven mathematical reasoning. Traditionally, we rely on computer tools (e.g., Matplotlib and GeoGebra) to manually generate precise diagrams, but this usually incurs a huge and complicated calculation cost. Recently, researchers have started to work on learning-based methods…

    Submitted 16 June, 2025; originally announced June 2025.

  36. arXiv:2506.12909

    cs.CL

    SciDA: Scientific Dynamic Assessor of LLMs

    Authors: Junting Zhou, Tingjia Miao, Yiyan Liao, Qichao Wang, Zhoufutu Wen, Yanqin Wang, Yunjie Huang, Ge Yan, Leqi Wang, Yucheng Xia, Hongwan Gao, Yuansong Zeng, Renjie Zheng, Chen Dun, Yitao Liang, Tong Yang, Wenhao Huang, Ge Zhang

    Abstract: Advances in the reasoning capabilities of Large Language Models (LLMs) enable them to solve scientific problems with enhanced efficacy. A high-quality benchmark for comprehensive and appropriate assessment is therefore important, yet existing ones either confront the risk of data contamination or lack coverage of the disciplines involved. To be specific, due to the data source overlap of LLMs training and sta…

    Submitted 15 June, 2025; originally announced June 2025.

  37. arXiv:2506.12830

    cs.CV

    ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies

    Authors: Chenglin Wang, Yucheng Zhou, Qianning Wang, Zhe Wang, Kai Zhang

    Abstract: Text-driven image editing has achieved remarkable success in following single instructions. However, real-world scenarios often involve complex, multi-step instructions, particularly "chain" instructions where operations are interdependent. Current models struggle with these intricate directives, and existing benchmarks inadequately evaluate such capabilities. Specifically, they often overlook m…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 7 Pages

  38. arXiv:2506.12622

    cs.LG cs.AI math.OC

    DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

    Authors: Mingxuan Cui, Duo Zhou, Yuxuan Han, Grani A. Hanasusanto, Qiong Wang, Huan Zhang, Zhengyuan Zhou

    Abstract: Deep reinforcement learning (RL) has achieved significant success, yet its application in real-world scenarios is often hindered by a lack of robustness to environmental uncertainties. To address this challenge, some robust RL algorithms have been proposed, but most are limited to tabular settings. In this work, we propose Distributionally Robust Soft Actor-Critic (DR-SAC), a novel algorithm designe…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 24 Pages

  39. arXiv:2506.12559

    cs.IT math.CO

    On the cross-correlation properties of large-size families of Costas arrays

    Authors: Runfeng Liu, Qi Wang

    Abstract: Costas arrays have been an interesting combinatorial object for decades because of their optimal aperiodic auto-correlation properties. Meanwhile, it is interesting to find families of Costas arrays or extended arrays with small maximal cross-correlation values, since for applications in multi-user systems, the cross-interferences between different signals should also be small. The objective of th…

    Submitted 14 June, 2025; originally announced June 2025.

  40. arXiv:2506.12421

    cs.AI cs.CL

    Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM

    Authors: Dongjie Yang, Chengqiang Lu, Qimeng Wang, Xinbei Ma, Yan Gao, Yao Hu, Hai Zhao

    Abstract: Travel planning is a complex task requiring the integration of diverse real-world information and user preferences. While LLMs show promise, existing methods with long-horizon thinking struggle with handling multifaceted constraints and preferences in the context, leading to suboptimal itineraries. We formulate this as an $L^3$ planning problem, emphasizing long context, long instruction, and long…

    Submitted 14 June, 2025; originally announced June 2025.

  41. arXiv:2506.11886

    cs.CL

    Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache

    Authors: Xiaoran Liu, Siyang He, Qiqi Wang, Ruixiao Li, Yuerong Song, Zhigeng Liu, Linlin Li, Qun Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

    Abstract: Large Language Models struggle with memory demands from the growing Key-Value (KV) cache as context lengths increase. Existing compression methods homogenize head dimensions or rely on attention-guided token pruning, often sacrificing accuracy or introducing computational overhead. We propose FourierAttention, a training-free framework that exploits the heterogeneous roles of transformer head dime…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 10 pages, 7 figures, work in progress

  42. arXiv:2506.11070  [pdf, ps, other]

    cs.CL

    Targeted control of fast prototyping through domain-specific interface

    Authors: Yu-Zhe Shi, Mingchen Liu, Hanlu Ma, Qiao Xu, Huamin Qu, Kun He, Lecheng Ruan, Qining Wang

    Abstract: Industrial designers have long sought a natural and intuitive way to achieve the targeted control of prototype models -- using simple natural language instructions to configure and adjust the models seamlessly according to their intentions, without relying on complex modeling commands. While Large Language Models have shown promise in this area, their potential for controlling prototype models thr…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: In International Conference on Machine Learning (ICML'25)

  43. arXiv:2506.10972  [pdf, ps, other]

    cs.LG cs.AI

    Farseer: A Refined Scaling Law in Large Language Models

    Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

    Abstract: Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing…

    Submitted 14 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 34

    ACM Class: I.2

  44. arXiv:2506.10574  [pdf, ps, other]

    cs.CV cs.MM cs.SD eess.AS

    DanceChat: Large Language Model-Guided Music-to-Dance Generation

    Authors: Qing Wang, Xiaohang Yang, Yilan Dong, Naveen Raj Govindaraj, Gregory Slabaugh, Shanxin Yuan

    Abstract: Music-to-dance generation aims to synthesize human dance motion conditioned on musical input. Despite recent progress, significant challenges remain due to the semantic gap between music and dance motion, as music offers only abstract cues, such as melody, groove, and emotion, without explicitly specifying the physical movements. Moreover, a single piece of music can produce multiple plausible dan…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: check demos at https://dancechat.github.io/anon/

  45. arXiv:2506.10484  [pdf, ps, other]

    cs.SE

    EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair

    Authors: Fangwen Mu, Junjie Wang, Lin Shi, Song Wang, Shoubin Li, Qing Wang

    Abstract: Automatically repairing software issues remains a fundamental challenge at the intersection of software engineering and AI. Although recent advancements in Large Language Models (LLMs) have demonstrated potential for repository-level repair tasks, current methodologies exhibit two notable limitations: (1) they often address issues in isolation, neglecting to incorporate insights from previously re…

    Submitted 12 June, 2025; originally announced June 2025.

  46. arXiv:2506.10395  [pdf, ps, other]

    cs.CV cs.AI

    Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

    Authors: Zhiyang Xu, Jiuhai Chen, Zhaojiang Lin, Xichen Pan, Lifu Huang, Tianyi Zhou, Madian Khabsa, Qifan Wang, Di Jin, Michihiro Yasunaga, Lili Yu, Xi Victoria Lin, Shaoliang Nie

    Abstract: Recent advances in large language models (LLMs) have enabled multimodal foundation models to tackle both image understanding and generation within a unified framework. Despite these gains, unified models often underperform compared to specialized models in either task. A key challenge in developing unified models lies in the inherent differences between the visual features needed for image underst…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Unified image understanding and generation model

  47. arXiv:2506.10344  [pdf, ps, other]

    cs.CV

    RealKeyMorph: Keypoints in Real-world Coordinates for Resolution-agnostic Image Registration

    Authors: Mina C. Moghadam, Alan Q. Wang, Omer Taub, Martin R. Prince, Mert R. Sabuncu

    Abstract: Many real-world settings require registration of a pair of medical images that differ in spatial resolution, which may arise from differences in image acquisition parameters like pixel spacing, slice thickness, and field-of-view. However, all previous machine learning-based registration techniques resample images onto a fixed resolution. This is suboptimal because resampling can introduce artifact…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 23 pages, 8 figures, to be submitted to MELBA

  48. arXiv:2506.10116  [pdf, ps, other]

    cs.CL

    ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering

    Authors: Caijun Jia, Nan Xu, Jingxuan Wei, Qingli Wang, Lei Wang, Bihui Yu, Junnan Zhu

    Abstract: Recently, large language models have shown remarkable reasoning capabilities through long-chain reasoning before responding. However, how to extend this capability to visual reasoning tasks remains an open challenge. Existing multimodal reasoning approaches transform such visual reasoning tasks into textual reasoning tasks via several image-to-text conversions, which often lose critical structural an…

    Submitted 11 June, 2025; originally announced June 2025.

  49. arXiv:2506.09080  [pdf, other]

    cs.LG cs.AI q-fin.CP

    FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making

    Authors: Jiaxiang Chen, Mingxi Zou, Zhuo Wang, Qifan Wang, Dongning Sun, Chi Zhang, Zenglin Xu

    Abstract: Financial decision-making presents unique challenges for language models, demanding temporal reasoning, adaptive risk assessment, and responsiveness to dynamic events. While large language models (LLMs) show strong general reasoning capabilities, they often fail to capture behavioral patterns central to human financial decisions, such as expert reliance under information asymmetry, loss-averse sens…

    Submitted 10 June, 2025; originally announced June 2025.

  50. arXiv:2506.08440  [pdf, ps, other]

    cs.RO cs.AI

    TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization

    Authors: Zengjue Chen, Runliang Niu, He Kong, Qi Wang

    Abstract: Recent advances in Vision-Language-Action (VLA) models have demonstrated strong generalization capabilities across diverse scenes, tasks, and robotic platforms when pretrained on large-scale datasets. However, these models still require task-specific fine-tuning in novel environments, a process that relies almost exclusively on supervised fine-tuning (SFT) using static trajectory datasets. Such app…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.