Showing 1–50 of 139 results for author: Qian, S

  1. arXiv:2509.26529  [pdf, ps, other]

    cs.DC cs.SE

    CSnake: Detecting Self-Sustaining Cascading Failure via Causal Stitching of Fault Propagations

    Authors: Shangshu Qian, Lin Tan, Yongle Zhang

    Abstract: Recent studies have revealed that self-sustaining cascading failures in distributed systems frequently lead to widespread outages, which are challenging to contain and recover from. Existing failure detection techniques struggle to expose such failures prior to deployment, as they typically require a complex combination of specific conditions to be triggered. This challenge stems from the inherent…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: Accepted by EuroSys 2026

  2. arXiv:2509.24986  [pdf, ps, other]

    cs.GR cs.AI cs.CV

    Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes

    Authors: Yuhan Wang, Weikai Chen, Zeyu Hu, Runze Zhang, Yingda Yin, Ruoyu Wu, Keyang Luo, Shengju Qian, Yiyan Ma, Hongyi Li, Yuan Gao, Yuhuan Zhou, Hao Luo, Wan Wang, Xiaobin Shen, Zhaowei Li, Kuixin Zhu, Chuanlang Hong, Yueyue Wang, Lijie Feng, Xin Wang, Chen Change Loy

    Abstract: In user-generated-content (UGC) applications, non-expert users often rely on image-to-3D generative models to create 3D assets. In this context, primitive-based shape abstraction offers a promising solution for UGC scenarios by compressing high-resolution meshes into compact, editable representations. Towards this end, effective shape abstraction must therefore be structure-aware, characterized by…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: SIGGRAPH Asia 2025. Project Page https://johann.wang/Light-SQ/

  3. arXiv:2509.22186  [pdf, ps, other]

    cs.CV cs.CL

    MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

    Authors: Junbo Niu, Zheng Liu, Zhuangcheng Gu, Bin Wang, Linke Ouyang, Zhiyuan Zhao, Tao Chu, Tianyao He, Fan Wu, Qintong Zhang, Zhenjiang Jin, Guang Liang, Rui Zhang, Wenzheng Zhang, Yuan Qu, Zhifei Ren, Yuefeng Sun, Yuanhong Zheng, Dongsheng Ma, Zirui Tang, Boyu Niu, Ziyang Miao, Hejun Dong, Siyi Qian, Junyuan Zhang, et al. (36 additional authors not shown)

    Abstract: We introduce MinerU2.5, a 1.2B-parameter document parsing vision-language model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition. In the first stage, the model performs efficient layout analysis on downsamp…

    Submitted 29 September, 2025; v1 submitted 26 September, 2025; originally announced September 2025.

    Comments: Technical Report; GitHub Repo: https://github.com/opendatalab/MinerU Hugging Face Model: https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B Hugging Face Demo: https://huggingface.co/spaces/opendatalab/MinerU

  4. arXiv:2509.20557  [pdf, ps, other]

    cs.CL

    SiniticMTError: A Machine Translation Dataset with Error Annotations for Sinitic Languages

    Authors: Hannah Liu, Junghyun Min, Ethan Yue Heng Cheung, Shou-Yi Hung, Syed Mekael Wasti, Runtong Liang, Shiyao Qian, Shizhao Zheng, Elsie Chan, Ka Ieng Charlotte Lo, Wing Yu Yip, Richard Tzong-Han Tsai, En-Shiun Annie Lee

    Abstract: Despite major advances in machine translation (MT) in recent years, progress remains limited for many low-resource languages that lack large-scale training data and linguistic resources. Cantonese and Wu Chinese are two Sinitic examples, although each enjoys more than 80 million speakers around the world. In this paper, we introduce SiniticMTError, a novel dataset that builds on existing parallel…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: Work in progress. 14 pages, 4 figures, 5 tables

  5. arXiv:2509.11134  [pdf, ps, other]

    cs.DC

    GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management

    Authors: Jiaang Duan, Shenglin Xu, Shiyou Qian, Dingyu Yang, Kangjin Wang, Chenzhi Liao, Yinghao Yu, Qin Hua, Hanwen Hu, Qi Wang, Wenchao Wu, Dongqing Bao, Tianyu Lu, Jian Cao, Guangtao Xue, Guodong Yang, Liping Zhang, Gang Chen

    Abstract: The surge in large language models (LLMs) has fundamentally reshaped the landscape of GPU usage patterns, creating an urgent need for more efficient management strategies. While cloud providers employ spot instances to reduce costs for low-priority (LP) tasks, existing schedulers still grapple with high eviction rates and lengthy queuing times. To address these limitations, we present GFS, a novel…

    Submitted 14 September, 2025; originally announced September 2025.

    Comments: This paper has been accepted to the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2026)

  6. arXiv:2509.07538  [pdf, ps, other]

    cs.CV

    TextlessRAG: End-to-End Visual Document RAG by Speech Without Text

    Authors: Peijin Xie, Shun Qian, Bingquan Liu, Dexin Wang, Lin Sun, Xiangzheng Zhang

    Abstract: Document images encapsulate a wealth of knowledge, while the portability of spoken queries enables broader and flexible application scenarios. Yet, no prior work has explored knowledge base question answering over visual document images with queries provided directly in speech. We propose TextlessRAG, the first end-to-end framework for speech-based question answering over large-scale document imag…

    Submitted 10 September, 2025; v1 submitted 9 September, 2025; originally announced September 2025.

    Comments: 5 pages, 4 figures

  7. arXiv:2509.01584  [pdf, ps, other]

    cs.CV

    ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

    Authors: Ganlin Zhang, Shenhan Qian, Xi Wang, Daniel Cremers

    Abstract: We present ViSTA-SLAM as a real-time monocular visual SLAM system that operates without requiring camera intrinsics, making it broadly applicable across diverse camera setups. At its core, the system employs a lightweight symmetric two-view association (STA) model as the frontend, which simultaneously estimates relative camera poses and regresses local pointmaps from only two RGB images. This desi…

    Submitted 1 September, 2025; originally announced September 2025.

    Comments: Project page: https://ganlinzhang.xyz/vista-slam/

  8. arXiv:2508.18569  [pdf, ps, other]

    cs.CL cs.CV

    The Mind's Eye: A Multi-Faceted Reward Framework for Guiding Visual Metaphor Generation

    Authors: Girish A. Koushik, Fatemeh Nazarieh, Katherine Birch, Shenbin Qian, Diptesh Kanojia

    Abstract: Visual metaphor generation is a challenging task that aims to generate an image given an input text metaphor. Inherently, it needs language understanding to bind a source concept with a target concept, in a way that preserves meaning while ensuring visual coherence. We propose a self-evaluating visual metaphor generation framework that focuses on metaphor alignment. Our self-evaluation approach co…

    Submitted 25 August, 2025; originally announced August 2025.

    Comments: Under Review

  9. arXiv:2508.13401  [pdf, ps, other]

    cs.CV

    AIM 2025 Rip Current Segmentation (RipSeg) Challenge Report

    Authors: Andrei Dumitriu, Florin Miron, Florin Tatui, Radu Tudor Ionescu, Radu Timofte, Aakash Ralhan, Florin-Alexandru Vasluianu, Shenyang Qian, Mitchell Harley, Imran Razzak, Yang Song, Pu Luo, Yumei Li, Cong Xu, Jinming Chai, Kexin Zhang, Licheng Jiao, Lingling Li, Siqi Yu, Chao Zhang, Kehuan Song, Fang Liu, Puhua Chen, Xu Liu, Jin Hu, et al. (2 additional authors not shown)

    Abstract: This report presents an overview of the AIM 2025 RipSeg Challenge, a competition designed to advance techniques for automatic rip current segmentation in still images. Rip currents are dangerous, fast-moving flows that pose a major risk to beach safety worldwide, making accurate visual detection an important and underexplored research task. The challenge builds on RipVIS, the largest available rip…

    Submitted 3 September, 2025; v1 submitted 18 August, 2025; originally announced August 2025.

    Comments: Challenge report paper from AIM2025 Workshop at ICCVW 2025

    MSC Class: cs.AI

    ACM Class: I.4.0; I.4.9

  10. arXiv:2508.07484  [pdf, ps, other]

    cs.CL cs.AI

    ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models

    Authors: Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew, Constantin Orasan, Diptesh Kanojia

    Abstract: Large Language Models (LLMs) have shown remarkable performance across a wide range of natural language processing tasks. Quality Estimation (QE) for Machine Translation (MT), which assesses the quality of a source-target pair without relying on reference translations, remains a challenging cross-lingual task for LLMs. The challenges stem from the inherent limitations of existing LLM-based QE syste…

    Submitted 10 August, 2025; originally announced August 2025.

    Comments: Accepted to COLM 2025 Conference

  11. arXiv:2508.01070  [pdf, ps, other]

    cs.HC

    How Long Does It Take to Alleviate Discomfort? A Preliminary Study on Reducing Cybersickness in Novice Users

    Authors: Zhengxin Zhang, Shufang Qian, Yi Wang, Xiao Liu, Thuong Hoang, Chetan Arora, Jingjing Zhang, Henry Been Lirn Duh

    Abstract: Cybersickness significantly impacts the user experience in VR applications. Locomotion tunneling is a widely adopted technique for mitigating cybersickness in susceptible users. However, there is a lack of research investigating the effects of prolonged use of locomotion tunneling among novice users. To fill this gap, we used VRChat as our experimental platform. We recruited 24 novice VR users, de…

    Submitted 1 August, 2025; originally announced August 2025.

  12. arXiv:2507.10548  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    EmbRACE-3K: Embodied Reasoning and Action in Complex Environments

    Authors: Mingxian Lin, Wei Huang, Yitang Li, Chengjie Jiang, Kui Wu, Fangwei Zhong, Shengju Qian, Xin Wang, Xiaojuan Qi

    Abstract: Recent advanced vision-language models (VLMs) have demonstrated strong performance on passive, offline image and video understanding tasks. However, their effectiveness in embodied settings, which require online interaction and active scene understanding, remains limited. In such scenarios, an agent perceives the environment from a first-person perspective, with each action dynamically shaping subse…

    Submitted 14 July, 2025; originally announced July 2025.

    Comments: Project page: https://mxllc.github.io/EmbRACE-3K/

  13. arXiv:2507.02345  [pdf, ps, other]

    q-bio.BM cs.AI

    HelixDesign-Antibody: A Scalable Production-Grade Platform for Antibody Design Built on HelixFold3

    Authors: Jie Gao, Jing Hu, Shanzhuo Zhang, Kunrui Zhu, Sheng Qian, Yueyang Huang, Xiaonan Zhang, Xiaomin Fang

    Abstract: Antibody engineering is essential for developing therapeutics and advancing biomedical research. Traditional discovery methods often rely on time-consuming and resource-intensive experimental screening. To enhance and streamline this process, we introduce a production-grade, high-throughput platform built on HelixFold3, HelixDesign-Antibody, which utilizes the high-accuracy structure prediction mo…

    Submitted 3 July, 2025; originally announced July 2025.

  14. arXiv:2507.01961  [pdf, ps, other]

    cs.RO cs.AI

    AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

    Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

    Abstract: Recently, mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. However, existing methods still face challenges in coordinating mobile base and manipulator, primarily due to two limitations. On the one hand, they fail to explicitly model the influence of the mobile base on manipulator control, which easily leads to error accumu…

    Submitted 5 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

    Comments: Project website: https://ac-dit.github.io/

  15. arXiv:2506.22499  [pdf, ps, other]

    cs.CV cs.AI stat.AP

    Scalable Dynamic Origin-Destination Demand Estimation Enhanced by High-Resolution Satellite Imagery Data

    Authors: Jiachao Liu, Pablo Guarda, Koichiro Niinuma, Sean Qian

    Abstract: This study presents a novel integrated framework for dynamic origin-destination demand estimation (DODE) in multi-class mesoscopic network models, leveraging high-resolution satellite imagery together with conventional traffic data from local sensors. Unlike sparse local detectors, satellite imagery offers consistent, city-wide road and traffic information of both parking and moving vehicles, over…

    Submitted 25 June, 2025; originally announced June 2025.

  16. arXiv:2506.19743  [pdf, ps, other]

    cs.IR cs.CL

    NEAR$^2$: A Nested Embedding Approach to Efficient Product Retrieval and Ranking

    Authors: Shenbin Qian, Diptesh Kanojia, Samarth Agrawal, Hadeel Saadany, Swapnil Bhosale, Constantin Orasan, Zhe Wu

    Abstract: E-commerce information retrieval (IR) systems struggle to simultaneously achieve high accuracy in interpreting complex user queries and maintain efficient processing of vast product catalogs. The dual challenge lies in precisely matching user intent with relevant products while managing the computational demands of real-time search across massive inventories. In this paper, we propose a Nested Emb…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: This paper is accepted to the 2025 SIGIR Workshop on eCommerce

  17. arXiv:2506.15523  [pdf, ps, other]

    cs.PF

    Atys: An Efficient Profiling Framework for Identifying Hotspot Functions in Large-scale Cloud Microservices

    Authors: Jiaqi Sun, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue

    Abstract: To handle the high volume of requests, large-scale services are comprised of thousands of instances deployed in clouds. These services utilize diverse programming languages and are distributed across various nodes as encapsulated containers. Given their vast scale, even minor performance enhancements can lead to significant cost reductions. In this paper, we introduce Atys, an efficient profiling…

    Submitted 18 June, 2025; originally announced June 2025.

  18. arXiv:2505.24163  [pdf, ps, other]

    cs.CL cs.AI

    LKD-KGC: Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing

    Authors: Jiaqi Sun, Shiyou Qian, Zhangchi Han, Wei Li, Zelin Qian, Dingyu Yang, Jian Cao, Guangtao Xue

    Abstract: Knowledge Graphs (KGs) structure real-world entities and their relationships into triples, enhancing machine reasoning for various tasks. While domain-specific KGs offer substantial benefits, their manual construction is often inefficient and requires specialized knowledge. Recent approaches for knowledge graph construction (KGC) based on large language models (LLMs), such as schema-guided KGC and…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Submitting to EDBT 2026

  19. arXiv:2505.21019  [pdf, ps, other]

    eess.IV cs.LG

    Cardiac Digital Twins at Scale from MRI: Open Tools and Representative Models from ~55000 UK Biobank Participants

    Authors: Devran Ugurlu, Shuang Qian, Elliot Fairweather, Charlene Mauger, Bram Ruijsink, Laura Dal Toso, Yu Deng, Marina Strocchi, Reza Razavi, Alistair Young, Pablo Lamata, Steven Niederer, Martin Bishop

    Abstract: A cardiac digital twin is a virtual replica of a patient's heart for screening, diagnosis, prognosis, risk assessment, and treatment planning of cardiovascular diseases. This requires an anatomically accurate patient-specific 3D structural representation of the heart, suitable for electro-mechanical simulations or study of disease mechanisms. However, generation of cardiac digital twins at scale i…

    Submitted 27 May, 2025; originally announced May 2025.

  20. arXiv:2505.19874  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation

    Authors: Yi Wu, Lingting Zhu, Shengju Qian, Lei Liu, Wandi Qiao, Lequan Yu, Bin Li

    Abstract: In the current research landscape, multimodal autoregressive (AR) models have shown exceptional capabilities across various domains, including visual understanding and generation. However, complex tasks such as style-aligned text-to-image generation present significant challenges, particularly in data acquisition. In analogy to instruction-following tuning for image editing of AR models, style-ali…

    Submitted 26 May, 2025; originally announced May 2025.

  21. arXiv:2505.15074  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data

    Authors: Yuhang Zhou, Jing Zhu, Shengyi Qian, Zhuokai Zhao, Xiyao Wang, Xiaoyu Liu, Ming Li, Paiheng Xu, Wei Ai, Furong Huang

    Abstract: Large Language Models (LLMs) are increasingly aligned with human preferences through Reinforcement Learning from Human Feedback (RLHF). Among RLHF methods, Group Relative Policy Optimization (GRPO) has gained attention for its simplicity and strong performance, notably eliminating the need for a learned value function. However, GRPO implicitly assumes a balanced domain distribution and uniform sem…

    Submitted 24 September, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by EMNLP 2025 Findings

  22. arXiv:2504.14221  [pdf, other]

    cs.CV

    Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection

    Authors: Wenbing Zhu, Lidong Wang, Ziqing Zhou, Chengjie Wang, Yurui Pan, Ruoyi Zhang, Zhuhao Chen, Linjie Cheng, Bin-Bin Gao, Jiangning Zhang, Zhenye Gan, Yuxie Wang, Yulong Chen, Shuguang Qian, Mingmin Chi, Bo Peng, Lizhuang Ma

    Abstract: The increasing complexity of industrial anomaly detection (IAD) has positioned multimodal detection methods as a focal area of machine vision research. However, dedicated multimodal datasets specifically tailored for IAD remain limited. Pioneering datasets like MVTec 3D have laid essential groundwork in multimodal IAD by incorporating RGB+3D data, but still face challenges in bridging the gap with…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: 13 pages. Dataset and code: https://realiad4ad.github.io/Real-IAD D3

  23. arXiv:2504.10878  [pdf, other]

    cs.CV cs.AI cs.LG

    Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content

    Authors: Yilang Peng, Sijia Qian, Yingdan Lu, Cuihua Shen

    Abstract: In today's visually dominated social media landscape, predicting the perceived credibility of visual content and understanding what drives human judgment are crucial for countering misinformation. However, these tasks are challenging due to the diversity and richness of visual features. We introduce a Large Language Model (LLM)-informed feature discovery framework that leverages multimodal LLMs, s…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 26 pages

    ACM Class: I.4.9; J.4

  24. arXiv:2503.23746  [pdf, ps, other]

    cs.CV cs.CL cs.LG cs.MM cs.SI

    Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model

    Authors: Dizhan Xue, Shengsheng Qian, Chuanrui Hu, Changsheng Xu

    Abstract: Short-video platforms have gained immense popularity, captivating the interest of millions, if not billions, of users globally. Recently, researchers have highlighted the significance of analyzing the propagation of short-videos, which typically involves discovering commercial values, public opinions, user behaviors, etc. This paper proposes a new Short-video Propagation Influence Rating (SPIR) ta…

    Submitted 4 September, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  25. arXiv:2503.23512  [pdf, ps, other]

    cs.CL

    SCORE: Story Coherence and Retrieval Enhancement for AI Narratives

    Authors: Qiang Yi, Yangfan He, Jianhui Wang, Xinyuan Song, ShiYao Qian, Xinhang Yuan, Yi Xin, Yijin Wang, Jingqun Tang, Yuchen Li, Junjiang Lin, Hongyang He, Zhen Tian, Tianxiang Xu, Keqin Li, Kuan Lu, Menghao Huo, Jiaqi Chen, Miao Zhang, Tianyu Shi, Jianyuan Ni

    Abstract: Large Language Models (LLMs) can generate creative and engaging narratives from user-specified input, but maintaining coherence and emotional depth throughout these AI-generated stories remains a challenge. In this work, we propose SCORE, a framework for Story Coherence and Retrieval Enhancement, designed to detect and resolve narrative inconsistencies. By tracking key item statuses and generating…

    Submitted 17 September, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  26. arXiv:2503.20519  [pdf, other]

    cs.CV

    MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation

    Authors: Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee

    Abstract: Recent advances in auto-regressive transformers have revolutionized generative modeling across different domains, from language processing to visual generation, demonstrating remarkable capabilities. However, applying these advances to 3D generation presents three key challenges: the unordered nature of 3D data conflicts with sequential next-token prediction paradigm, conventional vector quantizat…

    Submitted 20 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: CVPR 2025 Highlight: https://jinnan-chen.github.io/projects/MAR-3D/

  27. arXiv:2503.18461  [pdf, other]

    cs.CV

    MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing

    Authors: Lingting Zhu, Jingrui Ye, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Jinnan Chen, Shengju Qian, Xin Wang, Qingmin Liao, Lequan Yu

    Abstract: Current methods for 3D generation still fall short in physically based rendering (PBR) texturing, primarily due to limited data and challenges in modeling multi-channel materials. In this work, we propose MuMA, a method for 3D PBR texturing through Multi-channel Multi-view generation and Agentic post-processing. Our approach features two key innovations: 1) We opt to model shaded and albedo appear…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 17 pages, 14 figures

  28. arXiv:2503.16158  [pdf, other]

    cs.CL

    Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems

    Authors: Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

    Abstract: Evaluating machine translation (MT) of user-generated content (UGC) involves unique challenges such as checking whether the nuance of emotions from the source are preserved in the target text. Recent studies have proposed emotion-related datasets, frameworks and models to automatically evaluate MT quality of Chinese UGC, without relying on reference translations. However, whether these models are…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Accepted to the 10th Workshop on Noisy and User-generated Text at NAACL 2025

  29. arXiv:2502.18309  [pdf, ps, other]

    cs.GR cs.CV cs.SD eess.AS

    GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation

    Authors: Xinran Liu, Xu Dong, Shenbin Qian, Diptesh Kanojia, Wenwu Wang, Zhenhua Feng

    Abstract: Music-driven dance generation is a challenging task as it requires strict adherence to genre-specific choreography while ensuring physically realistic and precisely synchronized dance sequences with the music's beats and rhythm. Although significant progress has been made in music-conditioned dance generation, most existing methods struggle to convey specific stylistic attributes in generated danc…

    Submitted 29 September, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  30. arXiv:2502.10810  [pdf, other]

    cs.CV

    SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding

    Authors: Zhenyu Yang, Yuhang Hu, Zemin Du, Dizhan Xue, Shengsheng Qian, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu

    Abstract: Despite the significant advancements of Large Vision-Language Models (LVLMs) on established benchmarks, there remains a notable gap in suitable evaluation regarding their applicability in the emerging domain of long-context streaming video understanding. Current benchmarks for video understanding typically emphasize isolated single-instance text inputs and fail to evaluate the capacity to sustain…

    Submitted 15 February, 2025; originally announced February 2025.

    Comments: ICLR 2025 Accept (Spotlight)

  31. arXiv:2501.16237  [pdf, other]

    cs.LG physics.ins-det

    Application of Structured State Space Models to High energy physics with locality-sensitive hashing

    Authors: Cheng Jiang, Sitian Qian

    Abstract: Modern high-energy physics (HEP) experiments are increasingly challenged by the vast size and complexity of their datasets, particularly regarding large-scale point cloud processing and long sequences. In this study, to address these challenges, we explore the application of structured state space models (SSMs), proposing one of the first trials to integrate locality-sensitive hashing into either a h…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: 6 figures, accepted by AISTATS 2025 as poster, camera ready versions to be updated

  32. arXiv:2501.04473  [pdf, other]

    cs.CL

    When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages

    Authors: Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Shenbin Qian

    Abstract: This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE). Segment-level QE is a challenging cross-lingual language understanding task that provides a quality score (0-100) to the translated output. We comprehensively evaluate large language models (LLMs) in zero/few-shot scenarios and perform instruction fine-tun…

    Submitted 8 January, 2025; originally announced January 2025.

  33. arXiv:2412.13877  [pdf, other]

    cs.RO cs.AI

    RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

    Authors: Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, Shichao Fan, Xinhua Wang, Fei Liao, Zhen Zhao, Guangyu Li, Zhao Jin, Lecheng Wang, Jilei Mao, Ning Liu, Pei Ren, Qiang Zhang, Yaoxu Lyu, Mengzhen Liu, Jingyang He, Yulin Luo, et al. (12 additional authors not shown)

    Abstract: In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view observations, proprioceptive robot state information, a…

    Submitted 26 May, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: 21 pages, 17 figures, Robotics: Science and Systems 2025

  34. arXiv:2412.10892  [pdf, other]

    cs.LG cs.AI

    Know Unreported Roadway Incidents in Real-time: Early Traffic Anomaly Detection

    Authors: Haocheng Duan, Hao Wu, Sean Qian

    Abstract: This research aims to know traffic anomalies as early as possible. A traffic anomaly refers to a generic incident on the road that influences traffic flow and calls for urgent traffic management measures. "Knowing" the occurrence of a traffic anomaly is twofold: the ability to detect this anomaly before it is reported anywhere, or it may be such that an anomaly can be predicted before it actually…

    Submitted 23 April, 2025; v1 submitted 14 December, 2024; originally announced December 2024.

  35. arXiv:2412.00851  [pdf, other]

    cs.CV

    DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair

    Authors: Weihang Li, Weirong Chen, Shenhan Qian, Jiajie Chen, Daniel Cremers, Haoang Li

    Abstract: Recent advances in 3D Gaussian Splatting have shown promising results. Existing methods typically assume static scenes and/or multiple images with prior poses. Dynamics, sparse views, and unknown poses significantly increase the problem complexity due to insufficient geometric constraints. To overcome this challenge, we propose a method that can use only two images without prior poses to fit Gauss…

    Submitted 1 December, 2024; originally announced December 2024.

  36. arXiv:2411.10478  [pdf, other]

    cs.LG cs.AI

    Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey

    Authors: Yang Gu, Hengyu You, Jian Cao, Muran Yu, Haoran Fan, Shiyou Qian

    Abstract: Building effective machine learning (ML) workflows to address complex tasks is a primary focus of the Automatic ML (AutoML) community and a critical step toward achieving artificial general intelligence (AGI). Recently, the integration of Large Language Models (LLMs) into ML workflows has shown great potential for automating and enhancing various stages of the ML pipeline. This survey provides a c…

    Submitted 25 December, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

  37. arXiv:2411.08561  [pdf, other]

    cs.SE cs.AI

    LogLLM: Log-based Anomaly Detection Using Large Language Models

    Authors: Wei Guan, Jian Cao, Shiyou Qian, Jianqi Gao, Chun Ouyang

    Abstract: Software systems often record important runtime information in logs to help with troubleshooting. Log-based anomaly detection has become a key research area that aims to identify system issues through log data, ultimately enhancing the reliability of software systems. Traditional deep learning methods often struggle to capture the semantic information embedded in log data, which is typically organ…

    Submitted 13 April, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

  38. arXiv:2411.04799  [pdf, other]

    cs.CL cs.AI

    Kwai-STaR: Transform LLMs into State-Transition Reasoners

    Authors: Xingyu Lu, Yuhang Hu, Changyi Liu, Tianke Zhang, Zhenyu Yang, Zhixiang Ding, Shengsheng Qian, Meng Du, Ruiwen Kang, Kaiyu Tang, Fan Yang, Tingting Gao, Di Zhang, Hai-Tao Zheng, Bin Wen

    Abstract: Mathematical reasoning presents a significant challenge to the cognitive capabilities of LLMs. Various methods have been proposed to enhance the mathematical ability of LLMs. However, few recognize the value of state transition for LLM reasoning. In this work, we define mathematical problem-solving as a process of transiting from an initial unsolved state to the final resolved state, and propose K…

    Submitted 12 November, 2024; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: 6 pages, 2 figures

  39. arXiv:2410.18362  [pdf, ps, other]

    cs.SE cs.CL cs.CV

    WAFFLE: Finetuning Multi-Modal Model for Automated Front-End Development

    Authors: Shanchao Liang, Nan Jiang, Shangshu Qian, Lin Tan

    Abstract: Web development involves turning UI designs into functional webpages, which can be difficult for both beginners and experienced developers due to the complexity of HTML's hierarchical structures and styles. While Large Language Models (LLMs) have shown promise in generating source code, two major challenges persist in UI-to-HTML code generation: (1) effectively representing HTML's hierarchical str…

    Submitted 24 June, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  40. arXiv:2410.10319  [pdf, other]

    cs.CV cs.MM

    Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation

    Authors: Shun Qian, Bingquan Liu, Chengjie Sun, Zhen Xu, Baoxun Wang

    Abstract: The projector plays a crucial role in multi-modal language models (MLLMs). The number of visual tokens it outputs affects the efficiency of the MLLM, while the quality of the visual tokens influences the visual understanding capabilities of the MLLM. Current explorations on the projector focus on reducing the number of visual tokens to improve efficiency, often overlooking the inherent spatial dis…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 10 pages, 3 figures

  41. arXiv:2410.06338  [pdf, other]

    cs.CL

    Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

    Authors: Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

    Abstract: This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations. To achieve this, we employ an existing emotion-related dataset with human-annotated errors and calculate quality evaluation scores based on the Multi-dimensional Q…

    Submitted 8 October, 2024; originally announced October 2024.

  42. arXiv:2410.03278  [pdf, other]

    cs.CL

    What do Large Language Models Need for Machine Translation Evaluation?

    Authors: Shenbin Qian, Archchana Sindhujan, Minnie Kabra, Diptesh Kanojia, Constantin Orăsan, Tharindu Ranasinghe, Frédéric Blain

    Abstract: Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, refere…

    Submitted 9 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

  43. arXiv:2410.03277  [pdf, other]

    cs.CL

    A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content

    Authors: Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

    Abstract: Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary devices like irony and sarcasm. Evaluating the quality of these translations is challenging as current metrics do not focus on these ubiquitous features of UGC. To address this issue, we utilize an existing emotion-related dataset that includes emotion labels and human-…

    Submitted 4 October, 2024; originally announced October 2024.

  44. arXiv:2409.18996  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models

    Authors: Shengsheng Qian, Zuyi Zhou, Dizhan Xue, Bing Wang, Changsheng Xu

    Abstract: Cross-modal reasoning (CMR), the intricate process of synthesizing and drawing inferences across divergent sensory modalities, is increasingly recognized as a crucial capability in the progression toward more sophisticated and anthropomorphic artificial intelligence systems. Large Language Models (LLMs) represent a class of AI algorithms specifically engineered to parse, produce, and engage with h…

    Submitted 18 September, 2024; originally announced September 2024.

    ACM Class: A.1

  45. Crafting Synthetic Realities: Examining Visual Realism and Misinformation Potential of Photorealistic AI-Generated Images

    Authors: Qiyao Peng, Yingdan Lu, Yilang Peng, Sijia Qian, Xinyi Liu, Cuihua Shen

    Abstract: Advances in generative models have created Artificial Intelligence-Generated Images (AIGIs) nearly indistinguishable from real photographs. Leveraging a large corpus of 30,824 AIGIs collected from Instagram and Twitter, and combining quantitative content analysis with qualitative analysis, this study unpacks AI photorealism of AIGIs from four key dimensions, content, human, aesthetic, and producti…

    Submitted 14 March, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

  46. arXiv:2409.16803  [pdf, other]

    eess.AS cs.SD

    Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings

    Authors: Ruoyu Wang, Shutong Niu, Gaobin Yang, Jun Du, Shuangqing Qian, Tian Gao, Jia Pan

    Abstract: Although fully end-to-end speaker diarization systems have made significant progress in recent years, modular systems often achieve superior results in real-world scenarios due to their greater adaptability and robustness. Historically, modular speaker diarization methods have seldom discussed how to leverage spatial cues from multi-channel speech. This paper proposes a three-stage modular system…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: 5 pages, Submitted to ICASSP 2025

  47. arXiv:2409.03282  [pdf, other]

    cs.LG eess.SP

    Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions

    Authors: Zemian Ke, Haocheng Duan, Sean Qian

    Abstract: Non-recurrent conditions caused by incidents are different from recurrent conditions that follow periodic patterns. Existing traffic speed prediction studies are incident-agnostic and use one single model to learn all possible patterns from these drastically diverse conditions. This study proposes a novel Mixture of Experts (MoE) model to improve traffic speed prediction under two separate conditi…

    Submitted 5 September, 2024; originally announced September 2024.

  48. arXiv:2409.02041  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge

    Authors: Shutong Niu, Ruoyu Wang, Jun Du, Gaobin Yang, Yanhui Tu, Siyuan Wu, Shuangqing Qian, Huaxin Wu, Haitao Xu, Xueyang Zhang, Guolong Zhong, Xindi Yu, Jieru Chen, Mengzhi Wang, Di Cai, Tian Gao, Genshun Wan, Feng Ma, Jia Pan, Jianqing Gao

    Abstract: This technical report outlines our submission system for the CHiME-8 NOTSOFAR-1 Challenge. The primary difficulty of this challenge is the dataset recorded across various conference rooms, which captures real-world complexities such as high overlap rates, background noises, a variable number of speakers, and natural conversation styles. To address these issues, we optimized the system in several a…

    Submitted 24 October, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  49. arXiv:2408.13945  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Personalized Topology-Informed Localization of Standard 12-Lead ECG Electrode Placement from Incomplete Cardiac MRIs for Efficient Cardiac Digital Twins

    Authors: Lei Li, Hannah Smith, Yilin Lyu, Julia Camps, Shuang Qian, Blanca Rodriguez, Abhirup Banerjee, Vicente Grau

    Abstract: Cardiac digital twins (CDTs) offer personalized in-silico cardiac representations for the inference of multi-scale properties tied to cardiac mechanisms. The creation of CDTs requires precise information about the electrode position on the torso, especially for the personalized electrocardiogram (ECG) calibration. However, current studies commonly rely on additional acquisition of torso imaging an…

    Submitted 25 February, 2025; v1 submitted 25 August, 2024; originally announced August 2024.

  50. arXiv:2407.12248  [pdf, other]

    cs.DC

    Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters

    Authors: Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue

    Abstract: Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we ob…

    Submitted 16 July, 2024; originally announced July 2024.

    Journal ref: Journal of Supercomputing 2025