Showing 1–50 of 126 results for author: Qian, S

Searching in archive cs.
  1. arXiv:2507.02345  [pdf, ps, other]

    q-bio.BM cs.AI

    HelixDesign-Antibody: A Scalable Production-Grade Platform for Antibody Design Built on HelixFold3

    Authors: Jie Gao, Jing Hu, Shanzhuo Zhang, Kunrui Zhu, Sheng Qian, Yueyang Huang, Xiaonan Zhang, Xiaomin Fang

    Abstract: Antibody engineering is essential for developing therapeutics and advancing biomedical research. Traditional discovery methods often rely on time-consuming and resource-intensive experimental screening. To enhance and streamline this process, we introduce a production-grade, high-throughput platform built on HelixFold3, HelixDesign-Antibody, which utilizes the high-accuracy structure prediction mo…

    Submitted 3 July, 2025; originally announced July 2025.

  2. arXiv:2507.01961  [pdf, ps, other]

    cs.RO cs.AI

    AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

    Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

    Abstract: Recently, mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. However, existing methods still face challenges in coordinating mobile base and manipulator, primarily due to two limitations. On the one hand, they fail to explicitly model the influence of the mobile base on manipulator control, which easily leads to error accumu…

    Submitted 5 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

    Comments: Project website: https://ac-dit.github.io/

  3. arXiv:2506.22499  [pdf, ps, other]

    cs.CV cs.AI stat.AP

    Scalable Dynamic Origin-Destination Demand Estimation Enhanced by High-Resolution Satellite Imagery Data

    Authors: Jiachao Liu, Pablo Guarda, Koichiro Niinuma, Sean Qian

    Abstract: This study presents a novel integrated framework for dynamic origin-destination demand estimation (DODE) in multi-class mesoscopic network models, leveraging high-resolution satellite imagery together with conventional traffic data from local sensors. Unlike sparse local detectors, satellite imagery offers consistent, city-wide road and traffic information of both parking and moving vehicles, over…

    Submitted 25 June, 2025; originally announced June 2025.

  4. arXiv:2506.19743  [pdf, ps, other]

    cs.IR cs.CL

    NEAR$^2$: A Nested Embedding Approach to Efficient Product Retrieval and Ranking

    Authors: Shenbin Qian, Diptesh Kanojia, Samarth Agrawal, Hadeel Saadany, Swapnil Bhosale, Constantin Orasan, Zhe Wu

    Abstract: E-commerce information retrieval (IR) systems struggle to simultaneously achieve high accuracy in interpreting complex user queries and maintain efficient processing of vast product catalogs. The dual challenge lies in precisely matching user intent with relevant products while managing the computational demands of real-time search across massive inventories. In this paper, we propose a Nested Emb…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: This paper is accepted to the 2025 SIGIR Workshop on eCommerce

  5. arXiv:2506.15523  [pdf, ps, other]

    cs.PF

    Atys: An Efficient Profiling Framework for Identifying Hotspot Functions in Large-scale Cloud Microservices

    Authors: Jiaqi Sun, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue

    Abstract: To handle the high volume of requests, large-scale services are comprised of thousands of instances deployed in clouds. These services utilize diverse programming languages and are distributed across various nodes as encapsulated containers. Given their vast scale, even minor performance enhancements can lead to significant cost reductions. In this paper, we introduce Atys, an efficient profiling…

    Submitted 18 June, 2025; originally announced June 2025.

  6. arXiv:2505.24163  [pdf, ps, other]

    cs.CL cs.AI

    LKD-KGC: Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing

    Authors: Jiaqi Sun, Shiyou Qian, Zhangchi Han, Wei Li, Zelin Qian, Dingyu Yang, Jian Cao, Guangtao Xue

    Abstract: Knowledge Graphs (KGs) structure real-world entities and their relationships into triples, enhancing machine reasoning for various tasks. While domain-specific KGs offer substantial benefits, their manual construction is often inefficient and requires specialized knowledge. Recent approaches for knowledge graph construction (KGC) based on large language models (LLMs), such as schema-guided KGC and…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Submitting to EDBT 2026

  7. arXiv:2505.21019  [pdf, ps, other]

    eess.IV cs.LG

    Cardiac Digital Twins at Scale from MRI: Open Tools and Representative Models from ~55000 UK Biobank Participants

    Authors: Devran Ugurlu, Shuang Qian, Elliot Fairweather, Charlene Mauger, Bram Ruijsink, Laura Dal Toso, Yu Deng, Marina Strocchi, Reza Razavi, Alistair Young, Pablo Lamata, Steven Niederer, Martin Bishop

    Abstract: A cardiac digital twin is a virtual replica of a patient's heart for screening, diagnosis, prognosis, risk assessment, and treatment planning of cardiovascular diseases. This requires an anatomically accurate patient-specific 3D structural representation of the heart, suitable for electro-mechanical simulations or study of disease mechanisms. However, generation of cardiac digital twins at scale i…

    Submitted 27 May, 2025; originally announced May 2025.

  8. arXiv:2505.19874  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation

    Authors: Yi Wu, Lingting Zhu, Shengju Qian, Lei Liu, Wandi Qiao, Lequan Yu, Bin Li

    Abstract: In the current research landscape, multimodal autoregressive (AR) models have shown exceptional capabilities across various domains, including visual understanding and generation. However, complex tasks such as style-aligned text-to-image generation present significant challenges, particularly in data acquisition. In analogy to instruction-following tuning for image editing of AR models, style-ali…

    Submitted 26 May, 2025; originally announced May 2025.

  9. arXiv:2505.15074  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data

    Authors: Yuhang Zhou, Jing Zhu, Shengyi Qian, Zhuokai Zhao, Xiyao Wang, Xiaoyu Liu, Ming Li, Paiheng Xu, Wei Ai, Furong Huang

    Abstract: Large Language Models (LLMs) are increasingly aligned with human preferences through Reinforcement Learning from Human Feedback (RLHF). Among RLHF methods, Group Relative Policy Optimization (GRPO) has gained attention for its simplicity and strong performance, notably eliminating the need for a learned value function. However, GRPO implicitly assumes a balanced domain distribution and uniform sem…

    Submitted 9 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: 13 pages, 3 figures

  10. arXiv:2504.14221  [pdf, other]

    cs.CV

    Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection

    Authors: Wenbing Zhu, Lidong Wang, Ziqing Zhou, Chengjie Wang, Yurui Pan, Ruoyi Zhang, Zhuhao Chen, Linjie Cheng, Bin-Bin Gao, Jiangning Zhang, Zhenye Gan, Yuxie Wang, Yulong Chen, Shuguang Qian, Mingmin Chi, Bo Peng, Lizhuang Ma

    Abstract: The increasing complexity of industrial anomaly detection (IAD) has positioned multimodal detection methods as a focal area of machine vision research. However, dedicated multimodal datasets specifically tailored for IAD remain limited. Pioneering datasets like MVTec 3D have laid essential groundwork in multimodal IAD by incorporating RGB+3D data, but still face challenges in bridging the gap with…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: 13 pages. Dataset and code: https://realiad4ad.github.io/Real-IAD D3

  11. arXiv:2504.10878  [pdf, other]

    cs.CV cs.AI cs.LG

    Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content

    Authors: Yilang Peng, Sijia Qian, Yingdan Lu, Cuihua Shen

    Abstract: In today's visually dominated social media landscape, predicting the perceived credibility of visual content and understanding what drives human judgment are crucial for countering misinformation. However, these tasks are challenging due to the diversity and richness of visual features. We introduce a Large Language Model (LLM)-informed feature discovery framework that leverages multimodal LLMs, s…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 26 pages

    ACM Class: I.4.9; J.4

  12. arXiv:2503.23746  [pdf, other]

    cs.CV cs.CL cs.LG cs.MM cs.SI

    Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model

    Authors: Dizhan Xue, Jing Cui, Shengsheng Qian, Chuanrui Hu, Changsheng Xu

    Abstract: Short-video platforms have gained immense popularity, captivating the interest of millions, if not billions, of users globally. Recently, researchers have highlighted the significance of analyzing the propagation of short-videos, which typically involves discovering commercial values, public opinions, user behaviors, etc. This paper proposes a new Short-video Propagation Influence Rating (SPIR) ta…

    Submitted 31 March, 2025; originally announced March 2025.

  13. arXiv:2503.23512  [pdf, ps, other]

    cs.CL

    SCORE: Story Coherence and Retrieval Enhancement for AI Narratives

    Authors: Qiang Yi, Yangfan He, Jianhui Wang, Xinyuan Song, Shiyao Qian, Xinhang Yuan, Li Sun, Yi Xin, Jingqun Tang, Keqin Li, Kuan Lu, Menghao Huo, Jiaqi Chen, Tianyu Shi

    Abstract: Large Language Models (LLMs) can generate creative and engaging narratives from user-specified input, but maintaining coherence and emotional depth throughout these AI-generated stories remains a challenge. In this work, we propose SCORE, a framework for Story Coherence and Retrieval Enhancement, designed to detect and resolve narrative inconsistencies. By tracking key item statuses and generating…

    Submitted 12 June, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  14. arXiv:2503.20519  [pdf, other]

    cs.CV

    MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation

    Authors: Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee

    Abstract: Recent advances in auto-regressive transformers have revolutionized generative modeling across different domains, from language processing to visual generation, demonstrating remarkable capabilities. However, applying these advances to 3D generation presents three key challenges: the unordered nature of 3D data conflicts with sequential next-token prediction paradigm, conventional vector quantizat…

    Submitted 20 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: CVPR 2025 Highlight: https://jinnan-chen.github.io/projects/MAR-3D/

  15. arXiv:2503.18461  [pdf, other]

    cs.CV

    MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing

    Authors: Lingting Zhu, Jingrui Ye, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Jinnan Chen, Shengju Qian, Xin Wang, Qingmin Liao, Lequan Yu

    Abstract: Current methods for 3D generation still fall short in physically based rendering (PBR) texturing, primarily due to limited data and challenges in modeling multi-channel materials. In this work, we propose MuMA, a method for 3D PBR texturing through Multi-channel Multi-view generation and Agentic post-processing. Our approach features two key innovations: 1) We opt to model shaded and albedo appear…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 17 pages, 14 figures

  16. arXiv:2503.16158  [pdf, other]

    cs.CL

    Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems

    Authors: Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

    Abstract: Evaluating machine translation (MT) of user-generated content (UGC) involves unique challenges such as checking whether the nuance of emotions from the source are preserved in the target text. Recent studies have proposed emotion-related datasets, frameworks and models to automatically evaluate MT quality of Chinese UGC, without relying on reference translations. However, whether these models are…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Accepted to the 10th Workshop on Noisy and User-generated Text at NAACL 2025

  17. arXiv:2502.10810  [pdf, other]

    cs.CV

    SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding

    Authors: Zhenyu Yang, Yuhang Hu, Zemin Du, Dizhan Xue, Shengsheng Qian, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu

    Abstract: Despite the significant advancements of Large Vision-Language Models (LVLMs) on established benchmarks, there remains a notable gap in suitable evaluation regarding their applicability in the emerging domain of long-context streaming video understanding. Current benchmarks for video understanding typically emphasize isolated single-instance text inputs and fail to evaluate the capacity to sustain…

    Submitted 15 February, 2025; originally announced February 2025.

    Comments: ICLR 2025 Accept (Spotlight)

  18. arXiv:2501.16237  [pdf, other]

    cs.LG physics.ins-det

    Application of Structured State Space Models to High energy physics with locality-sensitive hashing

    Authors: Cheng Jiang, Sitian Qian

    Abstract: Modern high-energy physics (HEP) experiments are increasingly challenged by the vast size and complexity of their datasets, particularly regarding large-scale point cloud processing and long sequences. In this study, to address these challenges, we explore the application of structured state space models (SSMs), proposing one of the first trials to integrate local-sensitive hashing into either a h…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: 6 figures, accepted by AISTATS 2025 as poster, camera ready versions to be updated

  19. arXiv:2501.04473  [pdf, other]

    cs.CL

    When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages

    Authors: Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Shenbin Qian

    Abstract: This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE). Segment-level QE is a challenging cross-lingual language understanding task that provides a quality score (0-100) to the translated output. We comprehensively evaluate large language models (LLMs) in zero/few-shot scenarios and perform instruction fine-tun…

    Submitted 8 January, 2025; originally announced January 2025.

  20. arXiv:2412.13877  [pdf, other]

    cs.RO cs.AI

    RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

    Authors: Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, Shichao Fan, Xinhua Wang, Fei Liao, Zhen Zhao, Guangyu Li, Zhao Jin, Lecheng Wang, Jilei Mao, Ning Liu, Pei Ren, Qiang Zhang, Yaoxu Lyu, Mengzhen Liu, Jingyang He, Yulin Luo, et al. (12 additional authors not shown)

    Abstract: In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view observations, proprioceptive robot state information, a…

    Submitted 26 May, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: 21 pages, 17 figures, Robotics: Science and Systems 2025

  21. arXiv:2412.10892  [pdf, other]

    cs.LG cs.AI

    Know Unreported Roadway Incidents in Real-time: Early Traffic Anomaly Detection

    Authors: Haocheng Duan, Hao Wu, Sean Qian

    Abstract: This research aims to know traffic anomalies as early as possible. A traffic anomaly refers to a generic incident on the road that influences traffic flow and calls for urgent traffic management measures. "Knowing" the occurrence of a traffic anomaly is twofold: the ability to detect this anomaly before it is reported anywhere, or it may be such that an anomaly can be predicted before it actually…

    Submitted 23 April, 2025; v1 submitted 14 December, 2024; originally announced December 2024.

  22. arXiv:2412.00851  [pdf, other]

    cs.CV

    DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair

    Authors: Weihang Li, Weirong Chen, Shenhan Qian, Jiajie Chen, Daniel Cremers, Haoang Li

    Abstract: Recent advances in 3D Gaussian Splatting have shown promising results. Existing methods typically assume static scenes and/or multiple images with prior poses. Dynamics, sparse views, and unknown poses significantly increase the problem complexity due to insufficient geometric constraints. To overcome this challenge, we propose a method that can use only two images without prior poses to fit Gauss…

    Submitted 1 December, 2024; originally announced December 2024.

  23. arXiv:2411.10478  [pdf, other]

    cs.LG cs.AI

    Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey

    Authors: Yang Gu, Hengyu You, Jian Cao, Muran Yu, Haoran Fan, Shiyou Qian

    Abstract: Building effective machine learning (ML) workflows to address complex tasks is a primary focus of the Automatic ML (AutoML) community and a critical step toward achieving artificial general intelligence (AGI). Recently, the integration of Large Language Models (LLMs) into ML workflows has shown great potential for automating and enhancing various stages of the ML pipeline. This survey provides a c…

    Submitted 25 December, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

  24. arXiv:2411.08561  [pdf, other]

    cs.SE cs.AI

    LogLLM: Log-based Anomaly Detection Using Large Language Models

    Authors: Wei Guan, Jian Cao, Shiyou Qian, Jianqi Gao, Chun Ouyang

    Abstract: Software systems often record important runtime information in logs to help with troubleshooting. Log-based anomaly detection has become a key research area that aims to identify system issues through log data, ultimately enhancing the reliability of software systems. Traditional deep learning methods often struggle to capture the semantic information embedded in log data, which is typically organ…

    Submitted 13 April, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

  25. arXiv:2411.04799  [pdf, other]

    cs.CL cs.AI

    Kwai-STaR: Transform LLMs into State-Transition Reasoners

    Authors: Xingyu Lu, Yuhang Hu, Changyi Liu, Tianke Zhang, Zhenyu Yang, Zhixiang Ding, Shengsheng Qian, Meng Du, Ruiwen Kang, Kaiyu Tang, Fan Yang, Tingting Gao, Di Zhang, Hai-Tao Zheng, Bin Wen

    Abstract: Mathematical reasoning presents a significant challenge to the cognitive capabilities of LLMs. Various methods have been proposed to enhance the mathematical ability of LLMs. However, few recognize the value of state transition for LLM reasoning. In this work, we define mathematical problem-solving as a process of transiting from an initial unsolved state to the final resolved state, and propose K…

    Submitted 12 November, 2024; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: 6 pages, 2 figures

  26. arXiv:2410.18362  [pdf, ps, other]

    cs.SE cs.CL cs.CV

    WAFFLE: Finetuning Multi-Modal Model for Automated Front-End Development

    Authors: Shanchao Liang, Nan Jiang, Shangshu Qian, Lin Tan

    Abstract: Web development involves turning UI designs into functional webpages, which can be difficult for both beginners and experienced developers due to the complexity of HTML's hierarchical structures and styles. While Large Language Models (LLMs) have shown promise in generating source code, two major challenges persist in UI-to-HTML code generation: (1) effectively representing HTML's hierarchical str…

    Submitted 24 June, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  27. arXiv:2410.10319  [pdf, other]

    cs.CV cs.MM

    Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation

    Authors: Shun Qian, Bingquan Liu, Chengjie Sun, Zhen Xu, Baoxun Wang

    Abstract: The projector plays a crucial role in multi-modal language models (MLLMs). The number of visual tokens it outputs affects the efficiency of the MLLM, while the quality of the visual tokens influences the visual understanding capabilities of the MLLM. Current explorations on the projector focus on reducing the number of visual tokens to improve efficiency, often overlooking the inherent spatial dis…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 10 pages, 3 figures

  28. arXiv:2410.06338  [pdf, other]

    cs.CL

    Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

    Authors: Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

    Abstract: This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations. To achieve this, we employ an existing emotion-related dataset with human-annotated errors and calculate quality evaluation scores based on the Multi-dimensional Q…

    Submitted 8 October, 2024; originally announced October 2024.

  29. arXiv:2410.03278  [pdf, other]

    cs.CL

    What do Large Language Models Need for Machine Translation Evaluation?

    Authors: Shenbin Qian, Archchana Sindhujan, Minnie Kabra, Diptesh Kanojia, Constantin Orăsan, Tharindu Ranasinghe, Frédéric Blain

    Abstract: Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, refere…

    Submitted 9 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

  30. arXiv:2410.03277  [pdf, other]

    cs.CL

    A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content

    Authors: Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

    Abstract: Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary devices like irony and sarcasm. Evaluating the quality of these translations is challenging as current metrics do not focus on these ubiquitous features of UGC. To address this issue, we utilize an existing emotion-related dataset that includes emotion labels and human-…

    Submitted 4 October, 2024; originally announced October 2024.

  31. arXiv:2409.18996  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models

    Authors: Shengsheng Qian, Zuyi Zhou, Dizhan Xue, Bing Wang, Changsheng Xu

    Abstract: Cross-modal reasoning (CMR), the intricate process of synthesizing and drawing inferences across divergent sensory modalities, is increasingly recognized as a crucial capability in the progression toward more sophisticated and anthropomorphic artificial intelligence systems. Large Language Models (LLMs) represent a class of AI algorithms specifically engineered to parse, produce, and engage with h…

    Submitted 18 September, 2024; originally announced September 2024.

    ACM Class: A.1

  32. Crafting Synthetic Realities: Examining Visual Realism and Misinformation Potential of Photorealistic AI-Generated Images

    Authors: Qiyao Peng, Yingdan Lu, Yilang Peng, Sijia Qian, Xinyi Liu, Cuihua Shen

    Abstract: Advances in generative models have created Artificial Intelligence-Generated Images (AIGIs) nearly indistinguishable from real photographs. Leveraging a large corpus of 30,824 AIGIs collected from Instagram and Twitter, and combining quantitative content analysis with qualitative analysis, this study unpacks AI photorealism of AIGIs from four key dimensions, content, human, aesthetic, and producti…

    Submitted 14 March, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

  33. arXiv:2409.16803  [pdf, other]

    eess.AS cs.SD

    Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings

    Authors: Ruoyu Wang, Shutong Niu, Gaobin Yang, Jun Du, Shuangqing Qian, Tian Gao, Jia Pan

    Abstract: Although fully end-to-end speaker diarization systems have made significant progress in recent years, modular systems often achieve superior results in real-world scenarios due to their greater adaptability and robustness. Historically, modular speaker diarization methods have seldom discussed how to leverage spatial cues from multi-channel speech. This paper proposes a three-stage modular system…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: 5 pages, Submitted to ICASSP 2025

  34. arXiv:2409.03282  [pdf, other]

    cs.LG eess.SP

    Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions

    Authors: Zemian Ke, Haocheng Duan, Sean Qian

    Abstract: Non-recurrent conditions caused by incidents are different from recurrent conditions that follow periodic patterns. Existing traffic speed prediction studies are incident-agnostic and use one single model to learn all possible patterns from these drastically diverse conditions. This study proposes a novel Mixture of Experts (MoE) model to improve traffic speed prediction under two separate conditi…

    Submitted 5 September, 2024; originally announced September 2024.

  35. arXiv:2409.02041  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge

    Authors: Shutong Niu, Ruoyu Wang, Jun Du, Gaobin Yang, Yanhui Tu, Siyuan Wu, Shuangqing Qian, Huaxin Wu, Haitao Xu, Xueyang Zhang, Guolong Zhong, Xindi Yu, Jieru Chen, Mengzhi Wang, Di Cai, Tian Gao, Genshun Wan, Feng Ma, Jia Pan, Jianqing Gao

    Abstract: This technical report outlines our submission system for the CHiME-8 NOTSOFAR-1 Challenge. The primary difficulty of this challenge is the dataset recorded across various conference rooms, which captures real-world complexities such as high overlap rates, background noises, a variable number of speakers, and natural conversation styles. To address these issues, we optimized the system in several a…

    Submitted 24 October, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  36. arXiv:2408.13945  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Personalized Topology-Informed Localization of Standard 12-Lead ECG Electrode Placement from Incomplete Cardiac MRIs for Efficient Cardiac Digital Twins

    Authors: Lei Li, Hannah Smith, Yilin Lyu, Julia Camps, Shuang Qian, Blanca Rodriguez, Abhirup Banerjee, Vicente Grau

    Abstract: Cardiac digital twins (CDTs) offer personalized in-silico cardiac representations for the inference of multi-scale properties tied to cardiac mechanisms. The creation of CDTs requires precise information about the electrode position on the torso, especially for the personalized electrocardiogram (ECG) calibration. However, current studies commonly rely on additional acquisition of torso imaging an…

    Submitted 25 February, 2025; v1 submitted 25 August, 2024; originally announced August 2024.

  37. arXiv:2407.12248  [pdf, other]

    cs.DC

    Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters

    Authors: Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue

    Abstract: Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we ob…

    Submitted 16 July, 2024; originally announced July 2024.

    Journal ref: Journal of Supercomputing 2025

  38. arXiv:2407.07364  [pdf, other]

    cs.LG cs.AI eess.SY

    Real-time system optimal traffic routing under uncertainties -- Can physics models boost reinforcement learning?

    Authors: Zemian Ke, Qiling Zou, Jiachao Liu, Sean Qian

    Abstract: System optimal traffic routing can mitigate congestion by assigning routes for a portion of vehicles so that the total travel time of all vehicles in the transportation system can be reduced. However, achieving real-time optimal routing poses challenges due to uncertain demands and unknown system dynamics, particularly in expansive transportation networks. While physics model-based methods are sen…

    Submitted 10 July, 2024; originally announced July 2024.

  39. arXiv:2407.06192  [pdf, other]

    cs.CV cs.AI cs.CL

    Multi-Object Hallucination in Vision-Language Models

    Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

    Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent o…

    Submitted 31 October, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to NeurIPS 2024 | Project page: https://multi-object-hallucination.github.io/

  40. arXiv:2406.18158  [pdf, other]

    cs.RO cs.CV

    3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

    Authors: Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

    Abstract: Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D Multi-View Pretraining using masked autoencoders. We leverage R…

    Submitted 23 March, 2025; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: CVPR 2025

  41. arXiv:2406.17777  [pdf, other]

    cs.CV

    Text-Animator: Controllable Visual Text Video Generation

    Authors: Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian

    Abstract: Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising. One significant unresolved aspect within T2V is the effective visualization of text within generated videos. Despite the progress achieved in Text-to-Video (T2V) generation, current methods still cannot effectively visualize texts in videos directly, as they mainly focus on summar…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project Page: https://laulampaul.github.io/text-animator.html

  42. arXiv:2406.16321  [pdf, other]

    cs.LG cs.AI

    Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning

    Authors: Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

    Abstract: Graph machine learning has made significant strides in recent years, yet the integration of visual information with graph structure and its potential for improving performance in downstream tasks remains an underexplored area. To address this critical gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), a pioneering benchmark that incorporates both visual and textual information into graph…

    Submitted 30 March, 2025; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: CVPR 2025

  43. arXiv:2406.15781  [pdf, other]

    cs.CL

    DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

    Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

    Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anoma…

    Submitted 22 June, 2024; originally announced June 2024.

  44. arXiv:2406.15769  [pdf, other]

    cs.DC

    Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

    Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

    Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two signif…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages; 27 figures

  45. arXiv:2406.05132  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

    Authors: Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai

    Abstract: The integration of language and 3D perception is crucial for embodied agents and robots that comprehend and interact with the physical world. While large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, their adaptation to 3D environments (3D-LLMs) remains in its early stages. A primary challenge is a lack of large-scale datasets with dense gr…

    Submitted 20 March, 2025; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: CVPR 2025. Project website: https://3d-grand.github.io

  46. arXiv:2406.04640  [pdf, other]

    cs.LG

    LinkGPT: Teaching Large Language Models To Predict Missing Links

    Authors: Zhongmou He, Jing Zhu, Shengyi Qian, Joyce Chai, Danai Koutra

    Abstract: Large Language Models (LLMs) have shown promising results on various language and vision tasks. Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs). However, most studies have focused on node classification, while the use of LLMs for link prediction (LP) remains understudied. In this work, we propose a new task on LLMs, whe…

    Submitted 7 June, 2024; originally announced June 2024.

  47. arXiv:2406.03007  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

    Authors: Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian

    Abstract: With the prosperity of large language models (LLMs), powerful LLM-based intelligent agents have been developed to provide customized services with a set of user-defined tools. State-of-the-art methods for constructing LLM agents adopt trained LLMs and further fine-tune them on data for the agent task. However, we show that such methods are vulnerable to our proposed backdoor attacks named BadAgent…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

  48. arXiv:2404.19026  [pdf, other]

    cs.CV

    MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

    Authors: Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang

    Abstract: Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gau…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Project page: https://conallwang.github.io/MeGA_Pages/

  49. arXiv:2404.18219  [pdf, other]

    physics.ins-det cs.LG hep-ex hep-ph physics.data-an

    BUFF: Boosted Decision Tree based Ultra-Fast Flow matching

    Authors: Cheng Jiang, Sitian Qian, Huilin Qu

    Abstract: Tabular data stands out as one of the most frequently encountered types in high energy physics. Unlike commonly homogeneous data such as pixelated images, simulating high-dimensional tabular data and accurately capturing their correlations are often quite challenging, even with the most advanced architectures. Based on the findings that tree-based models surpass the performance of deep learning mo…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 9 pages, 10 figures, 1 additional figure in appendix

  50. arXiv:2404.15275  [pdf, other]

    cs.CV

    ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

    Authors: Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang

    Abstract: Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation, either requiring tedious case-by-case fine-tuning or usually missing identity details in the video generation process. In this study, we present \textb…

    Submitted 25 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Project Page: https://id-animator.github.io/