Showing 1–50 of 1,730 results for author: Ma, J

Searching in archive cs.
  1. arXiv:2507.05317  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    PWD: Prior-Guided and Wavelet-Enhanced Diffusion Model for Limited-Angle CT

    Authors: Yi Liu, Yiyang Wen, Zekun Zhou, Junqi Ma, Linghang Wang, Yucheng Yao, Liu Shi, Qiegen Liu

    Abstract: Generative diffusion models have received increasing attention in medical imaging, particularly in limited-angle computed tomography (LACT). Standard diffusion models achieve high-quality image reconstruction but require a large number of sampling steps during inference, resulting in substantial computational overhead. Although skip-sampling strategies have been proposed to improve efficiency, the…

    Submitted 30 June, 2025; originally announced July 2025.

  2. arXiv:2507.04891  [pdf, ps, other]

    eess.IV cs.CV

    MurreNet: Modeling Holistic Multimodal Interactions Between Histopathology and Genomic Profiles for Survival Prediction

    Authors: Mingxin Liu, Chengfei Cai, Jun Li, Pengbo Xu, Jinze Li, Jiquan Ma, Jun Xu

    Abstract: Cancer survival prediction requires integrating pathological Whole Slide Images (WSIs) and genomic profiles, a challenging task due to the inherent heterogeneity and the complexity of modeling both inter- and intra-modality interactions. Current methods often employ straightforward fusion strategies for multimodal feature integration, failing to comprehensively capture modality-specific and modali…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 11 pages, 2 figures, Accepted by MICCAI 2025

  3. arXiv:2507.04225  [pdf, ps, other]

    cs.LG cs.AI

    Zero-Shot Cyclic Peptide Design with Composable Geometric Conditions

    Authors: Dapeng Jiang, Xiangzhe Kong, Jiaqi Han, Mingyu Li, Rui Jiao, Wenbing Huang, Stefano Ermon, Jianzhu Ma, Yang Liu

    Abstract: Cyclic peptides, characterized by geometric constraints absent in linear peptides, offer enhanced biochemical properties, presenting new opportunities to address unmet medical needs. However, designing target-specific cyclic peptides remains underexplored due to limited training data. To bridge the gap, we propose CP-Composer, a novel generative framework that enables zero-shot cyclic peptide gene…

    Submitted 5 July, 2025; originally announced July 2025.

  4. arXiv:2507.03331  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling

    Authors: Mingzhuo Li, Guang Li, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa, Miki Haseyama

    Abstract: To alleviate the reliance of deep neural networks on large-scale datasets, dataset distillation aims to generate compact, high-quality synthetic datasets that can achieve comparable performance to the original dataset. The integration of generative models has significantly advanced this field. However, existing approaches primarily focus on aligning the distilled dataset with the original one, oft…

    Submitted 4 July, 2025; originally announced July 2025.

  5. arXiv:2507.01702  [pdf, ps, other]

    cs.CL cs.AI

    AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

    Authors: Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma

    Abstract: The proliferation of multimodal memes in the social media era demands that multimodal Large Language Models (mLLMs) effectively understand meme harmfulness. Existing benchmarks for assessing mLLMs on harmful meme understanding rely on accuracy-based, model-agnostic evaluations using static datasets. These benchmarks are limited in their ability to provide up-to-date and thorough assessments, as on…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: ACL 2025

  6. arXiv:2507.00356  [pdf]

    cs.CV cs.AI

    CGEarthEye: A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation

    Authors: Zhiwei Yi, Xin Cheng, Jingyu Ma, Ruifei Zhu, Junwei Tian, Yuanxiu Zhou, Xinge Zhao, Hongzhe Li

    Abstract: Deep learning methods have significantly advanced the development of intelligent interpretation in remote sensing (RS), with foundational model research based on large-scale pre-training paradigms rapidly reshaping various domains of Earth Observation (EO). However, compared to the open accessibility and high spatiotemporal coverage of medium-resolution data, the limited acquisition channels for…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: A Remote Sensing Foundation Model for Very High Resolution Images

  7. arXiv:2506.23863  [pdf, ps, other]

    cs.CV

    Puzzles: Unbounded Video-Depth Augmentation for Scalable End-to-End 3D Reconstruction

    Authors: Jiahao Ma, Lei Wang, Miaomiao Liu, David Ahmedt-Aristizabal, Chuong Nguyen

    Abstract: Multi-view 3D reconstruction remains a core challenge in computer vision. Recent methods, such as DUST3R and its successors, directly regress pointmaps from image pairs without relying on known scene geometry or camera parameters. However, the performance of these models is constrained by the diversity and scale of available training data. In this work, we introduce Puzzles, a data augmentation st…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Feed-forward 3D reconstruction, Data Augmentation

  8. arXiv:2506.22772  [pdf, ps, other]

    cs.AR

    Approximate Logic Synthesis Using BLASYS

    Authors: Jingxiao Ma, Soheil Hashemi, Sherief Reda

    Abstract: Approximate computing is an emerging paradigm where design accuracy can be traded for improvements in design metrics such as design area and power consumption. In this work, we overview our open-source tool, BLASYS, for synthesis of approximate circuits using Boolean Matrix Factorization (BMF). In our methodology the truth table of a given circuit is approximated using BMF to a controllable approx…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: Published in the Workshop on Open-Source EDA Technology (WOSET), 2019. (Workshop link: https://woset-workshop.github.io/WOSET2019.html)

    ACM Class: B.6.1; B.2.4; B.8.2

  9. arXiv:2506.22771  [pdf, ps, other]

    cs.LG cs.AI cs.NE

    FF-INT8: Efficient Forward-Forward DNN Training on Edge Devices with INT8 Precision

    Authors: Jingxiao Ma, Priyadarshini Panda, Sherief Reda

    Abstract: Backpropagation has been the cornerstone of neural network training for decades, yet its inefficiencies in time and energy consumption limit its suitability for resource-constrained edge devices. While low-precision neural network quantization has been extensively researched to speed up model inference, its application in training has been less explored. Recently, the Forward-Forward (FF) algorith…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: To be published in the 62nd Design Automation Conference (DAC), 2025

    ACM Class: I.2.0; I.2.6

  10. arXiv:2506.22710  [pdf, ps, other]

    cs.CV eess.IV

    LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning

    Authors: Jiang Yuan, JI Ma, Bo Wang, Guanzhou Ke, Weiming Hu

    Abstract: Implicit degradation estimation-based blind super-resolution (IDE-BSR) hinges on extracting the implicit degradation representation (IDR) of the LR image and adapting it to LR image features to guide HR detail restoration. Although IDE-BSR has shown potential in dealing with noise interference and complex degradations, existing methods ignore the importance of IDR discriminability for BSR and inst…

    Submitted 27 June, 2025; originally announced June 2025.

    Journal ref: International Conference on Computer Vision (ICCV) 2025

  11. arXiv:2506.22554  [pdf, ps, other]

    cs.CV cs.AI

    Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset

    Authors: Vasu Agrawal, Akinniyi Akinyemi, Kathryn Alvero, Morteza Behrooz, Julia Buffalini, Fabio Maria Carlucci, Joy Chen, Junming Chen, Zhang Chen, Shiyang Cheng, Praveen Chowdary, Joe Chuang, Antony D'Avirro, Jon Daly, Ning Dong, Mark Duppenthaler, Cynthia Gao, Jeff Girard, Martin Gleize, Sahir Gomez, Hongyu Gong, Srivathsan Govindarajan, Brandon Han, Sen He, Denise Hernandez, et al. (59 additional authors not shown)

    Abstract: Human communication involves a complex interplay of verbal and nonverbal signals, essential for conveying meaning and achieving interpersonal goals. To develop socially intelligent AI technologies, it is crucial to develop models that can both comprehend and generate dyadic behavioral dynamics. To this end, we introduce the Seamless Interaction Dataset, a large-scale collection of over 4,000 hours…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  12. arXiv:2506.22012  [pdf, ps, other]

    eess.IV cs.CV

    Noise-Inspired Diffusion Model for Generalizable Low-Dose CT Reconstruction

    Authors: Qi Gao, Zhihao Chen, Dong Zeng, Junping Zhang, Jianhua Ma, Hongming Shan

    Abstract: The generalization of deep learning-based low-dose computed tomography (CT) reconstruction models to doses unseen in the training data is important and remains challenging. Previous efforts heavily rely on paired data to improve the generalization performance and robustness through collecting either diverse CT data for re-training or a few test data for fine-tuning. Recently, diffusion models have…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted for publication in Medical Image Analysis, 2025

  13. arXiv:2506.21784  [pdf, ps, other]

    cs.AI

    MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models

    Authors: Yifan Liu, Xishun Liao, Haoxuan Ma, Jonathan Liu, Rohan Jadhav, Jiaqi Ma

    Abstract: Understanding and modeling human mobility patterns is crucial for effective transportation planning and urban development. Despite significant advances in mobility research, there remains a critical gap in simulation platforms that allow for algorithm development, policy implementation, and comprehensive evaluation at scale. Traditional activity-based models require extensive data collection and m…

    Submitted 26 June, 2025; originally announced June 2025.

  14. arXiv:2506.21535  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    Exploring the Design Space of 3D MLLMs for CT Report Generation

    Authors: Mohammed Baharoon, Jun Ma, Congyu Fang, Augustin Toma, Bo Wang

    Abstract: Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods tha…

    Submitted 26 June, 2025; originally announced June 2025.

  15. arXiv:2506.20167  [pdf, ps, other]

    cs.CL cs.AI

    SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs

    Authors: Fengze Li, Yue Wang, Yangle Liu, Ming Huang, Dou Hong, Jieming Ma

    Abstract: Multivariate time series forecasting requires models to simultaneously capture variable-wise structural dependencies and generalize across diverse tasks. While structural encoders are effective in modeling feature interactions, they lack the capacity to support semantic-level reasoning or task adaptation. Conversely, large language models (LLMs) possess strong generalization capabilities but remai…

    Submitted 25 June, 2025; originally announced June 2025.

  16. arXiv:2506.19681  [pdf, ps, other]

    cs.CV

    Genome-Anchored Foundation Model Embeddings Improve Molecular Prediction from Histology Images

    Authors: Cheng Jin, Fengtao Zhou, Yunfang Yu, Jiabo Ma, Yihui Wang, Yingxue Xu, Huajun Zhou, Hao Jiang, Luyang Luo, Luhui Mao, Zifan He, Xiuming Zhang, Jing Zhang, Ronald Chan, Herui Yao, Hao Chen

    Abstract: Precision oncology requires accurate molecular insights, yet obtaining these directly from genomics is costly and time-consuming for broad clinical use. Predicting complex molecular features and patient prognosis directly from routine whole-slide images (WSI) remains a major challenge for current deep learning methods. Here we introduce PathLUPI, which uses transcriptomic privileged information du…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Under Review

  17. arXiv:2506.19258  [pdf, ps, other]

    cs.CL cs.LG

    Personality Prediction from Life Stories using Language Models

    Authors: Rasiq Hussain, Jerry Ma, Rithik Khandelwal, Joshua Oltmanns, Mehak Gupta

    Abstract: Natural Language Processing (NLP) offers new avenues for personality assessment by leveraging rich, open-ended text, moving beyond traditional questionnaires. In this study, we address the challenge of modeling long narrative interviews, each exceeding 2000 tokens, to predict Five-Factor Model (FFM) personality traits. We propose a two-step approach: first, we extract contextual embeddings…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 13 pages, 5 figures

  18. arXiv:2506.18407  [pdf, ps, other]

    cs.GR cs.CV

    What You Think Is What You Get: Bridge User Intent and Transfer Function Design through Multimodal Large Language Models

    Authors: Yiyao Wang, Bo Pan, Ke Wang, Han Liu, Jinyuan Mao, Yuxin Liu, Minfeng Zhu, Bo Zhang, Weifeng Chen, Xiuqi Huang, Wei Chen

    Abstract: Direct volume rendering (DVR) is a fundamental technique for visualizing volumetric data, with transfer functions (TFs) playing a crucial role in extracting meaningful structures. However, designing effective TFs remains unintuitive due to the semantic gap between user intent and TF parameter space. Researchers have developed numerous TF optimization methods to bridge this gap. However, existing m…

    Submitted 23 June, 2025; originally announced June 2025.

  19. arXiv:2506.18290  [pdf, ps, other]

    cs.LG

    Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction

    Authors: Han Zhang, Jinghong Mao, Shangwen Zhu, Zhantao Yang, Lianghua Huang, Yu Liu, Deli Zhao, Ruili Feng, Fan Cheng

    Abstract: Diffusion reconstruction plays a critical role in various applications such as image editing, restoration, and style transfer. In theory, the reconstruction should be simple - it just inverts and regenerates images by numerically solving the Probability Flow-Ordinary Differential Equation (PF-ODE). Yet in practice, noticeable reconstruction errors have been observed, which cannot be well explained…

    Submitted 23 June, 2025; originally announced June 2025.

  20. arXiv:2506.18245  [pdf, ps, other]

    cs.CR cs.AI cs.SE

    Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection

    Authors: Lei Yu, Zhirong Huang, Hang Yuan, Shiqi Cheng, Li Yang, Fengjun Zhang, Chenjie Shen, Jiajia Ma, Jingyuan Zhang, Junyi Lu, Chun Zuo

    Abstract: Smart contract vulnerability detection remains a major challenge in blockchain security. Existing vulnerability detection methods face two main issues: (1) Existing datasets lack comprehensive coverage and high-quality explanations for preference learning. (2) Large language models (LLMs) often struggle with accurately interpreting specific concepts in smart contract security. Empirical analysis s…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Accepted to ISSTA 2025

  21. arXiv:2506.16893  [pdf, ps, other]

    cs.IR

    Multi-Objective Recommendation in the Era of Generative AI: A Survey of Recent Progress and Future Prospects

    Authors: Zihan Hong, Yushi Wu, Zhiting Zhao, Shanshan Feng, Jianghong Ma, Jiao Liu, Tianjun Wei

    Abstract: With the recent progress in generative artificial intelligence (Generative AI), particularly in the development of large language models, recommendation systems are evolving to become more versatile. Unlike traditional techniques, generative AI not only learns patterns and representations from complex data but also enables content generation, data synthesis, and personalized experiences. This gene…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 21 pages

  22. arXiv:2506.16400  [pdf, ps, other]

    cs.CR

    Physical-Layer Signal Injection Attacks on EV Charging Ports: Bypassing Authentication via Electrical-Level Exploits

    Authors: Hetian Shi, Yi He, Shangru Song, Jianwei Zhuge, Jian Mao

    Abstract: The proliferation of electric vehicles in recent years has significantly expanded the charging infrastructure while introducing new security risks to both vehicles and chargers. In this paper, we investigate the security of major charging protocols such as SAE J1772, CCS, IEC 61851, GB/T 20234, and NACS, uncovering new physical signal spoofing attacks in their authentication mechanisms. By inserti…

    Submitted 19 June, 2025; originally announced June 2025.

  23. arXiv:2506.15868  [pdf, ps, other]

    cs.RO

    CooperRisk: A Driving Risk Quantification Pipeline with Multi-Agent Cooperative Perception and Prediction

    Authors: Mingyue Lei, Zewei Zhou, Hongchen Li, Jia Hu, Jiaqi Ma

    Abstract: Risk quantification is a critical component of safe autonomous driving; however, it is constrained by the limited perception range and occlusion of single-vehicle systems in complex and dense scenarios. The vehicle-to-everything (V2X) paradigm has been a promising solution for sharing complementary perception information; nevertheless, how to ensure the risk interpretability while understanding multi-agent i…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: IROS2025

  24. Compilation, Optimization, Error Mitigation, and Machine Learning in Quantum Algorithms

    Authors: Shuangbao Paul Wang, Jianzhou Mao, Eric Sakk

    Abstract: This paper discusses the compilation, optimization, and error mitigation of quantum algorithms, essential steps to execute real-world quantum algorithms. Quantum algorithms running on a hybrid platform with QPU and CPU/GPU take advantage of existing high-performance computing power with quantum-enabled exponential speedups. The proposed approximate quantum Fourier transform (AQFT) for quantum algo…

    Submitted 18 June, 2025; originally announced June 2025.

    Journal ref: Computer Science & Information Technology (CS & IT), ISSN: 2231-5403, Volume 15, Number 05, March 2025

  25. Refined Causal Graph Structure Learning via Curvature for Brain Disease Classification

    Authors: Falih Gozi Febrinanto, Adonia Simango, Chengpei Xu, Jingjing Zhou, Jiangang Ma, Sonika Tyagi, Feng Xia

    Abstract: Graph neural networks (GNNs) have been developed to model the relationship between regions of interest (ROIs) in brains and have shown significant improvement in detecting brain diseases. However, most of these frameworks do not consider the intrinsic relationship of causality factor between brain ROIs, which is arguably more essential to observe cause and effect interaction between signals rather…

    Submitted 30 May, 2025; originally announced June 2025.

  26. arXiv:2506.14975  [pdf, ps, other]

    cs.RO

    Time-Optimized Safe Navigation in Unstructured Environments through Learning Based Depth Completion

    Authors: Jeffrey Mao, Raghuram Cauligi Srinivas, Steven Nogar, Giuseppe Loianno

    Abstract: Quadrotors hold significant promise for several applications such as agriculture, search and rescue, and infrastructure inspection. Achieving autonomous operation requires systems to navigate safely through complex and unfamiliar environments. This level of autonomy is particularly challenging due to the complexity of such environments and the need for real-time decision making especially for plat…

    Submitted 17 June, 2025; originally announced June 2025.

  27. arXiv:2506.14769  [pdf, ps, other]

    cs.CV cs.RO

    CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion

    Authors: Jiahua Ma, Yiran Qin, Yixiong Li, Xuanqi Liao, Yulan Guo, Ruimao Zhang

    Abstract: Diffusion Policy (DP) enables robots to learn complex behaviors by imitating expert demonstrations through action diffusion. However, in practical applications, hardware limitations often degrade data quality, while real-time constraints restrict model inference to instantaneous state and scene observations. These limitations seriously reduce the efficacy of learning from expert demonstrations, re…

    Submitted 17 June, 2025; originally announced June 2025.

  28. arXiv:2506.14742  [pdf, ps, other]

    cs.CV

    SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting

    Authors: Ziqiao Peng, Wentao Hu, Junyuan Ma, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Hui Tian, Jun He, Hongyan Liu, Zhaoxin Fan

    Abstract: Achieving high synchronization in the synthesis of realistic, speech-driven talking head videos presents a significant challenge. A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses. The absence of these synchronizations is a fundamental flaw, leading to unrealistic results. To address the critical issue of synchronizati…

    Submitted 17 June, 2025; originally announced June 2025.

  29. arXiv:2506.14589  [pdf, ps, other]

    cs.RO

    NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving

    Authors: Ren Xin, Hongji Liu, Xiaodong Mei, Wenru Liu, Maosheng Ye, Zhili Chen, Jun Ma

    Abstract: Integrating General Models (GMs) such as Large Language Models (LLMs), with Specialized Models (SMs) in autonomous driving tasks presents a promising approach to mitigating challenges in data diversity and model capacity of existing specialized driving models. However, this integration leads to problems of asynchronous systems, which arise from the distinct characteristics inherent in GMs and SMs.…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  30. arXiv:2506.13757  [pdf, ps, other]

    cs.CV

    AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

    Authors: Zewei Zhou, Tianhui Cai, Seth Z. Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, Jiaqi Ma

    Abstract: Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities. However, current VLA models often struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning. In this paper, we propose AutoVLA, a novel VLA model that unifies reasoning and actio…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Website link: https://autovla.github.io/

  31. arXiv:2506.13756  [pdf, ps, other]

    cs.GR cs.CV

    UltraZoom: Generating Gigapixel Images from Regular Photos

    Authors: Jingwei Ma, Vivek Jayaram, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz

    Abstract: We present UltraZoom, a system for generating gigapixel-resolution images of objects from casually captured inputs, such as handheld phone photos. Given a full-shot image (global, low-detail) and one or more close-ups (local, high-detail), UltraZoom upscales the full image to match the fine detail and scale of the close-up examples. To achieve this, we construct a per-instance paired dataset from…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Project page: https://ultra-zoom.github.io/

  32. arXiv:2506.13651  [pdf, ps, other]

    cs.LG

    xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

    Authors: Kaiyuan Chen, Yixin Ren, Yang Liu, Xiaobo Hu, Haotong Tian, Tianbao Xie, Fangfu Liu, Haoye Zhang, Hongzhang Liu, Yuan Gong, Chen Sun, Han Hou, Hui Yang, James Pan, Jianan Lou, Jiayi Mao, Jizheng Liu, Jinpeng Li, Kangyi Liu, Kenkun Liu, Rui Wang, Run Li, Tong Niu, Wenlong Zhang, Wenqi Yan, et al. (8 additional authors not shown)

    Abstract: We introduce xbench, a dynamic, profession-aligned evaluation suite designed to bridge the gap between AI agent capabilities and real-world productivity. While existing benchmarks often focus on isolated technical skills, they may not accurately reflect the economic value agents deliver in professional settings. To address this, xbench targets commercially significant domains with evaluation tasks…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Project page: https://xbench.org

  33. arXiv:2506.13470  [pdf, ps, other]

    cs.CL

    Abstract, Align, Predict: Zero-Shot Stance Detection via Cognitive Inductive Reasoning

    Authors: Jun Ma, Fuqiang Niu, Dong Li, Jinzhou Cao, Genan Dai, Bowen Zhang

    Abstract: Zero-shot stance detection (ZSSD) aims to identify the stance of text toward previously unseen targets, a setting where conventional supervised models often fail due to reliance on labeled data and shallow lexical cues. Inspired by human cognitive reasoning, we propose the Cognitive Inductive Reasoning Framework (CIRF), which abstracts transferable reasoning schemas from unlabeled text and encodes…

    Submitted 16 June, 2025; originally announced June 2025.

    MSC Class: I.2.7; I.2.6

  34. arXiv:2506.09755  [pdf, ps, other]

    cs.CE cs.AI

    Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era

    Authors: Shuo Jiang, Min Xie, Frank Youhua Chen, Jian Ma, Jianxi Luo

    Abstract: Research and practice in Intelligent Design (ID) have significantly enhanced engineering innovation, efficiency, quality, and productivity over recent decades, fundamentally reshaping how engineering designers think, behave, and interact with design processes. The recent emergence of Foundation Models (FMs), particularly Large Language Models (LLMs), has demonstrated general knowledge-based reason…

    Submitted 11 June, 2025; originally announced June 2025.

    ACM Class: I.2.7; I.2.1

  35. arXiv:2506.09110  [pdf, ps, other]

    cs.LG

    CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

    Authors: Jingying Ma, Feng Wu, Qika Lin, Yucheng Xing, Chenyu Liu, Ziyu Jia, Mengling Feng

    Abstract: Electroencephalography (EEG) provides real-time insights into brain activity and is widely used in neuroscience. However, variations in channel configurations, sequence lengths, and task objectives limit the transferability of traditional task-specific models. Although recent EEG foundation models (EFMs) aim to learn generalizable representations, they struggle with limited heterogeneous represent…

    Submitted 10 June, 2025; originally announced June 2025.

  36. arXiv:2506.08708  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

    Authors: Liang Ma, Jiajun Wen, Min Lin, Rongtao Xu, Xiwen Liang, Bingqian Lin, Jun Ma, Yongxin Wang, Ziming Wei, Haokun Lin, Mingfei Han, Meng Cao, Bokui Chen, Ivan Laptev, Xiaodan Liang

    Abstract: While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic 3D block…

    Submitted 10 June, 2025; originally announced June 2025.

  37. arXiv:2506.08626  [pdf, ps, other]

    cs.IR

    Leveraging LLMs to Evaluate Usefulness of Document

    Authors: Xingzhu Wang, Erhan Zhang, Yiqun Chen, Jinghan Xuan, Yucheng Hou, Yitong Xu, Ying Nie, Shuaiqiang Wang, Dawei Yin, Jiaxin Mao

    Abstract: The conventional Cranfield paradigm struggles to effectively capture user satisfaction due to its weak correlation between relevance and satisfaction, alongside the high costs of relevance annotation in building test collections. To tackle these issues, our research explores the potential of leveraging large language models (LLMs) to generate multilevel usefulness labels for evaluation. We introdu…

    Submitted 10 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  38. arXiv:2506.08399  [pdf, ps, other]

    cs.AI cs.LG

    SafeCoT: Improving VLM Safety with Minimal Reasoning

    Authors: Jiachen Ma, Zhanhui Zhou, Chao Yang, Chaochao Lu

    Abstract: Ensuring safe and appropriate responses from vision-language models (VLMs) remains a critical challenge, particularly in high-risk or ambiguous scenarios. We introduce SafeCoT, a lightweight, interpretable framework that leverages rule-based chain-of-thought (CoT) supervision to improve refusal behavior in VLMs. Unlike prior methods that rely on large-scale safety annotations or complex modeling,…

    Submitted 11 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  39. arXiv:2506.08279  [pdf]

    cs.CV cs.AI cs.LG

    Seeing Voices: Generating A-Roll Video from Audio with Mirage

    Authors: Aditi Sundararaman, Amogh Adishesha, Andrew Jaegle, Dan Bigioi, Hyoung-Kyu Song, Jon Kyl, Justin Mao, Kevin Lan, Mojtaba Komeili, ShahRukh Athar, Sheila Babayan, Stanislau Beliasau, William Buchwalter

    Abstract: From professional filmmaking to user-generated content, creators and consumers have long recognized that the power of video depends on the harmonious integration of what we hear (the video's audio track) with what we see (the video's image sequence). Current approaches to video generation either ignore sound to focus on general-purpose but silent image sequence generation or address both visual an…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Technical report website: mirage.app/research/seeing-voices, product website: mirage.app

  40. arXiv:2506.07918  [pdf, ps, other]

    cs.LG stat.ML

    CausalPFN: Amortized Causal Effect Estimation via In-Context Learning

    Authors: Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas, Benson Li, Junwei Ma, Jesse C. Cresswell, Rahul G. Krishnan

    Abstract: Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantial manual effort and domain expertise. We present CausalPFN, a single transformer that amortizes this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, i…

    Submitted 9 June, 2025; originally announced June 2025.

  41. arXiv:2506.07031  [pdf, ps, other]

    cs.CR cs.AI cs.CL

    HauntAttack: When Attack Follows Reasoning as a Shadow

    Authors: Jingyuan Ma, Rui Li, Zheng Li, Junfeng Liu, Lei Sha, Zhifang Sui

    Abstract: Emerging Large Reasoning Models (LRMs) consistently excel in mathematical and reasoning tasks, showcasing exceptional capabilities. However, the enhancement of reasoning abilities and the exposure of their internal reasoning processes introduce new safety vulnerabilities. One intriguing concern is: when reasoning is strongly entangled with harmfulness, what safety-reasoning trade-off do LRMs exhib…

    Submitted 8 June, 2025; originally announced June 2025.

  42. arXiv:2506.05981  [pdf, ps, other]

    cs.AI

    CrimeMind: Simulating Urban Crime with Multi-Modal LLM Agents

    Authors: Qingbin Zeng, Ruotong Zhao, Jinzhu Mao, Haoyang Li, Fengli Xu, Yong Li

    Abstract: Modeling urban crime is an important yet challenging task that requires understanding the subtle visual, social, and cultural cues embedded in urban environments. Previous work has mainly focused on rule-based agent-based modeling (ABM) and deep learning methods. ABMs offer interpretability of internal mechanisms but exhibit limited predictive accuracy. In contrast, deep learning methods are often…

    Submitted 9 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

    Comments: Typos corrected

  43. arXiv:2506.04619  [pdf, ps, other]

    cs.CV

    Deep Learning Reforms Image Matching: A Survey and Outlook

    Authors: Shihua Zhang, Zizhuo Li, Kaining Zhang, Yifan Lu, Yuxin Deng, Linfeng Tang, Xingyu Jiang, Jiayi Ma

    Abstract: Image matching, which establishes correspondences between two-view images to recover 3D structure and camera geometry, serves as a cornerstone in computer vision and underpins a wide range of applications, including visual localization, 3D reconstruction, and simultaneous localization and mapping (SLAM). Traditional pipelines composed of ``detector-descriptor, feature matcher, outlier filter, and…

    Submitted 5 June, 2025; originally announced June 2025.

  44. arXiv:2506.04467  [pdf]

    physics.med-ph cs.AI

    Diffusion Transformer-based Universal Dose Denoising for Pencil Beam Scanning Proton Therapy

    Authors: Yuzhen Ding, Jason Holmes, Hongying Feng, Martin Bues, Lisa A. McGee, Jean-Claude M. Rwigema, Nathan Y. Yu, Terence S. Sio, Sameer R. Keole, William W. Wong, Steven E. Schild, Jonathan B. Ashman, Sujay A. Vora, Daniel J. Ma, Samir H. Patel, Wei Liu

    Abstract: Purpose: Intensity-modulated proton therapy (IMPT) offers precise tumor coverage while sparing organs at risk (OARs) in head and neck (H&N) cancer. However, its sensitivity to anatomical changes requires frequent adaptation through online adaptive radiation therapy (oART), which depends on fast, accurate dose calculation via Monte Carlo (MC) simulations. Reducing particle count accelerates MC but…

    Submitted 4 June, 2025; originally announced June 2025.

  45. arXiv:2506.03819  [pdf, ps, other]

    stat.ML cs.LG

    Spatially Resolved Meteorological and Ancillary Data in Central Europe for Rainfall Streamflow Modeling

    Authors: Marc Aurel Vischer, Noelia Otero, Jackie Ma

    Abstract: We present a dataset for rainfall streamflow modeling that is fully spatially resolved with the aim of taking neural network-driven hydrological modeling beyond lumped catchments. To this end, we compiled data covering five river basins in central Europe: upper Danube, Elbe, Oder, Rhine, and Weser. The dataset contains meteorological forcings, as well as ancillary information on soil, rock, land c…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 6 pages, 1 figure

    ACM Class: I.2.1; I.6.5; J.2

  46. arXiv:2506.03662  [pdf, ps, other]

    cs.CV cs.RO

    Zero-Shot Temporal Interaction Localization for Egocentric Videos

    Authors: Erhang Zhang, Junyi Ma, Yin-Dong Zheng, Yixuan Zhou, Hesheng Wang

    Abstract: Locating human-object interaction (HOI) actions within video serves as the foundation for multiple downstream tasks, such as human behavior analysis and human-robot skill transfer. Current temporal action localization methods typically rely on annotated action and object categories of interactions for optimization, which leads to domain bias and low deployment efficiency. Although some recent work…

    Submitted 16 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  47. arXiv:2506.03309  [pdf, ps, other]

    cs.GT

    Position Auctions in AI-Generated Content

    Authors: Santiago Balseiro, Kshipra Bhawalkar, Yuan Deng, Zhe Feng, Jieming Mao, Aranyak Mehta, Vahab Mirrokni, Renato Paes Leme, Di Wang, Song Zuo

    Abstract: We consider an extension to the classic position auctions in which sponsored creatives can be added within AI generated content rather than shown in predefined slots. New challenges arise from the natural requirement that sponsored creatives should smoothly fit into the context. With the help of advanced LLM technologies, it becomes viable to accurately estimate the benefits of adding each individ…

    Submitted 3 June, 2025; originally announced June 2025.

  48. arXiv:2506.02242  [pdf, ps, other]

    cs.LG cs.CY

    From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models

    Authors: Yihong Tang, Ao Qu, Xujing Yu, Weipeng Deng, Jun Ma, Jinhua Zhao, Lijun Sun

    Abstract: Urban and transportation research has long sought to uncover statistically meaningful relationships between key variables and societal outcomes such as road safety, to generate actionable insights that guide the planning, development, and renewal of urban and transportation systems. However, traditional workflows face several key challenges: (1) reliance on human experts to propose hypotheses, whi…

    Submitted 17 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

  49. arXiv:2506.02037  [pdf, other]

    cs.CL cs.AI

    FinS-Pilot: A Benchmark for Online Financial System

    Authors: Feng Wang, Yiding Sun, Jiaxin Mao, Wei Xue, Danqing Xu

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various professional domains, with their performance typically evaluated through standardized benchmarks. However, the development of financial RAG benchmarks has been constrained by data confidentiality issues and the lack of dynamic data integration. To address this issue, we introduce FinS-Pilot, a novel benchmark fo…

    Submitted 30 May, 2025; originally announced June 2025.

  50. arXiv:2506.01881  [pdf, ps, other]

    cs.AI cs.CL

    WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue

    Authors: Yaoyao Qian, Jindan Huang, Yuanli Wang, Simon Yu, Kyrie Zhixuan Zhou, Jiayuan Mao, Mingfu Liang, Hanhan Zhou

    Abstract: Task-oriented dialogue systems often face difficulties when user utterances seem semantically complete but lack necessary structural information for appropriate system action. This arises because users frequently do not fully understand their own needs, while systems require precise intent definitions. Current LLM-based agents cannot effectively distinguish between linguistically complete and cont…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 43 pages, 31 figures. Project website: https://nanostorm.netlify.app/