Showing 1–50 of 306 results for author: Qian, X

  1. arXiv:2507.01275  [pdf, ps, other]

    cs.CV

    Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing

    Authors: Chengxu Liu, Lu Qi, Jinshan Pan, Xueming Qian, Ming-Hsuan Yang

    Abstract: Unpaired image dehazing has attracted increasing attention due to its flexible data requirements during model training. Dominant methods based on contrastive learning not only introduce haze-unrelated content information, but also ignore haze-specific properties in the frequency domain (i.e., haze-related degradation is mainly manifested in the amplitude spectrum). To address these issues, we propo…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  2. arXiv:2507.01131  [pdf, ps, other]

    cs.LG physics.comp-ph

    Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations

    Authors: Yuchao Lin, Cong Fu, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji

    Abstract: $\rm{SO}(3)…

    Submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2507.00407  [pdf, ps, other]

    physics.chem-ph cs.AI q-bio.QM

    Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials

    Authors: Cong Fu, Yuchao Lin, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji

    Abstract: Accurate molecular property predictions require 3D geometries, which are typically obtained using expensive methods such as density functional theory (DFT). Here, we attempt to obtain molecular geometries by relying solely on machine learning interatomic potential (MLIP) models. To this end, we first curate a large-scale molecular relaxation dataset comprising 3.5 million molecules and 300 million…

    Submitted 30 June, 2025; originally announced July 2025.

  4. arXiv:2506.22890  [pdf, ps, other]

    cs.CV cs.CR

    CP-Guard: A Unified, Probability-Agnostic, and Adaptive Framework for Malicious Agent Detection and Defense in Multi-Agent Embodied Perception Systems

    Authors: Senkang Hu, Yihang Tao, Guowen Xu, Xinyuan Qian, Yiqin Deng, Xianhao Chen, Sam Tak Wu Kwong, Yuguang Fang

    Abstract: Collaborative Perception (CP) has been shown to be a promising technique for multi-agent autonomous driving and multi-agent robotic systems, where multiple agents share their perception information to enhance the overall perception performance and expand the perception range. However, in CP, an ego agent needs to receive messages from its collaborators, which makes it vulnerable to attacks from ma…

    Submitted 28 June, 2025; originally announced June 2025.

  5. arXiv:2506.22645  [pdf, ps, other]

    cs.LG stat.ML

    Cost-effective Reduced-Order Modeling via Bayesian Active Learning

    Authors: Amir Hossein Rahmati, Nathan M. Urban, Byung-Jun Yoon, Xiaoning Qian

    Abstract: Machine Learning surrogates have been developed to accelerate solving systems dynamics of complex processes in different science and engineering applications. To faithfully capture governing systems dynamics, these methods rely on large training datasets, hence restricting their applicability in real-world problems. In this work, we propose BayPOD-AL, an active learning framework based on an uncer…

    Submitted 27 June, 2025; originally announced June 2025.

  6. arXiv:2506.14235  [pdf, ps, other]

    cs.CL

    A Multi-Expert Structural-Semantic Hybrid Framework for Unveiling Historical Patterns in Temporal Knowledge Graphs

    Authors: Yimin Deng, Yuxia Wu, Yejing Wang, Guoshuai Zhao, Li Zhu, Qidong Liu, Derong Xu, Zichuan Fu, Xian Wu, Yefeng Zheng, Xiangyu Zhao, Xueming Qian

    Abstract: Temporal knowledge graph reasoning aims to predict future events with knowledge of existing facts and plays a key role in various downstream tasks. Previous methods focused on either graph structure learning or semantic reasoning, failing to integrate dual reasoning perspectives to handle different prediction scenarios. Moreover, they lack the capability to capture the inherent differences between…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: ACL25 findings

  7. MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices

    Authors: Zhaode Wang, Jingbang Yang, Xinyu Qian, Shiwen Xing, Xiaotang Jiang, Chengfei Lv, Shengyu Zhang

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs. Consequently, edge device inference presents a promising solution. The primary challenges of edge inference include memory usage and inference speed. This paper introduce…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 7 pages, 5 figures. Published in the Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops (MMAsia '24 Workshops). The final authenticated version is available at https://dl.acm.org/doi/10.1145/3700410.3702126

  8. arXiv:2506.09513  [pdf, ps, other]

    cs.CL cs.AI cs.MA

    ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

    Authors: Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Yu Rong, Wenbing Huang, Qifeng Bai, Tingyang Xu

    Abstract: Though reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by various LLMs. ReasonMed is co…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 24 pages, 6 figures, 7 tables

  9. arXiv:2506.09398  [pdf, ps, other]

    cs.LG physics.comp-ph

    Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames

    Authors: Haiyang Yu, Yuchao Lin, Xuan Zhang, Xiaofeng Qian, Shuiwang Ji

    Abstract: We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations, which plays an important role in physics, chemistry, and materials science. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose a novel and efficient network, called QHNetV2, that achieves global SO(3) equivarian…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Code available at: https://github.com/divelab/AIRS

  10. arXiv:2506.05616  [pdf, ps, other]

    cs.AI cond-mat.mtrl-sci physics.comp-ph

    Toward Greater Autonomy in Materials Discovery Agents: Unifying Planning, Physics, and Scientists

    Authors: Lianhao Zhou, Hongyi Ling, Keqiang Yan, Kaiji Zhao, Xiaoning Qian, Raymundo Arróyave, Xiaofeng Qian, Shuiwang Ji

    Abstract: We aim at designing language agents with greater autonomy for crystal materials discovery. While most of existing studies restrict the agents to perform specific tasks within predefined workflows, we aim to automate workflow planning given high-level goals and scientist intuition. To this end, we propose Materials Agent unifying Planning, Physics, and Scientists, known as MAPPS. MAPPS consists of…

    Submitted 9 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  11. arXiv:2505.17773  [pdf, ps, other]

    cs.LG

    C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models

    Authors: Amir Hossein Rahmati, Sanket Jantre, Weifeng Zhang, Yucheng Wang, Byung-Jun Yoon, Nathan M. Urban, Xiaoning Qian

    Abstract: Low-Rank Adaptation (LoRA) offers a cost-effective solution for fine-tuning large language models (LLMs), but it often produces overconfident predictions in data-scarce few-shot settings. To address this issue, several classical statistical learning approaches have been repurposed for scalable uncertainty-aware LoRA fine-tuning. However, these approaches neglect how input characteristics affect th…

    Submitted 28 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  12. arXiv:2505.17460  [pdf, other]

    cs.SE

    Learning to Focus: Context Extraction for Efficient Code Vulnerability Detection with Language Models

    Authors: Xinran Zheng, Xingzhi Qian, Huichi Zhou, Shuo Yang, Yiling He, Suman Jana, Lorenzo Cavallaro

    Abstract: Language models (LMs) show promise for vulnerability detection but struggle with long, real-world code due to sparse and uncertain vulnerability locations. These issues, exacerbated by token limits, often cause models to miss vulnerability-related signals, thereby impairing effective learning. A key intuition is to enhance LMs with concise, information-rich context. Commit-based annotations offer…

    Submitted 23 May, 2025; originally announced May 2025.

  13. arXiv:2505.16372  [pdf, ps, other]

    cs.CV cs.AI

    Temporal and Spatial Feature Fusion Framework for Dynamic Micro Expression Recognition

    Authors: Feng Liu, Bingyu Nan, Xuezhong Qian, Xiaolan Fu

    Abstract: When emotions are repressed, an individual's true feelings may be revealed through micro-expressions. Consequently, micro-expressions are regarded as a genuine source of insight into an individual's authentic emotions. However, the transient and highly localised nature of micro-expressions poses a significant challenge to their accurate recognition, with the accuracy rate of micro-expression recog…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 17 pages

  14. arXiv:2505.16280  [pdf, ps, other]

    cs.DC

    Brand: Managing Training Data with Batched Random Access

    Authors: Yuhao Li, Xuanhua Shi, Yunfei Zhao, Yongluan Zhou, Yusheng Hua, Xuehai Qian

    Abstract: This paper proposes Brand, a comprehensive memory management system for deep learning training (DLT) where the memory capacity is much smaller than the size of the training datasets. Brand starts with a bold design choice that data files are always read from disk in batches, called chunks. Based on this assumption, we propose an efficient data access protocol in both the single-node setting and distributed en…

    Submitted 22 May, 2025; originally announced May 2025.

  15. arXiv:2505.15859  [pdf, ps, other]

    cs.IR cs.AI

    AutoData: A Multi-Agent System for Open Web Data Collection

    Authors: Tianyi Ma, Yiyue Qian, Zheyuan Zhang, Zehong Wang, Xiaoye Qian, Feifan Bai, Yifan Ding, Xuwei Luo, Shinan Zhang, Keerthiram Murugesan, Chuxu Zhang, Yanfang Ye

    Abstract: The exponential growth of data-driven systems and AI technologies has intensified the demand for high-quality web-sourced datasets. While existing datasets have proven valuable, conventional web data collection approaches face significant limitations in terms of human effort and scalability. Current data-collecting solutions fall into two categories: wrapper-based methods that struggle with adapta…

    Submitted 21 May, 2025; originally announced May 2025.

  16. arXiv:2505.12045  [pdf, ps, other]

    cs.CV

    FIGhost: Fluorescent Ink-based Stealthy and Flexible Backdoor Attacks on Physical Traffic Sign Recognition

    Authors: Shuai Yuan, Guowen Xu, Hongwei Li, Rui Zhang, Xinyuan Qian, Wenbo Jiang, Hangcheng Cao, Qingchuan Zhao

    Abstract: Traffic sign recognition (TSR) systems are crucial for autonomous driving but are vulnerable to backdoor attacks. Existing physical backdoor attacks either lack stealth, provide inflexible attack control, or ignore emerging Vision-Large-Language-Models (VLMs). In this paper, we introduce FIGhost, the first physical-world backdoor attack leveraging fluorescent ink as triggers. Fluorescent triggers…

    Submitted 17 May, 2025; originally announced May 2025.

  17. Generalizable Pancreas Segmentation via a Dual Self-Supervised Learning Framework

    Authors: Jun Li, Hongzhang Zhu, Tao Chen, Xiaohua Qian

    Abstract: Recently, numerous pancreas segmentation methods have achieved promising performance on local single-source datasets. However, these methods don't adequately account for generalizability issues, and hence typically show limited performance and low stability on test data from other sources. Considering the limited availability of distinct data sources, we seek to improve the generalization performa…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE JBHI. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  18. arXiv:2505.05856  [pdf, ps, other]

    cs.DC

    DawnPiper: A Memory-scablable Pipeline Parallel Training Framework

    Authors: Xuan Peng, Xuanhua Shi, Haolin Zhang, Yunfei Zhao, Xuehai Qian

    Abstract: Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can lead to significant GPU memory wastage, limiting the model sizes that pipeline parallelism can effectively support. In this paper, we introduce DawnPiper, a memory-scalable pipeline parallel training framework. Firstly, we develop a DL compilation-based profiling met…

    Submitted 9 May, 2025; originally announced May 2025.

  19. A Dual-Task Synergy-Driven Generalization Framework for Pancreatic Cancer Segmentation in CT Scans

    Authors: Jun Li, Yijue Zhang, Haibo Shi, Minhong Li, Qiwei Li, Xiaohua Qian

    Abstract: Pancreatic cancer, characterized by its notable prevalence and mortality rates, demands accurate lesion delineation for effective diagnosis and therapeutic interventions. The generalizability of extant methods is frequently compromised due to the pronounced variability in imaging and the heterogeneous characteristics of pancreatic lesions, which may mimic normal tissues and exhibit significant int…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE Transactions on Medical Imaging (TMI) 2025

  20. arXiv:2504.15171  [pdf, other]

    cs.LG

    Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture

    Authors: Meng Cui, Xianghu Yue, Xinyuan Qian, Jinzheng Zhao, Haohe Liu, Xubo Liu, Daoliang Li, Wenwu Wang

    Abstract: Fish Feeding Intensity Assessment (FFIA) is crucial in industrial aquaculture management. Recent multi-modal approaches have shown promise in improving FFIA robustness and efficiency. However, these methods face significant challenges when adapting to new fish species or environments due to catastrophic forgetting and the lack of suitable datasets. To address these limitations, we first introduce…

    Submitted 21 April, 2025; originally announced April 2025.

  21. arXiv:2504.14156  [pdf, other]

    physics.optics cs.AI quant-ph

    Breaking the Diffraction Barrier for Passive Sources: Parameter-Decoupled Superresolution Assisted by Physics-Informed Machine Learning

    Authors: Abdelali Sajia, Bilal Benzimoun, Pawan Khatiwada, Guogan Zhao, Xiao-Feng Qian

    Abstract: We present a parameter-decoupled superresolution framework for estimating sub-wavelength separations of passive two-point sources without requiring prior knowledge or control of the source. Our theoretical foundation circumvents the need to estimate multiple challenging parameters such as partial coherence, brightness imbalance, random relative phase, and photon statistics. A physics-informed mach…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 12 pages, 3 figures

  22. arXiv:2504.11164  [pdf, other]

    cs.CV

    TSAL: Few-shot Text Segmentation Based on Attribute Learning

    Authors: Chenming Li, Chengxu Liu, Yuanting Fan, Xiao Jin, Xingsong Hou, Xueming Qian

    Abstract: Supervised learning has recently advanced rapidly in scene text segmentation. However, the lack of high-quality datasets and the high cost of pixel annotation greatly limit its development. Considering the well-performed few-shot learning methods for downstream tasks, we investigate the application of the few-shot learning method to scene text segmentation. We propose TSAL, which leverages CLI…

    Submitted 15 April, 2025; originally announced April 2025.

  23. arXiv:2504.09077  [pdf, other]

    cs.CV

    A Visual Self-attention Mechanism Facial Expression Recognition Network beyond Convnext

    Authors: Bingyu Nan, Feng Liu, Xuezhong Qian, Wei Song

    Abstract: Facial expression recognition is an important research direction in the field of artificial intelligence. Although new breakthroughs have been made in recent years, the uneven distribution of datasets and the similarity between different categories of facial expressions, as well as the differences within the same category among different subjects, remain challenges. This paper proposes a visual fa…

    Submitted 12 April, 2025; originally announced April 2025.

  24. arXiv:2504.08725  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    DocAgent: A Multi-Agent System for Automated Code Documentation Generation

    Authors: Dayu Yang, Antoine Simoulin, Xin Qian, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Grey Yang

    Abstract: High-quality code documentation is crucial for software development especially in the era of AI. However, generating it automatically using Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete, unhelpful, or factually incorrect outputs. We introduce DocAgent, a novel multi-agent collaborative system using topological code processing for incremental cont…

    Submitted 23 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted by ACL 2025. Code: github.com/facebookresearch/DocAgent

  25. On Benchmarking Code LLMs for Android Malware Analysis

    Authors: Yiling He, Hongyu She, Xingzhi Qian, Xinran Zheng, Zhuo Chen, Zhan Qin, Lorenzo Cavallaro

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in various code intelligence tasks. However, their effectiveness for Android malware analysis remains underexplored. Decompiled Android malware code presents unique challenges for analysis, due to the malicious logic being buried within a large number of functions and the frequent lack of meaningful function names. This paper prese…

    Submitted 23 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted to the 34th ACM SIGSOFT ISSTA Companion (LLMSC Workshop 2025)

  26. arXiv:2504.00598  [pdf, ps, other]

    cs.DC

    CFP: Efficient Optimization of Intra-Operator Parallelism Plans for Large Model Training

    Authors: Weifang Hu, Xuanhua Shi, Yunkai Zhang, Chang Wu, Xuan Peng, Jiaqi Zhai, Hai Jin, Xuehai Qian, Jingling Xue, Yongluan Zhou

    Abstract: Optimizing the parallel training of large models requires exploring intra-operator parallelism plans for a computation graph that typically contains tens of thousands of primitive operators. While the optimization of parallel data processing graphs has been extensively researched in database systems, the vast search space makes it challenging to apply traditional database query optimization method…

    Submitted 6 July, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  27. arXiv:2503.21463  [pdf, other]

    cs.CR cs.AI

    Unveiling Latent Information in Transaction Hashes: Hypergraph Learning for Ethereum Ponzi Scheme Detection

    Authors: Junhao Wu, Yixin Yang, Chengxiang Jin, Silu Mu, Xiaolei Qian, Jiajun Zhou, Shanqing Yu, Qi Xuan

    Abstract: With the widespread adoption of Ethereum, financial frauds such as Ponzi schemes have become increasingly rampant in the blockchain ecosystem, posing significant threats to the security of account assets. Existing Ethereum fraud detection methods typically model account transactions as graphs, but this approach primarily focuses on binary transactional relationships between accounts, failing to ad…

    Submitted 27 March, 2025; originally announced March 2025.

  28. arXiv:2503.05771  [pdf, ps, other]

    cs.LG cond-mat.mtrl-sci physics.comp-ph

    A Materials Foundation Model via Hybrid Invariant-Equivariant Architectures

    Authors: Keqiang Yan, Montgomery Bohde, Andrii Kryvenko, Ziyu Xiang, Kaiji Zhao, Siya Zhu, Saagar Kolachina, Doğuhan Sarıtürk, Jianwen Xie, Raymundo Arroyave, Xiaoning Qian, Xiaofeng Qian, Shuiwang Ji

    Abstract: Machine learning interatomic potentials (MLIPs) can predict energy, force, and stress of materials and enable a wide range of downstream discovery tasks. A key design choice in MLIPs involves the trade-off between invariant and equivariant architectures. Invariant models offer computational efficiency but may not perform as well, especially when predicting high-order outputs. In contrast, equivari…

    Submitted 29 May, 2025; v1 submitted 25 February, 2025; originally announced March 2025.

    Comments: Preprint

  29. arXiv:2503.02242  [pdf, other]

    cs.CV eess.IV

    $\mathbf{\Phi}$-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data

    Authors: Xidan Zhang, Yihan Zhuang, Qian Guo, Haodong Yang, Xuelin Qian, Gong Cheng, Junwei Han, Zhongling Huang

    Abstract: Approaches for improving generative adversarial networks (GANs) training under a few samples have been explored for natural images. However, these methods have limited effectiveness for synthetic aperture radar (SAR) images, as they do not account for the unique electromagnetic scattering properties of SAR. To remedy this, we propose a physics-inspired regularization method dubbed $Φ$-GAN, which i…

    Submitted 3 March, 2025; originally announced March 2025.

  30. arXiv:2503.00152  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

    Authors: Keqiang Yan, Xiner Li, Hongyi Ling, Kenna Ashen, Carl Edwards, Raymundo Arróyave, Marinka Zitnik, Heng Ji, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

    Abstract: We consider the problem of crystal materials generation using language models (LMs). A key step is to convert 3D crystal structures into 1D sequences to be processed by LMs. Prior studies used the crystallographic information framework (CIF) file stream, which fails to ensure SE(3) and periodic invariance and may not lead to unique sequence representations for a given crystal structure. Here, we p…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: This paper has been accepted as a NeurIPS 2024 Poster

  31. arXiv:2502.19958  [pdf, ps, other]

    cs.CV

    ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models

    Authors: Ke Niu, Haiyang Yu, Mengyang Zhao, Teng Fu, Siyang Yi, Wei Lu, Bin Li, Xuelin Qian, Xiangyang Xue

    Abstract: Person re-identification (Re-ID) is a crucial task in computer vision, aiming to recognize individuals across non-overlapping camera views. While recent advanced vision-language models (VLMs) excel in logical reasoning and multi-task generalization, their applications in Re-ID tasks remain limited. They either struggle to perform accurate matching based on identity-relevant features or assist imag…

    Submitted 31 May, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  32. arXiv:2502.19411  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs

    Authors: Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian McAuley

    Abstract: In large language models (LLMs), code and reasoning reinforce each other: code offers an abstract, modular, and logic-driven structure that supports reasoning, while reasoning translates high-level goals into smaller, executable steps that drive more advanced code intelligence. In this study, we examine how code serves as a structured medium for enhancing reasoning: it provides verifiable executio…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: Project Repo: https://github.com/dayuyang1999/Awesome-Code-Reasoning

  33. arXiv:2502.13055  [pdf, other]

    cs.CR cs.AI cs.LG

    LAMD: Context-driven Android Malware Detection and Classification with LLMs

    Authors: Xingzhi Qian, Xinran Zheng, Yiling He, Shuo Yang, Lorenzo Cavallaro

    Abstract: The rapid growth of mobile applications has escalated Android malware threats. Although there are numerous detection methods, they often struggle with evolving attacks, dataset biases, and limited explainability. Large Language Models (LLMs) offer a promising alternative with their zero-shot inference and reasoning capabilities. However, applying LLMs to Android malware detection presents two key…

    Submitted 21 April, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: Accepted by the 2025 46th IEEE Symposium on Security and Privacy Workshops (SPW)

  34. arXiv:2502.11205  [pdf, other]

    cs.LG cs.CY

    Deep Contrastive Learning for Feature Alignment: Insights from Housing-Household Relationship Inference

    Authors: Xiao Qian, Shangjia Dong, Rachel Davidson

    Abstract: Housing and household characteristics are key determinants of social and economic well-being, yet our understanding of their interrelationships remains limited. This study addresses this knowledge gap by developing a deep contrastive learning (DCL) model to infer housing-household relationships using the American Community Survey (ACS) Public Use Microdata Sample (PUMS). More broadly, the proposed…

    Submitted 16 February, 2025; originally announced February 2025.

  35. arXiv:2502.06874  [pdf, other]

    cs.CL cs.AI cs.LG

    Group Reasoning Emission Estimation Networks

    Authors: Yanming Guo, Xiao Qian, Kevin Credit, Jin Ma

    Abstract: Accurate greenhouse gas (GHG) emission reporting is critical for governments, businesses, and investors. However, adoption remains limited particularly among small and medium enterprises due to high implementation costs, fragmented emission factor databases, and a lack of robust sector classification methods. To address these challenges, we introduce Group Reasoning Emission Estimation Networks (G…

    Submitted 27 March, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

  36. arXiv:2502.06173  [pdf, other]

    cs.LG cs.AI cs.CL stat.AP stat.ML

    Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis

    Authors: Sanket Jantre, Tianle Wang, Gilchan Park, Kriti Chopra, Nicholas Jeon, Xiaoning Qian, Nathan M. Urban, Byung-Jun Yoon

    Abstract: Identification of protein-protein interactions (PPIs) helps derive cellular mechanistic understanding, particularly in the context of complex conditions such as neurodegenerative disorders, metabolic syndromes, and cancer. Large Language Models (LLMs) have demonstrated remarkable potential in predicting protein structures and interactions via automated mining of vast biomedical literature; yet the…

    Submitted 10 February, 2025; originally announced February 2025.

  37. arXiv:2502.02205  [pdf, other]

    cs.LG

    From Uncertain to Safe: Conformal Fine-Tuning of Diffusion Models for Safe PDE Control

    Authors: Peiyan Hu, Xiaowei Qian, Wenhao Deng, Rui Wang, Haodong Feng, Ruiqi Feng, Tao Zhang, Long Wei, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: The application of deep learning for partial differential equation (PDE)-constrained control is gaining increasing attention. However, existing methods rarely consider safety requirements crucial in real-world applications. To address this limitation, we propose Safe Diffusion Models for PDE Control (SafeDiffCon), which introduce the uncertainty quantile as model uncertainty quantification to achi…

    Submitted 16 May, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

    Comments: ICML 2025. 24 pages, 5 figures

  38. arXiv:2501.14735  [pdf, other]

    cs.SE cs.AI

    ARCEAK: An Automated Rule Checking Framework Enhanced with Architectural Knowledge

    Authors: Junyong Chen, Ling-I Wu, Minyu Chen, Xiaoying Qian, Haoze Zhu, Qiongfang Zhang, Guoqiang Li

    Abstract: Automated Rule Checking (ARC) plays a crucial role in advancing the construction industry by addressing the laborious, inconsistent, and error-prone nature of traditional model review conducted by industry professionals. Manual assessment against intricate sets of rules often leads to significant project delays and expenses. In response to these challenges, ARC offers a promising solution to impro…

    Submitted 10 December, 2024; originally announced January 2025.

    Comments: 12 pages, 5 figures

  39. arXiv:2501.03257  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition

    Authors: Wei Zhang, Tian-Hao Zhang, Chao Luo, Hui Zhou, Chao Yang, Xinyuan Qian, Xu-Cheng Yin

    Abstract: Recently, end-to-end automatic speech recognition has become the mainstream approach in both industry and academia. To optimize system performance in specific scenarios, the Weighted Finite-State Transducer (WFST) is extensively used to integrate acoustic and language models, leveraging its capacity to implicitly fuse language models within static graphs, thereby ensuring robust recognition while…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  40. arXiv:2501.03181  [pdf, other]

    cs.SD cs.AI eess.AS

    FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles

    Authors: Tian-Hao Zhang, Jiawei Zhang, Jun Wang, Xinyuan Qian, Xu-Cheng Yin

    Abstract: Humans can perceive speakers' characteristics (e.g., identity, gender, personality and emotion) by their appearance, which are generally aligned to their voice style. Recently, vision-driven Text-to-speech (TTS) scholars grounded their investigations on real-person faces, thereby restricting effective speech synthesis from applying to vast potential usage scenarios with diverse characters and imag…

    Submitted 15 April, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI 2025

  41. arXiv:2412.13496  [pdf, other]

    cs.CV

    QueryCDR: Query-Based Controllable Distortion Rectification Network for Fisheye Images

    Authors: Pengbo Guo, Chengxu Liu, Xingsong Hou, Xueming Qian

    Abstract: Fisheye image rectification aims to correct distortions in images taken with fisheye cameras. Although current models show promising results on images with a similar degree of distortion as the training data, they will produce sub-optimal results when the degree of distortion changes and without retraining. The lack of generalization ability for dealing with varying degrees of distortion limits th…

    Submitted 23 December, 2024; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: ECCV2024

  42. arXiv:2412.07193  [pdf, other]

    cs.LG stat.ML

    Epidemiological Model Calibration via Graybox Bayesian Optimization

    Authors: Puhua Niu, Byung-Jun Yoon, Xiaoning Qian

    Abstract: In this study, we focus on developing efficient calibration methods via Bayesian decision-making for the family of compartmental epidemiological models. The existing calibration methods usually assume that the compartmental model is cheap in terms of its output and gradient evaluation, which may not hold in practice when extending them to more general settings. Therefore, we introduce model calibr…

    Submitted 10 December, 2024; originally announced December 2024.

  43. arXiv:2412.05573  [pdf, other]

    cs.CV cs.AI

    Neighborhood Commonality-aware Evolution Network for Continuous Generalized Category Discovery

    Authors: Ye Wang, Yaxiong Wang, Guoshuai Zhao, Xueming Qian

    Abstract: Continuous Generalized Category Discovery (C-GCD) aims to continually discover novel classes from unlabelled image sets while maintaining performance on old classes. In this paper, we propose a novel learning framework, dubbed Neighborhood Commonality-aware Evolution Network (NCENet) that conquers this task from the perspective of representation learning. Concretely, to learn discriminative repres…

    Submitted 7 December, 2024; originally announced December 2024.

    Comments: 12 pages, 7 Figures

  44. arXiv:2412.03666  [pdf, other]

    cs.LG

    Hyperparameter Tuning Through Pessimistic Bilevel Optimization

    Authors: Meltem Apaydin Ustun, Liang Xu, Bo Zeng, Xiaoning Qian

    Abstract: Automated hyperparameter search in machine learning, especially for deep learning models, is typically formulated as a bilevel optimization problem, with hyperparameter values determined by the upper level and the model learning achieved by the lower-level problem. Most of the existing bilevel optimization solutions either assume the uniqueness of the optimal training model given hyperparameters o…

    Submitted 4 December, 2024; originally announced December 2024.

  45. arXiv:2412.03312  [pdf, other]

    cs.LG cs.AI stat.ML

    Path-Guided Particle-based Sampling

    Authors: Mingzhou Fan, Ruida Zhou, Chao Tian, Xiaoning Qian

    Abstract: Particle-based Bayesian inference methods by sampling from a partition-free target (posterior) distribution, e.g., Stein variational gradient descent (SVGD), have attracted significant attention. We propose a path-guided particle-based sampling (PGPS) method based on a novel Log-weighted Shrinkage (LwS) density path linking an initial distribution to the target distribution. We propose to utilize…

    Submitted 4 December, 2024; originally announced December 2024.

  46. arXiv:2412.02270  [pdf, other]

    cs.CV cs.AI

    Sustainable Self-evolution Adversarial Training

    Authors: Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang

    Abstract: With the wide application of deep neural network models in various computer vision tasks, there has been a proliferation of adversarial example generation strategies aimed at deeply exploring model security. However, existing adversarial training defense models, which rely on single or limited types of attacks under a one-time learning process, struggle to adapt to the dynamic and evolving nature…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: Accepted to ACMMM 2024

  47. arXiv:2412.02218  [pdf, other]

    cs.ET cs.AR

    MASIM: An Efficient Multi-Array Scheduler for In-Memory SIMD Computation

    Authors: Xingyue Qian, Chen Nie, Zhezhi He, Weikang Qian

    Abstract: Single instruction, multiple data (SIMD) is a popular design style of in-memory computing (IMC) architectures, which enables memory arrays to perform logic operations to achieve low energy consumption and high parallelism. To implement a target function on the data stored in memory, the function is first transformed into a netlist of the supported logic operations through logic synthesis. Then, th…

    Submitted 3 December, 2024; originally announced December 2024.

  48. arXiv:2412.02212  [pdf, other]

    cs.ET

    High-Quality Iterative Logic Compiler for In-Memory SIMD Computation with Tight Coupling of Synthesis and Scheduling

    Authors: Xingyue Qian, Chenyang Lv, Zhezhi He, Weikang Qian

    Abstract: In-memory computing (IMC) with single instruction multiple data (SIMD) setup enables memory to perform operations on the stored data in parallel to achieve high throughput and energy saving. To instruct a SIMD IMC hardware to compute a function, a logic compiler is needed that involves two steps: logic synthesis and scheduling. Logic synthesis transforms the function into a netlist of supported op…

    Submitted 3 December, 2024; originally announced December 2024.

  49. arXiv:2411.16949  [pdf]

    cs.CV

    A SAM-guided and Match-based Semi-Supervised Segmentation Framework for Medical Imaging

    Authors: Guoping Xu, Xiaoxue Qian, Hua Chieh Shao, Jax Luo, Weiguo Lu, You Zhang

    Abstract: This study introduces SAMatch, a SAM-guided Match-based framework for semi-supervised medical image segmentation, aimed at improving pseudo label quality in data-scarce scenarios. While Match-based frameworks are effective, they struggle with low-quality pseudo labels due to the absence of ground truth. SAM, pre-trained on a large dataset, generalizes well across diverse tasks and assists in gener…

    Submitted 25 November, 2024; originally announced November 2024.

  50. arXiv:2411.13711  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise

    Authors: Xiaochi Qian, Zixuan Xie, Xinyu Liu, Shangtong Zhang

    Abstract: This paper establishes the first almost sure convergence rate and the first maximal concentration bound with exponential tails for general contractive stochastic approximation algorithms with Markovian noise. As a corollary, we also obtain convergence rates in $L^p$. Key to our successes is a novel discretization of the mean ODE of stochastic approximation algorithms using intervals with diminishi…

    Submitted 20 November, 2024; originally announced November 2024.