
Showing 1–50 of 1,547 results for author: Xiao, Y

Searching in archive cs.
  1. arXiv:2505.09986  [pdf, other]

    cs.CV eess.IV

    High Quality Underwater Image Compression with Adaptive Correction and Codebook-based Augmentation

    Authors: Yimin Zhou, Yichong Xia, Sicheng Pan, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the increasing exploration and exploitation of the underwater world, underwater images have become a critical medium for human interaction with marine environments, driving extensive research into their efficient transmission and storage. However, contemporary underwater image compression algorithms fail to fully leverage the unique characteristics distinguishing underwater scenes from terres…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.08518  [pdf, ps, other]

    math.OC cs.LG

    SPP-SBL: Space-Power Prior Sparse Bayesian Learning for Block Sparse Recovery

    Authors: Yanhao Zhang, Zhihan Zhu, Yong Xia

    Abstract: The recovery of block-sparse signals with unknown structural patterns remains a fundamental challenge in structured sparse signal reconstruction. By proposing a variance transformation framework, this paper unifies existing pattern-based block sparse Bayesian learning methods, and introduces a novel space power prior based on undirected graph models to adaptively capture the unknown patterns of bl…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 12 pages, 6 figures, 4 tables

  3. arXiv:2505.07916  [pdf, ps, other]

    eess.AS cs.SD

    MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

    Authors: Bowen Zhang, Congchao Guo, Geng Yang, Hang Yu, Haozhe Zhang, Heidi Lei, Jialong Mai, Junjie Yan, Kaiyue Yang, Mingqi Yang, Peikai Huang, Ruiyang Jin, Sitan Jiang, Weihua Cheng, Yawei Li, Yichen Xiao, Yiying Zhou, Yongmao Zhang, Yuan Lu, Yucen He

    Abstract: We introduce MiniMax-Speech, an autoregressive Transformer-based Text-to-Speech (TTS) model that generates high-quality speech. A key innovation is our learnable speaker encoder, which extracts timbre features from a reference audio without requiring its transcription. This enables MiniMax-Speech to produce highly expressive speech with timbre consistent with the reference in a zero-shot manner, w…

    Submitted 12 May, 2025; originally announced May 2025.

  4. arXiv:2505.07214  [pdf, other]

    cs.HC cs.AI cs.CV

    Towards user-centered interactive medical image segmentation in VR with an assistive AI agent

    Authors: Pascal Spiegler, Arash Harirpoush, Yiming Xiao

    Abstract: Crucial in disease analysis and surgical planning, manual segmentation of volumetric medical scans (e.g. MRI, CT) is laborious, error-prone, and challenging to master, while fully automatic algorithms can benefit from user feedback. Therefore, with the complementary power of the latest radiological AI foundation models and virtual reality (VR)'s intuitive data interaction, we propose SAMIRA, a nov…

    Submitted 15 May, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

  5. arXiv:2505.06371  [pdf, ps, other]

    cs.LG cs.AI

    The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization

    Authors: Jae-Won Chung, Jiachen Liu, Jeff J. Ma, Ruofan Wu, Oh Jun Kweon, Yuxuan Xia, Zhiyu Wu, Mosharaf Chowdhury

    Abstract: As the adoption of Generative AI in real-world services grows explosively, energy has emerged as a critical bottleneck resource. However, energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environ…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Leaderboard: https://ml.energy/leaderboard

  6. arXiv:2505.06131  [pdf, ps, other]

    cs.RO

    ELA-ZSON: Efficient Layout-Aware Zero-Shot Object Navigation Agent with Hierarchical Planning

    Authors: Jiawei Hou, Yuting Xiao, Xiangyang Xue, Taiping Zeng

    Abstract: We introduce ELA-ZSON, an efficient layout-aware zero-shot object navigation (ZSON) approach designed for complex multi-room indoor environments. By planning hierarchically leveraging a global topological map with layout information and local imperative approach with detailed scene representation memory, ELA-ZSON achieves both efficient and effective navigation. The process is managed by an LL…

    Submitted 9 May, 2025; originally announced May 2025.

  7. arXiv:2505.05870  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Towards Facial Image Compression with Consistency Preserving Diffusion Prior

    Authors: Yimin Zhou, Yichong Xia, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the widespread application of facial image data across various domains, the efficient storage and transmission of facial images has garnered significant attention. However, the existing learned face image compression methods often produce unsatisfactory reconstructed image quality at low bit rates. Simply adapting diffusion-based compression methods to facial compression tasks results in reco…

    Submitted 9 May, 2025; originally announced May 2025.

  8. arXiv:2505.05007  [pdf, other]

    cs.CV

    Driving with Context: Online Map Matching for Complex Roads Using Lane Markings and Scenario Recognition

    Authors: Xin Bi, Zhichao Li, Yuxuan Xia, Panpan Tong, Lijuan Zhang, Yang Chen, Junsheng Fu

    Abstract: Accurate online map matching is fundamental to vehicle navigation and the activation of intelligent driving functions. Current online map matching methods are prone to errors in complex road networks, especially in multilevel road areas. To address this challenge, we propose an online Standard Definition (SD) map matching method by constructing a Hidden Markov Model (HMM) with multiple probability…

    Submitted 10 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 9 pages and 12 figures. Under review at IEEE RA-L

  9. arXiv:2505.02087  [pdf, other]

    cs.AI

    Retrieval-augmented in-context learning for multimodal large language models in disease classification

    Authors: Zaifu Zhan, Shuang Zhou, Xiaoshan Zhou, Yongkang Xiao, Jun Wang, Jiawen Deng, He Zhu, Yu Hou, Rui Zhang

    Abstract: Objectives: We aim to dynamically retrieve informative demonstrations, enhancing in-context learning in multimodal large language models (MLLMs) for disease classification. Methods: We propose a Retrieval-Augmented In-Context Learning (RAICL) framework, which integrates retrieval-augmented generation (RAG) and in-context learning (ICL) to adaptively select demonstrations with similar disease pat…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 17 Pages, 1 figure, 7 tables

  10. arXiv:2505.01222  [pdf, other]

    cs.NI eess.SP

    Performance of Cell-Free Massive MIMO in Realistic Urban Propagation Environments

    Authors: Yunlu Xiao, Ljiljana Simić

    Abstract: While UE-centric cell-free massive MIMO (CF-mMIMO) provides high and uniform throughput performance under the assumption of a uniform propagation environment modeled by the log-distance path loss channel model, the performance under a realistic urban propagation environment is not yet fully addressed. In this paper we conduct the first comparative performance study of CF-mMIMO under both the widel…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: This paper is accepted to be published in IEEE WCNC25

  11. arXiv:2505.00551  [pdf, other]

    cs.CL

    100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

    Authors: Chong Zhang, Yue Deng, Xiang Lin, Bin Wang, Dianwen Ng, Hai Ye, Xingxuan Li, Yao Xiao, Zhanfeng Mo, Qi Zhang, Lidong Bing

    Abstract: The recent development of reasoning language models (RLMs) represents a novel evolution in large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. However, the implementation details of the released models have not been fully open…

    Submitted 15 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  12. arXiv:2504.21433  [pdf, other]

    cs.AI

    NGENT: Next-Generation AI Agents Must Integrate Multi-Domain Abilities to Achieve Artificial General Intelligence

    Authors: Zhicong Li, Hangyu Mao, Jiangjin Yin, Mingzhe Xing, Zhiwei Xu, Yuanxing Zhang, Yang Xiao

    Abstract: This paper argues that the next generation of AI agent (NGENT) should integrate cross-domain abilities to advance toward Artificial General Intelligence (AGI). Although current AI agents are effective in specialized tasks such as robotics, role-playing, and tool-using, they remain confined to narrow domains. We propose that future AI agents should synthesize the strengths of these specialized sys…

    Submitted 30 April, 2025; originally announced April 2025.

  13. arXiv:2504.21228  [pdf, other]

    cs.CR cs.AI

    CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks

    Authors: Rui Wang, Junda Wu, Yu Xia, Tong Yu, Ruiyi Zhang, Ryan Rossi, Lina Yao, Julian McAuley

    Abstract: Large Language Models (LLMs) are identified as being susceptible to indirect prompt injection attack, where the model undesirably deviates from user-provided instructions by executing tasks injected in the prompt context. This vulnerability stems from LLMs' inability to distinguish between data and instructions within a prompt. In this paper, we propose CachePrune that defends against this attack…

    Submitted 29 April, 2025; originally announced April 2025.

  14. FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

    Authors: Yao Xiao, Tingfa Xu, Yu Xin, Jianan Li

    Abstract: Embedded flight devices with visual capabilities have become essential for a wide range of applications. In aerial image detection, while many existing methods have partially addressed the issue of small target detection, challenges remain in optimizing small target detection and balancing detection accuracy with efficiency. These issues are key obstacles to the advancement of real-time aerial ima…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: AAAI 2025

  15. arXiv:2504.20213  [pdf, other]

    cs.LG cs.AI

    Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework

    Authors: Yuan Xia, Akanksha Atrey, Fadoua Khmaissia, Kedar S. Namjoshi

    Abstract: This paper investigates the logical reasoning capabilities of large language models (LLMs). For a precisely defined yet tractable formulation, we choose the conceptually simple but technically complex task of constructing proofs in Boolean logic. A trained LLM receives as input a set of assumptions and a goal, and produces as output a proof that formally derives the goal from the assumptions. Inco…

    Submitted 28 April, 2025; originally announced April 2025.

  16. arXiv:2504.19258  [pdf, other]

    cs.CV cs.RO

    OPAL: Visibility-aware LiDAR-to-OpenStreetMap Place Recognition via Adaptive Radial Fusion

    Authors: Shuhao Kang, Martin Y. Liao, Yan Xia, Olaf Wysocki, Boris Jutzi, Daniel Cremers

    Abstract: LiDAR place recognition is a critical capability for autonomous navigation and cross-modal localization in large-scale outdoor environments. Existing approaches predominantly depend on pre-built 3D dense maps or aerial imagery, which impose significant storage overhead and lack real-time adaptability. In this paper, we propose OPAL, a novel network for LiDAR place recognition that leverages OpenSt…

    Submitted 30 April, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: Technical report. 15 pages, 9 figures

  17. arXiv:2504.19136  [pdf, other]

    cs.CV cs.AI eess.IV

    PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification

    Authors: Huiling Zheng, Xian Zhong, Bin Liu, Yi Xiao, Bihan Wen, Xiaofeng Li

    Abstract: The fusion of Synthetic Aperture Radar (SAR) and RGB imagery for land cover classification remains challenging due to modality heterogeneity and the underutilization of spectral complementarity. Existing methods often fail to decouple shared structural features from modality-specific radiometric attributes, leading to feature conflicts and information loss. To address this issue, we propose Phase-…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 13 pages, 8 figures

  18. arXiv:2504.18588  [pdf]

    cs.LG cs.AI

    Dynamic QoS Prediction via a Non-Negative Tensor Snowflake Factorization

    Authors: YongHui Xia, Lan Wang, Hao Wu

    Abstract: Dynamic quality of service (QoS) data exhibit rich temporal patterns in user-service interactions, which are crucial for a comprehensive understanding of user behavior and service conditions in Web service. As the number of users and services increases, there is a large amount of unobserved QoS data, which significantly affects users' choice of services. To predict unobserved QoS data, we propose a…

    Submitted 23 April, 2025; originally announced April 2025.

  19. arXiv:2504.17584  [pdf, other]

    cs.AR cs.LG

    L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference

    Authors: Qingyuan Liu, Liyan Chen, Yanning Yang, Haocheng Wang, Dong Du, Zhigang Mao, Naifeng Jing, Yubin Xia, Haibo Chen

    Abstract: Large Language Models (LLMs) increasingly require processing long text sequences, but GPU memory limitations force difficult trade-offs between memory capacity and bandwidth. While HBM-based acceleration offers high bandwidth, its capacity remains constrained. Offloading data to host-side DIMMs improves capacity but introduces costly data swapping overhead. We identify that the critical memory bot…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 16 pages, 11 figures

  20. arXiv:2504.17577  [pdf, other]

    cs.LG

    TileLang: A Composable Tiled Programming Model for AI Systems

    Authors: Lei Wang, Yu Cheng, Yining Shi, Zhengju Tang, Zhiwen Mo, Wenhao Xie, Lingxiao Ma, Yuqing Xia, Jilong Xue, Fan Yang, Zhi Yang

    Abstract: Modern AI workloads rely heavily on optimized computing kernels for both training and inference. These AI kernels follow well-defined data-flow patterns, such as moving tiles between DRAM and SRAM and performing a sequence of computations on those tiles. However, writing high-performance kernels remains complex despite the clarity of these patterns. Achieving peak performance requires careful, har…

    Submitted 27 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  21. arXiv:2504.15525  [pdf, other]

    cs.LG

    Federated Latent Factor Learning for Recovering Wireless Sensor Networks Signal with Privacy-Preserving

    Authors: Chengjun Yu, Yixin Ran, Yangyi Xia, Jia Wu, Xiaojing Liu

    Abstract: Wireless Sensor Networks (WSNs) are a cutting-edge domain in the field of intelligent sensing. Due to sensor failures and energy-saving strategies, the collected data often have massive missing data, hindering subsequent analysis and decision-making. Although Latent Factor Learning (LFL) has been proven effective in recovering missing data, it fails to sufficiently consider data privacy protection…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted By ICAIS&ISAS 2025

  22. arXiv:2504.15477  [pdf, other]

    cs.LG

    In-context Ranking Preference Optimization

    Authors: Junda Wu, Rohan Surana, Zhouhang Xie, Yiran Shen, Yu Xia, Tong Yu, Ryan A. Rossi, Prithviraj Ammanabrolu, Julian McAuley

    Abstract: Recent developments in Direct Preference Optimization (DPO) allow large language models (LLMs) to function as implicit ranking models by maximizing the margin between preferred and non-preferred responses. In practice, user feedback on such lists typically involves identifying a few relevant items in context rather than providing detailed pairwise comparisons for every possible item pair. Moreover…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 10 pages

  23. arXiv:2504.15476  [pdf, other]

    cs.IR

    From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System

    Authors: Rohan Surana, Junda Wu, Zhouhang Xie, Yu Xia, Harald Steck, Dawen Liang, Nathan Kallus, Julian McAuley

    Abstract: Conversational recommender systems (CRS) typically require extensive domain-specific conversational datasets, yet high costs, privacy concerns, and data-collection challenges severely limit their availability. Although Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities, practical applications often favor smaller, internally managed recommender models due to scala…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 11 pages, 2 figures

  24. arXiv:2504.15037  [pdf, other]

    cs.LG

    A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

    Authors: Huanyu Zhang, Chengzu Li, Wenshan Wu, Shaoguang Mao, Yan xia, Ivan Vulić, Zhang Zhang, Liang Wang, Tieniu Tan, Furu Wei

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks. However, recent studies have exposed critical limitations in their spatial reasoning capabilities. This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world, thereby limiting their broader applications. We argue that…

    Submitted 21 April, 2025; originally announced April 2025.

  25. arXiv:2504.14993  [pdf, other]

    cs.CR cs.DB

    Dual Utilization of Perturbation for Stream Data Publication under Local Differential Privacy

    Authors: Rong Du, Qingqing Ye, Yaxin Xiao, Liantong Yu, Yue Fu, Haibo Hu

    Abstract: Stream data from real-time distributed systems such as IoT, tele-health, and crowdsourcing has become an important data source. However, the collection and analysis of user-generated stream data raise privacy concerns due to the potential exposure of sensitive information. To address these concerns, local differential privacy (LDP) has emerged as a promising standard. Nevertheless, applying LDP to…

    Submitted 21 April, 2025; originally announced April 2025.

  26. arXiv:2504.14655  [pdf, other]

    cs.LG cs.CL cs.SE

    LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs

    Authors: Yunhui Xia, Wei Shen, Yan Wang, Jason Klein Liu, Huifeng Sun, Siyue Wu, Jian Hu, Xiaolong Xu

    Abstract: We introduce LeetCodeDataset, a high-quality benchmark for evaluating and training code-generation models, addressing two key challenges in LLM research: the lack of reasoning-focused coding benchmarks and self-contained training testbeds. By curating LeetCode Python problems with rich metadata, broad coverage, 100+ test cases per problem, and temporal splits (pre/post July 2024), our dataset enab…

    Submitted 20 April, 2025; originally announced April 2025.

  27. arXiv:2504.14538  [pdf, other]

    cs.CL

    BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation

    Authors: Yiting Ran, Xintao Wang, Tian Qiu, Jiaqing Liang, Yanghua Xiao, Deqing Yang

    Abstract: Recent advances in large language models (LLMs) have enabled social simulation through multi-agent systems. Prior efforts focus on agent societies created from scratch, assigning agents with newly defined personas. However, simulating established fictional worlds and characters remains largely underexplored, despite its significant practical value. In this paper, we introduce BookWorld, a comprehen…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: 19 pages, 4 figures

  28. arXiv:2504.14178  [pdf, other]

    cs.CV

    Segregation and Context Aggregation Network for Real-time Cloud Segmentation

    Authors: Yijie Li, Hewei Wang, Jiayi Zhang, Jinjiang You, Jinfeng Xu, Puzhen Wu, Yunzhong Xiao, Soumyabrata Dev

    Abstract: Cloud segmentation from intensity images is a pivotal task in atmospheric science and computer vision, aiding weather forecasting and climate analysis. Ground-based sky/cloud segmentation extracts clouds from images for further feature analysis. Existing methods struggle to balance segmentation accuracy and computational efficiency, limiting real-world deployment on edge devices, so we introduce S…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: 15 pages

  29. arXiv:2504.14145  [pdf, other]

    cs.DC cs.AI

    PipeWeaver: Addressing Data Dynamicity in Large Multimodal Model Training with Dynamic Interleaved Pipeline

    Authors: Zhenliang Xue, Hanpeng Hu, Xing Chen, Yimin Jiang, Yixin Song, Zeyu Mi, Yibo Zhu, Daxin Jiang, Yubin Xia, Haibo Chen

    Abstract: Large multimodal models (LMMs) have demonstrated excellent capabilities in both understanding and generation tasks with various modalities. While these models can accept flexible combinations of input data, their training efficiency suffers from two major issues: pipeline stage imbalance caused by heterogeneous model architectures, and training data dynamicity stemming from the diversity of multim…

    Submitted 18 April, 2025; originally announced April 2025.

  30. arXiv:2504.14108  [pdf, other]

    cs.CV

    Point-Driven Interactive Text and Image Layer Editing Using Diffusion Models

    Authors: Zhenyu Yu, Mohd Yamani Idna Idris, Pei Wang, Yuelong Xia

    Abstract: We present DanceText, a training-free framework for multilingual text editing in images, designed to support complex geometric transformations and achieve seamless foreground-background integration. While diffusion-based generative models have shown promise in text-guided image synthesis, they often lack controllability and fail to preserve layout consistency under non-trivial manipulations such a…

    Submitted 18 April, 2025; originally announced April 2025.

  31. arXiv:2504.13914  [pdf, other]

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed, :, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  32. arXiv:2504.13582  [pdf, other]

    cs.RO cs.LG

    Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots

    Authors: Zongyuan Chen, Yan Xia, Jiayuan Liu, Jijia Liu, Wenhao Tang, Jiayu Chen, Feng Gao, Longfei Ma, Hongen Liao, Yu Wang, Chao Yu, Boyu Zhang, Fei Xing

    Abstract: Soft robots exhibit inherent compliance and safety, which makes them particularly suitable for applications requiring direct physical interaction with humans, such as surgical procedures. However, their nonlinear and hysteretic behavior, resulting from the properties of soft materials, presents substantial challenges for accurate modeling and control. In this study, we present a soft robotic syste…

    Submitted 18 April, 2025; originally announced April 2025.

  33. arXiv:2504.13267  [pdf, other]

    cs.CR eess.SY

    Leveraging Functional Encryption and Deep Learning for Privacy-Preserving Traffic Forecasting

    Authors: Isaac Adom, Mohammmad Iqbal Hossain, Hassan Mahmoud, Ahmad Alsharif, Mahmoud Nabil Mahmoud, Yang Xiao

    Abstract: Over the past few years, traffic congestion has continuously plagued the nation's transportation system, creating several negative impacts including longer travel times, increased pollution rates, and higher collision risks. To overcome these challenges, Intelligent Transportation Systems (ITS) aim to improve mobility and vehicular systems, ensuring higher levels of safety by utilizing cutting-edge…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 17 pages, 14 Figures, Journal Publication

  34. arXiv:2504.13044  [pdf, other]

    q-bio.QM cs.LG physics.bio-ph

    The Dissipation Theory of Aging: A Quantitative Analysis Using a Cellular Aging Map

    Authors: Farhan Khodaee, Rohola Zandie, Yufan Xia, Elazer R. Edelman

    Abstract: We propose a new theory for aging based on dynamical systems and provide a data-driven computational method to quantify the changes at the cellular level. We use ergodic theory to decompose the dynamics of changes during aging and show that aging is fundamentally a dissipative process within biological systems, akin to dynamical systems where dissipation occurs due to non-conservative forces. To q…

    Submitted 17 April, 2025; originally announced April 2025.

  35. arXiv:2504.12824  [pdf, other]

    cs.AR

    Mixed Structural Choice Operator: Enhancing Technology Mapping with Heterogeneous Representations

    Authors: Zhang Hu, Hongyang Pan, Yinshui Xia, Lunyao Wang, Zhufei Chu

    Abstract: The independence of logic optimization and technology mapping poses a significant challenge in achieving high-quality synthesis results. Recent studies have improved optimization outcomes through collaborative optimization of multiple logic representations and have improved structural bias through structural choices. However, these methods still rely on technology-independent optimization and fail…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted by DAC 2025. Please note that this is not the final camera-ready version

  36. arXiv:2504.12345  [pdf, other]

    cs.CL cs.CY cs.MA

    Reimagining Urban Science: Scaling Causal Inference with Large Language Models

    Authors: Yutong Xia, Ao Qu, Yunhan Zheng, Yihong Tang, Dingyi Zhuang, Yuxuan Liang, Shenhao Wang, Cathy Wu, Lijun Sun, Roger Zimmermann, Jinhua Zhao

    Abstract: Urban causal research is essential for understanding the complex dynamics of cities and informing evidence-based policies. However, it is challenged by the inefficiency and bias of hypothesis generation, barriers to multimodal data complexity, and the methodological fragility of causal experimentation. Recent advances in large language models (LLMs) present an opportunity to rethink how urban caus…

    Submitted 9 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  37. arXiv:2504.12341  [pdf, other]

    cs.CL

    Streamlining Biomedical Research with Specialized LLMs

    Authors: Linqing Chen, Weilei Wang, Yubin Xia, Wentao Wu, Peng Xu, Zilong Bai, Jie Fang, Chaobo Xu, Ran Hu, Licong Xu, Haoran Hua, Jing Sun, Hanmeng Zhong, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yong Gu, Tao Shi, Chaochao Wang, Jianping Lu, Cheng Sun, Yixin Wang, et al. (8 additional authors not shown)

    Abstract: In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, imag…

    Submitted 15 April, 2025; originally announced April 2025.

    Journal ref: Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations, pp. 9-19, 2025

  38. arXiv:2504.12285  [pdf, other]

    cs.CL cs.LG

    BitNet b1.58 2B4T Technical Report

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei

    Abstract: We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability. Our results demonstrate that BitNet b1.58 2B4T achieves performanc…

    Submitted 24 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: Work in progress

  39. arXiv:2504.12194  [pdf, ps, other]

    cs.IT

    The Optimal Condition Number for ReLU Function

    Authors: Yu Xia, Haoyu Zhou

    Abstract: ReLU is a widely used activation function in deep neural networks. This paper explores the stability properties of the ReLU map. For any weight matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ and bias vector $\boldsymbol{b} \in \mathbb{R}^{m}$ at a given layer, we define the condition number $\beta_{\boldsymbol{A},\boldsymbol{b}}$ as…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 29 pages

  40. arXiv:2504.12167  [pdf, other]

    cs.CV cs.LG

    RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning

    Authors: Yuan Luo, Rudolf Hoffmann, Yan Xia, Olaf Wysocki, Benedikt Schwab, Thomas H. Kolbe, Daniel Cremers

    Abstract: Semantic 3D city models are easily accessible worldwide, providing accurate, object-oriented, and semantic-rich 3D priors. To date, their potential to mitigate the noise impact on radar object detection remains under-explored. In this paper, we first introduce a unique dataset, RadarCity, comprising 54K synchronized radar-image pairs and semantic 3D city models. Moreover, we propose a novel neural n…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: The paper was accepted for CVPRW '25 (PBVS 2025 - the Perception Beyond the Visible Spectrum)

  41. arXiv:2504.11637  [pdf, other]

    cs.CV

    DamageCAT: A Deep Learning Transformer Framework for Typology-Based Post-Disaster Building Damage Categorization

    Authors: Yiming Xiao, Ali Mostafavi

    Abstract: Natural disasters increasingly threaten communities worldwide, creating an urgent need for rapid, reliable building damage assessment to guide emergency response and recovery efforts. Current methods typically classify damage in binary (damaged/undamaged) or ordinal severity terms, limiting their practical utility. In fact, the determination of damage typology is crucial for response and recovery…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 23 pages, 6 figures

  42. arXiv:2504.09039  [pdf, other]

    cs.CV cs.AI cs.LG

    Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization

    Authors: Gen Li, Yang Xiao, Jie Ji, Kaiyuan Deng, Bo Hui, Linke Guo, Xiaolong Ma

    Abstract: Text-to-image (T2I) diffusion models have achieved remarkable success in generating high-quality images from textual prompts. However, their ability to store vast amounts of knowledge raises concerns in scenarios where selective forgetting is necessary, such as removing copyrighted content, reducing biases, or eliminating harmful concepts. While existing unlearning methods can remove certain conce…

    Submitted 11 April, 2025; originally announced April 2025.

  43. arXiv:2504.08096  [pdf, other]

    physics.bio-ph cs.AI physics.comp-ph

    Cellular Development Follows the Path of Minimum Action

    Authors: Rohola Zandie, Farhan Khodaee, Yufan Xia, Elazer R. Edelman

    Abstract: Cellular development follows a stochastic yet rule-governed trajectory, though the underlying principles remain elusive. Here, we propose that cellular development follows paths of least action, aligning with foundational physical laws that govern dynamic systems across nature. We introduce a computational framework that takes advantage of the deep connection between the principle of least action…

    Submitted 10 April, 2025; originally announced April 2025.

  44. Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models

    Authors: Yisong Xiao, Aishan Liu, Siyuan Liang, Xianglong Liu, Dacheng Tao

    Abstract: LLMs have demonstrated remarkable performance across diverse applications, yet they inadvertently absorb spurious correlations from training data, leading to stereotype associations between biased concepts and specific social groups. These associations perpetuate and even amplify harmful social biases, raising significant fairness concerns. To mitigate such biases, prior studies have attempted to…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted by ISSTA 2025. 20 pages

  45. arXiv:2504.07753  [pdf]

    eess.IV cs.CV

    Virtual-mask Informed Prior for Sparse-view Dual-Energy CT Reconstruction

    Authors: Zini Chen, Yao Xiao, Junyan Zhang, Shaoyu Wang, Liu Shi, Qiegen Liu

    Abstract: Sparse-view sampling in dual-energy computed tomography (DECT) significantly reduces radiation dose and increases imaging speed, yet is highly prone to artifacts. Although diffusion models have demonstrated potential in effectively handling incomplete data, most existing methods in this field focus on the image domain and lack global constraints, which consequently leads to insufficient reconstru…

    Submitted 10 April, 2025; originally announced April 2025.

  46. arXiv:2504.07733  [pdf, other]

    cs.CL econ.GN

    DeepGreen: Effective LLM-Driven Green-washing Monitoring System Designed for Empirical Testing -- Evidence from China

    Authors: Congluo Xu, Yu Miao, Yiling Xiao, Chengmengjia Lin

    Abstract: This paper proposes DeepGreen, a Large Language Model-driven (LLM-driven) system for detecting corporate green-washing behaviour. Using dual-layer LLM analysis, DeepGreen first identifies potential green keywords in financial statements and then assesses their degree of implementation via iterative LLM-based semantic analysis. A core variable, GreenImplement, is derived from the ratio from th…

    Submitted 10 April, 2025; originally announced April 2025.

  47. arXiv:2504.07070  [pdf, other]

    cs.CL

    A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

    Authors: Zhouhang Xie, Junda Wu, Yiran Shen, Yu Xia, Xintong Li, Aaron Chang, Ryan Rossi, Sachin Kumar, Bodhisattwa Prasad Majumder, Jingbo Shang, Prithviraj Ammanabrolu, Julian McAuley

    Abstract: Personalized preference alignment for large language models (LLMs), the process of tailoring LLMs to individual users' preferences, is an emerging research direction spanning the areas of NLP and personalization. In this survey, we present an analysis of works on personalized alignment and modeling for LLMs. We introduce a taxonomy of preference alignment techniques, including training time, infere…

    Submitted 9 April, 2025; originally announced April 2025.

  48. arXiv:2504.07002  [pdf, ps, other]

    cs.CR cs.SE

    DeCoMa: Detecting and Purifying Code Dataset Watermarks through Dual Channel Code Abstraction

    Authors: Yuan Xiao, Yuchen Chen, Shiqing Ma, Haocheng Huang, Chunrong Fang, Yanwei Chen, Weisong Sun, Yunfeng Zhu, Xiaofang Zhang, Zhenyu Chen

    Abstract: Watermarking is a technique that helps identify the source of data points and can be used to help prevent the misuse of protected datasets. Existing code watermarking methods, borrowing ideas from backdoor research, embed stealthy triggers as watermarks. Despite their high resilience against dilution attacks and backdoor detection, their robustness has not been fully evaluated. To fill th…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted to ISSTA 2025. Code is available at https://github.com/xiaoyuanpigo/DeCoMa

  49. arXiv:2504.06426  [pdf, other]

    cs.CL cs.LG

    S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning

    Authors: Hanqing Zeng, Yinglong Xia, Zhuokai Zhao, Gilbert Jiang, Qiang Zhang, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Benyu Zhang

    Abstract: Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. Existing methods such as low-rank adaptation (LoRA) are efficient but lack flexibility, while Mixture-of-Experts (MoE) architectures enhance model capacity at the cost of more, and under-utilized, parameters. To address these limitations, we propose Structural Mixture of R…

    Submitted 8 April, 2025; originally announced April 2025.

  50. arXiv:2504.06271  [pdf, other]

    cs.IR cs.AI cs.CL

    ER-RAG: Enhance RAG with ER-Based Unified Modeling of Heterogeneous Data Sources

    Authors: Yikuan Xia, Jiazun Chen, Yirui Zhan, Suifeng Zhao, Weipeng Jiang, Chaorui Zhang, Wei Han, Bo Bai, Jun Gao

    Abstract: Large language models (LLMs) excel in question-answering (QA) tasks, and retrieval-augmented generation (RAG) enhances their precision by incorporating external evidence from diverse sources such as web pages, databases, and knowledge graphs. However, current RAG methods rely on agent-specific strategies for individual data sources, posing challenges in low-resource or black-box environments and complic…

    Submitted 2 March, 2025; originally announced April 2025.