
Showing 1–50 of 12,646 results for author: Wang, Y

Searching in archive cs.
  1. arXiv:2505.10554

    cs.CL

    Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

    Authors: Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li

    Abstract: Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. Prior work has shown that outcome-based reinforcement learning (RL) can incidentally elicit advanced reasoning behaviors such as self-correction, backtracking, and verification, phenomena often referred to as the model's "aha moment". However, the timing and consistency of these emergent behaviors r…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: In Progress

  2. arXiv:2505.10354

    cs.CL

    LDIR: Low-Dimensional Dense and Interpretable Text Embeddings with Relative Representations

    Authors: Yile Wang, Zhanyu Shen, Hui Huang

    Abstract: Semantic text representation is a fundamental task in the field of natural language processing. Existing text embeddings (e.g., SimCSE and LLM2Vec) have demonstrated excellent performance, but the values of each dimension are difficult to trace and interpret. Bag-of-words, as classic sparse interpretable embeddings, suffers from poor performance. Recently, Benara et al. (2024) propose interpretable…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: ACL 2025 Findings

  3. arXiv:2505.10315

    cs.CR cs.AI

    Private Transformer Inference in MLaaS: A Survey

    Authors: Yang Li, Xinyu Zhou, Yitong Wang, Liangxin Qian, Jun Zhao

    Abstract: Transformer models have revolutionized AI, powering applications like content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily due to the centralized processing of sensitive user data. Private Transformer Inference (PTI) offers a solution by utilizing cryptographic techniques such as secure multi-pa…

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2505.10289

    cs.CV

    MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning

    Authors: Yue Wang, Shuai Xu, Xuelin Zhu, Yicong Li

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen state-object combinations by leveraging known combinations. Existing studies basically rely on the cross-modal alignment capabilities of CLIP but tend to overlook its limitations in capturing fine-grained local features, which arise from its architectural and training paradigm. To address this issue, we propose a Multi-Stage Cross-mo…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 9 pages, 5 figures

  5. arXiv:2505.10257

    cs.CV

    Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot

    Authors: Hao Lu, Jiaqi Tang, Jiyao Wang, Yunfan LU, Xu Cao, Qingyong Hu, Yin Wang, Yuting Zhang, Tianxin Xie, Yunpeng Zhang, Yong Chen, Jiayu Gao, Bin Huang, Dengbo He, Shuiguang Deng, Hao Chen, Ying-Cong Chen

    Abstract: The intelligent driving cockpit, an important part of intelligent driving, needs to match different users' comfort, interaction, and safety needs. This paper aims to build a Super-Aligned and GEneralist DRiving agent, SAGE DeeR. Sage Deer achieves three highlights: (1) Super alignment: It achieves different reactions according to different people's preferences and biases. (2) Generalist: It can u…

    Submitted 15 May, 2025; originally announced May 2025.

  6. arXiv:2505.10226

    cs.NI

    Solar-CSK: Decoding Color Coded Visible Light Communications using Solar Cells

    Authors: Yanxiang Wang, Yihe Yan, Jiawei Hu, Cheng Jiang, Brano Kusy, Ashraf Uddin, Mahbub Hassan, Wen Hu

    Abstract: Visible Light Communication (VLC) provides an energy-efficient wireless solution by using existing LED-based illumination for high-speed data transmissions. Although solar cells offer the advantage of simultaneous energy harvesting and data reception, their broadband nature hinders accurate decoding of color-coded signals like Color Shift Keying (CSK). In this paper, we propose a novel approach ex…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 14 pages, 25 figures

  7. arXiv:2505.10075

    cs.RO cs.CV

    FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation

    Authors: Jun Guo, Xiaojian Ma, Yikai Wang, Min Yang, Huaping Liu, Qing Li

    Abstract: This paper investigates training better visual world models for robot manipulation, i.e., models that can predict future visual observations by conditioning on past frames and robot actions. Specifically, we consider world models that operate on RGB-D frames (RGB-D world models). As opposed to canonical approaches that handle dynamics prediction mostly implicitly and reconcile it with visual rende…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Project page: see https://sharinka0715.github.io/FlowDreamer/

  8. arXiv:2505.10010

    cs.LG

    ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts

    Authors: Jing-Cheng Pang, Kaiyuan Li, Yidi Wang, Si-Hang Yang, Shengyi Jiang, Yang Yu

    Abstract: A central challenge in reinforcement learning (RL) is its dependence on extensive real-world interaction data to learn task-specific policies. While recent work demonstrates that large language models (LLMs) can mitigate this limitation by generating synthetic experience (noted as imaginary rollouts) for mastering novel tasks, progress in this emerging field is hindered due to the lack of a standa…

    Submitted 15 May, 2025; originally announced May 2025.

  9. arXiv:2505.09990

    cs.CV

    PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

    Authors: Long Cheng, Jiafei Duan, Yi Ru Wang, Haoquan Fang, Boyang Li, Yushan Huang, Elvis Wang, Ainaz Eftekhar, Jason Lee, Wentao Yuan, Rose Hendrix, Noah A. Smith, Fei Xia, Dieter Fox, Ranjay Krishna

    Abstract: Pointing serves as a fundamental and intuitive mechanism for grounding language within visual contexts, with applications spanning robotics, assistive technologies, and interactive AI systems. While recent multimodal models have started to support pointing capabilities, existing benchmarks typically focus only on referential object localization tasks. We introduce PointArena, a comprehensive platf…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 10 pages. Dataset and code: https://pointarena.github.io/

  10. arXiv:2505.09986

    cs.CV eess.IV

    High Quality Underwater Image Compression with Adaptive Correction and Codebook-based Augmentation

    Authors: Yimin Zhou, Yichong Xia, Sicheng Pan, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the increasing exploration and exploitation of the underwater world, underwater images have become a critical medium for human interaction with marine environments, driving extensive research into their efficient transmission and storage. However, contemporary underwater image compression algorithms fail to fully leverage the unique characteristics distinguishing underwater scenes from terres…

    Submitted 15 May, 2025; originally announced May 2025.

  11. arXiv:2505.09924

    cs.CL cs.CR

    From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

    Authors: Yidan Wang, Yubing Ren, Yanan Cao, Binxing Fang

    Abstract: The rise of Large Language Models (LLMs) has heightened concerns about the misuse of AI-generated text, making watermarking a promising solution. Mainstream watermarking schemes for LLMs fall into two categories: logits-based and sampling-based. However, current schemes entail trade-offs among robustness, text quality, and security. To mitigate this, we integrate logits-based and sampling-based sc…

    Submitted 14 May, 2025; originally announced May 2025.

  12. arXiv:2505.09921

    cs.CR cs.CL

    PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization

    Authors: Yidan Wang, Yanan Cao, Yubing Ren, Fang Fang, Zheng Lin, Binxing Fang

    Abstract: Large Language Models (LLMs) excel in various domains but pose inherent privacy risks. Existing methods to evaluate privacy leakage in LLMs often use memorized prefixes or simple instructions to extract data, both of which well-aligned models can easily block. Meanwhile, jailbreak attacks bypass LLM safety mechanisms to generate harmful content, but their role in privacy scenarios remains undere…

    Submitted 14 May, 2025; originally announced May 2025.

  13. arXiv:2505.09698

    cs.RO cs.AI

    ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation

    Authors: Enyu Zhao, Vedant Raval, Hejia Zhang, Jiageng Mao, Zeyu Shangguan, Stefanos Nikolaidis, Yue Wang, Daniel Seita

    Abstract: Vision-Language Models (VLMs) have revolutionized artificial intelligence and robotics due to their commonsense reasoning capabilities. In robotic manipulation, VLMs are used primarily as high-level planners, but recent work has also studied their lower-level reasoning ability, which refers to making decisions about precise robot movements. However, the community currently lacks a clear and common…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 47 pages, 29 figures. Under review

  14. arXiv:2505.09655

    cs.CL

    DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models

    Authors: Xiwen Chen, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hao Wang, Haiyu Wu, Huayu Li, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

    Abstract: Recent advances in reinforcement learning for language model post-training, such as Group Relative Policy Optimization (GRPO), have shown promise in low-resource settings. However, GRPO typically relies on solution-level and scalar reward signals that fail to capture the semantic diversity among sampled completions. This leads to what we identify as a diversity-quality inconsistency, where distinc…

    Submitted 13 May, 2025; originally announced May 2025.

  15. arXiv:2505.09616

    cs.SD cs.AI eess.AS

    SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech

    Authors: Yuqi Li, Yuanzhong Zheng, Zhongtian Guo, Yaoxuan Wang, Jianjun Yin, Haojun Fei

    Abstract: This paper presents SpecWav-Attack, an adversarial model for detecting speakers in anonymized speech. It leverages Wav2Vec2 for feature extraction and incorporates spectrogram resizing and incremental training for improved performance. Evaluated on librispeech-dev and librispeech-test, SpecWav-Attack outperforms conventional attacks, revealing vulnerabilities in anonymized speech systems and empha…

    Submitted 10 January, 2025; originally announced May 2025.

    Comments: 2 pages, 3 figures, 1 chart

    MSC Class: I.2.0

  16. arXiv:2505.09615

    cs.CV cs.SD eess.AS

    UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing

    Authors: Yung-Hsuan Lai, Janek Ebbers, Yu-Chiang Frank Wang, François Germain, Michael Jeffrey Jones, Moitreya Chatterjee

    Abstract: Audio-Visual Video Parsing (AVVP) entails the challenging task of localizing both uni-modal events (i.e., those occurring exclusively in either the visual or acoustic modality of a video) and multi-modal events (i.e., those occurring in both modalities concurrently). Moreover, the prohibitive cost of annotating training data with the class labels of all these events, along with their start and end…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  17. arXiv:2505.09451

    cs.AR

    SEGA-DCIM: Design Space Exploration-Guided Automatic Digital CIM Compiler with Multiple Precision Support

    Authors: Haikang Diao, Haoyi Zhang, Jiahao Song, Haoyang Luo, Yibo Lin, Runsheng Wang, Yuan Wang, Xiyuan Tang

    Abstract: Digital computing-in-memory (DCIM) has been a popular solution for addressing the memory wall problem in recent years. However, the DCIM design still heavily relies on manual efforts, and the optimization of DCIM is often based on human experience. These disadvantages limit the time to market while increasing the design difficulty of DCIMs. This work proposes a design space exploration-guided auto…

    Submitted 14 May, 2025; originally announced May 2025.

  18. arXiv:2505.09424

    cs.RO

    Exploring Pose-Guided Imitation Learning for Robotic Precise Insertion

    Authors: Han Sun, Yizhao Wang, Zhenning Zhou, Shuai Wang, Haibo Yang, Jingyuan Sun, Qixin Cao

    Abstract: Recent studies have shown that imitation learning has strong potential in the field of robotic manipulation. However, existing methods still struggle with precision manipulation tasks and rely on inefficient image/point cloud observations. In this paper, we explore introducing SE(3) object pose into imitation learning and propose pose-guided efficient imitation learning methods for robotic…

    Submitted 14 May, 2025; originally announced May 2025.

  19. arXiv:2505.09422

    cs.CV

    MoRAL: Motion-aware Multi-Frame 4D Radar and LiDAR Fusion for Robust 3D Object Detection

    Authors: Xiangyuan Peng, Yu Wang, Miao Tang, Bierzynski Kay, Lorenzo Servadei, Robert Wille

    Abstract: Reliable autonomous driving systems require accurate detection of traffic participants. To this end, multi-modal fusion has emerged as an effective strategy. In particular, 4D radar and LiDAR fusion methods based on multi-frame radar point clouds have proven effective in bridging the point density gap. However, they often neglect radar point clouds' inter-frame misalignment caused by…

    Submitted 14 May, 2025; originally announced May 2025.

  20. arXiv:2505.09343

    cs.DC cs.AI cs.AR

    Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

    Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei

    Abstract: The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inferen…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version will appear as part of the Industry Track in Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25)

  21. arXiv:2505.09252

    cs.CV

    Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

    Authors: Yinuo Wang, Yue Zeng, Kai Chen, Cai Meng, Chao Pan, Zhouping Tang

    Abstract: Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurring boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH bi…

    Submitted 14 May, 2025; originally announced May 2025.

  22. arXiv:2505.09161

    cond-mat.mtrl-sci cs.LG

    Bridging Theory and Experiment in Materials Discovery: Machine-Learning-Assisted Prediction of Synthesizable Structures

    Authors: Yu Xin, Peng Liu, Zhuohang Xie, Wenhui Mi, Pengyue Gao, Hong Jian Zhao, Jian Lv, Yanchao Wang, Yanming Ma

    Abstract: Even though thermodynamic energy-based crystal structure prediction (CSP) has revolutionized materials discovery, the energy-driven CSP approaches often struggle to identify experimentally realizable metastable materials synthesized through kinetically controlled pathways, creating a critical gap between theoretical predictions and experimental synthesis. Here, we propose a synthesizability-driven…

    Submitted 14 May, 2025; originally announced May 2025.

  23. arXiv:2505.09113

    cs.LG stat.ME

    Sequential Treatment Effect Estimation with Unmeasured Confounders

    Authors: Yingrong Wang, Anpeng Wu, Baohong Li, Ziyang Xiao, Ruoxuan Xiong, Qing Han, Kun Kuang

    Abstract: This paper studies the cumulative causal effects of sequential treatments in the presence of unmeasured confounders. It is a critical issue in sequential decision-making scenarios where treatment decisions and outcomes dynamically evolve over time. Advanced causal methods apply transformer as a backbone to model such time sequences, which shows superiority in capturing long time dependence and per…

    Submitted 13 May, 2025; originally announced May 2025.

  24. arXiv:2505.09092

    cs.CV cs.RO

    OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions

    Authors: Yuhang Wang, Abdulaziz Alhuraish, Shengming Yuan, Hao Zhou

    Abstract: Lane Keeping Assist (LKA) is widely adopted in modern vehicles, yet its real-world performance remains underexplored due to proprietary systems and limited data access. This paper presents OpenLKA, the first open, large-scale dataset for LKA evaluation and improvement. It includes 400 hours of driving data from 50+ production vehicle models, collected through extensive road testing in Tampa, Flori…

    Submitted 13 May, 2025; originally announced May 2025.

  25. arXiv:2505.09085

    cs.LG cs.AI

    Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision

    Authors: Jiaxuan Chen, Yu Qi, Yueming Wang, Gang Pan

    Abstract: Recent advancements in deep neural networks (DNNs), particularly large-scale language models, have demonstrated remarkable capabilities in image and natural language understanding. Although scaling up model parameters with increasing volume of training data has progressively improved DNN capabilities, achieving complex cognitive abilities - such as understanding abstract concepts, reasoning, and a…

    Submitted 13 May, 2025; originally announced May 2025.

  26. arXiv:2505.08961

    cs.CV cs.LG

    Differentiable Channel Selection in Self-Attention For Person Re-Identification

    Authors: Yancheng Wang, Nebojsa Jojic, Yingzhen Yang

    Abstract: In this paper, we propose a novel attention module termed the Differentiable Channel Selection Attention module, or the DCS-Attention module. In contrast with conventional self-attention, the DCS-Attention module features selection of informative channels in the computation of the attention weights. The selection of the feature channels is performed in a differentiable manner, enabling seamless in…

    Submitted 13 May, 2025; originally announced May 2025.

  27. arXiv:2505.08854

    cs.CV cs.AI cs.RO

    Generative AI for Autonomous Driving: Frontiers and Opportunities

    Authors: Yuping Wang, Shuo Xing, Cui Can, Renjie Li, Hongyuan Hua, Kexin Tian, Zhaobin Mo, Xiangbo Gao, Keshu Wu, Sulong Zhou, Hengxu You, Juntong Peng, Junge Zhang, Zehao Wang, Rui Song, Mingxuan Yan, Walter Zimmer, Xingcheng Zhou, Peiran Li, Zhaohan Lu, Chia-Ju Chen, Yue Huang, Ryan A. Rossi, Lichao Sun, Hongkai Yu, et al. (22 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, partic…

    Submitted 13 May, 2025; originally announced May 2025.

  28. arXiv:2505.08822

    cs.CY cs.LG physics.soc-ph

    The Geography of Transportation Cybersecurity: Visitor Flows, Industry Clusters, and Spatial Dynamics

    Authors: Yuhao Wang, Kailai Wang, Songhua Hu, Yunpeng Zhang, Gino Lim, Pengyu Zhu

    Abstract: The rapid evolution of the transportation cybersecurity ecosystem, encompassing cybersecurity, automotive, and transportation and logistics sectors, will lead to the formation of distinct spatial clusters and visitor flow patterns across the US. This study examines the spatiotemporal dynamics of visitor flows, analyzing how socioeconomic factors shape industry clustering and workforce distribution…

    Submitted 12 May, 2025; originally announced May 2025.

  29. arXiv:2505.08814

    cs.CV cs.AI cs.LG

    Towards Understanding Deep Learning Model in Image Recognition via Coverage Test

    Authors: Wenkai Li, Xiaoqi Li, Yingjie Mao, Yishun Wang

    Abstract: Deep neural networks (DNNs) play a crucial role in the field of artificial intelligence, and their security-related testing has been a prominent research focus. By inputting test cases, the behavior of models is examined for anomalies, and coverage metrics are utilized to determine the extent of neurons covered by these test cases. With the widespread application and advancement of DNNs, different…

    Submitted 12 May, 2025; originally announced May 2025.

  30. arXiv:2505.08808

    cs.CV cs.AI

    SparseMeXT: Unlocking the Potential of Sparse Representations for HD Map Construction

    Authors: Anqing Jiang, Jinhao Chai, Yu Gao, Yiru Wang, Yuwen Heng, Zhigang Sun, Hao Sun, Zezhong Zhao, Li Sun, Jian Zhou, Lijuan Zhu, Shugong Xu, Hao Zhao

    Abstract: Recent advancements in high-definition (HD) map construction have demonstrated the effectiveness of dense representations, which heavily rely on computationally intensive bird's-eye view (BEV) features. While sparse representations offer a more efficient alternative by avoiding dense BEV processing, existing methods often lag behind due to the lack of tailored designs. These limitations…

    Submitted 11 May, 2025; originally announced May 2025.

  31. arXiv:2505.08807

    cs.CR cs.AI

    Security of Internet of Agents: Attacks and Countermeasures

    Authors: Yuntao Wang, Yanghe Pan, Shaolong Guo, Zhou Su

    Abstract: With the rise of large language and vision-language models, AI agents have evolved into autonomous, interactive systems capable of perception, reasoning, and decision-making. As they proliferate across virtual and physical domains, the Internet of Agents (IoA) has emerged as a key infrastructure for enabling scalable and secure coordination among heterogeneous agents. This survey offers a comprehe…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 11 pages, 5 figures, 3 tables, submitted to IEEE OJCS

  32. arXiv:2505.08705

    cs.CV cs.AI

    Controllable Image Colorization with Instance-aware Texts and Masks

    Authors: Yanru An, Ling Gui, Qiang Hu, Chunlei Cai, Tianxiao Ye, Xiaoyun Zhang, Yanfeng Wang

    Abstract: Recently, the application of deep learning in image colorization has received widespread attention. The maturation of diffusion models has further advanced the development of image colorization models. However, current mainstream image colorization models still face issues such as color bleeding and color binding errors, and cannot colorize images at the instance level. In this paper, we propose a…

    Submitted 13 May, 2025; originally announced May 2025.

  33. arXiv:2505.08695

    cs.CV

    SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model

    Authors: Zhanjie Zhang, Quanwei Zhang, Junsheng Luan, Mengyuan Yang, Yun Wang, Lei Zhao

    Abstract: Given an arbitrary content and style image, arbitrary style transfer aims to render a new stylized image which preserves the content image's structure and possesses the style image's style. Existing arbitrary style transfer methods are based on either small models or pre-trained large-scale models. The small model-based methods fail to generate high-quality stylized images, bringing artifact…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted by Neural Networks

  34. arXiv:2505.08687

    cs.LG cs.AI

    AC-PKAN: Attention-Enhanced and Chebyshev Polynomial-Based Physics-Informed Kolmogorov-Arnold Networks

    Authors: Hangwei Zhang, Zhimu Huang, Yan Wang

    Abstract: Kolmogorov-Arnold Networks (KANs) have recently shown promise for solving partial differential equations (PDEs). Yet their original formulation is computationally and memory intensive, motivating the introduction of Chebyshev Type-I-based KANs (Chebyshev1KANs). Although Chebyshev1KANs have outperformed the vanilla KANs architecture, our rigorous theoretical analysis reveals that they still suffer…

    Submitted 13 May, 2025; originally announced May 2025.

  35. arXiv:2505.08628

    cs.AI cs.HC

    Integrating Natural Language Processing and Exercise Monitoring for Early Diagnosis of Metabolic Syndrome: A Deep Learning Approach

    Authors: Yichen Zhao, Yuhua Wang, Xi Cheng, Junhao Fang, Yang Yang

    Abstract: Metabolic syndrome (MetS) is a medical condition characterized by abdominal obesity, insulin resistance, hypertension and hyperlipidemia. It increases the risk of the majority of chronic diseases, including type 2 diabetes mellitus, and affects about one quarter of the global population. Therefore, early detection and timely intervention for MetS are crucial. Standard diagnosis for MetS components…

    Submitted 13 May, 2025; originally announced May 2025.

  36. arXiv:2505.08607

    cs.CV

    Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World

    Authors: Yuran Wang, Yingping Liang, Ying Fu

    Abstract: Stereo matching methods rely on dense pixel-wise ground truth labels, which are laborious to obtain, especially for real-world datasets. The scarcity of labeled data and domain gaps between synthetic and real-world images also pose notable challenges. In this paper, we propose a novel framework, BooSTer, that leverages both vision foundation models and large-scale mixed image sources, inc…

    Submitted 13 May, 2025; originally announced May 2025.

  37. arXiv:2505.08542

    cs.AI

    Guiding LLM-based Smart Contract Generation with Finite State Machine

    Authors: Hao Luo, Yuhao Lin, Xiao Yan, Xintong Hu, Yuxiang Wang, Qiming Zeng, Hao Wang, Jiawei Jiang

    Abstract: A smart contract is a kind of self-executing code based on blockchain technology with a wide range of application scenarios, but the traditional generation method relies on manual coding and expert auditing, which has a high threshold and low efficiency. Although Large Language Models (LLMs) show great potential in programming tasks, they still face challenges in smart contract generation w.r.t. eff…

    Submitted 13 May, 2025; originally announced May 2025.

  38. arXiv:2505.08459

    cs.AI

    Strategy-Augmented Planning for Large Language Models via Opponent Exploitation

    Authors: Shuai Xu, Sijia Cui, Yanna Wang, Bo Xu, Qi Wang

    Abstract: Efficiently modeling and exploiting opponents is a long-standing challenge in adversarial domains. Large Language Models (LLMs) trained on extensive textual data have recently demonstrated outstanding performance in general tasks, introducing new research directions for opponent modeling. Some studies primarily focus on directly using LLMs to generate decisions based on the elaborate prompt contex…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted to IJCNN 2025

  39. arXiv:2505.08414

    eess.IV cs.CV

    An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

    Authors: Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, Ecosse Lamoureux, Seang Mei Saw, Vinay Nangia, Songhomitra Panda-Jonas, Jie Xu, Ya Xing Wang, et al. (6 additional authors not shown)

    Abstract: Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…

    Submitted 13 May, 2025; originally announced May 2025.

  40. arXiv:2505.08402

    cs.CL

    TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers

    Authors: Aiyao He, Sijia Cui, Shuai Xu, Yanna Wang, Bo Xu

    Abstract: Recently, large language models (LLMs) have played an increasingly important role in solving a wide range of NLP tasks, leveraging their capabilities of natural language understanding and generation. Integration with external tools further enhances LLMs' effectiveness, providing more precise, timely, and specialized responses. However, LLMs still encounter difficulties with non-executable actions a…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted to ICONIP 2024

  41. arXiv:2505.08316

    cs.CE cs.CV

    Improving Unsupervised Task-driven Models of Ventral Visual Stream via Relative Position Predictivity

    Authors: Dazhong Rong, Hao Dong, Xing Gao, Jiyu Wei, Di Hong, Yaoyao Hao, Qinming He, Yueming Wang

    Abstract: Based on the concept that ventral visual stream (VVS) mainly functions for object recognition, current unsupervised task-driven methods model VVS by contrastive learning, and have achieved good brain similarity. However, we believe functions of VVS extend beyond just object recognition. In this paper, we introduce an additional function involving VVS, named relative position (RP) prediction. We fi…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted for full publication at CogSci 2025 (https://cognitivesciencesociety.org/cogsci-2025/)

  42. arXiv:2505.08239

    cs.GR cs.CV

    ACT-R: Adaptive Camera Trajectories for 3D Reconstruction from Single Image

    Authors: Yizhi Wang, Mingrui Zhao, Ali Mahdavi-Amiri, Hao Zhang

    Abstract: We introduce adaptive view planning to multi-view synthesis, aiming to improve both occlusion revelation and 3D consistency for single-view 3D reconstruction. Instead of generating an unordered set of views independently or simultaneously, we generate a sequence of views, leveraging temporal consistency to enhance 3D coherence. Most importantly, our view sequence is not determined by a pre-determi…

    Submitted 13 May, 2025; originally announced May 2025.

  43. arXiv:2505.08220  [pdf]

    cs.LG

    Deep Probabilistic Modeling of User Behavior for Anomaly Detection via Mixture Density Networks

    Authors: Lu Dai, Wenxuan Zhu, Xuehui Quan, Renzi Meng, Sheng Cai, Yichen Wang

    Abstract: To improve the identification of potential anomaly patterns in complex user behavior, this paper proposes an anomaly detection method based on a deep mixture density network. The method constructs a Gaussian mixture model parameterized by a neural network, enabling conditional probability modeling of user behavior. It effectively captures the multimodal distribution characteristics commonly presen…

    Submitted 13 May, 2025; originally announced May 2025.

  44. arXiv:2505.08168  [pdf, ps, other]

    cs.CL cs.AI

    Exploiting Text Semantics for Few and Zero Shot Node Classification on Text-attributed Graph

    Authors: Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuang Hu, Yuanyuan Zhu, Bo Du, Jia Wu, Jiawei Jiang

    Abstract: A text-attributed graph (TAG) provides a text description for each graph node, and few- and zero-shot node classification on TAGs has many applications in fields such as academia and social networks. Existing work utilizes various graph-based augmentation techniques to train the node and text embeddings, while text-based augmentations are largely unexplored. In this paper, we propose Text Semantics…

    Submitted 12 May, 2025; originally announced May 2025.

  45. arXiv:2505.08162  [pdf, ps, other]

    cs.CR

    GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM

    Authors: Hengyu Ding, Houran Ji, Jia Li, Jinhang Chen, Chin-Wing Sham, Yao Wang

    Abstract: With the rapid advancement of quantum computing technology, post-quantum cryptography (PQC) has emerged as a pivotal direction for next-generation encryption standards. Among these, lattice-based cryptographic schemes rely heavily on the fast Number Theoretic Transform (NTT) over polynomial rings, whose performance directly determines encryption/decryption throughput and energy efficiency. However…

    Submitted 12 May, 2025; originally announced May 2025.

  46. Will Your Next Pair Programming Partner Be Human? An Empirical Evaluation of Generative AI as a Collaborative Teammate in a Semester-Long Classroom Setting

    Authors: Wenhan Lyu, Yimeng Wang, Yifan Sun, Yixuan Zhang

    Abstract: Generative AI (GenAI), especially Large Language Models (LLMs), is rapidly reshaping both programming workflows and computer science education. Many programmers now incorporate GenAI tools into their workflows, including for collaborative coding tasks such as pair programming. While prior research has demonstrated the benefits of traditional pair programming and begun to explore GenAI-assisted cod…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Accepted by Learning @ Scale 2025

  47. arXiv:2505.07895  [pdf, ps, other]

    cs.LG cs.AI

    Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks

    Authors: Jiafan Li, Jiaqi Zhu, Liang Chang, Yilin Li, Miaomiao Li, Yang Wang, Hongan Wang

    Abstract: Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban's movie networks and Amazon's product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either ea…

    Submitted 11 May, 2025; originally announced May 2025.

  48. arXiv:2505.07874  [pdf, ps, other]

    cs.CL

    The Sound of Populism: Distinct Linguistic Features Across Populist Variants

    Authors: Yu Wang, Runxi Yu, Zhongyuan Wang, Jing He

    Abstract: This study explores the sound of populism by integrating the classic Linguistic Inquiry and Word Count (LIWC) features, which capture the emotional and stylistic tones of language, with a fine-tuned RoBERTa model, a state-of-the-art context-aware language model trained to detect nuanced expressions of populism. This approach allows us to uncover the auditory dimensions of political rhetoric in U.S…

    Submitted 9 May, 2025; originally announced May 2025.

  49. arXiv:2505.07754  [pdf]

    q-bio.NC cs.CV

    Skeletonization of neuronal processes using Discrete Morse techniques from computational topology

    Authors: Samik Banerjee, Caleb Stam, Daniel J. Tward, Steven Savoia, Yusu Wang, Partha P. Mitra

    Abstract: To understand biological intelligence we need to map neuronal networks in vertebrate brains. Mapping mesoscale neural circuitry is done using injections of tracers that label groups of neurons whose axons project to different brain regions. Since many neurons are labeled, it is difficult to follow individual axons. Previous approaches have instead quantified the regional projections using the tota…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Under Review in Nature

  50. arXiv:2505.07608  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

    Authors: Xiaomi LLM-Core Team, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, et al. (40 additional authors not shown)

    Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective…

    Submitted 12 May, 2025; originally announced May 2025.