
Showing 1–50 of 6,547 results for author: Zhang, X

Searching in archive cs.
  1. arXiv:2507.05255 [pdf, ps, other]

    cs.CV cs.CL

    Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

    Authors: Yana Wei, Liang Zhao, Jianjian Sun, Kangheng Lin, Jisheng Yin, Jingcheng Hu, Yinmin Zhang, En Yu, Haoran Lv, Zejia Weng, Jia Wang, Chunrui Han, Yuang Peng, Qi Han, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Vishal M. Patel

    Abstract: The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) to unlock advanced visual reasoning. We introduce a two-stage paradigm built on Qwen2.5-VL-7B: a massive linguistic cold-start fine-tuning, followed by multimoda…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2507.05081 [pdf, ps, other]

    cs.AR

    ViPSN 2.0: A Reconfigurable Battery-free IoT Platform for Vibration Energy Harvesting

    Authors: Xin Li, Mianxin Xiao, Xi Shen, Jiaqing Chu, Weifeng Huang, Jiashun Li, Yaoyi Li, Mingjing Cai, Jiaming Chen, Xinming Zhang, Daxing Zhang, Congsi Wang, Hong Tang, Bao Zhao, Qitao Lu, Yilong Wang, Jianjun Wang, Minyi Xu, Shitong Fang, Xuanyu Huang, Chaoyang Zhao, Zicheng Liu, Yaowen Yang, Guobiao Hu, Junrui Liang, Wei-Hsin Liao

    Abstract: Vibration energy harvesting is a promising solution for powering battery-free IoT systems; however, the instability of ambient vibrations presents significant challenges, such as limited harvested energy, intermittent power supply, and poor adaptability to various applications. To address these challenges, this paper proposes ViPSN2.0, a modular and reconfigurable IoT platform that supports multip…

    Submitted 7 July, 2025; originally announced July 2025.

  3. arXiv:2507.04959 [pdf, ps, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Hear-Your-Click: Interactive Video-to-Audio Generation via Object-aware Contrastive Audio-Visual Fine-tuning

    Authors: Yingshan Liang, Keyu Fan, Zhicheng Du, Yiran Wang, Qingyang Shi, Xinyu Zhang, Jiasheng Lu, Peiwu Qin

    Abstract: Video-to-audio (V2A) generation shows great potential in fields such as film production. Despite significant advances, current V2A methods, which rely on global video information, struggle with complex scenes and often fail to generate audio tailored to specific objects or regions in the videos. To address these limitations, we introduce Hear-Your-Click, an interactive V2A framework that enables u…

    Submitted 7 July, 2025; originally announced July 2025.

  4. arXiv:2507.04880 [pdf]

    cs.CV cs.AI

    HGNet: High-Order Spatial Awareness Hypergraph and Multi-Scale Context Attention Network for Colorectal Polyp Detection

    Authors: Xiaofang Liu, Lingling Sun, Xuqing Zhang, Yuannong Ye, Bin Zhao

    Abstract: Colorectal cancer (CRC) is closely linked to the malignant transformation of colorectal polyps, making early detection essential. However, current models struggle with detecting small lesions, accurately localizing boundaries, and providing interpretable decisions. To address these issues, we propose HGNet, which integrates High-Order Spatial Awareness Hypergraph and Multi-Scale Context Attention.…

    Submitted 7 July, 2025; originally announced July 2025.

  5. arXiv:2507.04847 [pdf, ps, other]

    cs.IT

    Fast and Provable Hankel Tensor Completion for Multi-measurement Spectral Compressed Sensing

    Authors: Jinsheng Li, Xu Zhang, Shuang Wu, Wei Cui

    Abstract: In this paper, we introduce a novel low-rank Hankel tensor completion approach to address the problem of multi-measurement spectral compressed sensing. By lifting the multiple signals to a Hankel tensor, we reformulate this problem into a low-rank Hankel tensor completion task, exploiting the spectral sparsity via the low multilinear rankness of the tensor. Furthermore, we design a scaled gradient…

    Submitted 7 July, 2025; originally announced July 2025.

  6. arXiv:2507.04789 [pdf, ps, other]

    cs.RO

    Training-free Generation of Temporally Consistent Rewards from VLMs

    Authors: Yinuo Zhao, Jiale Yuan, Zhiyuan Xu, Xiaoshuai Hao, Xinyi Zhang, Kun Wu, Zhengping Che, Chi Harold Liu, Jian Tang

    Abstract: Recent advances in vision-language models (VLMs) have significantly improved performance in embodied tasks such as goal decomposition and visual comprehension. However, providing accurate rewards for robotic manipulation without fine-tuning VLMs remains challenging due to the absence of domain-specific robotic knowledge in pre-trained datasets and high computational costs that hinder real-time app…

    Submitted 7 July, 2025; originally announced July 2025.

  7. arXiv:2507.04752 [pdf, ps, other]

    cs.CR cs.AI cs.NI

    Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions

    Authors: Shuo Yang, Xinran Zheng, Xinchen Zhang, Jinfeng Xu, Jinze Li, Donglin Xie, Weicai Long, Edith C. H. Ngai

    Abstract: Large Language Models (LLMs) have revolutionized various fields with their exceptional capabilities in understanding, processing, and generating human-like text. This paper investigates the potential of LLMs in advancing Network Intrusion Detection Systems (NIDS), analyzing current challenges, methodologies, and future opportunities. It begins by establishing a foundational understanding of NIDS a…

    Submitted 7 July, 2025; originally announced July 2025.

  8. arXiv:2507.04724 [pdf, ps, other]

    cs.MA cs.AI

    Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems

    Authors: Yizhe Xie, Congcong Zhu, Xinyue Zhang, Minghao Wang, Chi Liu, Minglu Zhu, Tianqing Zhu

    Abstract: Multi-agent systems powered by Large Language Models (LLM-MAS) demonstrate remarkable capabilities in collaborative problem-solving. While LLM-MAS exhibit strong collaborative abilities, the security risks in their communication and coordination remain underexplored. We bridge this gap by systematically investigating intention-hiding threats in LLM-MAS, and design four representative attack paradi…

    Submitted 7 July, 2025; originally announced July 2025.

  9. arXiv:2507.04607 [pdf, ps, other]

    cs.CL cs.AI

    PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes

    Authors: Xinliang Frederick Zhang, Nick Beauchamp, Lu Wang

    Abstract: Large language model (LLM) personalization aims to align model outputs with individuals' unique preferences and opinions. While recent efforts have implemented various personalization methods, a unified theoretical framework that can systematically understand the drivers of effective personalization is still lacking. In this work, we integrate the well-established cognitive dual-memory model into…

    Submitted 6 July, 2025; originally announced July 2025.

  10. arXiv:2507.04377 [pdf, ps, other]

    cs.CV cs.CL cs.MM

    Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions

    Authors: Xiao Zhang, Johan Bos

    Abstract: Tombstones are historically and culturally rich artifacts, encapsulating individual lives, community memory, historical narratives and artistic expression. Yet, many tombstones today face significant preservation challenges, including physical erosion, vandalism, environmental degradation, and political shifts. In this paper, we introduce a novel multi-modal framework for tombstone digitization,…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Accepted by ACMMM 2025

  11. arXiv:2507.04302 [pdf, ps, other]

    cs.CV cs.LG

    Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization

    Authors: Zuyu Zhang, Ning Chen, Yongshan Liu, Qinghua Zhang, Xu Zhang

    Abstract: Single Domain Generalization (SDG) aims to develop models capable of generalizing to unseen target domains using only one source domain, a task complicated by substantial domain shifts and limited data diversity. Existing SDG approaches primarily rely on data augmentation techniques, which struggle to effectively adapt training dynamics to accommodate large domain shifts. To address this, we propo…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Accepted to ICCV 2025

  12. arXiv:2507.04107 [pdf, ps, other]

    cs.CV

    VICI: VLM-Instructed Cross-view Image-localisation

    Authors: Xiaohan Zhang, Tavis Shore, Chen Chen, Oscar Mendez, Simon Hadfield, Safwan Wshah

    Abstract: In this paper, we present a high-performing solution to the UAVM 2025 Challenge, which focuses on matching narrow FOV street-level images to corresponding satellite imagery using the University-1652 dataset. As panoramic Cross-View Geo-Localisation nears peak performance, it becomes increasingly important to explore more practical problem formulations. Real-world scenarios rarely offer panoramic s…

    Submitted 5 July, 2025; originally announced July 2025.

  13. arXiv:2507.04008 [pdf, ps, other]

    eess.IV cs.CV

    PASC-Net: Plug-and-play Shape Self-learning Convolutions Network with Hierarchical Topology Constraints for Vessel Segmentation

    Authors: Xiao Zhang, Zhuo Jin, Shaoxuan Wu, Fengyu Wang, Guansheng Peng, Xiang Zhang, Ying Huang, JingKun Chen, Jun Feng

    Abstract: Accurate vessel segmentation is crucial to assist in clinical diagnosis by medical experts. However, the intricate tree-like tubular structure of blood vessels poses significant challenges for existing segmentation algorithms. Small vascular branches are often overlooked due to their low contrast compared to surrounding tissues, leading to incomplete vessel segmentation. Furthermore, the c…

    Submitted 5 July, 2025; originally announced July 2025.

    Journal ref: Biomedical Signal Processing and Control 2025

  14. arXiv:2507.03872 [pdf, ps, other]

    eess.IV cs.CV

    PLUS: Plug-and-Play Enhanced Liver Lesion Diagnosis Model on Non-Contrast CT Scans

    Authors: Jiacheng Hao, Xiaoming Zhang, Wei Liu, Xiaoli Yin, Yuan Gao, Chunli Li, Ling Zhang, Le Lu, Yu Shi, Xu Han, Ke Yan

    Abstract: Focal liver lesions (FLL) are common clinical findings during physical examination. Early diagnosis and intervention of liver malignancies are crucial to improving patient survival. Although the current 3D segmentation paradigm can accurately detect lesions, it faces limitations in distinguishing between malignant and benign liver lesions, primarily due to its inability to differentiate subtle var…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: MICCAI 2025 (Early Accepted)

  15. arXiv:2507.03386 [pdf, ps, other]

    cs.CV

    MRC-DETR: An Adaptive Multi-Residual Coupled Transformer for Bare Board PCB Defect Detection

    Authors: Jiangzhong Cao, Huanqi Wu, Xu Zhang, Lianghong Tan, Huan Zhang

    Abstract: In modern electronic manufacturing, defect detection on Printed Circuit Boards (PCBs) plays a critical role in ensuring product yield and maintaining the reliability of downstream assembly processes. However, existing methods often suffer from limited feature representation, computational redundancy, and insufficient availability of high-quality training data -- challenges that hinder their abilit…

    Submitted 4 July, 2025; originally announced July 2025.

  16. arXiv:2507.03295 [pdf, ps, other]

    cs.CV

    CPKD: Clinical Prior Knowledge-Constrained Diffusion Models for Surgical Phase Recognition in Endoscopic Submucosal Dissection

    Authors: Xiangning Zhang, Jinnan Chen, Qingwei Zhang, Yaqi Wang, Chengfeng Zhou, Xiaobo Li, Dahong Qian

    Abstract: Gastrointestinal malignancies constitute a leading cause of cancer-related mortality worldwide, with advanced-stage prognosis remaining particularly dismal. Originating as a groundbreaking technique for early gastric cancer treatment, Endoscopic Submucosal Dissection has evolved into a versatile intervention for diverse gastrointestinal lesions. While computer-assisted systems significantly enhanc…

    Submitted 4 July, 2025; originally announced July 2025.

  17. arXiv:2507.03216 [pdf]

    cs.CY cs.AI cs.DL cs.ET

    Disclosing Generative AI Use in Digital Humanities Research

    Authors: Rongqian Ma, Xuhan Zhang, Adrian Wisnicki

    Abstract: This survey study investigates how digital humanists perceive and approach generative AI disclosure in research. The results indicate that while digital humanities scholars acknowledge the importance of disclosing GenAI use, the actual rate of disclosure in research practice remains low. Respondents differ in their views on which activities most require disclosure and on the most appropriate metho…

    Submitted 3 July, 2025; originally announced July 2025.

  18. arXiv:2507.02912 [pdf, ps, other]

    cs.LG cs.AI

    Multicollinearity Resolution Based on Machine Learning: A Case Study of Carbon Emissions

    Authors: Xuanming Zhang

    Abstract: This study proposes an analytical framework that integrates DBSCAN clustering with the Elastic Net regression model to address multifactorial problems characterized by structural complexity and multicollinearity, exemplified by carbon emissions analysis. DBSCAN is employed for unsupervised learning to objectively cluster features, while the Elastic Net is utilized for high-dimensional feature sele…

    Submitted 24 June, 2025; originally announced July 2025.

    Comments: Vital Renew Update Based on Previous Version

  19. arXiv:2507.02870 [pdf, ps, other]

    cs.CL

    Loki's Dance of Illusions: A Comprehensive Survey of Hallucination in Large Language Models

    Authors: Chaozhuo Li, Pengbo Wang, Chenxu Wang, Litian Zhang, Zheng Liu, Qiwei Ye, Yuanbo Xu, Feiran Huang, Xi Zhang, Philip S. Yu

    Abstract: Edgar Allan Poe noted, "Truth often lurks in the shadow of error," highlighting the deep complexity intrinsic to the interplay between truth and falsehood, notably under conditions of cognitive and informational asymmetry. This dynamic is strikingly evident in large language models (LLMs). Despite their impressive linguistic generation capabilities, LLMs sometimes produce information that appears…

    Submitted 6 June, 2025; originally announced July 2025.

  20. arXiv:2507.02659 [pdf, ps, other]

    cs.LG cs.CL

    OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

    Authors: Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, Chen Feng, Yicheng Lin, Chenzheng Su, Xiaopeng Zhang

    Abstract: Speculative decoding generally dictates having a small, efficient draft model that is either pretrained or distilled offline to a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and ti…

    Submitted 3 July, 2025; originally announced July 2025.

  21. arXiv:2507.02447 [pdf, ps, other]

    cs.RO

    HAC-LOCO: Learning Hierarchical Active Compliance Control for Quadruped Locomotion under Continuous External Disturbances

    Authors: Xiang Zhou, Xinyu Zhang, Qingrui Zhang

    Abstract: Despite recent remarkable achievements in quadruped control, it remains challenging to ensure robust and compliant locomotion in the presence of unforeseen external disturbances. Existing methods prioritize locomotion robustness over compliance, often leading to stiff, high-frequency motions, and energy inefficiency. This paper, therefore, presents a two-stage hierarchical learning framework that…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 8 pages, 7 Figures

  22. arXiv:2507.02419 [pdf, ps, other]

    cs.CV

    AvatarMakeup: Realistic Makeup Transfer for 3D Animatable Head Avatars

    Authors: Yiming Zhong, Xiaolin Zhang, Ligang Liu, Yao Zhao, Yunchao Wei

    Abstract: Similar to facial beautification in real life, 3D virtual avatars require personalized customization to enhance their visual appeal, yet this area remains insufficiently explored. Although current 3D Gaussian editing methods can be adapted for facial makeup purposes, these methods fail to meet the fundamental requirements for achieving realistic makeup effects: 1) ensuring a consistent appearance…

    Submitted 7 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

  23. arXiv:2507.02399 [pdf, ps, other]

    cs.CV cs.LG

    TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation

    Authors: Peilin Zhang, Shaoxuan Wu, Jun Feng, Zhuo Jin, Zhizezhang Gao, Jingkun Chen, Yaqiong Xing, Xiao Zhang

    Abstract: Background and objective: Medical image segmentation is a core task in various clinical applications. However, acquiring large-scale, fully annotated medical image datasets is both time-consuming and costly. Scribble annotations, as a form of sparse labeling, provide an efficient and cost-effective alternative for medical image segmentation. However, the sparsity of scribble annotations limits the…

    Submitted 3 July, 2025; originally announced July 2025.

    Journal ref: Computer Methods and Programs in Biomedicine 2025

  24. arXiv:2507.02358 [pdf, ps, other]

    cs.CV cs.AI

    Holistic Tokenizer for Autoregressive Image Generation

    Authors: Anlin Zheng, Haochen Wang, Yucheng Zhao, Weipeng Deng, Tiancai Wang, Xiangyu Zhang, Xiaojuan Qi

    Abstract: The vanilla autoregressive image generation model generates visual tokens in a step-by-step fashion, which limits the ability to capture holistic relationships among token sequences. Moreover, most visual tokenizers map local image patches into latent tokens, leading to limited global information. To address this, we introduce Hita, a novel image tokenizer for autoregressive (AR) image ge…

    Submitted 7 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

    Comments: 17 pages, 10 figures

  25. arXiv:2507.02350 [pdf, ps, other]

    cs.HC

    From Coarse to Fine-Grained Emotion Annotation: An Immediate Recall Paradigm with Validation through Physiological Evidence and Recognition Performance

    Authors: Hao Tang, Songyun Xie, Xinzhou Xie, Can Liao, Xin Zhang, Bohan Li, Zhongyu Tian, Dalu Zheng

    Abstract: Traditional video-induced emotion physiological datasets often use whole-trial annotation, assigning a single emotion label to all data collected during an entire trial. This coarse-grained annotation approach misaligns with the dynamic and temporally localized nature of emotional responses as they unfold with video narratives, introducing label noise that limits emotion recognition algorithm eval…

    Submitted 3 July, 2025; originally announced July 2025.

  26. arXiv:2507.02345 [pdf, ps, other]

    q-bio.BM cs.AI

    HelixDesign-Antibody: A Scalable Production-Grade Platform for Antibody Design Built on HelixFold3

    Authors: Jie Gao, Jing Hu, Shanzhuo Zhang, Kunrui Zhu, Sheng Qian, Yueyang Huang, Xiaonan Zhang, Xiaomin Fang

    Abstract: Antibody engineering is essential for developing therapeutics and advancing biomedical research. Traditional discovery methods often rely on time-consuming and resource-intensive experimental screening. To enhance and streamline this process, we introduce a production-grade, high-throughput platform built on HelixFold3, HelixDesign-Antibody, which utilizes the high-accuracy structure prediction mo…

    Submitted 3 July, 2025; originally announced July 2025.

  27. arXiv:2507.02307 [pdf, ps, other]

    cs.CV

    Flow-CDNet: A Novel Network for Detecting Both Slow and Fast Changes in Bitemporal Images

    Authors: Haoxuan Li, Chenxu Wei, Haodong Wang, Xiaomeng Hu, Boyuan An, Lingyan Ran, Baosen Zhang, Jin Jin, Omirzhan Taukebayev, Amirkhan Temirbayev, Junrui Liu, Xiuwei Zhang

    Abstract: Change detection typically involves identifying regions with changes between bitemporal images taken at the same location. Besides significant changes, slow changes in bitemporal images are also important in real-life scenarios. For instance, weak changes often serve as precursors to major hazards in scenarios like slopes, dams, and tailings ponds. Therefore, designing a change detection network t…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 18 pages, 8 figures

  28. arXiv:2507.02187 [pdf, ps, other]

    cs.HC

    VergeIO: Depth-Aware Eye Interaction on Glasses

    Authors: Xiyuxing Zhang, Duc Vu, Chengyi Shen, Yuntao Wang, Yuanchun Shi, Justin Chan

    Abstract: There is growing industry interest in creating unobtrusive designs for electrooculography (EOG) sensing of eye gestures on glasses (e.g. JINS MEME and Apple eyewear). We present VergeIO, the first EOG-based glasses that enable depth-aware eye interaction using vergence with an optimized electrode layout and novel smart glass prototype. It can distinguish between four and six depth-based eye gestu…

    Submitted 2 July, 2025; originally announced July 2025.

  29. arXiv:2507.02057 [pdf, ps, other]

    cs.CR cs.AI

    MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation

    Authors: Lu Yan, Zhuo Zhang, Xiangzhe Xu, Shengwei An, Guangyu Shen, Zhou Xuan, Xuan Chen, Xiangyu Zhang

    Abstract: Large language models (LLMs) have democratized software development, reducing the expertise barrier for programming complex applications. This accessibility extends to malicious software development, raising significant security concerns. While LLM providers have implemented alignment mechanisms to prevent direct generation of overtly malicious code, these safeguards predominantly evaluate individ…

    Submitted 2 July, 2025; originally announced July 2025.

  30. arXiv:2507.02029 [pdf, ps, other]

    cs.RO

    RoboBrain 2.0 Technical Report

    Authors: BAAI RoboBrain Team, Mingyu Cao, Huajie Tan, Yuheng Ji, Minglan Lin, Zhiyu Li, Zhou Cao, Pengwei Wang, Enshen Zhou, Yi Han, Yingbo Tang, Xiangqi Xu, Wei Guo, Yaoxu Lyu, Yijie Xu, Jiayu Shi, Mengfei Du, Cheng Chi, Mengdi Zhao, Xiaoshuai Hao, Junkai Zhao, Xiaojie Zhang, Shanyu Rong, Huaihai Lyu, Zhengliang Cai, et al. (26 additional authors not shown)

    Abstract: We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain…

    Submitted 5 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

  31. arXiv:2507.01925 [pdf, ps, other]

    cs.RO

    A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

    Authors: Yifan Zhong, Fengshuo Bai, Shaofei Cai, Xuchuan Huang, Zhang Chen, Xiaowei Zhang, Yuanfei Wang, Shaoyang Guo, Tianrui Guan, Ka Nam Lui, Zhiquan Qi, Yitao Liang, Yuanpei Chen, Yaodong Yang

    Abstract: The remarkable advancements of vision and language foundation models in multimodal understanding, reasoning, and generation have sparked growing efforts to extend such intelligence to the physical world, fueling the flourishing of vision-language-action (VLA) models. Despite seemingly diverse approaches, we observe that current VLA models can be unified under a single framework: vision and language…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 70 pages, 5 figures

  32. arXiv:2507.01838 [pdf, ps, other]

    cs.CV

    MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices

    Authors: Hailong Yan, Ao Li, Xiangtao Zhang, Zhe Liu, Zenglin Shi, Ce Zhu, Le Zhang

    Abstract: Recent advancements in deep neural networks have driven significant progress in image enhancement (IE). However, deploying deep learning models on resource-constrained platforms, such as mobile devices, remains challenging due to high computation and memory demands. To address these challenges and facilitate real-time IE on mobile, we introduce an extremely lightweight Convolutional Neural Network…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  33. arXiv:2507.01635 [pdf, ps, other]

    cs.CR

    EGNInfoLeaker: Unveiling the Risks of Public Key Reuse and User Identity Leakage in Blockchain

    Authors: Chenyu Li, Xueping Liang, Xiaorui Gong, Xiu Zhang

    Abstract: While Ethereum's discovery protocols (Discv4/Discv5) incorporate robust cryptographic designs to protect user privacy, real-world deployment reveals critical vulnerabilities when users deviate from security guidelines. In this paper, we design a system called EGNInfoLeaker. Our study is the first work that uncovers widespread public key reuse across Ethereum's peer-to-peer networks - a practice t…

    Submitted 2 July, 2025; originally announced July 2025.

  34. arXiv:2507.01608 [pdf, ps, other]

    cs.CV eess.IV

    Perception-Oriented Latent Coding for High-Performance Compressed Domain Semantic Inference

    Authors: Xu Zhang, Ming Lu, Yan Chen, Zhan Ma

    Abstract: In recent years, compressed domain semantic inference has primarily relied on learned image coding models optimized for mean squared error (MSE). However, MSE-oriented optimization tends to yield latent spaces with limited semantic richness, which hinders effective semantic inference in downstream tasks. Moreover, achieving high performance with these models often requires fine-tuning the entire v…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: International Conference on Multimedia and Expo (ICME), 2025

  35. A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning

    Authors: Masood Jan, Wafa Njima, Xun Zhang

    Abstract: Location information serves as the fundamental element for numerous Internet of Things (IoT) applications. Traditional indoor localization techniques often produce significant errors and raise privacy concerns due to centralized data collection. In response, Machine Learning (ML) techniques offer promising solutions by capturing indoor environment variations. However, they typically require centra…

    Submitted 2 July, 2025; originally announced July 2025.

  36. arXiv:2507.01575 [pdf, ps, other]

    eess.SP cs.LG

    Transfer Learning for VLC-based indoor Localization: Addressing Environmental Variability

    Authors: Masood Jan, Wafa Njima, Xun Zhang, Alexander Artemenko

    Abstract: Accurate indoor localization is crucial in industrial environments. Visible Light Communication (VLC) has emerged as a promising solution, offering high accuracy, energy efficiency, and minimal electromagnetic interference. However, VLC-based indoor localization faces challenges due to environmental variability, such as lighting fluctuations and obstacles. To address these challenges, we propose a…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted for publication in the IEEE VTC2025-Spring Conference, 7 pages

  37. arXiv:2507.01418 [pdf, ps, other]

    cs.CY cs.AI

    Penalizing Transparency? How AI Disclosure and Author Demographics Shape Human and AI Judgments About Writing

    Authors: Inyoung Cheong, Alicia Guo, Mina Lee, Zhehui Liao, Kowe Kadoma, Dongyoung Go, Joseph Chee Chang, Peter Henderson, Mor Naaman, Amy X. Zhang

    Abstract: As AI integrates into various types of human writing, calls for transparency around AI assistance are growing. However, if transparency operates on uneven ground and certain identity groups bear a heavier cost for being honest, then the burden of openness becomes asymmetrical. This study investigates how an AI disclosure statement affects perceptions of writing quality, and whether these effects vary b…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Presented at CHIWORK 2025 Workshop on Generative AI Disclosure, Ownership, and Accountability in Co-Creative Domains

    ACM Class: H.5.2; I.2

  38. arXiv:2507.01327 [pdf, ps, other]

    cs.LG cs.AI

    Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy

    Authors: Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Jiansong Chen, Ke Zeng, Xunliang Cai

    Abstract: Detecting abnormal events in real-world customer service dialogues is highly challenging due to the complexity of business data and the dynamic nature of customer interactions. Moreover, models must demonstrate strong out-of-domain (OOD) generalization to enable rapid adaptation across different business scenarios and maximize commercial value. In this work, we propose a novel Adaptive Perplexity-…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 15 pages, 6 figures, submitted to EMNLP

  39. arXiv:2507.00985 [pdf, ps, other]

    cs.CL

    Discourse Heuristics For Paradoxically Moral Self-Correction

    Authors: Guangliang Liu, Zimo Qi, Xitong Zhang, Kristen Marie Johnson

    Abstract: Moral self-correction has emerged as a promising approach for aligning the output of Large Language Models (LLMs) with human moral values. However, moral self-correction techniques are subject to two primary paradoxes. First, despite empirical and theoretical evidence to support the effectiveness of self-correction, this LLM capability only operates at a superficial level. Second, while LLMs posse…

    Submitted 1 July, 2025; originally announced July 2025.

  40. arXiv:2507.00721 [pdf, ps, other]

    cs.CV

    UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement

    Authors: Xiao Zhang, Fei Wei, Yong Wang, Wenda Zhao, Feiyi Li, Xiangxiang Chu

    Abstract: Zero-shot domain adaptation (ZSDA) presents substantial challenges due to the lack of images in the target domain. Previous approaches leverage Vision-Language Models (VLMs) to tackle this challenge, exploiting their zero-shot learning capabilities. However, these methods primarily address domain distribution shifts and overlook the misalignment between the detection task and VLMs, which rely on m…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: ICCV2025

  41. arXiv:2507.00501 [pdf, ps, other]

    cs.CV

    Laplace-Mamba: Laplace Frequency Prior-Guided Mamba-CNN Fusion Network for Image Dehazing

    Authors: Yongzhen Wang, Liangliang Chen, Bingwen Hu, Heng Liu, Xiao-Ping Zhang, Mingqiang Wei

    Abstract: Recent progress in image restoration has underscored State Space Models (SSMs) as powerful tools for modeling long-range dependencies, owing to their appealing linear complexity and computational efficiency. However, SSM-based approaches exhibit limitations in reconstructing localized structures and tend to be less effective when handling high-dimensional data, frequently resulting in suboptimal…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 12 pages, 11 figures, 6 tables

  42. arXiv:2507.00373 [pdf, ps, other]

    cs.CV eess.IV

    Customizable ROI-Based Deep Image Compression

    Authors: Jian Jin, Fanxin Xia, Feng Ding, Xinfeng Zhang, Meiqin Liu, Yao Zhao, Weisi Lin, Lili Meng

    Abstract: Region of Interest (ROI)-based image compression optimizes bit allocation by prioritizing ROI for higher-quality reconstruction. However, as the users (including human clients and downstream machine tasks) become more diverse, ROI-based image compression needs to be customizable to support various preferences. For example, different users may define distinct ROI or require different quality trade-…

    Submitted 2 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

  43. arXiv:2507.00209  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.RO

    SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

    Authors: Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Ruixing Liang, Yuxin Chen, Adi Chola Venkatesh, Jason Culman, Tiantian Wu, Lirong Shao, Wenqing Sun, Cong Gao, Hallie McNamara, Jingpei Lu, Omid Mohareri

    Abstract: High-resolution imaging is crucial for enhancing visual clarity and enabling precise computer-assisted guidance in minimally invasive surgery (MIS). Despite the increasing adoption of 4K endoscopic systems, there remains a significant gap in publicly available native 4K datasets tailored specifically for robotic-assisted MIS. We introduce SurgiSR4K, the first publicly accessible surgical imaging a…

    Submitted 30 June, 2025; originally announced July 2025.

  44. arXiv:2507.00087  [pdf, ps, other]

    cs.LG cs.AI

    pUniFind: a unified large pre-trained deep learning model pushing the limit of mass spectra interpretation

    Authors: Jiale Zhao, Pengzhi Mao, Kaifei Wang, Yiming Li, Yaping Peng, Ranfei Chen, Shuqi Lu, Xiaohong Ji, Jiaxiang Ding, Xin Zhang, Yucheng Liao, Weinan E, Weijie Zhang, Han Wen, Hao Chi

    Abstract: Deep learning has advanced mass spectrometry data interpretation, yet most models remain feature extractors rather than unified scoring frameworks. We present pUniFind, the first large-scale multimodal pre-trained model in proteomics that integrates end-to-end peptide-spectrum scoring with open, zero-shot de novo sequencing. Trained on over 100 million open search-derived spectra, pUniFind aligns…

    Submitted 30 June, 2025; originally announced July 2025.

  45. arXiv:2506.23854  [pdf, ps, other]

    cs.CV cs.GR

    HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity

    Authors: Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Xianpeng Lang

    Abstract: Neural surface reconstruction faces persistent challenges in reconciling geometric fidelity with photometric consistency under complex scene conditions. We present HiNeuS, a unified framework that holistically addresses three core limitations in existing approaches: multi-view radiance inconsistency, missing keypoints in textureless regions, and structural degradation from over-enforced Eikonal co…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Published in International Conference on Computer Vision (ICCV) 2025

  46. arXiv:2506.23563  [pdf, ps, other]

    cs.AI cs.CL cs.CV

    MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

    Authors: Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, Yuxin Song, Wenhao Wu, Dacheng Tao

    Abstract: Reasoning plays a crucial role in advancing Multimodal Large Language Models (MLLMs) toward Artificial General Intelligence. However, existing MLLM benchmarks often fall short in precisely and comprehensively evaluating long-chain reasoning abilities in three key respects: (1) lack of difficulty and diversity, (2) susceptibility to guessability and memorization, (3) inadequate assessment of inter…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Technical report

  47. arXiv:2506.23481  [pdf, ps, other]

    cs.CV eess.IV

    Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks

    Authors: Xian Zhang, Xiang Cheng

    Abstract: Objectives: The rapid advancement of Multimodal Large Language Models (MLLMs) has significantly enhanced their reasoning capabilities, enabling a wide range of intelligent applications. However, these advancements also raise critical concerns regarding privacy and ethics. MLLMs are now capable of inferring the geographic location of images -- such as those shared on social media or captured from s…

    Submitted 29 June, 2025; originally announced June 2025.

  48. arXiv:2506.23325  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

    Authors: Yitian Gong, Luozhijie Jin, Ruifan Deng, Dong Zhang, Xin Zhang, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

    Abstract: Speech codecs serve as bridges between speech signals and large language models. An ideal codec for speech language models should not only preserve acoustic information but also capture rich semantic information. However, existing speech codecs struggle to balance high-quality audio reconstruction with ease of modeling by language models. In this study, we analyze the limitations of previous codec…

    Submitted 29 June, 2025; originally announced June 2025.

  49. arXiv:2506.23275  [pdf, ps, other]

    cs.CV cs.AI

    Why Settle for One? Text-to-ImageSet Generation and Evaluation

    Authors: Chengyou Jia, Xin Shen, Zhuohang Dang, Changliang Xia, Weijia Wu, Xinyu Zhang, Hangwei Qian, Ivor W. Tsang, Minnan Luo

    Abstract: Despite remarkable progress in Text-to-Image models, many real-world applications require generating coherent image sets with diverse consistency requirements. Existing consistent methods often focus on a specific domain with specific aspects of consistency, which significantly constrains their generalizability to broader applications. In this paper, we propose a more challenging problem, Text-to-…

    Submitted 29 June, 2025; originally announced June 2025.

  50. arXiv:2506.23235  [pdf, ps, other]

    cs.CL

    Generalist Reward Models: Found Inside Large Language Models

    Authors: Yi-Chen Li, Tian Xu, Yang Yu, Xuqin Zhang, Xiong-Hui Chen, Zhongxiang Ling, Ningjing Chao, Lei Yuan, Zhi-Hua Zhou

    Abstract: The alignment of Large Language Models (LLMs) is critically dependent on reward models trained on costly human preference data. While recent work explores bypassing this cost with AI feedback, these methods often lack a rigorous theoretical foundation. In this paper, we discover that a powerful generalist reward model is already latently present within any LLM trained via standard next-token predi…

    Submitted 29 June, 2025; originally announced June 2025.