Showing 1–50 of 411 results for author: Wan, J

  1. arXiv:2506.10328  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework

    Authors: Sadia Kamal, Tim Oates, Joy Wan

    Abstract: Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose a weakly supervised…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted at IEEE/CVF Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

  2. arXiv:2506.07533  [pdf, ps, other]

    cs.CV

    MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts

    Authors: Wei Tao, Haocheng Lu, Xiaoyang Qu, Bin Zhang, Kai Lu, Jiguang Wan, Jianzong Wang

    Abstract: One of the primary challenges in optimizing large language models (LLMs) for long-context inference lies in the high memory consumption of the Key-Value (KV) cache. Existing approaches, such as quantization, have demonstrated promising results in reducing memory usage. However, current quantization methods cannot take both effectiveness and efficiency into account. In this paper, we propose MoQAE,…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted by the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

  3. arXiv:2506.05652  [pdf, ps, other]

    math.RT

    Stability of the centers of group algebras of general affine groups $GA_n(q)$

    Authors: Jinkui Wan, Lan Zhou

    Abstract: The general affine group $GA_n(q)$ consisting of invertible affine transformations of an affine space of codimension one in the vector space $\mathbb{F}_q^n$ over a finite field $\mathbb{F}_q$, can be viewed as a subgroup of the general linear group $GL_{n}(q)$ over $\mathbb{F}_q$. In the article, we introduce the notion of the type of each matrix in $GA_n(q)$ and give an explicit representative f…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 32 pages

    MSC Class: 20G40; 05E15

  4. arXiv:2506.04982  [pdf, ps, other]

    cs.RO

    GEX: Democratizing Dexterity with Fully-Actuated Dexterous Hand and Exoskeleton Glove

    Authors: Yunlong Dong, Xing Liu, Jun Wan, Zelin Deng

    Abstract: This paper introduces GEX, an innovative low-cost dexterous manipulation system that combines the GX11 tri-finger anthropomorphic hand (11 DoF) with the EX12 tri-finger exoskeleton glove (12 DoF), forming a closed-loop teleoperation framework through kinematic retargeting for high-fidelity control. Both components employ modular 3D-printed finger designs, achieving ultra-low manufacturing costs wh…

    Submitted 5 June, 2025; originally announced June 2025.

  5. arXiv:2506.03618  [pdf, ps, other]

    cs.LG cs.AI

    GCFL: A Gradient Correction-based Federated Learning Framework for Privacy-preserving CPSS

    Authors: Jiayi Wan, Xiang Zhu, Fanzhen Liu, Wei Fan, Xiaolong Xu

    Abstract: Federated learning, as a distributed architecture, shows great promise for applications in Cyber-Physical-Social Systems (CPSS). In order to mitigate the privacy risks inherent in CPSS, the integration of differential privacy with federated learning has attracted considerable attention. Existing research mainly focuses on dynamically adjusting the noise added or discarding certain gradients to mit…

    Submitted 4 June, 2025; originally announced June 2025.

  6. arXiv:2506.02875  [pdf, ps, other]

    cs.CV

    NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

    Authors: Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, Radu Timofte, et al. (70 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. This challenge is to address a major challenge in the field of video and talking head processing. The challenge is divided into three tracks, including user generated video, AI generated video and talking he…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: NTIRE 2025 XGC Quality Assessment Challenge Report. arXiv admin note: text overlap with arXiv:2404.16687

  7. arXiv:2506.02354  [pdf, ps, other]

    cs.CV

    RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models

    Authors: Junjie Li, Nan Zhang, Xiaoyang Qu, Kai Lu, Guokuan Li, Jiguang Wan, Jianzong Wang

    Abstract: Object Navigation (ObjectNav) is a fundamental task in embodied artificial intelligence. Although significant progress has been made in semantic map construction and target direction prediction in current research, redundant exploration and exploration failures remain inevitable. A critical but underexplored direction is the timely termination of exploration to overcome these challenges. We observ…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted by the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

  8. arXiv:2506.00475  [pdf, ps, other]

    cs.CV

    BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation

    Authors: Wei Tao, Xiaoyang Qu, Kai Lu, Jiguang Wan, Shenglin He, Jianzong Wang

    Abstract: Since the point cloud data is inherently irregular and unstructured, point cloud semantic segmentation has always been a challenging task. The graph-based method attempts to model the irregular point cloud by representing it as a graph; however, this approach incurs substantial computational cost due to the necessity of constructing a graph for every point within a large-scale point cloud. In this…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted by the 2025 International Joint Conference on Neural Networks (IJCNN 2025)

  9. arXiv:2505.24372  [pdf, ps, other]

    cs.CV

    D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding

    Authors: Yichi Zhang, Gongwei Chen, Jun Zhu, Jia Wan

    Abstract: Visual Grounding is a task that aims to localize a target region in an image based on a free-form natural language description. With the rise of Transformer architectures, there is an increasing need for larger datasets to boost performance. However, the high cost of manual annotation poses a challenge, hindering the scale of data and the ability of large models to enhance their effectiveness. Pre…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 16 pages, 8 figures

  10. arXiv:2505.21822  [pdf, ps, other]

    physics.optics eess.IV

    Compressive Fourier-Domain Intensity Coupling (C-FOCUS) enables near-millimeter deep imaging in the intact mouse brain in vivo

    Authors: Renzhi He, Yucheng Li, Brianna Urbina, Jiandi Wan, Yi Xue

    Abstract: Two-photon microscopy is a powerful tool for in vivo imaging, but its imaging depth is typically limited to a few hundred microns due to tissue scattering, even with existing scattering correction techniques. Moreover, most active scattering correction methods are restricted to small regions by the optical memory effect. Here, we introduce compressive Fourier-domain intensity coupling for scatteri…

    Submitted 27 May, 2025; originally announced May 2025.

  11. arXiv:2505.18039  [pdf, ps, other]

    cs.CV

    Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation

    Authors: Li Zhong, Ahmed Ghazal, Jun-Jun Wan, Frederik Zilly, Patrick Mackens, Joachim E. Vollrath, Bogdan Sorin Coseriu

    Abstract: Foundation models like CLIP (Contrastive Language-Image Pretraining) have revolutionized vision-language tasks by enabling zero-shot and few-shot learning through cross-modal alignment. However, their computational complexity and large memory footprint make them unsuitable for deployment on resource-constrained edge devices, such as in-car cameras used for image collection and real-time processing…

    Submitted 23 May, 2025; originally announced May 2025.

  12. arXiv:2505.15784  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Large Language Models as Computable Approximations to Solomonoff Induction

    Authors: Jun Wan, Lingrui Mei

    Abstract: The rapid advancement of large language models (LLMs) calls for a rigorous theoretical framework to explain their empirical success. While significant progress has been made in understanding LLM behaviors, existing theoretical frameworks remain fragmented in explaining emergent phenomena through a unified mathematical lens. We establish the first formal connection between LLM architectures and Alg…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Both authors contributed equally

  13. arXiv:2505.14892  [pdf, ps, other]

    cs.CL cs.AI

    Scaling Laws for State Dynamics in Large Language Models

    Authors: Jacob X Li, Shreyas S Raman, Jessica Wan, Fahad Samman, Jazlyn Lin

    Abstract: Large Language Models (LLMs) are increasingly used in tasks requiring internal state tracking, yet their ability to model state transition dynamics remains poorly understood. We evaluate how well LLMs capture deterministic state dynamics across 3 domains: Box Tracking, Abstract DFA Sequences, and Complex Text Games, each formalizable as a finite-state system. Across tasks, we find that next-state…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 16 pages; 23 figures

    ACM Class: I.2.7; I.2.1; I.2.4; I.5.4

  14. arXiv:2505.13327  [pdf, other]

    cs.CV

    Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning

    Authors: Ajian Liu, Haocheng Yuan, Xiao Guo, Hui Ma, Wanyi Zhuang, Changtao Miao, Yan Hong, Chuanbiao Song, Jun Lan, Qi Chu, Tao Gong, Yanyan Liang, Weiqiang Wang, Jun Wan, Xiaoming Liu, Zhen Lei

    Abstract: Presentation Attack Detection and Face Forgery Detection are designed to protect face data from physical media-based Presentation Attacks and digital editing-based DeepFakes respectively. But separate training of these two models makes them vulnerable to unknown attacks and burdens deployment environments. The lack of a Unified Face Attack Detection model to handle both types of attacks is mainly…

    Submitted 19 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  15. arXiv:2505.12240  [pdf, ps, other]

    math.AP

    Dynamics and leapfrogging phenomena of multiple helical vortices for 3D incompressible Euler equations

    Authors: Daomin Cao, Junhong Fan, Guolin Qin, Jie Wan

    Abstract: In this paper, we investigate the time evolution of multiple interacted helical vortices without swirl for the incompressible Euler equations in $\mathbb R^3$. Assuming that the initial helical symmetric vorticity is concentrated within an $\epsilon$ neighborhood of $N$ distinct helices with vanishing mutual distance of order $O(\frac{1}{|\ln \epsilon|})$, and each vortex core possesses a vorticity mass of…

    Submitted 5 June, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

    MSC Class: Primary: 76B47; Secondary: 37N10

  16. arXiv:2505.08221  [pdf, ps, other]

    eess.SP

    Performance Analysis of Cooperative Integrated Sensing and Communications for 6G Networks

    Authors: Dongsheng Sui, Cunhua Pan, Hong Ren, Jiahua Wan, Liuchang Zhuo, Jing Jin, Qixing Wang, Jiangzhou Wang

    Abstract: In this work, we aim to effectively characterize the performance of cooperative integrated sensing and communication (ISAC) networks and to reveal how performance metrics relate to network parameters. To this end, we introduce a generalized stochastic geometry framework to model the cooperative ISAC networks, which approximates the spatial randomness of the network deployment. Based on this framew…

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  17. arXiv:2505.05192  [pdf, other]

    cs.LG

    Long-Term Individual Causal Effect Estimation via Identifiable Latent Representation Learning

    Authors: Ruichu Cai, Junjie Wan, Weilin Chen, Zeqin Yang, Zijian Li, Peng Zhen, Jiecheng Guo

    Abstract: Estimating long-term causal effects by combining long-term observational and short-term experimental data is a crucial but challenging problem in many real-world scenarios. In existing methods, several ideal assumptions, e.g. latent unconfoundedness assumption or additive equi-confounding bias assumption, are proposed to address the latent confounder problem raised by the observational data. Howev…

    Submitted 8 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  18. arXiv:2505.05148  [pdf, ps, other]

    cs.CL

    A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition

    Authors: Hussain Ahmad, Qingyang Zeng, Jing Wan

    Abstract: The emergence of multimodal content, particularly text and images on social media, has positioned Multimodal Named Entity Recognition (MNER) as an increasingly important area of research within Natural Language Processing. Despite progress in high-resource languages such as English, MNER remains underexplored for low-resource languages like Urdu. The primary challenges include the scarcity of anno…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 16 pages, 5 figures. Preprint

  19. arXiv:2505.03126  [pdf, ps, other]

    cond-mat.mes-hall

    The unidirectional Seebeck detection of the Néel vector in the two-dimensional tetragonal $\mathcal{PT}$-symmetric antiferromagnetic materials

    Authors: Ya-Ting Xiao, Ying-Li Wu, Jia-Liang Wan, Xiao-Qin Yu

    Abstract: The efficient detection of the reversal (180$^{\circ}$ rotation) of the Néel vector is one of the crucial tasks in antiferromagnetic spintronics. Here, we propose a thermal approach to detect the reversal of the Néel vector in the tetragonal $\mathcal{PT}$ antiferromagnetic materials through the unidirectional Seebeck effect (USE). Being different from the previous works in which USE stems from th…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 9 pages, 2 figures, accepted by PRB

  20. arXiv:2504.19976  [pdf, ps, other]

    math.AP gr-qc math.DG

    Formation of trapped surfaces for the Einstein--Maxwell--charged scalar field system

    Authors: Dawei Shen, Jingbo Wan

    Abstract: In this paper, we prove a scale-critical trapped surface formation result for the Einstein--Maxwell--charged scalar field (EMCSF) system, without any symmetry assumptions. Specifically, we establish a scale-critical semi-global existence theorem from past null infinity and show that the focusing of gravitational waves, the concentration of electromagnetic fields, or the condensation of complex sca…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 99 pages, 3 figures. All comments are welcome

  21. arXiv:2504.18576  [pdf, other]

    cs.RO

    DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment

    Authors: Xiaofan Li, Chenming Wu, Zhao Yang, Zhihao Xu, Dingkang Liang, Yumeng Zhang, Ji Wan, Jun Wang

    Abstract: This paper presents DriVerse, a generative model for simulating navigation-driven driving scenes from a single image and a future trajectory. Previous autonomous driving world models either directly feed the trajectory or discrete control signals into the generation pipeline, leading to poor alignment between the control inputs and the implicit features of the 2D base generative model, which resul…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 10 pages, 5 figures

  22. arXiv:2504.17144  [pdf]

    physics.optics physics.app-ph

    Physics-informed Transformer Model for the Design of Wavelength-filtering Ring Resonator

    Authors: Yu Dian Lim, Feng Shuo Wan, Ren Jie Wan, Chuan Seng Tan

    Abstract: We have developed a physics-informed transformer model to suggest design parameters in wavelength-filtering ring resonator, that suit a given pair of resonant wavelengths with <6 nm errors. The model provides a versatile method for rapid and accurate design of resonators corresponding to various resonant wavelengths.

    Submitted 23 April, 2025; originally announced April 2025.

  23. arXiv:2504.15918  [pdf, other]

    cs.CV cs.AI cs.HC

    Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions

    Authors: Chang Zong, Bin Li, Shoujun Zhou, Jian Wan, Lei Zhang

    Abstract: Locating specific segments within an instructional video is an efficient way to acquire guiding knowledge. Generally, the task of obtaining video segments for both verbal explanations and visual demonstrations is known as visual answer localization (VAL). However, users often need multiple interactions to obtain answers that align with their expectations when using the system. During these interac…

    Submitted 22 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 16 pages, 8 figures

    MSC Class: 68T45; 68T20

  24. arXiv:2504.15609  [pdf, other]

    cs.CV

    SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking

    Authors: Yunfeng Li, Bo Wang, Jiahao Wan, Xueyi Wu, Ye Li

    Abstract: Underwater observation systems typically integrate optical cameras and imaging sonar systems. When underwater visibility is insufficient, only sonar systems can provide stable data, which necessitates exploration of the underwater acoustic object tracking (UAOT) task. Previous studies have explored traditional methods and Siamese networks for UAOT. However, the absence of a unified evaluation benc…

    Submitted 22 April, 2025; originally announced April 2025.

  25. arXiv:2504.15517  [pdf, other]

    cs.RO cs.LG

    Few-Shot Vision-Language Action-Incremental Policy Learning

    Authors: Mingchen Song, Xiang Deng, Guoqiang Zhong, Qi Lv, Jia Wan, Yinchuan Li, Jianye Hao, Weili Guan

    Abstract: Recently, Transformer-based robotic manipulation methods utilize multi-view spatial representations and language instructions to learn robot motion trajectories by leveraging numerous robot demonstrations. However, the collection of robot data is extremely challenging, and existing methods lack the capability for continuous learning on new tasks with only a few demonstrations. In this paper, we fo…

    Submitted 21 April, 2025; originally announced April 2025.

  26. arXiv:2504.14597  [pdf, other]

    cs.CL

    a1: Steep Test-time Scaling Law via Environment Augmented Generation

    Authors: Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Yuyao Ge, Jun Wan, Yurong Wu, Xueqi Cheng

    Abstract: Large Language Models (LLMs) have made remarkable breakthroughs in reasoning, yet continue to struggle with hallucinations, logical errors, and inability to self-correct during complex multi-step tasks. Current approaches like chain-of-thought prompting offer limited reasoning capabilities that fail when precise step validation is required. We propose Environment Augmented Generation (EAG), a fram…

    Submitted 20 April, 2025; originally announced April 2025.

  27. arXiv:2504.09819  [pdf, other]

    cs.CV

    Density-based Object Detection in Crowded Scenes

    Authors: Chenyang Zhao, Jia Wan, Antoni B. Chan

    Abstract: Compared with the generic scenes, crowded scenes contain highly-overlapped instances, which result in: 1) more ambiguous anchors during training of object detectors, and 2) more predictions are likely to be mistakenly suppressed in post-processing during inference. To address these problems, we propose two new strategies, density-guided anchors (DGA) and density-guided NMS (DG-NMS), which uses obj…

    Submitted 13 April, 2025; originally announced April 2025.

  28. arXiv:2504.09504  [pdf, other]

    cs.CL

    MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs

    Authors: Wei Tao, Xiaoyang Qu, Kai Lu, Jiguang Wan, Guokuan Li, Jianzong Wang

    Abstract: When applying pre-trained large language models (LLMs) to address anomaly detection tasks, the multivariate time series (MTS) modality of anomaly detection does not align with the text modality of LLMs. Existing methods simply transform the MTS data into multiple univariate time series sequences, which can cause many problems. This paper introduces MADLLM, a novel multivariate anomaly detection me…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE International Conference on Multimedia & Expo 2025 (ICME 2025)

  29. arXiv:2504.05281  [pdf, other]

    astro-ph.GA

    Decoding the variability in the star-formation histories of z ~ 0.8 galaxies

    Authors: Jenny T. Wan, Sandro Tacchella, Francesco D'Eugenio, Benjamin D. Johnson, Arjen van der Wel

    Abstract: The scatter of the star-forming main sequence (SFMS) holds a wealth of information about how galaxies evolve. The timescales encoded in this scatter can provide valuable insight into the relative importance of the physical processes regulating star formation. In this paper, we present a detailed observational analysis of the timescales imprinted in galaxy star-formation history (SFH) fluctuations…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Submitted to MNRAS; comments welcome

  30. arXiv:2504.00458  [pdf, other]

    cs.CV

    Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection

    Authors: Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei

    Abstract: Facial recognition systems in real-world scenarios are susceptible to both digital and physical attacks. Previous methods have attempted to achieve classification by learning a comprehensive feature space. However, these methods have not adequately accounted for the inherent characteristics of physical and digital attack data, particularly the large intra class variation in attacks and the small i…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 9 pages, 5 figures, accepted by AAAI-2025 (Oral)

  31. arXiv:2504.00454  [pdf, other]

    cs.CV

    FA^{3}-CLIP: Frequency-Aware Cues Fusion and Attack-Agnostic Prompt Learning for Unified Face Attack Detection

    Authors: Yongze Li, Ning Li, Ajian Liu, Hui Ma, Liying Yang, Xihong Chen, Zhiyao Liang, Yanyan Liang, Jun Wan, Zhen Lei

    Abstract: Facial recognition systems are vulnerable to physical (e.g., printed photos) and digital (e.g., DeepFake) face attacks. Existing methods struggle to simultaneously detect physical and digital attacks due to: 1) significant intra-class variations between these attack types, and 2) the inadequacy of spatial information alone to comprehensively capture live and fake cues. To address these issues, we…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 12 pages, 5 figures

  32. arXiv:2503.23866  [pdf, other]

    cs.CR cs.LG

    A Channel-Triggered Backdoor Attack on Wireless Semantic Image Reconstruction

    Authors: Jialin Wan, Jinglong Shen, Nan Cheng, Zhisheng Yin, Yiliang Liu, Wenchao Xu, Xuemin Shen

    Abstract: This paper investigates backdoor attacks in image-oriented semantic communications. The threat of backdoor attacks on symbol reconstruction in semantic communication (SemCom) systems has received limited attention. Previous research on backdoor attacks targeting SemCom symbol reconstruction primarily focuses on input-level triggers, which are impractical in scenarios with strict input constraints.…

    Submitted 20 May, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  33. arXiv:2503.23294  [pdf, other]

    cs.CL

    Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference

    Authors: Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, Jianzong Wang

    Abstract: Recently, large language models (LLMs) have been able to handle longer and longer contexts. However, a context that is too long may cause intolerant inference latency and GPU memory usage. Existing methods propose mixed-precision quantization to the key-value (KV) cache in LLMs based on token granularity, which is time-consuming in the search process and hardware inefficient during computation. Th…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: Accepted by the Design, Automation, and Test in Europe 2025 (DATE 2025)

  34. arXiv:2503.22291  [pdf, other]

    cs.CV

    VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection

    Authors: Bin Zhang, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang

    Abstract: As object detectors are increasingly deployed as black-box cloud services or pre-trained models with restricted access to the original training data, the challenge of zero-shot object-level out-of-distribution (OOD) detection arises. This task becomes crucial in ensuring the reliability of detectors in open-world settings. While existing methods have demonstrated success in image-level OOD detecti…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 5 pages, 4 figures

  35. arXiv:2503.22285  [pdf, other]

    cs.CV

    RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations

    Authors: Bin Zhang, Jinggang Chen, Xiaoyang Qu, Guokuan Li, Kai Lu, Jiguang Wan, Jing Xiao, Jianzong Wang

    Abstract: Enabling object detectors to recognize out-of-distribution (OOD) objects is vital for building reliable systems. A primary obstacle stems from the fact that models frequently do not receive supervisory signals from unfamiliar data, leading to overly confident predictions regarding OOD objects. Despite previous progress that estimates OOD uncertainty based on the detection model and in-distribution…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 9 pages, 5 figures

  36. arXiv:2503.15337  [pdf, other]

    cs.CV

    Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport

    Authors: Hao Tan, Zichang Tan, Jun Li, Ajian Liu, Jun Wan, Zhen Lei

    Abstract: Identifying multiple novel classes in an image, known as open-vocabulary multi-label recognition, is a challenging task in computer vision. Recent studies explore the transfer of powerful vision-language models such as CLIP. However, these approaches face two critical challenges: (1) The local semantics of CLIP are disrupted due to its global pre-training objectives, resulting in unreliable region…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  37. arXiv:2503.10701  [pdf, other]

    cs.CV cs.RO

    Video Individual Counting for Moving Drones

    Authors: Yaowu Fan, Jia Wan, Tao Han, Antoni B. Chan, Andy J. Ma

    Abstract: Video Individual Counting (VIC) has received increasing attentions recently due to its importance in intelligent video surveillance. Existing works are limited in two aspects, i.e., dataset and method. Previous crowd counting datasets are captured with fixed or rarely moving cameras with relatively sparse individuals, restricting evaluation for a highly varying view and time in crowded scenes. Whi…

    Submitted 12 March, 2025; originally announced March 2025.

  38. arXiv:2503.08367  [pdf, other]

    cs.CV

    Embodied Crowd Counting

    Authors: Runling Long, Yunlong Wang, Jia Wan, Xiang Deng, Xinting Zhu, Weili Guan, Antoni B. Chan, Liqiang Nie

    Abstract: Occlusion is one of the fundamental challenges in crowd counting. In the community, various data-driven approaches have been developed to address this issue, yet their effectiveness is limited. This is mainly because most existing crowd counting datasets on which the methods are trained are based on passive cameras, restricting their ability to fully sense the environment. Recently, embodied navig…

    Submitted 11 March, 2025; originally announced March 2025.

  39. arXiv:2503.01643  [pdf, ps, other]

    math.NA

    Error estimates of asymptotic-preserving neural networks in approximating stochastic linearized Boltzmann equation

    Authors: Jiayu Wan, Liu Liu

    Abstract: In this paper, we construct an asymptotic-preserving neural networks (APNNs) [21] for the linearized Boltzmann equation in the acoustic scaling and with uncertain parameters. Utilizing the micro-macro decomposition, we design the loss function based on the stochastic-Galerkin system conducted from the micro-macro equations. Rigorous analysis is provided to show the capability of neural networks in…

    Submitted 23 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    MSC Class: 35Q20; 68T07; 82C40; 65F99

  40. arXiv:2503.00619  [pdf, other]

    cs.IR cs.AI cs.LG

    PinLanding: Content-First Keyword Landing Page Generation via Multi-Modal AI for Web-Scale Discovery

    Authors: Faye Zhang, Jasmine Wan, Qianyu Cheng, Jinfeng Rao

    Abstract: Online platforms like Pinterest hosting vast content collections traditionally rely on manual curation or user-generated search logs to create keyword landing pages (KLPs) -- topic-centered collection pages that serve as entry points for content discovery. While manual curation ensures quality, it doesn't scale to millions of collections, and search log approaches result in limited topic coverage…

    Submitted 1 March, 2025; originally announced March 2025.

  41. arXiv:2502.18960  [pdf, other]

    cs.LG

    Nonparametric Heterogeneous Long-term Causal Effect Estimation via Data Combination

    Authors: Weilin Chen, Ruichu Cai, Junjie Wan, Zeqin Yang, José Miguel Hernández-Lobato

    Abstract: Long-term causal inference has drawn increasing attention in many scientific domains. Existing methods mainly focus on estimating average long-term causal effects by combining long-term observational data and short-term experimental data. However, it is still understudied how to robustly and effectively estimate heterogeneous long-term causal effects, significantly limiting practical applications.…

    Submitted 2 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  42. arXiv:2502.16161  [pdf, other]

    cs.CV cs.CL

    OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

    Authors: Wenwen Yu, Zhibo Yang, Jianqiang Wan, Sibo Song, Jun Tang, Wenqing Cheng, Yuliang Liu, Xiang Bai

    Abstract: Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand for automated document understanding and the emergence of large language models capable of processing document-based questions. While various methods have been proposed to tackle the complexities of VsTP, existing solutions often rely on task-specific architectures and objectives for individu…

    Submitted 22 February, 2025; originally announced February 2025.

  43. arXiv:2502.13923  [pdf, other]

    cs.CV cs.CL

    Qwen2.5-VL Technical Report

    Authors: Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, et al. (2 additional authors not shown)

    Abstract: We introduce Qwen2.5-VL, the latest flagship model of Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities. Qwen2.5-VL achieves a major leap forward in understanding and interacting with the world through enhanced visual recognition, precise object localization, robust document parsing, and long-video comprehensio…

    Submitted 19 February, 2025; originally announced February 2025.

  44. arXiv:2502.02827  [pdf, other]

    cs.SE

    COFFE: A Code Efficiency Benchmark for Code Generation

    Authors: Yun Peng, Jun Wan, Yichen Li, Xiaoxue Ren

    Abstract: Code generation has largely improved development efficiency in the era of large language models (LLMs). With the ability to follow instructions, current LLMs can be prompted to generate code solutions given detailed descriptions in natural language. Many research efforts are being devoted to improving the correctness of LLM-generated code, and many benchmarks are proposed to evaluate the correctne…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: This paper has been accepted by FSE 2025

  45. arXiv:2501.12746  [pdf, other]

    cs.CL cs.AI

    EvidenceMap: Learning Evidence Analysis to Unleash the Power of Small Language Models for Biomedical Question Answering

    Authors: Chang Zong, Jian Wan, Siliang Tang, Lei Zhang

    Abstract: When addressing professional questions in the biomedical domain, humans typically acquire multiple pieces of information as evidence and engage in multifaceted analysis to provide high-quality answers. Current LLM-based question answering methods lack a detailed definition and learning process for evidence analysis, leading to the risk of error propagation and hallucinations while using evidence.…

    Submitted 13 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: 13 pages, 6 figures

    MSC Class: 68T50

  46. arXiv:2501.06802  [pdf, other]

    cs.AI

    Unifying Two Types of Scaling Laws from the Perspective of Conditional Kolmogorov Complexity

    Authors: Jun Wan

    Abstract: In 2020, OpenAI proposed the first type of Scaling Laws, describing the relationships between model loss and the scale of parameters, data, and training computation. In 2024, OpenAI proposed the second type of Scaling Laws, describing the relationship between model inference performance and inference computation. In this paper, we analyze LLMs training and inference processes from the perspective…

    Submitted 10 February, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  47. arXiv:2501.06763  [pdf, ps, other]

    math.RT

    On representation theory of cyclotomic Hecke-Clifford algebras

    Authors: Lei Shi, Jinkui Wan

    Abstract: In this article, we give an explicit construction of the simple modules for both non-degenerate and degenerate cyclotomic Hecke-Clifford superalgebras over an algebraically closed field of characteristic not equal to $2$ under certain condition in terms of parameters in defining these algebras. As an application, we obtain a sufficient condition on the semi-simplicity of these cyclotomic Hecke-Cli…

    Submitted 26 March, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

    Comments: 34 pages. We fixed several typos and modified the definition of separate condition

    Report number: MPIM-Bonn-2025

  48. arXiv:2501.05698  [pdf, ps, other]

    cond-mat.mes-hall

    Extrinsic nonlinear acoustic valley Hall effect in the massive Dirac materials

    Authors: Jia-Liang Wan, Ying-Li Wu, Ke-Qiu Chen, Xiao-Qin Yu

    Abstract: The nonlinear acoustic valley Hall effect (AVHE), a recently discovered novel acoustically driven phenomenon, has sparked extensive interest in valleytronics. So far, only the intrinsic contributions from band structure (Berry curvature or asymmetric energy dispersions) to nonlinear AVHE have been investigated. Here, we theoretically investigate the nonlinear AVHE from both intrinsic and extrinsic…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: 16 pages, 3 figures, accepted for publication in Physical Review B

  49. arXiv:2412.20119  [pdf, other]

    gr-qc math-ph math.AP math.DG

    A canonical foliation on null infinity in perturbations of Kerr

    Authors: Sergiu Klainerman, Dawei Shen, Jingbo Wan

    Abstract: Kerr stability for small angular momentum has been proved in the series of works by Klainerman-Szeftel, Giorgi-Klainerman-Szeftel and Shen. Some of the most basic conclusions of the result, concerning various physical quantities on the future null infinity are derived in the work of Klainerman-Szeftel. Further important conclusions were later derived in An-He-Shen and Chen-Klainerman. In this pape…

    Submitted 28 December, 2024; originally announced December 2024.

    Comments: 82 pages, 5 figures, 2 tables. Comments are welcome

  50. arXiv:2412.15632  [pdf, other]

    cs.CV

    A New Method to Capturing Compositional Knowledge in Linguistic Space

    Authors: Jiahe Wan

    Abstract: Compositional understanding allows visual language models to interpret complex relationships between objects, attributes, and relations in images and text. However, most existing methods often rely on hard negative examples and fine-tuning, which can overestimate improvements and are limited by the difficulty of obtaining hard negatives. In this work, we introduce Zero-Shot Compositional Understan…

    Submitted 20 December, 2024; originally announced December 2024.