
Showing 1–50 of 284 results for author: Gong, X

Searching in archive cs.
  1. arXiv:2507.03394 [pdf, ps, other]

    cs.CV

    Learning Normals of Noisy Points by Local Gradient-Aware Surface Filtering

    Authors: Qing Li, Huifang Feng, Xun Gong, Yu-Shen Liu

    Abstract: Estimating normals for noisy point clouds is a persistent challenge in 3D geometry processing, particularly for end-to-end oriented normal estimation. Existing methods generally address relatively clean data and rely on supervised priors to fit local surfaces within specific neighborhoods. In this paper, we propose a novel approach for learning normals from noisy point clouds through local gradien…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025. Code: https://github.com/LeoQLi/LGSF

  2. arXiv:2507.01635 [pdf, ps, other]

    cs.CR

    EGNInfoLeaker: Unveiling the Risks of Public Key Reuse and User Identity Leakage in Blockchain

    Authors: Chenyu Li, Xueping Liang, Xiaorui Gong, Xiu Zhang

    Abstract: While Ethereum's discovery protocols (Discv4/Discv5) incorporate robust cryptographic designs to protect user privacy, real-world deployment reveals critical vulnerabilities when users deviate from security guidelines. In this paper, we design a system called EGNInfoLeaker. Our study is the first work that uncovers widespread public key reuse across Ethereum's peer-to-peer networks - a practice t…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2506.23440 [pdf, ps, other]

    cs.CV

    PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions

    Authors: Mahesh Bhosale, Abdul Wasi, Yuanhao Zhai, Yunjie Tian, Samuel Border, Nan Xi, Pinaki Sarder, Junsong Yuan, David Doermann, Xuan Gong

    Abstract: Diffusion-based generative models have shown promise in synthesizing histopathology images to address data scarcity caused by privacy constraints. Diagnostic text reports provide high-level semantic descriptions, and masks offer fine-grained spatial structures essential for representing distinct morphological regions. However, public datasets lack paired text and mask data for the same histopathol…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025

  4. arXiv:2506.09443 [pdf, ps, other]

    cs.CR

    LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

    Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji

    Abstract: Large Language Models (LLMs) have demonstrated remarkable intelligence across various tasks, which has inspired the development and widespread adoption of LLM-as-a-Judge systems for automated model testing, such as red teaming and benchmarking. However, these systems are susceptible to adversarial attacks that can manipulate evaluation outcomes, raising concerns about their robustness and, consequ…

    Submitted 11 June, 2025; originally announced June 2025.

  5. arXiv:2506.08986 [pdf, ps, other]

    cs.CL

    Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder

    Authors: Yuejiao Wang, Xianmin Gong, Xixin Wu, Patrick Wong, Hoi-lam Helene Fung, Man Wai Mak, Helen Meng

    Abstract: Early detection is crucial for timely intervention aimed at preventing and slowing the progression of neurocognitive disorder (NCD), a common and significant health problem among the aging population. Recent evidence has suggested that language-related functional magnetic resonance imaging (fMRI) may be a promising approach for detecting cognitive decline and early NCD. In this paper, we proposed…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 5 pages, 3 figures, accepted by ISCSLP 2024

  6. arXiv:2506.03548 [pdf, ps, other]

    cs.AI

    SUMO-MCP: Leveraging the Model Context Protocol for Autonomous Traffic Simulation and Optimization

    Authors: Chenglong Ye, Gang Xiong, Junyou Shang, Xingyuan Dai, Xiaoyan Gong, Yisheng Lv

    Abstract: Traffic simulation tools, such as SUMO, are essential for urban mobility research. However, such tools remain challenging for users due to complex manual workflows involving network download, demand generation, simulation setup, and result analysis. In this paper, we introduce SUMO-MCP, a novel platform that not only wraps SUMO's core utilities into a unified tool suite but also provides addition…

    Submitted 4 June, 2025; originally announced June 2025.

  7. arXiv:2506.02854 [pdf, ps, other]

    cs.CV

    Hierarchical Self-Prompting SAM: A Prompt-Free Medical Image Segmentation Framework

    Authors: Mengmeng Zhang, Xingyuan Dai, Yicheng Sun, Jing Wang, Yueyang Yao, Xiaoyan Gong, Fuze Cong, Feiyue Wang, Yisheng Lv

    Abstract: Although the Segment Anything Model (SAM) is highly effective in natural image segmentation, it depends on prompts, which limits its applicability to medical imaging where manual prompts are often unavailable. Existing efforts to fine-tune SAM for medical segmentation typically struggle to remove this dependency. We propose Hierarchical Self-Prompting SAM (HSP-SAM), a novel self-prom…

    Submitted 3 June, 2025; originally announced June 2025.

  8. arXiv:2505.23410 [pdf, ps, other]

    cs.CL

    From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs

    Authors: Xuan Gong, Hanbo Huang, Shiyu Liang

    Abstract: Factual knowledge extraction aims to explicitly extract knowledge parameterized in pre-trained language models for application in downstream tasks. While prior work has been investigating the impact of supervised fine-tuning data on the factuality of large language models (LLMs), its mechanism remains poorly understood. We revisit this impact through systematic experiments, with a particular focus…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: The code of this paper will be released soon

  9. arXiv:2505.22067 [pdf, ps, other]

    cs.CV cs.AI cs.RO

    From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving

    Authors: Xinyu Xia, Xingjun Ma, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong

    Abstract: Ensuring robust and generalizable autonomous driving requires not only broad scenario coverage but also efficient repair of failure cases, particularly those related to challenging and safety-critical scenarios. However, existing scenario generation and selection methods often lack adaptivity and semantic relevance, limiting their impact on performance improvement. In this paper, we propose \textb…

    Submitted 28 May, 2025; originally announced May 2025.

  10. arXiv:2505.19624 [pdf]

    cs.CV cs.AI

    Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat

    Authors: Pusheng Xu, Xia Gong, Xiaolan Chen, Weiyi Zhang, Jiancheng Yang, Bingjie Yan, Meng Yuan, Yalin Zheng, Mingguang He, Danli Shi

    Abstract: Purpose: To develop a bilingual multimodal visual question answering (VQA) benchmark for evaluating VLMs in ophthalmology. Methods: Ophthalmic image posts and associated captions published between January 1, 2016, and December 31, 2024, were collected from WeChat Official Accounts. Based on these captions, bilingual question-answer (QA) pairs in Chinese and English were generated using GPT-4o-mini…

    Submitted 26 May, 2025; originally announced May 2025.

  11. arXiv:2505.19179 [pdf, ps, other]

    cs.SD eess.AS eess.SP

    BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM

    Authors: Xun Gong, Anqi Lv, Zhiming Wang, Huijia Zhu, Yanmin Qian

    Abstract: While speech large language models (SpeechLLMs) have advanced standard automatic speech recognition (ASR), contextual biasing for named entities and rare words remains challenging, especially at scale. To address this, we propose BR-ASR: a Bias Retrieval framework for large-scale contextual biasing (up to 200k entries) via two innovations: (1) speech-and-bias contrastive learning to retrieve seman…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted by InterSpeech 2025

  12. arXiv:2505.17038 [pdf, other]

    cs.CL cs.SI

    Signals from the Floods: AI-Driven Disaster Analysis through Multi-Source Data Fusion

    Authors: Xian Gong, Paul X. McCarthy, Lin Tian, Marian-Andrei Rizoiu

    Abstract: Massive and diverse web data are increasingly vital for government disaster response, as demonstrated by the 2022 floods in New South Wales (NSW), Australia. This study examines how X (formerly Twitter) and public inquiry submissions provide insights into public behaviour during crises. We analyse more than 55,000 flood-related tweets and 1,450 submissions to identify behavioural patterns during e…

    Submitted 10 May, 2025; originally announced May 2025.

  13. arXiv:2505.10774 [pdf, ps, other]

    cs.LG cs.AI

    Context-Aware Probabilistic Modeling with LLM for Multimodal Time Series Forecasting

    Authors: Yueyang Yao, Jiajun Li, Xingyuan Dai, MengMeng Zhang, Xiaoyan Gong, Fei-Yue Wang, Yisheng Lv

    Abstract: Time series forecasting is important for applications spanning energy markets, climate analysis, and traffic management. However, existing methods struggle to effectively integrate exogenous texts and align them with the probabilistic nature of large language models (LLMs). Current approaches either employ shallow text-time series fusion via basic prompts or rely on deterministic numerical decodin…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 13 pages, 2 figures

  14. arXiv:2505.10591 [pdf, other]

    cs.CY cs.DL cs.SI econ.GN

    Cosmos 1.0: a multidimensional map of the emerging technology frontier

    Authors: Xian Gong, Paul X. McCarthy, Colin Griffith, Claire McFarland, Marian-Andrei Rizoiu

    Abstract: This paper describes a novel methodology to map the universe of emerging technologies, utilising various source data that contain a rich diversity and breadth of contemporary knowledge to create a new dataset and multiple indices that provide new insights into these technologies. The Cosmos 1.0 dataset is a comprehensive collection of 23,544 technologies (ET23k) structured into a hierarchical mode…

    Submitted 14 May, 2025; originally announced May 2025.

  15. arXiv:2505.06105 [pdf, other]

    eess.IV cs.CV

    S2MNet: Speckle-To-Mesh Net for Three-Dimensional Cardiac Morphology Reconstruction via Echocardiogram

    Authors: Xilin Gong, Yongkai Chen, Shushan Wu, Fang Wang, Ping Ma, Wenxuan Zhong

    Abstract: Echocardiogram is the most commonly used imaging modality in cardiac assessment due to its non-invasive nature, real-time capability, and cost-effectiveness. Despite its advantages, most clinical echocardiograms provide only two-dimensional views, limiting the ability to fully assess cardiac anatomy and function in three dimensions. While three-dimensional echocardiography exists, it often suffers…

    Submitted 9 May, 2025; originally announced May 2025.

  16. Statistical CSI Acquisition for Multi-frequency Massive MIMO Systems

    Authors: Jinke Tang, Li You, Xinrui Gong, Chenjie Xie, Xiqi Gao, Xiang-Gen Xia, Xueyuan Shi

    Abstract: Multi-frequency massive multi-input multi-output (MIMO) communication is a promising strategy for both 5G and future 6G systems, ensuring reliable transmission while enhancing frequency resource utilization. Statistical channel state information (CSI) has been widely adopted in multi-frequency massive MIMO transmissions to reduce overhead and improve transmission performance. In this paper, we pro…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 9 figures. Accepted for publication in IEEE Transactions on Communications

  17. GNN-enabled Precoding for Massive MIMO LEO Satellite Communications

    Authors: Huibin Zhou, Xinrui Gong, Christos G. Tsinos, Li You, Xiqi Gao, Björn Ottersten

    Abstract: Low Earth Orbit (LEO) satellite communication is a critical component in the development of sixth generation (6G) networks. The integration of massive multiple-input multiple-output (MIMO) technology is being actively explored to enhance the performance of LEO satellite communications. However, the limited power of LEO satellites poses a significant challenge in improving communication energy effi…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 14 pages, 13 figures

  18. arXiv:2504.17449 [pdf, other]

    cs.LG cs.AI cs.CL

    HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models

    Authors: Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, Qin Xie, Guiming Xie, Xuejian Gong

    Abstract: The significant computational demands of pretrained language models (PLMs), which often require dedicated hardware, present a substantial challenge in serving them efficiently, especially in multi-tenant environments. To address this, we introduce HMI, a Hierarchical knowledge management-based Multi-tenant Inference system, designed to manage tenants with distinct PLMs resource-efficiently. Our ap…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted by VLDBJ 2025

  19. arXiv:2504.15817 [pdf, other]

    cs.CR cs.AR

    EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform

    Authors: Yi Huang, Xinsheng Gong, Xiangyu Kong, Dibei Chen, Jianfeng Zhu, Wenping Zhu, Liangwei Li, Mingyu Gao, Shaojun Wei, Aoyang Zhang, Leibo Liu

    Abstract: Fully Homomorphic Encryption (FHE) is a set of powerful cryptographic schemes that allows computation to be performed directly on encrypted data with an unlimited depth. Despite FHE's promise for privacy-preserving computing, in most FHE schemes the ciphertext generally blows up to thousands of times the size of the original message, and the massive amount of data load from off-chip memory for boot…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted by HPCA 2025

  20. arXiv:2504.15674 [pdf, other]

    cs.CR cs.LG

    TrojanDam: Detection-Free Backdoor Defense in Federated Learning through Proactive Model Robustification utilizing OOD Data

    Authors: Yanbo Dai, Songze Li, Zihan Gan, Xueluan Gong

    Abstract: Federated learning (FL) systems allow decentralized data-owning clients to jointly train a global model through uploading their locally trained updates to a centralized server. The property of decentralization enables adversaries to craft carefully designed backdoor updates to make the global model misclassify only when encountering adversary-chosen triggers. Existing defense mechanisms mainly rel…

    Submitted 22 April, 2025; originally announced April 2025.

  21. arXiv:2504.14772 [pdf, other]

    cs.CL cs.LG stat.ML

    Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions

    Authors: Luyang Fang, Xiaowei Yu, Jiazhang Cai, Yongkai Chen, Shushan Wu, Zhengliang Liu, Zhenyuan Yang, Haoran Lu, Xilin Gong, Yufang Liu, Terry Ma, Wei Ruan, Ali Abbasi, Jing Zhang, Tao Wang, Ehsan Latif, Wei Liu, Wei Zhang, Soheil Kolouri, Xiaoming Zhai, Dajiang Zhu, Wenxuan Zhong, Tianming Liu, Ping Ma

    Abstract: The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary paradigms: Knowledge Distillation (KD) and Dataset Distillation (DD), both aimed at compressing LLMs while preserving their advanced reasoning capabilities and lingui…

    Submitted 20 April, 2025; originally announced April 2025.

  22. arXiv:2504.09713 [pdf, other]

    cs.ET

    A Full Spectrum of 3D Ferroelectric Memory Architectures Shaped by Polarization Sensing

    Authors: Jiahui Duan, Asif Khan, Xiao Gong, Vijaykrishnan Narayanan, Kai Ni

    Abstract: Ferroelectric memories have attracted significant interest due to their non-volatile storage, energy efficiency, and fast operation, making them prime candidates for future memory technologies. As commercial Dynamic Random Access Memory (DRAM) and NAND flash memory are transitioning or have moved toward three-dimensional (3D) integration, 3D ferroelectric memory architectures are also emerging, provi…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 65 pages, 5 figures

  23. arXiv:2503.23685 [pdf, other]

    cs.ET

    An In-Situ Spatial-Temporal Sequence Detector for Neuromorphic Vision Sensor Empowered by High Density Vertical NAND Storage

    Authors: Zijian Zhao, Varun Darshana Parekh, Po-Kai Hsu, Yixin Qin, Yiming Song, A N M Nafiul Islam, Ningyuan Cao, Siddharth Joshi, Thomas Kämpfe, Moonyoung Jung, Kwangyou Seo, Kwangsoo Kim, Wanki Kim, Daewon Ha, Sourav Dutta, Abhronil Sengupta, Xiao Gong, Shimeng Yu, Vijaykrishnan Narayanan, Kai Ni

    Abstract: Neuromorphic vision sensors require efficient real-time pattern recognition, yet conventional architectures struggle with energy and latency constraints. Here, we present a novel in-situ spatiotemporal sequence detector that leverages vertical NAND storage to achieve massively parallel pattern detection. By encoding each cell with two single-transistor-based multi-level cell (MLC) memory elements,…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 26 pages, 7 figures

  24. arXiv:2503.16450 [pdf, other]

    cs.HC cs.RO

    Do Looks Matter? Exploring Functional and Aesthetic Design Preferences for a Robotic Guide Dog

    Authors: Aviv L. Cohav, A. Xinran Gong, J. Taery Kim, Clint Zeagler, Sehoon Ha, Bruce N. Walker

    Abstract: Dog guides offer an effective mobility solution for blind or visually impaired (BVI) individuals, but conventional dog guides have limitations including the need for care, potential distractions, societal prejudice, high costs, and limited availability. To address these challenges, we seek to develop a robot dog guide capable of performing the tasks of a conventional dog guide, enhanced with addit…

    Submitted 18 February, 2025; originally announced March 2025.

    Comments: 7 pages, 8 figures, accepted for 2025 IEEE International Conference on Robotics and Automation (ICRA)

  25. arXiv:2503.10080 [pdf, ps, other]

    cs.CV

    Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

    Authors: Zhen Qu, Xian Tao, Xinyi Gong, Shichen Qu, Qiyu Chen, Zhengtao Zhang, Xingang Wang, Guiguang Ding

    Abstract: Recently, vision-language models (e.g. CLIP) have demonstrated remarkable performance in zero-shot anomaly detection (ZSAD). By leveraging auxiliary data during training, these models can directly perform cross-category anomaly detection on target datasets, such as detecting defects on industrial product surfaces or identifying tumors in organ tissues. Existing approaches typically construct text…

    Submitted 3 June, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  26. arXiv:2503.06839 [pdf, other]

    cs.CV cs.AI

    AttFC: Attention Fully-Connected Layer for Large-Scale Face Recognition with One GPU

    Authors: Zhuowen Zheng, Yain-Whar Si, Xiaochen Yuan, Junwei Duan, Ke Wang, Xiaofan Li, Xinyuan Zhang, Xueyuan Gong

    Abstract: Nowadays, with the advancement of deep neural networks (DNNs) and the availability of large-scale datasets, the face recognition (FR) model has achieved exceptional performance. However, the parameter magnitude of the fully connected (FC) layer directly depends on the number of identities in the dataset, so if the FR model is trained on large-scale datasets, the size of the model parameter will b…

    Submitted 9 March, 2025; originally announced March 2025.

  27. arXiv:2503.06187 [pdf, other]

    cs.CV cs.AI

    MSConv: Multiplicative and Subtractive Convolution for Face Recognition

    Authors: Si Zhou, Yain-Whar Si, Xiaochen Yuan, Xiaofan Li, Xiaoxiang Liu, Xinyuan Zhang, Cong Lin, Xueyuan Gong

    Abstract: In Neural Networks, there are various methods of feature fusion. Different strategies can significantly affect the effectiveness of feature representation, consequently influencing the model's ability to extract representative and discriminative features. In the field of face recognition, traditional feature fusion methods include feature concatenation and feature addition. Recently, various atte…

    Submitted 8 March, 2025; originally announced March 2025.

  28. arXiv:2503.06053 [pdf, other]

    cs.CV cs.AI

    DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation

    Authors: Runze Zhang, Guoguang Du, Xiaochuan Li, Qi Jia, Liang Jin, Lu Liu, Jingjing Wang, Cong Xu, Zhenhua Guo, Yaqian Zhao, Xiaoli Gong, Rengang Li, Baoyu Fan

    Abstract: Spatio-temporal consistency is a critical research topic in video generation. A qualified generated video segment must ensure plot plausibility and coherence while maintaining visual consistency of objects and scenes across varying viewpoints. Prior research, especially in open-source projects, primarily focuses on either temporal or spatial consistency, or their basic combination, such as appendi…

    Submitted 7 March, 2025; originally announced March 2025.

  29. arXiv:2503.03908 [pdf, other]

    cs.LG math.OC

    On the Convergence of Adam-Type Algorithm for Bilevel Optimization under Unbounded Smoothness

    Authors: Xiaochuan Gong, Jie Hao, Mingrui Liu

    Abstract: Adam has become one of the most popular optimizers for training modern deep neural networks, such as transformers. However, its applicability is largely restricted to single-level optimization problems. In this paper, we aim to extend vanilla Adam to tackle bilevel optimization problems, which have important applications in machine learning, such as meta-learning. In particular, we study stochasti…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 49 pages, 5 figures

  30. arXiv:2503.03528 [pdf, other]

    cs.CV cs.AI

    AdaSin: Enhancing Hard Sample Metrics with Dual Adaptive Penalty for Face Recognition

    Authors: Qiqi Guo, Zhuowen Zheng, Guanghua Yang, Zhiquan Liu, Xiaofan Li, Jianqing Li, Jinyu Tian, Xueyuan Gong

    Abstract: In recent years, the emergence of deep convolutional neural networks has positioned face recognition as a prominent research focus in computer vision. Traditional loss functions, such as margin-based, hard-sample mining-based, and hybrid approaches, have achieved notable performance improvements, with some leveraging curriculum learning to optimize training. However, these methods often fall short…

    Submitted 5 March, 2025; originally announced March 2025.

  31. arXiv:2503.03104 [pdf, other]

    cs.CV cs.AI

    RVAFM: Re-parameterizing Vertical Attention Fusion Module for Handwritten Paragraph Text Recognition

    Authors: Jinhui Zheng, Zhiquan Liu, Yain-Whar Si, Jianqing Li, Xinyuan Zhang, Xiaofan Li, Haozhi Huang, Xueyuan Gong

    Abstract: Handwritten Paragraph Text Recognition (HPTR) is a challenging task in Computer Vision, requiring the transformation of a paragraph text image, rich in handwritten text, into text encoding sequences. One of the most advanced models for this task is Vertical Attention Network (VAN), which utilizes a Vertical Attention Module (VAM) to implicitly segment paragraph text images into text lines, thereby…

    Submitted 4 March, 2025; originally announced March 2025.

  32. arXiv:2503.00929 [pdf, ps, other]

    cs.LG

    Parameter-Adaptive Dynamic Pricing

    Authors: Xueping Gong, Jiheng Zhang

    Abstract: Dynamic pricing is crucial in sectors like e-commerce and transportation, balancing exploration of demand patterns and exploitation of pricing strategies. Existing methods often require precise knowledge of the demand function, e.g., the Hölder smoothness level and Lipschitz constant, limiting practical utility. This paper introduces an adaptive approach to address these challenges without prior…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 44 pages

  33. arXiv:2502.20050 [pdf]

    physics.app-ph cs.NE

    A Novel P-bit-based Probabilistic Computing Approach for Solving the 3-D Protein Folding Problem

    Authors: Chao Fang, Yihan He, Xiao Gong, Gengchiau Liang

    Abstract: In the post-Moore era, the need for efficient solutions to non-deterministic polynomial-time (NP) problems is becoming more pressing. In this context, the Ising model implemented by the probabilistic computing systems with probabilistic bits (p-bits) has attracted attention due to the widespread availability of p-bits and support for large-scale simulations. This study marks the first work to appl…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 14 pages, 6 figures

  34. CalliSense: An Interactive Educational Tool for Process-based Learning in Chinese Calligraphy

    Authors: Xinya Gong, Wenhui Tao, Yuxin Ma

    Abstract: Process-based learning is crucial for the transmission of intangible cultural heritage, especially in complex arts like Chinese calligraphy, where mastering techniques cannot be achieved by merely observing the final work. To explore the challenges faced in calligraphy heritage transmission, we conducted semi-structured interviews (N=8) as a formative study. Our findings indicate that the lack of…

    Submitted 21 February, 2025; originally announced February 2025.

  35. arXiv:2502.13533 [pdf, other]

    cs.LG cs.AI cs.CL

    Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

    Authors: Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou

    Abstract: Large Language Models (LLMs) have significantly advanced natural language processing with exceptional task generalization capabilities. Low-Rank Adaptation (LoRA) offers a cost-effective fine-tuning solution, freezing the original model parameters and training only lightweight, low-rank adapter matrices. However, the memory footprint of LoRA is largely dominated by the original model parameters. To…

    Submitted 15 March, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted at ICLR 2025

  36. arXiv:2502.11711 [pdf, other]

    cs.LG cs.AI

    Knowledge-aware contrastive heterogeneous molecular graph learning

    Authors: Mukun Chen, Jia Wu, Shirui Pan, Fu Lin, Bo Du, Xiuwen Gong, Wenbin Hu

    Abstract: Molecular representation learning is pivotal in predicting molecular properties and advancing drug design. Traditional methodologies, which predominantly rely on homogeneous graph encoding, are limited by their inability to integrate external knowledge and represent molecular structures across different levels of granularity. To address these limitations, we propose a paradigm shift by encoding mo…

    Submitted 20 March, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  37. arXiv:2502.09872 [pdf, other]

    cs.CV cs.LG

    Learning to Calibrate for Reliable Visual Fire Detection

    Authors: Ziqi Zhang, Xiuzhuang Zhou, Xiangyang Gong

    Abstract: Fire is characterized by its sudden onset and destructive power, making early fire detection crucial for ensuring human safety and protecting property. With the advancement of deep learning, the application of computer vision in fire detection has significantly improved. However, deep learning models often exhibit a tendency toward overconfidence, and most existing works focus primarily on enhanci…

    Submitted 13 February, 2025; originally announced February 2025.

  38. arXiv:2502.09501 [pdf, other]

    cs.CV

    Prior-Constrained Association Learning for Fine-Grained Generalized Category Discovery

    Authors: Menglin Wang, Zhun Zhong, Xiaojin Gong

    Abstract: This paper addresses generalized category discovery (GCD), the task of clustering unlabeled data from potentially known or unknown categories with the help of labeled instances from each known category. Compared to traditional semi-supervised learning, GCD is more challenging because unlabeled data could be from novel categories not appearing in labeled data. Current state-of-the-art methods typic…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: Accepted to AAAI 2025

  39. arXiv:2502.06817 [pdf, other]

    eess.IV cs.GR cs.LG

    Diffusion-empowered AutoPrompt MedSAM

    Authors: Peng Huang, Shu Hu, Bo Peng, Xun Gong, Penghang Yin, Hongtu Zhu, Xi Wu, Xin Wang

    Abstract: MedSAM, a medical foundation model derived from the SAM architecture, has demonstrated notable success across diverse medical domains. However, its clinical application faces two major challenges: the dependency on labor-intensive manual prompt generation, which imposes a significant burden on clinicians, and the absence of semantic labeling in the generated segmentation masks for organs or lesion…

    Submitted 15 April, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  40. Unleashing the Potential of Pre-Trained Diffusion Models for Generalizable Person Re-Identification

    Authors: Jiachen Li, Xiaojin Gong

    Abstract: Domain-generalizable re-identification (DG Re-ID) aims to train a model on one or more source domains and evaluate its performance on unseen target domains, a task that has attracted growing attention due to its practical relevance. While numerous methods have been proposed, most rely on discriminative or contrastive learning frameworks to learn generalizable feature representations. However, thes…

    Submitted 11 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  41. Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design

    Authors: Yongquan 'Owen' Hu, Jingyu Tang, Xinya Gong, Zhongyi Zhou, Shuning Zhang, Don Samitha Elvitigala, Florian 'Floyd' Mueller, Wen Hu, Aaron J. Quigley

    Abstract: The recent surge in artificial intelligence, particularly in multimodal processing technology, has advanced human-computer interaction by altering how intelligent systems perceive, understand, and respond to contextual information (i.e., context awareness). Despite such advancements, there is a significant gap in comprehensive reviews examining these advances, especially from a multimodal data pe…

    Submitted 17 March, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: The ACM CHI Conference on Human Factors in Computing Systems 2025 (CHI 2025)

    MSC Class: 68U35; 68T07

    ACM Class: H.5.2; H.5.3

  42. Deep Distance Map Regression Network with Shape-aware Loss for Imbalanced Medical Image Segmentation

    Authors: Huiyu Li, Xiabi Liu, Said Boumaraf, Xiaopeng Gong, Donghai Liao, Xiaohong Ma

    Abstract: Small object segmentation, like tumor segmentation, is a difficult and critical task in the field of medical image analysis. Although deep-learning-based methods have achieved promising performance, they are restricted to the use of binary segmentation masks. Inspired by the rigorous mapping between binary segmentation masks and distance maps, we adopt distance map as a novel ground truth and employ…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: Conference

    Journal ref: International Workshop on Machine Learning in Medical Imaging. Springer, Cham, 2020

  43. arXiv:2501.08862

    cs.LG cs.AI cs.CR

    ARMOR: Shielding Unlearnable Examples against Data Augmentation

    Authors: Xueluan Gong, Yuji Wang, Yanjiao Chen, Haocheng Dong, Yiming Li, Mengyuan Sun, Shuaike Li, Qian Wang, Chen Chen

    Abstract: Private data, when published online, may be collected by unauthorized parties to train deep neural networks (DNNs). To protect privacy, defensive noises can be added to original samples to degrade their learnability by DNNs. Recently, unlearnable examples have been proposed to minimize the training loss such that the model learns almost nothing. However, raw data are often pre-processed before being use…

    Submitted 15 January, 2025; originally announced January 2025.

  44. arXiv:2501.08665

    cs.CV

    A Survey on Facial Image Privacy Preservation in Cloud-Based Services

    Authors: Chen Chen, Mengyuan Sun, Xueluan Gong, Yanjiao Chen, Qian Wang

    Abstract: Facial recognition models are increasingly employed by commercial enterprises, government agencies, and cloud service providers for identity verification, consumer services, and surveillance. These models are often trained using vast amounts of facial data processed and stored in cloud-based platforms, raising significant privacy concerns. Users' facial images may be exploited without their consen…

    Submitted 15 January, 2025; originally announced January 2025.

  45. arXiv:2501.07394

    cs.HC

    Exploring the distribution of connectivity weights in resting-state EEG networks

    Authors: Shiang Hu, Xiao Gong, Xiaolong Huang, Jie Ruan, Pedro Antonio Valdes-Sosa

    Abstract: The resting-state brain networks (RSNs) reflect the functional connectivity patterns between brain modules, providing essential foundations for decoding intrinsic neural information within the brain. They serve as one of the primary tools for describing the spatial dynamics of the brain using various neuroimaging techniques, such as electroencephalography (EEG) and magnetoencephalography (MEG). Ho…

    Submitted 18 January, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  46. arXiv:2501.06869

    cs.AI cs.CV cs.HC cs.LG

    A Foundational Generative Model for Breast Ultrasound Image Analysis

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Haotian Ye, Siyu He, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, James Zou, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Foundational models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential for breast ultrasound analysis remains untapped. In this paper, we present BUSGen, the first foundational generative model specifically designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired ex…

    Submitted 12 January, 2025; originally announced January 2025.

    Comments: Peking University; Stanford University; Peking University Cancer Hospital & Institute; Peking Union Medical College Hospital; Cancer Hospital, Chinese Academy of Medical Sciences

  47. arXiv:2501.06271

    q-bio.QM cs.AI cs.CE

    Large Language Models for Bioinformatics

    Authors: Wei Ruan, Yanjun Lyu, Jing Zhang, Jiazhang Cai, Peng Shu, Yang Ge, Yao Lu, Shang Gao, Yue Wang, Peilong Wang, Lin Zhao, Tao Wang, Yufang Liu, Luyang Fang, Ziyu Liu, Zhengliang Liu, Yiwei Li, Zihao Wu, Junhao Chen, Hanqi Jiang, Yi Pan, Zhenyuan Yang, Jingyuan Chen, Shizhe Liang, Wei Zhang, et al. (30 additional authors not shown)

    Abstract: With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification,…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: 64 pages, 1 figure

  48. arXiv:2501.03727

    eess.AS cs.LG

    Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives

    Authors: Jinchao Li, Yuejiao Wang, Junan Li, Jiawen Kang, Bo Zheng, Simon Wong, Brian Mak, Helene Fung, Jean Woo, Man-Wai Mak, Timothy Kwok, Vincent Mok, Xianmin Gong, Xixin Wu, Xunying Liu, Patrick Wong, Helen Meng

    Abstract: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Given that language impairments manifest early in NCD progression, visual-stimulated narrative (VSN)-based analysis offers a promising avenue for NCD detection. Current VSN-based NCD detection methods primarily focus on linguistic microstructures (e.g., pauses, lexical diversity), which ar…

    Submitted 18 June, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: 13 pages, 7 figures, submitted to JSTSP

  49. arXiv:2412.20017

    cs.LG math.OC

    A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

    Authors: Xiaochuan Gong, Jie Hao, Mingrui Liu

    Abstract: This paper studies the problem of stochastic bilevel optimization where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level function is strongly convex. This problem is motivated by meta-learning applied to sequential data, such as text classification using recurrent neural networks, where the smoothness constant of the upper-level loss function scales l…

    Submitted 27 December, 2024; originally announced December 2024.

    Comments: ICML 2024

  50. arXiv:2412.18281

    cs.IT cs.LG eess.SP

    GDM4MMIMO: Generative Diffusion Models for Massive MIMO Communications

    Authors: Zhenzhou Jin, Li You, Huibin Zhou, Yuanshuo Wang, Xiaofeng Liu, Xinrui Gong, Xiqi Gao, Derrick Wing Kwan Ng, Xiang-Gen Xia

    Abstract: Massive multiple-input multiple-output (MIMO) offers significant advantages in spectral and energy efficiencies, positioning it as a cornerstone technology of fifth-generation (5G) wireless communication systems and a promising solution for the burgeoning data demands anticipated in sixth-generation (6G) networks. In recent years, with the continuous advancement of artificial intelligence (AI), a…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 6 pages, 3 figures