
Showing 1–50 of 142 results for author: Kong, D

Searching in archive cs.
  1. arXiv:2506.23205

    cs.CV

    BridgeShape: Latent Diffusion Schrödinger Bridge for 3D Shape Completion

    Authors: Dequan Kong, Zhe Zhu, Honghua Chen, Mingqiang Wei

    Abstract: Existing diffusion-based 3D shape completion methods typically use a conditional paradigm, injecting incomplete shape information into the denoising network via deep feature interactions (e.g., concatenation, cross-attention) to guide sampling toward complete shapes, often represented by voxel-based distance functions. However, these approaches fail to explicitly model the optimal global transport…

    Submitted 29 June, 2025; originally announced June 2025.

  2. arXiv:2506.22056

    cs.AI

    Universal Retrieval for Multimodal Trajectory Modeling

    Authors: Xuan Zhang, Ziyan Jiang, Rui Meng, Yifei Leng, Zhenbang Xiao, Zora Zhiruo Wang, Yanyi Shang, Dehan Kong

    Abstract: Trajectory data, capturing human actions and environmental states across various modalities, holds significant potential for enhancing AI agent capabilities, particularly in GUI environments. However, how to model the representation of trajectory-level data presents a significant challenge that has not been systematically addressed amid explosive trajectory data growth. In this work, we introduce…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 18 pages, 3 figures, accepted by Workshop on Computer-use Agents @ ICML 2025

  3. arXiv:2506.22050

    cs.CL

    Decoding Machine Translationese in English-Chinese News: LLMs vs. NMTs

    Authors: Delu Kong, Lieve Macken

    Abstract: This study explores Machine Translationese (MTese) -- the linguistic peculiarities of machine translation outputs -- focusing on the under-researched English-to-Chinese language pair in news texts. We construct a large dataset consisting of 4 sub-corpora and employ a comprehensive five-layer feature set. Then, a chi-square ranking algorithm is applied for feature selection in both classification a…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 14 pages, 5 figures, 6 tables. Accepted in MT Summit 2025, Research: Technical track. Official version may be accessed later in the ACL Anthology

  4. arXiv:2506.22038

    cs.CL

    Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation

    Authors: Delu Kong, Lieve Macken

    Abstract: This study focuses on evaluating the performance of machine translations (MTs) compared to human translations (HTs) in English-to-Chinese children's literature translation (CLT) from a stylometric perspective. The research constructs a Peter Pan corpus, comprising 21 translations: 7 human translations (HTs), 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 19 pages, 8 figures, 4 tables. Accepted in 2nd Workshop on Creative-text Translation and Technology Co-located with MT Summit 2025. Official paper may later be accessed from ACL Anthology

  5. arXiv:2506.19676

    cs.CR

    A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures

    Authors: Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Hujin Peng, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Ningyu Zhang, Chaochao Chen, Muhammad Khurram Khan, Meng Han

    Abstract: In recent years, Large-Language-Model-driven AI agents have exhibited unprecedented intelligence and adaptability, and are rapidly changing human production and life. Nowadays, agents are undergoing a new round of evolution. They no longer act as an isolated island like LLMs. Instead, they start to communicate with diverse external entities, such as other agents and tools, to perform more complex…

    Submitted 2 July, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

    Comments: 41 pages, 13 figures, submitted to IEEE COMST

  6. arXiv:2506.12835

    cs.CV

    DiffS-NOCS: 3D Point Cloud Reconstruction through Coloring Sketches to NOCS Maps Using Diffusion Models

    Authors: Di Kong, Qianhui Wan

    Abstract: Reconstructing a 3D point cloud from a given conditional sketch is challenging. Existing methods often work directly in 3D space, but domain variability and difficulty in reconstructing accurate 3D structures from 2D sketches remain significant obstacles. Moreover, ideal models should also accept prompts for control, in addition to the sparse sketch, posing challenges in multi-modal fusion. We p…

    Submitted 15 June, 2025; originally announced June 2025.

  7. arXiv:2506.12475

    eess.IV cs.CV

    Efficient Star Distillation Attention Network for Lightweight Image Super-Resolution

    Authors: Fangwei Hao, Ji Du, Desheng Kong, Jiesheng Wu, Jing Xu, Ping Li

    Abstract: In recent years, the performance of lightweight Single-Image Super-Resolution (SISR) has been improved significantly with the application of Convolutional Neural Networks (CNNs) and Large Kernel Attention (LKA). However, existing information distillation modules for lightweight SISR struggle to map inputs into High-Dimensional Non-Linear (HDNL) feature spaces, limiting their representation learnin…

    Submitted 14 June, 2025; originally announced June 2025.

  8. arXiv:2506.07837

    cs.AI

    HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains

    Authors: Shijie Wang, Yilun Zhang, Zeyu Lai, Dexing Kong

    Abstract: Multimodal large language models (MLLMs) have shown great potential in general domains but perform poorly in some specific domains due to a lack of domain-specific data, such as image-text data or video-text data. In some specific domains, there is abundant graphic and textual data scattered around, but it lacks standardized arrangement. In the field of medical ultrasound, there are ultrasonic diagno…

    Submitted 9 June, 2025; originally announced June 2025.

  9. arXiv:2506.06712

    cs.CV math.AP

    Active Contour Models Driven by Hyperbolic Mean Curvature Flow for Image Segmentation

    Authors: Saiyu Hu, Chunlei He, Jianfeng Zhang, Dexing Kong, Shoujun Huang

    Abstract: Parabolic mean curvature flow-driven active contour models (PMCF-ACMs) are widely used in image segmentation, which however depend heavily on the selection of initial curve configurations. In this paper, we firstly propose several hyperbolic mean curvature flow-driven ACMs (HMCF-ACMs), which introduce tunable initial velocity fields, enabling adaptive optimization for diverse segmentation scenario…

    Submitted 7 June, 2025; originally announced June 2025.

  10. arXiv:2505.20246

    cs.AI cs.CL

    On Path to Multimodal Historical Reasoning: HistBench and HistAgent

    Authors: Jiahao Qiu, Fulian Xiao, Yimin Wang, Yuchen Mao, Yijia Chen, Xinzhe Juan, Shu Zhang, Siran Wang, Xuan Qi, Tongcheng Zhang, Zixin Yao, Jiacheng Guo, Yifu Lu, Charles Argon, Jundi Cui, Daixin Chen, Junran Zhou, Shuyao Zhou, Zhanpeng Zhou, Ling Yang, Shilong Liu, Hongru Wang, Kaixuan Huang, Xun Jiang, Yuming Cao, et al. (74 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have led to remarkable progress across domains, yet their capabilities in the humanities, particularly history, remain underexplored. Historical reasoning poses unique challenges for AI, involving multimodal source interpretation, temporal inference, and cross-linguistic analysis. While general-purpose agents perform well on many existing benchmarks,…

    Submitted 19 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: 17 pages, 7 figures

  11. arXiv:2505.17652

    cs.LG cs.AI

    Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

    Authors: Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

    Abstract: Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models, yet it is hard to scale for the low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by scheduling problems based on problem difficulties. However, these approaches suffer from unstable and biased estimations of problem difficulty and fail to capture th…

    Submitted 29 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  12. arXiv:2505.14806

    q-bio.NC cs.LG stat.ML

    Place Cells as Proximity-Preserving Embeddings: From Multi-Scale Random Walk to Straight-Forward Path Planning

    Authors: Minglu Zhao, Dehong Xu, Deqian Kong, Wen-Hao Zhang, Ying Nian Wu

    Abstract: The hippocampus enables spatial navigation through place cell populations forming cognitive maps. We propose proximity-preserving neural embeddings to encode multi-scale random walk transitions, where the inner product $\langle h(x, t), h(y, t) \rangle = q(y|x, t)$ represents normalized transition probabilities, with $h(x, t)$ as the embedding at location $x$ and $q(y|x, t)$ as the transition prob…

    Submitted 2 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  13. arXiv:2505.12624

    cs.RO

    EndoForce: Development of an Intuitive Axial Force Measurement Device for Endoscopic Robotic Systems

    Authors: Hansoul Kim, Dong-Ho Lee, Dukyoo Kong, Dong-Soo Kwon, Byungsik Cheon

    Abstract: Robotic endoscopic systems provide intuitive control and eliminate radiation exposure, making them a promising alternative to conventional methods. However, the lack of axial force measurement from the robot remains a major challenge, as it can lead to excessive colonic elongation, perforation, or ureteral complications. Although various methods have been proposed in previous studies, limitations…

    Submitted 18 May, 2025; originally announced May 2025.

  14. arXiv:2505.03077

    cs.RO cs.AI cs.LG

    Latent Adaptive Planner for Dynamic Manipulation

    Authors: Donghun Noh, Deqian Kong, Minglu Zhao, Andrew Lizarraga, Jianwen Xie, Ying Nian Wu, Dennis Hong

    Abstract: This paper presents Latent Adaptive Planner (LAP), a novel approach for dynamic nonprehensile manipulation tasks that formulates planning as latent space inference, effectively learned from human demonstration videos. Our method addresses key challenges in visuomotor policy learning through a principled variational replanning framework that maintains temporal consistency while efficiently adapting…

    Submitted 5 May, 2025; originally announced May 2025.

  15. arXiv:2504.21367

    cs.CE

    Implementation and Security Analysis of Cryptocurrencies Based on Ethereum

    Authors: Pengfei Gao, Dechao Kong, Xiaoqi Li

    Abstract: Blockchain technology has set off a wave of decentralization in the world since its birth. The trust system constructed by blockchain technology based on cryptography algorithm and computing power provides a practical and powerful solution to solve the trust problem in human society. In order to make more convenient use of the characteristics of blockchain and build applications on it, smart contr…

    Submitted 6 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  16. arXiv:2504.21053

    cs.LG cs.AI

    NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models

    Authors: Yi Zhou, Wenpeng Xing, Dezhang Kong, Changting Lin, Meng Han

    Abstract: Safety alignment in large language models (LLMs) is achieved through fine-tuning mechanisms that regulate neuron activations to suppress harmful content. In this work, we propose a novel approach to induce disalignment by identifying and modifying the neurons responsible for safety constraints. Our method consists of three key steps: Neuron Activation Analysis, where we examine activation patterns…

    Submitted 29 April, 2025; originally announced April 2025.

  17. arXiv:2504.17825

    cs.CV cs.AI

    Dual Prompting Image Restoration with Diffusion Transformers

    Authors: Dehong Kong, Fan Li, Zhixin Wang, Jiaqi Xu, Renjing Pei, Wenbo Li, WenQi Ren

    Abstract: Recent state-of-the-art image restoration methods mostly adopt latent diffusion models with U-Net backbones, yet still face challenges in achieving high-quality restoration due to their limited capabilities. Diffusion transformers (DiTs), like SD3, are emerging as a promising alternative because of their better quality with scalability. In this paper, we introduce DPIR (Dual Prompting Image Rest…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: CVPR2025

  18. arXiv:2504.09072

    cs.AR cs.LG

    MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation

    Authors: Vikas Natesh, H. T. Kung, David Kong

    Abstract: We offer a novel approach, MGS (Markov Greedy Sums), to improve the accuracy of low-bitwidth floating-point dot products in neural network computations. In conventional 32-bit floating-point summation, adding values with different exponents may lead to loss of precision in the mantissa of the smaller term, which is right-shifted to align with the larger term's exponent. Such shifting (a.k.a. 'swam…

    Submitted 12 April, 2025; originally announced April 2025.

  19. arXiv:2504.08257

    physics.app-ph cs.AI

    Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions

    Authors: Yingqian Xu, Xiaohan Li, Caihua Wan, Ran Zhang, Bin He, Shiqiang Liu, Jihao Xia, Dehao Kong, Shilong Xiong, Guoqiang Yu, Xiufeng Han

    Abstract: Bayesian networks play an increasingly important role in data mining, inference, and reasoning with the rapid development of artificial intelligence. In this paper, we present proof-of-concept experiments demonstrating the use of spin-orbit torque magnetic tunnel junctions (SOT-MTJs) in Bayesian network reasoning. Not only can the target probability distribution function (PDF) of a Bayesian networ…

    Submitted 11 April, 2025; originally announced April 2025.

  20. Innovative Automated Stretch Elastic Waistband Sewing Machine for Garment Manufacturing

    Authors: Prof Dr Ray Wai Man Kong

    Abstract: The applied research and development of the Automated Stretch Elastic Waistband Sewing Machine represents a significant advancement in garment manufacturing, addressing the industry's need for increased efficiency, precision, and adaptability. This machine integrates innovative features such as a sensor-based automatic waistband expansion system, synchronized sewing speed and rolling whee…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 13 pages, 10 Figures

    Journal ref: 2025, International Research Journal of Modernization in Engineering Technology and Science

  21. arXiv:2503.01506

    cs.CL

    SampleMix: A Sample-wise Pre-training Data Mixing Strategey by Coordinating Data Quality and Diversity

    Authors: Xiangyu Xi, Deyang Kong, Jian Yang, Jiawei Yang, Zhengyu Chen, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

    Abstract: Existing pretraining data mixing methods for large language models (LLMs) typically follow a domain-wise methodology, a top-down process that first determines domain weights and then performs uniform data sampling across each domain. However, these approaches neglect significant inter-domain overlaps and commonalities, failing to control the global diversity of the constructed training dataset. Fu…

    Submitted 3 March, 2025; originally announced March 2025.

  22. arXiv:2502.17941

    cs.CV cs.AI cs.LG

    Optimal Brain Apoptosis

    Authors: Mingyuan Sun, Zheng Fang, Jiaxu Wang, Junjie Jiang, Delei Kong, Chenming Hu, Yuetong Fang, Renjing Xu

    Abstract: The increasing complexity and parameter count of Convolutional Neural Networks (CNNs) and Transformers pose challenges in terms of computational efficiency and resource demands. Pruning has been identified as an effective strategy to address these challenges by removing redundant elements such as neurons, channels, or connections, thereby enhancing computational efficiency without heavily compromi…

    Submitted 3 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: Accepted to ICLR 2025

  23. arXiv:2502.10406

    cs.CY cs.AI

    FishBargain: An LLM-Empowered Bargaining Agent for Online Fleamarket Platform Sellers

    Authors: Dexin Kong, Xu Yan, Ming Chen, Shuguang Han, Jufeng Chen, Fei Huang

    Abstract: Different from traditional Business-to-Consumer e-commerce platforms (e.g., Amazon), online fleamarket platforms (e.g., Craigslist) mainly focus on individual sellers who lack time investment and business proficiency. Individual sellers often struggle with the bargaining process and thus the deal is unaccomplished. Recent advancements in Large Language Models (LLMs) demonstrate huge potentia…

    Submitted 22 January, 2025; originally announced February 2025.

  24. arXiv:2502.01567

    cs.CL cs.LG stat.ML

    Latent Thought Models with Variational Bayes Inference-Time Computation

    Authors: Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu

    Abstract: We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast lear…

    Submitted 6 June, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  25. arXiv:2412.14226

    cs.LG stat.ML

    FedSTaS: Client Stratification and Client Level Sampling for Efficient Federated Learning

    Authors: Jordan Slessor, Dezheng Kong, Xiaofen Tang, Zheng En Than, Linglong Kong

    Abstract: Federated learning (FL) is a machine learning methodology that involves the collaborative training of a global model across multiple decentralized clients in a privacy-preserving way. Several FL methods are introduced to tackle communication inefficiencies but do not address how to sample participating clients in each round effectively and in a privacy-preserving manner. In this paper, we propose…

    Submitted 29 December, 2024; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: 6 pages, 3 figures

    MSC Class: 68T05 (Primary) 62H30; 62J05 (Secondary)

  26. arXiv:2412.05467

    cs.LG cs.AI cs.SE

    The BrowserGym Ecosystem for Web Agent Research

    Authors: Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, Massimo Caccia, Léo Boisvert, Megh Thakkar, Tom Marty, Rim Assouel, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lù, Ori Yoran, Dehan Kong, Frank F. Xu, Siva Reddy, Quentin Cappart, Graham Neubig, Ruslan Salakhutdinov, Nicolas Chapados, Alexandre Lacoste

    Abstract: The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs). Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. In an earlier work, Drouin et al. (2024) i…

    Submitted 28 February, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  27. arXiv:2411.17052

    cs.RO

    Dynamic Programming-Based Offline Redundancy Resolution of Redundant Manipulators Along Prescribed Paths with Real-Time Adjustment

    Authors: Zhihang Yin, Fa Wu, Ziqian Wang, Jianmin Yang, Jiyong Tan, Dexing Kong

    Abstract: Traditional offline redundancy resolution of trajectories for redundant manipulators involves computing inverse kinematic solutions for Cartesian space paths, constraining the manipulator to a fixed path without real-time adjustments. Online redundancy resolution can achieve real-time adjustment of paths, but it cannot consider subsequent path points, leading to the possibility of the manipulator…

    Submitted 18 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

  28. arXiv:2411.17034

    cs.RO

    Dynamic Programming-Based Redundancy Resolution for Path Planning of Redundant Manipulators Considering Breakpoints

    Authors: Zhihang Yin, Fa Wu, Ruofan Bian, Ziqian Wang, Jianmin Yang, Jiyong Tan, Dexing Kong

    Abstract: This paper proposes a redundancy resolution algorithm for a redundant manipulator based on dynamic programming. This algorithm can compute the desired joint angles at each point on a pre-planned discrete path in Cartesian space, while ensuring that the angles, velocities, and accelerations of each joint do not exceed the manipulator's constraints. We obtain the analytical solution to the inverse k…

    Submitted 25 November, 2024; originally announced November 2024.

  29. arXiv:2411.10596

    q-bio.NC cs.AI cs.CV stat.ML

    A minimalistic representation model for head direction system

    Authors: Minglu Zhao, Dehong Xu, Deqian Kong, Wen-Hao Zhang, Ying Nian Wu

    Abstract: We present a minimalistic representation model for the head direction (HD) system, aiming to learn a high-dimensional representation of head direction that captures essential properties of HD cells. Our model is a representation of rotation group $U(1)$, and we study both the fully connected version and convolutional version. We demonstrate the emergence of Gaussian-like tuning profiles and a 2D c…

    Submitted 2 June, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

    Comments: Proceedings of the Annual Meeting of the Cognitive Science Society (CogSci 2025)

  30. arXiv:2410.21069

    cs.LG cs.AI q-bio.BM

    EMOCPD: Efficient Attention-based Models for Computational Protein Design Using Amino Acid Microenvironment

    Authors: Xiaoqi Ling, Cheng Cai, Demin Kong, Zhisheng Wei, Jing Wu, Lei Wang, Zhaohong Deng

    Abstract: Computational protein design (CPD) refers to the use of computational methods to design proteins. Traditional methods relying on energy functions and heuristic algorithms for sequence design are inefficient and do not meet the demands of the big data era in biomolecules, with their accuracy limited by the energy functions and search algorithms. Existing deep learning methods are constrained by the…

    Submitted 29 October, 2024; v1 submitted 28 October, 2024; originally announced October 2024.

  31. arXiv:2410.12262

    cs.RO

    3D Gaussian Splatting in Robotics: A Survey

    Authors: Siting Zhu, Guangming Wang, Xin Kong, Dezhi Kong, Hesheng Wang

    Abstract: Dense 3D representations of the environment have been a long-term goal in the robotics field. While the earlier Neural Radiance Fields (NeRF) representation has been prevalent for its implicit, coordinate-based model, the recent emergence of 3D Gaussian Splatting (3DGS) has demonstrated remarkable potential in its explicit radiance field representation. By leveraging 3D Gaussian primitives for expli…

    Submitted 18 December, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

  32. arXiv:2410.11359

    cs.LG cs.RO stat.ML

    DODT: Enhanced Online Decision Transformer Learning through Dreamer's Actor-Critic Trajectory Forecasting

    Authors: Eric Hanchen Jiang, Zhi Zhang, Dinghuai Zhang, Andrew Lizarraga, Chenheng Xu, Yasi Zhang, Siyan Zhao, Zhengjie Xu, Peiyu Yu, Yuer Tang, Deqian Kong, Ying Nian Wu

    Abstract: Advancements in reinforcement learning have led to the development of sophisticated models capable of learning complex decision-making tasks. However, efficiently integrating world models with decision transformers remains a challenge. In this paper, we introduce a novel approach that combines the Dreamer algorithm's ability to generate anticipatory trajectories with the adaptive learning strength…

    Submitted 15 October, 2024; originally announced October 2024.

  33. arXiv:2410.10601

    cs.RO

    Fully Asynchronous Neuromorphic Perception for Mobile Robot Dodging with Loihi Chips

    Authors: Junjie Jiang, Delei Kong, Chenming Hu, Zheng Fang

    Abstract: Sparse and asynchronous sensing and processing in natural organisms lead to ultra low-latency and energy-efficient perception. Event cameras, known as neuromorphic vision sensors, are designed to mimic these characteristics. However, fully utilizing the sparse and asynchronous event stream remains challenging. Influenced by the mature algorithms of standard cameras, most existing event-based algor…

    Submitted 21 December, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

  34. arXiv:2410.07630

    cs.RO

    Simplified POMDP Planning with an Alternative Observation Space and Formal Performance Guarantees

    Authors: Da Kong, Vadim Indelman

    Abstract: Online planning under uncertainty in partially observable domains is an essential capability in robotics and AI. The partially observable Markov decision process (POMDP) is a mathematically principled framework for addressing decision-making problems in this challenging setting. However, finding an optimal solution for POMDPs is computationally expensive and is feasible only for small problems. In…

    Submitted 11 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: Accepted to ISRR 2024

  35. Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

    Authors: Dehong Kong, Siyuan Liang, Xiaopeng Zhu, Yuansheng Zhong, Wenqi Ren

    Abstract: Visual language pre-training (VLP) models have demonstrated significant success across various domains, yet they remain vulnerable to adversarial attacks. Addressing these adversarial vulnerabilities is crucial for enhancing security in multimodal learning. Traditionally, adversarial methods targeting VLP models involve simultaneously perturbing images and text. However, this approach faces notabl…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: accepted by Visual Intelligence

    Journal ref: Visual Intelligence, 2024, Vol 2, article no.17

  36. arXiv:2409.14181

    cs.AI

    Democratising Artificial Intelligence for Pandemic Preparedness and Global Governance in Latin American and Caribbean Countries

    Authors: Andre de Carvalho, Robson Bonidia, Jude Dzevela Kong, Mariana Dauhajre, Claudio Struchiner, Guilherme Goedert, Peter F. Stadler, Maria Emilia Walter, Danilo Sanches, Troy Day, Marcia Castro, John Edmunds, Manuel Colome-Hidalgo, Demian Arturo Herrera Morban, Edian F. Franco, Cesar Ugarte-Gil, Patricia Espinoza-Lopez, Gabriel Carrasco-Escobar, Ulisses Rocha

    Abstract: Infectious diseases, transmitted directly or indirectly, are among the leading causes of epidemics and pandemics. Consequently, several open challenges exist in predicting epidemic outbreaks, detecting variants, tracing contacts, discovering new drugs, and fighting misinformation. Artificial Intelligence (AI) can provide tools to deal with these scenarios, demonstrating promising results in the fi…

    Submitted 21 September, 2024; originally announced September 2024.

  37. arXiv:2409.12421

    cs.CV

    Frequency-Guided Spatial Adaptation for Camouflaged Object Detection

    Authors: Shizhou Zhang, Dexuan Kong, Yinghui Xing, Yue Lu, Lingyan Ran, Guoqiang Liang, Hexu Wang, Yanning Zhang

    Abstract: Camouflaged object detection (COD) aims to segment camouflaged objects which exhibit very similar patterns with the surrounding environment. Recent research works have shown that enhancing the feature representation via the frequency information can greatly alleviate the ambiguity problem between the foreground objects and the background. With the emergence of vision foundation models, like InternI…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: The paper has been accepted for publication as a regular paper in the IEEE Transactions on Multimedia

  38. arXiv:2409.03845

    cs.LG stat.ML

    Latent Space Energy-based Neural ODEs

    Authors: Sheng Cheng, Deqian Kong, Jianwen Xie, Kookjin Lee, Ying Nian Wu, Yezhou Yang

    Abstract: This paper introduces novel deep dynamical models designed to represent continuous-time sequences. Our approach employs a neural emission model to generate each data point in the time series through a non-linear transformation of a latent state vector. The evolution of these latent states is implicitly defined by a neural ordinary differential equation (ODE), with the initial state drawn from an i…

    Submitted 5 February, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

  39. arXiv:2408.14431

    cs.SE

    Towards Better Comprehension of Breaking Changes in the NPM Ecosystem

    Authors: Dezhen Kong, Jiakun Liu, Lingfeng Bao, David Lo

    Abstract: Breaking changes cause a lot of effort to both downstream and upstream developers: downstream developers need to adapt to breaking changes and upstream developers are responsible for identifying and documenting them. In the NPM ecosystem, characterized by frequent code changes and a high tolerance for making breaking changes, the effort is larger. For better comprehension of breaking changes in…

    Submitted 14 October, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  40. arXiv:2408.05452

    cs.CV cs.RO

    EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

    Authors: Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang

    Abstract: Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representatio…

    Submitted 10 August, 2024; originally announced August 2024.

  41. arXiv:2407.12239

    cs.CV

    Motion and Structure from Event-based Normal Flow

    Authors: Zhongyang Ren, Bangyan Liao, Delei Kong, Jinghang Li, Peidong Liu, Laurent Kneip, Guillermo Gallego, Yi Zhou

    Abstract: Recovering the camera motion and scene geometry from visual data is a fundamental problem in the field of computer vision. Its success in standard vision is attributed to the maturity of feature extraction, data association and multi-view geometry. The recent emergence of neuromorphic event-based cameras places great demands on approaches that use raw event data as input to solve this fundamental…

    Submitted 9 October, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by ECCV 2024

  42. arXiv:2407.01607  [pdf, other]

    cs.LG cs.IR stat.ML

    Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction

    Authors: Zhongxiang Fan, Zhaocheng Liu, Jian Liang, Dongying Kong, Han Li, Peng Jiang, Shuang Li, Kun Gai

    Abstract: This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary is…

    Submitted 27 June, 2024; originally announced July 2024.

  43. arXiv:2406.12373  [pdf, other]

    cs.CL cs.AI cs.LG

    WebCanvas: Benchmarking Web Agents in Online Environments

    Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

    Abstract: For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web…

    Submitted 16 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Our platform, tool and dataset are publicly available at https://www.imean.ai/web-canvas/ and https://huggingface.co/datasets/iMeanAI/Mind2Web-Live/

    MSC Class: 68T50 ACM Class: I.2.7

  44. arXiv:2406.03868  [pdf, other]

    cs.DC

    PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

    Authors: Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

    Abstract: Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems. This results in an exceedingly complex…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages

  45. arXiv:2405.16730  [pdf, other]

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of the black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…

    Submitted 26 May, 2024; originally announced May 2024.

  46. arXiv:2405.08337  [pdf]

    cs.CV cs.AI

    Perivascular space Identification Nnunet for Generalised Usage (PINGU)

    Authors: Benjamin Sinclair, Lucy Vivash, Jasmine Moses, Miranda Lynch, William Pham, Karina Dorfman, Cassandra Marotta, Shaun Koh, Jacob Bunyamin, Ella Rowsthorn, Alex Jarema, Himashi Peiris, Zhaolin Chen, Sandy R Shultz, David K Wright, Dexiao Kong, Sharon L. Naismith, Terence J. OBrien, Meng Law

    Abstract: Perivascular spaces (PVSs) form a central component of the brain's waste clearance system, the glymphatic system. These structures are visible on MRI images, and their morphology is associated with aging and neurological disease. Manual quantification of PVS is time consuming and subjective. Numerous deep learning methods for PVS segmentation have been developed, however the majority have been devel…

    Submitted 17 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  47. arXiv:2405.07595  [pdf, other]

    cs.CV cs.AI

    Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection

    Authors: Dehong Kong, Siyuan Liang, Wenqi Ren

    Abstract: Object detection techniques for Unmanned Aerial Vehicles (UAVs) rely on Deep Neural Networks (DNNs), which are vulnerable to adversarial attacks. Nonetheless, adversarial patches generated by existing algorithms in the UAV domain pay very little attention to the naturalness of adversarial patches. Moreover, imposing constraints directly on adversarial patches makes it difficult to generate patches…

    Submitted 13 May, 2024; originally announced May 2024.

  48. arXiv:2405.07090  [pdf, other]

    cs.HC

    MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

    Authors: Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen

    Abstract: The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, these require a high-quality UI dataset. Existing datasets are often outdated, collected years ago, and are frequently noisy with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI…

    Submitted 11 May, 2024; originally announced May 2024.

  49. arXiv:2405.00254  [pdf, other]

    cs.AI cs.LG

    RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

    Authors: Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar

    Abstract: Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large-language models recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous, and can be encoded by a single reward model. In this paper, we focus on addressing the issu…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

    Comments: Added experiments

  50. arXiv:2403.15853  [pdf]

    eess.IV cs.CV

    An edge detection-based deep learning approach for tear meniscus height measurement

    Authors: Kesheng Wang, Kunhui Xu, Xiaoyu Chen, Chunlei He, Jianfeng Zhang, Dexing Kong, Qi Dai, Shoujun Huang

    Abstract: Automatic measurements of tear meniscus height (TMH) have been achieved by using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask lab…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 22 pages, 5 figures