
Showing 1–50 of 233 results for author: Zou, Z

Searching in archive cs.
  1. arXiv:2507.04321  [pdf, ps, other]

    cs.RO

    Lidar Variability: A Novel Dataset and Comparative Study of Solid-State and Spinning Lidars

    Authors: Doumegna Mawuto Koudjo Felix, Xianjia Yu, Jiaqiang Zhang, Sier Ha, Zhuo Zou, Tomi Westerlund

    Abstract: Lidar technology has been widely employed across various applications, such as robot localization in GNSS-denied environments and 3D reconstruction. Recent advancements have introduced different lidar types, including cost-effective solid-state lidars such as the Livox Avia and Mid-360. The Mid-360, with its dome-like design, is increasingly used in portable mapping and unmanned aerial vehicle (UA…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2506.23351  [pdf, ps, other]

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 2 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  3. arXiv:2506.18443  [pdf, ps, other]

    cs.RO cs.CV

    Radar and Event Camera Fusion for Agile Robot Ego-Motion Estimation

    Authors: Yang Lyu, Zhenghao Zou, Yanfeng Li, Chunhui Zhao, Quan Pan

    Abstract: Achieving reliable ego motion estimation for agile robots, e.g., aerobatic aircraft, remains challenging because most robot sensors fail to respond promptly and clearly to highly dynamic robot motions, often resulting in measurement blurring, distortion, and delays. In this paper, we propose an IMU-free and feature-association-free framework to achieve aggressive ego-motion velocity estimation of a…

    Submitted 23 June, 2025; originally announced June 2025.

  4. arXiv:2506.12738  [pdf, ps, other]

    cs.CV cs.AI

    Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution

    Authors: Hang Xu, Wei Yu, Jiangtong Tan, Zhen Zou, Feng Zhao

    Abstract: Blind Super-Resolution (blind SR) aims to enhance the model's generalization ability with unknown degradation, yet it still encounters severe overfitting issues. Some previous methods inspired by dropout, which enhances generalization by regularizing features, have shown promising results in blind SR. Nevertheless, these methods focus solely on regularizing features before the final layer and over…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 8 pages, 8 figures, CVPR2025

  5. arXiv:2506.02461  [pdf, ps, other]

    cs.CL

    XToM: Exploring the Multilingual Theory of Mind for Large Language Models

    Authors: Chunkit Chan, Yauwai Yim, Hongchuan Zeng, Zhiying Zou, Xinyuan Cheng, Zhifan Sun, Zheye Deng, Kawai Chung, Yuzhuo Ao, Yixiang Fan, Cheng Jiayang, Ercong Nie, Ginny Y. Wong, Helmut Schmid, Hinrich Schütze, Simon See, Yangqiu Song

    Abstract: Theory of Mind (ToM), the ability to infer mental states in others, is pivotal for human social cognition. Existing evaluations of ToM in LLMs are largely limited to English, neglecting the linguistic diversity that shapes human cognition. This limitation raises a critical question: can LLMs exhibit Multilingual Theory of Mind, which is the capacity to reason about mental states across diverse lin…

    Submitted 3 June, 2025; originally announced June 2025.

  6. arXiv:2506.02038  [pdf, ps, other]

    cs.CR cs.LG

    Blockchain Powered Edge Intelligence for U-Healthcare in Privacy Critical and Time Sensitive Environment

    Authors: Anum Nawaz, Hafiz Humza Mahmood Ramzan, Xianjia Yu, Zhuo Zou, Tomi Westerlund

    Abstract: Edge Intelligence (EI) serves as a critical enabler for privacy-preserving systems by providing AI-empowered computation and distributed caching services at the edge, thereby minimizing latency and enhancing data privacy. The integration of blockchain technology further augments EI frameworks by ensuring transactional transparency, auditability, and system-wide reliability through a decentralized…

    Submitted 31 May, 2025; originally announced June 2025.

  7. arXiv:2506.00416  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Blockchain-Enabled Privacy-Preserving Second-Order Federated Edge Learning in Personalized Healthcare

    Authors: Anum Nawaz, Muhammad Irfan, Xianjia Yu, Zhuo Zou, Tomi Westerlund

    Abstract: Federated learning (FL) has attracted increasing attention to mitigate security and privacy challenges in traditional cloud-centric machine learning models, specifically in healthcare ecosystems. FL methodologies enable the training of global models through localized policies, allowing independent operations at the edge clients' level. Conventional first-order FL approaches face several challenges…

    Submitted 31 May, 2025; originally announced June 2025.

  8. arXiv:2505.23010  [pdf, ps, other]

    cs.CV

    SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model

    Authors: Bowen Chen, Keyan Chen, Mohan Yang, Zhengxia Zou, Zhenwei Shi

    Abstract: High-resolution (HR) remote sensing imagery plays a vital role in a wide range of applications, including urban planning and environmental monitoring. However, due to limitations in sensors and data transmission links, the images acquired in practice often suffer from resolution degradation. Remote Sensing Image Super-Resolution (RSISR) aims to reconstruct HR images from low-resolution (LR) inputs…

    Submitted 28 May, 2025; originally announced May 2025.

  9. arXiv:2505.16239  [pdf, ps, other]

    cs.CV

    DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution

    Authors: Zheng Chen, Zichen Zou, Kewei Zhang, Xiongfei Su, Xin Yuan, Yong Guo, Yulun Zhang

    Abstract: Diffusion models have demonstrated promising performance in real-world video super-resolution (VSR). However, the dozens of sampling steps they require make inference extremely slow. Sampling acceleration techniques, particularly single-step, provide a potential solution. Nonetheless, achieving one step in VSR remains challenging, due to the high training overhead on video data and stringent fide…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Code is available at: https://github.com/zhengchen1999/DOVE

  10. arXiv:2505.08235  [pdf, other]

    cs.CV

    EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation

    Authors: Hanle Zheng, Xujie Han, Zegang Peng, Shangbin Zhang, Guangxun Du, Zhuo Zou, Xilin Wang, Jibin Wu, Hao Guo, Lei Deng

    Abstract: Video Frame Interpolation (VFI) is a fundamental yet challenging task in computer vision, particularly under conditions involving large motion, occlusion, and lighting variation. Recent advancements in event cameras have opened up new opportunities for addressing these challenges. While existing event-based VFI methods have succeeded in recovering large and complex motions by leveraging handcrafte…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.03599  [pdf, other]

    cs.CV

    From Pixels to Polygons: A Survey of Deep Learning Approaches for Medical Image-to-Mesh Reconstruction

    Authors: Fengming Lin, Arezoo Zakeri, Yidan Xue, Michael MacRaild, Haoran Dou, Zherui Zhou, Ziwei Zou, Ali Sarrami-Foroushani, Jinming Duan, Alejandro F. Frangi

    Abstract: Deep learning-based medical image-to-mesh reconstruction has rapidly evolved, enabling the transformation of medical imaging data into three-dimensional mesh models that are critical in computational medicine and in silico trials for advancing our understanding of disease mechanisms, and diagnostic and therapeutic techniques in modern medicine. This survey systematically categorizes existing appro…

    Submitted 6 May, 2025; originally announced May 2025.

  12. arXiv:2504.20314  [pdf, other]

    cs.LG cs.AI

    Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

    Authors: Qitao Tan, Sung-En Chang, Rui Xia, Huidong Ji, Chence Yang, Ci Zhang, Jun Liu, Zheng Zhan, Zhou Zou, Yanzhi Wang, Jin Lu, Geng Yuan

    Abstract: Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-ignored challenge. ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties and even makes it infeasible for hardware platforms,…

    Submitted 28 April, 2025; originally announced April 2025.

  13. arXiv:2504.18406  [pdf, other]

    cs.CL

    HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

    Authors: Yusen Zhang, Wenliang Zheng, Aashrith Madasu, Peng Shi, Ryo Kamoi, Hao Zhou, Zhuoyang Zou, Shu Zhao, Sarkar Snigdha Sarathi Das, Vipul Gupta, Xiaoxin Lu, Nan Zhang, Ranran Haoran Zhang, Avitej Iyer, Renze Lou, Wenpeng Yin, Rui Zhang

    Abstract: High-resolution image (HRI) understanding aims to process images with a large number of pixels, such as pathological images and agricultural aerial images, both of which can exceed 1 million pixels. Vision Large Language Models (VLMs) can allegedly handle HRIs; however, there is a lack of a comprehensive benchmark for VLMs to evaluate HRI understanding. To address this gap, we introduce HRScene, a…

    Submitted 29 April, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: 22 pages, 8 figures

  14. arXiv:2504.18249  [pdf, other]

    cs.CV cs.AI cs.LG

    Event-Based Eye Tracking. 2025 Event-based Vision Workshop

    Authors: Qinyu Chen, Chang Gao, Min Liu, Daniele Perrone, Yan Ru Pei, Zuowen Wang, Zhuo Zou, Shihang Tan, Tao Han, Guorui Lu, Zhen Xu, Junyuan Ding, Ziteng Wang, Zongwei Wu, Han Han, Yuliang Wu, Jinze Chen, Wei Zhai, Yang Cao, Zheng-jun Zha, Nuwan Bandara, Thivya Kandappu, Archan Misra, Xiaopeng Lin, Hongxiang Huang, et al. (7 additional authors not shown)

    Abstract: This survey serves as a review for the 2025 Event-Based Eye Tracking Challenge organized as part of the 2025 CVPR event-based vision workshop. This challenge focuses on the task of predicting the pupil center by processing event camera recorded eye movement. We review and summarize the innovative methods from the top-ranked teams in the challenge to advance future event-based eye tracking research.…

    Submitted 25 April, 2025; originally announced April 2025.

  15. arXiv:2504.10540  [pdf, other]

    stat.ML cs.AI cs.LG

    AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse

    Authors: Zichao Yu, Zhen Zou, Guojiang Shao, Chengwei Zhang, Shengze Xu, Jie Huang, Feng Zhao, Xiaodong Cun, Wenyi Zhang

    Abstract: Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped similarity pattern between adjacent steps through caching mechanisms, they lack theoretical foundation and rely on simplistic computation reuse, often leading to p…

    Submitted 13 April, 2025; originally announced April 2025.

  16. arXiv:2504.09048  [pdf, other]

    cs.CV

    BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

    Authors: Yongchang Wu, Zipeng Qi, Zhenwei Shi, Zhengxia Zou

    Abstract: The recent advancements in 3D Gaussian Splatting (3DGS) have demonstrated remarkable potential in novel view synthesis tasks. The divide-and-conquer paradigm has enabled large-scale scene reconstruction, but significant challenges remain in scene partitioning, optimization, and merging processes. This paper introduces BlockGaussian, a novel framework incorporating a content-aware scene partition s…

    Submitted 15 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: https://github.com/SunshineWYC/BlockGaussian

  17. arXiv:2504.07943  [pdf, other]

    cs.CV

    HoloPart: Generative 3D Part Amodal Segmentation

    Authors: Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, Xihui Liu

    Abstract: 3D part amodal segmentation--decomposing a 3D shape into complete, semantically meaningful parts, even when occluded--is a challenging but crucial task for 3D content creation and understanding. Existing 3D part segmentation methods only identify visible surface patches, limiting their utility. Inspired by 2D amodal segmentation, we introduce this novel task to the 3D domain and propose a practica…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Project Page: https://vast-ai-research.github.io/HoloPart

  18. arXiv:2504.03354  [pdf, ps, other]

    cs.DS

    A Generalized Binary Tree Mechanism for Differentially Private Approximation of All-Pair Distances

    Authors: Michael Dinitz, Chenglin Fan, Jingcheng Liu, Jalaj Upadhyay, Zongrui Zou

    Abstract: We study the problem of approximating all-pair distances in a weighted undirected graph with differential privacy, introduced by Sealfon [Sea16]. Given a publicly known undirected graph, we treat the weights of edges as sensitive information, and two graphs are neighbors if their edge weights differ in one edge by at most one. We obtain efficient algorithms with significantly improved bounds on a…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 27 pages

  19. arXiv:2503.23044  [pdf, other]

    cs.CV

    CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction

    Authors: Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, Junwei Han

    Abstract: Despite its significant achievements in large-scale scene reconstruction, 3D Gaussian Splatting still faces substantial challenges, including slow processing, high computational costs, and limited geometric accuracy. These core issues arise from its inherently unstructured design and the absence of efficient parallelization. To overcome these challenges simultaneously, we introduce CityGS-X, a sca…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: Project page: https://lifuguan.github.io/CityGS-X/

  20. arXiv:2503.22349  [pdf, other]

    cs.CV

    GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion

    Authors: Li-Heng Chen, Zi-Xin Zou, Chang Liu, Tianjiao Jing, Yan-Pei Cao, Shi-Sheng Huang, Hongbo Fu, Hua Huang

    Abstract: Accurate surface reconstruction from unposed images is crucial for efficient 3D object or scene creation. However, it remains challenging, particularly for the joint camera pose estimation. Previous approaches have achieved impressive pose-free surface reconstruction results in dense-view settings, but could easily fail for sparse-view scenarios without sufficient visual overlap. In this paper, we…

    Submitted 28 March, 2025; originally announced March 2025.

  21. arXiv:2503.21732  [pdf, other]

    cs.CV

    SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

    Authors: Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, Yangguang Li

    Abstract: Creating high-fidelity 3D meshes with arbitrary topology, including open surfaces and complex interiors, remains a significant challenge. Existing implicit field methods often require costly and detail-degrading watertight conversion, while other approaches struggle with high resolutions. This paper introduces SparseFlex, a novel sparse-structured isosurface representation that enables differentia…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Project page: https://xianglonghe.github.io/TripoSF

  22. arXiv:2503.16426  [pdf, other]

    cs.CV

    DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding

    Authors: Keyan Chen, Chenyang Liu, Bowen Chen, Wenyuan Li, Zhengxia Zou, Zhenwei Shi

    Abstract: The advancement of remote sensing technology has improved the spatial resolution of satellite imagery, facilitating more detailed visual representations for diverse interpretations. However, existing methods exhibit limited generalization capabilities across varied applications. While some contemporary foundation models demonstrate potential, they are hindered by insufficient cross-task adaptabili…

    Submitted 20 March, 2025; originally announced March 2025.

  23. arXiv:2503.13347  [pdf, other]

    cs.CV

    TriDF: Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis

    Authors: Jiaming Kang, Keyan Chen, Zhengxia Zou, Zhenwei Shi

    Abstract: Remote sensing novel view synthesis (NVS) offers significant potential for 3D interpretation of remote sensing scenes, with important applications in urban planning and environmental monitoring. However, remote sensing scenes frequently lack sufficient multi-view images due to acquisition constraints. While existing NVS methods tend to overfit when processing limited input views, advanced few-shot…

    Submitted 17 March, 2025; originally announced March 2025.

  24. arXiv:2503.07120  [pdf, other]

    cs.CV cs.LG

    Exposure Bias Reduction for Enhancing Diffusion Transformer Feature Caching

    Authors: Zhen Zou, Hu Yu, Jie Xiao, Feng Zhao

    Abstract: Diffusion Transformer (DiT) has exhibited impressive generation capabilities but faces great challenges due to its high computational complexity. To address this problem, various methods, notably feature caching, have been introduced. However, these approaches focus on aligning non-cache diffusion without analyzing the impact of caching on the generation of intermediate processes. So the lack of e…

    Submitted 10 March, 2025; originally announced March 2025.

  25. arXiv:2503.06320  [pdf, other]

    cs.LG physics.comp-ph

    Learning and discovering multiple solutions using physics-informed neural networks with random initialization and deep ensemble

    Authors: Zongren Zou, Zhicheng Wang, George Em Karniadakis

    Abstract: We explore the capability of physics-informed neural networks (PINNs) to discover multiple solutions. Many real-world phenomena governed by nonlinear differential equations (DEs), such as fluid flow, exhibit multiple solutions under the same conditions, yet capturing this solution multiplicity remains a significant challenge. A key difficulty is giving appropriate initial conditions or initial gue…

    Submitted 8 March, 2025; originally announced March 2025.

  26. arXiv:2503.05163  [pdf, other]

    cs.PL cs.DB

    Vbox: Efficient Black-Box Serializability Verification

    Authors: Weihua Sun, Zhaonian Zou

    Abstract: Verifying the serializability of transaction histories is essential for users to know if the DBMS ensures the claimed serializable isolation level without potential bugs. Black-box serializability verification is a promising approach. Existing verification methods often have one or more limitations such as incomplete detection of data anomalies, long verification time, high memory usage, or depend…

    Submitted 7 March, 2025; originally announced March 2025.

  27. LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera

    Authors: Weiyi Xiong, Zean Zou, Qiuchi Zhao, Fengchun He, Bing Zhu

    Abstract: As the previous state-of-the-art 4D radar-camera fusion-based 3D object detection method, LXL utilizes the predicted image depth distribution maps and radar 3D occupancy grids to assist the sampling-based image view transformation. However, the depth prediction lacks accuracy and consistency, and the concatenation-based fusion in LXL impedes the model robustness. In this work, we propose LXLv2, wh…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted by IEEE Robotics and Automation Letters

  28. arXiv:2502.09654  [pdf, other]

    eess.IV cs.CV

    Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution

    Authors: Bowen Chen, Keyan Chen, Mohan Yang, Zhengxia Zou, Zhenwei Shi

    Abstract: Remote sensing image super-resolution (SR) aims to reconstruct high-resolution remote sensing images from low-resolution inputs, thereby addressing limitations imposed by sensors and imaging conditions. However, the inherent characteristics of remote sensing images, including diverse ground object types and complex details, pose significant challenges to achieving high-quality reconstruction. Exis…

    Submitted 2 April, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  29. arXiv:2502.07295  [pdf, other]

    cs.LG

    Treatment Effect Estimation for Exponential Family Outcomes using Neural Networks with Targeted Regularization

    Authors: Jiahong Li, Zeqin Yang, Jiayi Dan, Jixing Xu, Zhichao Zou, Peng Zhen, Jiecheng Guo

    Abstract: Neural Networks (NNs) have become a natural choice for treatment effect estimation due to their strong approximation capabilities. Nevertheless, how to design NN-based estimators with desirable properties, such as low bias and double robustness, still remains a significant challenge. A common approach to address this is targeted regularization, which modifies the objective function of NNs. However…

    Submitted 11 February, 2025; originally announced February 2025.

  30. arXiv:2502.06608  [pdf, other]

    cs.CV cs.AI

    TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

    Authors: Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, Yan-Pei Cao

    Abstract: Recent advancements in diffusion techniques have propelled image and video generation to unprecedented levels of quality, significantly accelerating the deployment and application of generative AI. However, 3D shape generation technology has so far lagged behind, constrained by limitations in 3D data scale, complexity of 3D data processing, and insufficient exploration of advanced techniques in th…

    Submitted 27 March, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  31. arXiv:2502.04630  [pdf, other]

    cs.CV cs.GR

    High-Speed Dynamic 3D Imaging with Sensor Fusion Splatting

    Authors: Zihao Zou, Ziyuan Qu, Xi Peng, Vivek Boominathan, Adithya Pediredla, Praneeth Chakravarthula

    Abstract: Capturing and reconstructing high-speed dynamic 3D scenes has numerous applications in computer graphics, vision, and interdisciplinary fields such as robotics, aerodynamics, and evolutionary biology. However, achieving this using a single imaging modality remains challenging. For instance, traditional RGB cameras suffer from low frame rates, limited exposure times, and narrow baselines. To addres…

    Submitted 6 February, 2025; originally announced February 2025.

  32. arXiv:2502.00031  [pdf, ps, other]

    cs.SI cs.DB

    GNN-based Anchor Embedding for Efficient Exact Subgraph Matching

    Authors: Bin Yang, Zhaonian Zou, Jianxiong Ye

    Abstract: Subgraph matching query is a fundamental problem in graph data management and has a variety of real-world applications. Several recent works utilize deep learning (DL) techniques to process subgraph matching queries. Most of them find approximate subgraph matching results without accuracy guarantees. Unlike these DL-based inexact subgraph matching methods, we propose a learning-based exact subgrap…

    Submitted 18 June, 2025; v1 submitted 23 January, 2025; originally announced February 2025.

  33. arXiv:2501.18876  [pdf]

    physics.chem-ph cs.LG

    QMe14S, A Comprehensive and Efficient Spectral Dataset for Small Organic Molecules

    Authors: Mingzhi Yuan, Zihan Zou, Wei Hu

    Abstract: Developing machine learning protocols for molecular simulations requires comprehensive and efficient datasets. Here we introduce the QMe14S dataset, comprising 186,102 small organic molecules featuring 14 elements (H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, Br) and 47 functional groups. Using density functional theory at the B3LYP/TZVP level, we optimized the geometries and calculated properties…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: 11 pages, 4 figures

  34. arXiv:2501.14404  [pdf, other]

    cs.CV

    Kolmogorov Arnold Neural Interpolator for Downscaling and Correcting Meteorological Fields from In-Situ Observations

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Zhengxia Zou, Zhenwei Shi

    Abstract: Obtaining accurate weather forecasts at station locations is a critical challenge due to systematic biases arising from the mismatch between multi-scale, continuous atmospheric characteristics and their discrete, gridded representations. Previous works have primarily focused on modeling gridded meteorological data, inherently neglecting the off-grid, continuous nature of atmospheric states and leav…

    Submitted 24 January, 2025; originally announced January 2025.

  35. arXiv:2501.11238  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    WSSM: Geographic-enhanced hierarchical state-space model for global station weather forecast

    Authors: Songru Yang, Zili Liu, Zhenwei Shi, Zhengxia Zou

    Abstract: Global Station Weather Forecasting (GSWF), a prominent meteorological research area, is pivotal in providing timely localized weather predictions. Despite the progress existing models have made in the overall accuracy of the GSWF, executing high-precision extreme event prediction still presents a substantial challenge. The recent emergence of state-space models, with their ability to efficiently c…

    Submitted 19 January, 2025; originally announced January 2025.

  36. arXiv:2501.06809  [pdf, other]

    cs.CV

    RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models

    Authors: Keyan Chen, Jiafan Zhang, Chenyang Liu, Zhengxia Zou, Zhenwei Shi

    Abstract: Referring remote sensing image segmentation is crucial for achieving fine-grained visual understanding through free-format textual input, enabling enhanced scene and object extraction in remote sensing applications. Current research primarily utilizes pre-trained language models to encode textual descriptions and align them with visual modalities, thereby facilitating the expression of relevant vi…

    Submitted 12 January, 2025; originally announced January 2025.

  37. arXiv:2501.02969  [pdf, other]

    cs.LG

    LOHA: Direct Graph Spectral Contrastive Learning Between Low-pass and High-pass Views

    Authors: Ziyun Zou, Yinghui Jiang, Lian Shen, Juan Liu, Xiangrong Liu

    Abstract: Spectral Graph Neural Networks effectively handle graphs with different homophily levels, with low-pass filters mining feature smoothness and high-pass filters capturing differences. While these distinct filters could naturally form two opposite views for self-supervised learning, the commonalities between the counterparts for the same node remain unexplored, leading to suboptimal performance. In thi…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Accepted at AAAI2025

  38. arXiv:2501.01874  [pdf, other]

    cs.LG

    DFF: Decision-Focused Fine-tuning for Smarter Predict-then-Optimize with Limited Data

    Authors: Jiaqi Yang, Enming Liang, Zicheng Su, Zhichao Zou, Peng Zhen, Jiecheng Guo, Wanjing Ma, Kun An

    Abstract: Decision-focused learning (DFL) offers an end-to-end approach to the predict-then-optimize (PO) framework by training predictive models directly on decision loss (DL), enhancing decision-making performance within PO contexts. However, the implementation of DFL poses distinct challenges. Primarily, DL can result in deviation from the physical significance of the predictions under limited data. Addi…

    Submitted 13 April, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

    Comments: 12 pages, 4 figures, The 39th Annual AAAI Conference on Artificial Intelligence

  39. arXiv:2501.00895  [pdf, other]

    cs.CV

    Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model

    Authors: Chenyang Liu, Keyan Chen, Rui Zhao, Zhengxia Zou, Zhenwei Shi

    Abstract: Generative foundation models have advanced large-scale text-driven natural image generation, becoming a prominent research trend across various vertical domains. However, in the remote sensing field, there is still a lack of research on large-scale text-to-image (text2image) generation technology. Existing remote sensing image-text datasets are small in scale and confined to specific geographic ar…

    Submitted 20 March, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

  40. arXiv:2412.17635  [pdf, other]

    cs.CV

    LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding

    Authors: Hao Li, Roy Qin, Zhengyu Zou, Diqi He, Bohan Li, Bingquan Dai, Dingewn Zhang, Junwei Han

    Abstract: Applying Gaussian Splatting to perception tasks for 3D scene understanding is becoming increasingly popular. Most existing works primarily focus on rendering 2D feature maps from novel viewpoints, which leads to an imprecise 3D language field with outlier languages, ultimately failing to align objects in 3D space. By utilizing masked images for feature extraction, these approaches also lack essent…

    Submitted 23 December, 2024; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: https://langsurf.github.io

  41. arXiv:2412.06191  [pdf, other]

    cs.CV

    Event fields: Capturing light fields at high speed, resolution, and dynamic range

    Authors: Ziyuan Qu, Zihao Zou, Vivek Boominathan, Praneeth Chakravarthula, Adithya Pediredla

    Abstract: Event cameras, which feature pixels that independently respond to changes in brightness, are becoming increasingly popular in high-speed applications due to their lower latency, reduced bandwidth requirements, and enhanced dynamic range compared to traditional frame-based cameras. Numerous imaging and vision techniques have leveraged event cameras for high-speed scene understanding by capturing hi…

    Submitted 8 December, 2024; originally announced December 2024.

  42. arXiv:2412.05969  [pdf, other]

    cs.CV

    Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation

    Authors: Zipeng Qi, Hao Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi

    Abstract: In this paper, we propose a novel semantic splatting approach based on Gaussian Splatting to achieve efficient and low-latency. Our method projects the RGB attributes and semantic features of point clouds onto the image plane, simultaneously rendering RGB images and semantic segmentation results. Leveraging the explicit structure of point clouds and a one-time rendering strategy, our approach sign…

    Submitted 12 December, 2024; v1 submitted 8 December, 2024; originally announced December 2024.

  43. arXiv:2412.03558  [pdf, other]

    cs.CV

    MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

    Authors: Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao, Lu Sheng

    Abstract: This paper introduces MIDI, a novel paradigm for compositional 3D scene generation from a single image. Unlike existing methods that rely on reconstruction or retrieval techniques or recent approaches that employ multi-stage object-by-object generation, MIDI extends pre-trained image-to-3D object generation models to multi-instance diffusion models, enabling the simultaneous generation of multiple…

    Submitted 27 May, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

    Comments: Project page: https://huanngzh.github.io/MIDI-Page/

  44. arXiv:2412.02573  [pdf, other]

    cs.CV

    Remote Sensing Spatio-Temporal Vision-Language Models: A Comprehensive Survey

    Authors: Chenyang Liu, Jiafan Zhang, Keyan Chen, Man Wang, Zhengxia Zou, Zhenwei Shi

    Abstract: The interpretation of multi-temporal remote sensing imagery is critical for monitoring Earth's dynamic processes-yet previous change detection methods, which produce binary or semantic masks, fall short of providing human-readable insights into changes. Recent advances in Vision-Language Models (VLMs) have opened a new frontier by fusing visual and linguistic modalities, enabling spatio-temporal v…

    Submitted 22 May, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  45. arXiv:2412.02447  [pdf, other]

    cs.CV

    Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations

    Authors: Conghao Wong, Ziqian Zou, Beihao Xia, Xinge You

    Abstract: Learning to forecast trajectories of intelligent agents has caught much more attention recently. However, it remains a challenge to accurately account for agents' intentions and social behaviors when forecasting, and in particular, to simulate the unique randomness within each of those components in an explainable and decoupled way. Inspired by vibration systems and their resonance properties, we…

    Submitted 9 March, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  46. arXiv:2412.02395  [pdf, other]

    cs.CV

    Who Walks With You Matters: Perceiving Social Interactions with Groups for Pedestrian Trajectory Prediction

    Authors: Ziqian Zou, Conghao Wong, Beihao Xia, Qinmu Peng, Xinge You

    Abstract: Understanding and anticipating human movement has become more critical and challenging in diverse applications such as autonomous driving and surveillance. The complex interactions brought by different relations between agents are a crucial reason that poses challenges to this task. Researchers have put much effort into designing a system using rule-based or data-based models to extract and valida…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: 15 pages, 10 figures, submitted to CVPR 2025

  47. arXiv:2411.16820  [pdf, other]

    cs.CV cs.GR

    DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow

    Authors: Ken Deng, Yuan-Chen Guo, Jingxiang Sun, Zi-Xin Zou, Yangguang Li, Xin Cai, Yan-Pei Cao, Yebin Liu, Ding Liang

    Abstract: Modern 3D generation methods can rapidly create shapes from sparse or single views, but their outputs often lack geometric detail due to computational constraints. We present DetailGen3D, a generative approach specifically designed to enhance these generated 3D shapes. Our key insight is to model the coarse-to-fine transformation directly through data-dependent flows in latent space, avoiding the…

    Submitted 1 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

  48. TopoTxR: A topology-guided deep convolutional network for breast parenchyma learning on DCE-MRIs

    Authors: Fan Wang, Zhilin Zou, Nicole Sakla, Luke Partyka, Nil Rawal, Gagandeep Singh, Wei Zhao, Haibin Ling, Chuan Huang, Prateek Prasanna, Chao Chen

    Abstract: Characterization of breast parenchyma in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a challenging task owing to the complexity of underlying tissue structures. Existing quantitative approaches, like radiomics and deep learning models, lack explicit quantification of intricate and subtle parenchymal structures, including fibroglandular tissue. To address this, we propose a no…

    Submitted 5 November, 2024; originally announced November 2024.

    Comments: 22 pages, 8 figures, 8 tables, accepted by Medical Image Analysis (https://www.sciencedirect.com/science/article/abs/pii/S1361841524002986)

    Journal ref: Volume 99, 2025, 103373

  49. arXiv:2410.23225  [pdf, ps, other]

    cs.DS cs.DM

    Deterministic counting from coupling independence

    Authors: Xiaoyu Chen, Weiming Feng, Heng Guo, Xinyuan Zhang, Zongrui Zou

    Abstract: We show that spin systems with bounded degrees and coupling independence admit fully polynomial time approximation schemes (FPTAS). We design a new recursive deterministic counting algorithm to achieve this. As applications, we give the first FPTASes for $q$-colourings on graphs of bounded maximum degree $\Delta\ge 3$, when $q\ge (11/6-\varepsilon_0)\Delta$ for some small $\varepsilon_0\approx 10^{-5}$, or…

    Submitted 5 April, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

  50. arXiv:2410.22394  [pdf, other]

    cs.CL

    AAAR-1.0: Assessing AI's Potential to Assist Research

    Authors: Renze Lou, Hanzi Xu, Sijia Wang, Jiangshu Du, Ryo Kamoi, Xiaoxin Lu, Jian Xie, Yuxuan Sun, Yusen Zhang, Jihyun Janice Ahn, Hongchao Fang, Zhuoyang Zou, Wenchao Ma, Xi Li, Kai Zhang, Congying Xia, Lifu Huang, Wenpeng Yin

    Abstract: Numerous studies have assessed the proficiency of AI systems, particularly large language models (LLMs), in facilitating everyday tasks such as email writing, question answering, and creative content generation. However, researchers face unique challenges and opportunities in leveraging LLMs for their own work, such as brainstorming research ideas, designing experiments, and writing or reviewing p…

    Submitted 25 May, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

    Comments: ICML 2025. Project Webpage: https://renzelou.github.io/AAAR-1.0/