Showing 1–50 of 155 results for author: Zou, D

Searching in archive cs.
  1. arXiv:2506.22518  [pdf, ps, other]

    cs.CL cs.AI

    Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation

    Authors: Deyu Zou, Yongqiang Chen, Mufei Li, Siqi Miao, Chenxi Liu, Bo Han, James Cheng, Pan Li

    Abstract: Graph-based retrieval-augmented generation (RAG) enables large language models (LLMs) to ground responses with structured external knowledge from up-to-date knowledge graphs (KGs) and reduce hallucinations. However, LLMs often rely on a weak retriever in graph-based RAG: I) Due to the lack of ground truth, the retriever is often trained on weak supervision, which often introduces spurious signals…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2506.18656  [pdf, ps, other]

    stat.ML cs.LG math.ST

    A Random Matrix Analysis of In-context Memorization for Nonlinear Attention

    Authors: Zhenyu Liao, Jiaqing Liu, TianQi Hou, Difan Zou, Zenan Ling

    Abstract: Attention mechanisms have revolutionized machine learning (ML) by enabling efficient modeling of global dependencies across inputs. Their inherently parallelizable structures allow for efficient scaling with the exponentially increasing size of both pretrained data and model parameters. Yet, despite their central role as the computational backbone of modern large language models (LLMs), the theore…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 40 pages, 7 pages

  3. arXiv:2506.17740  [pdf, ps, other]

    eess.SP cs.LG

    Rethinking the Role of Operating Conditions for Learning-based Multi-condition Fault Diagnosis

    Authors: Pengyu Han, Zeyi Liu, Shijin Chen, Dongliang Zou, Xiao He

    Abstract: Multi-condition fault diagnosis is prevalent in industrial systems and presents substantial challenges for conventional diagnostic approaches. The discrepancy in data distributions across different operating conditions degrades model performance when a model trained under one condition is applied to others. With the recent advancements in deep learning, transfer learning has been introduced to the…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 6 pages, 6 figures, conference

  4. arXiv:2506.11697  [pdf, ps, other]

    cs.SE

    SoK: Automated Vulnerability Repair: Methods, Tools, and Assessments

    Authors: Yiwei Hu, Zhen Li, Kedie Shu, Shenghua Guan, Deqing Zou, Shouhuai Xu, Bin Yuan, Hai Jin

    Abstract: The increasing complexity of software has led to the steady growth of vulnerabilities. Vulnerability repair investigates how to fix software vulnerabilities. Manual vulnerability repair is labor-intensive and time-consuming because it relies on human experts, highlighting the importance of Automated Vulnerability Repair (AVR). In this SoK, we present the systematization of AVR methods through the…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: The full version of "SoK: Automated Vulnerability Repair: Methods, Tools, and Assessments" accepted by the 34th USENIX Security Symposium (USENIX Security 2025)

  5. arXiv:2506.04695  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models

    Authors: Xingwu Chen, Tianle Li, Difan Zou

    Abstract: Reinforcement learning (RL) has demonstrated remarkable success in enhancing model capabilities, including instruction-following, preference learning, and reasoning. Yet despite its empirical successes, the mechanisms by which RL improves reasoning abilities remain poorly understood. We present a systematic study of Reinforcement Learning with Verifiable Rewards (RLVR), showing that its primary be…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 30 pages, 6 figures, 1 table

  6. arXiv:2505.24428  [pdf, other]

    cs.CL cs.LG

    Model Unlearning via Sparse Autoencoder Subspace Guided Projections

    Authors: Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou

    Abstract: Large language models (LLMs) store vast amounts of information, making them powerful yet raising privacy and safety concerns when selective knowledge removal is required. Existing unlearning strategies, ranging from gradient-based fine-tuning and model editing to sparse autoencoder (SAE) steering, either lack interpretability or fail to provide a robust defense against adversarial prompts. We prop…

    Submitted 30 May, 2025; originally announced May 2025.

  7. arXiv:2505.22391  [pdf, ps, other]

    cs.LG cs.AI cs.CE math.NA

    Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation

    Authors: Yi Zhang, Difan Zou

    Abstract: Modeling physical systems in a generative manner offers several advantages, including the ability to handle partial observations, generate diverse solutions, and address both forward and inverse problems. Recently, diffusion models have gained increasing attention in the modeling of physical systems, particularly those governed by partial differential equations (PDEs). However, diffusion models on…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 23 pages, 5 figures, 4 tables

  8. arXiv:2505.21892  [pdf, ps, other]

    stat.ML cs.LG

    Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion

    Authors: Xunpeng Huang, Yingyu Lin, Nikki Lijing Kuang, Hanze Dong, Difan Zou, Yian Ma, Tong Zhang

    Abstract: Continuous diffusion models have demonstrated remarkable performance in data generation across various domains, yet their efficiency remains constrained by two critical limitations: (1) the local adjacency structure of the forward Markov process, which restricts long-range transitions in the data space, and (2) inherent biases introduced during the simulation of time-inhomogeneous reverse denoisin…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 37 pages, 3 figures, 3 tables

  9. arXiv:2504.21314  [pdf, other]

    cs.LG stat.ML

    Capturing Conditional Dependence via Auto-regressive Diffusion Models

    Authors: Xunpeng Huang, Yujin Han, Difan Zou, Yian Ma, Tong Zhang

    Abstract: Diffusion models have demonstrated appealing performance in both image and video generation. However, many works discover that they struggle to capture important, high-level relationships that are present in the real world. For example, they fail to learn physical laws from data, and even fail to understand that the objects in the world exist in a stable fashion. This is due to the fact that impor…

    Submitted 30 April, 2025; originally announced April 2025.

  10. arXiv:2504.09261  [pdf, other]

    cs.CV

    Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling

    Authors: Ziran Qin, Youru Lv, Mingbao Lin, Zeren Zhang, Danping Zou, Weiyao Lin

    Abstract: Visual Autoregressive (VAR) models have emerged as a powerful approach for multi-modal content creation, offering high efficiency and quality across diverse multimedia applications. However, they face significant memory bottlenecks due to extensive KV cache accumulation during inference. Existing KV cache compression techniques for large language models are suboptimal for VAR models due to, as we…

    Submitted 12 April, 2025; originally announced April 2025.

  11. arXiv:2504.08628  [pdf, other]

    stat.ML cs.LG

    Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks

    Authors: Chenyang Zhang, Peifeng Gao, Difan Zou, Yuan Cao

    Abstract: Modern neural networks are usually highly over-parameterized. Behind the wide usage of over-parameterized networks is the belief that, if the data are simple, then the trained network will be automatically equivalent to a simple predictor. Following this intuition, many existing works have studied different notions of "ranks" of neural networks and their relation to the rank of data. In this work,…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 43 pages, 4 figures

  12. arXiv:2504.07418  [pdf, other]

    cs.CV

    ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement

    Authors: Anning Hu, Ang Li, Xirui Jin, Danping Zou

    Abstract: We introduce ThermoStereoRT, a real-time thermal stereo matching method designed for all-weather conditions that recovers disparity from two rectified thermal stereo images, envisioning applications such as night-time drone surveillance or under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scal…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 7 pages, 6 figures, 4 tables. Accepted to IEEE ICRA 2025. This is the preprint version

    Journal ref: IEEE International Conference on Robotics and Automation (ICRA), 2025

  13. arXiv:2504.01873  [pdf, other]

    cs.CV

    A Diffusion-Based Framework for Occluded Object Movement

    Authors: Zheng-Peng Duan, Jiawei Zhang, Siyu Liu, Zheng Lin, Chun-Le Guo, Dongqing Zou, Jimmy Ren, Chongyi Li

    Abstract: Seamlessly moving objects within a scene is a common requirement for image editing, but it is still a challenge for existing editing methods. Especially for real-world images, the occlusion situation further increases the difficulty. The main difficulty is that the occluded portion needs to be completed before movement can proceed. To leverage the real-world knowledge embedded in the pre-trained d…

    Submitted 2 April, 2025; originally announced April 2025.

  14. arXiv:2503.23580  [pdf, ps, other]

    cs.CV

    DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution

    Authors: Zheng-Peng Duan, Jiawei Zhang, Xin Jin, Ziheng Zhang, Zheng Xiong, Dongqing Zou, Jimmy S. Ren, Chun-Le Guo, Chongyi Li

    Abstract: Large-scale pre-trained diffusion models are becoming increasingly popular in solving the Real-World Image Super-Resolution (Real-ISR) problem because of their rich generative priors. The recent development of diffusion transformer (DiT) has witnessed overwhelming performance over the traditional UNet-based architecture in image generation, which also raises the question: Can we adopt the advanced…

    Submitted 6 July, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  15. arXiv:2503.10141  [pdf, other]

    cs.RO

    Mapless Collision-Free Flight via MPC using Dual KD-Trees in Cluttered Environments

    Authors: Linzuo Zhang, Yu Hu, Yang Deng, Feng Yu, Danping Zou

    Abstract: Collision-free flight in cluttered environments is a critical capability for autonomous quadrotors. Traditional methods often rely on detailed 3D map construction, trajectory generation, and tracking. However, this cascade pipeline can introduce accumulated errors and computational delays, limiting flight agility and safety. In this paper, we propose a novel method for enabling collision-free flig…

    Submitted 20 May, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  16. arXiv:2503.07065  [pdf, other]

    cs.CV

    Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning

    Authors: Huilin Deng, Ding Zou, Rui Ma, Hongchen Luo, Yang Cao, Yu Kang

    Abstract: While state-of-the-art vision-language models (VLMs) have demonstrated remarkable capabilities in complex visual-text tasks, their success heavily relies on massive model scaling, limiting their practical deployment. Small-scale VLMs offer a more practical alternative but face significant challenges when trained with traditional supervised fine-tuning (SFT), particularly in two aspects: out-of-dom…

    Submitted 10 March, 2025; originally announced March 2025.

  17. arXiv:2503.06136  [pdf, other]

    cs.CV cs.AI

    GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation

    Authors: Ye Tao, Jiawei Zhang, Yahao Shi, Dongqing Zou, Bin Zhou

    Abstract: Image-based 3D generation has vast applications in robotics and gaming, where high-quality, diverse outputs and consistent 3D representations are crucial. However, existing methods have limitations: 3D diffusion models are limited by dataset scarcity and the absence of strong pre-trained priors, while 2D diffusion-based approaches struggle with geometric consistency. We propose a method that lever…

    Submitted 8 March, 2025; originally announced March 2025.

  18. arXiv:2503.01152  [pdf, other]

    cs.LG cs.AI

    STGAN: Spatial-temporal Graph Autoregression Network for Pavement Distress Deterioration Prediction

    Authors: Shilin Tong, Difei Wu, Xiaona Liu, Le Zheng, Yuchuan Du, Difan Zou

    Abstract: Pavement distress significantly compromises road integrity and poses risks to drivers. Accurate prediction of pavement distress deterioration is essential for effective road management, cost reduction in maintenance, and improvement of traffic safety. However, real-world data on pavement distress is usually collected irregularly, resulting in uneven, asynchronous, and sparse spatial-temporal datas…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 16 pages, 16 figures, 4 tables, accepted by IEEE Transactions on Intelligent Transportation Systems (TITS)

  19. arXiv:2502.15609  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    On the Robustness of Transformers against Context Hijacking for Linear Classification

    Authors: Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou

    Abstract: Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a significant robustness issue. To understand this phenomenon theoretically, we explore an in-context linear classification problem based on recent advances in linear…

    Submitted 21 February, 2025; originally announced February 2025.

  20. arXiv:2502.11890  [pdf, other]

    cs.CL

    Revisiting Classification Taxonomy for Grammatical Errors

    Authors: Deqing Zou, Jingheng Ye, Yulu Liu, Yu Wu, Zishan Xu, Yinghui Li, Hai-Tao Zheng, Bingxu An, Zhao Wei, Yong Xu

    Abstract: Grammatical error classification plays a crucial role in language learning systems, but existing classification taxonomies often lack rigorous validation, leading to inconsistencies and unreliable feedback. In this paper, we revisit previous classification taxonomies for grammatical errors by introducing a systematic and qualitative evaluation framework. Our approach examines four aspects of a tax…

    Submitted 17 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 26 pages, 4 figures and 5 tables

  21. arXiv:2502.11812  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis

    Authors: Xu Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou

    Abstract: Fine-tuning significantly improves the performance of Large Language Models (LLMs), yet its underlying mechanisms remain poorly understood. This paper aims to provide an in-depth interpretation of the fine-tuning process through circuit analysis, a popular tool in Mechanistic Interpretability (MI). Unlike previous studies (Prakash et al. 2024; Chhabra et al. 2024) that focus on tasks where pre-tra…

    Submitted 13 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 25 pages

  22. arXiv:2502.11646  [pdf, ps, other]

    cs.LG

    Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization

    Authors: Yunzhe Hu, Difan Zou, Dong Xu

    Abstract: Transformer-based models have achieved remarkable success, but their core components, Transformer layers, are largely heuristics-driven and engineered from the bottom up, calling for a prototypical model with high interpretability and practical competence. To this end, we conceptualize a principled, top-down approach grounded in energy-based interpretation. Specifically, we formalize token dynamic…

    Submitted 30 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 31 pages

  23. arXiv:2502.05467  [pdf, other]

    cs.CL cs.AI

    Position: LLMs Can be Good Tutors in Foreign Language Education

    Authors: Jingheng Ye, Shen Wang, Deqing Zou, Yibo Yan, Kun Wang, Hai-Tao Zheng, Zenglin Xu, Irwin King, Philip S. Yu, Qingsong Wen

    Abstract: While recent efforts have begun integrating large language models (LLMs) into foreign language education (FLE), they often rely on traditional approaches to learning tasks without fully embracing educational methodologies, thus lacking adaptability to language learning. To address this gap, we argue that LLMs have the potential to serve as effective tutors in FLE. Specifically, LLMs can play three…

    Submitted 8 February, 2025; originally announced February 2025.

    Comments: 18 pages, 4 figures

  24. arXiv:2502.04725  [pdf, other]

    cs.CV cs.AI

    Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?

    Authors: Yujin Han, Andi Han, Wei Huang, Chaochao Lu, Difan Zou

    Abstract: Despite the remarkable success of diffusion models (DMs) in data generation, they exhibit specific failure cases with unsatisfactory outputs. We focus on one such limitation: the ability of DMs to learn hidden rules between image features. Specifically, for image data with dependent features ($\mathbf{x}$) and ($\mathbf{y}$) (e.g., the height of the sun ($\mathbf{x}$) and the length of the shadow…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 25 pages, 18 figures, 3 tables

  25. arXiv:2502.03810  [pdf, other]

    cs.CV

    DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

    Authors: Lingshun Kong, Jiawei Zhang, Dongqing Zou, Jimmy Ren, Xiaohe Wu, Jiangxin Dong, Jinshan Pan

    Abstract: Diffusion models have achieved significant progress in image generation. The pre-trained Stable Diffusion (SD) models are helpful for image deblurring by providing clear image priors. However, directly using a blurry image or pre-deblurred one as a conditional control for SD will either hinder accurate structure extraction or make the results overly dependent on the deblurring network. In this wor…

    Submitted 6 February, 2025; originally announced February 2025.

  26. arXiv:2502.03587  [pdf, other]

    cs.LG stat.ML

    Stein Discrepancy for Unsupervised Domain Adaptation

    Authors: Anneke von Seeger, Dongmian Zou, Gilad Lerman

    Abstract: Unsupervised domain adaptation (UDA) leverages information from a labeled source dataset to improve accuracy on a related but unlabeled target dataset. A common approach to UDA is aligning representations from the source and target domains by minimizing the distance between their data distributions. Previous methods have employed distances such as Wasserstein distance and maximum mean discrepancy.…

    Submitted 21 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 24 pages, 9 figures

  27. arXiv:2502.03444  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Masked Autoencoders Are Effective Tokenizers for Diffusion Models

    Authors: Hao Chen, Yujin Han, Fangyi Chen, Xiang Li, Yidong Wang, Jindong Wang, Ze Wang, Zicheng Liu, Difan Zou, Bhiksha Raj

    Abstract: Recent advances in latent diffusion models have demonstrated their effectiveness for high-resolution image synthesis. However, the properties of the latent space from tokenizer for better learning and generation of diffusion models remain under-explored. Theoretically and empirically, we find that improved generation quality is closely tied to the latent distributions with better structure, such a…

    Submitted 30 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  28. arXiv:2501.14513  [pdf, other]

    cs.RO cs.AI cs.LG

    ABPT: Amended Backpropagation through Time with Partially Differentiable Rewards

    Authors: Fanxing Li, Fangyu Sun, Tianbao Zhang, Danping Zou

    Abstract: Quadrotor control policies can be trained with high performance using the exact gradients of the rewards to directly optimize policy parameters via backpropagation-through-time (BPTT). However, designing a fully differentiable reward architecture is often challenging. Partially differentiable rewards will result in biased gradient propagation that degrades training performance. To overcome this li…

    Submitted 21 May, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

  29. arXiv:2501.09876  [pdf, other]

    math.NA cs.LG

    Geometry-Preserving Encoder/Decoder in Latent Generative Models

    Authors: Wonjun Lee, Riley C. W. O'Neill, Dongmian Zou, Jeff Calder, Gilad Lerman

    Abstract: Generative modeling aims to generate new data samples that resemble a given dataset, with diffusion models recently becoming the most popular generative model. One of the main challenges of diffusion models is solving the problem in the input space, which tends to be very high-dimensional. Recently, solving diffusion models in the latent space through an encoder that maps from the data space to a…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 41 pages

  30. arXiv:2501.05040  [pdf, other]

    cs.CL

    SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution

    Authors: Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen

    Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. One significant application of LLMs is in tackling software engineering challenges, particularly in resolving real-world tasks on GitHub by fixing code based on the issues reported by the users. However, many current approaches rely on proprietary LLMs, which limits reproducibility, accessibili…

    Submitted 7 May, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

    Comments: Our code, data, and model will be released at https://github.com/InternLM/SWE-Fixer

  31. arXiv:2412.15119  [pdf, other]

    cs.CV

    Parallelized Autoregressive Visual Generation

    Authors: Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu

    Abstract: Autoregressive models have emerged as a powerful approach for visual generation but suffer from slow inference speed due to their sequential token-by-token prediction process. In this paper, we propose a simple yet effective approach for parallelized autoregressive visual generation that improves generation efficiency while preserving the advantages of autoregressive modeling. Our key insight is t…

    Submitted 2 April, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

    Comments: CVPR 2025 Accepted - Project Page: https://yuqingwang1029.github.io/PAR-project

  32. arXiv:2412.02960  [pdf, other]

    cs.CV

    Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution

    Authors: Jiahua Xiao, Jiawei Zhang, Dongqing Zou, Xiaodan Zhang, Jimmy Ren, Xing Wei

    Abstract: Real-world image super-resolution (Real-ISR) has achieved a remarkable leap by leveraging large-scale text-to-image models, enabling realistic image restoration from given recognition textual prompts. However, these methods sometimes fail to recognize some salient objects, resulting in inaccurate semantic restoration in these regions. Additionally, the same region may have a strong response to mor…

    Submitted 3 December, 2024; originally announced December 2024.

  33. arXiv:2412.01021  [pdf, other]

    stat.ML cs.LG

    On the Feature Learning in Diffusion Models

    Authors: Andi Han, Wei Huang, Yuan Cao, Difan Zou

    Abstract: The predominant success of diffusion models in generative modeling has spurred significant interest in understanding their theoretical foundations. In this work, we propose a feature learning framework aimed at analyzing and comparing the training dynamics of diffusion models with those of traditional classification models. Our theoretical analysis demonstrates that diffusion models, due to the de…

    Submitted 2 March, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

  34. arXiv:2411.19456  [pdf, other]

    cs.CL cs.AI

    Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability

    Authors: Yujin Han, Lei Xu, Sirui Chen, Difan Zou, Chaochao Lu

    Abstract: Large language models (LLMs) have shown remarkable capability in natural language tasks, yet debate persists on whether they truly comprehend deep structure (i.e., core semantics) or merely rely on surface structure (e.g., presentation format). Prior studies observe that LLMs' performance declines when intervening on surface structure, arguing their success relies on surface structure recognition.…

    Submitted 28 November, 2024; originally announced November 2024.

    Comments: 28 pages, 14 figures, 10 tables

  35. arXiv:2411.17182  [pdf, other]

    cs.LG

    An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models

    Authors: Yunzhe Hu, Difan Zou, Dong Xu

    Abstract: Deep neural networks have long been criticized for being black-box. To unveil the inner workings of modern neural architectures, a recent work \cite{yu2024white} proposed an information-theoretic objective function called Sparse Rate Reduction (SRR) and interpreted its unrolled optimization as a Transformer-like model called Coding Rate Reduction Transformer (CRATE). However, the focus of the stud…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  36. arXiv:2411.04413  [pdf, other]

    cs.RO

    Seeing Through Pixel Motion: Learning Obstacle Avoidance from Optical Flow with One Camera

    Authors: Yu Hu, Yuang Zhang, Yunlong Song, Yang Deng, Feng Yu, Linzuo Zhang, Weiyao Lin, Danping Zou, Wenxian Yu

    Abstract: Optical flow captures the motion of pixels in an image sequence over time, providing information about movement, depth, and environmental structure. Flying insects utilize this information to navigate and avoid obstacles, allowing them to execute highly agile maneuvers even in complex environments. Despite its potential, autonomous flying robots have yet to fully leverage this motion information t…

    Submitted 19 April, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

  37. arXiv:2410.21676  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    How Does Critical Batch Size Scale in Pre-training?

    Authors: Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham Kakade

    Abstract: Training large-scale models under given resources requires careful design of parallelism strategies. In particular, the efficiency notion of critical batch size (CBS), concerning the compromise between time and compute, marks the threshold beyond which greater data parallelism leads to diminishing returns. To operationalize it, we propose a measure of CBS and pre-train a series of auto-regressive…

    Submitted 21 April, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

    Comments: ICLR 2025, Blog post: https://kempnerinstitute.harvard.edu/research/deeper-learning/how-does-critical-batch-size-scale-in-pre-training-decoupling-data-and-model-size

  38. arXiv:2410.19933  [pdf, other]

    cs.LG cs.AI cs.CY

    Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization

    Authors: Xiyue Peng, Hengquan Guo, Jiawei Zhang, Dongqing Zou, Ziyu Shao, Honghao Wei, Xin Liu

    Abstract: Balancing helpfulness and safety (harmlessness) is a critical challenge in aligning large language models (LLMs). Current approaches often decouple these two objectives, training separate preference models for helpfulness and safety, while framing safety as a constraint within a constrained Markov Decision Process (CMDP) framework. This paper identifies a potential issue when using the widely adop…

    Submitted 27 February, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

  39. arXiv:2410.19139  [pdf, other]

    cs.LG stat.ML

    Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers

    Authors: Shuning Shang, Xuran Meng, Yuan Cao, Difan Zou

    Abstract: Benign overfitting refers to how over-parameterized neural networks can fit training data perfectly and generalize well to unseen data. While this has been widely investigated theoretically, existing works are limited to two-layer networks with fixed output layers, where only the hidden weights are trained. We extend the analysis to two-layer ReLU convolutional neural networks (CNNs) with fully tr…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: 80 pages, 3 figures, 1 table

  40. arXiv:2410.16813  [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Klein Model for Hyperbolic Neural Networks

    Authors: Yidan Mao, Jing Gu, Marcus C. Werner, Dongmian Zou

    Abstract: Hyperbolic neural networks (HNNs) have been proved effective in modeling complex data structures. However, previous works mainly focused on the Poincaré ball model and the hyperboloid model as coordinate representations of the hyperbolic space, often neglecting the Klein model. Despite this, the Klein model offers its distinct advantages thanks to its straight-line geodesics, which facilitates the…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024 Symmetry and Geometry in Neural Representations Workshop

  41. arXiv:2410.02467  [pdf, other]

    cs.LG cs.CR cs.CV

    Extracting Training Data from Unconditional Diffusion Models

    Authors: Yunhao Chen, Shujie Wang, Difan Zou, Xingjun Ma

    Abstract: As diffusion probabilistic models (DPMs) are being employed as mainstream models for Generative Artificial Intelligence (GenAI), the study of their memorization has attracted growing attention. Existing works in this field aim to establish an understanding of whether or to what extent DPMs learn via memorization. Such an understanding is crucial for identifying potential risks of data leakage and…

    Submitted 10 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

  42. arXiv:2408.04532  [pdf, other]

    cs.LG

    How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

    Authors: Xingwu Chen, Lei Zhao, Difan Zou

    Abstract: Despite the remarkable success of transformer-based models in various real-world tasks, their underlying mechanisms remain poorly understood. Recent studies have suggested that transformers can implement gradient descent as an in-context learner for linear regression problems and have developed various theoretical analyses accordingly. However, these works mostly focus on the expressive power of t…

    Submitted 8 August, 2024; originally announced August 2024.

  43. arXiv:2408.04138  [pdf, other]

    cs.CL cs.AI

    Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering

    Authors: Haoran Yu, Chang Yu, Zihan Wang, Dongxian Zou, Hao Qin

    Abstract: In recent years, the application of Large Language Models (LLMs) in healthcare has shown significant promise in improving the accessibility and dissemination of medical knowledge. This paper presents a detailed study of various LLMs trained on the MedQuAD medical question-answering dataset, with a focus on identifying the most effective model for providing accurate medical information. Among the m…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: received by IEEE ICPICS

  44. arXiv:2407.14783  [pdf, other]

    cs.RO

    VisFly: An Efficient and Versatile Simulator for Training Vision-based Flight

    Authors: Fanxing Li, Fangyu Sun, Tianbao Zhang, Danping Zou

    Abstract: We present VisFly, a quadrotor simulator designed to efficiently train vision-based flight policies using reinforcement learning algorithms. VisFly offers a user-friendly framework and interfaces, leveraging Habitat-Sim's rendering engines to achieve frame rates exceeding 10,000 frames per second for rendering motion and sensor data. The simulator incorporates differentiable physics and is seamles…

    Submitted 9 September, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

  45. Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

    Authors: Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, Weiyao Lin

    Abstract: Swarm navigation in cluttered environments is a grand challenge in robotics. This work combines deep learning with first-principle physics through differentiable simulation to enable autonomous navigation of multiple aerial robots through complex environments at high speed. Our approach optimizes a neural network control policy directly by backpropagating loss gradients through the robot simulatio…

    Submitted 15 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Journal ref: Nature Machine Intelligence 2025

  46. arXiv:2407.10495  [pdf, other]

    cs.LG cs.CV

    Improving Hyperbolic Representations via Gromov-Wasserstein Regularization

    Authors: Yifei Yang, Wonjun Lee, Dongmian Zou, Gilad Lerman

    Abstract: Hyperbolic representations have shown remarkable efficacy in modeling inherent hierarchies and complexities within data structures. Hyperbolic neural networks have been commonly applied for learning such representations from data, but they often fall short in preserving the geometric structures of the original feature spaces. In response to this challenge, our work applies the Gromov-Wasserstein (…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  47. arXiv:2407.03757  [pdf, other]

    cs.CV

    DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts

    Authors: Zheng-Peng Duan, Jiawei Zhang, Zheng Lin, Xin Jin, Dongqing Zou, Chunle Guo, Chongyi Li

    Abstract: Image retouching aims to enhance the visual quality of photos. Considering the different aesthetic preferences of users, the target of retouching is subjective. However, current retouching methods mostly adopt deterministic models, which not only neglects the style diversity in the expert-retouched results and tends to learn an average style during training, but also lacks sample diversity during…

    Submitted 4 July, 2024; originally announced July 2024.

  48. arXiv:2406.12752   

    cs.CR cs.CV cs.LG

    Extracting Training Data from Unconditional Diffusion Models

    Authors: Yunhao Chen, Xingjun Ma, Difan Zou, Yu-Gang Jiang

    Abstract: As diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI), the study of their memorization of the raw training data has attracted growing attention. Existing works in this direction aim to establish an understanding of whether or to what extent DPMs learn by memorization. Such an understanding is crucial for identifying potential r…

    Submitted 14 October, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: This is an old version of arXiv:2410.02467. Please refer to the new version

  49. arXiv:2406.10650  [pdf, other]

    stat.ML cs.LG

    The Implicit Bias of Adam on Separable Data

    Authors: Chenyang Zhang, Difan Zou, Yuan Cao

    Abstract: Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maxim…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  50. arXiv:2406.02721  [pdf, other]

    cs.CL cs.AI

    Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

    Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Dan Zhang, Difan Zou, Yisong Yue, Ziniu Hu

    Abstract: We propose SelfControl, an inference-time model control method utilizing gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a desired behavior expressed in a natural language suffix string concatenated to the input prompt, SelfControl computes gradients of the LLM's self-evaluation of the suffix with respect to its latent representations. Th…

    Submitted 12 October, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Website: https://llm-self-control.github.io/