Showing 1–50 of 96 results for author: Zha, A

Searching in archive cs.
  1. arXiv:2506.06291  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    Improvement of Optimization using Learning Based Models in Mixed Integer Linear Programming Tasks

    Authors: Xiaoke Wang, Batuhan Altundas, Zhaoxin Li, Aaron Zhao, Matthew Gombolay

    Abstract: Mixed Integer Linear Programs (MILPs) are essential tools for solving planning and scheduling problems across critical industries such as construction, manufacturing, and logistics. However, their widespread adoption is limited by long computational times, especially in large-scale, real-time scenarios. To address this, we present a learning-based framework that leverages Behavior Cloning (BC) and…

    Submitted 16 May, 2025; originally announced June 2025.

    Comments: 4 pages, 4 figures

  2. arXiv:2506.04179  [pdf, other]

    cs.CL

    SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling

    Authors: Anhao Zhao, Fanghua Ye, Yingqi Fan, Junlong Tong, Zhiwei Fei, Hui Su, Xiaoyu Shen

    Abstract: Large language models (LLMs) achieve remarkable performance across tasks but incur substantial computational costs due to their deep, multi-layered architectures. Layer pruning has emerged as a strategy to alleviate these inefficiencies, but conventional static pruning methods overlook two critical dynamics inherent to LLM inference: (1) horizontal dynamics, where token-level heterogeneity demands…

    Submitted 4 June, 2025; originally announced June 2025.

  3. arXiv:2506.01939  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

    Authors: Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well understood. In this work, we undertake a pioneering exploration of RLVR through the novel perspective of token entropy patterns, comprehensively analyzing how different tokens influence reasoning perf…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 25 pages, 17 figures, 2 tables

  4. arXiv:2506.01302  [pdf, ps, other]

    cs.LG q-bio.QM

    Recent Developments in GNNs for Drug Discovery

    Authors: Zhengyu Fang, Xiaoge Zhang, Anyin Zhao, Xiao Li, Huiyuan Chen, Jing Li

    Abstract: In this paper, we review recent developments and the role of Graph Neural Networks (GNNs) in computational drug discovery, including molecule generation, molecular property prediction, and drug-drug interaction prediction. By summarizing the most recent developments in this area, we underscore the capabilities of GNNs to comprehend intricate molecular patterns, while exploring both their current a…

    Submitted 2 June, 2025; originally announced June 2025.

  5. arXiv:2505.22194  [pdf, ps, other]

    cs.AR

    Refining Datapath for Microscaling ViTs

    Authors: Can Xiao, Jianyi Cheng, Aaron Zhao

    Abstract: Vision Transformers (ViTs) leverage the transformer architecture to effectively capture global context, demonstrating strong performance in computer vision tasks. A major challenge in ViT hardware acceleration is that the model family contains complex arithmetic operations that are sensitive to model accuracy, such as the Softmax and LayerNorm operations, which cannot be mapped onto efficient hard…

    Submitted 15 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted at FPL'2025

  6. arXiv:2505.20872  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    In Context Learning with Vision Transformers: Case Study

    Authors: Antony Zhao, Alex Proshkin, Fergal Hennessy, Francesco Crivelli

    Abstract: Large transformer models have been shown to be capable of performing in-context learning. By using examples in a prompt as well as a query, they are capable of performing tasks such as few-shot, one-shot, or zero-shot learning to output the corresponding answer to this query. One area of interest to us is that these transformer models have been shown to be capable of learning the general class of…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 12 pages, 16 figures. UC Berkeley research project

    ACM Class: I.2.6; I.2.10; I.4.8

  7. arXiv:2505.18270  [pdf, ps, other]

    cs.RO eess.SY

    MorphEUS: Morphable Omnidirectional Unmanned System

    Authors: Ivan Bao, José C. Díaz Peón González Pacheco, Atharva Navsalkar, Andrew Scheffer, Sashreek Shankar, Andrew Zhao, Hongyu Zhou, Vasileios Tzoumas

    Abstract: Omnidirectional aerial vehicles (OMAVs) have opened up a wide range of possibilities for inspection, navigation, and manipulation applications using drones. In this paper, we introduce MorphEUS, a morphable co-axial quadrotor that can control position and orientation independently with high efficiency. It uses a paired servo motor mechanism for each rotor arm, capable of pointing the vectored-thru…

    Submitted 23 May, 2025; originally announced May 2025.

  8. arXiv:2505.16983  [pdf, ps, other]

    cs.CL

    LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding

    Authors: Junlong Tong, Jinlan Fu, Zixuan Lin, Yingqi Fan, Anhao Zhao, Hui Su, Xiaoyu Shen

    Abstract: Large Language Models (LLMs) are primarily designed for batch processing. Existing methods for adapting LLMs to streaming rely either on expensive re-encoding or specialized architectures with limited scalability. This work identifies three key mismatches in adapting batch-oriented LLMs to streaming: (1) input-attention, (2) output-attention, and (3) position-ID mismatches. While it is commonly as…

    Submitted 29 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: ACL 2025 Findings

  9. arXiv:2505.16782  [pdf, ps, other]

    cs.CL

    Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning

    Authors: Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen

    Abstract: Large Language Models (LLMs) have achieved impressive performance on complex reasoning tasks with Chain-of-Thought (CoT) prompting. However, conventional CoT relies on reasoning steps explicitly verbalized in natural language, introducing inefficiencies and limiting its applicability to abstract reasoning. To address this, there has been growing research interest in latent CoT reasoning, where inf…

    Submitted 22 May, 2025; originally announced May 2025.

  10. arXiv:2505.16369  [pdf, ps, other]

    cs.SD eess.AS

    X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance

    Authors: Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan

    Abstract: We introduce X-ARES (eXtensive Audio Representation and Evaluation Suite), a novel open-source benchmark designed to systematically assess audio encoder performance across diverse domains. By encompassing tasks spanning speech, environmental sounds, and music, X-ARES provides two approaches for evaluating audio representations: linear fine-tuning and unparameterized evaluation. The fra…

    Submitted 27 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  11. arXiv:2505.16242  [pdf, ps, other]

    cs.LG eess.SY

    Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies

    Authors: Runze Yan, Xun Shen, Akifumi Wachi, Sebastien Gros, Anni Zhao, Xiao Hu

    Abstract: When applying offline reinforcement learning (RL) in healthcare scenarios, the out-of-distribution (OOD) issues pose significant risks, as inappropriate generalization beyond clinical expertise can result in potentially harmful recommendations. While existing methods like conservative Q-learning (CQL) attempt to address the OOD issue, their effectiveness is limited by only constraining action sele…

    Submitted 22 May, 2025; originally announced May 2025.

  12. arXiv:2505.12327  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    Robust Planning for Autonomous Driving via Mixed Adversarial Diffusion Predictions

    Authors: Albert Zhao, Stefano Soatto

    Abstract: We describe a robust planning method for autonomous driving that mixes normal and adversarial agent predictions output by a diffusion model trained for motion prediction. We first train a diffusion model to learn an unbiased distribution of normal agent behaviors. We then generate a distribution of adversarial predictions by biasing the diffusion model at test time to generate predictions that are…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: IEEE International Conference on Robotics and Automation (ICRA) 2025

  13. arXiv:2505.10018  [pdf, ps, other]

    cs.RO

    LEMON-Mapping: Loop-Enhanced Large-Scale Multi-Session Point Cloud Merging and Optimization for Globally Consistent Mapping

    Authors: Lijie Wang, Xiaoyi Zhong, Ziyi Xu, Kaixin Chai, Anke Zhao, Tianyu Zhao, Changjian Jiang, Qianhao Wang, Fei Gao

    Abstract: Multi-robot collaboration is becoming increasingly critical and presents significant challenges in modern robotics, especially for building a globally consistent, accurate map. Traditional multi-robot pose graph optimization (PGO) methods ensure basic global consistency but ignore the geometric structure of the map, and only use loop closures as constraints between pose nodes, leading to divergenc…

    Submitted 4 June, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  14. arXiv:2505.03335  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Absolute Zero: Reinforced Self-play Reasoning with Zero Data

    Authors: Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang

    Abstract: Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of hig…

    Submitted 7 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  15. arXiv:2504.15138  [pdf, other]

    cs.RO

    Automatic Generation of Aerobatic Flight in Complex Environments via Diffusion Models

    Authors: Yuhang Zhong, Anke Zhao, Tianyue Wu, Tingrui Zhang, Fei Gao

    Abstract: Performing striking aerobatic flight in complex environments demands manual designs of key maneuvers in advance, which is intricate and time-consuming as the horizon of the trajectory performed becomes long. This paper presents a novel framework that leverages diffusion models to automate and scale up aerobatic trajectory generation. Our key innovation is the decomposition of complex maneuvers int…

    Submitted 21 April, 2025; originally announced April 2025.

  16. arXiv:2504.13837  [pdf, other]

    cs.AI cs.CL cs.CV

    Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Authors: Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and programming tasks. Similar to how traditional RL helps agents explore and learn new strategies, RLVR is believed to enable LLMs to continuously self-improve, thus acquiring novel reasoning abilities b…

    Submitted 16 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: 30 pages, 27 figures

  17. arXiv:2504.11447  [pdf, other]

    cs.CV

    Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion

    Authors: An Zhao, Shengyuan Zhang, Ling Yang, Zejian Li, Jiale Wu, Haoran Xu, AnYang Wei, Perry Pengyun GU, Lingyun Sun

    Abstract: The application of diffusion models in 3D LiDAR scene completion is limited due to diffusion's slow sampling speed. Score distillation accelerates diffusion sampling but with performance degradation, while post-training with direct preference optimization (DPO) boosts performance using preference data. This paper proposes Distillation-DPO, a novel diffusion distillation framework for LiDAR scene compl…

    Submitted 15 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Our code is publicly available at https://github.com/happyw1nd/DistillationDPO

  18. arXiv:2503.21841  [pdf]

    cs.CV

    HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

    Authors: Jingtao Li, Yingyi Liu, Xinyu Wang, Yunning Peng, Chen Sun, Shaoyu Wang, Zhendong Sun, Tian Ke, Xiao Jiang, Tangwei Lu, Anran Zhao, Yanfei Zhong

    Abstract: Advanced interpretation of hyperspectral remote sensing images benefits many precise Earth observation tasks. Recently, visual foundation models have advanced remote sensing interpretation but have concentrated on RGB and multispectral images. Due to the varied hyperspectral channels, existing foundation models would face an image-by-image tuning situation, imposing great pressure on hardware and time…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR2025

  19. arXiv:2503.14512  [pdf]

    q-bio.QM cs.LG stat.AP stat.ML

    Machine learning algorithms to predict stroke in China based on causal inference of time series analysis

    Authors: Qizhi Zheng, Ayang Zhao, Xinzhu Wang, Yanhong Bai, Zikun Wang, Xiuying Wang, Xianzhang Zeng, Guanghui Dong

    Abstract: Participants: This study employed a combination of Vector Autoregression (VAR) model and Graph Neural Networks (GNN) to systematically construct dynamic causal inference. Multiple classic classification algorithms were compared, including Random Forest, Logistic Regression, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gradient Boosting, and Multi Layer Perceptron (MLP). The SMO…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 17 pages

  20. arXiv:2503.13068  [pdf, other]

    cs.CV

    Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

    Authors: Henghui Du, Guangyao Li, Chang Zhou, Chunjie Zhang, Alan Zhao, Di Hu

    Abstract: In recent years, numerous tasks have been proposed to encourage models to develop specific capabilities in understanding audio-visual scenes, primarily categorized into temporal localization, spatial localization, spatio-temporal reasoning, and pixel-level understanding. In contrast, humans possess a unified understanding ability for diverse tasks. Therefore, designing an audio-visual model with gen…

    Submitted 17 March, 2025; originally announced March 2025.

  21. arXiv:2503.00868  [pdf, other]

    cs.GR

    Vid2Fluid: 3D Dynamic Fluid Assets from Single-View Videos with Generative Gaussian Splatting

    Authors: Zhiwei Zhao, Alan Zhao, Minchen Li, Yixin Hu

    Abstract: The generation of 3D content from single-view images has been extensively studied, but 3D dynamic scene generation with physical consistency from videos remains in its early stages. We propose a novel framework leveraging generative 3D Gaussian Splatting (3DGS) models to extract 3D dynamic fluid objects from single-view videos. The fluid geometry represented by 3DGS is initially generated from sin…

    Submitted 2 March, 2025; originally announced March 2025.

    ACM Class: I.2.0; I.3.7

  22. arXiv:2503.00345  [pdf, other]

    cs.LG

    Towards Understanding the Benefit of Multitask Representation Learning in Decision Process

    Authors: Rui Lu, Yang Yue, Andrew Zhao, Simon Du, Gao Huang

    Abstract: Multitask Representation Learning (MRL) has emerged as a prevalent technique to improve sample efficiency in Reinforcement Learning (RL). Empirical studies have found that training agents on multiple tasks simultaneously within online and transfer learning environments can greatly improve efficiency. Despite its popularity, a comprehensive theoretical framework that elucidates its operational effi…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2205.15701

  23. arXiv:2502.16475  [pdf, other]

    cs.CV cs.AI

    Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control

    Authors: Jinbo Yan, Alan Zhao, Yixin Hu

    Abstract: Single-image 3D generation has emerged as a prominent research topic, playing a vital role in virtual reality, 3D modeling, and digital content creation. However, existing methods face challenges such as a lack of multi-view geometric consistency and limited controllability during the generation process, which significantly restrict their usability. To tackle these challenges, we introduce Drage…

    Submitted 23 February, 2025; originally announced February 2025.

  24. arXiv:2502.14227  [pdf, other]

    cs.LG cs.AI

    SleepGMUformer: A gated multimodal temporal neural network for sleep staging

    Authors: Chenjun Zhao, Xuesen Niu, Xinglin Yu, Long Chen, Na Lv, Huiyu Zhou, Aite Zhao

    Abstract: Sleep staging is a key method for assessing sleep quality and diagnosing sleep disorders. However, current deep learning methods face challenges: 1) postfusion techniques ignore the varying contributions of different modalities; 2) unprocessed sleep data can interfere with frequency-domain information. To tackle these issues, this paper proposes a gated multimodal temporal neural network for multi…

    Submitted 19 February, 2025; originally announced February 2025.

  25. arXiv:2502.10703  [pdf, other]

    cs.LG cs.SD

    Artificial intelligence-enabled detection and assessment of Parkinson's disease using multimodal data: A survey

    Authors: Aite Zhao, Yongcan Liu, Xinglin Yu, Xinyue Xing

    Abstract: The rapid emergence of highly adaptable and reusable artificial intelligence (AI) models is set to revolutionize the medical field, particularly in the diagnosis and management of Parkinson's disease (PD). Currently, there are no effective biomarkers for diagnosing PD, assessing its severity, or tracking its progression. Numerous AI algorithms are now being used for PD diagnosis and treatment, cap…

    Submitted 15 February, 2025; originally announced February 2025.

  26. arXiv:2502.05713  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    4D VQ-GAN: Synthesising Medical Scans at Any Time Point for Personalised Disease Progression Modelling of Idiopathic Pulmonary Fibrosis

    Authors: An Zhao, Moucheng Xu, Ahmed H. Shahin, Wim Wuyts, Mark G. Jones, Joseph Jacob, Daniel C. Alexander

    Abstract: Understanding the progression trajectories of diseases is crucial for early diagnosis and effective treatment planning. This is especially vital for life-threatening conditions such as Idiopathic Pulmonary Fibrosis (IPF), a chronic, progressive lung disease with a prognosis comparable to many cancers. Computed tomography (CT) imaging has been established as a reliable diagnostic tool for IPF. Accu…

    Submitted 8 February, 2025; originally announced February 2025.

    Comments: 4D image synthesis, VQ-GAN, neural ODEs, spatial temporal disease progression modelling, CT, IPF

  27. arXiv:2501.06608  [pdf, other]

    cs.LG q-bio.QM

    Dual-Modality Representation Learning for Molecular Property Prediction

    Authors: Anyin Zhao, Zuquan Chen, Zhengyu Fang, Xiaoge Zhang, Jing Li

    Abstract: Molecular property prediction has attracted substantial attention recently. Accurate prediction of drug properties relies heavily on effective molecular representations. The structures of chemical compounds are commonly represented as graphs or SMILES sequences. Recent advances in learning drug properties commonly employ Graph Neural Networks (GNNs) based on the graph representation. For the SMILE…

    Submitted 11 January, 2025; originally announced January 2025.

  28. arXiv:2412.05587  [pdf]

    cs.SE cs.AI cs.DB

    GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models

    Authors: Shuyang Hou, Jianyuan Liang, Anqi Zhao, Huayi Wu

    Abstract: As the scale and complexity of spatiotemporal data continue to grow rapidly, the use of geospatial modeling on the Google Earth Engine (GEE) platform presents dual challenges: improving the coding efficiency of domain experts and enhancing the coding capabilities of interdisciplinary users. To address these challenges and improve the performance of large language models (LLMs) in geospatial code g…

    Submitted 11 December, 2024; v1 submitted 7 December, 2024; originally announced December 2024.

  29. arXiv:2412.03515  [pdf, other]

    cs.CV

    Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

    Authors: Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, AnYang Wei, Perry Pengyun GU, Lingyun Sun

    Abstract: Diffusion models have been applied to 3D LiDAR scene completion due to their strong training stability and high completion quality. However, the slow sampling speed limits the practical application of diffusion-based scene completion models since autonomous vehicles require an efficient perception of surrounding environments. This paper proposes a novel distillation method tailored for 3D LiDAR sc…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: https://github.com/happyw1nd/ScoreLiDAR

  30. arXiv:2411.17673  [pdf, other]

    cs.CV

    SketchAgent: Language-Driven Sequential Sketch Generation

    Authors: Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba

    Abstract: Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and visual communication that spans various disciplines. While artificial systems have driven substantial advances in content creation and human-computer interaction, capturing the dynamic and abstract nature of human sketching remains challenging. In this work, we introduce SketchAgent, a language-driven, seq…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: project page: https://sketch-agent.csail.mit.edu/

  31. arXiv:2411.14720  [pdf]

    cs.CL

    Optimizing Social Media Annotation of HPV Vaccine Skepticism and Misinformation Using Large Language Models: An Experimental Evaluation of In-Context Learning and Fine-Tuning Stance Detection Across Multiple Models

    Authors: Luhang Sun, Varsha Pendyala, Yun-Shiuan Chuang, Shanglin Yang, Jonathan Feldman, Andrew Zhao, Munmun De Choudhury, Sijia Yang, Dhavan Shah

    Abstract: This paper leverages large language models (LLMs) to experimentally determine optimal strategies for scaling up social media content annotation for stance detection on HPV vaccine-related tweets. We examine both conventional fine-tuning and emergent in-context learning methods, systematically varying strategies of prompt engineering across widely used LLMs and their variants (e.g., GPT4, Mistral,…

    Submitted 2 April, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

  32. arXiv:2411.10753  [pdf]

    cs.SE cs.AI cs.CL

    Chain-of-Programming (CoP): Empowering Large Language Models for Geospatial Code Generation

    Authors: Shuyang Hou, Haoyue Jiao, Zhangxiao Shen, Jianyuan Liang, Anqi Zhao, Xiaopu Zhang, Jianxun Wang, Huayi Wu

    Abstract: With the rapid growth of interdisciplinary demands for geospatial modeling and the rise of large language models (LLMs), geospatial code generation technology has seen significant advancements. However, existing LLMs often face challenges in the geospatial code generation process due to incomplete or unclear user requirements and insufficient knowledge of specific platform syntax rules, leading to…

    Submitted 16 November, 2024; originally announced November 2024.

  33. arXiv:2410.21635  [pdf, other]

    quant-ph cs.DS cs.LG

    Learning the structure of any Hamiltonian from minimal assumptions

    Authors: Andrew Zhao

    Abstract: We study the problem of learning an unknown quantum many-body Hamiltonian $H$ from black-box queries to its time evolution $e^{-\mathrm{i} H t}$. Prior proposals for solving this task either impose some assumptions on $H$, such as its interaction structure or locality, or otherwise use an exponential amount of computational postprocessing. In this paper, we present algorithms to learn any $n$-qubi…

    Submitted 21 April, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

    Comments: 45 pages

    Journal ref: Proceedings of the 57th Symposium on Theory of Computing (STOC), pp. 1201-1211, 2025

  34. Geo-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models

    Authors: Shuyang Hou, Anqi Zhao, Jianyuan Liang, Zhangxiao Shen, Huayi Wu

    Abstract: The rise of spatiotemporal data and the need for efficient geospatial modeling have spurred interest in automating these tasks with large language models (LLMs). However, general LLMs often generate errors in geospatial code due to a lack of domain-specific knowledge on functions and operators. To address this, a retrieval-augmented generation (RAG) approach, utilizing an external knowledge base o…

    Submitted 28 October, 2024; originally announced October 2024.

  35. GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks

    Authors: Shuyang Hou, Zhangxiao Shen, Anqi Zhao, Jianyuan Liang, Zhipeng Gui, Xuefeng Guan, Rui Li, Huayi Wu

    Abstract: The increasing demand for spatiotemporal data and modeling tasks in geosciences has made geospatial code generation technology a critical factor in enhancing productivity. Although large language models (LLMs) have demonstrated potential in code generation tasks, they often encounter issues such as refusal to code or hallucination in geospatial code generation due to a lack of domain-specific know…

    Submitted 23 October, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

  36. arXiv:2410.16392  [pdf, other]

    cs.CL cs.LG

    Training of Scaffolded Language Models with Language Supervision: A Survey

    Authors: Matthieu Lin, Jenny Sheng, Andrew Zhao, Shenzhi Wang, Yang Yue, Victor Shea Jay Huang, Huan Liu, Jun Liu, Gao Huang, Yong-Jin Liu

    Abstract: This survey organizes the intricate literature on the design and optimization of emerging structures around post-trained LMs. We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools. We view scaffolded LMs as semi-parametric models wherein we train non-parametric variables, including the prompt, tools, and scaffold's code.…

    Submitted 16 May, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

  37. arXiv:2410.09738  [pdf]

    cs.SE

    Can Large Language Models Generate Geospatial Code?

    Authors: Shuyang Hou, Zhangxiao Shen, Jianyuan Liang, Anqi Zhao, Zhipeng Gui, Rui Li, Huayi Wu

    Abstract: With the growing demand for spatiotemporal data processing and geospatial modeling, automating geospatial code generation has become essential for productivity. Large language models (LLMs) show promise in code generation but face challenges like domain-specific knowledge gaps and "coding hallucinations." This paper introduces GeoCode-Eval (GCE), a framework for assessing LLMs' ability to generate…

    Submitted 17 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

  38. arXiv:2408.15991  [pdf, other]

    cs.CV

    Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation

    Authors: Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

    Abstract: Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a student generator to achieve one-step generation, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the…

    Submitted 16 April, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: Our code is publicly available on https://github.com/SYZhang0805/DisBack

  39. arXiv:2407.17011  [pdf, other]

    cs.CL

    Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism

    Authors: Anhao Zhao, Fanghua Ye, Jinlan Fu, Xiaoyu Shen

    Abstract: Large language models (LLMs) exhibit remarkable in-context learning (ICL) capabilities. However, the underlying working mechanism of ICL remains poorly understood. Recent research presents two conflicting views on ICL: One emphasizes the impact of similar examples in the demonstrations, stressing the need for label correctness and more shots. The other attributes it to LLMs' inherent ability of ta…

    Submitted 9 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  40. arXiv:2407.08770  [pdf, other]

    cs.AI

    Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

    Authors: Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang

    Abstract: Large Language Models (LLMs) have demonstrated great potential as generalist assistants, showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. Current approaches for detoxification or preventing jailbreaking usually…

    Submitted 11 February, 2025; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 14 figures

    MSC Class: 68T50 (Primary) 68T07; 62M45 (Secondary)

    ACM Class: I.2.7

  41. arXiv:2406.17963  [pdf, other]

    cs.LG cs.HC cs.SI

    Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories

    Authors: Yiqiao Jin, Andrew Zhao, Yeon-Chang Lee, Meng Ye, Ajay Divakaran, Srijan Kumar

    Abstract: We developed DyGETViz, a novel framework for effectively visualizing dynamic graphs (DGs) that are ubiquitous across diverse real-world systems. This framework leverages recent advancements in discrete-time dynamic graph (DTDG) models to adeptly handle the temporal dynamics inherent in dynamic graphs. DyGETViz effectively captures both micro- and macro-level structural shifts within these graphs,…

    Submitted 28 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 27 pages, 11 figures

  42. arXiv:2405.19026  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints

    Authors: Andrew Zhao, Quentin Xu, Matthieu Lin, Shenzhi Wang, Yong-jin Liu, Zilong Zheng, Gao Huang

    Abstract: Recent advances in large language model assistants have made them indispensable, raising significant concerns over managing their safety. Automated red teaming offers a promising alternative to the labor-intensive and error-prone manual probing for vulnerabilities, providing more consistent and scalable safety evaluations. However, existing approaches often compromise diversity by focusing on maxi…

    Submitted 20 December, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by the 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25)

  43. arXiv:2404.09445  [pdf, other]

    cs.LG cs.AI cs.CV

    Exploring Text-to-Motion Generation with Human Preference

    Authors: Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang, Yong-Jin Liu

    Abstract: This paper presents an exploration of preference learning in text-to-motion generation. We find that current improvements in text-to-motion generation still rely on datasets requiring expert labelers with motion capture systems. Instead, learning from human preference data does not require motion capture systems; a labeler with no expertise simply compares two generated motions. This is particular…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 HuMoGen Workshop

  44. CoRAST: Towards Foundation Model-Powered Correlated Data Analysis in Resource-Constrained CPS and IoT

    Authors: Yi Hu, Jinhang Zuo, Alanis Zhao, Bob Iannucci, Carlee Joe-Wong

    Abstract: Foundation models (FMs) emerge as a promising solution to harness distributed and diverse environmental data by leveraging prior knowledge to understand the complicated temporal and spatial correlations within heterogeneous datasets. Unlike distributed learning frameworks such as federated learning, which often struggle with multimodal data, FMs can transform diverse inputs into embeddings. This p…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted and to be published in 2024 IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys)

  45. arXiv:2403.13455  [pdf, other]

    cs.RO

    FACT: Fast and Active Coordinate Initialization for Vision-based Drone Swarms

    Authors: Yuan Li, Anke Zhao, Yingjian Wang, Ziyi Xu, Xin Zhou, Jinni Zhou, Chao Xu, Fei Gao

    Abstract: Swarm robots have sparked remarkable developments across a range of fields. While it is necessary for various applications in swarm robots, a fast and robust coordinate initialization in vision-based drone swarms remains elusive. To this end, our paper proposes a complete system to recover a swarm's initial relative pose on platforms with size, weight, and power (SWaP) constraints. To overcome lim…

    Submitted 20 March, 2024; originally announced March 2024.

  46. arXiv:2401.12377  [pdf, other]

    cs.AR

    ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs

    Authors: Sankeerth Durvasula, Adrian Zhao, Raymond Kiguru, Yushi Guan, Zhonghan Chen, Nandita Vijaykumar

    Abstract: GPUs are widely used to accelerate many important classes of workloads today. However, we observe that several important emerging classes of workloads, including simulation engines for deep reinforcement learning and dynamic neural networks, are unable to fully utilize the massive parallelism that GPUs offer. These applications tend to have kernels that are small in size, i.e., have few thread blo…

    Submitted 22 January, 2024; originally announced January 2024.

  47. arXiv:2401.05345  [pdf, other]

    cs.CV cs.GR cs.PF

    DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines

    Authors: Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Nandita Vijaykumar

    Abstract: Differentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (e.g. 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high quality photo-realistic imagery at high speeds from these learned 3D models. These methods…

    Submitted 1 December, 2023; originally announced January 2024.

  48. arXiv:2312.10399  [pdf, other]

    quant-ph cs.IT physics.chem-ph

    Learning, Optimizing, and Simulating Fermions with Quantum Computers

    Authors: Andrew Zhao

    Abstract: Fermions are fundamental particles which obey seemingly bizarre quantum-mechanical principles, yet constitute all the ordinary matter that we inhabit. As such, their study is heavily motivated from both fundamental and practical incentives. In this dissertation, we will explore how the tools of quantum information and computation can assist us on both of these fronts. We primarily do so through th…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: PhD thesis. Includes a background and overview of many-fermion systems, quantum-state learning, and NISQ/error mitigation. Main chapters are based on arXiv:2010.16094 (new: lower bound on sample complexity for local fermionic estimation), arXiv:2310.03071, arXiv:1908.08067 (new: connection between unitary partitioning and matchgate circuits), and arXiv:2301.01778

  49. arXiv:2311.12848  [pdf, other]

    cs.DB cs.AI

    Lightweight Knowledge Representations for Automating Data Analysis

    Authors: Marko Sterbentz, Cameron Barrie, Donna Hooshmand, Shubham Shahi, Abhratanu Dutta, Harper Pack, Andong Li Zhao, Andrew Paley, Alexander Einarsson, Kristian Hammond

    Abstract: The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this w…

    Submitted 15 October, 2023; originally announced November 2023.

  50. arXiv:2311.09692  [pdf, other]

    cs.LG cs.AI cs.RO

    Augmenting Unsupervised Reinforcement Learning with Self-Reference

    Authors: Andrew Zhao, Erle Zhu, Rui Lu, Matthieu Lin, Yong-Jin Liu, Gao Huang

    Abstract: Humans possess the ability to draw on past experiences explicitly when learning new tasks and applying them accordingly. We believe this capacity for self-referencing is especially advantageous for reinforcement learning agents in the unsupervised pretrain-then-finetune setting. During pretraining, an agent's past experiences can be explicitly utilized to mitigate the nonstationarity of intrinsic…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Preprint