
Showing 1–50 of 96 results for author: Zixu

Searching in archive cs.
  1. arXiv:2506.22677

    cs.ET

    Prediction of Protein Three-dimensional Structures via a Hardware-Executable Quantum Computing Framework

    Authors: Yuqi Zhang, Yuxin Yang, William Martin, Kingsten Lin, Zixu Wang, Cheng-Chang Lu, Weiwen Jiang, Ruth Nussinov, Joseph Loscalzo, Qiang Guan, Feixiong Cheng

    Abstract: Accurate prediction of protein active site structures remains a central challenge in structural biology, particularly for short and flexible peptide fragments where conventional methods often fail. Here, we present a quantum computing framework specifically developed for utility-level quantum processors to address this problem. Starting from an amino acid sequence, we formulate the structure predi…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 22 pages, 4 figures

  2. arXiv:2506.17612

    cs.CV

    JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

    Authors: Yunlong Lin, Zixu Lin, Kunjie Lin, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding, Wenbo Li, Shuicheng Yan

    Abstract: Photo retouching has become integral to contemporary visual storytelling, enabling users to capture aesthetics and express creativity. While professional tools such as Adobe Lightroom offer powerful capabilities, they demand substantial expertise and manual effort. In contrast, existing AI-based solutions provide automation but often suffer from limited adjustability and poor generalization, faili…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 40 pages, 26 figures

  3. arXiv:2506.05175

    cs.CV

    Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

    Authors: Yuzhi Huang, Chenxin Li, Haitao Zhang, Zixu Lin, Yunlong Lin, Hengyu Liu, Wuyang Li, Xinyu Liu, Jiechao Gao, Yue Huang, Xinghao Ding, Yixuan Yuan

    Abstract: Video anomaly detection (VAD) is crucial in scenarios such as surveillance and autonomous driving, where timely detection of unexpected activities is essential. Although existing methods have primarily focused on detecting anomalous objects in videos -- either by identifying anomalous frames or objects -- they often neglect finer-grained analysis, such as anomalous pixels, which limits their abili…

    Submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2505.18603

    cs.AI cs.CV

    Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning

    Authors: Ye Mo, Zirui Shao, Kai Ye, Xianwei Mao, Bo Zhang, Hangdi Xing, Peng Ye, Gang Huang, Kehan Chen, Zhou Huan, Zixu Yan, Sheng Zhou

    Abstract: Multimodal large language models (MLLMs) have made significant progress in document understanding. However, the information-dense nature of document images still poses challenges, as most queries depend on only a few relevant regions, with the rest being redundant. Existing one-pass MLLMs process entire document images without considering query relevance, often failing to focus on critical regions…

    Submitted 24 May, 2025; originally announced May 2025.

  5. arXiv:2505.14151

    cs.CV cs.MM

    ReactDiff: Latent Diffusion for Facial Reaction Generation

    Authors: Jiaming Li, Sheng Wang, Xin Wang, Yitao Zhu, Honglin Xiong, Zixu Zhuang, Qian Wang

    Abstract: Given the audio-visual clip of the speaker, facial reaction generation aims to predict the listener's facial reactions. The challenge lies in capturing the relevance between video and audio while balancing appropriateness, realism, and diversity. While prior works have mostly focused on uni-modal inputs or simplified reaction mappings, recent approaches such as PerFRDiff have explored multi-modal…

    Submitted 4 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by Neural Networks

  6. InfoNCE is a Free Lunch for Semantically guided Graph Contrastive Learning

    Authors: Zixu Wang, Bingbing Xu, Yige Yuan, Huawei Shen, Xueqi Cheng

    Abstract: As an important graph pre-training method, Graph Contrastive Learning (GCL) continues to play a crucial role in the ongoing surge of research on graph foundation models or LLMs as enhancers for graphs. Traditional GCL optimizes InfoNCE by using augmentations to define self-supervised tasks, treating augmented pairs as positive samples and others as negative. However, this leads to semantically simil…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures, Accepted by SIGIR2025

  7. arXiv:2504.04158

    cs.CV

    JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

    Authors: Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Yeying Jin, Wenbo Li, Xinghao Ding

    Abstract: Vision-centric perception systems struggle with unpredictable and coupled weather degradations in the wild. Current solutions are often limited, as they either depend on specific degradation priors or suffer from significant domain gaps. To enable robust and autonomous operation in real-world conditions, we propose JarvisIR, a VLM-powered agent that leverages the VLM as a controller to manage mult…

    Submitted 5 April, 2025; originally announced April 2025.

    Comments: 25 pages, 15 figures

  8. arXiv:2503.23793

    cs.CV

    Pan-LUT: Efficient Pan-sharpening via Learnable Look-Up Tables

    Authors: Zhongnan Cai, Yingying Wang, Yunlong Lin, Hui Zheng, Ge Meng, Zixu Lin, Jiaxin Xie, Junbin Lu, Yue Huang, Xinghao Ding

    Abstract: Recently, deep learning-based pan-sharpening algorithms have achieved notable advancements over traditional methods. However, many deep learning-based approaches incur substantial computational overhead during inference, especially with high-resolution images. This excessive computational demand limits the applicability of these methods in real-world scenarios, particularly in the absence of dedic…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 12 pages, 6 figures

  9. arXiv:2503.21309

    cs.CV cs.AI

    FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval

    Authors: Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, Liqiang Nie

    Abstract: Composed Image Retrieval (CIR) facilitates image retrieval through a multimodal query consisting of a reference image and modification text. The reference image defines the retrieval context, while the modification text specifies desired alterations. However, existing CIR datasets predominantly employ coarse-grained modification text (CoarseMT), which inadequately captures fine-grained retrieval i…

    Submitted 27 March, 2025; originally announced March 2025.

  10. arXiv:2503.19736

    eess.IV cs.CV

    GRN+: A Simplified Generative Reinforcement Network for Tissue Layer Analysis in 3D Ultrasound Images for Chronic Low-back Pain

    Authors: Zixue Zeng, Xiaoyan Zhao, Matthew Cartier, Xin Meng, Jiantao Pu

    Abstract: 3D ultrasound delivers high-resolution, real-time images of soft tissues, which is essential for pain research. However, manually distinguishing various tissues for quantitative analysis is labor-intensive. To streamline this process, we developed and validated GRN+, a novel multi-model framework that automates layer segmentation with minimal annotated data. GRN+ combines a ResNet-based generator…

    Submitted 25 March, 2025; originally announced March 2025.

  11. arXiv:2503.19735

    eess.IV cs.CV

    InterSliceBoost: Identifying Tissue Layers in Three-dimensional Ultrasound Images for Chronic Lower Back Pain (cLBP) Assessment

    Authors: Zixue Zeng, Matthew Cartier, Xiaoyan Zhao, Pengyu Chen, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison C. Bean, Ryan P. Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Kang Kim, Ajay D. Wasan, Jiantao Pu

    Abstract: Available studies on chronic lower back pain (cLBP) typically focus on one or a few specific tissues rather than conducting a comprehensive layer-by-layer analysis. Since three-dimensional (3-D) images often contain hundreds of slices, manual annotation of these anatomical structures is both time-consuming and error-prone. We aim to develop and validate a novel approach called InterSliceBoost to e…

    Submitted 25 March, 2025; originally announced March 2025.

  12. arXiv:2503.16734

    cs.AI cs.IR

    Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models

    Authors: Chengkai Huang, Junda Wu, Yu Xia, Zixu Yu, Ruhan Wang, Tong Yu, Ruiyi Zhang, Ryan A. Rossi, Branislav Kveton, Dongruo Zhou, Julian McAuley, Lina Yao

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems that extend beyond the capabilities of standalone models. By empowering LLMs to perceive external environments, integrate multimodal information, and interact with various tools, these agentic systems exhibit greater autonomy and adaptability across complex tasks. This evolution brings new opportun…

    Submitted 20 March, 2025; originally announced March 2025.

  13. arXiv:2503.11495

    cs.CV

    V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

    Authors: Zixu Cheng, Jian Hu, Ziquan Liu, Chenyang Si, Wei Li, Shaogang Gong

    Abstract: Humans reason about video with a sequential spatio-temporal logic: we first identify the relevant frames ("when"), then analyse the spatial relationships ("where") between key objects, and finally leverage these relationships to draw inferences ("what"). However, can Video Large Language Models (Video-LLMs) also "reason through a sequential spatio-temporal logic" in videos? Existi…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: A benchmark for Video Spatio-Temporal Reasoning

  14. Are Cognitive Biases as Important as they Seem for Data Visualization?

    Authors: Ali Baigelenov, Prakash Shukla, Zixu Zhang, Paul Parsons

    Abstract: Research on cognitive biases and heuristics has become increasingly popular in the visualization literature in recent years. Researchers have studied the effects of biases on visualization interpretation and subsequent decision-making. While this work is important, we contend that the view on biases has presented human cognitive abilities in an unbalanced manner, placing too much emphasis on the f…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 7 pages, CHILBW25 - Extended Abstracts of the CHI Conference on Human Factors in Computing Systems

  15. arXiv:2502.06428

    cs.CV

    CoS: Chain-of-Shot Prompting for Long Video Understanding

    Authors: Jian Hu, Zixu Cheng, Chenyang Si, Wei Li, Shaogang Gong

    Abstract: Multi-modal Large Language Models (MLLMs) struggle with long videos due to the need for excessive visual tokens. These tokens massively exceed the context length of MLLMs, so the context ends up filled with redundant, task-irrelevant shots. How to select shots is an unsolved critical problem: sparse sampling risks missing key details, while exhaustive sampling overwhelms the model with irrelevant content, lead…

    Submitted 11 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: A training-free test-time optimisation approach for long video understanding

  16. arXiv:2502.04200

    cs.SE

    Characterizing Bugs in Login Processes of Android Applications: An Empirical Study

    Authors: Zixu Zhou, Rufeng Chen, Junfeng Chen, Yepang Liu, Lili Wei

    Abstract: The login functionality, being the gateway to app usage, plays a critical role in both user experience and application security. As Android apps increasingly incorporate login functionalities, they support a variety of authentication methods with complicated login processes, catering to personalized user experiences. However, the complexities in managing different operations in login processes mak…

    Submitted 13 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

  17. arXiv:2501.18753

    cs.CV

    INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation

    Authors: Jian Hu, Zixu Cheng, Shaogang Gong

    Abstract: Task-generic promptable image segmentation aims to achieve segmentation of diverse samples under a single task description by utilizing only one task-generic prompt. Current methods leverage the generalization capabilities of Vision-Language Models (VLMs) to infer instance-specific prompts from these task-generic prompts in order to guide the segmentation process. However, when VLMs struggle to ge…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: A new task-generic promptable segmentation approach

  18. arXiv:2501.17690

    cs.CV cs.AI cs.LG

    Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment

    Authors: Zixue Zeng, Xiaoyan Zhao, Matthew Cartier, Tong Yu, Jing Wang, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison Bean, Ryan Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Dinesh Kumbhare, Kang Kim, Ajay Wasan, Jiantao Pu

    Abstract: We introduce a novel segmentation-aware joint training framework called generative reinforcement network (GRN) that integrates segmentation loss feedback to optimize both image generation and segmentation performance in a single stage. An image enhancement technique called segmentation-guided enhancement (SGE) is also developed, where the generator produces images tailored specifically for the seg…

    Submitted 23 June, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  19. arXiv:2501.13484

    cs.LG cs.AI cs.CL

    MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

    Authors: Zukang Xu, Yuxuan Yue, Xing Hu, Zhihang Yuan, Zixu Jiang, Zhixuan Chen, Jiangyong Yu, Chen Xu, Sifan Zhou, Dawei Yang

    Abstract: Mamba is an efficient sequence model that rivals Transformers and demonstrates significant potential as a foundational architecture for various tasks. Quantization is commonly used in neural networks to reduce model size and computational latency. However, applying quantization to Mamba remains underexplored, and existing quantization methods, which have been effective for CNN and Transformer mode…

    Submitted 11 March, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  20. arXiv:2501.08670

    cs.SE

    Augmenting Smart Contract Decompiler Output through Fine-grained Dependency Analysis and LLM-facilitated Semantic Recovery

    Authors: Zeqin Liao, Yuhong Nan, Zixu Gao, Henglong Liang, Sicheng Hao, Peifan Reng, Zibin Zheng

    Abstract: A decompiler is a specialized reverse engineering tool extensively employed in program analysis tasks, particularly in program comprehension and vulnerability detection. However, current Solidity smart contract decompilers face significant limitations in reconstructing the original source code. In particular, the bottleneck of SOTA decompilers lies in inaccurate method identification, incorr…

    Submitted 15 January, 2025; originally announced January 2025.

  21. arXiv:2412.20070

    cs.CV cs.AI cs.CL cs.LG

    Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging

    Authors: Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang

    Abstract: Medical imaging provides essential visual insights for diagnosis, and multimodal large language models (MLLMs) are increasingly utilized for its analysis due to their strong generalization capabilities; however, the underlying factors driving this generalization remain unclear. Current research suggests that multi-task training outperforms single-task as different tasks can benefit each other, but…

    Submitted 31 May, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

  22. arXiv:2412.04802

    eess.IV cs.CV

    Unsupervised Hyperspectral and Multispectral Image Fusion via Self-Supervised Modality Decoupling

    Authors: Songcheng Du, Yang Zou, Zixu Wang, Xingyuan Li, Ying Li, Changjing Shang, Qiang Shen

    Abstract: Hyperspectral and Multispectral Image Fusion (HMIF) aims to fuse low-resolution hyperspectral images (LR-HSIs) and high-resolution multispectral images (HR-MSIs) to reconstruct high spatial and high spectral resolution images. Current methods typically apply direct fusion from the two modalities without effective supervision, leading to an incomplete perception of deep modality-complementary infor…

    Submitted 22 April, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  23. arXiv:2411.12471

    cs.CV

    SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image

    Authors: Zixu Wang, Hao Yang, Yu Guo, Fei Wang

    Abstract: Snapshot Compressive Imaging (SCI) offers a way to capture information in high-speed dynamic scenes, requiring an efficient reconstruction method to recover scene information. Despite promising results, current deep learning-based and NeRF-based reconstruction methods face challenges: 1) deep learning-based reconstruction methods struggle to maintain 3D structural consistency within scenes…

    Submitted 24 November, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

  24. arXiv:2410.21438

    cs.CL cs.LG

    UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

    Authors: Zhichao Wang, Bin Bi, Zixu Zhu, Xiangbo Mao, Jun Wang, Shiyu Wang

    Abstract: By pretraining on trillions of tokens, an LLM gains the capability of text generation. However, to enhance its utility and reduce potential harm, SFT and alignment are applied sequentially to the pretrained model. Due to the differing nature and objective functions of SFT and alignment, catastrophic forgetting has become a significant issue. To address this, we introduce Unified Fine-Tuning (UFT),…

    Submitted 6 April, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  25. arXiv:2410.19274

    cs.LG cs.AI cs.OS cs.PF

    Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management

    Authors: Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various domains, yet deploying them on mobile devices remains an arduous challenge due to their extensive computational and memory demands. While lightweight LLMs have been developed to fit mobile environments, they suffer from degraded model accuracy. In contrast, sparsity-based techniques minimize DRAM usage by selectively tran…

    Submitted 29 October, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

  26. arXiv:2410.08781

    cs.CV

    VideoSAM: Open-World Video Segmentation

    Authors: Pinxue Guo, Zixu Zhao, Jianxiong Gao, Chongruo Wu, Tong He, Zheng Zhang, Tianjun Xiao, Wenqiang Zhang

    Abstract: Video segmentation is essential for advancing robotics and autonomous driving, particularly in open-world settings where continuous perception and object association across video frames are critical. While the Segment Anything Model (SAM) has excelled in static image segmentation, extending its capabilities to video segmentation poses significant challenges. We tackle two major hurdles: a) SAM's e…

    Submitted 11 October, 2024; originally announced October 2024.

  27. arXiv:2409.18794

    cs.RO cs.CV

    Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs

    Authors: Yanyuan Qiao, Wenqi Lyu, Hui Wang, Zixu Wang, Zerui Li, Yuan Zhang, Mingkui Tan, Qi Wu

    Abstract: Vision-and-Language Navigation (VLN) tasks require an agent to follow textual instructions to navigate through 3D environments. Traditional approaches use supervised learning methods, relying heavily on domain-specific datasets to train VLN models. Recent methods try to utilize closed-source large language models (LLMs) like GPT-4 to solve VLN tasks in a zero-shot manner, but face challenges relate…

    Submitted 10 February, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

    Comments: Accepted by ICRA 2025

  28. arXiv:2409.18423

    cs.LG

    A physics-driven sensor placement optimization methodology for temperature field reconstruction

    Authors: Xu Liu, Wen Yao, Wei Peng, Zhuojia Fu, Zixue Xiang, Xiaoqian Chen

    Abstract: Perceiving the global field from sparse sensors has been a grand challenge in the monitoring, analysis, and design of physical systems. In this context, sensor placement optimization is a crucial issue. Most existing works require large and sufficient data to construct data-based criteria, which are intractable in data-free scenarios without numerical and experimental data. To this end, we propose…

    Submitted 26 September, 2024; originally announced September 2024.

    Journal ref: Applied Thermal Engineering (2024)

  29. arXiv:2409.10644

    cs.CL

    Improving Multi-candidate Speculative Decoding

    Authors: Xiaofan Lu, Yixiao Zeng, Feiyang Ma, Zixu Yu, Marco Levorato

    Abstract: Speculative Decoding (SD) is a technique to accelerate the inference of Large Language Models (LLMs) by using a lower complexity draft model to propose candidate tokens verified by a larger target model. Multi-Candidate Speculative Decoding (MCSD) further improves efficiency by sampling multiple candidate tokens from the draft model at each step and verifying them in parallel…

    Submitted 14 December, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: Accepted by NeurIPS ENLSP 2024 Workshop

  30. arXiv:2409.04847

    cs.CV

    Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

    Authors: Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang

    Abstract: Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative modeling is layout-to-image (L2I) generation, where predefined layouts of objects guide the generative process. In this study, we introduce a novel regional cross-att…

    Submitted 11 January, 2025; v1 submitted 7 September, 2024; originally announced September 2024.

    Comments: Accepted in NeurIPS 2024

  31. arXiv:2409.02720

    cs.CV cs.AI eess.SP

    GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling

    Authors: Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille

    Abstract: Depth estimation plays a pivotal role in autonomous driving, facilitating a comprehensive understanding of the vehicle's 3D surroundings. Radar, with its robustness to adverse weather conditions and capability to measure distances, has drawn significant interest for radar-camera depth estimation. However, existing algorithms process the inherently noisy and sparse radar data by projecting 3D point…

    Submitted 8 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by WACV 2025

  32. arXiv:2409.02152

    cs.SI cs.AI

    Fair Railway Network Design

    Authors: Zixu He, Sirin Botan, Jérôme Lang, Abdallah Saffidine, Florian Sikora, Silas Workman

    Abstract: When designing a public transportation network in a country, one may want to minimise the sum of travel duration of all inhabitants. This corresponds to a purely utilitarian view and does not involve any fairness consideration, as the resulting network will typically benefit the capital city and/or large central cities while leaving some peripheral cities behind. On the other hand, a more egalitar…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 32 pages, 18 figures

  33. arXiv:2408.15339

    cs.LG cs.CL

    UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function

    Authors: Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu, Sitaram Asur, Na Claire Cheng

    Abstract: An LLM is pretrained on trillions of tokens, but the pretrained LLM may still generate undesired responses. To solve this problem, alignment techniques such as RLHF, DPO and KTO are proposed. However, these alignment techniques have limitations. For example, RLHF requires training the reward model and policy separately, which is complex, time-consuming, memory intensive and unstable during trainin…

    Submitted 5 April, 2025; v1 submitted 27 August, 2024; originally announced August 2024.

  34. arXiv:2407.20265

    cs.LG cs.CE

    COEFF-KANs: A Paradigm to Address the Electrolyte Field with KANs

    Authors: Xinhe Li, Zhuoying Feng, Yezeng Chen, Weichen Dai, Zixu He, Yi Zhou, Shuhong Jiao

    Abstract: To reduce the experimental validation workload for chemical researchers and accelerate the design and optimization of high-energy-density lithium metal batteries, we aim to leverage models to automatically predict Coulombic Efficiency (CE) based on the composition of liquid electrolytes. There are mainly two representative paradigms in existing methods: machine learning and deep learning. However,…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

  35. arXiv:2407.16216

    cs.CL

    A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

    Authors: Zhichao Wang, Bin Bi, Shiva Kumar Pentyala, Kiran Ramnath, Sougata Chaudhuri, Shubham Mehrotra, Zixu Zhu, Xiang-Bo Mao, Sitaram Asur, Na Cheng

    Abstract: With advancements in self-supervised learning, the availability of trillions of tokens in a pre-training corpus, instruction fine-tuning, and the development of large Transformers with billions of parameters, large language models (LLMs) are now capable of generating factual and coherent responses to human queries. However, the mixed quality of training data can lead to the generation of undesired re…

    Submitted 23 July, 2024; originally announced July 2024.

  36. arXiv:2407.05736

    cs.AI cs.CV

    TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery

    Authors: Kun Wu, Zixu Wang, Xiulong Yang, Yangyang Chen, Zhenqi Han, Jialu Zhang, Lizhuang Liu

    Abstract: As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, the screening process for LNPs is time-consuming and costly. To expedite the identification of high-transfection-efficiency mRNA drug delivery systems, we propose an explainable LNPs transfection efficiency prediction mod…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures

  37. arXiv:2407.05118

    cs.CV

    SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

    Authors: Zixu Cheng, Yujiang Pu, Shaogang Gong, Parisa Kordjamshidi, Yu Kong

    Abstract: Temporal grounding, also known as video moment retrieval, aims at locating video segments corresponding to a given query sentence. The compositional nature of natural language enables the localization beyond predefined events, posing a certain challenge to the compositional generalizability of existing methods. Recent studies establish the correspondence between videos and queries through a decomp…

    Submitted 15 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  38. arXiv:2406.02425

    cs.CV cs.RO

    CoNav: A Benchmark for Human-Centered Collaborative Navigation

    Authors: Changhao Li, Xinyu Sun, Peihao Chen, Jugang Fan, Zixu Wang, Yanxia Liu, Jinhui Zhu, Chuang Gan, Mingkui Tan

    Abstract: Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, where the agent should reason about human intention by observing human activities and then navigate to the human's intended destination in advance of the human. However, t…

    Submitted 4 June, 2024; originally announced June 2024.

  39. Negative as Positive: Enhancing Out-of-distribution Generalization for Graph Contrastive Learning

    Authors: Zixu Wang, Bingbing Xu, Yige Yuan, Huawei Shen, Xueqi Cheng

    Abstract: Graph contrastive learning (GCL), standing as the dominant paradigm in the realm of graph pre-training, has yielded considerable progress. Nonetheless, its capacity for out-of-distribution (OOD) generalization has been relatively underexplored. In this work, we point out that the traditional optimization of InfoNCE in GCL restricts cross-domain pairs to being negative samples only, which inevitab…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures, In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14-18, 2024, Washington, DC, USA

    ACM Class: I.2

  40. arXiv:2405.03809

    cs.AI

    SocialFormer: Social Interaction Modeling with Edge-enhanced Heterogeneous Graph Transformers for Trajectory Prediction

    Authors: Zixu Wang, Zhigang Sun, Juergen Luettin, Lavdim Halilaj

    Abstract: Accurate trajectory prediction is crucial for ensuring safe and efficient autonomous driving. However, most existing methods overlook complex interactions between traffic participants that often govern their future trajectories. In this paper, we propose SocialFormer, an agent interaction-aware trajectory prediction method that leverages the semantic relationship between the target vehicle and sur…

    Submitted 6 May, 2024; originally announced May 2024.

  41. arXiv:2404.19379

    cs.CV cs.RO

    SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs

    Authors: Zhigang Sun, Zixu Wang, Lavdim Halilaj, Juergen Luettin

    Abstract: Trajectory prediction in autonomous driving relies on accurate representation of all relevant contexts of the driving scene, including traffic participants, road topology, traffic signs, as well as their semantic relations to each other. Despite increased attention to this issue, most approaches in trajectory prediction do not consider all of these factors sufficiently. We present SemanticFormer,…

    Submitted 1 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: 8 pages, 7 figures, has been accepted for publication in the IEEE Robotics and Automation Letters (RA-L)

  42. arXiv:2404.02524  [pdf, other]

    cs.RO

    Versatile Behavior Diffusion for Generalized Traffic Agent Simulation

    Authors: Zhiyu Huang, Zixu Zhang, Ameya Vaidya, Yuxiao Chen, Chen Lv, Jaime Fernández Fisac

    Abstract: Existing traffic simulation models often fail to capture the complexities of real-world scenarios, limiting the effective evaluation of autonomous driving systems. We introduce Versatile Behavior Diffusion (VBD), a novel traffic scenario generation framework that utilizes diffusion generative models to predict scene-consistent and controllable multi-agent interactions in closed-loop settings. VBD…

    Submitted 2 December, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  43. arXiv:2403.13244  [pdf]

    cs.CL cs.AI

    Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

    Authors: Peng Zhou, Jianmin Wang, Chunyan Li, Zixu Wang, Yiping Liu, Siqi Sun, Jianxin Lin, Leyi Wei, Xibao Cai, Houtim Lai, Wei Liu, Longyue Wang, Yuansheng Liu, Xiangxiang Zeng

    Abstract: While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Here, we introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the 'tea…

    Submitted 10 October, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 37 pages, 10 figures

  44. arXiv:2403.05055  [pdf, other]

    cs.CV

    MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction

    Authors: Yitao Zhu, Sheng Wang, Mengjie Xu, Zixu Zhuang, Zhixin Wang, Kaidong Wang, Han Zhang, Qian Wang

    Abstract: Multiple cameras can provide comprehensive multi-view video coverage of a person. Fusing this multi-view data is crucial for tasks like behavioral analysis, although it traditionally requires camera calibration, a process that is often complex. Moreover, previous studies have overlooked the challenges posed by self-occlusion under multiple views and the continuity of human body shape estimation. I…

    Submitted 24 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  45. arXiv:2402.14576  [pdf, other]

    cs.NI cs.LG eess.SY

    Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching

    Authors: Farnaz Niknia, Ping Wang, Zixu Wang, Aakash Agarwal, Adib S. Rezaei

    Abstract: This paper tackles the growing issue of excessive data transmission in networks. With increasing traffic, backhaul links and core networks are under significant traffic, leading to the investigation of caching solutions at edge routers. Many existing studies utilize Markov Decision Processes (MDP) to tackle caching problems, often assuming decision points at fixed intervals; however, real-world en…

    Submitted 30 October, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  46. arXiv:2402.14174  [pdf, other]

    cs.RO cs.AI eess.SY math.OC

    Blending Data-Driven Priors in Dynamic Games

    Authors: Justin Lidard, Haimin Hu, Asher Hancock, Zixu Zhang, Albert Gimó Contreras, Vikash Modi, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, María Santos, Jaime Fernández Fisac

    Abstract: As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, h…

    Submitted 6 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 20 pages, 12 figures

  47. arXiv:2402.10015  [pdf]

    cs.DS cs.CC

    A Piecewise Approach for the Analysis of Exact Algorithms

    Authors: Katie Clinch, Serge Gaspers, Zixu He, Abdallah Saffidine, Tiankuang Zhang

    Abstract: To analyze the worst-case running time of branching algorithms, the majority of work in exponential time algorithms focuses on designing complicated branching rules over developing better analysis methods for simple algorithms. In the mid-2000s, Fomin et al. [2005] introduced measure & conquer, an advanced general analysis method, sparking widespread adoption for obtaining tighter worst-case run…

    Submitted 29 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    ACM Class: F.2.2; G.2.2

  48. arXiv:2402.09246  [pdf, other]

    cs.RO cs.AI eess.SY math.OC

    Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots

    Authors: Haimin Hu, Gabriele Dragotto, Zixu Zhang, Kaiqu Liang, Bartolomeo Stellato, Jaime F. Fisac

    Abstract: We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutat…

    Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Robotics: Science and Systems (RSS) 2024

  49. arXiv:2402.06529  [pdf, other]

    cs.AI cs.CL cs.LG

    Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity

    Authors: Kaiqu Liang, Zixu Zhang, Jaime Fernández Fisac

    Abstract: Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or even unsafe in critical scenarios. Additionally, inherent ambiguity in natural language instr…

    Submitted 10 February, 2025; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024

  50. arXiv:2401.00632  [pdf, other]

    cs.CR

    TBDD: A New Trust-based, DRL-driven Framework for Blockchain Sharding in IoT

    Authors: Zixu Zhang, Guangsheng Yu, Caijun Sun, Xu Wang, Ying Wang, Ming Zhang, Wei Ni, Ren Ping Liu, Andrew Reeves, Nektarios Georgalas

    Abstract: Integrating sharded blockchain with IoT presents a solution for trust issues and optimized data flow. Sharding boosts blockchain scalability by dividing its nodes into parallel shards, yet it's vulnerable to the 1% attacks where dishonest nodes target a shard to corrupt the entire blockchain. Balancing security with scalability is pivotal for such systems. Deep Reinforcement Learning (DRL) adep…

    Submitted 31 December, 2023; originally announced January 2024.