
Showing 1–50 of 115 results for author: Yao, D

Searching in archive cs.
  1. arXiv:2507.05660  [pdf, ps, other]

    cs.CR cs.AI cs.CL

    TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data

    Authors: Aravind Cheruvu, Shravya Kanchi, Sifat Muhammad Abdullah, Nicholas Kong, Daphne Yao, Murtuza Jadliwala, Bimal Viswanath

    Abstract: Recent advances in foundation models, such as LLMs, have revolutionized conversational AI. Chatbots are increasingly being developed by customizing LLMs on specific conversational datasets. However, mitigating toxicity during this customization, especially when dealing with untrusted training data, remains a significant challenge. To address this, we introduce TuneShield, a defense framework desig…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: Pre-print

  2. arXiv:2506.23757  [pdf, ps, other]

    cs.LG stat.ME stat.ML

    Training of Spiking Neural Networks with Expectation-Propagation

    Authors: Dan Yao, Steve McLaughlin, Yoann Altmann

    Abstract: In this paper, we propose a unifying message-passing framework for training spiking neural networks (SNNs) using Expectation-Propagation. Our gradient-free method is capable of learning the marginal distributions of network parameters and simultaneously marginalizes nuisance parameters, such as the outputs of hidden layers. This framework allows, for the first time, training of discrete and continu…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 10 pages

  3. arXiv:2506.23071  [pdf, ps, other]

    cs.CL

    Text2VectorSQL: Bridging Text-to-SQL and Vector Search for Unified Natural Language Queries

    Authors: Zhengren Wang, Bozhou Li, Dongwen Yao, Wentao Zhang

    Abstract: While Text-to-SQL enables natural language interaction with structured databases, its effectiveness diminishes with unstructured data or ambiguous queries due to rigid syntax and limited expressiveness. Concurrently, vector search has emerged as a powerful paradigm for semantic retrieval, particularly for unstructured data. However, existing VectorSQL implementations still rely heavily on manual c…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: Work in progress

  4. arXiv:2506.16096  [pdf, ps, other]

    cs.LG cs.AI

    A Brain-to-Population Graph Learning Framework for Diagnosing Brain Disorders

    Authors: Qianqian Liao, Wuque Cai, Hongze Sun, Dongze Liu, Duo Chen, Dezhong Yao, Daqing Guo

    Abstract: Recently developed graph-based methods for diagnosing brain disorders using functional connectivity highly rely on predefined brain atlases, but overlook the rich information embedded within atlases and the confounding effects of site and phenotype variability. To address these challenges, we propose a two-stage Brain-to-Population Graph Learning (B2P-GL) framework that integrates the semantic simil…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 16 pages, 7 figures, 13 tables; this paper has been submitted for possible publication

  5. arXiv:2506.01456  [pdf]

    q-bio.GN cs.AI cs.LG q-bio.NC

    GenDMR: A dynamic multimodal role-swapping network for identifying risk gene phenotypes

    Authors: Lina Qin, Cheng Zhu, Chuqi Zhou, Yukun Huang, Jiayi Zhu, Ping Liang, Jinju Wang, Yixing Huang, Cheng Luo, Dezhong Yao, Ying Tan

    Abstract: Recent studies have shown that integrating multimodal data fusion techniques for imaging and genetic features is beneficial for the etiological analysis and predictive diagnosis of Alzheimer's disease (AD). However, there are several critical flaws in current deep learning methods. Firstly, there has been insufficient discussion and exploration regarding the selection and encoding of genetic infor…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 31 pages, 9 figures

  6. arXiv:2505.23802  [pdf, ps, other]

    cs.CL cs.AI

    MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

    Authors: Suhana Bedi, Hejie Cui, Miguel Fuentes, Alyssa Unell, Michael Wornow, Juan M. Banda, Nikesh Kotecha, Timothy Keyes, Yifan Mai, Mert Oez, Hao Qiu, Shrey Jain, Leonardo Schettini, Mehr Kashyap, Jason Alan Fries, Akshay Swaminathan, Philip Chung, Fateme Nateghi, Asad Aali, Ashwin Nayak, Shivam Vedak, Sneha S. Jain, Birju Patel, Oluseyi Fayanju, Shreya Shah, et al. (56 additional authors not shown)

    Abstract: While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an extensible evaluation framework for assessing LLM performance for medical tasks with three key contributions. First, a clinician-validated taxonomy spanning 5 categories, 22 subcatego…

    Submitted 2 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  7. arXiv:2505.19586  [pdf, ps, other]

    cs.CL

    TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization

    Authors: Dingyu Yao, Bowen Shen, Zheng Lin, Wei Liu, Jian Luan, Bin Wang, Weiping Wang

    Abstract: The Key-Value (KV) cache in generative large language models (LLMs) introduces substantial memory overhead. Existing works mitigate this burden by offloading or compressing the KV cache. However, loading the entire cache incurs significant latency due to PCIe bandwidth bottlenecks in CPU-GPU communication, while aggressive compression causes notable performance degradation. We identify that certai…

    Submitted 26 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  8. arXiv:2505.17708  [pdf, ps, other]

    cs.LG

    The Third Pillar of Causal Analysis? A Measurement Perspective on Causal Representations

    Authors: Dingling Yao, Shimeng Huang, Riccardo Cadei, Kun Zhang, Francesco Locatello

    Abstract: Causal reasoning and discovery, two fundamental tasks of causal analysis, often face challenges in applications due to the complexity, noisiness, and high-dimensionality of real-world data. Despite recent progress in identifying latent causal structures using causal representation learning (CRL), what makes learned representations useful for causal downstream tasks and how to evaluate them are sti…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: 22 pages, 12 figures, 2 tables

  9. arXiv:2505.14910  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis

    Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Dongyu Yao, Zhiyuan Zhu, Ziyue Jiang, Yuhan Wang, Tao Jin, Zhou Zhao

    Abstract: Customizable multilingual zero-shot singing voice synthesis (SVS) has various potential applications in music composition and short video dubbing. However, existing SVS models overly depend on phoneme and note boundary annotations, limiting their robustness in zero-shot scenarios and producing poor transitions between phonemes and notes. Moreover, they also lack effective multi-level style control…

    Submitted 30 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by Findings of ACL 2025

  10. arXiv:2503.21761  [pdf, other]

    cs.CV cs.AI cs.LG

    Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

    Authors: David Yifan Yao, Albert J. Zhai, Shenlong Wang

    Abstract: This paper presents a unified approach to understanding dynamic scenes from casual videos. Large pretrained vision foundation models, such as vision-language, video depth prediction, motion tracking, and segmentation models, offer promising capabilities. However, training a single model for comprehensive 4D understanding remains challenging. We introduce Uni4D, a multi-stage optimization framework…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project page (with code): https://davidyao99.github.io/uni4d

  11. arXiv:2503.12367  [pdf]

    cs.LG physics.ao-ph

    Integrating mobile and fixed monitoring data for high-resolution PM2.5 mapping using machine learning

    Authors: Rui Xu, Dawen Yao, Yuzhuang Pian, Ruhui Cao, Yixin Fu, Xinru Yang, Ting Gan, Yonghong Liu

    Abstract: Constructing high resolution air pollution maps at lower cost is crucial for sustainable city management and public health risk assessment. However, traditional fixed-site monitoring lacks spatial coverage, while mobile low-cost sensors exhibit significant data instability. This study integrates PM2.5 data from 320 taxi-mounted mobile low-cost sensors and 52 fixed monitoring stations to address th…

    Submitted 16 March, 2025; originally announced March 2025.

  12. arXiv:2503.11720  [pdf, other]

    cs.LG cs.AI

    Fine-Tuning Diffusion Generative Models via Rich Preference Optimization

    Authors: Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang

    Abstract: We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, like Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as rewar…

    Submitted 16 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  13. arXiv:2503.10195  [pdf, other]

    cs.CV cs.NE q-bio.NC

    ST-FlowNet: An Efficient Spiking Neural Network for Event-Based Optical Flow Estimation

    Authors: Hongze Sun, Jun Wang, Wuque Cai, Duo Chen, Qianqian Liao, Jiayi He, Yan Cui, Dezhong Yao, Daqing Guo

    Abstract: Spiking Neural Networks (SNNs) have emerged as a promising tool for event-based optical flow estimation tasks due to their ability to leverage spatio-temporal information and low-power capabilities. However, the performance of SNN models is often constrained, limiting their application in real-world scenarios. In this work, we address this gap by proposing a novel neural network architecture, ST-F…

    Submitted 27 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: 13 pages, 6 figures, 6 tables; This work has been submitted to Neural Networks for possible publication

  14. arXiv:2503.07032  [pdf, other]

    cs.CL cs.CV

    Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation

    Authors: Zhi Qin, Qianhui Gui, Mouxiao Bian, Rui Wang, Hong Ge, Dandan Yao, Ziying Sun, Yuan Zhao, Yu Zhang, Hui Shi, Dongdong Wang, Chenxin Song, Shenghong Ju, Lihao Liu, Junjun He, Jie Xu, Yuan-Cheng Wang

    Abstract: Medical imaging quality control (QC) is essential for accurate diagnosis, yet traditional QC methods remain labor-intensive and subjective. To address this challenge, in this study, we establish a standardized dataset and evaluation framework for medical imaging QC, systematically assessing large language models (LLMs) in image quality assessment and report standardization. Specifically, we first…

    Submitted 10 March, 2025; originally announced March 2025.

  15. arXiv:2503.04684  [pdf, other]

    stat.ML cs.LG math.NA

    Propagating Model Uncertainty through Filtering-based Probabilistic Numerical ODE Solvers

    Authors: Dingling Yao, Filip Tronarp, Nathanael Bosch

    Abstract: Filtering-based probabilistic numerical solvers for ordinary differential equations (ODEs), also known as ODE filters, have been established as efficient methods for quantifying numerical uncertainty in the solution of ODEs. In practical applications, however, the underlying dynamical system often contains uncertain parameters, requiring the propagation of this model uncertainty to the ODE solutio…

    Submitted 6 March, 2025; originally announced March 2025.

  16. arXiv:2502.12084  [pdf, ps, other]

    cs.CL

    VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues

    Authors: Jianshu Zhang, Dongyu Yao, Renjie Pi, Paul Pu Liang, Yi R. Fung

    Abstract: Visually linking matching cues is a crucial ability in daily life, such as identifying the same person in multiple photos based on their cues, even without knowing who they are. Despite the extensive knowledge that vision-language models (VLMs) possess, it remains largely unexplored whether they are capable of performing this fundamental task. To address this, we introduce VLM2-Bench, a b…

    Submitted 2 July, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Project Page: https://vlm2-bench.github.io/ Camera Ready version

  17. arXiv:2502.08518  [pdf, other]

    cs.LG cs.AI cs.DC

    FedMHO: Heterogeneous One-Shot Federated Learning Towards Resource-Constrained Edge Devices

    Authors: Dezhong Yao, Yuexin Shi, Tongtong Liu, Zhiqiang Xu

    Abstract: Federated Learning (FL) is increasingly adopted in edge computing scenarios, where a large number of heterogeneous clients operate under constrained or sufficient resources. The iterative training process in conventional FL introduces significant computation and communication overhead, which is unfriendly for resource-constrained edge devices. One-shot FL has emerged as a promising approach to mit…

    Submitted 12 February, 2025; originally announced February 2025.

  18. arXiv:2502.01819  [pdf, other]

    cs.LG cs.AI math.OC

    Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning

    Authors: Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

    Abstract: Reinforcement learning from human feedback (RLHF), which aligns a diffusion model with input prompt, has become a crucial step in building reliable generative AI models. Most works in this area use a discrete-time formulation, which is prone to induced errors, and often not applicable to models with higher-order/black-box solvers. The objective of this study is to develop a disciplined approach to…

    Submitted 16 April, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: arXiv admin note: text overlap with arXiv:2409.08400

  19. arXiv:2501.18196  [pdf, other]

    cs.LG

    GDformer: Going Beyond Subsequence Isolation for Multivariate Time Series Anomaly Detection

    Authors: Qingxiang Liu, Chenghao Liu, Sheng Sun, Di Yao, Yuxuan Liang

    Abstract: Unsupervised anomaly detection of multivariate time series is a challenging task, given the requirements of deriving a compact detection criterion without accessing the anomaly points. The existing methods are mainly based on reconstruction error or association divergence, which are both confined to isolated subsequences with limited horizons, hardly promising unified series-level criterion. In th…

    Submitted 9 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

  20. arXiv:2412.18820  [pdf, other]

    cs.LG

    CausalTAD: Causal Implicit Generative Model for Debiased Online Trajectory Anomaly Detection

    Authors: Wenbin Li, Di Yao, Chang Gong, Xiaokai Chu, Quanliang Jing, Xiaolei Zhou, Yuxuan Zhang, Yunxia Fan, Jingping Bi

    Abstract: Trajectory anomaly detection, aiming to estimate the anomaly risk of trajectories given the Source-Destination (SD) pairs, has become a critical problem for many real-world applications. Existing solutions directly train a generative model for observed trajectories and calculate the conditional generative probability $P({T}|{C})$ as the anomaly risk, where ${T}$ and ${C}$ represent the trajectory…

    Submitted 25 December, 2024; originally announced December 2024.

    Comments: Accepted by ICDE 2024

  21. arXiv:2412.16955  [pdf, other]

    cs.CV

    NumbOD: A Spatial-Frequency Fusion Attack Against Object Detectors

    Authors: Ziqi Zhou, Bowen Li, Yufei Song, Zhifei Yu, Shengshan Hu, Wei Wan, Leo Yu Zhang, Dezhong Yao, Hai Jin

    Abstract: With the advancement of deep learning, object detectors (ODs) with various architectures have achieved significant success in complex scenarios like autonomous driving. Previous adversarial attacks against ODs have been focused on designing customized attacks targeting their specific structures (e.g., NMS and RPN), yielding some results but simultaneously constraining their scalability. Moreover,…

    Submitted 22 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  22. arXiv:2412.16581  [pdf, other]

    cs.AI

    Effective and Efficient Representation Learning for Flight Trajectories

    Authors: Shuo Liu, Wenbin Li, Di Yao, Jingping Bi

    Abstract: Flight trajectory data plays a vital role in the traffic management community, especially for downstream tasks such as trajectory prediction, flight recognition, and anomaly detection. Existing works often utilize handcrafted features and design models for different tasks individually, which heavily rely on domain expertise and are hard to extend. We argue that different flight analysis tasks shar…

    Submitted 21 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  23. arXiv:2412.10033  [pdf, other]

    cs.CV

    Timealign: A multi-modal object detection method for time misalignment fusing in autonomous driving

    Authors: Zhihang Song, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

    Abstract: The multi-modal perception methods are thriving in the autonomous driving field due to their better usage of complementary data from different sensors. Such methods depend on calibration and synchronization between sensors to get accurate environmental information. There have already been studies about space-alignment robustness in autonomous driving object detection process, however, the research…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: 8 pages, 3 figures

  24. arXiv:2412.09936  [pdf, other]

    cs.CV

    CaLoRAify: Calorie Estimation with Visual-Text Pairing and LoRA-Driven Visual Language Models

    Authors: Dongyu Yao, Keling Yao, Junhong Zhou, Yinghao Zhang

    Abstract: The obesity phenomenon, known as the heavy issue, is a leading cause of preventable chronic diseases worldwide. Traditional calorie estimation tools often rely on specific data formats or complex pipelines, limiting their practicality in real-world scenarios. Recently, vision-language models (VLMs) have excelled in understanding real-world contexts and enabling conversational interactions, making…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: Disclaimer: This work is part of a course project and reflects ongoing exploration in the field of vision-language models and calorie estimation. Findings and conclusions are subject to further validation and refinement

    MSC Class: 68T07; 68U35 ACM Class: I.2.10; I.2.6; I.5.4

  25. arXiv:2412.00534  [pdf, other]

    cs.LG cs.AI cs.MA

    Towards Fault Tolerance in Multi-Agent Reinforcement Learning

    Authors: Yuchen Shi, Huaxin Pei, Liang Feng, Yi Zhang, Danya Yao

    Abstract: Agent faults pose a significant threat to the performance of multi-agent reinforcement learning (MARL) algorithms, introducing two key challenges. First, agents often struggle to extract critical information from the chaotic state space created by unexpected faults. Second, transitions recorded before and after faults in the replay buffer affect training unevenly, leading to a sample imbalance pro…

    Submitted 30 November, 2024; originally announced December 2024.

    Comments: 14 pages, 13 figures

  26. arXiv:2411.16154  [pdf, other]

    cs.LG cs.CR

    DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders

    Authors: Sizai Hou, Songze Li, Duanyi Yao

    Abstract: Self-supervised learning (SSL) is pervasively exploited in training high-quality upstream encoders with a large amount of unlabeled data. However, it is found to be susceptible to backdoor attacks merely via polluting a small portion of training data. The victim encoders associate triggered inputs with target embeddings, e.g., mapping a triggered cat image to an airplane embedding, such that the d…

    Submitted 20 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: To appear at CVPR 2025

  27. Combining Incomplete Observational and Randomized Data for Heterogeneous Treatment Effects

    Authors: Dong Yao, Caizhi Tang, Qing Cui, Longfei Li

    Abstract: Data from observational studies (OSs) is widely available and readily obtainable yet frequently contains confounding biases. On the other hand, data derived from randomized controlled trials (RCTs) helps to reduce these biases; however, it is expensive to gather, resulting in a tiny size of randomized data. For this reason, effectively fusing observational data and randomized data to better estima…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: 10 pages, 4 figures, Accepted By CIKM2024

  28. arXiv:2410.06074  [pdf, other]

    cs.LG

    Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

    Authors: Jiale Chen, Dingling Yao, Adeel Pervez, Dan Alistarh, Francesco Locatello

    Abstract: We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively,…

    Submitted 1 April, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: Published as a conference paper at the Thirteenth International Conference on Learning Representations (ICLR 2025): https://openreview.net/forum?id=Oazgf8A24z

  29. arXiv:2410.04203  [pdf, other]

    cs.AI

    RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

    Authors: Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, David D. Yao, Wenpin Tang, Sambit Sahu

    Abstract: Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding regarding the contributions of their additional components. Moreover, fair and consistent comparisons are scarce, making it difficult to discern whic…

    Submitted 28 February, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

  30. arXiv:2409.17874  [pdf, other]

    cs.AI

    DarkSAM: Fooling Segment Anything Model to Segment Nothing

    Authors: Ziqi Zhou, Yufei Song, Minghui Li, Shengshan Hu, Xianlong Wang, Leo Yu Zhang, Dezhong Yao, Hai Jin

    Abstract: Segment Anything Model (SAM) has recently gained much attention for its outstanding generalization to unseen data and tasks. Despite its promising prospect, the vulnerabilities of SAM, especially to universal adversarial perturbation (UAP) have not been thoroughly investigated yet. In this paper, we propose DarkSAM, the first prompt-free universal attack framework against SAM, including a semantic…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: This paper has been accepted by the 38th Annual Conference on Neural Information Processing Systems (NeurIPS'24)

  31. GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement

    Authors: Chengzhong Wang, Jianjun Gu, Dingding Yao, Junfeng Li, Yonghong Yan

    Abstract: Speech enhancement is designed to enhance the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion model has gained lots of attention in speech enhancement area, achieving competitive results. Current diffusion-based methods blur the signal with isotropic Gaussian noise and recover clean speech from the prior. However, these methods often suffer from a substan…

    Submitted 21 January, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

    Journal ref: IEEE Signal Processing Letters, vol. 32, pp. 426-430, 2025

  32. arXiv:2409.11564  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG eess.AS

    Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

    Authors: Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu

    Abstract: Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries: an introduction to reinforcement learning frameworks, preference tuning tasks, models, and data…

    Submitted 2 November, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: Survey paper

  33. arXiv:2409.08858  [pdf, other]

    cs.DC

    Exploring System-Heterogeneous Federated Learning with Dynamic Model Selection

    Authors: Dixi Yao

    Abstract: Federated learning is a distributed learning paradigm in which multiple mobile clients train a global model while keeping data local. These mobile clients can have various available memory and network bandwidth. However, to achieve the best global model performance, how we can utilize available memory and network bandwidth to the maximum remains an open challenge. In this paper, we propose to assi…

    Submitted 13 September, 2024; originally announced September 2024.

  34. arXiv:2409.08503  [pdf, other]

    cs.LG cs.CR

    Enhancing Privacy in ControlNet and Stable Diffusion via Split Learning

    Authors: Dixi Yao

    Abstract: With the emerging trend of large generative models, ControlNet is introduced to enable users to fine-tune pre-trained models with their own data for various use cases. A natural question arises: how can we train ControlNet models while ensuring users' data privacy across distributed devices? Exploring different distributed training schemes, we find conventional federated learning and split learnin…

    Submitted 12 September, 2024; originally announced September 2024.

  35. arXiv:2409.08482  [pdf, other]

    cs.LG cs.CR cs.CV

    Risks When Sharing LoRA Fine-Tuned Diffusion Model Weights

    Authors: Dixi Yao

    Abstract: With the emerging trend in generative models and convenient public access to diffusion models pre-trained on large datasets, users can fine-tune these models to generate images of personal faces or items in new contexts described by natural language. Parameter efficient fine-tuning (PEFT) such as Low Rank Adaptation (LoRA) has become the most common way to save memory and computation usage on the…

    Submitted 12 September, 2024; originally announced September 2024.

  36. arXiv:2409.08400  [pdf, ps, other]

    cs.LG cs.AI

    Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

    Authors: Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

    Abstract: Reinforcement Learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent and has also been explored in recent works for alignment of diffusion generative models. In this work, we provide a rigorous treatment by formulating the task of fine-tuning diffusion models, with reward functions learned from human feedback, as an exploratory con…

    Submitted 12 September, 2024; originally announced September 2024.

  37. arXiv:2409.03976  [pdf, other]

    cs.HC

    DECAN: A Denoising Encoder via Contrastive Alignment Network for Dry Electrode EEG Emotion Recognition

    Authors: Meihong Zhang, Shaokai Zhao, Shuai Wang, Zhiguo Luo, Liang Xie, Tiejun Liu, Dezhong Yao, Ye Yan, Erwei Yin

    Abstract: EEG signal is important for brain-computer interfaces (BCI). Nevertheless, existing dry and wet electrodes are difficult to balance between high signal-to-noise ratio and portability in EEG recording, which limits the practical use of BCI. In this study, we propose a Denoising Encoder via Contrastive Alignment Network (DECAN) for dry electrode EEG, under the assumption of the EEG representation co…

    Submitted 5 September, 2024; originally announced September 2024.

  38. arXiv:2409.02772  [pdf, other]

    cs.LG stat.ML

    Unifying Causal Representation Learning with the Invariance Principle

    Authors: Dingling Yao, Dario Rancati, Riccardo Cadei, Marco Fumero, Francesco Locatello

    Abstract: Causal representation learning (CRL) aims at recovering latent causal variables from high-dimensional observations to solve causal downstream tasks, such as predicting the effect of new interventions or more robust classification. A plethora of methods have been developed, each tackling carefully crafted problem settings that lead to different types of identifiability. These different settings are…

    Submitted 5 March, 2025; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: ICLR2025 Camera ready

  39. arXiv:2408.13522  [pdf, other]

    cs.SD eess.AS

    StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture

    Authors: Zelin Qiu, Dingding Yao, Junfeng Li

    Abstract: In this paper, we present our approach for the Track 1 of the Chinese Auditory Attention Decoding (Chinese AAD) Challenge at ISCSLP 2024. Most existing spatial auditory attention decoding (Sp-AAD) methods employ an isolated window architecture, focusing solely on global invariant features without considering relationships between different decision windows, which can lead to suboptimal performance…

    Submitted 24 August, 2024; originally announced August 2024.

  40. arXiv:2408.06300  [pdf]

    cond-mat.mtrl-sci cs.LG

    Inverse designing metamaterials with programmable nonlinear functional responses in graph space

    Authors: Marco Maurizi, Derek Xu, Yu-Tong Wang, Desheng Yao, David Hahn, Mourad Oudich, Anish Satpati, Mathieu Bauchy, Wei Wang, Yizhou Sun, Yun Jing, Xiaoyu Rayne Zheng

    Abstract: Material responses to static and dynamic stimuli, represented as nonlinear curves, are design targets for engineering functionalities like structural support, impact protection, and acoustic and photonic bandgaps. Three-dimensional metamaterials offer significant tunability due to their internal structure, yet existing methods struggle to capture their complex behavior-to-structure relationships.…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 19 pages, 5 figures

  41. arXiv:2408.04310  [pdf, other]

    cs.LG cs.CR

    Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit

    Authors: Duanyi Yao, Songze Li, Ye Xue, Jin Liu

    Abstract: Vertical federated learning (VFL), where each participating client holds a subset of data features, has found numerous applications in finance, healthcare, and IoT systems. However, adversarial attacks, particularly through the injection of adversarial examples (AEs), pose serious challenges to the security of VFL models. In this paper, we investigate such vulnerabilities through developing a nove…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Published at ICLR 2024

  42. arXiv:2407.06498  [pdf, other]

    cs.HC

    Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training

    Authors: Zelin Qiu, Jianjun Gu, Dingding Yao, Junfeng Li

    Abstract: The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance against these features. Studies in neuroscience indicate that spatial…

    Submitted 8 July, 2024; originally announced July 2024.

  43. arXiv:2407.05869  [pdf, other]

    cs.AI

    PORCA: Root Cause Analysis with Partially Observed Data

    Authors: Chang Gong, Di Yao, Jin Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, Jingping Bi

    Abstract: Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure from complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, which…

    Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  44. arXiv:2407.00541  [pdf]

    cs.CL cs.AI cs.IR

    Answering real-world clinical questions using large language model based systems

    Authors: Yen Sia Low, Michael L. Jackson, Rebecca J. Hyde, Robert E. Brown, Neil M. Sanghavi, Julian D. Baldwin, C. William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V. Nene, Morgan Pike, Courtney J. Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R. Zipursky, Christina Dinh, Philip Ballentine, Dan C. Derieg, Vladimir Polony, Rehan N. Chawdry, Jordan Davies, Brigham B. Hyde, et al. (2 additional authors not shown)

    Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-bas…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)

  45. arXiv:2407.00014  [pdf, other]

    cs.RO eess.SY

    Kinetic and Kinematic Sensors-free Approach for Estimation of Continuous Force and Gesture in sEMG Prosthetic Hands

    Authors: Gang Liu, Zhenxiang Wang, Chuanmei Xi, Ziyang He, Shanshan Guo, Rui Zhang, Dezhong Yao

    Abstract: Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinetic and kinematic parameters. However, establishing these models requires complex sensor systems to collect corresponding kinetic and kinematic data in synchronization with sEMG, which is cumbersome and user-unfriendly. This paper proposes a kinetic and kinematic sensors-free approach for controllin…

    Submitted 16 September, 2024; v1 submitted 1 May, 2024; originally announced July 2024.

    Comments: 17 pages

  46. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  47. CausalMMM: Learning Causal Structure for Marketing Mix Modeling

    Authors: Chang Gong, Di Yao, Lei Zhang, Sheng Chen, Wenbin Li, Yueyang Su, Jingping Bi

    Abstract: In online advertising, marketing mix modeling (MMM) is employed to predict the gross merchandise volume (GMV) of brand shops and help decision-makers to adjust the budget allocation of various advertising channels. Traditional MMM methods leveraging regression techniques can fail in handling the complexity of marketing. Although some efforts try to encode the causal structures for better predictio…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: WSDM 2024, full version

  48. HiFGL: A Hierarchical Framework for Cross-silo Cross-device Federated Graph Learning

    Authors: Zhuoning Guo, Duanyi Yao, Qiang Yang, Hao Liu

    Abstract: Federated Graph Learning (FGL) has emerged as a promising way to learn high-quality representations from distributed graph data with privacy preservation. Although considerable efforts have been made for FGL under either the cross-device or the cross-silo paradigm, how to effectively capture graph knowledge in a more complicated cross-silo cross-device environment remains an under-explored problem. However…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted by SIGKDD 2024

  49. Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

    Authors: Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing Guo

    Abstract: Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 16 pages, 7 figures, 9 tables; This work has been submitted for possible publication

  50. arXiv:2405.16848  [pdf, other]

    cs.CV

    A re-calibration method for object detection with multi-modal alignment bias in autonomous driving

    Authors: Zhihang Song, Dingyi Yao, Ruibo Ming, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

    Abstract: Multi-modal object detection in autonomous driving has achieved great breakthroughs by fusing complementary information from different sensors. Previous work always assumes that the calibration between fused sensors, such as LiDAR and camera, is precise. However, in reality, calibration matrices are fixed when the vehicles leave the factory, but vibration, bumps, and data l…

    Submitted 20 May, 2025; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures