
Showing 1–20 of 20 results for author: Ding, D

Searching in archive stat.
  1. arXiv:2407.16870 [pdf, other]

    stat.ME

    CoCA: Cooperative Component Analysis

    Authors: Daisy Yi Ding, Alden Green, Min Woo Sun, Robert Tibshirani

    Abstract: We propose Cooperative Component Analysis (CoCA), a new method for unsupervised multi-view analysis: it identifies the component that simultaneously captures significant within-view variance and exhibits strong cross-view correlation. The challenge of integrating multi-view data is particularly important in biology and medicine, where various types of "-omic" data, ranging from genomics to proteom…

    Submitted 23 July, 2024; originally announced July 2024.

  2. arXiv:2405.19544 [pdf, other]

    cs.AI cs.CL cs.LG math.OC stat.ML

    One-Shot Safety Alignment for Large Language Models via Optimal Dualization

    Authors: Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding

    Abstract: The growing safety concerns surrounding large language models raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety. A promising approach is to enforce safety constraints through Reinforcement Learning from Human Feedback (RLHF). For such constrained RLHF, typical Lagrangian-based primal-dual policy optimization methods are computa…

    Submitted 22 November, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 32 pages, 6 figures, 8 tables

  3. arXiv:2308.09444 [pdf, other]

    cs.LG stat.ML

    An Efficient 1 Iteration Learning Algorithm for Gaussian Mixture Model And Gaussian Mixture Embedding For Neural Network

    Authors: Weiguo Lu, Xuan Wu, Deng Ding, Gangnan Yuan

    Abstract: We propose a Gaussian Mixture Model (GMM) learning algorithm based on the GMM expansion idea from our previous work. The new algorithm brings more robustness and simplicity than the classic Expectation Maximization (EM) algorithm. It also improves the accuracy and takes only 1 iteration for learning. We theoretically prove that this new algorithm is guaranteed to converge regardless of the parameters initialis…

    Submitted 6 September, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

  4. arXiv:2308.01458 [pdf, other]

    q-bio.QM q-bio.GN stat.AP

    Semi-supervised Cooperative Learning for Multiomics Data Fusion

    Authors: Daisy Yi Ding, Xiaotao Shen, Michael Snyder, Robert Tibshirani

    Abstract: Multiomics data fusion integrates diverse data modalities, ranging from transcriptomics to proteomics, to gain a comprehensive understanding of biological systems and enhance predictions on outcomes of interest related to disease phenotypes and treatment responses. Cooperative learning, a recently proposed method, unifies the commonly-used fusion approaches, including early and late fusion, and of…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: The 2023 ICML Workshop on Machine Learning for Multimodal Healthcare Data. arXiv admin note: text overlap with arXiv:2112.12337

  5. arXiv:2112.12337 [pdf, other]

    stat.ME q-bio.QM stat.ML

    Cooperative learning for multiview analysis

    Authors: Daisy Yi Ding, Shuangning Li, Balasubramanian Narasimhan, Robert Tibshirani

    Abstract: We propose a new method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data such as genomics, proteomics and radiomics are measured on a common set of samples. Cooperative learning combines the usual squared error loss of predictions with an "agreement" penalty to encourage the predictions from…

    Submitted 3 September, 2022; v1 submitted 22 December, 2021; originally announced December 2021.
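
    The abstract describes a squared-error loss plus an "agreement" penalty but is cut off before the exact formula. The sketch below assumes a two-view form, 0.5 * sum((y - fx - fz)^2) + 0.5 * rho * sum((fx - fz)^2), where fx and fz are the predictions from views X and Z and rho is the penalty weight. The function name and this exact form are assumptions for illustration, not the authors' code.

```python
def cooperative_loss(y, pred_x, pred_z, rho):
    """Assumed two-view cooperative objective (hypothetical helper).

    0.5 * sum((y - fx - fz)^2)      squared error of the combined prediction
    + 0.5 * rho * sum((fx - fz)^2)  agreement penalty between the two views
    """
    if not (len(y) == len(pred_x) == len(pred_z)):
        raise ValueError("y, pred_x and pred_z must have the same length")
    fit = sum((yi - fx - fz) ** 2 for yi, fx, fz in zip(y, pred_x, pred_z))
    agreement = sum((fx - fz) ** 2 for fx, fz in zip(pred_x, pred_z))
    return 0.5 * fit + 0.5 * rho * agreement


# The views below fit y perfectly but disagree, so only the penalty is nonzero.
print(cooperative_loss([2.0], [2.0], [0.0], rho=0.0))  # penalty switched off
print(cooperative_loss([2.0], [2.0], [0.0], rho=1.0))  # disagreement is charged
```

    In this assumed form, rho = 0 leaves plain squared error on the summed prediction, and larger rho pushes the two views' predictions toward each other.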

  6. arXiv:2010.16418 [pdf, other]

    cs.LG cs.SI stat.ML

    Handling Missing Data with Graph Representation Learning

    Authors: Jiaxuan You, Xiaobai Ma, Daisy Yi Ding, Mykel Kochenderfer, Jure Leskovec

    Abstract: Machine learning with missing data has been approached in two different ways, including feature imputation where missing feature values are estimated based on observed values, and label prediction where downstream labels are learned directly from incomplete data. However, existing imputation models tend to have strong prior assumptions and cannot learn from downstream tasks, while models targeting…

    Submitted 30 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020
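
    One common way to let a graph model consume an incomplete table, assumed here because the truncated abstract does not state the construction, is a bipartite graph: one node per observation (row), one node per feature (column), and an edge carrying the value for every observed entry. Imputation then amounts to predicting the value on a missing edge. The helper name and the node-naming scheme are hypothetical.

```python
def matrix_to_bipartite_edges(matrix):
    """Return (observation_node, feature_node, value) edges for observed entries.

    Hypothetical helper: rows become nodes "o0", "o1", ..., columns become
    nodes "f0", "f1", ..., and None marks a missing value (no edge).
    """
    edges = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:  # missing entries simply get no edge
                edges.append((f"o{i}", f"f{j}", value))
    return edges


table = [[1.0, None],
         [None, 3.5]]
print(matrix_to_bipartite_edges(table))
```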

  7. arXiv:2010.11385 [pdf, other]

    stat.ME stat.AP

    Dirichlet Process Mixture Models with Shrinkage Prior

    Authors: Dawei Ding, George Karabatsos

    Abstract: We propose Dirichlet Process Mixture (DPM) models for prediction and cluster-wise variable selection, based on two choices of shrinkage baseline prior distributions for the linear regression coefficients, namely the Horseshoe prior and Normal-Gamma prior. We show in a simulation study that each of the two proposed DPM models tends to outperform the standard DPM model based on the non-shrinkage norm…

    Submitted 25 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

  8. arXiv:2009.02845 [pdf, other]

    cs.LG cs.CR cs.DC stat.ML

    Fast and Secure Distributed Nonnegative Matrix Factorization

    Authors: Yuqiu Qian, Conghui Tan, Danhao Ding, Hui Li, Nikos Mamoulis

    Abstract: Nonnegative matrix factorization (NMF) has been successfully applied in several data mining tasks. Recently, there has been increasing interest in the acceleration of NMF, due to its high cost on large matrices. On the other hand, the privacy issue of NMF over federated data is worthy of attention, since NMF is prevalently applied in image and text analysis which may involve leveraging private data (…

    Submitted 6 September, 2020; originally announced September 2020.

    Comments: Published in IEEE Transactions on Knowledge and Data Engineering (TKDE). This arXiv version includes the appendices with additional proofs

  9. arXiv:2003.00534 [pdf, ps, other]

    cs.LG math.OC stat.ML

    Provably Efficient Safe Exploration via Primal-Dual Policy Optimization

    Authors: Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

    Abstract: We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation in which an agent aims to maximize the expected total reward subject to a safety constraint on the expected total value of a utility function. We focus on an episodic setting with the function approximation where the Markov transition kernels have a linear structure but do not im…

    Submitted 25 October, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

    Comments: 44 pages. We have revised the linear MDP assumption and fixed a bug in our previous proofs

  10. arXiv:1912.06910 [pdf, other]

    cs.LG cs.AI stat.ML

    Adapting Behaviour for Learning Progress

    Authors: Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero

    Abstract: Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning. The advent of distributed agents that interact with parallel instances of the environment has enabled larger scales and greater flexibility, but has not removed the need to tune exploration to the task, because the ideal data fo…

    Submitted 14 December, 2019; originally announced December 2019.

  11. arXiv:1911.07084 [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare

    Authors: Scott L. Fleming, Kuhan Jeyapragasan, Tony Duan, Daisy Ding, Saurabh Gombar, Nigam Shah, Emma Brunskill

    Abstract: There is an emerging trend in the reinforcement learning for healthcare literature. In order to prepare longitudinal, irregularly sampled, clinical datasets for reinforcement learning algorithms, many researchers will resample the time series data to short, regular intervals and use last-observation-carried-forward (LOCF) imputation to fill in these gaps. Typically, they will not maintain any expl…

    Submitted 16 November, 2019; originally announced November 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract
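
    The preprocessing the abstract describes (resample an irregular series onto a short regular grid, then fill gaps by last-observation-carried-forward) can be sketched as follows. The function name and the extra "fresh observation" flag list are illustrative additions, not something the abstract specifies; the flag just makes visible which grid values were actually observed versus carried forward.

```python
def resample_locf(times, values, step, end):
    """Resample an irregular series onto t = 0, step, 2*step, ... < end with LOCF.

    Each grid value is the last observation at or before t (None before the
    first observation). The second list flags whether a fresh observation fell
    in (t - step, t]; this flag is a hypothetical addition for illustration.
    """
    pairs = sorted(zip(times, values))
    grid_values, fresh_flags = [], []
    idx, last = 0, None
    k = 0
    while k * step < end:
        t = k * step  # computed from k to avoid floating-point drift
        fresh = False
        # consume every observation up to and including time t
        while idx < len(pairs) and pairs[idx][0] <= t:
            last = pairs[idx][1]
            if pairs[idx][0] > t - step:
                fresh = True
            idx += 1
        grid_values.append(last)
        fresh_flags.append(fresh)
        k += 1
    return grid_values, fresh_flags


print(resample_locf([0.0, 2.5, 7.0], [1, 2, 3], step=2.0, end=8.0))
```

    Here the observation at 2.5 is carried forward to t = 6 even though nothing new was measured, which is exactly the kind of imputed value a downstream learner cannot distinguish from a real one without an explicit flag.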

  12. arXiv:1910.03225 [pdf, other]

    cs.LG stat.ML

    NGBoost: Natural Gradient Boosting for Probabilistic Prediction

    Authors: Tony Duan, Anand Avati, Daisy Yi Ding, Khanh K. Thai, Sanjay Basu, Andrew Y. Ng, Alejandro Schuler

    Abstract: We present Natural Gradient Boosting (NGBoost), an algorithm for generic probabilistic prediction via gradient boosting. Typical regression models return a point estimate, conditional on covariates, but probabilistic regression models output a full probability distribution over the outcome space, conditional on the covariates. This allows for predictive uncertainty estimation -- crucial in applica…

    Submitted 9 June, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: Accepted for ICML 2020
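
    To make "natural gradient" concrete for a Normal predictive distribution, the sketch below assumes the negative log-likelihood as the scoring rule and parameters (mu, log_sigma). In that parameterisation the Fisher information is diag(1/sigma^2, 2), so the natural gradient is (-(y - mu), (1 - (y - mu)^2 / sigma^2) / 2). The "booster" is deliberately trivial: each base learner is a constant equal to the mean natural gradient, so no covariates are used. Function names are hypothetical; this is not the API of the ngboost package.

```python
import math


def normal_natural_gradient(y, mu, log_sigma):
    """Natural gradient of the Normal negative log-likelihood w.r.t. (mu, log_sigma).

    Ordinary gradient: (-(y - mu) / sigma^2, 1 - (y - mu)^2 / sigma^2).
    Multiplying by the inverse Fisher information diag(sigma^2, 1/2) gives the result.
    """
    sigma2 = math.exp(2.0 * log_sigma)
    r = y - mu
    return -r, 0.5 * (1.0 - r * r / sigma2)


def fit_intercept_only(ys, lr=0.5, n_iter=500):
    """Toy boosting loop: every base learner is a constant (the mean natural gradient)."""
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_iter):
        grads = [normal_natural_gradient(y, mu, log_sigma) for y in ys]
        g_mu = sum(g[0] for g in grads) / len(ys)
        g_ls = sum(g[1] for g in grads) / len(ys)
        mu -= lr * g_mu  # step along the negative natural gradient
        log_sigma -= lr * g_ls
    return mu, math.exp(log_sigma)


print(fit_intercept_only([1.0, 2.0, 3.0, 4.0, 5.0]))
```

    With this data the loop settles at the maximum-likelihood values mu = 3 and sigma = sqrt(2) (the population standard deviation). A real implementation would instead fit a regression tree to the per-sample natural gradients at each stage.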

  13. arXiv:1907.06260 [pdf, other]

    cs.LG cs.CY stat.ML

    Counterfactual Reasoning for Fair Clinical Risk Prediction

    Authors: Stephen Pfohl, Tony Duan, Daisy Yi Ding, Nigam H. Shah

    Abstract: The use of machine learning systems to support decision making in healthcare raises questions as to what extent these systems may introduce or exacerbate disparities in care for historically underrepresented and mistreated groups, due to biases implicitly embedded in observational data in electronic health records. To address this problem in the context of clinical risk prediction models, we devel…

    Submitted 14 July, 2019; originally announced July 2019.

    Comments: Machine Learning for Healthcare 2019

  14. arXiv:1808.08400 [pdf, other]

    stat.ME

    Tree-based Particle Smoothing Algorithms in a Hidden Markov Model

    Authors: Dong Ding, Axel Gandy

    Abstract: We provide a new strategy built on the divide-and-conquer approach by Lindsten et al. (2017) to investigate the smoothing problem in a hidden Markov model. We employ this approach to decompose a hidden Markov model into sub-models with intermediate target distributions based on an auxiliary tree structure and produce independent samples from the sub-models at the leaf nodes towards the original mo…

    Submitted 25 August, 2018; originally announced August 2018.

  15. arXiv:1808.03331 [pdf, other]

    stat.ML cs.LG

    The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data

    Authors: Daisy Yi Ding, Chloé Simpson, Stephen Pfohl, Dave C. Kale, Kenneth Jung, Nigam H. Shah

    Abstract: Electronic phenotyping is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical record and is foundational in clinical informatics. Increasingly, electronic phenotyping is performed via supervised learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aim…

    Submitted 5 January, 2019; v1 submitted 9 August, 2018; originally announced August 2018.

    Comments: Pacific Symposium on Biocomputing (PSB) 2019, Hawaii, https://psb.stanford.edu/psb-online/; 13 pages, 7 figures

  16. arXiv:1806.07001 [pdf, ps, other]

    stat.ML cs.LG

    Theoretical Analysis of Image-to-Image Translation with Adversarial Learning

    Authors: Xudong Pan, Mi Zhang, Daizong Ding

    Abstract: Recently, a unified model for image-to-image translation tasks within an adversarial learning framework has attracted widespread research interest among computer vision practitioners. Their reported empirical success, however, lacks a solid theoretical interpretation of its inherent mechanism. In this paper, we reformulate their model from a brand-new geometrical perspective and have eventually reached a f…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: will appear in ICML2018

  17. arXiv:1711.05225 [pdf, other]

    cs.CV cs.LG stat.ML

    CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

    Authors: Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng

    Abstract: We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our algorithm, CheXNet, is a 121-layer convolutional neural network trained on ChestX-ray14, currently the largest publicly available chest X-ray dataset, containing over 100,000 frontal-view X-ray images with 14 diseases. Four practicing academic radiologists annotate a test set, on w…

    Submitted 25 December, 2017; v1 submitted 14 November, 2017; originally announced November 2017.

  18. arXiv:1709.06680 [pdf, other]

    stat.ML cs.LG

    Deep Lattice Networks and Partial Monotonic Functions

    Authors: Seungil You, David Ding, Kevin Canini, Jan Pfeifer, Maya Gupta

    Abstract: We propose learning deep models that are monotonic with respect to a user-specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network. We implement the layers and projections with new computational graph nodes in TensorFlow and use t…

    Submitted 19 September, 2017; originally announced September 2017.

    Comments: 9 pages, NIPS 2017
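
    The abstract names calibrators (piecewise linear functions) with monotonicity constraints. A minimal one-dimensional sketch: a calibrator defined by output values at sorted keypoints, where the outputs are made nondecreasing by a Euclidean projection computed with pool-adjacent-violators (PAV). PAV is a swapped-in, standard choice of projection; the paper's own projection and training procedure may differ, and both function names are hypothetical.

```python
def pav_nondecreasing(values):
    """Euclidean projection of a sequence onto nondecreasing sequences (PAV)."""
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([float(v), 1])
        # merge while the previous block's mean exceeds the newest block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out


def calibrate(x, keypoints, outputs):
    """Piecewise-linear calibrator: interpolate between keypoints, clamp outside them."""
    if x <= keypoints[0]:
        return outputs[0]
    if x >= keypoints[-1]:
        return outputs[-1]
    for k in range(1, len(keypoints)):
        if x <= keypoints[k]:
            x0, x1 = keypoints[k - 1], keypoints[k]
            y0, y1 = outputs[k - 1], outputs[k]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


keypoints = [0.0, 1.0, 2.0, 3.0]
outputs = pav_nondecreasing([0.0, 3.0, 2.0, 4.0])  # the 3.0 > 2.0 violation is pooled
print(outputs, calibrate(1.5, keypoints, outputs))
```

    Because the projected outputs are nondecreasing, the interpolated calibrator is monotone in x by construction.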

  19. arXiv:1703.09305 [pdf, other]

    stat.ME

    Implementing Monte Carlo Tests with P-value Buckets

    Authors: Axel Gandy, Georg Hahn, Dong Ding

    Abstract: Software packages usually report the results of statistical tests using p-values. Users often interpret these by comparing them to standard thresholds, e.g. 0.1%, 1% and 5%, which is sometimes reinforced by a star rating (***, **, *). We consider an arbitrary statistical test whose p-value p is not available explicitly, but can be approximated by Monte Carlo samples, e.g. by bootstrap or permutati…

    Submitted 4 November, 2019; v1 submitted 27 March, 2017; originally announced March 2017.
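
    The star-rating convention the abstract mentions (thresholds 0.1%, 1% and 5% mapped to ***, **, *) can be written down directly. The second helper is a naive illustration of why a Monte Carlo estimate complicates this: if an interval for the unknown p-value straddles a threshold, the rating is not yet determined. Treating thresholds as strict (<) and the "undecided" rule are assumptions; this is not the authors' bucket algorithm.

```python
def star_rating(p):
    """Map a p-value to stars using the 0.1%, 1% and 5% thresholds (strict <, assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def rating_if_resolved(lower, upper):
    """Naive rule (hypothetical): return the rating only if both interval ends agree."""
    lo_stars, hi_stars = star_rating(lower), star_rating(upper)
    return lo_stars if lo_stars == hi_stars else None


print(star_rating(0.03))               # a single star
print(rating_if_resolved(0.008, 0.012))  # straddles 1%, so None
```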

  20. arXiv:1611.01675 [pdf, other]

    stat.ME

    A simple method for implementing Monte Carlo tests

    Authors: Dong Ding, Axel Gandy, Georg Hahn

    Abstract: We consider a statistical test whose p-value can only be approximated using Monte Carlo simulations. We are interested in deciding whether the p-value for an observed data set lies above or below a given threshold such as 5%. We want to ensure that the resampling risk, the probability of the (Monte Carlo) decision being different from the true decision, is uniformly bounded. This article introduce…

    Submitted 9 October, 2019; v1 submitted 5 November, 2016; originally announced November 2016.
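
    For context on what "a p-value that can only be approximated using Monte Carlo simulations" looks like in practice, here is a baseline with a fixed number of resamples: a two-sided difference-in-means permutation test using the common (1 + hits) / (1 + n_perm) estimator. It does not bound the resampling risk; that is precisely the gap the paper addresses, and its sequential procedure is not reproduced here. The function name and defaults are assumptions.

```python
import random


def mc_permutation_pvalue(a, b, n_perm=999, seed=0):
    """Fixed-size Monte Carlo p-value for a two-sided difference-in-means permutation test.

    Uses (1 + hits) / (1 + n_perm), which counts the observed labelling as one resample.
    """
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    n_a = len(a)

    def stat(xs, ys):
        return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

    observed = stat(a, b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if stat(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


p_hat = mc_permutation_pvalue([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
decision = "reject at 5%" if p_hat <= 0.05 else "do not reject at 5%"
print(p_hat, decision)
```

    With a fixed n_perm, an estimate close to 0.05 can land on either side of the threshold by chance, so the Monte Carlo decision may differ from the one the exact p-value would give.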