Showing 1–28 of 28 results for author: Lee, R

Searching in archive stat.

  1. arXiv:2503.06401  [pdf, other]

    stat.CO stat.ME

    fastfrechet: An R package for fast implementation of Fréchet regression with distributional responses

    Authors: Alexander Coulter, Rebecca Lee, Irina Gaynanova

    Abstract: Distribution-as-response regression problems are gaining wider attention, especially within biomedical settings where observation-rich patient-specific data sets are available, such as feature densities in CT scans (Petersen et al., 2021), actigraphy (Ghosal et al., 2023), and continuous glucose monitoring (Coulter et al., 2024; Matabuena et al., 2021). To accommodate the complex structure of such…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 4 pages, 2 figures

  2. arXiv:2503.05574  [pdf, ps, other]

    cs.LG math.OC stat.ML

    BARK: A Fully Bayesian Tree Kernel for Black-box Optimization

    Authors: Toby Boyne, Jose Pablo Folch, Robert M Lee, Behrang Shafei, Ruth Misener

    Abstract: We perform Bayesian optimization using a Gaussian process perspective on Bayesian Additive Regression Trees (BART). Our BART Kernel (BARK) uses tree agreement to define a posterior over piecewise-constant functions, and we explore the space of tree kernels using a Markov chain Monte Carlo approach. Where BART only samples functions, the resulting BARK model obtains samples of Gaussian processes de…

    Submitted 6 June, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 9 main pages, 28 total pages, 14 figures, 9 tables

  3. arXiv:2502.06970  [pdf, other]

    cs.LG stat.ML

    Model Diffusion for Certifiable Few-shot Transfer Learning

    Authors: Fady Rezk, Royson Lee, Henry Gouk, Timothy Hospedales, Minyoung Kim

    Abstract: In contemporary deep learning, a prevalent and effective workflow for solving low-data problems is adapting powerful pre-trained foundation models (FMs) to new tasks via parameter-efficient fine-tuning (PEFT). However, while empirically effective, the resulting solutions lack generalisation guarantees to certify their accuracy - which may be required for ethical or legal reasons prior to deploymen…

    Submitted 28 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  4. arXiv:2408.05040  [pdf, ps, other]

    cs.LG math.OC stat.ML

    BoFire: Bayesian Optimization Framework Intended for Real Experiments

    Authors: Johannes P. Dürholt, Thomas S. Asche, Johanna Kleinekorte, Gabriel Mancino-Ball, Benjamin Schiller, Simon Sung, Julian Keupp, Aaron Osburg, Toby Boyne, Ruth Misener, Rosona Eldred, Wagner Steuer Costa, Chrysoula Kappatou, Robert M. Lee, Dominik Linzner, David Walz, Niklas Wulkow, Behrang Shafei

    Abstract: Our open-source Python package BoFire combines Bayesian Optimization (BO) with other design of experiments (DoE) strategies focusing on developing and optimizing new chemistry. Previous BO implementations, for example as they exist in the literature or software, require substantial adaptation for effective real-world deployment in the chemical industry. BoFire provides a rich feature-set with extensiv…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 6 pages, 1 figure, 1 listing

  5. arXiv:2406.00549  [pdf, other]

    stat.ME cs.AI

    Zero Inflation as a Missing Data Problem: a Proxy-based Approach

    Authors: Trung Phung, Jaron J. R. Lee, Opeyemi Oladapo-Shittu, Eili Y. Klein, Ayse Pinar Gurses, Susan M. Hannum, Kimberly Weems, Jill A. Marsteller, Sara E. Cosgrove, Sara C. Keller, Ilya Shpitser

    Abstract: A common type of zero-inflated data has certain true values incorrectly replaced by zeros due to data recording conventions (rare outcomes assumed to be absent) or details of data recording equipment (e.g. artificial zeros in gene expression data). Existing methods for zero-inflated data either fit the observed data likelihood via parametric mixture models that explicitly represent excess zeros,…

    Submitted 2 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: 28 pages, 8 figures, accepted for the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

  6. arXiv:2405.10221  [pdf, other]

    math.OC cs.LG stat.ML

    Scalarisation-based risk concepts for robust multi-objective optimisation

    Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei

    Abstract: Robust optimisation is a well-established framework for optimising functions in the presence of uncertainty. The inherent goal of this problem is to identify a collection of inputs whose outputs are both desirable for the decision maker, whilst also being robust to the underlying uncertainties in the problem. In this work, we study the multi-objective case of this problem. We identify that the maj…

    Submitted 25 May, 2025; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: EJOR 2025. 40 pages. Code is available at: https://github.com/benmltu/scalarize

  7. arXiv:2405.01404  [pdf, other]

    stat.ML cs.LG math.OC stat.ME

    Random Pareto front surfaces

    Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei

    Abstract: The goal of multi-objective optimisation is to identify the Pareto front surface which is the set obtained by connecting the best trade-off points. Typically this surface is computed by evaluating the objectives at different points and then interpolating between the subset of the best evaluated trade-off points. In this work, we propose to parameterise the Pareto front surface using polar coordina…

    Submitted 21 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: The code is available at: https://github.com/benmltu/scalarize

  8. arXiv:2404.06602  [pdf, ps, other]

    stat.ME

    A General Identification Algorithm For Data Fusion Problems Under Systematic Selection

    Authors: Jaron J. R. Lee, AmirEmad Ghassami, Ilya Shpitser

    Abstract: Causal inference is made challenging by confounding, selection bias, and other complications. A common approach to addressing these difficulties is the inclusion of auxiliary data on the superpopulation of interest. Such data may measure a different set of variables, or be obtained under different experimental conditions than the primary dataset. Analysis based on multiple datasets must carefully…

    Submitted 15 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 17 pages

  9. arXiv:2312.00622  [pdf, other]

    cs.LG math.OC stat.ME

    Practical Path-based Bayesian Optimization

    Authors: Jose Pablo Folch, James Odgers, Shiqiang Zhang, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

    Abstract: There has been a surge in interest in data-driven experimental design with applications to chemical engineering and drug manufacturing. Bayesian optimization (BO) has proven to be adaptable to such cases, since we can model the reactions of interest as expensive black-box functions. Sometimes, the cost of these black-box functions can be separated into two parts: (a) the cost of the experiment itse…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 6 main pages, 12 with references and appendix. 4 figures, 2 tables. To appear in NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

    Journal ref: NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

  10. arXiv:2308.04212  [pdf, other]

    stat.ML cs.LG stat.ME

    Varying-coefficients for regional quantile via KNN-based LASSO with applications to health outcome study

    Authors: Seyoung Park, Eun Ryung Lee, Hyokyoung G. Hong

    Abstract: Health outcomes, such as body mass index and cholesterol levels, are known to be dependent on age and exhibit varying effects with their associated risk factors. In this paper, we propose a novel framework for dynamic modeling of the associations between health outcomes and risk factors using varying-coefficients (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, which ca…

    Submitted 8 August, 2023; originally announced August 2023.

  11. arXiv:2305.11774  [pdf, other]

    math.OC cs.LG stat.ML

    Multi-objective optimisation via the R2 utilities

    Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei

    Abstract: The goal of multi-objective optimisation is to identify a collection of points which describe the best possible trade-offs between the multiple objectives. In order to solve this vector-valued optimisation problem, practitioners often appeal to the use of scalarisation functions in order to transform the multi-objective problem into a collection of single-objective problems. This set of scalarised…

    Submitted 8 May, 2025; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: SIAM Review 2025. 47 pages. Code is available at: https://github.com/benmltu/scalarize

  12. arXiv:2301.11477  [pdf, other]

    stat.ME cs.MS

    Ananke: A Python Package For Causal Inference Using Graphical Models

    Authors: Jaron J. R. Lee, Rohit Bhattacharya, Razieh Nabi, Ilya Shpitser

    Abstract: We implement Ananke: an object-oriented Python package for causal inference with graphical models. At the top of our inheritance structure is an easily extensible Graph class that provides an interface to several broadly useful graph-based algorithms and methods for visualization. We use best practices of object-oriented programming to implement subclasses of the Graph superclass that correspond t…

    Submitted 26 January, 2023; originally announced January 2023.

  13. arXiv:2211.06149  [pdf, other]

    cs.LG cs.CE stat.ML

    Combining Multi-Fidelity Modelling and Asynchronous Batch Bayesian Optimization

    Authors: Jose Pablo Folch, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

    Abstract: Bayesian Optimization is a useful tool for experiment design. Unfortunately, the classical, sequential setting of Bayesian Optimization does not translate well into laboratory experiments, for instance battery design, where measurements may come from different sources and their evaluations may require significant waiting times. Multi-fidelity Bayesian Optimization addresses the setting with measur…

    Submitted 23 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 19 pages in main paper / 28 with references and appendix, 7 figures, 2 tables, accepted into Computers and Chemical Engineering

  14. arXiv:2207.00879  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces

    Authors: Alexander Thebelt, Calvin Tsay, Robert M. Lee, Nathan Sudermann-Merx, David Walz, Behrang Shafei, Ruth Misener

    Abstract: Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search, as they achieve good predictive performance with little or no manual tuning, naturally handle discrete feature spaces, and are relatively insensitive to outliers in the training data. Two well-known challenges in using tree ensembles for black-box optimization are (i) effecti…

    Submitted 30 December, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

    Comments: 27 pages, 9 figures, 4 tables

  15. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  16. arXiv:2111.03140  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    Multi-Objective Constrained Optimization for Energy Applications via Tree Ensembles

    Authors: Alexander Thebelt, Calvin Tsay, Robert M. Lee, Nathan Sudermann-Merx, David Walz, Tom Tranter, Ruth Misener

    Abstract: Energy systems optimization problems are complex due to strongly non-linear system behavior and multiple competing objectives, e.g. economic gain vs. environmental impact. Moreover, a large number of input variables and different variable types, e.g. continuous and categorical, are challenges commonly present in real-world applications. In some cases, proposed optimal solutions need to obey explic…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 36 pages, 8 figures, 5 tables

  17. arXiv:2105.08868  [pdf, other]

    stat.ME

    Markov-Restricted Analysis of Randomized Trials with Non-Monotone Missing Binary Outcomes

    Authors: Jaron J. R. Lee, Agatha S. Mallett, Ilya Shpitser, Aimee Campbell, Edward Nunes, Daniel O. Scharfstein

    Abstract: Scharfstein et al. (2021) developed a sensitivity analysis model for analyzing randomized trials with repeatedly measured binary outcomes that are subject to nonmonotone missingness. Their approach becomes computationally intractable when the number of measurements is large (e.g., greater than 15). In this paper, we repair this problem by introducing mth-order Markovian restrictions. We establish…

    Submitted 23 August, 2024; v1 submitted 18 May, 2021; originally announced May 2021.

  18. arXiv:2007.08668  [pdf, other]

    cs.LG eess.SP stat.ML

    BRP-NAS: Prediction-based NAS using GCNs

    Authors: Łukasz Dudziak, Thomas Chau, Mohamed S. Abdelfattah, Royson Lee, Hyeji Kim, Nicholas D. Lane

    Abstract: Neural architecture search (NAS) enables researchers to automatically explore broad design spaces in order to improve efficiency of neural networks. This efficiency is especially important in the case of on-device deployment, where improvements in accuracy should be balanced out with computational demands of a model. In practice, performance metrics of a model are computationally expensive to obtain…

    Submitted 19 January, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Published at NeurIPS 2020

  19. arXiv:2005.02979  [pdf, ps, other]

    cs.LG cs.AI eess.SY stat.ML

    A Survey of Algorithms for Black-Box Safety Validation of Cyber-Physical Systems

    Authors: Anthony Corso, Robert J. Moss, Mark Koren, Ritchie Lee, Mykel J. Kochenderfer

    Abstract: Autonomous cyber-physical systems (CPS) can improve safety and efficiency for safety-critical applications, but require rigorous testing before deployment. The complexity of these systems often precludes the use of formal verification and real-world testing can be too dangerous during development. Therefore, simulation-based techniques have been developed that treat the system under test as a blac…

    Submitted 14 October, 2021; v1 submitted 6 May, 2020; originally announced May 2020.

    Journal ref: Journal of Artificial Intelligence Research, vol. 72, p. 377-428, 2021

  20. arXiv:2004.06801  [pdf, other]

    cs.RO cs.LG eess.SY stat.ML

    Scalable Autonomous Vehicle Safety Validation through Dynamic Programming and Scene Decomposition

    Authors: Anthony Corso, Ritchie Lee, Mykel J. Kochenderfer

    Abstract: An open question in autonomous driving is how best to use simulation to validate the safety of autonomous vehicles. Existing techniques rely on simulated rollouts, which can be inefficient for finding rare failure events, while other techniques are designed to only discover a single failure. In this work, we present a new safety validation approach that attempts to estimate the distribution over f…

    Submitted 26 June, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

  21. arXiv:2004.01157  [pdf, ps, other]

    stat.ML cs.LG

    Identification Methods With Arbitrary Interventional Distributions as Inputs

    Authors: Jaron J. R. Lee, Ilya Shpitser

    Abstract: Causal inference quantifies cause-effect relationships by estimating counterfactual parameters from data. This entails using identification theory to establish a link between counterfactual parameters of interest and distributions from which data is available. A line of work characterized non-parametric identification for a wide variety of causal parameters in terms of the observed da…

    Submitted 15 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

  22. arXiv:2003.04774  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    ENTMOOT: A Framework for Optimization over Ensemble Tree Models

    Authors: Alexander Thebelt, Jan Kronqvist, Miten Mistry, Robert M. Lee, Nathan Sudermann-Merx, Ruth Misener

    Abstract: Gradient boosted trees and other regression tree models perform well in a wide range of real-world, industrial applications. These tree models (i) offer insight into important prediction features, (ii) effectively manage sparse data, and (iii) have excellent prediction capabilities. Despite their advantages, they are generally unpopular for decision-making tasks and black-box optimization, which i…

    Submitted 18 May, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: 33 pages, 10 figures, 2 tables

  23. arXiv:2001.05472  [pdf, other]

    quant-ph cs.LG stat.ML

    Machine learning transfer efficiencies for noisy quantum walks

    Authors: Alexey A. Melnikov, Leonid E. Fedichkin, Ray-Kuang Lee, Alexander Alodjants

    Abstract: Quantum effects are known to provide an advantage in particle transfer across networks. In order to achieve this advantage, requirements on both a graph type and a quantum system coherence must be found. Here we show that the process of finding these requirements can be automated by learning from simulated examples. The automation is done by using a convolutional neural network of a particular typ…

    Submitted 18 February, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: 6 pages, 4 figures

    Journal ref: Adv. Quantum Technol. 3, 1900115 (2020)

  24. arXiv:1911.08666  [pdf, other]

    cs.LG cs.RO stat.ML

    Evaluating task-agnostic exploration for fixed-batch learning of arbitrary future tasks

    Authors: Vibhavari Dasagi, Robert Lee, Jake Bruce, Jürgen Leitner

    Abstract: Deep reinforcement learning has been shown to solve challenging tasks where large amounts of training experience are available, usually obtained online while learning the task. Robotics is a significant potential application domain for many of these algorithms, but generating robot experience in the real world is expensive, especially when each task requires a lengthy online training procedure. Off…

    Submitted 19 November, 2019; originally announced November 2019.

  25. arXiv:1902.01909  [pdf, other]

    cs.RO cs.AI cs.LG stat.ML

    Adaptive Stress Testing for Autonomous Vehicles

    Authors: Mark Koren, Saud Alsaif, Ritchie Lee, Mykel J. Kochenderfer

    Abstract: This paper presents a method for testing the decision making systems of autonomous vehicles. Our approach involves perturbing stochastic elements in the vehicle's environment until the vehicle is involved in a collision. Instead of applying direct Monte Carlo sampling to find collision scenarios, we formulate the problem as a Markov decision process and use reinforcement learning algorithms to fin…

    Submitted 5 February, 2019; originally announced February 2019.

  26. arXiv:1809.07480  [pdf, other]

    cs.LG stat.ML

    Sim-to-Real Transfer of Robot Learning with Variable Length Inputs

    Authors: Vibhavari Dasagi, Robert Lee, Serena Mou, Jake Bruce, Niko Sünderhauf, Jürgen Leitner

    Abstract: Current end-to-end deep Reinforcement Learning (RL) approaches require jointly learning perception, decision-making and low-level control from very sparse reward signals and high-dimensional inputs, with little capability of incorporating prior knowledge. This results in prohibitively long training times for use on real-world robotic tasks. Existing algorithms capable of extracting task-level repr…

    Submitted 8 October, 2019; v1 submitted 20 September, 2018; originally announced September 2018.

  27. arXiv:1801.05394  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Time Series Segmentation through Automatic Feature Learning

    Authors: Wei-Han Lee, Jorge Ortiz, Bongjun Ko, Ruby Lee

    Abstract: Internet of things (IoT) applications have become increasingly popular in recent years, with applications ranging from building energy monitoring to personal health tracking and activity recognition. In order to leverage these data, automatic knowledge extraction - whereby we map from observations to interpretable states and transitions - must be done at scale. As such, we have seen many recent Io…

    Submitted 26 January, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

  28. Single Nugget Kriging

    Authors: Minyong R. Lee, Art B. Owen

    Abstract: We propose a method with better predictions at extreme values than the standard method of Kriging. We construct our predictor in two ways: by penalizing the mean squared error through conditional bias and by penalizing the conditional likelihood at the target function value. Our prediction exhibits robustness to the model mismatch in the covariance parameters, a desirable feature for computer simu…

    Submitted 20 July, 2015; v1 submitted 17 July, 2015; originally announced July 2015.

    Journal ref: Statistica Sinica 28 (2018), 649-669