-
Rate-Informed Discovery via Bayesian Adaptive Multifidelity Sampling
Authors:
Aman Sinha,
Payam Nikdel,
Supratik Paul,
Shimon Whiteson
Abstract:
Ensuring the safety of autonomous vehicles (AVs) requires both accurate estimation of their performance and efficient discovery of potential failure cases. This paper introduces Bayesian adaptive multifidelity sampling (BAMS), which leverages the power of adaptive Bayesian sampling to achieve efficient discovery while simultaneously estimating the rate of adverse events. BAMS prioritizes explorati…
▽ More
Ensuring the safety of autonomous vehicles (AVs) requires both accurate estimation of their performance and efficient discovery of potential failure cases. This paper introduces Bayesian adaptive multifidelity sampling (BAMS), which leverages the power of adaptive Bayesian sampling to achieve efficient discovery while simultaneously estimating the rate of adverse events. BAMS prioritizes exploration of regions with potentially low performance, leading to the identification of novel and critical scenarios that traditional methods might miss. Using real-world AV data we demonstrate that BAMS discovers 10 times as many issues as Monte Carlo (MC) and importance sampling (IS) baselines, while at the same time generating rate estimates with variances 15 and 6 times narrower than MC and IS baselines respectively.
△ Less
Submitted 26 November, 2024;
originally announced November 2024.
-
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
Authors:
M. M. A. Valiuddin,
R. J. G. van Sloun,
C. G. A. Viviers,
P. H. N. de With,
F. van der Sommen
Abstract:
Advancements in image segmentation play an integral role within the broad scope of Deep Learning-based Computer Vision. Furthermore, their widespread applicability in critical real-world tasks has resulted in challenges related to the reliability of such algorithms. Hence, uncertainty quantification has been extensively studied within this context, enabling the expression of model ignorance (epist…
▽ More
Advancements in image segmentation play an integral role within the broad scope of Deep Learning-based Computer Vision. Furthermore, their widespread applicability in critical real-world tasks has resulted in challenges related to the reliability of such algorithms. Hence, uncertainty quantification has been extensively studied within this context, enabling the expression of model ignorance (epistemic uncertainty) or data ambiguity (aleatoric uncertainty) to prevent uninformed decision-making. Due to the rapid adoption of Convolutional Neural Network (CNN)-based segmentation models in high-stake applications, a substantial body of research has been published on this very topic, causing its swift expansion into a distinct field. This work provides a comprehensive overview of probabilistic segmentation, by discussing fundamental concepts of uncertainty quantification, governing advancements in the field as well as the application to various tasks. Moreover, literature on both types of uncertainties trace back to four key applications: (1) to quantify statistical inconsistencies in the annotation process due ambiguous images, (2) correlating prediction error with uncertainty, (3) expanding the model hypothesis space for better generalization, and (4) Active Learning. An extensive discussion follows that includes an overview of utilized datasets for each of the applications and evaluation of the available methods. We also highlight challenges related to architectures, uncertainty quantification methods, standardization and benchmarking, and finally end with recommendations for future work such as methods based on single forward passes and models that appropriately leverage volumetric data.
△ Less
Submitted 12 March, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
PICZL: Image-based Photometric Redshifts for AGN
Authors:
William Roster,
Mara Salvato,
Sven Krippendorf,
Aman Saxena,
Raphael Shirley,
Johannes Buchner,
Julien Wolf,
Tom Dwelly,
Franz E. Bauer,
James Aird,
Claudio Ricci,
Roberto J. Assef,
Scott F. Anderson,
Xin Liu,
Andrea Merloni,
Jochen Weller,
Kirpal Nandra
Abstract:
Computing photo-z for AGN is challenging, primarily due to the interplay of relative emissions associated with the SMBH and its host galaxy. SED fitting methods, effective in pencil-beam surveys, face limitations in all-sky surveys with fewer bands available, lacking the ability to capture the AGN contribution to the SED accurately. This limitation affects the many 10s of millions of AGN clearly s…
▽ More
Computing photo-z for AGN is challenging, primarily due to the interplay of relative emissions associated with the SMBH and its host galaxy. SED fitting methods, effective in pencil-beam surveys, face limitations in all-sky surveys with fewer bands available, lacking the ability to capture the AGN contribution to the SED accurately. This limitation affects the many 10s of millions of AGN clearly singled out and identified by SRG/eROSITA. Our goal is to significantly enhance photometric redshift performance for AGN in all-sky surveys while avoiding the need to merge multiple data sets. Instead, we employ readily available data products from the 10th Data Release of the Imaging Legacy Survey for DESI, covering > 20,000 deg$^{2}$ with deep images and catalog-based photometry in the grizW1-W4 bands. We introduce PICZL, a machine-learning algorithm leveraging an ensemble of CNNs. Utilizing a cross-channel approach, the algorithm integrates distinct SED features from images with those obtained from catalog-level data. Full probability distributions are achieved via the integration of Gaussian mixture models. On a validation sample of 8098 AGN, PICZL achieves a variance $σ_{\textrm{NMAD}}$ of 4.5% with an outlier fraction $η$ of 5.6%, outperforming previous attempts to compute accurate photo-z for AGN using ML. We highlight that the model's performance depends on many variables, predominantly the depth of the data. A thorough evaluation of these dependencies is presented in the paper. Our streamlined methodology maintains consistent performance across the entire survey area when accounting for differing data quality. The same approach can be adopted for future deep photometric surveys such as LSST and Euclid, showcasing its potential for wide-scale realisation. With this paper, we release updated photo-z (including errors) for the XMM-SERVS W-CDF-S, ELAIS-S1 and LSS fields.
△ Less
Submitted 13 November, 2024; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Authors:
Gavia Gray,
Aman Tiwari,
Shane Bergsma,
Joel Hestness
Abstract:
Per-example gradient norms are a vital ingredient for estimating gradient noise scale (GNS) with minimal variance. Observing the tensor contractions required to compute them, we propose a method with minimal FLOPs in 3D or greater tensor regimes by simultaneously computing the norms while computing the parameter gradients. Using this method we are able to observe the GNS of different layers at hig…
▽ More
Per-example gradient norms are a vital ingredient for estimating gradient noise scale (GNS) with minimal variance. Observing the tensor contractions required to compute them, we propose a method with minimal FLOPs in 3D or greater tensor regimes by simultaneously computing the norms while computing the parameter gradients. Using this method we are able to observe the GNS of different layers at higher accuracy than previously possible. We find that the total GNS of contemporary transformer models is predicted well by the GNS of only the normalization layers. As a result, focusing only on the normalization layer, we develop a custom kernel to compute the per-example gradient norms while performing the LayerNorm backward pass with zero throughput overhead. Tracking GNS on only those layers, we are able to guide a practical batch size schedule that reduces training time by 18% on a Chinchilla-optimal language model.
△ Less
Submitted 1 November, 2024;
originally announced November 2024.
-
Disentangling Representations through Multi-task Learning
Authors:
Pantelis Vafidis,
Aman Bhargava,
Antonio Rangel
Abstract:
Intelligent perception and interaction with the world hinges on internal representations that capture its underlying structure (''disentangled'' or ''abstract'' representations). Disentangled representations serve as world models, isolating latent factors of variation in the world along approximately orthogonal directions, thus facilitating feature-based generalization. We provide experimental and…
▽ More
Intelligent perception and interaction with the world hinges on internal representations that capture its underlying structure (''disentangled'' or ''abstract'' representations). Disentangled representations serve as world models, isolating latent factors of variation in the world along approximately orthogonal directions, thus facilitating feature-based generalization. We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve multi-task evidence accumulation classification tasks, canonical in the neuroscience literature. The key conceptual finding is that, by producing accurate multi-task classification estimates, a system implicitly represents a set of coordinates specifying a disentangled representation of the underlying latent state of the data it receives. The theory provides conditions for the emergence of these representations in terms of noise, number of tasks, and evidence accumulation time. We experimentally validate these predictions in RNNs trained to multi-task, which learn disentangled representations in the form of continuous attractors, leading to zero-shot out-of-distribution (OOD) generalization in predicting latent factors. We demonstrate the robustness of our framework across autoregressive architectures, decision boundary geometries and in tasks requiring classification confidence estimation. We find that transformers are particularly suited for disentangling representations, which might explain their unique world understanding abilities. Overall, our framework establishes a formal link between competence at multiple tasks and the formation of disentangled, interpretable world models in both biological and artificial systems, and helps explain why ANNs often arrive at human-interpretable concepts, and how they both may acquire exceptional zero-shot generalization capabilities.
△ Less
Submitted 2 March, 2025; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Neural Optimization with Adaptive Heuristics for Intelligent Marketing System
Authors:
Changshuai Wei,
Benjamin Zelditch,
Joyce Chen,
Andre Assuncao Silva T Ribeiro,
Jingyi Kenneth Tay,
Borja Ocejo Elizondo,
Keerthi Selvaraj,
Aman Gupta,
Licurgo Benemann De Almeida
Abstract:
Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimizat…
▽ More
Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimization that considers both to-business (2B) and to-consumer (2C) products, as well as both owned and paid channels. We describe key modules of the NOAH framework, including prediction, optimization, and adaptive heuristics, providing examples for bidding and content optimization. We then detail the successful application of NOAH to LinkedIn's email marketing system, showcasing significant wins over the legacy ranking system. Additionally, we share details and insights that are broadly useful, particularly on: (i) addressing delayed feedback with lifetime value, (ii) performing large-scale linear programming with randomization, (iii) improving retrieval with audience expansion, (iv) reducing signal dilution in targeting tests, and (v) handling zero-inflated heavy-tail metrics in statistical testing.
△ Less
Submitted 25 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection
Authors:
Ankan Kar,
Nirjhar Nath,
Utpalraj Kemprai,
Aman
Abstract:
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit prof…
▽ More
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus.
△ Less
Submitted 7 March, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
QuantEase: Optimization-based Quantization for Language Models
Authors:
Kayhan Behdin,
Ayan Acharya,
Aman Gupta,
Qingquan Song,
Siyu Zhu,
Sathiya Keerthi,
Rahul Mazumder
Abstract:
With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is f…
▽ More
With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is framed as a discrete-structured non-convex optimization, prompting the development of algorithms rooted in Coordinate Descent (CD) techniques. These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach features straightforward updates, relying solely on matrix and vector operations, circumventing the need for matrix inversion or decomposition. We also explore an outlier-aware variant of our approach, allowing for retaining significant weights (outliers) with complete precision. Our proposal attains state-of-the-art performance in terms of perplexity and zero-shot accuracy in empirical evaluations across various LLMs and datasets, with relative improvements up to 15% over methods such as GPTQ. Leveraging careful linear algebra optimizations, QuantEase can quantize models like Falcon-180B on a single NVIDIA A100 GPU in $\sim$3 hours. Particularly noteworthy is our outlier-aware algorithm's capability to achieve near or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, obviating the need for non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to two times in terms of perplexity.
△ Less
Submitted 1 December, 2023; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Leveraging Factored Action Spaces for Off-Policy Evaluation
Authors:
Aaman Rebello,
Shengpu Tang,
Jenna Wiens,
Sonali Parbhoo
Abstract:
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combinati…
▽ More
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
An ensemble of convolution-based methods for fault detection using vibration signals
Authors:
Xian Yeow Lee,
Aman Kumar,
Lasitha Vidyaratne,
Aniruddha Rajendra Rao,
Ahmed Farahat,
Chetan Gupta
Abstract:
This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent st…
▽ More
This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent studies have shown using convolution kernel-based methods like ROCKET, and 1D convolutional neural networks with ResNet and FCN, have robust performance for multivariate time-series data classification. We propose an ensemble of three convolution kernel-based methods and show its efficacy on this fault detection problem by outperforming other approaches and achieving an accuracy of more than 98.8\%.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Authors:
Kayhan Behdin,
Qingquan Song,
Aman Gupta,
Sathiya Keerthi,
Ayan Acharya,
Borja Ocejo,
Gregory Dexter,
Rajiv Khanna,
David Durfee,
Rahul Mazumder
Abstract:
Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as…
▽ More
Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as micro-batch SAM (mSAM). This variation involves aggregating updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM. We provide a thorough empirical evaluation of various image classification and natural language processing tasks to substantiate this theoretical advancement. We also show that contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational costs. Our implementation of mSAM yields superior generalization performance across a wide range of tasks compared to SAM, further supporting our theoretical framework.
△ Less
Submitted 30 September, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Heterogeneous Calibration: A post-hoc model-agnostic framework for improved generalization
Authors:
David Durfee,
Aman Gupta,
Kinjal Basu
Abstract:
We introduce the notion of heterogeneous calibration that applies a post-hoc model-agnostic transformation to model outputs for improving AUC performance on binary classification tasks. We consider overconfident models, whose performance is significantly better on training vs test data and give intuition onto why they might under-utilize moderately effective simple patterns in the data. We refer t…
▽ More
We introduce the notion of heterogeneous calibration that applies a post-hoc model-agnostic transformation to model outputs for improving AUC performance on binary classification tasks. We consider overconfident models, whose performance is significantly better on training vs test data and give intuition onto why they might under-utilize moderately effective simple patterns in the data. We refer to these simple patterns as heterogeneous partitions of the feature space and show theoretically that perfectly calibrating each partition separately optimizes AUC. This gives a general paradigm of heterogeneous calibration as a post-hoc procedure by which heterogeneous partitions of the feature space are identified through tree-based algorithms and post-hoc calibration techniques are applied to each partition to improve AUC. While the theoretical optimality of this framework holds for any model, we focus on deep neural networks (DNNs) and test the simplest instantiation of this paradigm on a variety of open-source datasets. Experiments demonstrate the effectiveness of this framework and the future potential for applying higher-performing partitioning schemes along with more effective calibration techniques.
△ Less
Submitted 10 February, 2022;
originally announced February 2022.
-
Community detection using low-dimensional network embedding algorithms
Authors:
Aman Barot,
Shankar Bhamidi,
Souvik Dhara
Abstract:
With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learnin…
▽ More
With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learning or network embedding. These algorithms try to learn representations of network functionals (e.g.~nodes) by first running multiple random walks and then using the number of co-occurrences of each pair of nodes in observed random walk segments to obtain a low-dimensional representation of nodes on some Euclidean space. The aim of this paper is to rigorously understand the performance of two major algorithms, DeepWalk and node2vec, in recovering communities for canonical network models with ground truth communities. Depending on the sparsity of the graph, we find the length of the random walk segments required such that the corresponding observed co-occurrence window is able to perform almost exact recovery of the underlying community assignments. We prove that, given some fixed co-occurrence window, node2vec using random walks with a low non-backtracking probability can succeed for much sparser networks compared to DeepWalk using simple random walks. Moreover, if the sparsity parameter is low, we provide evidence that these algorithms might not succeed in almost exact recovery. The analysis requires developing general tools for path counting on random networks having an underlying low-rank structure, which are of independent interest.
△ Less
Submitted 4 November, 2021;
originally announced November 2021.
-
Logit Attenuating Weight Normalization
Authors:
Aman Gupta,
Rohan Ramanath,
Jun Shi,
Anika Ramachandran,
Sirou Zhou,
Mingzhou Zhou,
S. Sathiya Keerthi
Abstract:
Over-parameterized deep networks trained using gradient-based optimizers are a popular choice for solving classification and ranking problems. Without appropriately tuned $\ell_2$ regularization or weight decay, such networks have the tendency to make output scores (logits) and network weights large, causing training loss to become too small and the network to lose its adaptivity (ability to move…
▽ More
Over-parameterized deep networks trained using gradient-based optimizers are a popular choice for solving classification and ranking problems. Without appropriately tuned $\ell_2$ regularization or weight decay, such networks have the tendency to make output scores (logits) and network weights large, causing training loss to become too small and the network to lose its adaptivity (ability to move around) in the parameter space. Although regularization is typically understood from an overfitting perspective, we highlight its role in making the network more adaptive and enabling it to escape more easily from weights that generalize poorly. To provide such a capability, we propose a method called Logit Attenuating Weight Normalization (LAWN), that can be stacked onto any gradient-based optimizer. LAWN controls the logits by constraining the weight norms of layers in the final homogeneous sub-network. Empirically, we show that the resulting LAWN variant of the optimizer makes a deep network more adaptive to finding minimas with superior generalization performance on large-scale image classification and recommender systems. While LAWN is particularly impressive in improving Adam, it greatly improves all optimizers when used with large batch sizes
△ Less
Submitted 12 August, 2021;
originally announced August 2021.
-
U.S. Power Resilience for 2002--2019
Authors:
Aman Ankit,
Zhanlin Liu,
Scott B. Miles,
Youngjun Choe
Abstract:
Prolonged power outages debilitate the economy and threaten public health. Existing research is generally limited in its scope to a single event, an outage cause, or a region. Here, we provide one of the most comprehensive analyses of U.S. power outages for 2002--2019. We categorized all outage data collected under U.S. federal mandates into four outage causes and computed industry-standard reliab…
▽ More
Prolonged power outages debilitate the economy and threaten public health. Existing research is generally limited in its scope to a single event, an outage cause, or a region. Here, we provide one of the most comprehensive analyses of U.S. power outages for 2002--2019. We categorized all outage data collected under U.S. federal mandates into four outage causes and computed industry-standard reliability metrics. Our spatiotemporal analysis reveals six of the most resilient U.S. states since 2010, improvement of power resilience against natural hazards in the south and northeast regions, and a disproportionately large number of human attacks for its population in the Western Electricity Coordinating Council region. Our regression analysis identifies several statistically significant predictors and hypotheses for power resilience. Furthermore, we propose a novel framework for analyzing outage data using differential weighting and influential points to better understand power resilience. We share curated data and code as Supplementary Materials.
△ Less
Submitted 20 July, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems
Authors:
Aman Sinha,
Matthew O'Kelly,
Russ Tedrake,
John Duchi
Abstract:
Learning-based methodologies increasingly find applications in safety-critical domains like autonomous driving and medical robotics. Due to the rare nature of dangerous events, real-world testing is prohibitively expensive and unscalable. In this work, we employ a probabilistic approach to safety evaluation in simulation, where we are concerned with computing the probability of dangerous events. W…
▽ More
Learning-based methodologies increasingly find applications in safety-critical domains like autonomous driving and medical robotics. Due to the rare nature of dangerous events, real-world testing is prohibitively expensive and unscalable. In this work, we employ a probabilistic approach to safety evaluation in simulation, where we are concerned with computing the probability of dangerous events. We develop a novel rare-event simulation method that combines exploration, exploitation, and optimization techniques to find failure modes and estimate their rate of occurrence. We provide rigorous guarantees for the performance of our method in terms of both statistical and computational efficiency. Finally, we demonstrate the efficacy of our approach on a variety of scenarios, illustrating its usefulness as a tool for rapid sensitivity analysis and model comparison that are essential to developing and testing safety-critical autonomous systems.
△ Less
Submitted 8 August, 2021; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Using LSTM for the Prediction of Disruption in ADITYA Tokamak
Authors:
Aman Agarwal,
Aditya Mishra,
Priyanka Sharma,
Swati Jain,
Sutapa Ranjan,
Ranjana Manchanda
Abstract:
Major disruptions in tokamak pose a serious threat to the vessel and its surrounding pieces of equipment. The ability of the systems to detect any behavior that can lead to disruption can help in alerting the system beforehand and prevent its harmful effects. Many machine learning techniques have already been in use at large tokamaks like JET and ASDEX, but are not suitable for ADITYA, which is co…
▽ More
Major disruptions in tokamak pose a serious threat to the vessel and its surrounding pieces of equipment. The ability of the systems to detect any behavior that can lead to disruption can help in alerting the system beforehand and prevent its harmful effects. Many machine learning techniques have already been in use at large tokamaks like JET and ASDEX, but are not suitable for ADITYA, which is comparatively small. Through this work, we discuss a new real-time approach to predict the time of disruption in ADITYA tokamak and validate the results on an experimental dataset. The system uses selected diagnostics from the tokamak and after some pre-processing steps, sends them to a time-sequence Long Short-Term Memory (LSTM) network. The model can make the predictions 12 ms in advance at less computation cost that is quick enough to be deployed in real-time applications.
△ Less
Submitted 13 July, 2020;
originally announced July 2020.
-
Machine-Learning Driven Drug Repurposing for COVID-19
Authors:
Semih Cantürk,
Aman Singh,
Patrick St-Amant,
Jason Behrmann
Abstract:
The integration of machine learning methods into bioinformatics provides particular benefits in identifying how therapeutics effective in one context might have utility in an unknown clinical context or against a novel pathology. We aim to discover the underlying associations between viral proteins and antiviral therapeutics that are effective against them by employing neural network models. Using…
▽ More
The integration of machine learning methods into bioinformatics provides particular benefits in identifying how therapeutics effective in one context might have utility in an unknown clinical context or against a novel pathology. We aim to discover the underlying associations between viral proteins and antiviral therapeutics that are effective against them by employing neural network models. Using the National Center for Biotechnology Information virus protein database and the DrugVirus database, which provides a comprehensive report of broad-spectrum antiviral agents (BSAAs) and viruses they inhibit, we trained ANN models with virus protein sequences as inputs and antiviral agents deemed safe-in-humans as outputs. Model training excluded SARS-CoV-2 proteins and included only Phases II, III, IV and Approved level drugs. Using sequences for SARS-CoV-2 (the coronavirus that causes COVID-19) as inputs to the trained models produces outputs of tentative safe-in-human antiviral candidates for treating COVID-19. Our results suggest multiple drug candidates, some of which complement recent findings from noteworthy clinical studies. Our in-silico approach to drug repurposing has promise in identifying new drug candidates and treatments for other viruses.
△ Less
Submitted 25 June, 2020;
originally announced June 2020.
-
Towards classification parity across cohorts
Authors:
Aarsh Patel,
Rahul Gupta,
Mukund Harakere,
Satyapriya Krishna,
Aman Alok,
Peng Liu
Abstract:
Recently, there has been a lot of interest in ensuring algorithmic fairness in machine learning where the central question is how to prevent sensitive information (e.g. knowledge about the ethnic group of an individual) from adding "unfair" bias to a learning algorithm (Feldman et al. (2015), Zemel et al. (2013)). This has led to several debiasing algorithms on word embeddings (Qian et al. (2019)…
▽ More
Recently, there has been a lot of interest in ensuring algorithmic fairness in machine learning where the central question is how to prevent sensitive information (e.g. knowledge about the ethnic group of an individual) from adding "unfair" bias to a learning algorithm (Feldman et al. (2015), Zemel et al. (2013)). This has led to several debiasing algorithms on word embeddings (Qian et al. (2019) , Bolukbasi et al. (2016)), coreference resolution (Zhao et al. (2018a)), semantic role labeling (Zhao et al. (2017)), etc. Most of these existing work deals with explicit sensitive features such as gender, occupations or race which doesn't work with data where such features are not captured due to privacy concerns. In this research work, we aim to achieve classification parity across explicit as well as implicit sensitive features. We define explicit cohorts as groups of people based on explicit sensitive attributes provided in the data (age, gender, race) whereas implicit cohorts are defined as groups of people with similar language usage. We obtain implicit cohorts by clustering embeddings of each individual trained on the language generated by them using a language model. We achieve two primary objectives in this work : [1.] We experimented and discovered classification performance differences across cohorts based on implicit and explicit features , [2] We improved classification parity by introducing modification to the loss function aimed to minimize the range of model performances across cohorts.
△ Less
Submitted 16 May, 2020;
originally announced May 2020.
-
FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis
Authors:
Aman Sinha,
Matthew O'Kelly,
Hongrui Zheng,
Rahul Mangharam,
John Duchi,
Russ Tedrake
Abstract:
Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorith…
▽ More
Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents' behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.
△ Less
Submitted 22 August, 2020; v1 submitted 8 March, 2020;
originally announced March 2020.
-
Efficient Black-box Assessment of Autonomous Vehicle Safety
Authors:
Justin Norden,
Matthew O'Kelly,
Aman Sinha
Abstract:
While autonomous vehicle (AV) technology has shown substantial progress, we still lack tools for rigorous and scalable testing. Real-world testing, the $\textit{de-facto}$ evaluation method, is dangerous to the public. Moreover, due to the rare nature of failures, billions of miles of driving are needed to statistically validate performance claims. Thus, the industry has largely turned to simulati…
▽ More
While autonomous vehicle (AV) technology has shown substantial progress, we still lack tools for rigorous and scalable testing. Real-world testing, the $\textit{de-facto}$ evaluation method, is dangerous to the public. Moreover, due to the rare nature of failures, billions of miles of driving are needed to statistically validate performance claims. Thus, the industry has largely turned to simulation to evaluate AV systems. However, having a simulation stack alone is not a solution. A simulation testing framework needs to prioritize which scenarios to run, learn how the chosen scenarios provide coverage of failure modes, and rank failure scenarios in order of importance. We implement a simulation testing framework that evaluates an entire modern AV system as a black box. This framework estimates the probability of accidents under a base distribution governing standard traffic behavior. In order to accelerate rare-event probability evaluation, we efficiently learn to identify and rank failure scenarios via adaptive importance-sampling methods. Using this framework, we conduct the first independent evaluation of a full-stack commercial AV system, Comma AI's OpenPilot.
△ Less
Submitted 5 June, 2020; v1 submitted 8 December, 2019;
originally announced December 2019.
-
Semiparametric Estimation of Correlated Random Coefficient Models without Instrumental Variables
Authors:
Samuele Centorrino,
Aman Ullah,
Jing Xue
Abstract:
We study a linear random coefficient model where slope parameters may be correlated with some continuous covariates. Such a model specification may occur in empirical research, for instance, when quantifying the effect of a continuous treatment observed at two time periods. We show one can carry identification and estimation without instruments. We propose a semiparametric estimator of average par…
▽ More
We study a linear random coefficient model where slope parameters may be correlated with some continuous covariates. Such a model specification may occur in empirical research, for instance, when quantifying the effect of a continuous treatment observed at two time periods. We show one can carry identification and estimation without instruments. We propose a semiparametric estimator of average partial effects and of average treatment effects on the treated. We showcase the small sample properties of our estimator in an extensive simulation study. Among other things, we reveal that it compares favorably with a control function estimator. We conclude with an application to the effect of malaria eradication on economic development in Colombia.
△ Less
Submitted 15 November, 2019;
originally announced November 2019.
-
SynGAN: Towards Generating Synthetic Network Attacks using GANs
Authors:
Jeremy Charlier,
Aman Singh,
Gaston Ormazabal,
Radu State,
Henning Schulzrinne
Abstract:
The rapid digital transformation without security considerations has resulted in the rise of global-scale cyberattacks. The first line of defense against these attacks are Network Intrusion Detection Systems (NIDS). Once deployed, however, these systems work as blackboxes with a high rate of false positives with no measurable effectiveness. There is a need to continuously test and improve these sy…
▽ More
The rapid digital transformation without security considerations has resulted in the rise of global-scale cyberattacks. The first line of defense against these attacks are Network Intrusion Detection Systems (NIDS). Once deployed, however, these systems work as blackboxes with a high rate of false positives with no measurable effectiveness. There is a need to continuously test and improve these systems by emulating real-world network attack mutations. We present SynGAN, a framework that generates adversarial network attacks using the Generative Adversial Networks (GAN). SynGAN generates malicious packet flow mutations using real attack traffic, which can improve NIDS attack detection rates. As a first step, we compare two public datasets, NSL-KDD and CICIDS2017, for generating synthetic Distributed Denial of Service (DDoS) network attacks. We evaluate the attack quality (real vs. synthetic) using a gradient boosting classifier.
△ Less
Submitted 26 August, 2019;
originally announced August 2019.
-
OmniNet: A unified architecture for multi-modal multi-task learning
Authors:
Subhojeet Pramanik,
Priyanka Agrawal,
Aman Hussain
Abstract:
Transformer is a popularly used neural network architecture, especially for language understanding. We introduce an extended and unified architecture that can be used for tasks involving a variety of modalities like image, text, videos, etc. We propose a spatio-temporal cache mechanism that enables learning spatial dimension of the input in addition to the hidden states corresponding to the tempor…
▽ More
Transformer is a popularly used neural network architecture, especially for language understanding. We introduce an extended and unified architecture that can be used for tasks involving a variety of modalities like image, text, videos, etc. We propose a spatio-temporal cache mechanism that enables learning spatial dimension of the input in addition to the hidden states corresponding to the temporal input sequence. The proposed architecture further enables a single model to support tasks with multiple input modalities as well as asynchronous multi-task learning, thus we refer to it as OmniNet. For example, a single instance of OmniNet can concurrently learn to perform the tasks of part-of-speech tagging, image captioning, visual question answering and video activity recognition. We demonstrate that training these four tasks together results in about three times compressed model while retaining the performance in comparison to training them individually. We also show that using this neural network pre-trained on some modalities assists in learning unseen tasks such as video captioning and video question answering. This illustrates the generalization capacity of the self-attention mechanism on the spatio-temporal cache present in OmniNet.
△ Less
Submitted 3 July, 2020; v1 submitted 17 July, 2019;
originally announced July 2019.
-
One-vs-All Models for Asynchronous Training: An Empirical Analysis
Authors:
Rahul Gupta,
Aman Alok,
Shankar Ananthakrishnan
Abstract:
Any given classification problem can be modeled using multi-class or One-vs-All (OVA) architecture. An OVA system consists of as many OVA models as the number of classes, providing the advantage of asynchrony, where each OVA model can be re-trained independent of other models. This is particularly advantageous in settings where scalable model training is a consideration (for instance in an industr…
▽ More
Any given classification problem can be modeled using multi-class or One-vs-All (OVA) architecture. An OVA system consists of as many OVA models as the number of classes, providing the advantage of asynchrony, where each OVA model can be re-trained independent of other models. This is particularly advantageous in settings where scalable model training is a consideration (for instance in an industrial environment where multiple and frequent updates need to be made to the classification system). In this paper, we conduct empirical analysis on realizing independent updates to OVA models and its impact on the accuracy of the overall OVA system. Given that asynchronous updates lead to differences in training datasets for OVA models, we first define a metric to quantify the differences in datasets. Thereafter, using Natural Language Understanding as a task of interest, we estimate the impact of three factors: (i) number of classes, (ii) number of data points and, (iii) divergences in training datasets across OVA models; on the OVA system accuracy. Finally, we observe the accuracy impact of increased asynchrony in a Spoken Language Understanding system. We analyze the results and establish that the proposed metric correlates strongly with the model performances in both the experimental settings.
△ Less
Submitted 20 June, 2019;
originally announced June 2019.
-
Modeling disease progression in longitudinal EHR data using continuous-time hidden Markov models
Authors:
Aman Verma,
Guido Powell,
Yu Luo,
David Stephens,
David L. Buckeridge
Abstract:
Modeling disease progression in healthcare administrative databases is complicated by the fact that patients are observed only at irregular intervals when they seek healthcare services. In a longitudinal cohort of 76,888 patients with chronic obstructive pulmonary disease (COPD), we used a continuous-time hidden Markov model with a generalized linear model to model healthcare utilization events. W…
▽ More
Modeling disease progression in healthcare administrative databases is complicated by the fact that patients are observed only at irregular intervals when they seek healthcare services. In a longitudinal cohort of 76,888 patients with chronic obstructive pulmonary disease (COPD), we used a continuous-time hidden Markov model with a generalized linear model to model healthcare utilization events. We found that the fitted model provides interpretable results suitable for summarization and hypothesis generation.
△ Less
Submitted 2 December, 2018;
originally announced December 2018.
-
In-silico Risk Analysis of Personalized Artificial Pancreas Controllers via Rare-event Simulation
Authors:
Matthew O'Kelly,
Aman Sinha,
Justin Norden,
Hongseok Namkoong
Abstract:
Modern treatments for Type 1 diabetes (T1D) use devices known as artificial pancreata (APs), which combine an insulin pump with a continuous glucose monitor (CGM) operating in a closed-loop manner to control blood glucose levels. In practice, poor performance of APs (frequent hyper- or hypoglycemic events) is common enough at a population level that many T1D patients modify the algorithms on exist…
▽ More
Modern treatments for Type 1 diabetes (T1D) use devices known as artificial pancreata (APs), which combine an insulin pump with a continuous glucose monitor (CGM) operating in a closed-loop manner to control blood glucose levels. In practice, poor performance of APs (frequent hyper- or hypoglycemic events) is common enough at a population level that many T1D patients modify the algorithms on existing AP systems with unregulated open-source software. Anecdotally, the patients in this group have shown superior outcomes compared with standard of care, yet we do not understand how safe any AP system is since adverse outcomes are rare. In this paper, we construct generative models of individual patients' physiological characteristics and eating behaviors. We then couple these models with a T1D simulator approved for pre-clinical trials by the FDA. Given the ability to simulate patient outcomes in-silico, we utilize techniques from rare-event simulation theory in order to efficiently quantify the performance of a device with respect to a particular patient. We show a 72,000$\times$ speedup in simulation speed over real-time and up to 2-10 times increase in the frequency which we are able to sample adverse conditions relative to standard Monte Carlo sampling. In practice our toolchain enables estimates of the likelihood of hypoglycemic events with approximately an order of magnitude fewer simulations.
△ Less
Submitted 1 December, 2018;
originally announced December 2018.
-
Surrogate-assisted parallel tempering for Bayesian neural learning
Authors:
Rohitash Chandra,
Konark Jain,
Arpit Kapoor,
Ashray Aman
Abstract:
Due to the need for robust uncertainty quantification, Bayesian neural learning has gained attention in the era of deep learning and big data. Markov Chain Monte-Carlo (MCMC) methods typically implement Bayesian inference which faces several challenges given a large number of parameters, complex and multimodal posterior distributions, and computational complexity of large neural network models. Pa…
▽ More
Due to the need for robust uncertainty quantification, Bayesian neural learning has gained attention in the era of deep learning and big data. Markov Chain Monte-Carlo (MCMC) methods typically implement Bayesian inference which faces several challenges given a large number of parameters, complex and multimodal posterior distributions, and computational complexity of large neural network models. Parallel tempering MCMC addresses some of these limitations given that they can sample multimodal posterior distributions and utilize high-performance computing. However, certain challenges remain given large neural network models and big data. Surrogate-assisted optimization features the estimation of an objective function for models which are computationally expensive. In this paper, we address the inefficiency of parallel tempering MCMC for large-scale problems by combining parallel computing features with surrogate assisted likelihood estimation that describes the plausibility of a model parameter value, given specific observed data. Hence, we present surrogate-assisted parallel tempering for Bayesian neural learning for simple to computationally expensive models. Our results demonstrate that the methodology significantly lowers the computational cost while maintaining quality in decision making with Bayesian neural networks. The method has applications for a Bayesian inversion and uncertainty quantification for a broad range of numerical models.
△ Less
Submitted 14 May, 2020; v1 submitted 21 November, 2018;
originally announced November 2018.
-
Machine Learning Algorithms for Classification of Microcirculation Images from Septic and Non-Septic Patients
Authors:
Perikumar Javia,
Aman Rana,
Nathan Shapiro,
Pratik Shah
Abstract:
Sepsis is a life-threatening disease and one of the major causes of death in hospitals. Imaging of microcirculatory dysfunction is a promising approach for automated diagnosis of sepsis. We report a machine learning classifier capable of distinguishing non-septic and septic images from dark field microcirculation videos of patients. The classifier achieves an accuracy of 89.45%. The area under the…
▽ More
Sepsis is a life-threatening disease and one of the major causes of death in hospitals. Imaging of microcirculatory dysfunction is a promising approach for automated diagnosis of sepsis. We report a machine learning classifier capable of distinguishing non-septic and septic images from dark field microcirculation videos of patients. The classifier achieves an accuracy of 89.45%. The area under the receiver operating characteristics of the classifier was 0.92, the precision was 0.92 and the recall was 0.84. Codes representing the learned feature space of trained classifier were visualized using t-SNE embedding and were separable and distinguished between images from critically ill and non-septic patients. Using an unsupervised convolutional autoencoder, independent of the clinical diagnosis, we also report clustering of learned features from a compressed representation associated with healthy images and those with microcirculatory dysfunction. The feature space used by our trained classifier to distinguish between images from septic and non-septic patients has potential diagnostic application.
△ Less
Submitted 20 February, 2019; v1 submitted 24 October, 2018;
originally announced November 2018.
-
Computational Histological Staining and Destaining of Prostate Core Biopsy RGB Images with Generative Adversarial Neural Networks
Authors:
Aman Rana,
Gregory Yauney,
Alarice Lowe,
Pratik Shah
Abstract:
Histopathology tissue samples are widely available in two states: paraffin-embedded unstained and non-paraffin-embedded stained whole slide RGB images (WSRI). Hematoxylin and eosin stain (H&E) is one of the principal stains in histology but suffers from several shortcomings related to tissue preparation, staining protocols, slowness and human error. We report two novel approaches for training mach…
▽ More
Histopathology tissue samples are widely available in two states: paraffin-embedded unstained and non-paraffin-embedded stained whole slide RGB images (WSRI). Hematoxylin and eosin stain (H&E) is one of the principal stains in histology but suffers from several shortcomings related to tissue preparation, staining protocols, slowness and human error. We report two novel approaches for training machine learning models for the computational H&E staining and destaining of prostate core biopsy RGB images. The staining model uses a conditional generative adversarial network that learns hierarchical non-linear mappings between whole slide RGB image (WSRI) pairs of prostate core biopsy before and after H&E staining. The trained staining model can then generate computationally H&E-stained prostate core WSRIs using previously unseen non-stained biopsy images as input. The destaining model, by learning mappings between an H&E stained WSRI and a non-stained WSRI of the same biopsy, can computationally destain previously unseen H&E-stained images. Structural and anatomical details of prostate tissue and colors, shapes, geometries, locations of nuclei, stroma, vessels, glands and other cellular components were generated by both models with structural similarity indices of 0.68 (staining) and 0.84 (destaining). The proposed staining and destaining models can engender computational H&E staining and destaining of WSRI biopsies without additional equipment and devices.
△ Less
Submitted 20 February, 2019; v1 submitted 26 October, 2018;
originally announced November 2018.
-
Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation
Authors:
Matthew O'Kelly,
Aman Sinha,
Hongseok Namkoong,
John Duchi,
Russ Tedrake
Abstract:
While recent developments in autonomous vehicle (AV) technology highlight substantial progress, we lack tools for rigorous and scalable testing. Real-world testing, the $\textit{de facto}$ evaluation environment, places the public in danger, and, due to the rare nature of accidents, will require billions of miles in order to statistically validate performance claims. We implement a simulation fram…
▽ More
While recent developments in autonomous vehicle (AV) technology highlight substantial progress, we lack tools for rigorous and scalable testing. Real-world testing, the $\textit{de facto}$ evaluation environment, places the public in danger, and, due to the rare nature of accidents, will require billions of miles in order to statistically validate performance claims. We implement a simulation framework that can test an entire modern autonomous driving system, including, in particular, systems that employ deep-learning perception and control algorithms. Using adaptive importance-sampling methods to accelerate rare-event probability evaluation, we estimate the probability of an accident under a base distribution governing standard traffic behavior. We demonstrate our framework on a highway scenario, accelerating system evaluation by $2$-$20$ times over naive Monte Carlo sampling methods and $10$-$300 \mathsf{P}$ times (where $\mathsf{P}$ is the number of processors) over real-world testing.
△ Less
Submitted 12 January, 2019; v1 submitted 31 October, 2018;
originally announced November 2018.
-
Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health
Authors:
Gregory Yauney,
Aman Rana,
Lawrence C. Wong,
Perikumar Javia,
Ali Muftu,
Pratik Shah
Abstract:
Imaging fluorescent disease biomarkers in tissues and skin is a non-invasive method to screen for health conditions. We report an automated process that combines intraoral fluorescent porphyrin biomarker imaging, clinical examinations and machine learning for correlation of systemic health conditions with periodontal disease. 1215 intraoral fluorescent images, from 284 consenting adults aged 18-90…
▽ More
Imaging fluorescent disease biomarkers in tissues and skin is a non-invasive method to screen for health conditions. We report an automated process that combines intraoral fluorescent porphyrin biomarker imaging, clinical examinations and machine learning for correlation of systemic health conditions with periodontal disease. 1215 intraoral fluorescent images, from 284 consenting adults aged 18-90, were analyzed using a machine learning classifier that can segment periodontal inflammation. The classifier achieved an AUC of 0.677 with precision and recall of 0.271 and 0.429, respectively, indicating a learned association between disease signatures in collected images. Periodontal diseases were more prevalent among males (p=0.0012) and older subjects (p=0.0224) in the screened population. Physicians independently examined the collected images, assigning localized modified gingival indices (MGIs). MGIs and periodontal disease were then cross-correlated with responses to a medical history questionnaire, blood pressure and body mass index measurements, and optic nerve, tympanic membrane, neurological, and cardiac rhythm imaging examinations. Gingivitis and early periodontal disease were associated with subjects diagnosed with optic nerve abnormalities (p <0.0001) in their retinal scans. We also report significant co-occurrences of periodontal disease in subjects reporting swollen joints (p=0.0422) and a family history of eye disease (p=0.0337). These results indicate cross-correlation of poor periodontal health with systemic health outcomes and stress the importance of oral health screenings at the primary care level. Our screening process and analysis method, using images and machine learning, can be generalized for automated diagnoses and systemic health screenings for other diseases.
△ Less
Submitted 24 October, 2018;
originally announced October 2018.
-
Consistent Position Bias Estimation without Online Interventions for Learning-to-Rank
Authors:
Aman Agarwal,
Ivan Zaitsev,
Thorsten Joachims
Abstract:
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors. While it was recently shown how counterfactual learning-to-rank (LTR) approaches \cite{Joachims/etal/17a} can provably overcome presentation bias if observatio…
▽ More
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors. While it was recently shown how counterfactual learning-to-rank (LTR) approaches \cite{Joachims/etal/17a} can provably overcome presentation bias if observation propensities are known, it remains to show how to accurately estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance modeling assumptions. We merely require that we have implicit feedback data from multiple different ranking functions. Furthermore, we argue that our estimation technique applies to an extended class of Contextual Position-Based Propensity Models, where propensities not only depend on position but also on observable features of the query and document. Initial simulation studies confirm that the approach is scalable, accurate, and robust.
△ Less
Submitted 9 June, 2018;
originally announced June 2018.
-
Certifying Some Distributional Robustness with Principled Adversarial Training
Authors:
Aman Sinha,
Hongseok Namkoong,
Riccardo Volpi,
John Duchi
Abstract:
Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserst…
▽ More
Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserstein ball, we provide a training procedure that augments model parameter updates with worst-case perturbations of training data. For smooth losses, our procedure provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization. Furthermore, our statistical guarantees allow us to efficiently certify robustness for the population loss. For imperceptible perturbations, our method matches or outperforms heuristic approaches.
△ Less
Submitted 1 May, 2020; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Random Projection Trees Revisited
Authors:
Aman Dhesi,
Purushottam Kar
Abstract:
The Random Projection Tree structures proposed in [Freund-Dasgupta STOC08] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPTreeMax and the RPTreeMean data structures. Our result for RPTreeMax gives a near-optimal bound on the number of levels required by this data structure to reduce the siz…
▽ More
The Random Projection Tree structures proposed in [Freund-Dasgupta STOC08] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPTreeMax and the RPTreeMean data structures. Our result for RPTreeMax gives a near-optimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor $s \geq 2$. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPTreeMean adapts to manifold dimension as well.
△ Less
Submitted 20 October, 2010; v1 submitted 19 October, 2010;
originally announced October 2010.