-
Relative Cumulative Residual Information Measure
Authors:
Mary Andrews,
Smitha S,
Sudheesh K. Kattumannil
Abstract:
In this paper, we develop a relative cumulative residual information (RCRI) measure designed to quantify the divergence between two survival functions. The dynamic relative cumulative residual information (DRCRI) measure is also introduced. We establish some characterization results under the proportional hazards model assumption. Additionally, we obtain non-parametric estimators of the RCRI and DRCRI measures based on the kernel density type estimator of the survival function. The effectiveness of the estimators is assessed through an extensive Monte Carlo simulation study. We use data from the third Gaia data release (Gaia DR3) to demonstrate the use of the proposed measure. For this study, we collected epoch photometry data for the objects Gaia DR3 4111834567779557376 and Gaia DR3 5090605830056251776.
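The abstract does not state the RCRI formula, so the following sketch uses a stand-in to show the general shape of a divergence between two survival functions and its kernel plug-in estimate: a cumulative residual Kullback-Leibler-type divergence, the integral over x >= 0 of F(x) log(F(x)/G(x)) - F(x) + G(x), where F and G are the two survival functions. The divergence, the Gaussian kernel, the rule-of-thumb bandwidth, the grid truncation, and all function names are assumptions made for illustration; the paper's RCRI and DRCRI definitions and estimators may differ.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # elementwise erf using only the standard library


def gaussian_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def kernel_survival(sample, t, bandwidth=None):
    """Kernel-smoothed survival estimate S_h(t) = 1 - mean_i Phi((t - X_i) / h).

    The Gaussian kernel and the rule-of-thumb bandwidth are assumptions of this sketch.
    """
    x = np.asarray(sample, dtype=float)
    if bandwidth is None:
        bandwidth = 1.06 * x.std(ddof=1) * len(x) ** (-0.2)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Rows index evaluation points t, columns index observations X_i.
    return 1.0 - gaussian_cdf((t[:, None] - x[None, :]) / bandwidth).mean(axis=1)


def residual_divergence(surv_f, surv_g, grid):
    """Trapezoid rule for the integral of F log(F/G) - F + G over the grid.

    F and G are survival curves evaluated on the grid; the integrand is
    pointwise nonnegative because a*log(a/b) - a + b >= 0 for a, b > 0.
    """
    tiny = 1e-300  # guards against log(0) where a curve reaches zero
    f = np.asarray(surv_f, dtype=float)
    g = np.asarray(surv_g, dtype=float)
    integrand = f * np.log(np.maximum(f, tiny) / np.maximum(g, tiny)) - f + g
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))


def plug_in_divergence(x, y, num=1001):
    """Kernel plug-in estimate from two nonnegative samples.

    The grid stops at the smaller sample maximum: beyond the data, Gaussian
    kernel tails make log(F/G) unstable (a choice made for this sketch).
    """
    grid = np.linspace(0.0, min(np.max(x), np.max(y)), num)
    return residual_divergence(kernel_survival(x, grid), kernel_survival(y, grid), grid)


if __name__ == "__main__":
    # Exact exponential survival curves with rates 1 and 2.
    grid = np.linspace(0.0, 40.0, 40001)
    print(residual_divergence(np.exp(-grid), np.exp(-2.0 * grid), grid))
    rng = np.random.default_rng(0)
    x = rng.exponential(1.0, 500)  # numpy's exponential takes the scale 1/rate
    y = rng.exponential(0.5, 500)
    print(plug_in_divergence(x, y))
```

The two exponential curves form a proportional hazards pair, since e^(-2x) = (e^(-x))^2, and the exact integral is 1 - 1 + 1/2 = 0.5, so the first printed value should be very close to 0.5. The kernel plug-in value is noisier and carries smoothing and truncation bias.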
Submitted 19 November, 2024; v1 submitted 30 September, 2024;
originally announced October 2024.
-
A practical guide to causal discovery with cohort data
Authors:
Ryan M. Andrews,
Ronja Foraita,
Vanessa Didelez,
Janine Witte
Abstract:
In this guide, we present how to perform constraint-based causal discovery using three popular software packages: pcalg (with add-ons tpc and micd), bnlearn, and TETRAD. We focus on how these packages can be used with observational data and in the presence of mixed data (i.e., data where some variables are continuous, while others are categorical), a known time ordering between variables, and missing data. Throughout, we point out the relative strengths and limitations of each package and give practical recommendations. We hope this guide helps anyone who is interested in performing constraint-based causal discovery on their data.
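Constraint-based methods such as the PC algorithm start from a complete graph and delete an edge whenever a conditional independence test accepts independence given some subset of the current neighbours. The sketch below implements only that skeleton phase, using a Fisher z test of partial correlation, which assumes Gaussian data. It does not orient edges, use a time ordering, or handle mixed or missing data, and it is not the API of pcalg, bnlearn, or TETRAD; the function names are hypothetical.

```python
import itertools
import math

import numpy as np


def fisher_z_pvalue(corr, i, j, cond, n):
    """Two-sided p-value for H0: partial corr(X_i, X_j | X_cond) = 0 (Gaussian data assumed)."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.sqrt(n - len(cond) - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def pc_skeleton(data, alpha=0.01):
    """Skeleton phase of the PC algorithm: returns undirected edges and separating sets."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {v: set(range(p)) - {v} for v in range(p)}
    sepsets = {}
    level = 0  # size of the conditioning sets tested at this pass
    while any(len(adj[v]) - 1 >= level for v in range(p)):
        for i in range(p):
            for j in list(adj[i]):
                others = adj[i] - {j}
                if len(others) < level:
                    continue
                for cond in itertools.combinations(sorted(others), level):
                    if fisher_z_pvalue(corr, i, j, cond, n) > alpha:
                        # Independence accepted: drop the edge and remember why.
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        level += 1
    edges = {tuple(sorted((i, j))) for i in range(p) for j in adj[i]}
    return edges, sepsets


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 5000
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)  # X -> Y
    z = y + rng.normal(size=n)  # Y -> Z
    edges, sepsets = pc_skeleton(np.column_stack([x, y, z]), alpha=1e-4)
    print(edges, sepsets)
```

For the simulated chain X -> Y -> Z, the X-Z edge should be removed with separating set {Y}, leaving X - Y - Z. This is the basic, order-dependent variant, so at finite sample sizes the output can depend on the order in which variables are visited.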
Submitted 18 December, 2023; v1 submitted 30 August, 2021;
originally announced August 2021.
-
Evolution of Q Values for Deep Q Learning in Stable Baselines
Authors:
Matthew Andrews,
Cemil Dibek,
Karina Palyutina
Abstract:
We investigate the evolution of the Q values for the implementation of Deep Q Learning (DQL) in the Stable Baselines library. Stable Baselines incorporates the latest Reinforcement Learning techniques and achieves superhuman performance in many game environments. However, for some simple non-game environments, the DQL implementation in Stable Baselines can struggle to find the correct actions. In this paper we aim to understand the types of environments where this suboptimal behavior can occur, and we also investigate the corresponding evolution of the Q values for individual states.
We compare a smart TrafficLight environment (where performance is poor) with the OpenAI Gym FrozenLake environment (where performance is perfect). We observe that DQL struggles with TrafficLight because actions are reversible, and hence the Q values in a given state are closer together than in FrozenLake. We then investigate the evolution of the Q values using a recent decomposition technique of Achiam et al. We observe that for TrafficLight, the function approximation error and the complex relationships between the states lead to a situation where some Q values meander far from optimal.
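To make the point about Q values in a given state being close together concrete, the sketch below computes the standard one-step Q-learning target used by DQN-style methods and the per-state action gap (best minus second-best Q value). A small action gap means that approximation errors of similar size can flip the greedy action. This is a generic illustration with hypothetical function names, not code from Stable Baselines or from the paper; in DQN the next-state Q values usually come from a separate target network.

```python
import numpy as np


def td_targets(rewards, next_q, dones, gamma=0.99):
    """One-step targets y = r + gamma * (1 - done) * max_a' Q(s', a')."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)  # shape (batch, n_actions)
    dones = np.asarray(dones, dtype=float)    # 1.0 where the episode ended
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)


def action_gap(q_values):
    """Best minus second-best Q value for each state (needs at least 2 actions)."""
    q_sorted = np.sort(np.asarray(q_values, dtype=float), axis=1)
    return q_sorted[:, -1] - q_sorted[:, -2]


if __name__ == "__main__":
    print(td_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [0, 1], gamma=0.5))
    print(action_gap([[1.0, 1.1, 0.2], [5.0, 0.0, 1.0]]))
```

In the example, the first state's gap of about 0.1 is the kind of situation in which a modest error in the learned Q values could change the chosen action, while the second state's gap of 4.0 is robust to such errors.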
Submitted 24 April, 2020;
originally announced April 2020.
-
Insights into the "cross-world" independence assumption of causal mediation analysis
Authors:
Ryan M. Andrews,
Vanessa Didelez
Abstract:
Causal mediation analysis is a useful tool for epidemiological research, but it has been criticized for relying on a "cross-world" independence assumption that is empirically difficult to verify and problematic to justify based on background knowledge. In the present article we aim to help applied researchers understand this assumption. Synthesizing what is known about the cross-world independence assumption, we discuss the relationship between assumptions for causal mediation analyses, causal models, and non-parametric identification of natural direct and indirect effects. In particular, we give a practical example of an applied setting where the cross-world independence assumption is violated even without any post-treatment confounding. Further, we review possible alternatives to the cross-world independence assumption, including the computation of bounds that avoid the assumption altogether. Finally, we carry out a numerical study in which the cross-world independence assumption is violated, to assess the ensuing bias in estimating natural direct and indirect effects. We conclude with recommendations for carrying out causal mediation analyses.
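Under the identification assumptions discussed here, including cross-world independence, natural direct and indirect effects are given by the mediation formula: NDE = sum over m of (E[Y | A=1, M=m] - E[Y | A=0, M=m]) P(M=m | A=0), and NIE = sum over m of E[Y | A=1, M=m] (P(M=m | A=1) - P(M=m | A=0)). The sketch below evaluates these for a binary treatment A and binary mediator M without covariates. The inputs are hypothetical numbers and the function name is illustrative; if cross-world independence fails, the returned NDE and NIE need not equal the true natural effects.

```python
def natural_effects(mean_y, p_m1):
    """Mediation-formula NDE, NIE and total effect for binary A and binary M.

    mean_y[(a, m)] = E[Y | A=a, M=m];  p_m1[a] = P(M=1 | A=a).
    No covariates; identification assumptions (including cross-world
    independence) are taken for granted here.
    """
    def p_m(m, a):
        return p_m1[a] if m == 1 else 1.0 - p_m1[a]

    nde = sum((mean_y[(1, m)] - mean_y[(0, m)]) * p_m(m, 0) for m in (0, 1))
    nie = sum(mean_y[(1, m)] * (p_m(m, 1) - p_m(m, 0)) for m in (0, 1))
    te = sum(mean_y[(1, m)] * p_m(m, 1) - mean_y[(0, m)] * p_m(m, 0) for m in (0, 1))
    return nde, nie, te


if __name__ == "__main__":
    mean_y = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.6}  # made-up values
    print(natural_effects(mean_y, {0: 0.2, 1: 0.5}))
```

For these made-up inputs, NDE is 0.14 and NIE is 0.12 (up to floating-point rounding), which add up to the total effect 0.26; the decomposition TE = NDE + NIE holds by construction of the two sums.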
Submitted 30 September, 2020; v1 submitted 23 March, 2020;
originally announced March 2020.
-
Transformer to CNN: Label-scarce distillation for efficient text classification
Authors:
Yew Ken Chia,
Sam Witteveen,
Martin Andrews
Abstract:
Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop paper outlines how our proposed convolutional student architecture, trained by distillation from a large-scale model, can achieve a 300x inference speedup and a 39x reduction in parameter count. In some cases, the student model's performance surpasses that of its teacher on the studied tasks.
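The abstract does not specify the distillation objective. A common choice, assumed here in the style of Hinton et al., is the KL divergence between the teacher's and the student's temperature-softened output distributions. Because it uses only the teacher's outputs, it can be evaluated on unlabelled text, which is what makes distillation attractive when labels are scarce. The function names and the temperature value are illustrative, and the paper may use a different loss.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature along the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened outputs, scaled by T**2.

    Needs only teacher outputs, so it can be computed on unlabelled text.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)
    return float(temperature ** 2 * kl.mean())


if __name__ == "__main__":
    teacher = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
    print(distillation_loss(teacher * 0.9, teacher))
```

Identical student and teacher logits give a loss of zero; the T**2 factor keeps gradient magnitudes roughly comparable across temperatures.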
Submitted 8 September, 2019;
originally announced September 2019.
-
Machine Learning in High Energy Physics Community White Paper
Authors:
Kim Albertsson,
Piero Altoe,
Dustin Anderson,
John Anderson,
Michael Andrews,
Juan Pedro Araque Espinosa,
Adam Aurisano,
Laurent Basara,
Adrian Bevan,
Wahid Bhimji,
Daniele Bonacorsi,
Bjorn Burkle,
Paolo Calafiura,
Mario Campanelli,
Louis Capps,
Federico Carminati,
Stefano Carrazza,
Yi-fan Chen,
Taylor Childers,
Yann Coadou,
Elias Coniavitis,
Kyle Cranmer,
Claire David,
Douglas Davis,
Andrea De Simone, et al. (103 additional authors not shown)
Abstract:
Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We detail a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments, and to identify the resource needs for their implementation. Additionally, we identify areas where collaboration with external communities will be of great benefit.
Submitted 16 May, 2019; v1 submitted 8 July, 2018;
originally announced July 2018.