Expert-Augmented Machine Learning
Authors:
E. D. Gennatas,
J. H. Friedman,
L. H. Ungar,
R. Pirracchio,
E. Eaton,
L. Reichman,
Y. Interian,
C. B. Simone,
A. Auerbach,
E. Delgado,
M. J. Van der Laan,
T. D. Solberg,
G. Valdes
Abstract:
Machine Learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We use a large dataset of intensive care patient data to predict mortality and show that we can extract expert knowledge using an online platform, help reveal hidden confounders, improve generalizability on a different population, and learn using less data. EAML presents a novel framework for high-performance and dependable machine learning in critical applications.
Submitted 5 January, 2021; v1 submitted 22 March, 2019;
originally announced March 2019.
Band-phase-randomized Surrogates to assess nonlinearity in non-stationary time series
Authors:
Diego Guarin,
Edilson Delgado,
Alvaro Orozco
Abstract:
Testing for nonlinearity is one of the most important preprocessing steps in nonlinear time series analysis. Typically, this is done by means of the linear surrogate data methods. However, the validity of the results is known to depend heavily on the stationarity of the time series. Since most physiological signals are non-stationary, it is easy to falsely detect nonlinearity using the linear surrogate data methods. In this document, we propose a methodology to extend the procedure for generating constrained surrogate time series in order to assess nonlinearity in non-stationary data. The method is based on band-phase-randomized surrogates, which, unlike the linear surrogate data methods, randomize only a portion of the Fourier phases in the high-frequency band. Analysis of simulated time series showed that, in comparison to the linear surrogate data method, our method is able to discriminate between linear stationary, linear non-stationary, and nonlinear time series. When applying our methodology to heart rate variability (HRV) time series that present spikes and other kinds of nonstationarities, we were able to obtain surrogate time series that look like the data and preserve linear correlations, something that is not possible with the existing surrogate data methods.
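The idea of randomizing only the high-frequency Fourier phases can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `cutoff_fraction` parameter, and the rule that every bin above the cutoff is randomized are assumptions made for illustration, since the abstract does not say how the band is chosen.

```python
import numpy as np


def band_phase_randomized_surrogate(x, cutoff_fraction=0.1, rng=None):
    """Return a surrogate of x with phases randomized above a cutoff.

    cutoff_fraction is a hypothetical parameter: the fraction of the
    one-sided spectrum (counted from the lowest frequency) whose phases
    are kept untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size

    # One-sided spectrum of the real-valued series.
    spectrum = np.fft.rfft(x)
    amplitudes = np.abs(spectrum)
    phases = np.angle(spectrum)

    # First bin of the randomized band; the DC bin (index 0) is always kept.
    first = max(1, int(np.ceil(cutoff_fraction * (spectrum.size - 1))))
    # For even n the last bin is the Nyquist bin, which must stay real.
    last = spectrum.size - 1 if n % 2 == 0 else spectrum.size

    # Replace only the high-frequency phases with uniform random phases.
    phases[first:last] = rng.uniform(0.0, 2.0 * np.pi, size=max(0, last - first))

    # Amplitudes are unchanged, so the power spectrum (and hence the
    # linear autocorrelation) of the series is preserved.
    return np.fft.irfft(amplitudes * np.exp(1j * phases), n=n)
```

Because the low-frequency phases are kept, slow trends in the original series carry over to the surrogate, while the amplitude spectrum is the same in every bin. A full nonlinearity test would additionally need a discriminating statistic and an ensemble of surrogates, which this sketch does not cover.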
Submitted 31 January, 2011;
originally announced January 2011.