On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach
Authors:
Dennis Wei,
Rahul Nair,
Amit Dhurandhar,
Kush R. Varshney,
Elizabeth M. Daly,
Moninder Singh
Abstract:
Interpretable and explainable machine learning has seen a recent surge of interest. We focus on safety as a key motivation behind the surge and make the relationship between interpretability and safety more quantitative. Toward assessing safety, we introduce the concept of maximum deviation via an optimization problem to find the largest deviation of a supervised learning model from a reference mo…
▽ More
Interpretable and explainable machine learning has seen a recent surge of interest. We focus on safety as a key motivation behind the surge and make the relationship between interpretability and safety more quantitative. Toward assessing safety, we introduce the concept of maximum deviation via an optimization problem to find the largest deviation of a supervised learning model from a reference model regarded as safe. We then show how interpretability facilitates this safety assessment. For models including decision trees, generalized linear and additive models, the maximum deviation can be computed exactly and efficiently. For tree ensembles, which are not regarded as interpretable, discrete optimization techniques can still provide informative bounds. For a broader class of piecewise Lipschitz functions, we leverage the multi-armed bandit literature to show that interpretability produces tighter (regret) bounds on the maximum deviation. We present case studies, including one on mortgage approval, to illustrate our methods and the insights about models that may be obtained from deviation maximization.
△ Less
Submitted 2 November, 2022;
originally announced November 2022.
Bregman Divergence-Based Data Integration with Application to Polygenic Risk Score (PRS) Heterogeneity Adjustment
Authors:
Qinmengge Li,
Matthew T. Patrick,
Haihan Zhang,
Chachrit Khunsriraksakul,
Philip E. Stuart,
Johann E. Gudjonsson,
Rajan Nair,
James T. Elder,
Dajiang J. Liu,
Jian Kang,
Lam C. Tsoi,
Kevin He
Abstract:
Polygenic risk scores (PRS) have recently received much attention for genetics risk prediction. While successful for the Caucasian population, the PRS based on the minority population suffer from small sample sizes, high dimensionality and low signal-to-noise ratios, exacerbating already severe health disparities. Due to population heterogeneity, direct trans-ethnic prediction by utilizing the Cau…
▽ More
Polygenic risk scores (PRS) have recently received much attention for genetics risk prediction. While successful for the Caucasian population, the PRS based on the minority population suffer from small sample sizes, high dimensionality and low signal-to-noise ratios, exacerbating already severe health disparities. Due to population heterogeneity, direct trans-ethnic prediction by utilizing the Caucasian model for the minority population also has limited performance. In addition, due to data privacy, the individual genotype data is not accessible for either the Caucasian population or the minority population. To address these challenges, we propose a Bregman divergence-based estimation procedure to measure and optimally balance the information from different populations. The proposed method only requires the use of encrypted summary statistics and improves the PRS performance for ethnic minority groups by incorporating additional information. We provide the asymptotic consistency and weak oracle property for the proposed method. Simulations and real data analyses also show its advantages in prediction and variable selection.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.