
Exploring higher-order neural network node interactions with total correlation

Authors: Thomas Kerby, Teresa White, Kevin Moon

Abstract: In domains such as ecological systems, collaborations, and the human brain, the variables interact in complex ways. Yet accurately characterizing higher-order variable interactions (HOIs) is a difficult problem that is further exacerbated when the HOIs change across the data. To solve this problem we propose a new method called Local Correlation Explanation (CorEx) to capture HOIs at a local scale by first clustering data points based on their proximity on the data manifold. We then use a multivariate version of the mutual information, called the total correlation, to construct a latent factor representation of the data within each cluster to learn the local HOIs. We use Local CorEx to explore HOIs in synthetic and real-world data to extract hidden insights about the data structure. Lastly, we demonstrate Local CorEx's suitability to explore and interpret the inner workings of trained neural networks.
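
For readers unfamiliar with the quantity named in this abstract, the standard textbook definition of total correlation is sketched below. The notation ($X_i$, $H$, $n$) is chosen here for illustration and is not taken from the listing; how the authors estimate or optimize this quantity is not stated in the abstract.

```latex
% Total correlation of n random variables X_1, ..., X_n:
% the sum of the marginal entropies minus the joint entropy,
% equivalently the KL divergence from the product of marginals to the joint.
\[
  TC(X_1, \dots, X_n)
    = \sum_{i=1}^{n} H(X_i) - H(X_1, \dots, X_n)
    = D_{\mathrm{KL}}\!\left( p(x_1, \dots, x_n) \,\middle\|\, \prod_{i=1}^{n} p(x_i) \right)
\]
% For n = 2 this reduces to the ordinary mutual information I(X_1; X_2).
```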

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2206.08463 [pdf, other]

Estimating the lifetime risk of a false positive screening test result

Authors: Tim White, Sara Algeri

Abstract: False positive results in screening tests have potentially severe psychological, medical, and financial consequences for the recipient. However, there have been few efforts to quantify how the risk of a false positive accumulates over time. We seek to fill this gap by estimating the probability that an individual who adheres to the U.S. Preventive Services Task Force (USPSTF) screening guidelines will receive at least one false positive in a lifetime. To do so, we assembled a data set of 116 studies cited by the USPSTF that report the number of true positives, false negatives, true negatives, and false positives for the primary screening procedure for one of five cancers or six sexually transmitted diseases. We use these data to estimate the probability that an individual in one of 14 demographic subpopulations will receive at least one false positive for one of these eleven diseases in a lifetime. We specify a suitable statistical model to account for the hierarchical structure of the data, and we use the parametric bootstrap to quantify the uncertainty surrounding our estimates. The estimated probability of receiving at least one false positive in a lifetime is 85.5% ($\pm$0.9%) and 38.9% ($\pm$3.6%) for baseline groups of women and men, respectively. It is higher for subpopulations recommended to screen more frequently than the baseline, including more vulnerable groups such as pregnant women and men who have sex with men. Since screening technology is imperfect, false positives remain inevitable. The high lifetime risk of a false positive reveals the importance of educating patients about this phenomenon.
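
To illustrate how a small per-test false positive risk can accumulate over many screenings, here is a minimal sketch. It assumes every test is independent, which is a simplification made here and not the hierarchical model described in the abstract. The function name and the example rates are hypothetical and are not taken from the paper's data.

```python
def prob_at_least_one_false_positive(per_test_fp_probs):
    """Return the probability of at least one false positive across a
    sequence of tests, ASSUMING the tests are independent (an assumption
    of this sketch, not of the paper).

    per_test_fp_probs: iterable of per-test false positive probabilities,
    each between 0 and 1.
    """
    prob_no_false_positive = 1.0
    for p in per_test_fp_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        # Probability that this test and all earlier ones are not false positives.
        prob_no_false_positive *= 1.0 - p
    return 1.0 - prob_no_false_positive


# Hypothetical example: 20 screenings, each with a 5% false positive rate.
example_risk = prob_at_least_one_false_positive([0.05] * 20)
```

Under these made-up inputs the cumulative risk is 1 - 0.95^20, roughly 0.64, even though each individual test has only a 5% false positive rate.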

Submitted 16 June, 2022; originally announced June 2022.

Comments: 18 pages, 1 figure, 3 tables, 1 ancillary file

arXiv:1906.00564 [pdf, other]

C2P2: A Collective Cryptocurrency Up/Down Price Prediction Engine

Authors: Chongyang Bai, Tommy White, Linda Xiao, V. S. Subrahmanian, Ziheng Zhou

Abstract: We study the problem of predicting whether the price of the 21 most popular cryptocurrencies (according to coinmarketcap.com) will go up or down on day d, using data up to day d-1. Our C2P2 algorithm is the first algorithm to consider the fact that the price of a cryptocurrency c might depend not only on historical prices, sentiments, global stock indices, but also on the prices and predicted prices of other cryptocurrencies. C2P2 therefore does not predict cryptocurrency prices one coin at a time; rather, it uses similarity metrics in conjunction with collective classification to compare multiple cryptocurrency features to jointly predict the cryptocurrency prices for all 21 coins considered. We show that our C2P2 algorithm beats out a recent competing 2017 paper by margins varying from 5.1-83% and another Bitcoin-specific prediction paper from 2018 by 16%. In both cases, C2P2 is the winner on all cryptocurrencies considered. Moreover, we experimentally show that the use of similarity metrics within our C2P2 algorithm leads to a direct improvement for 20 out of 21 cryptocurrencies ranging from 0.4% to 17.8%. Without the similarity component, C2P2 still beats competitors on 20 out of 21 cryptocurrencies considered. We show that all these results are statistically significant via a Student's t-test with p<1e-5. Check our demo at https://www.cs.dartmouth.edu/dsail/demos/c2p2

Submitted 3 June, 2019; originally announced June 2019.

Comments: IEEE Blockchain-2019

arXiv:1707.09675 [pdf]

On Designing of a Low Leakage Patient-Centric Provider Network

Authors: Yuchen Zheng, Kun Lin, Thomas White, Jeremy Pickereign, Gigi Yuen-Reed

Abstract: When a patient in a provider network seeks services outside of their community, the community experiences a leakage. Leakage is undesirable as it typically leads to higher out-of-network cost for the patient and increases the barrier to care coordination, which is particularly problematic for an Accountable Care Organization (ACO) as the in-network providers are financially responsible for patient quality and outcome. We aim to design a data-driven method to identify naturally occurring provider networks driven by diabetic patient choices, and understand the relationship among provider composition, patient composition, and service leakage pattern. We construct a healthcare provider network based on patients' historical medical insurance claims. A community detection algorithm is used to identify naturally occurring communities of collaborating providers. Finally, import-export analysis is conducted to benchmark their leakage pattern and identify further leakage reduction opportunities. The design yields six major provider communities with diverse profiles. Some communities are geographically concentrated, while others tend to draw patients with certain diabetic co-morbidities. Providers from the same healthcare institution are likely to be assigned to the same community. While most communities have high within-community utilization and spending, at 85% and 86% respectively, leakage still persists. Hence, we utilize a metric from import-export analysis to detect leakage, gaining insight on how to minimize leakage. In conclusion, we identify patient-driven provider organization by surfacing providers who share a large number of patients. By analyzing the import-export behavior of each identified community using a novel approach and profiling community patient and provider composition, we understand the key features of having a balanced number of PCPs and specialists and provider heterogeneity.

Submitted 30 July, 2017; originally announced July 2017.

arXiv:1609.04468 [pdf, other]

Sampling Generative Networks

Authors: Tom White

Abstract: We introduce several techniques for sampling and visualizing the latent spaces of generative models. Replacing linear interpolation with spherical linear interpolation prevents diverging from a model's prior distribution and produces sharper samples. J-Diagrams and MINE grids are introduced as visualizations of manifolds created by analogies and nearest neighbors. We demonstrate two new techniques for deriving attribute vectors: bias-corrected vectors with data replication and synthetic vectors with data augmentation. Binary classification using attribute vectors is presented as a technique supporting quantitative analysis of the latent space. Most techniques are intended to be independent of model type and examples are shown on both Variational Autoencoders and Generative Adversarial Networks.
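
The contrast between linear and spherical linear interpolation mentioned in this abstract can be sketched as follows. This is a minimal pure-Python illustration of the standard slerp formula, not the paper's own code. The function names `lerp` and `slerp` and the fallback threshold are choices made here.

```python
import math


def lerp(t, v0, v1):
    """Linear interpolation between vectors v0 and v1 at fraction t."""
    return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]


def slerp(t, v0, v1):
    """Spherical linear interpolation between v0 and v1 at fraction t.

    Interpolates along the arc between the two vectors instead of the
    straight chord, so the midpoint of two equal-norm vectors keeps
    that norm rather than shrinking toward the origin.
    """
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-8:
        # Vectors are (anti)parallel: the arc is undefined or degenerate,
        # so this sketch falls back to linear interpolation.
        return lerp(t, v0, v1)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

For two orthogonal unit vectors, `lerp(0.5, ...)` gives a midpoint whose norm is below 1, while `slerp(0.5, ...)` gives a midpoint of norm 1. That difference in norm is the geometric effect behind the abstract's remark about staying close to the model's prior.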

Submitted 6 December, 2016; v1 submitted 14 September, 2016; originally announced September 2016.

Comments: 11 pages, 11 figures

Showing 1–5 of 5 results for author: White, T