-
Multivariate probability distribution for categorical and ordinal random variables
Authors:
Takashi Arai
Abstract:
We propose a multivariate probability distribution for categorical and ordinal random variables. To this end, we use the Grassmann distribution in conjunction with dummy encoding of categorical and ordinal variables. To realize the co-occurrence probabilities of dummy variables required for categorical and ordinal variables, we propose a parsimonious parameterization for the Grassmann distribution that ensures the positivity of the probability distribution. As an application of the proposed distribution, we develop a factor analysis for categorical and ordinal variables and show the validity of the model using a real dataset.
Submitted 2 April, 2023;
originally announced April 2023.
-
Factor analysis for a mixture of continuous and binary random variables
Authors:
Takashi Arai
Abstract:
We propose a multivariate probability distribution that models a linear correlation between binary and continuous variables. The proposed distribution is a natural extension of the previously developed multivariate binary distribution. As an application of the proposed distribution, we develop a factor analysis for a mixture of continuous and binary variables. We also discuss improper solutions associated with factor analysis. As a prescription to avoid improper solutions, we propose a constraint that each row vector of the factor loading matrix has the same norm. We numerically validated the proposed factor analysis and the norm constraint prescription by analyzing real datasets.
Submitted 12 February, 2023; v1 submitted 25 September, 2022;
originally announced September 2022.
-
Multivariate binary probability distribution in the Grassmann formalism
Authors:
Takashi Arai
Abstract:
We propose a probability distribution for multivariate binary random variables. For this purpose, we use the Grassmann number, an anti-commuting number. In our model, the partition function, the central moment, and the marginal and conditional distributions are expressed analytically by the matrix of the parameters analogous to the covariance matrix in the multivariate Gaussian distribution. That is, summation over all possible states is not necessary for obtaining the partition function and various expected values, which is a problem with the conventional multivariate Bernoulli distribution. The proposed model has many similarities to the multivariate Gaussian distribution. For example, the marginal and conditional distributions are expressed by the parameter matrix and its inverse matrix, respectively. That is, the inverse matrix expresses a sort of partial correlation. Analytical expressions for the marginal and conditional distributions are also useful in generating random numbers for multivariate binary variables. Hence, we validated the proposed method using synthetic datasets. We observed that the sampling distributions of various statistics are consistent with the theoretical predictions and that the estimates are consistent and asymptotically normal.
Submitted 17 September, 2020;
originally announced September 2020.
-
A new Granger causality measure for eliminating the confounding influence of latent common inputs
Authors:
Takashi Arai
Abstract:
In this paper, we propose a new Granger causality measure that is robust against the confounding influence of latent common inputs. This measure is inspired by partial Granger causality and its variant in the literature. Using numerical experiments, we first show that the test statistics for detecting directed interactions between time series approximately obey the $F$-distributions when there are no interactions. Then, we propose a practical procedure for inferring directed interactions, which is based on the idea of multiple statistical testing in situations where the confounding influence of latent common inputs may exist. The results of numerical experiments demonstrate that the proposed method successfully eliminates the influence of latent common inputs, while the normal Granger causality method detects spurious interactions due to the influence of the confounder.
Submitted 11 August, 2019;
originally announced August 2019.
-
Effectiveness of LSTMs in Predicting Congestive Heart Failure Onset
Authors:
Sunil Mallya,
Marc Overhage,
Navneet Srivastava,
Tatsuya Arai,
Cole Erdman
Abstract:
In this paper we present a recurrent neural network (RNN) based architecture that achieves an AUCROC of 0.9147 for predicting the onset of Congestive Heart Failure (CHF) 15 months in advance using a 12-month observation window on a large cohort of 216,394 patients. We believe this to be the largest study in CHF onset prediction with respect to the number of CHF case patients in the cohort and the test set (3,332 CHF patients) on which the AUC metrics are reported. We explore the extent to which a Long Short-Term Memory (LSTM) based model, a variant of RNNs, can accurately predict the onset of CHF when compared to known baselines such as Logistic Regression and Random Forests and to deep learning based models such as Multi-Layer Perceptron and Convolutional Neural Networks. We utilize demographics, medical diagnosis and procedure data from 21,405 CHF and 194,989 control patients as our features. We describe our feature embedding strategy for medical diagnosis codes that accommodates the sparse, irregular, longitudinal, and high-dimensional characteristics of EHR data. We empirically show that LSTMs can capture the longitudinal aspects of EHR data better than the proposed baselines. As an attempt to interpret the model, we present a temporal data analysis-based technique on false positives to attribute feature importance. A model capable of predicting the onset of congestive heart failure months in the future with this level of accuracy and precision can support the efforts of practitioners to implement risk factor reduction strategies and of researchers to begin to systematically evaluate interventions to potentially delay or avert development of a disease with high mortality, morbidity, and significant costs.
Submitted 13 February, 2019; v1 submitted 6 February, 2019;
originally announced February 2019.