-
Nonparametric Universal Copula Modeling
Authors:
Subhadeep Mukhopadhyay,
Emanuel Parzen
Abstract:
To handle the ubiquitous problem of "dependence learning," copulas are quickly becoming a pervasive tool across a wide range of data-driven disciplines encompassing neuroscience, finance, econometrics, genomics, social science, machine learning, healthcare and many more. Copula (or connection) functions were invented in 1959 by Abe Sklar in response to a query of Maurice Frechet. After 60 years, where do we stand now? This article provides a history of the key developments and offers a unified perspective.
Submitted 11 December, 2019;
originally announced December 2019.
-
LP Approach to Statistical Modeling
Authors:
Subhadeep Mukhopadhyay,
Emanuel Parzen
Abstract:
We present an approach to statistical data modeling and exploratory data analysis called `LP Statistical Data Science.' It aims to generalize and unify traditional and novel statistical measures, methods, and exploratory tools. This article outlines fundamental concepts along with real-data examples to illustrate how the `LP Statistical Algorithm' can systematically tackle different varieties of data types, data patterns, and data structures under a coherent theoretical framework. A fundamental role is played by a specially designed orthonormal basis of functions of a random variable X, used for linear (Hilbert space theory) representation of a general function of X, such as $\mbox{E}[Y \mid X]$.
Submitted 11 May, 2014;
originally announced May 2014.
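The abstract above refers to a specially designed orthonormal basis of a random variable X. The following is a minimal sketch of one way such a basis could be built from a sample, assuming the construction is a mid-distribution transform followed by Gram-Schmidt orthonormalization of its powers. The function names `mid_distribution` and `lp_basis` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def mid_distribution(x):
    """Empirical mid-distribution F_mid(x) = F(x) - 0.5 * p(x) at each sample point."""
    n = len(x)
    counts = Counter(x)
    f_mid, cumulative = {}, 0.0
    for value in sorted(counts):
        p = counts[value] / n
        f_mid[value] = cumulative + 0.5 * p  # mass below value plus half the mass at value
        cumulative += p
    return [f_mid[value] for value in x]


def lp_basis(x, m=3):
    """Return m score vectors with sample mean 0, mean square 1, and zero cross-products.

    Assumption: x has more than m distinct values, otherwise a norm below is zero.
    """
    n = len(x)
    u = mid_distribution(x)
    mean_u = sum(u) / n
    centered = [ui - mean_u for ui in u]
    basis = []
    for degree in range(1, m + 1):
        col = [c ** degree for c in centered]
        mean_col = sum(col) / n
        col = [v - mean_col for v in col]  # orthogonalize against the constant function
        for t in basis:  # modified Gram-Schmidt against lower-order scores
            proj = sum(a * b for a, b in zip(col, t)) / n
            col = [a - proj * b for a, b in zip(col, t)]
        norm = sqrt(sum(a * a for a in col) / n)
        basis.append([a / norm for a in col])
    return basis
```

Because the scores are orthonormal under the empirical distribution, the coefficient of each score in a linear expansion of a function such as E[Y | X] can be estimated by the sample mean of the product of Y with that score. The mid-distribution step is what lets the same code handle discrete data with ties and continuous data alike.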
-
LP Mixed Data Science : Outline of Theory
Authors:
Emanuel Parzen,
Subhadeep Mukhopadhyay
Abstract:
This article presents the theoretical foundation of a new frontier of research, `LP Mixed Data Science,' that simultaneously extends and integrates the practice of traditional and novel statistical methods for nonparametric exploratory data modeling, and is applicable to the teaching and training of statistics.
Statistics journals have great difficulty accepting papers unlike those previously published. For statisticians with new big ideas, a practical strategy is to publish them in many small applied studies, which enables one to provide references to the work of others. This essay outlines the many concepts, new theory, and important algorithms of our new culture of statistical science called LP MIXED DATA SCIENCE. It provides comprehensive solutions to problems of data analysis and nonparametric modeling of many variables that are continuous or discrete, an area that does not yet have a large literature. It develops a new modeling approach to nonparametric estimation of the multivariate copula density. We discuss the theory, which we believe is very elegant (and can provide a framework for United Statistical Algorithms, for traditional Small Data methods and Big Data methods).
Submitted 6 November, 2013; v1 submitted 3 November, 2013;
originally announced November 2013.
-
Nonlinear Time Series Modeling: A Unified Perspective, Algorithm, and Application
Authors:
Subhadeep Mukhopadhyay,
Emanuel Parzen
Abstract:
A new comprehensive approach to nonlinear time series analysis and modeling is developed in the present paper. We introduce novel data-specific, mid-distribution based, Legendre Polynomial (LP) like nonlinear transformations of the original time series Y(t) that enable us to adapt all the existing stationary linear Gaussian time series modeling strategies and make them applicable to non-Gaussian and nonlinear processes in a robust fashion. The emphasis of the present paper is on empirical time series modeling via the algorithm LPTime. We demonstrate the effectiveness of our theoretical framework using daily S&P 500 return data from Jan/2/1963 to Dec/31/2009. Our proposed LPTime algorithm systematically and automatically discovers, all at once, the `stylized facts' of financial time series that were previously noted by many researchers one at a time.
Submitted 23 December, 2017; v1 submitted 2 August, 2013;
originally announced August 2013.
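The abstract above describes Legendre-polynomial-like transformations of a time series based on its mid-distribution. The sketch below illustrates that kind of transformation for a tie-free (continuous) series, assuming the scores are orthonormal shifted Legendre polynomials evaluated at the mid-ranks; it is not the LPTime algorithm itself, and the helper names `mid_ranks`, `legendre_scores`, and `autocorr` are hypothetical.

```python
from math import sqrt


def mid_ranks(y):
    """Mid-distribution values (rank - 0.5) / n; assumes the series has no ties."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    u = [0.0] * n
    for rank, index in enumerate(order):
        u[index] = (rank + 0.5) / n
    return u


def legendre_scores(u):
    """First two orthonormal shifted Legendre polynomials on [0, 1], evaluated at u."""
    t1 = [sqrt(3) * (2 * v - 1) for v in u]
    t2 = [sqrt(5) * (6 * v * v - 6 * v + 1) for v in u]
    return t1, t2


def autocorr(z, lag):
    """Sample autocorrelation of the series z at the given non-negative lag."""
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z)
    cov = sum((z[t] - mean) * (z[t + lag] - mean) for t in range(n - lag))
    return cov / var
```

Under these assumptions, ordinary linear tools such as `autocorr` can be applied to each transformed series in turn. Because the scores depend on the data only through ranks, they stay bounded even for heavy-tailed returns.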
-
United Statistical Algorithm, Small and Big Data: Future OF Statistician
Authors:
Emanuel Parzen,
Subhadeep Mukhopadhyay
Abstract:
This article discusses the role of big-idea statisticians in the future of Big Data Science. We describe the `United Statistical Algorithms' framework for comprehensive unification of traditional and novel statistical methods for modeling Small Data and Big Data, especially mixed data (discrete, continuous).
Submitted 2 August, 2013;
originally announced August 2013.
-
Modeling, dependence, classification, united statistical science, many cultures
Authors:
Emanuel Parzen,
Subhadeep Mukhopadhyay
Abstract:
Breiman (2001) urged statisticians to be aware of two cultures: 1. Parametric modeling culture, pioneered by R. A. Fisher and Jerzy Neyman; 2. Algorithmic predictive culture, pioneered by machine learning research.
Parzen (2001), as a part of discussing Breiman (2001), proposed that researchers be aware of many cultures, including the focus of our research: 3. Nonparametric, quantile based, information theoretic modeling. We provide a unification of many statistical methods for traditional small data sets and emerging big data sets in terms of comparison density, copula density, measure of dependence, correlation, information, and new measures (called LP score comoments) that apply to long-tailed distributions without finite second-order moments. A very important goal is to unify methods for discrete and continuous random variables. Our research extends these methods to modern high dimensional data modeling.
Submitted 23 April, 2012; v1 submitted 20 April, 2012;
originally announced April 2012.
-
Quantile Based Variable Mining : Detection, FDR based Extraction and Interpretation
Authors:
S. Mukhopadhyay,
Emanuel Parzen,
S. N. Lahiri
Abstract:
This paper outlines a unified framework for high dimensional variable selection for classification problems. Traditional approaches to finding interesting variables mostly utilize only partial information through moments (like mean difference). In contrast, in this paper we address the question of variable selection in full generality from a distributional point of view. If a variable is not important for classification, then it will have similar distributional characteristics under different classes. This simple and straightforward observation motivates us to quantify `How and Why' the distribution of a variable changes over classes through the CR-statistic. The second contribution of our paper is to develop and investigate the FDR based thresholding technology from a completely new point of view for adaptive thresholding, which leads to an elegant algorithm called CDfdr. This paper attempts to show how all of these problems of detection, extraction and interpretation for interesting variables can be treated in a unified way under one broad general theme: comparison analysis. It is proposed that a key to accomplishing this unification is to think in terms of the quantile function and the comparison density. We illustrate and demonstrate the power of our methodology using three real data sets.
Submitted 14 December, 2011;
originally announced December 2011.
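The abstract above rests on the idea that an unimportant variable has a similar distribution in every class, and that the comparison density captures how it differs otherwise. The sketch below is a crude histogram illustration of that idea, assuming the comparison is made between one class and the pooled sample on the pooled mid-distribution scale. It is not the paper's CR-statistic or CDfdr procedure, whose definitions the abstract does not give, and the name `comparison_histogram` is hypothetical.

```python
from collections import Counter


def comparison_histogram(x, labels, target, bins=4):
    """Histogram estimate of how class `target` is distributed relative to the pooled sample.

    Heights near 1 in every bin suggest the variable looks the same in the
    target class as in the pooled data; heights far from 1 show where it differs.
    """
    n = len(x)
    counts = Counter(x)
    f_mid, cumulative = {}, 0.0
    for value in sorted(counts):  # pooled mid-distribution F(x) - 0.5 * p(x)
        p = counts[value] / n
        f_mid[value] = cumulative + 0.5 * p
        cumulative += p
    # Pooled mid-distribution values of the observations in the target class
    u = [f_mid[value] for value, label in zip(x, labels) if label == target]
    hist = [0] * bins
    for v in u:
        hist[min(int(v * bins), bins - 1)] += 1
    # Rescale counts to density heights so that the heights average to 1
    return [h * bins / len(u) for h in hist]
```

For example, with eight distinct values where the target class holds the four largest, `comparison_histogram(list(range(8)), [0, 0, 0, 0, 1, 1, 1, 1], 1, bins=2)` puts all of the mass in the upper bin. When the two classes are interleaved, the heights are flat at 1.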