Showing 1–46 of 46 results for author: Subhadeep

  1. arXiv:2505.21845  [pdf, ps, other]

    stat.ML cs.LG cs.SI stat.ME

    Spectral clustering for dependent community Hawkes process models of temporal networks

    Authors: Lingfei Zhao, Hadeel Soliman, Kevin S. Xu, Subhadeep Paul

    Abstract: Temporal networks observed continuously over time through timestamped relational events data are commonly encountered in application settings including online social media communications, financial transactions, and international relations. Temporal networks often exhibit community structure and strong dependence patterns among node pairs. This dependence can be modeled through mutual excitations,…

    Submitted 27 May, 2025; originally announced May 2025.

  2. arXiv:2412.18081  [pdf, other]

    stat.ML cs.LG

    Heterogeneous transfer learning for high dimensional regression with feature mismatch

    Authors: Jae Ho Chang, Massimiliano Russo, Subhadeep Paul

    Abstract: We consider the problem of transferring knowledge from a source, or proxy, domain to a new target domain for learning a high-dimensional regression model with possibly different features. Recently, the statistical properties of homogeneous transfer learning have been investigated. However, most homogeneous transfer and multi-task learning methods assume that the target and proxy domains have the s…

    Submitted 23 December, 2024; originally announced December 2024.

  3. arXiv:2409.19400  [pdf, other]

    stat.ME stat.ML

    The co-varying ties between networks and item responses via latent variables

    Authors: Selena Wang, Plamena Powla, Tracy Sweet, Subhadeep Paul

    Abstract: Relationships among teachers are known to influence their teaching-related perceptions. We study whether and how teachers' advising relationships (networks) are related to their perceptions of satisfaction, students, and influence over educational policies, recorded as their responses to a questionnaire (item responses). We propose a novel joint model of network and item responses (JNIRM) with cor…

    Submitted 28 September, 2024; originally announced September 2024.

  4. arXiv:2406.05944  [pdf, other]

    stat.ME math.ST

    Embedding Network Autoregression for time series analysis and causal peer effect inference

    Authors: Jae Ho Chang, Subhadeep Paul

    Abstract: We propose an Embedding Network Autoregressive Model for multivariate networked longitudinal data. We assume the network is generated from a latent variable model, and these unobserved variables are included in a structural peer effect model or a time series network autoregressive model as additive effects. This approach takes a unified view of two related yet fundamentally different problems: (1)…

    Submitted 23 March, 2025; v1 submitted 9 June, 2024; originally announced June 2024.

  5. arXiv:2402.04593  [pdf, other]

    stat.ME math.ST

    Spatial autoregressive model with measurement error in covariates

    Authors: Subhadeep Paul, Shanjukta Nath

    Abstract: The Spatial AutoRegressive model (SAR) is commonly used in studies involving spatial and network data to estimate the spatial or network peer influence and the effects of covariates on the response, taking into account the dependence among units. While the model can be efficiently estimated with a Quasi maximum likelihood approach (QMLE), the detrimental effect of covariate measurement error on th…

    Submitted 6 August, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  6. arXiv:2205.09263  [pdf, other]

    cs.LG cs.SI stat.ML

    A Mutually Exciting Latent Space Hawkes Process Model for Continuous-time Networks

    Authors: Zhipeng Huang, Hadeel Soliman, Subhadeep Paul, Kevin S. Xu

    Abstract: Networks and temporal point processes serve as fundamental building blocks for modeling complex dynamic relational data in various domains. We propose the latent space Hawkes (LSH) model, a novel generative model for continuous-time networks of relational events, using a latent space representation for nodes. We model relational events between nodes using mutually exciting Hawkes processes with ba…

    Submitted 6 July, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: To appear in UAI 2022. Code available at https://github.com/IdeasLabUT/Latent-Space-Hawkes

  7. arXiv:2205.00639  [pdf, other]

    stat.ME cs.LG cs.SI stat.ML

    The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks

    Authors: Hadeel Soliman, Lingfei Zhao, Zhipeng Huang, Subhadeep Paul, Kevin S. Xu

    Abstract: The stochastic block model (SBM) is one of the most widely used generative models for network data. Many continuous-time dynamic network models are built upon the same assumption as the SBM: edges or events between all pairs of nodes are conditionally independent given the block or community memberships, which prevents them from reproducing higher-order motifs such as triangles that are commonly o…

    Submitted 6 July, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: To appear at ICML 2022. Code available at https://github.com/IdeasLabUT/Multivariate-Community-Hawkes

  8. arXiv:2203.14223  [pdf, other]

    stat.ME stat.ML

    Identifying Peer Influence in Therapeutic Communities Adjusting for Latent Homophily

    Authors: Shanjukta Nath, Keith Warren, Subhadeep Paul

    Abstract: We investigate peer role model influence on successful graduation from Therapeutic Communities (TCs) for substance abuse and criminal behavior. We use data from 3 TCs that kept records of exchanges of affirmations among residents and their precise entry and exit dates, allowing us to form peer networks and define a causal effect of interest. The role model effect measures the difference in the exp…

    Submitted 10 June, 2024; v1 submitted 27 March, 2022; originally announced March 2022.

  9. arXiv:2203.03040  [pdf, other]

    econ.EM stat.ME

    Modelplasticity and Abductive Decision Making

    Authors: Subhadeep Mukhopadhyay

    Abstract: `All models are wrong but some are useful' (George Box 1979). But, how to find those useful ones starting from an imperfect model? How to make informed data-driven decisions equipped with an imperfect model? These fundamental questions appear to be pervasive in virtually all empirical fields -- including economics, finance, marketing, healthcare, climate change, defense planning, and operations re…

    Submitted 7 March, 2023; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: Final accepted version. The supplementary section contains some notes on the connections and differences between the Bayesian statistical approach vs. the Abductive statistical approach to model misspecification, robustness, and decision-making

  10. arXiv:2111.08054  [pdf, other]

    econ.EM cs.AI stat.ME

    Abductive Inference and C. S. Peirce: 150 Years Later

    Authors: Deep Mukhopadhyay

    Abstract: This paper is about two things: (i) Charles Sanders Peirce (1837-1914) -- an iconoclastic philosopher and polymath who is among the greatest of American minds. (ii) Abductive inference -- a term coined by C. S. Peirce, which he defined as "the process of forming explanatory hypotheses. It is the only logical operation which introduces any new idea." Abductive inference and quantitative economics…

    Submitted 2 February, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: Final accepted version

  11. arXiv:2108.09438  [pdf, other]

    math.ST econ.EM stat.ME

    A Maximum Entropy Copula Model for Mixed Data: Representation, Estimation, and Applications

    Authors: Subhadeep Mukhopadhyay

    Abstract: A new nonparametric model of maximum-entropy (MaxEnt) copula density function is proposed, which offers the following advantages: (i) it is valid for mixed random vectors. By `mixed' we mean the method works for any combination of discrete or continuous variables in a fully automated manner; (ii) it yields a bona fide density estimate with interpretable parameters. By `bona fide' we mean the estimate…

    Submitted 22 August, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

    Comments: Revised and accepted version. Dedication: This paper is dedicated to E. T. Jaynes, the originator of the Maximum Entropy Principle, for his birth centenary. And to the memory of Leo Goodman, a transformative legend of Categorical Data Analysis. This paper is inspired in part to demonstrate how these two modeling philosophies can be connected and united in some ways

  12. arXiv:2108.07380  [pdf, other]

    stat.ML cs.AI cs.LG econ.EM

    InfoGram and Admissible Machine Learning

    Authors: Subhadeep Mukhopadhyay

    Abstract: We have entered a new era of machine learning (ML), where the most accurate algorithm with superior predictive power may not even be deployable, unless it is admissible under the regulatory constraints. This has led to great interest in developing fair, transparent and trustworthy ML methods. The purpose of this article is to introduce a new information-theoretic learning framework (admissible mac…

    Submitted 19 August, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Keywords: Admissible machine learning; InfoGram; L-Features; Information-theory; ALFA-testing, Algorithmic risk management; Fairness; Interpretability; COREml; FINEml

  13. arXiv:2108.07372  [pdf, other]

    stat.ME econ.EM math.ST stat.AP

    Density Sharpening: Principles and Applications to Discrete Data Analysis

    Authors: Subhadeep Mukhopadhyay

    Abstract: This article introduces a general statistical modeling principle called "Density Sharpening" and applies it to the analysis of discrete count data. The underlying foundation is based on a new theory of nonparametric approximation and smoothing methods for discrete distributions which play a useful role in explaining and uniting a large class of applied statistical methods. The proposed modeling fr…

    Submitted 21 August, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Keywords: Density sharpening principle; LP-Fourier analysis; Explanatory goodness-of-fit; Jaynes' dice problem; Compressive chi-square; Data-efficient learning

  14. arXiv:2103.08035  [pdf, other]

    stat.ME stat.AP

    Testing for the Network Small-World Property

    Authors: Kartik Lovekar, Srijan Sengupta, Subhadeep Paul

    Abstract: Researchers have long observed that the "small-world" property, which combines the concepts of high transitivity or clustering with a low average path length, is ubiquitous for networks obtained from a variety of disciplines, including social sciences, biology, neuroscience, and ecology. However, we find several shortcomings of the currently prevalent definition and detection methods rendering th…

    Submitted 8 October, 2024; v1 submitted 14 March, 2021; originally announced March 2021.

  15. arXiv:2006.07405  [pdf, other]

    cs.LG cs.DC stat.ML

    O(1) Communication for Distributed SGD through Two-Level Gradient Averaging

    Authors: Subhadeep Bhattacharya, Weikuan Yu, Fahim Tahmid Chowdhury

    Abstract: Large neural network models present a hefty communication challenge to distributed Stochastic Gradient Descent (SGD), with a communication complexity of O(n) per worker for a model of n parameters. Many sparsification and quantization techniques have been proposed to compress the gradients, some reducing the communication complexity to O(k), where k << n. In this paper, we introduce a strategy cal…

    Submitted 15 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

  16. arXiv:2005.13596  [pdf, other]

    stat.ML cs.AI cs.LG econ.EM stat.ME

    Breiman's "Two Cultures" Revisited and Reconciled

    Authors: Subhadeep Mukhopadhyay, Kaijun Wang

    Abstract: In a landmark paper published in 2001, Leo Breiman described the tense standoff between two cultures of data modeling: parametric statistical and algorithmic machine learning. The cultural division between these two statistical learning frameworks has been growing at a steady pace in recent years. What is the way forward? It has become blatantly obvious that this widening gap between "the two cult…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: This paper celebrates the 70th anniversary of Statistical Machine Learning--- how far we've come, and how far we have to go. Keywords: Integrated statistical learning theory, Exploratory machine learning, Uncertainty prediction machine, ML-powered modern applied statistics, Information theory

  17. arXiv:2004.09588  [pdf, other]

    stat.ME math.ST stat.ML

    On The Problem of Relevance in Statistical Inference

    Authors: Subhadeep Mukhopadhyay, Kaijun Wang

    Abstract: This paper is dedicated to the "50 Years of the Relevance Problem" - a long-neglected topic that begs attention from practical statisticians who are concerned with the problem of drawing inference from large-scale heterogeneous data.

    Submitted 4 May, 2021; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: Revised (much-improved) version. The procedure (including all the datasets) is implemented in the R-package LPRelevance

  18. arXiv:1912.05503  [pdf, other]

    stat.ME stat.AP stat.CO

    Nonparametric Universal Copula Modeling

    Authors: Subhadeep Mukhopadhyay, Emanuel Parzen

    Abstract: To handle the ubiquitous problem of "dependence learning," copulas are quickly becoming a pervasive tool across a wide range of data-driven disciplines encompassing neuroscience, finance, econometrics, genomics, social science, machine learning, healthcare and many more. Copula (or connection) functions were invented in 1959 by Abe Sklar in response to a query of Maurice Frechet. After 60 years, w…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: A perspective on "60 years of Copula"

  19. arXiv:1912.00846  [pdf, other]

    cs.LG cs.CL cs.SD stat.ML

    Attentive Modality Hopping Mechanism for Speech Emotion Recognition

    Authors: Seunghyun Yoon, Subhadeep Dey, Hwanhee Lee, Kyomin Jung

    Abstract: In this work, we explore the impact of visual modality in addition to speech and text for improving the accuracy of the emotion detection system. The traditional approaches tackle this task by fusing the knowledge from the various modalities independently for performing emotion classification. In contrast to these approaches, we tackle the problem by introducing an attention mechanism to combine t…

    Submitted 22 April, 2020; v1 submitted 29 November, 2019; originally announced December 2019.

    Comments: 5 pages, Accepted as a conference paper at ICASSP 2020

  20. arXiv:1910.12128  [pdf, other]

    stat.AP stat.ME

    Joint Latent Space Model for Social Networks with Multivariate Attributes

    Authors: Selena Shuo Wang, Subhadeep Paul, Paul De Boeck

    Abstract: In many application problems in social, behavioral, and economic sciences, researchers often have data on a social network among a group of individuals along with high dimensional multivariate measurements for each individual. To analyze such networked data structures, we propose a joint Attribute and Person Latent Space Model (APLSM) that summarizes information from the social network and the mul…

    Submitted 1 February, 2021; v1 submitted 26 October, 2019; originally announced October 2019.

    Comments: A previous version of this paper (version 1) used a different application problem and dataset, and also had a slightly different title

  21. arXiv:1908.06940  [pdf, other]

    cs.SI cs.LG physics.soc-ph stat.ME stat.ML

    CHIP: A Hawkes Process Model for Continuous-time Networks with Scalable and Consistent Estimation

    Authors: Makan Arastuie, Subhadeep Paul, Kevin S. Xu

    Abstract: In many application settings involving networks, such as messages between users of an on-line social network or transactions between traders in financial markets, the observed data consist of timestamped relational events, which form a continuous-time network. We propose the Community Hawkes Independent Pairs (CHIP) generative model for such networks. We show that applying spectral clustering to a…

    Submitted 10 November, 2020; v1 submitted 19 August, 2019; originally announced August 2019.

    Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. Source code is available at https://github.com/IdeasLabUT/CHIP-Network-Model

  22. arXiv:1908.05227  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

    Authors: Subhadeep Dey, Petr Motlicek, Trung Bui, Franck Dernoncourt

    Abstract: In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data selection mechanism to obtain the best hypothesized output, further used to retrain the seed model.…

    Submitted 8 August, 2019; originally announced August 2019.

    Comments: Interspeech 2019

    MSC Class: 62H30

  23. arXiv:1901.07090  [pdf, other]

    math.ST stat.CO

    Spectral Graph Analysis: A Unified Explanation and Modern Perspectives

    Authors: Subhadeep Mukhopadhyay, Kaijun Wang

    Abstract: Complex networks or graphs are ubiquitous in sciences and engineering: biological networks, brain networks, transportation networks, social networks, and the World Wide Web, to name a few. Spectral graph theory provides a set of useful techniques and models for understanding `patterns of interconnectedness' in a graph. Our prime focus in this paper is on the following question: Is there a unified…

    Submitted 21 January, 2019; originally announced January 2019.

    Comments: The first draft of the paper was written in June 2015

  24. arXiv:1812.06515  [pdf, other]

    stat.ML cs.LG

    Higher-Order Spectral Clustering under Superimposed Stochastic Block Model

    Authors: Subhadeep Paul, Olgica Milenkovic, Yuguo Chen

    Abstract: Higher-order motif structures and multi-vertex interactions are becoming increasingly important in studies that aim to improve our understanding of functionalities and evolution patterns of networks. To elucidate the role of higher-order structures in community detection problems over complex networks, we introduce the notion of a Superimposed Stochastic Block Model (SupSBM). The model is based on…

    Submitted 16 December, 2018; originally announced December 2018.

  25. arXiv:1810.01724  [pdf, other]

    stat.ME math.ST stat.ML

    A Nonparametric Approach to High-dimensional k-sample Comparison Problems

    Authors: Subhadeep Mukhopadhyay, Kaijun Wang

    Abstract: High-dimensional k-sample comparison is a common applied problem. We construct a class of easy-to-implement nonparametric distribution-free tests based on new tools and unexplored connections with spectral graph theory. The test is shown to possess various desirable properties along with a characteristic exploratory flavor that has practical consequences. The numerical examples show that our metho…

    Submitted 8 August, 2019; v1 submitted 3 October, 2018; originally announced October 2018.

    Comments: Biometrika (in press)

  26. arXiv:1805.02292  [pdf, other]

    stat.AP

    A random effects stochastic block model for joint community detection in multiple networks with applications to neuroimaging

    Authors: Subhadeep Paul, Yuguo Chen

    Abstract: Motivated by multi-subject experiments in neuroimaging studies, we develop a modeling framework for joint community detection in a group of related networks, which can be considered as a sample from a population of networks. The proposed random effects stochastic block model facilitates the study of group differences and subject-specific variations in the community structure. The model proposes a…

    Submitted 21 March, 2020; v1 submitted 6 May, 2018; originally announced May 2018.

  27. arXiv:1805.02075  [pdf, other]

    stat.ME stat.CO

    Decentralized Nonparametric Multiple Testing

    Authors: Subhadeep Mukhopadhyay

    Abstract: Consider a big data multiple testing task, where, due to storage and computational bottlenecks, one is given a very large collection of p-values by splitting into manageable chunks and distributing over thousands of computer nodes. This paper is concerned with the following question: How can we find the full data multiple testing solution by operating completely independently on individual machine…

    Submitted 5 May, 2018; originally announced May 2018.

    Comments: Revised version

  28. arXiv:1804.04640  [pdf, other]

    stat.ML cs.LG

    Fast Counting in Machine Learning Applications

    Authors: Subhadeep Karan, Matthew Eichhorn, Blake Hurlburt, Grant Iraci, Jaroslaw Zola

    Abstract: We propose scalable methods to execute counting queries in machine learning applications. To achieve memory and computational efficiency, we abstract counting queries and their context such that the counts can be aggregated as a stream. We demonstrate performance and scalability of the resulting approach on random queries, and through extensive experimentation using Bayesian networks learning and…

    Submitted 7 January, 2019; v1 submitted 12 April, 2018; originally announced April 2018.

  29. arXiv:1802.00474  [pdf, other]

    stat.ME math.ST stat.ML

    Bayesian Modeling via Goodness-of-fit

    Authors: Subhadeep Mukhopadhyay, Douglas Fletcher

    Abstract: The two key issues of modern Bayesian statistics are: (i) establishing a principled approach for distilling a statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) development of a Bayes-frequentist consolidated data analysis workflow that is more effective than either of the two separately. In this paper, we propose the idea of "Bayes via goodn…

    Submitted 16 April, 2018; v1 submitted 1 February, 2018; originally announced February 2018.

    Comments: Revised version

    MSC Class: 62F15; 62G07; 62G05

  30. arXiv:1708.04098  [pdf, ps, other]

    stat.OT

    Statistics Educational Challenge in the 21st Century

    Authors: Subhadeep Mukhopadhyay

    Abstract: What do we teach and what should we teach? An honest answer to this question is painful, very painful--what we teach lags decades behind what we practice. How can we reduce this `gap' to prepare a data science workforce of trained next-generation statisticians? This is a challenging open problem that requires many well-thought-out experiments before finding the secret sauce. My goal in this articl…

    Submitted 14 August, 2017; originally announced August 2017.

    Comments: Invited Opinion Article

  31. arXiv:1704.07353  [pdf, other]

    stat.ML

    Spectral and matrix factorization methods for consistent community detection in multi-layer networks

    Authors: Subhadeep Paul, Yuguo Chen

    Abstract: We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network using methods based on the spectral clustering or a low-rank matrix factorization. As a general theme, these "intermediate fusion" methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix…

    Submitted 3 December, 2018; v1 submitted 24 April, 2017; originally announced April 2017.

  32. arXiv:1608.00623  [pdf, other]

    stat.ME cs.SI physics.soc-ph

    Null Models and Community Detection in Multi-Layer Networks

    Authors: Subhadeep Paul, Yuguo Chen

    Abstract: Multi-layer networks are networks on a set of entities (nodes) with multiple types of relations (edges) among them where each type of relation/interaction is represented as a network layer. As with single layer networks, community detection is an important task in multi-layer networks. A large group of popular community detection methods in networks are based on optimizing a quality function known…

    Submitted 9 December, 2020; v1 submitted 1 August, 2016; originally announced August 2016.

  33. arXiv:1605.05349  [pdf, other]

    stat.ML

    Orthogonal symmetric non-negative matrix factorization under the stochastic block model

    Authors: Subhadeep Paul, Yuguo Chen

    Abstract: We present a method based on the orthogonal symmetric non-negative matrix tri-factorization of the normalized Laplacian matrix for community detection in complex networks. While the exact factorization of a given order may not exist and is NP hard to compute, we obtain an approximate factorization by solving an optimization problem. We establish the connection of the factors obtained through the f…

    Submitted 17 May, 2016; originally announced May 2016.

    Comments: 35 pages, 3 figures

    MSC Class: 62F12; 62H30; 90B15; 15A23

  34. arXiv:1602.03861  [pdf, other]

    math.ST stat.ME stat.ML

    Unified Statistical Theory of Spectral Graph Analysis

    Authors: Subhadeep Mukhopadhyay

    Abstract: The goal of this paper is to show that there exists a simple, yet universal statistical logic of spectral graph analysis by recasting it into a nonparametric function estimation problem. The prescribed viewpoint appears to be good enough to accommodate most of the existing spectral graph techniques as a consequence of just one single formalism and algorithm.

    Submitted 20 September, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

    Comments: Major changes have been done in terms of contents and structure of the paper. New set of motivations for GraField, Expanding Section 4, Connections with Diffusion map and Google's PageRank method etc

  35. arXiv:1509.06428  [pdf, other]

    stat.ME math.ST

    Large-Scale Mode Identification and Data-Driven Sciences

    Authors: Subhadeep Mukhopadhyay

    Abstract: Bump-hunting or mode identification is a fundamental problem that arises in almost every scientific field of data-driven discovery. Surprisingly, very few data modeling tools are available for automatic (not requiring manual case-by-case investigation), objective (not subjective), and nonparametric (not based on restrictive parametric model assumptions) mode discovery, which can scale to large dat…

    Submitted 8 November, 2016; v1 submitted 21 September, 2015; originally announced September 2015.

    Comments: I would like to express my sincere thanks to the Editor and the anonymous reviewers for their in-depth comments, which have greatly improved the manuscript

  36. arXiv:1508.03747  [pdf, other]

    stat.AP stat.CO stat.ME

    Nonparametric Distributed Learning Architecture for Big Data: Algorithm and Applications

    Authors: Scott Bruce, Zeda Li, Hsiang-Chieh Yang, Subhadeep Mukhopadhyay

    Abstract: Dramatic increases in the size and complexity of modern datasets have made traditional "centralized" statistical inference prohibitive. In addition to computational challenges associated with big data learning, the presence of numerous data types (e.g. discrete, continuous, categorical, etc.) makes automation and scalability difficult. A question of immediate concern is how to design a data-intens…

    Submitted 26 February, 2018; v1 submitted 15 August, 2015; originally announced August 2015.

    Comments: The purpose of this paper is to answer the question: What is the relevance of small-data-ideas in this big-data world? The bigger question is: Should we make difficult things easy or easy things look difficult? The first option will probably make some impact in the long-run, but the second one will surely earn prestigious journal publications in short-run, IEEE Transactions on Big Data (forthcoming). The first report came out in 2015

    MSC Class: 62G05

  37. arXiv:1507.08727  [pdf, other]

    math.ST stat.ME

    Large Scale Signal Detection: A Unified Perspective

    Authors: Subhadeep Mukhopadhyay

    Abstract: There is an overwhelmingly large literature and algorithms already available on `large scale inference problems' based on different modeling techniques and cultures. Our primary goal in this paper is not to add one more new methodology to the existing toolbox but instead (a) to clarify the mystery how these different simultaneous inference methods are connected, (b) to provide an alt…

    Submitted 31 March, 2017; v1 submitted 30 July, 2015; originally announced July 2015.

    Comments: Online link of supplementary materials added; copyediting typos corrected

    Journal ref: Biometrics (2016), 72, 2, 325-334

  38. Community detection in multi-relational data with restricted multi-layer stochastic blockmodel

    Authors: Subhadeep Paul, Yuguo Chen

    Abstract: In recent years there has been an increased interest in statistical analysis of data with multiple types of relations among a set of entities. Such multi-relational data can be represented as multi-layer graphs where the set of vertices represents the entities and multiple types of edges represent the different relations among them. For community detection in multi-layer graphs, we consider two ra…

    Submitted 21 January, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: 55 pages, 9 figures

    Journal ref: Electron. J. Statist. Volume 10, Number 2 (2016), 3807-3870

  39. arXiv:1412.1530  [pdf, other]

    math.ST stat.ME

    Strength of Connections in a Random Graph: Definition, Characterization, and Estimation

    Authors: Subhadeep Mukhopadhyay

    Abstract: How can the `affinity' or `strength' of ties of a random graph be characterized and compactly represented? How can concepts like Fourier and inverse-Fourier like transform be developed for graph data? To do so, we introduce a new graph-theoretic function called `Graph Correlation Density Field' (or in short GraField), which differs from the traditional edge probability density-based approaches, to…

    Submitted 9 December, 2015; v1 submitted 3 December, 2014; originally announced December 2014.

    Comments: 18 pages, 6 Figures. Third version

  40. arXiv:1405.2601  [pdf, other]

    math.ST stat.ME

    LP Approach to Statistical Modeling

    Authors: Subhadeep Mukhopadhyay, Emanuel Parzen

    Abstract: We present an approach to statistical data modeling and exploratory data analysis called `LP Statistical Data Science.' It aims to generalize and unify traditional and novel statistical measures, methods, and exploratory tools. This article outlines fundamental concepts along with real-data examples to illustrate how the `LP Statistical Algorithm' can systematically tackle different varieties of d…

    Submitted 11 May, 2014; originally announced May 2014.

  41. arXiv:1311.0562  [pdf, ps, other]

    math.ST stat.ME

    LP Mixed Data Science : Outline of Theory

    Authors: Emanuel Parzen, Subhadeep Mukhopadhyay

    Abstract: This article presents the theoretical foundation of a new frontier of research-`LP Mixed Data Science'-that simultaneously extends and integrates the practice of traditional and novel statistical methods for nonparametric exploratory data modeling, and is applicable to the teaching and training of statistics. Statistics journals have great difficulty accepting papers unlike those previously publ…

    Submitted 6 November, 2013; v1 submitted 3 November, 2013; originally announced November 2013.

  42. arXiv:1308.2403  [pdf, other]

    stat.ME math.ST stat.AP stat.ML

    CDfdr: A Comparison Density Approach to Local False Discovery Rate Estimation

    Authors: Subhadeep Mukhopadhyay

    Abstract: Efron et al. (2001) proposed an empirical Bayes formulation of the frequentist Benjamini and Hochberg's False Discovery Rate method (Benjamini and Hochberg, 1995). This article attempts to unify the `two cultures' using concepts of comparison density and distribution function. We have also shown how almost all of the existing local fdr methods can be viewed as proposing various model specification for…

    Submitted 11 August, 2013; originally announced August 2013.

  43. arXiv:1308.0642  [pdf, other]

    math.ST stat.AP stat.ME stat.ML

    Nonlinear Time Series Modeling: A Unified Perspective, Algorithm, and Application

    Authors: Subhadeep Mukhopadhyay, Emanuel Parzen

    Abstract: A new comprehensive approach to nonlinear time series analysis and modeling is developed in the present paper. We introduce novel data-specific mid-distribution based Legendre Polynomial (LP) like nonlinear transformations of the original time series Y(t) that enable us to adapt all the existing stationary linear Gaussian time series modeling strategies and make them applicable for non-Gaussian and n…

    Submitted 23 December, 2017; v1 submitted 2 August, 2013; originally announced August 2013.

    Comments: Major restructuring has been done

  44. arXiv:1308.0641  [pdf, other]

    math.ST stat.ME stat.ML

    United Statistical Algorithm, Small and Big Data: Future OF Statistician

    Authors: Emanuel Parzen, Subhadeep Mukhopadhyay

    Abstract: This article describes the role of big idea statisticians in the future of Big Data Science. We describe the `United Statistical Algorithms' framework for comprehensive unification of traditional and novel statistical methods for modeling Small Data and Big Data, especially mixed data (discrete, continuous).

    Submitted 2 August, 2013; originally announced August 2013.

  45. arXiv:1204.4699  [pdf, other]

    math.ST stat.ME stat.ML

    Modeling, dependence, classification, united statistical science, many cultures

    Authors: Emanuel Parzen, Subhadeep Mukhopadhyay

    Abstract: Breiman (2001) proposed to statisticians awareness of two cultures: 1. Parametric modeling culture, pioneered by R.A.Fisher and Jerzy Neyman; 2. Algorithmic predictive culture, pioneered by machine learning research. Parzen (2001), as a part of discussing Breiman (2001), proposed that researchers be aware of many cultures, including the focus of our research: 3. Nonparametric, quantile based, in…

    Submitted 23 April, 2012; v1 submitted 20 April, 2012; originally announced April 2012.

    Comments: 31 pages, 10 Figures

    MSC Class: 62Gxx

  46. arXiv:1112.3373  [pdf, other]

    stat.ME

    Quantile Based Variable Mining : Detection, FDR based Extraction and Interpretation

    Authors: S. Mukhopadhyay, Emanuel Parzen, S. N. Lahiri

    Abstract: This paper outlines a unified framework for high dimensional variable selection for classification problems. Traditional approaches to finding interesting variables mostly utilize only partial information through moments (like mean difference). On the contrary, in this paper we address the question of variable selection in full generality from a distributional point of view. If a variable is not i…

    Submitted 14 December, 2011; originally announced December 2011.