-
Multiview graph dual-attention deep learning and contrastive learning for multi-criteria recommender systems
Authors:
Saman Forouzandeh,
Pavel N. Krivitsky,
Rohitash Chandra
Abstract:
Recommender systems leveraging deep learning models have been crucial for assisting users in selecting items aligned with their preferences and interests. However, a significant challenge persists in single-criteria recommender systems, which often overlook the diverse attributes of items, a limitation that Multi-Criteria Recommender Systems (MCRS) address. Existing MCRS approaches use a shared embedding vector for multi-criteria item ratings but have struggled to capture the nuanced relationships between users and items based on specific criteria. In this study, we present a novel representation for MCRS based on a multi-edge bipartite graph, where each edge represents one criterion rating of items by users, and Multiview Dual Graph Attention Networks (MDGAT). Employing MDGAT is important for adequately considering all relations between users and items, given the presence of both local (criterion-based) and global (multi-criteria) relations. Additionally, we define anchor points in each view based on similarity and employ local and global contrastive learning to distinguish between positive and negative samples across each view and the entire graph. We evaluate our method on two real-world datasets and assess its performance based on item rating predictions. The results demonstrate that our method achieves higher accuracy than the baseline method for predicting item ratings on the same datasets. MDGAT effectively captures the local and global impact of neighbours and the similarity between nodes.
Submitted 26 February, 2025;
originally announced February 2025.
-
Rejoinder to Discussion of "A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks"
Authors:
Pavel N. Krivitsky,
Pietro Coletti,
Niel Hens
Abstract:
This rejoinder responds to discussions by Caimo, Niezink, and Schweinberger and Fritz of "A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks" by Krivitsky, Coletti, and Hens, all published in the Journal of the American Statistical Association in 2023.
Submitted 10 December, 2023;
originally announced December 2023.
-
Balance Correlations, Agentic Zeros, and Networks: The Structure of 192 Years of War and Peace
Authors:
David Dekker,
David Krackhardt,
Patrick Doreian,
Pavel N. Krivitsky
Abstract:
Social network extensions of Heider's balance theory have not always been consistent. Structural balance theory primarily focuses on graph partitioning, thereby assuming homogeneity in balance-driven behavior of nodes. We present a general model and formal notation that permit testing such behavioral assumptions. Specifically, we formulate statements as a comparison of two conditional probabilities of a tie, $Ego\stackrel{q}{\text{-}}Alter$, first conditional on 2-paths $Ego\, \stackrel{r}{\text{-}}\,X\,\stackrel{s}{\text{-}}\,Alter$, and second conditional on all others, $\neg (Ego\,\stackrel{r}{\text{-}}\,X\,\stackrel{s}{\text{-}}\,Alter)$. The key here is that $q$, $r$ and $s$ represent indices of relations in a set of mutually exclusive and exhaustive relations (their sum produces a complete graph). This relaxes the assumption of a signed graph dichotomy. Here we identify neutral ties as distinct from negative and positive ties. Descriptive statistics measuring the difference in conditional probabilities, or the prevalence for any stipulated balance configuration, are given by the point-biserial correlations of relation $q$ with the count of $2$-paths (through relations $r$ and $s$). Two major advantages are direct comparison, even if network sizes and densities differ, and evaluation of specific (un)balance behaviors. We apply this approach to a data set with friendly vs hostile relations between countries from 1816 to 2007. We find strong evidence for one of the four classic Heiderian balance theory predictions, and virtually no evidence in support of the unbalanced predictions. However, we do find stable and surprising evidence that the neutral ties are important in balancing the relations among nations. Results further suggest that the prevalence of balance-driven behavior varies over time, and that other triadically motivated behaviors prevail among countries in certain eras.
Submitted 8 October, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Modeling Tie Duration in ERGM-Based Dynamic Network Models
Authors:
Pavel N. Krivitsky
Abstract:
Krivitsky and Handcock (2014) proposed a Separable Temporal ERGM (STERGM) framework for modeling social networks, which facilitates separable modeling of the tie duration distributions and the structural dynamics of tie formation. In this note, we explore the hazard structures achievable in this framework, with first- and higher-order Markov assumptions, and propose ways to model a variety of duration distributions in this framework.
Submitted 14 March, 2022;
originally announced March 2022.
-
ergm 4: Computational Improvements
Authors:
Pavel N. Krivitsky,
David R. Hunter,
Martina Morris,
Chad Klumb
Abstract:
The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R, introduced in a special issue of the Journal of Statistical Software in 2008. This article provides an overview of the performance improvements in the 2021 release of ergm version 4. These include performance enhancements to the Markov chain Monte Carlo and maximum likelihood estimation algorithms as well as broader and faster searching for networks with certain target statistics using simulated annealing.
Submitted 15 March, 2022;
originally announced March 2022.
-
Modeling of Dynamic Networks based on Egocentric Data with Durational Information
Authors:
Pavel N. Krivitsky
Abstract:
Modeling of dynamic networks -- networks that evolve over time -- has manifold applications in many fields. In epidemiology in particular, there is a need for data-driven modeling of human sexual relationship networks for the purpose of modeling and simulation of the spread of sexually transmitted disease. Dynamic network data about such networks are extremely difficult to collect, however, and much more readily available are egocentrically sampled data of a network at a single time point, with some attendant information about the sexual history of respondents.
Krivitsky and Handcock (2014) proposed a Separable Temporal ERGM (STERGM) framework, which facilitates separable modeling of the tie duration distributions and the structural dynamics of tie formation. In this work, we apply this modeling framework to this problem, by studying the long-run properties of STERGM processes, developing methods for fitting STERGMs to egocentrically sampled data, and extending the network size adjustment method of Krivitsky, Handcock, and Morris (2011) to dynamic models.
Submitted 14 March, 2022;
originally announced March 2022.
-
A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks
Authors:
Pavel N. Krivitsky,
Pietro Coletti,
Niel Hens
Abstract:
The last two decades have seen considerable progress in foundational aspects of statistical network analysis, but the path from theory to application is not straightforward. Two large, heterogeneous samples of small networks of within-household contacts in Belgium were collected using two different but complementary sampling designs: one smaller but with all contacts in each household observed, the other larger and more representative but recording contacts of only one person per household. We wish to combine their strengths to learn the social forces that shape household contact formation and facilitate simulation for prediction of disease spread, while generalising to the population of households in the region.
To accomplish this, we describe a flexible framework for specifying multi-network models in the exponential family class and identify the requirements for inference and prediction under this framework to be consistent, identifiable, and generalisable, even when data are incomplete; explore how these requirements may be violated in practice; and develop a suite of quantitative and graphical diagnostics for detecting violations and suggesting improvements to candidate models. We report on the effects of network size, geography, and household roles on household contact patterns (activity, heterogeneity in activity, and triadic closure).
Submitted 18 July, 2023; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Likelihood-based Inference for Exponential-Family Random Graph Models via Linear Programming
Authors:
Pavel N. Krivitsky,
Alina R. Kuvelkar,
David R. Hunter
Abstract:
This article discusses the problem of determining whether a given point, or set of points, lies within the convex hull of another set of points in $d$ dimensions. This problem arises naturally in a statistical context when using a particular approximation to the loglikelihood function for an exponential family model; in particular, we discuss the application to network models here. While the convex hull question may be solved via a simple linear program, this approach is not well known in the statistical literature. Furthermore, this article details several substantial improvements to the convex hull-testing algorithm currently implemented in the widely used 'ergm' package for network modeling.
Submitted 7 February, 2022;
originally announced February 2022.
-
ergm 4: New features
Authors:
Pavel N. Krivitsky,
David R. Hunter,
Martina Morris,
Chad Klumb
Abstract:
The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R, introduced in a special issue of the Journal of Statistical Software in 2008. This article provides an overview of the new functionality in the 2021 release of ergm version 4. This includes more flexible handling of nodal covariates, term operators that extend and simplify model specification, new models for networks with valued edges, improved handling of constraints on the sample space of networks, and estimation with missing edge data. We also identify the new packages in the statnet suite that extend ergm's functionality to other network data types and structural features and the robust set of online resources that support the statnet development process and applications.
Submitted 15 March, 2022; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Revisiting Bayesian Autoencoders with MCMC
Authors:
Rohitash Chandra,
Mahir Jain,
Manavendra Maharana,
Pavel N. Krivitsky
Abstract:
Autoencoders gained popularity in the deep learning revolution given their ability to compress data and provide dimensionality reduction. Although prominent deep learning methods have been used to enhance autoencoders, the need to provide robust uncertainty quantification remains a challenge. This has been addressed with variational autoencoders so far. Bayesian inference via Markov Chain Monte Carlo (MCMC) sampling has faced several limitations for large models; however, recent advances in parallel computing and advanced proposal schemes have opened routes less traveled. This paper presents Bayesian autoencoders powered by MCMC sampling implemented using parallel computing and a Langevin-gradient proposal distribution. The results indicate that the proposed Bayesian autoencoder provides accuracy similar to that of related methods in the literature. Furthermore, it provides uncertainty quantification in the reduced data representation. This motivates further applications of the Bayesian autoencoder framework for other deep learning models.
Submitted 28 April, 2022; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Exponential-Family Models of Random Graphs: Inference in Finite-, Super-, and Infinite Population Scenarios
Authors:
Michael Schweinberger,
Pavel N. Krivitsky,
Carter T. Butts,
Jonathan Stewart
Abstract:
Exponential-family Random Graph Models (ERGMs) constitute a large statistical framework for modeling sparse and dense random graphs, short- and long-tailed degree distributions, covariates, and a wide range of complex dependencies. Special cases of ERGMs are generalized linear models (GLMs), Bernoulli random graphs, $β$-models, $p_1$-models, and models related to Markov random fields in spatial statistics and other areas of statistics. While ERGMs are widely used in practice, questions have been raised about their theoretical properties. These include concerns that some ERGMs are near-degenerate and that many ERGMs are non-projective. To address them, careful attention must be paid to model specifications and their underlying assumptions, and to the inferential settings in which models are employed. As we discuss, near-degeneracy can affect simplistic ERGMs lacking structure, but well-posed ERGMs with additional structure can be well-behaved. Likewise, lack of projectivity can affect non-likelihood-based inference, but likelihood-based inference does not require projectivity. Here, we review well-posed ERGMs along with likelihood-based inference. We first clarify the core statistical notions of "sample" and "population" in the ERGM framework, and separate the process that generates the population graph from the observation process. We then review likelihood-based inference in finite-, super-, and infinite-population scenarios. We conclude with consistency results and an application to human brain networks.
Submitted 12 September, 2019; v1 submitted 15 July, 2017;
originally announced July 2017.
-
Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
Authors:
Vishesh Karwa,
Pavel N. Krivitsky,
Aleksandra B. Slavković
Abstract:
Motivated by a real-life problem of sharing social network data that contain sensitive personal information, we propose a novel approach to release and analyze synthetic graphs in order to protect the privacy of individual relationships captured by the social network while maintaining the validity of statistical results. A case study using a version of the Enron e-mail corpus dataset demonstrates the application and usefulness of the proposed techniques in solving the challenging problem of maintaining privacy and supporting open access to network data, so as to ensure reproducibility of existing studies and enable the discovery of new scientific insights that can be obtained by analyzing such data. We use a simple yet effective randomized response mechanism to generate synthetic networks under $ε$-edge differential privacy, and then use likelihood-based inference for missing data and Markov chain Monte Carlo techniques to fit exponential-family random graph models to the generated synthetic networks.
Submitted 23 September, 2016; v1 submitted 9 November, 2015;
originally announced November 2015.
-
Capturing Multivariate Spatial Dependence: Model, Estimate and then Predict
Authors:
Noel Cressie,
Sandy Burden,
Walter Davis,
Pavel N. Krivitsky,
Payam Mokhtarian,
Thomas Suesse,
Andrew Zammit-Mangion
Abstract:
Physical processes rarely occur in isolation; rather, they influence and interact with one another. Thus, there is great benefit in modeling potential dependence between both spatial locations and different processes. It is the interaction between these two dependencies that is the focus of Genton and Kleiber's paper under discussion. We see the problem of ensuring that any multivariate spatial covariance matrix is nonnegative definite as important, but we also see it as a means to an end. That "end" is solving the scientific problem of predicting a multivariate field. [arXiv:1507.08017].
Submitted 30 July, 2015;
originally announced July 2015.
-
Exponential-Family Random Graph Models for Rank-Order Relational Data
Authors:
Pavel N. Krivitsky,
Carter T. Butts
Abstract:
Rank-order relational data, in which each actor ranks the others according to some criterion, often arise from sociometric measurements of judgment (e.g., self-reported interpersonal interaction) or preference (e.g., relative liking). We propose a class of exponential-family models for rank-order relational data and derive a new class of sufficient statistics for such data, which assume no more than within-subject ordinal properties. Application of MCMC MLE to this family allows us to estimate effects for a variety of plausible mechanisms governing rank structure in a cross-sectional context, and to model the evolution of such structures over time. We apply this framework to model the evolution of relative liking judgments in an acquaintance process, and to model recall of relative volume of interpersonal interaction among members of a technology education program.
Submitted 20 June, 2015; v1 submitted 1 October, 2012;
originally announced October 2012.
-
On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry
Authors:
Pavel N. Krivitsky,
Eric D. Kolaczyk
Abstract:
The modeling and analysis of networks and network data has seen an explosion of interest in recent years and represents an exciting direction for potential growth in statistics. Despite the already substantial amount of work done in this area to date by researchers from various disciplines, however, there remain many questions of a decidedly foundational nature - natural analogues of standard questions already posed and addressed in more classical areas of statistics - that have yet to even be posed, much less addressed. Here we raise and consider one such question in connection with network modeling. Specifically, we ask, "Given an observed network, what is the sample size?" Using simple, illustrative examples from the class of exponential random graph models, we show that the answer to this question can very much depend on basic properties of the networks expected under the model, as the number of vertices $n_V$ in the network grows. In particular, adopting the (asymptotic) scaling of the variance of the maximum likelihood parameter estimates as a notion of effective sample size ($n_{\mathrm{eff}}$), we show that when modeling the overall propensity to have ties and the propensity to reciprocate ties, whether the networks are sparse or not under the model (i.e., having a constant or an increasing number of ties per vertex, respectively) is sufficient to yield an order of magnitude difference in $n_{\mathrm{eff}}$, from $O(n_V)$ to $O(n^2_V)$. In addition, we report simulation study results that suggest similar properties for models for triadic (friend-of-a-friend) effects. We then explore some practical implications of this result, using both simulation and data on food-sharing from Lamalera, Indonesia.
Submitted 5 August, 2015; v1 submitted 5 December, 2011;
originally announced December 2011.
-
Exponential-Family Random Graph Models for Valued Networks
Authors:
Pavel N. Krivitsky
Abstract:
Exponential-family random graph models (ERGMs) provide a principled and flexible way to model and simulate features common in social networks, such as propensities for homophily, mutuality, and friend-of-a-friend triad closure, through choice of model terms (sufficient statistics). However, those ERGMs modeling the more complex features have, to date, been limited to binary data: presence or absence of ties. Thus, analysis of valued networks, such as those where counts, measurements, or ranks are observed, has necessitated dichotomizing them, losing information and introducing biases.
In this work, we generalize ERGMs to valued networks. Focusing on modeling counts, we formulate an ERGM for networks whose ties are counts and discuss issues that arise when moving beyond the binary case. We introduce model terms that generalize and model common social network features for such data and apply these methods to a network dataset whose values are counts of interactions.
Submitted 19 January, 2012; v1 submitted 7 January, 2011;
originally announced January 2011.
-
A Separable Model for Dynamic Networks
Authors:
Pavel N. Krivitsky,
Mark S. Handcock
Abstract:
Models of dynamic networks --- networks that evolve over time --- have manifold applications. We develop a discrete-time generative model for social network evolution that inherits the richness and flexibility of the class of exponential-family random graph models. The model --- a Separable Temporal ERGM (STERGM) --- facilitates separable modeling of the tie duration distributions and the structural dynamics of tie formation. We develop likelihood-based inference for the model, and provide computational algorithms for maximum likelihood estimation. We illustrate the interpretability of the model in analyzing a longitudinal network of friendship ties within a school.
Submitted 19 August, 2012; v1 submitted 8 November, 2010;
originally announced November 2010.
-
Adjusting for Network Size and Composition Effects in Exponential-Family Random Graph Models
Authors:
Pavel N. Krivitsky,
Mark S. Handcock,
Martina Morris
Abstract:
Exponential-family random graph models (ERGMs) provide a principled way to model and simulate features common in human social networks, such as propensities for homophily and friend-of-a-friend triad closure. We show that, without adjustment, ERGMs preserve density as network size increases. Density invariance is often not appropriate for social networks. We suggest a simple modification based on an offset which instead preserves the mean degree and accommodates changes in network composition asymptotically. We demonstrate that this approach allows ERGMs to be applied to the important situation of egocentrically sampled data. We analyze data from the National Health and Social Life Survey (NHSLS).
Submitted 27 December, 2010; v1 submitted 29 April, 2010;
originally announced April 2010.