-
The illusion of households as entities in social networks
Authors:
Izabel Aguiar,
Philip S. Chodrow,
Johan Ugander
Abstract:
Data recording connections between people in communities and villages are collected and analyzed in various ways, most often as either networks of individuals or as networks of households. These two networks can differ in substantial ways. The methodological choice of which network to study, therefore, is an important aspect in both study design and data analysis. In this work, we consider various key differences between household and individual social network structure, and ways in which the networks cannot be used interchangeably. In addition to formalizing the choices for representing each network, we explore the consequences of how the results of social network analysis change depending on the choice between studying the individual and household network -- from determining whether networks are assortative or disassortative to the ranking of influence-maximizing nodes. As our main contribution, we draw upon related work to propose a set of systematic recommendations for determining the relevant network representation to study. Our recommendations include assessing a series of entitativity criteria and relating these criteria to theories and observations about patterns and norms in social dynamics at the household level: notably, how information spreads within households and how power structures and gender roles affect this spread. We draw upon the definition of an illusion of entitativity to identify cases wherein grouping people into households does not satisfy these criteria or adequately represent given cultural or experimental contexts. Given the widespread use of social network data for studying communities, there is broad impact in understanding which network to study and the consequences of that decision. We hope that this work gives guidance to practitioners and researchers collecting and studying social network data.
Submitted 1 May, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Hypergraph Link Prediction via Hyperedge Copying
Authors:
Xie He,
Philip S. Chodrow,
Peter J. Mucha
Abstract:
We propose a generative model of temporally-evolving hypergraphs in which hyperedges form via noisy copying of previous hyperedges. Our proposed model reproduces several stylized facts from many empirical hypergraphs, is learnable from data, and defines a likelihood over a complete hypergraph rather than ego-based or other sub-hypergraphs. Analyzing our model, we derive descriptions of node degree, edge size, and edge intersection size distributions in terms of the model parameters. We also show several features of empirical hypergraphs which are and are not successfully captured by our model. We provide a scalable stochastic expectation maximization algorithm with which we can fit our model to hypergraph data sets with millions of nodes and edges. Finally, we assess our model on a hypergraph link prediction task, finding that an instantiation of our model with just 11 parameters can achieve competitive predictive performance with large neural networks.
Submitted 4 February, 2025;
originally announced February 2025.
-
Emergence of polarization in a sigmoidal bounded-confidence model of opinion dynamics
Authors:
Heather Z. Brooks,
Philip S. Chodrow,
Mason A. Porter
Abstract:
We study a nonlinear bounded-confidence model (BCM) of continuous-time opinion dynamics on networks with both persuadable individuals and zealots. The model is parameterized by a scalar $γ$, which controls the steepness of a smooth influence function. This influence function encodes the relative weights that nodes place on the opinions of other nodes. When $γ= 0$, this influence function recovers Taylor's averaging model; when $γ\rightarrow \infty$, the influence function converges to that of a modified Hegselmann--Krause (HK) BCM. Unlike the classical HK model, however, our sigmoidal bounded-confidence model (SBCM) is smooth for any finite $γ$. We show that the set of steady states of our SBCM is qualitatively similar to that of the Taylor model when $γ$ is small and that the set of steady states approaches a subset of the set of steady states of a modified HK model as $γ\rightarrow \infty$. For several special graph topologies, we give analytical descriptions of important features of the space of steady states. A notable result is a closed-form relationship between the stability of a polarized state and the graph topology in a simple model of echo chambers in social networks. Because the influence function of our BCM is smooth, we are able to study it with linear stability analysis, which is difficult to employ with the usual discontinuous influence functions in BCMs.
Submitted 29 July, 2023; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Generative hypergraph clustering: from blockmodels to modularity
Authors:
Philip S. Chodrow,
Nate Veldt,
Austin R. Benson
Abstract:
Hypergraphs are a natural modeling paradigm for a wide range of complex relational systems. A standard analysis task is to identify clusters of closely related or densely interconnected nodes. Many graph algorithms for this task are based on variants of the stochastic blockmodel, a random graph with flexible cluster structure. However, there are few models and algorithms for hypergraph clustering. Here, we propose a Poisson degree-corrected hypergraph stochastic blockmodel (DCHSBM), a generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum-likelihood inference in the DCHSBM naturally leads to a clustering objective that generalizes the popular modularity objective for graphs. We derive a general Louvain-type algorithm for this objective, as well as a faster, specialized "All-Or-Nothing" (AON) variant in which edges are expected to lie fully within clusters. This special case encompasses a recent proposal for modularity in hypergraphs, while also incorporating flexible resolution and edge-size parameters. We show that AON hypergraph Louvain is highly scalable, including as an example an experiment on a synthetic hypergraph of one million nodes. We also demonstrate through synthetic experiments that the detectability regimes for hypergraph community detection differ from methods based on dyadic graph projections. We use our generative model to analyze different patterns of higher-order structure in school contact networks, U.S. congressional bill cosponsorship, U.S. congressional committees, product categories in co-purchasing behavior, and hotel locations from web browsing sessions, finding interpretable higher-order structure. We then study the behavior of our AON hypergraph Louvain algorithm, finding that it is able to recover ground truth clusters in empirical data sets exhibiting the corresponding higher-order structure.
Submitted 18 August, 2021; v1 submitted 23 January, 2021;
originally announced January 2021.
-
Emergence of Hierarchy in Networked Endorsement Dynamics
Authors:
Mari Kawakatsu,
Philip S. Chodrow,
Nicole Eikmeier,
Daniel B. Larremore
Abstract:
Many social and biological systems are characterized by enduring hierarchies, including those organized around prestige in academia, dominance in animal groups, and desirability in online dating. Despite their ubiquity, the general mechanisms that explain the creation and endurance of such hierarchies are not well understood. We introduce a generative model for the dynamics of hierarchies using time-varying networks in which new links are formed based on the preferences of nodes in the current network and old links are forgotten over time. The model produces a range of hierarchical structures, ranging from egalitarianism to bistable hierarchies, and we derive critical points that separate these regimes in the limit of long system memory. Importantly, our model supports statistical inference, allowing for a principled comparison of generative mechanisms using data. We apply the model to study hierarchical structures in empirical data on hiring patterns among mathematicians, dominance relations among parakeets, and friendships among members of a fraternity, observing several persistent patterns as well as interpretable differences in the generative mechanisms favored by each. Our work contributes to the growing literature on statistically grounded models of time-varying networks.
Submitted 7 May, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Moments of Uniform Random Multigraphs with Fixed Degree Sequences
Authors:
Philip S. Chodrow
Abstract:
We study the expected adjacency matrix of a uniformly random multigraph with fixed degree sequence $\mathbf{d} \in \mathbb{Z}_+^n$. This matrix arises in a variety of analyses of networked data sets, including modularity-maximization and mean-field theories of spreading processes. Its structure is well-understood for large, sparse, simple graphs: the expected number of edges between nodes $i$ and $j$ is roughly $\frac{d_id_j}{\sum_\ell{d_\ell}}$. Many network data sets are neither large, sparse, nor simple, and in these cases the standard approximation no longer applies. We derive a novel estimator using a dynamical approach: the estimator emerges from the stationarity conditions of a class of Markov Chain Monte Carlo algorithms for graph sampling. We derive error bounds for this estimator, and provide an efficient scheme with which to compute it. We test the estimator on synthetic and empirical degree sequences, finding that it enjoys relative error against ground truth a full order of magnitude smaller than the standard approximation. We then compare modularity maximization techniques using both the standard and novel estimator, finding that the qualitative structure of the optimization landscape depends significantly on the estimator choice. Our results emphasize the importance of using carefully specified random graph models in data scientific applications.
Submitted 6 February, 2020; v1 submitted 19 September, 2019;
originally announced September 2019.
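For readers unfamiliar with the standard approximation mentioned in this abstract, $\frac{d_i d_j}{\sum_\ell d_\ell}$, the following is a minimal sketch of how it is computed. It illustrates only the standard formula, not the estimator proposed in the paper; the function name `expected_edges_standard` and the example degree sequence are hypothetical.

```python
def expected_edges_standard(degrees):
    """Standard approximation: entry (i, j) is d_i * d_j / sum(d).

    Illustrative only; this is the baseline formula quoted in the
    abstract, not the paper's new estimator.
    """
    total = sum(degrees)
    n = len(degrees)
    return [[degrees[i] * degrees[j] / total for j in range(n)]
            for i in range(n)]


# Hypothetical degree sequence chosen for illustration.
degrees = [3, 2, 2, 1]
W = expected_edges_standard(degrees)

# Each row of the approximation sums to the corresponding degree.
row_sums = [sum(row) for row in W]
```

Under this formula each row sums to the node's degree, which is one reason the approximation is convenient; the abstract notes that it is justified only for large, sparse, simple graphs.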
-
Configuration Models of Random Hypergraphs
Authors:
Philip S. Chodrow
Abstract:
Many empirical networks are intrinsically polyadic, with interactions occurring within groups of agents of arbitrary size. There are, however, few flexible null models that can support statistical inference for such polyadic networks. We define a class of null random hypergraphs that hold constant both the node degree and edge dimension sequences, generalizing the classical dyadic configuration model. We provide a Markov Chain Monte Carlo scheme for sampling from these models, and discuss connections and distinctions between our proposed models and previous approaches. We then illustrate these models through a triplet of applications. We start with two classical network topics -- triadic clustering and degree-assortativity. In each, we emphasize the importance of randomizing over hypergraph space rather than projected graph space, showing that this choice can dramatically alter statistical inference and study findings. We then define and study the edge intersection profile of a hypergraph as a measure of higher-order correlation between edges, and derive asymptotic approximations under the stub-labeled null. Our experiments emphasize the ability of explicit, statistically-grounded polyadic modeling to significantly enhance the toolbox of network data science. We close with suggestions for multiple avenues of future work.
Submitted 13 December, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Log-minor distributions and an application to estimating mean subsystem entropy
Authors:
Alice C. Schwarze,
Philip S. Chodrow,
Mason A. Porter
Abstract:
A common task in physics, information theory, and other fields is the analysis of properties of subsystems of a given system. Given the covariance matrix $M$ of a system of $n$ coupled variables, the covariance matrices of the subsystems are principal submatrices of $M$. The rapid growth with $n$ of the set of principal submatrices makes it impractical to exhaustively study each submatrix for even modestly-sized systems. It is therefore of great interest to derive methods for approximating the distributions of important submatrix properties for a given matrix.
Motivated by the importance of differential entropy as a systemic measure of disorder, we study the distribution of log-determinants of principal $k\times k$ submatrices when the covariance matrix has bounded condition number. We derive upper bounds for the right tail and the variance of the distribution of minors, and we use these in turn to derive upper bounds on the standard error of the sample mean of subsystem entropy. Our results demonstrate that, despite the rapid growth of the set of subsystems with $n$, the number of samples that are needed to bound the sampling error is asymptotically independent of $n$. Instead, it is sufficient to increase the number of samples in linear proportion to $k$ to achieve a desired sampling accuracy.
Submitted 27 January, 2019;
originally announced January 2019.
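The sampling task described in this abstract can be sketched as follows: draw random $k$-element index sets, take the log-determinant of each principal $k\times k$ submatrix, and report the sample mean with its standard error. This is only an illustration under stated assumptions, not the authors' code. The helper names, the example covariance matrix, and the Gaussian entropy formula $\tfrac{1}{2}\left(k\log(2\pi e) + \log\det\right)$ used in the last step are all assumptions introduced here.

```python
import math
import random


def logdet_spd(A):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    k = len(A)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(k))


def sample_mean_logdet(M, k, num_samples, rng):
    """Sample mean and standard error of log-dets of random principal k x k submatrices."""
    n = len(M)
    vals = []
    for _ in range(num_samples):
        idx = rng.sample(range(n), k)  # k distinct indices
        sub = [[M[a][b] for b in idx] for a in idx]
        vals.append(logdet_spd(sub))
    mean = sum(vals) / num_samples
    var = sum((v - mean) ** 2 for v in vals) / (num_samples - 1)
    return mean, math.sqrt(var / num_samples)


# Hypothetical example: a 12 x 12 covariance matrix with entries 0.5**|i-j|.
n, k = 12, 3
M = [[0.5 ** abs(i - j) for j in range(n)] for i in range(n)]
mean_logdet, stderr = sample_mean_logdet(M, k, num_samples=200, rng=random.Random(0))

# Assumption: for Gaussian variables, subsystem entropy is an affine function of the log-det.
mean_entropy = 0.5 * (k * math.log(2 * math.pi * math.e) + mean_logdet)
```

The number of samples here (200) is arbitrary; the abstract's result concerns how the required number scales with $k$ rather than $n$.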
-
Local Symmetry and Global Structure in Adaptive Voter Models
Authors:
Philip S. Chodrow,
Peter J. Mucha
Abstract:
Adaptive voter models (AVMs) are simple mechanistic systems that model the emergence of mesoscopic structure from local networked processes driven by conflict and homophily. AVMs display rich behavior, including a phase transition from a fully-fragmented regime of "echo-chambers" to a regime of persistent disagreement governed by low-dimensional quasistable manifolds. Many extant methods for approximating the behavior of AVMs are either restricted in scope, expensive in computation, or inaccurate in predicting important statistics. In this work, we develop a novel, second-order moment closure approximation method for binary-state rewire-to-random and rewire-to-same model variants. We incorporate a small amount of noise via a random mutation term, which renders the system ergodic. Using ergodicity, we then approximate the voting process, which is non-Markovian in the second moments of the system, with a Markovian term near the phase transition. This approximation exploits an asymmetry between different classes of voting events. The resulting scheme enables us to predict the location of the phase transition and the active edge density in the regime of persistent disagreement, across the entire space of parameters and opinion densities. Numerically, our results are nearly exact for the rewire-to-random model, and competitive with other current approaches for the rewire-to-same model. Moreover, our computations display constant scaling in the mean degree, enabling approximations for denser systems than previously possible. We conclude with suggestions for model refinements and extensions.
Submitted 24 December, 2019; v1 submitted 13 December, 2018;
originally announced December 2018.