-
Targeting influence in a harmonic opinion model
Authors:
Zachary M. Boyd,
Nicolas Fraiman,
Jeremy L. Marzuola,
Peter J. Mucha,
Braxton Osting
Abstract:
Influence propagation in social networks is a central problem in modern social network analysis, with important societal applications in politics and advertising. A large body of work has focused on cascading models, viral marketing, and finite-horizon diffusion. There is, however, a need for more developed, mathematically principled \emph{adversarial models}, in which multiple, opposed actors strategically select nodes whose influence will maximally sway the crowd to their point of view.
In the present work, we develop and analyze such a model based on harmonic functions and linear diffusion. We prove that our general problem is NP-hard and that the objective function is monotone and submodular; consequently, we can greedily approximate the solution within a constant factor. Introducing and analyzing a convex relaxation, we show that the problem can be approximately solved using smooth optimization methods. We illustrate the effectiveness of our approach on a variety of example networks.
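The greedy strategy that monotonicity and submodularity make possible can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `coverage` objective and the `reach` data are hypothetical stand-ins for the harmonic-model objective, chosen only because set coverage is a standard monotone submodular function. For such objectives under a cardinality constraint, the classical greedy guarantee is a factor of 1 - 1/e; the constant proved in the paper is not stated in the abstract.

```python
def greedy_max(candidates, objective, k):
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        base = objective(chosen)
        # Marginal gain of adding c to the current selection.
        best = max(remaining, key=lambda c: objective(chosen + [c]) - base)
        chosen.append(best)
    return chosen


# Hypothetical toy data: which individuals each candidate node can reach.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(selection):
    """Number of distinct individuals reached (monotone and submodular)."""
    covered = set()
    for node in selection:
        covered |= reach[node]
    return len(covered)


selected = greedy_max(list(reach), coverage, k=2)
```

Here the first pick is the node with the largest reach, and the second is the node that adds the most individuals not yet covered.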
Submitted 28 June, 2024;
originally announced July 2024.
-
Escape times for subgraph detection and graph partitioning
Authors:
Zachary M. Boyd,
Nicolas Fraiman,
Jeremy L. Marzuola,
Peter J. Mucha,
Braxton Osting
Abstract:
We provide a rearrangement based algorithm for fast detection of subgraphs of $k$ vertices with long escape times for directed or undirected networks. Complementing other notions of densest subgraphs and graph cuts, our method is based on the mean hitting time required for a random walker to leave a designated set and hit the complement. We provide a new relaxation of this notion of hitting time on a given subgraph and use that relaxation to construct a fast subgraph detection algorithm and a generalization to $K$-partitioning schemes. Using a modification of the subgraph detector on each component, we propose a graph partitioner that identifies regions where random walks live for comparably large times. Importantly, our method implicitly respects the directed nature of the data for directed graphs while also being applicable to undirected graphs. We apply the partitioning method for community detection to a large class of model and real-world data sets.
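The quantity underlying this method, the mean time for a random walker to leave a designated vertex set, can be computed on a small example as sketched below. This is an assumption-laden illustration of the unrelaxed hitting time only, not the paper's relaxation or rearrangement algorithm: the function name, the simple random walk with uniform transition probabilities, and the fixed-point iteration are all choices made here. The escape time tau satisfies tau(v) = 1 + (average of tau over the neighbours of v), with tau = 0 outside the set.

```python
def mean_escape_times(neighbours, subset, iterations=500):
    """Expected number of steps for a simple random walk to leave `subset`.

    `neighbours[v]` lists the out-neighbours of v; the walk moves to one of
    them uniformly at random. Assumes the walk can actually leave `subset`
    from every vertex in it (otherwise the iteration does not converge).
    """
    subset = set(subset)
    tau = {v: 0.0 for v in subset}
    for _ in range(iterations):
        # Jacobi-style update; vertices outside the subset contribute 0.
        tau = {
            v: 1.0 + sum(tau.get(w, 0.0) for w in neighbours[v]) / len(neighbours[v])
            for v in subset
        }
    return tau


# Path graph 0 - 1 - 2, with the designated set {0, 1}.
path = {0: [1], 1: [0, 2], 2: [1]}
tau = mean_escape_times(path, [0, 1])
```

On this path, solving the two linear equations by hand gives tau(1) = 3 and tau(0) = 4, which the iteration reproduces.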
Submitted 24 December, 2022;
originally announced December 2022.
-
Wasserstein Archetypal Analysis
Authors:
Katy Craig,
Braxton Osting,
Dong Wang,
Yiming Xu
Abstract:
Archetypal analysis is an unsupervised machine learning method that summarizes data using a convex polytope. In its original formulation, for fixed k, the method finds a convex polytope with k vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared Euclidean distance between the data and the polytope is minimal.
In the present work, we consider an alternative formulation of archetypal analysis based on the Wasserstein metric, which we call Wasserstein archetypal analysis (WAA). In one dimension, there exists a unique solution of WAA and, in two dimensions, we prove existence of a solution, as long as the data distribution is absolutely continuous with respect to Lebesgue measure. We discuss obstacles to extending our result to higher dimensions and general data distributions. We then introduce an appropriate regularization of the problem, via a Rényi entropy, which allows us to obtain existence of solutions of the regularized problem for general data distributions, in arbitrary dimensions. We prove a consistency result for the regularized problem, ensuring that if the data are iid samples from a probability measure, then as the number of samples is increased, a subsequence of the archetype points converges to the archetype points for the limiting data distribution, almost surely. Finally, we develop and implement a gradient-based computational approach for the two-dimensional problem, based on the semi-discrete formulation of the Wasserstein metric. Our analysis is supported by detailed computational experiments.
Submitted 25 October, 2022;
originally announced October 2022.
-
Extremal graph realizations and graph Laplacian eigenvalues
Authors:
Braxton Osting
Abstract:
For a regular polyhedron (or polygon) centered at the origin, the coordinates of the vertices are eigenvectors of the graph Laplacian for the skeleton of that polyhedron (or polygon) associated with the first (non-trivial) eigenvalue. In this paper, we generalize this relationship. For a given graph, we study the eigenvalue optimization problem of maximizing the first (non-trivial) eigenvalue of the graph Laplacian over non-negative edge weights. We show that the spectral realization of the graph using the eigenvectors corresponding to the solution of this problem, under certain assumptions, is a centered, unit-distance graph realization that has maximal total variance. This result gives a new method for generating unit-distance graph realizations and is based on convex duality. A drawback of this method is that the dimension of the realization is given by the multiplicity of the extremal eigenvalue, which is typically unknown prior to solving the eigenvalue optimization problem. Our results are illustrated with a number of examples.
Submitted 20 June, 2022;
originally announced June 2022.
-
A dynamical systems based framework for dimension reduction
Authors:
Ryeongkyung Yoon,
Braxton Osting
Abstract:
We propose a novel framework for learning a low-dimensional representation of data based on nonlinear dynamical systems, which we call dynamical dimension reduction (DDR). In the DDR model, each point is evolved via a nonlinear flow towards a lower-dimensional subspace; the projection onto the subspace gives the low-dimensional embedding. Training the model involves identifying the nonlinear flow and the subspace. Following the equation discovery method, we represent the vector field that defines the flow using a linear combination of dictionary elements, where each element is a pre-specified linear/nonlinear candidate function. A regularization term for the average total kinetic energy is also introduced and motivated by optimal transport theory. We prove that the resulting optimization problem is well-posed and establish several properties of the DDR method. We also show how the DDR method can be trained using a gradient-based optimization method, where the gradients are computed using the adjoint method from optimal control theory. The DDR method is implemented and compared on synthetic and example datasets to other dimension reduction methods, including PCA, t-SNE, and UMAP.
Submitted 18 April, 2022;
originally announced April 2022.
-
Flat tori with large Laplacian eigenvalues in dimensions up to eight
Authors:
Chiu-Yen Kao,
Braxton Osting,
Jackson C. Turner
Abstract:
We consider the optimization problem of maximizing the $k$-th Laplacian eigenvalue, $λ_{k}$, over flat $d$-dimensional tori of fixed volume. For $k=1$, this problem is equivalent to the densest lattice sphere packing problem. For larger $k$, this is equivalent to the NP-hard problem of finding the $d$-dimensional (dual) lattice with longest $k$-th shortest lattice vector. As a result of extensive computations, for $d \leq 8$, we obtain a sequence of flat tori, $T_{k,d}$, each of volume one, such that the $k$-th Laplacian eigenvalue of $T_{k,d}$ is very large; for each (finite) $k$ the $k$-th eigenvalue exceeds the value in (the $k\to \infty$ asymptotic) Weyl's law by a factor between 1.54 and 2.01, depending on the dimension. Stationarity conditions are derived and numerically verified for $T_{k,d}$ and we describe the degeneration of the tori as $k \to \infty$.
Submitted 16 February, 2022;
originally announced February 2022.
-
A non-autonomous equation discovery method for time signal classification
Authors:
Ryeongkyung Yoon,
Harish S. Bhat,
Braxton Osting
Abstract:
Certain neural network architectures, in the infinite-layer limit, lead to systems of nonlinear differential equations. Motivated by this idea, we develop a framework for analyzing time signals based on non-autonomous dynamical equations. We view the time signal as a forcing function for a dynamical system that governs a time-evolving hidden variable. As in equation discovery, the dynamical system is represented using a dictionary of functions and the coefficients are learned from data. This framework is applied to the time signal classification problem. We show how gradients can be efficiently computed using the adjoint method, and we apply methods from dynamical systems to establish stability of the classifier. Through a variety of experiments, on both synthetic and real datasets, we show that the proposed method uses orders of magnitude fewer parameters than competing methods, while achieving comparable accuracy. We created the synthetic datasets using dynamical systems of increasing complexity; though the ground truth vector fields are often polynomials, we find consistently that a Fourier dictionary yields the best results. We also demonstrate how the proposed method yields graphical interpretability in the form of phase portraits.
Submitted 22 November, 2020;
originally announced November 2020.
-
A metric on directed graphs and Markov chains based on hitting probabilities
Authors:
Zachary M. Boyd,
Nicolas Fraiman,
Jeremy L. Marzuola,
Peter J. Mucha,
Braxton Osting,
Jonathan Weare
Abstract:
The shortest-path, commute time, and diffusion distances on undirected graphs have been widely employed in applications such as dimensionality reduction, link prediction, and trip planning. Increasingly, there is interest in using asymmetric structure of data derived from Markov chains and directed graphs, but few metrics are specifically adapted to this task. We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain and, in particular, on any Markov chain derived from a directed graph. Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity. Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics. We use possible degeneracies in the metric to develop an interesting structural theory of directed graphs and explore a related quotienting procedure. Our metric can be computed in $O(n^3)$ time, where $n$ is the number of states, and in examples we scale up to $n=10,000$ nodes and $\approx 38M$ edges on a desktop computer. In several examples, we explore the nature of the metric, compare it to alternative methods, and demonstrate its utility for weak recovery of community structure in dense graphs, visualization, structure recovering, dynamics exploration, and multiscale cluster detection.
Submitted 18 January, 2021; v1 submitted 25 June, 2020;
originally announced June 2020.
-
A continuum limit for the PageRank algorithm
Authors:
Amber Yuan,
Jeff Calder,
Braxton Osting
Abstract:
Semi-supervised and unsupervised machine learning methods often rely on graphs to model data, prompting research on how theoretical properties of operators on graphs are leveraged in learning problems. While most of the existing literature focuses on undirected graphs, directed graphs are very important in practice, giving models for physical, biological, or transportation networks, among many other applications. In this paper, we propose a new framework for rigorously studying continuum limits of learning algorithms on directed graphs. We use the new framework to study the PageRank algorithm, and show how it can be interpreted as a numerical scheme on a directed graph involving a type of normalized graph Laplacian. We show that the corresponding continuum limit problem, which is taken as the number of webpages grows to infinity, is a second-order, possibly degenerate, elliptic equation that contains reaction, diffusion, and advection terms. We prove that the numerical scheme is consistent and stable and compute explicit rates of convergence of the discrete solution to the solution of the continuum limit PDE. We give applications to proving stability and asymptotic regularity of the PageRank vector. Finally, we illustrate our results with numerical experiments and explore an application to data depth.
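For readers unfamiliar with the discrete algorithm whose continuum limit is studied here, a minimal power-iteration sketch of standard PageRank follows. It illustrates the textbook algorithm only, not the paper's framework or its normalized graph Laplacian formulation; the function name, the damping value 0.85, the uniform teleportation, and the uniform redistribution of rank from pages without out-links are conventional assumptions made for this example.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration for PageRank on a directed graph.

    `out_links[v]` lists the pages that v links to; every linked page must
    itself be a key of `out_links`. Pages with no out-links (dangling pages)
    spread their rank uniformly over all pages.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Teleportation term plus the share received from dangling pages.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Hypothetical four-page web: a and b link to c, c links back to a, d links nowhere.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": []}
ranks = pagerank(links)
```

Each sweep preserves the total rank, so the returned values sum to one, and page `c`, which receives links from two pages, ends up with the largest score.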
Submitted 10 January, 2021; v1 submitted 24 January, 2020;
originally announced January 2020.
-
Coarse Quad Layouts Through Robust Simplification of Cross Field Separatrix Partitions
Authors:
Ryan Viertel,
Braxton Osting,
Matthew Staten
Abstract:
Streamline-based quad meshing algorithms use smooth cross fields to partition surfaces into quadrilateral regions by tracing cross field separatrices. In practice, re-entrant corners and misalignment of singularities lead to small regions and limit cycles, negating some of the benefits a quad layout can provide in quad meshing. We introduce three novel methods to improve on a pipeline for coarse quad partitioning. First, we formulate an efficient method to compute high-quality cross fields on curved surfaces by extending the diffusion generated method from Viertel and Osting (SISC, 2019). Next, we introduce a method for accurately computing the trajectory of streamlines through singular triangles that prevents tangential crossings. Finally, we introduce a robust method to produce coarse quad layouts by simplifying the partitions obtained via naive separatrix tracing. Our methods are tested on a database of 100 objects and the results are analyzed. The algorithm performs well both in terms of efficiency and visual results on the database when compared to state-of-the-art methods.
Submitted 2 August, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
A diffusion generated method for computing Dirichlet partitions
Authors:
Dong Wang,
Braxton Osting
Abstract:
A Dirichlet $k$-partition of a closed $d$-dimensional surface is a collection of $k$ pairwise disjoint open subsets such that the sum of their first Laplace-Beltrami-Dirichlet eigenvalues is minimal. In this paper, we develop a simple and efficient diffusion generated method to compute Dirichlet $k$-partitions for $d$-dimensional flat tori and spheres. For the $2d$ flat torus, for most values of $k=3$-9,11,12,15,16, and 20, we obtain hexagonal honeycombs. For the $3d$ flat torus and $k=2,4,8,16$, we obtain the rhombic dodecahedral honeycomb, the Weaire-Phelan honeycomb, and Kelvin's tessellation by truncated octahedra. For the $4d$ flat torus, for $k=4$, we obtain a constant extension of the rhombic dodecahedral honeycomb along the fourth direction and for $k=8$, we obtain a 24-cell honeycomb. For the $2d$ sphere, we also compute Dirichlet partitions for $k=3$-7,9,10,12,14,20. Our computational results agree with previous studies when a comparison is available. As far as we are aware, these are the first published results for Dirichlet partitions of the $4d$ flat torus.
Submitted 7 February, 2018;
originally announced February 2018.
-
A generalized MBO diffusion generated motion for orthogonal matrix-valued fields
Authors:
Braxton Osting,
Dong Wang
Abstract:
We consider the problem of finding stationary points of the Dirichlet energy for orthogonal matrix-valued fields. Following the Ginzburg-Landau approach, this energy is relaxed by penalizing the matrix-valued field when it does not take orthogonal matrix values. A generalization of the MBO diffusion generated motion is introduced that effectively finds local minimizers of this energy by iterating two steps until convergence. In the first step, as in the original method, the current matrix-valued field is evolved by the diffusion equation. In the second step, the field is pointwise reassigned to the closest orthogonal matrix, which can be computed via the singular value decomposition. We extend the Lyapunov function of Esedoglu and Otto to show that the method is non-increasing on iterates and hence, unconditionally stable. We also prove that spatially discretized iterates converge to a stationary solution in a finite number of iterations. The algorithm is implemented using the closest point method and non-uniform fast Fourier transform. We conclude with several numerical experiments on flat tori and closed surfaces, which, unsurprisingly, exhibit classical behavior from the Allen-Cahn and complex Ginzburg-Landau equations, but also new phenomena.
Submitted 3 November, 2017;
originally announced November 2017.
-
Spectral Sparsification of Simplicial Complexes for Clustering and Label Propagation
Authors:
Braxton Osting,
Sourabh Palande,
Bei Wang
Abstract:
As a generalization of the use of graphs to describe pairwise interactions, simplicial complexes can be used to model higher-order interactions between three or more objects in complex systems. There has been a recent surge in activity for the development of data analysis methods applicable to simplicial complexes, including techniques based on computational topology, higher-order random processes, generalized Cheeger inequalities, isoperimetric inequalities, and spectral methods. In particular, spectral learning methods (e.g. label propagation and clustering) that directly operate on simplicial complexes represent a new direction for analyzing such complex datasets.
To apply spectral learning methods to massive datasets modeled as simplicial complexes, we develop a method for sparsifying simplicial complexes that preserves the spectrum of the associated Laplacian matrices. We show that the theory of Spielman and Srivastava for the sparsification of graphs extends to simplicial complexes via the up Laplacian. In particular, we introduce a generalized effective resistance for simplices, provide an algorithm for sparsifying simplicial complexes at a fixed dimension, and give a specific version of the generalized Cheeger inequality for weighted simplicial complexes. Finally, we introduce higher-order generalizations of spectral clustering and label propagation for simplicial complexes and demonstrate via experiments the utility of the proposed spectral sparsification method for these applications.
Submitted 1 February, 2019; v1 submitted 28 August, 2017;
originally announced August 2017.
-
An Approach to Quad Meshing Based on Harmonic Cross-Valued Maps and the Ginzburg-Landau Theory
Authors:
Ryan Viertel,
Braxton Osting
Abstract:
A generalization of vector fields, referred to as N-direction fields or cross fields when N = 4, has been recently introduced and studied for geometry processing, with applications in quadrilateral (quad) meshing, texture mapping, and parameterization. We make the observation that cross field design for two-dimensional quad meshing is related to the well-known Ginzburg-Landau problem from mathematical physics. This yields a variety of theoretical tools for efficiently computing boundary-aligned quad meshes, with provable guarantees on the resulting mesh, such as the number of mesh defects and bounds on the defect locations. The procedure for generating the quad mesh is to (i) find a complex-valued "representation" field that minimizes the Ginzburg-Landau energy subject to a boundary constraint, (ii) convert the representation field into a boundary-aligned, smooth cross field, (iii) use separatrices of the cross field to partition the domain into four sided regions, and (iv) mesh each of these four-sided regions using standard techniques. Leveraging the Ginzburg-Landau theory, we prove that this procedure can be used to produce a cross field whose separatrices partition the domain into four sided regions. To minimize the Ginzburg-Landau energy for the representation field, we use an extension of the Merriman-Bence-Osher (MBO) threshold dynamics method, originally conceived as an algorithm to simulate mean curvature flow. Finally, we demonstrate the method on a variety of test domains.
Submitted 7 November, 2018; v1 submitted 7 August, 2017;
originally announced August 2017.
-
Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs
Authors:
Braxton Osting,
Jiechao Xiong,
Qianqian Xu,
Yuan Yao
Abstract:
Crowdsourcing platforms are now extensively used for conducting subjective pairwise comparison studies. In this setting, a pairwise comparison dataset is typically gathered via random sampling, either \emph{with} or \emph{without} replacement. In this paper, we use tools from random graph theory to analyze these two random sampling methods for the HodgeRank estimator. Using the Fiedler value of the graph as a measurement for estimator stability (informativeness), we provide a new estimate of the Fiedler value for these two random graph models. In the asymptotic limit as the number of vertices tends to infinity, we prove the validity of the estimate. Based on our findings, for a small number of items to be compared, we recommend a two-stage sampling strategy where a greedy sampling method is used initially and random sampling \emph{without} replacement is used in the second stage. When a large number of items is to be compared, we recommend random sampling with replacement as this is computationally inexpensive and trivially parallelizable. Experiments on synthetic and real-world datasets support our analysis.
Submitted 21 March, 2016; v1 submitted 28 February, 2015;
originally announced March 2015.
-
Minimal Dirichlet energy partitions for graphs
Authors:
Braxton Osting,
Chris D. White,
Edouard Oudet
Abstract:
Motivated by a geometric problem, we introduce a new non-convex graph partitioning objective where the optimality criterion is given by the sum of the Dirichlet eigenvalues of the partition components. A relaxed formulation is identified and a novel rearrangement algorithm is proposed, which we show is strictly decreasing and converges in a finite number of iterations to a local minimum of the relaxed objective function. Our method is applied to several clustering problems on graphs constructed from synthetic data, MNIST handwritten digits, and manifold discretizations. The model has a semi-supervised extension and provides a natural representative for the clusters as well.
Submitted 20 May, 2014; v1 submitted 22 August, 2013;
originally announced August 2013.
-
Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs
Authors:
Braxton Osting,
Christoph Brune,
Stanley J. Osher
Abstract:
Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking problem has maximal Fisher information. Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data which maximizes the informativeness of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding multigraphs with large algebraic connectivity. This reduction of the data collection problem to graph-theoretic questions is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating dataset and demonstrate that the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking. As another application, we study the 2011-12 NCAA football schedule and propose schedules with the same number of games which are significantly more informative. Using spectral clustering methods to identify highly-connected communities within the division, we argue that the NCAA could improve its notoriously poor rankings by simply scheduling more out-of-conference games.
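The inner ranking problem described above, finding a potential on the vertices whose differences along arcs match the pairwise comparisons in the least-squares sense, can be sketched on a toy example. This illustrates only that inner problem, not the paper's data-collection (outer) problem; the function name, the coordinate-descent solver, and the sample comparisons are assumptions made here. Each comparison `(i, j, y)` is read as "j beats i by y", so the fit seeks `s[j] - s[i]` close to `y`.

```python
def least_squares_ranking(n, comparisons, sweeps=500):
    """Fit scores s minimizing the sum of (s[j] - s[i] - y)**2 over (i, j, y).

    Uses coordinate descent (Gauss-Seidel on the normal equations) and returns
    mean-zero scores. Assumes the comparison graph is connected.
    """
    # For each vertex, the values its score "should" equal according to each arc.
    incident = [[] for _ in range(n)]
    for i, j, y in comparisons:
        incident[i].append((j, -y))  # s[i] should be near s[j] - y
        incident[j].append((i, y))   # s[j] should be near s[i] + y
    s = [0.0] * n
    for _ in range(sweeps):
        for v in range(n):
            if incident[v]:
                s[v] = sum(s[w] + offset for w, offset in incident[v]) / len(incident[v])
    mean = sum(s) / n
    return [x - mean for x in s]


# Hypothetical consistent data generated from true scores 0, 1, 3.
data = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
scores = least_squares_ranking(3, data)
```

Because the toy comparisons are exactly consistent, the fitted potential recovers the true scores up to the additive constant removed by centering.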
Submitted 4 June, 2014; v1 submitted 26 July, 2012;
originally announced July 2012.