
Showing 1–30 of 30 results for author: Kolda, T G

  1. arXiv:2408.05677  [pdf, other]

    math.NA cs.LG

    Tensor Decomposition Meets RKHS: Efficient Algorithms for Smooth and Misaligned Data

    Authors: Brett W. Larsen, Tamara G. Kolda, Anru R. Zhang, Alex H. Williams

    Abstract: The canonical polyadic (CP) tensor decomposition decomposes a multidimensional data array into a sum of outer products of finite-dimensional vectors. Instead, we can replace some or all of the vectors with continuous functions (infinite-dimensional vectors) from a reproducing kernel Hilbert space (RKHS). We refer to tensors with some infinite-dimensional modes as quasitensors, and the approach of…

    Submitted 10 August, 2024; originally announced August 2024.

  2. arXiv:2305.06927  [pdf, other]

    cs.LG math.OC stat.ML

    Convergence of Alternating Gradient Descent for Matrix Factorization

    Authors: Rachel Ward, Tamara G. Kolda

    Abstract: We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, $T = C (\frac{\sigma_1(\mathbf{A})}{\sigma_r(\mathbf{A})})^2 \log(1/\epsilon)$ iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization…

    Submitted 7 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

  3. arXiv:2202.06930  [pdf, other]

    stat.ML cs.LG math.NA

    Tensor Moments of Gaussian Mixture Models: Theory and Applications

    Authors: João M. Pereira, Joe Kileel, Tamara G. Kolda

    Abstract: Gaussian mixture models (GMMs) are fundamental tools in statistical and data sciences. We study the moments of multivariate Gaussians and GMMs. The $d$-th moment of an $n$-dimensional random variable is a symmetric $d$-way tensor of size $n^d$, so working with moments naively is assumed to be prohibitively expensive for $d>2$ and larger values of $n$. In this work, we develop theory and numerical…

    Submitted 21 March, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

  4. arXiv:2201.10638  [pdf, ps, other]

    math.NA cs.DS

    Sketching Matrix Least Squares via Leverage Scores Estimates

    Authors: Brett W. Larsen, Tamara G. Kolda

    Abstract: We consider the matrix least squares problem of the form $\| \mathbf{A} \mathbf{X}-\mathbf{B} \|_F^2$ where the design matrix $\mathbf{A} \in \mathbb{R}^{N \times r}$ is tall and skinny with $N \gg r$. We propose to create a sketched version $\| \tilde{\mathbf{A}}\mathbf{X}-\tilde{\mathbf{B}} \|_F^2$ where the sketched matrices $\tilde{\mathbf{A}}$ and $\tilde{\mathbf{B}}$ contain weighted subsets…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: This is a detailed and standalone derivation of a result that already appears in (arXiv:2006.16438, Appendix A). arXiv admin note: substantial text overlap with arXiv:2006.16438

  5. arXiv:2110.14514  [pdf, other]

    math.NA cs.LG cs.MS

    Streaming Generalized Canonical Polyadic Tensor Decompositions

    Authors: Eric Phipps, Nick Johnson, Tamara G. Kolda

    Abstract: In this paper, we develop a method which we call OnlineGCP for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from traditional canonical polyadic (CP) tensor decompositions as it allows for arbitrary objective functions which the CP model attempts to minimize. This approach can provide better fits and more interpretable models when the observ…

    Submitted 27 October, 2021; originally announced October 2021.

  6. arXiv:2104.11079  [pdf, other]

    cs.AI cs.CE

    Randomized Algorithms for Scientific Computing (RASC)

    Authors: Aydin Buluc, Tamara G. Kolda, Stefan M. Wild, Mihai Anitescu, Anthony DeGennaro, John Jakeman, Chandrika Kamath, Ramakrishnan Kannan, Miles E. Lopes, Per-Gunnar Martinsson, Kary Myers, Jelani Nelson, Juan M. Restrepo, C. Seshadhri, Draguna Vrabie, Brendt Wohlberg, Stephen J. Wright, Chao Yang, Peter Zwart

    Abstract: Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and sc…

    Submitted 21 March, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

  7. arXiv:1909.04801  [pdf, ps, other]

    cs.IT math.NA math.PR

    Faster Johnson-Lindenstrauss Transforms via Kronecker Products

    Authors: Ruhui Jin, Tamara G. Kolda, Rachel Ward

    Abstract: The Kronecker product is an important matrix operation with a wide range of applications in supporting fast linear transforms, including signal processing, graph theory, quantum computing and deep learning. In this work, we introduce a generalization of the fast Johnson-Lindenstrauss projection for embedding vectors with Kronecker product structure, the Kronecker fast Johnson-Lindenstrauss transfo…

    Submitted 30 July, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: Information and Inference: A Journal of the IMA, 2020

  8. arXiv:1906.01687  [pdf, other]

    math.NA cs.LG stat.ML

    Stochastic Gradients for Large-Scale Tensor Decomposition

    Authors: Tamara G. Kolda, David Hong

    Abstract: Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is a recently proposed version of tensor decomposition that allows for a variety of loss functions such as Bernoulli loss for binary data or Huber loss for robust…

    Submitted 7 July, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Journal ref: SIAM Journal on Mathematics of Data Science, Vol. 2, No. 4, pp. 1066-1095, 2020

  9. TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition

    Authors: Grey Ballard, Alicia Klinvex, Tamara G. Kolda

    Abstract: Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its tru…

    Submitted 21 August, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

    Journal ref: ACM Transactions on Mathematical Software, Vol. 46, No. 2, Article 13, June 2020

  10. Software for Sparse Tensor Decomposition on Emerging Computing Architectures

    Authors: Eric Phipps, Tamara G. Kolda

    Abstract: In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized architecture-specific implementations. The key to a portable approach is to determine multiple levels of parallelism that can be mapped in different ways to differen…

    Submitted 21 January, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

    Journal ref: SIAM Journal on Scientific Computing, Vol. 41, No. 3, pp. C269-C290, 22 pages, 2019

  11. arXiv:1808.07510  [pdf, other]

    stat.ML cs.LG

    XPCA: Extending PCA for a Combination of Discrete and Continuous Variables

    Authors: Clifford Anderson-Bergman, Tamara G. Kolda, Kina Kincher-Winoto

    Abstract: Principal component analysis (PCA) is arguably the most popular tool in multivariate exploratory data analysis. In this paper, we consider the question of how to handle heterogeneous variables that include continuous, binary, and ordinal. In the probabilistic interpretation of low-rank PCA, the data has a normal multivariate distribution and, therefore, normal marginal distributions for each colum…

    Submitted 22 August, 2018; originally announced August 2018.

  12. arXiv:1808.07452  [pdf, other]

    math.NA cs.LG

    Generalized Canonical Polyadic Tensor Decomposition

    Authors: David Hong, Tamara G. Kolda, Jed A. Duersch

    Abstract: Tensor decomposition is a fundamental unsupervised machine learning method in data science, with applications including network analysis and sensor data processing. This work develops a generalized canonical polyadic (GCP) low-rank tensor decomposition that allows other loss functions besides squared error. For instance, we can use logistic loss or Kullback-Leibler divergence, enabling tensor deco…

    Submitted 21 January, 2019; v1 submitted 22 August, 2018; originally announced August 2018.

    Journal ref: SIAM Review, Vol. 62, No. 1, pp. 133-163, 2020

  13. arXiv:1607.08673  [pdf, other]

    cs.SI physics.soc-ph

    Measuring and Modeling Bipartite Graphs with Community Structure

    Authors: Sinan Aksoy, Tamara G. Kolda, Ali Pinar

    Abstract: Network science is a powerful tool for analyzing complex systems in fields ranging from sociology to engineering to biology. This paper is focused on generative models of large-scale bipartite graphs, also known as two-way graphs or two-mode networks. We propose two generative models that can be easily tuned to reproduce the characteristics of real-world networks, not just qualitatively, but quant…

    Submitted 29 October, 2016; v1 submitted 28 July, 2016; originally announced July 2016.

    Journal ref: Journal of Complex Networks, Vol. 5, No. 4, pp. 581-603, 2017

  14. Parallel Tensor Compression for Large-Scale Scientific Data

    Authors: Woody Austin, Grey Ballard, Tamara G. Kolda

    Abstract: As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can comput…

    Submitted 23 February, 2016; v1 submitted 22 October, 2015; originally announced October 2015.

    Journal ref: IPDPS'16: Proceedings of the 30th IEEE International Parallel and Distributed Processing Symposium, pp. 912-922, May 2016
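
The 8 TB figure quoted in the abstract above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `dense_tensor_bytes` is made up for illustration, and "double precision" is assumed to mean 8 bytes per entry.

```python
def dense_tensor_bytes(dims, bytes_per_entry=8):
    """Storage needed for a dense tensor with the given mode sizes.

    bytes_per_entry=8 assumes IEEE double precision (an assumption).
    """
    total = bytes_per_entry
    for d in dims:
        total *= d
    return total


# Five-way tensor: 3 spatial modes x variables x time steps.
dims = (512, 512, 512, 64, 128)
size_bytes = dense_tensor_bytes(dims)
size_tib = size_bytes / 2**40  # binary terabytes (TiB)

print(f"{size_bytes} bytes = {size_tib} TiB")
```

The product is exactly 2^43 bytes, i.e. 8 when a terabyte is taken as 2^40 bytes (about 8.8 in decimal terabytes).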

  15. Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search

    Authors: Grey Ballard, Ali Pinar, Tamara G. Kolda, C. Seshadhri

    Abstract: Given two sets of vectors, $A = \{{a_1}, \dots, {a_m}\}$ and $B=\{{b_1},\dots,{b_n}\}$, our problem is to find the top-$t$ dot products, i.e., the largest $|{a_i}\cdot{b_j}|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach t…

    Submitted 18 June, 2015; v1 submitted 11 June, 2015; originally announced June 2015.

    Journal ref: ICDM 2015: Proceedings of the 2015 IEEE International Conference on Data Mining, pp. 11-20, November 2015
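
To make the problem statement in the abstract above concrete, here is a minimal exhaustive baseline for the top-$t$ search. It is not the sampling-based method the paper proposes (the abstract is truncated before that is described); it simply scores all $mn$ pairs. The function name and the toy vectors are illustrative assumptions.

```python
import heapq
from itertools import product


def top_t_dot_products(A, B, t):
    """Return the t largest |a_i . b_j| as (value, i, j), largest first.

    Exhaustive O(m * n * dim) baseline, not a sampling method.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = (
        (abs(dot(a, b)), i, j)
        for (i, a), (j, b) in product(enumerate(A), enumerate(B))
    )
    return heapq.nlargest(t, scored)


# Toy data (hypothetical): 2 vectors in A, 3 vectors in B.
A = [(1, 0), (0, 2)]
B = [(3, 1), (-4, 0), (0, 1)]
result = top_t_dot_products(A, B, t=2)
print(result)
```

On the toy data the two largest magnitudes come from pairs (0, 1) and (0, 0), with values 4 and 3.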

  16. arXiv:1404.5874  [pdf, ps, other]

    cs.SI physics.soc-ph

    Using Triangles to Improve Community Detection in Directed Networks

    Authors: Christine Klymko, David Gleich, Tamara G. Kolda

    Abstract: In a graph, a community may be loosely defined as a group of nodes that are more closely connected to one another than to the rest of the graph. While there are a variety of metrics that can be used to specify the quality of a given community, one common theme is that flows tend to stay within communities. Hence, we expect cycles to play an important role in community detection. For undirected gra…

    Submitted 23 April, 2014; originally announced April 2014.

    Comments: 10 pages, 3 figures

  17. arXiv:1403.2226  [pdf, ps, other]

    physics.soc-ph cs.SI

    Accelerating Community Detection by Using K-core Subgraphs

    Authors: Chengbin Peng, Tamara G. Kolda, Ali Pinar

    Abstract: Community detection is expensive, and the cost generally depends at least linearly on the number of vertices in the graph. We propose working with a reduced graph that has many fewer nodes but nonetheless captures key community structure. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection by a…

    Submitted 13 October, 2014; v1 submitted 10 March, 2014; originally announced March 2014.

    Comments: 15 pages, 8 figures
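
The K-core definition in the abstract above (the largest subgraph in which every node has at least K connections) can be computed by repeatedly peeling off low-degree nodes. The sketch below shows only that standard peeling step, not the paper's acceleration framework; the adjacency-dict representation and the name `k_core` are assumptions made for illustration.

```python
from collections import deque


def k_core(adj, k):
    """Node set of the K-core of an undirected simple graph.

    adj maps each node to the set of its neighbours (assumed symmetric).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        # Removing v lowers the degree of each surviving neighbour.
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed


# Toy graph (hypothetical): triangle 0-1-2 with a path 2-3-4 hanging off it.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
core2 = k_core(adj, 2)
print(core2)
```

For the toy graph, the 2-core is the triangle; the path nodes are peeled away one after another.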

  18. Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs

    Authors: C. Seshadhri, Ali Pinar, Tamara G. Kolda

    Abstract: Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately-sized graphs with only millions of edges. Previous work has considered node and ed…

    Submitted 14 January, 2014; v1 submitted 12 September, 2013; originally announced September 2013.

    Comments: Full version of SDM 2013 paper "Triadic Measures on Graphs: The Power of Wedge Sampling" (arXiv:1202.5230)

    Journal ref: Statistical Analysis and Data Mining, Vol. 7, No. 4, pp. 294-307, August 2014

  19. arXiv:1303.6385  [pdf, other]

    cs.SI physics.soc-ph

    Dynamics of Trust Reciprocation in Heterogenous MMOG Networks

    Authors: Ayush Singhal, Karthik Subbian, Jaideep Srivastava, Tamara G. Kolda, Ali Pinar

    Abstract: Understanding the dynamics of reciprocation is of great interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided unprecedented access to large-scale data which enables us to study such complex human behavior in a more systematic manner. In this paper, we consider three different networks in the EverQuest2 game: chat, trad…

    Submitted 18 April, 2013; v1 submitted 26 March, 2013; originally announced March 2013.

  20. arXiv:1302.6636  [pdf, other]

    cs.SI physics.soc-ph

    A Scalable Generative Graph Model with Community Structure

    Authors: Tamara G. Kolda, Ali Pinar, Todd Plantenga, C. Seshadhri

    Abstract: Network data is ubiquitous and growing, yet we lack realistic generative network models that can be calibrated to match real-world data. The recently proposed Block Two-Level Erdős-Rényi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing graphs with community structure, such as socia…

    Submitted 9 December, 2013; v1 submitted 26 February, 2013; originally announced February 2013.

    Journal ref: SIAM Journal on Scientific Computing, Vol. 36, No. 5, pp. C424-C452, September 2014

  21. arXiv:1302.6220  [pdf, ps, other]

    cs.SI cs.DS physics.soc-ph

    Directed closure measures for networks with reciprocity

    Authors: C. Seshadhri, Ali Pinar, Nurcan Durak, Tamara G. Kolda

    Abstract: The study of triangles in graphs is a standard tool in network analysis, leading to measures such as the transitivity, i.e., the fraction of paths of length $2$ that participate in triangles. Real-world networks are often directed, and it can be difficult to "measure" this network structure meaningfully. We propose a collection of directed closure values for measuring triangles in di…

    Submitted 23 April, 2014; v1 submitted 25 February, 2013; originally announced February 2013.

    Comments: Updated version; new results on expected directed closures for reciprocal configuration model
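
The undirected transitivity mentioned in the abstract above (the fraction of length-2 paths that participate in triangles) can be computed directly from that definition. The sketch covers only this undirected baseline, not the directed closure values the paper introduces; the adjacency-dict input and the name `transitivity` are illustrative assumptions.

```python
from itertools import combinations


def transitivity(adj):
    """Fraction of length-2 paths (wedges) that are closed into triangles.

    adj maps each node to the set of its neighbours (undirected, simple).
    Returns 0.0 when the graph has no wedges.
    """
    wedges = 0
    closed = 0
    for center, nbrs in adj.items():
        # Each unordered pair of neighbours forms one wedge at `center`.
        for u, w in combinations(sorted(nbrs), 2):
            wedges += 1
            if w in adj[u]:
                closed += 1
    return closed / wedges if wedges else 0.0


# Toy graph (hypothetical): triangle 0-1-2 with a path 2-3-4 hanging off it.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
value = transitivity(adj)
print(value)
```

The toy graph has 6 wedges, 3 of which (one per triangle corner) are closed, giving a transitivity of 0.5.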

  22. arXiv:1301.7744  [pdf, ps, other]

    math.NA cs.MS

    Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors

    Authors: Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, Tamara G. Kolda

    Abstract: Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation are in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (Blocked Compact Symmetric Storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric te…

    Submitted 9 April, 2014; v1 submitted 31 January, 2013; originally announced January 2013.

    MSC Class: 15-02 (Primary)

    Journal ref: SIAM Journal on Scientific Computing, Vol. 36, No. 5, pp. C453-C479, September 2014

  23. Counting Triangles in Massive Graphs with MapReduce

    Authors: Tamara G. Kolda, Ali Pinar, Todd Plantenga, C. Seshadhri, Christine Task

    Abstract: Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood…

    Submitted 9 December, 2013; v1 submitted 24 January, 2013; originally announced January 2013.

    Journal ref: SIAM Journal on Scientific Computing, Vol. 36, No. 5, pp. S44-S77, October 2014

  24. arXiv:1210.5288  [pdf, other]

    cs.SI physics.soc-ph

    A Scalable Null Model for Directed Graphs Matching All Degree Distributions: In, Out, and Reciprocal

    Authors: Nurcan Durak, Tamara G. Kolda, Ali Pinar, C. Seshadhri

    Abstract: Degree distributions are arguably the most important property of real world networks. The classic edge configuration model or Chung-Lu model can generate an undirected graph with any desired degree distribution. This serves as a good null model to compare algorithms or perform experimental studies. Furthermore, there are scalable algorithms that implement these models and they are invaluable in th…

    Submitted 25 April, 2013; v1 submitted 18 October, 2012; originally announced October 2012.

    Comments: Camera ready version for IEEE Workshop on Network Science; fixed some typos in table

    Journal ref: Proceedings of IEEE 2013 2nd International Network Science Workshop (NSW 2013), pp. 22-30

  25. arXiv:1207.7125  [pdf, other]

    cs.SI physics.soc-ph

    Degree Relations of Triangles in Real-world Networks and Models

    Authors: Nurcan Durak, Ali Pinar, Tamara G. Kolda, C. Seshadhri

    Abstract: Triangles are an important building block and distinguishing feature of real-world networks, but their structure is still poorly understood. Despite numerous reports on the abundance of triangles, there is very little information on what these triangles look like. We initiate the study of degree-labeled triangles -- specifically, degree homogeneity versus heterogeneity in triangles. This yields ne…

    Submitted 30 July, 2012; originally announced July 2012.

    Journal ref: CIKM '12: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, ACM, pp. 1712-1716, 2012

  26. Triadic Measures on Graphs: The Power of Wedge Sampling

    Authors: C. Seshadhri, Ali Pinar, Tamara G. Kolda

    Abstract: Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, associated algorithms can be extremely expensive. We propose a new method based on wedge sampling. This ver…

    Submitted 18 October, 2012; v1 submitted 23 February, 2012; originally announced February 2012.

    Journal ref: SDM13: Proceedings of the 2013 SIAM International Conference on Data Mining, pp. 10-18, May 2013

  27. arXiv:1112.3644  [pdf, other]

    cs.SI physics.soc-ph

    Community structure and scale-free collections of Erdös-Rényi graphs

    Authors: C. Seshadhri, Tamara G. Kolda, Ali Pinar

    Abstract: Community structure plays a significant role in the analysis of social networks and similar graphs, yet this structure is little understood and not well captured by most models. We formally define a community to be a subgraph that is internally highly connected and has no deeper substructure. We use tools of combinatorics to show that any such community must contain a dense Erdös-Rényi (ER) subgra…

    Submitted 15 December, 2011; originally announced December 2011.

    Journal ref: Physical Review E 85(5):056109, 2012

  28. arXiv:1110.4925  [pdf, other]

    cs.SI

    The Similarity between Stochastic Kronecker and Chung-Lu Graph Models

    Authors: Ali Pinar, C. Seshadhri, Tamara G. Kolda

    Abstract: The analysis of massive graphs is now becoming a very important part of science and industrial research. This has led to the construction of a large variety of graph models, each with their own advantages. The Stochastic Kronecker Graph (SKG) model has been chosen by the Graph500 steering committee to create supercomputer benchmarks for graph algorithms. The major reasons for this are its easy par…

    Submitted 26 October, 2011; v1 submitted 21 October, 2011; originally announced October 2011.

    Journal ref: SDM12: Proceedings of the Twelfth SIAM International Conference on Data Mining, pp. 1071-1082, April 2012

  29. arXiv:1103.2068  [pdf, other]

    cs.LG cs.DC stat.ML

    COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

    Authors: Justin D. Basilico, M. Arthur Munson, Tamara G. Kolda, Kevin R. Dixon, W. Philip Kegelmeyer

    Abstract: COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset…

    Submitted 8 September, 2011; v1 submitted 10 March, 2011; originally announced March 2011.

    ACM Class: I.5; I.2.6; H.2.8

    Journal ref: ICDM 2011: Proceedings of the 2011 IEEE International Conference on Data Mining, pp. 41-50, 2011

  30. arXiv:1102.5046  [pdf, other]

    cs.SI cs.DM physics.soc-ph

    An In-Depth Analysis of Stochastic Kronecker Graphs

    Authors: C. Seshadhri, Ali Pinar, Tamara G. Kolda

    Abstract: Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this paper, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer benchmark due to its favorable properties and easy p…

    Submitted 2 January, 2013; v1 submitted 24 February, 2011; originally announced February 2011.

    Journal ref: Journal of the ACM 60(2):13 (32 pages), April 2013