Showing 1–6 of 6 results for author: Monath, N

  1. arXiv:2104.07061

    cs.LG cs.DS physics.data-an stat.ML

    Exact and Approximate Hierarchical Clustering Using A*

    Authors: Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Avinava Dubey, Patrick Flaherty, Manzil Zaheer, Amr Ahmed, Kyle Cranmer, Andrew McCallum

    Abstract: Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: 30 pages, 9 figures

  2. arXiv:2002.11661

    cs.DS cs.LG physics.data-an stat.ML

    Data Structures & Algorithms for Exact Inference in Hierarchical Clustering

    Authors: Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Ji-Ah Lee, Patrick Flaherty, Kyle Cranmer, Andrew McGregor, Andrew McCallum

    Abstract: Hierarchical clustering is a fundamental task often used to discover meaningful structures in data, such as phylogenetic trees, taxonomies of concepts, subtypes of cancer, and cascades of particle decays in particle physics. Typically approximate algorithms are used for inference due to the combinatorial number of possible hierarchical clusterings. In contrast to existing methods, we present novel…

    Submitted 22 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: 27 pages, 12 figures

  3. arXiv:2001.00076

    cs.LG cs.DS stat.ML

    Scalable Hierarchical Clustering with Tree Grafting

    Authors: Nicholas Monath, Ari Kobren, Akshay Krishnamurthy, Michael Glass, Andrew McCallum

    Abstract: We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notio…

    Submitted 31 December, 2019; originally announced January 2020.

    Comments: 23 pages (appendix included), published at KDD 2019

  4. arXiv:1907.10165

    cs.LG cs.CL stat.ML

    Optimal Transport-based Alignment of Learned Character Representations for String Similarity

    Authors: Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, Andrew McCallum

    Abstract: String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport) and scores the alignment with a convolutional neural network. W…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: ACL Long Paper

  5. arXiv:1906.07859

    cs.LG stat.ML

    Supervised Hierarchical Clustering with Exponential Linkage

    Authors: Nishant Yadav, Ari Kobren, Nicholas Monath, Andrew McCallum

    Abstract: In supervised clustering, standard techniques for learning a pairwise dissimilarity function often suffer from a discrepancy between the training and clustering objectives, leading to poor cluster quality. Rectifying this discrepancy necessitates matching the procedure for training the dissimilarity function to the clustering algorithm. In this paper, we introduce a method for training the dissimi…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Appears in ICML 2019

  6. arXiv:1704.01858

    cs.LG stat.ML

    An Online Hierarchical Algorithm for Extreme Clustering

    Authors: Ari Kobren, Nicholas Monath, Akshay Krishnamurthy, Andrew McCallum

    Abstract: Many modern clustering methods scale well to a large number of data items, N, but not to a large number of clusters, K. This paper introduces PERCH, a new non-greedy algorithm for online hierarchical clustering that scales to both massive N and K, a problem setting we term extreme clustering. Our algorithm efficiently routes new data points to the leaves of an incrementally-built tree. Motivated b…

    Submitted 6 April, 2017; originally announced April 2017.

    Comments: 20 pages. Code available here: https://github.com/iesl/xcluster