Showing 1–34 of 34 results for author: Lafferty, J

Searching in archive cs.
  1. arXiv:2505.15927  [pdf, ps, other]

    stat.ML cs.LG

    CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

    Authors: Awni Altabaa, Omar Montasser, John Lafferty

    Abstract: Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the recent progress in the reasoning capabilities of large language…

    Submitted 21 May, 2025; originally announced May 2025.

  2. arXiv:2405.16727  [pdf, ps, other]

    cs.LG

    Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

    Authors: Awni Altabaa, John Lafferty

    Abstract: Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of indiv…

    Submitted 20 June, 2025; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: ICML 2025

  3. arXiv:2402.08856  [pdf, other]

    cs.LG stat.ML

    Approximation of relation functions and attention mechanisms

    Authors: Awni Altabaa, John Lafferty

    Abstract: Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the ca…

    Submitted 15 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 24 pages; added discussion on curse of dimensionality in v2

  4. arXiv:2310.03240  [pdf, other]

    cs.LG

    Learning Hierarchical Relational Representations through Relational Convolutions

    Authors: Awni Altabaa, John Lafferty

    Abstract: An evolving area of research in deep learning is the study of architectures and inductive biases that support the learning of relational feature representations. In this paper, we address the challenge of learning representations of hierarchical relations--that is, higher-order relational patterns among groups of objects. We introduce "relational convolutional networks", a neural architecture equi…

    Submitted 26 September, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 31 pages

    Journal ref: Transactions on Machine Learning Research (TMLR), 2024

  5. arXiv:2309.06629  [pdf, other]

    cs.AI cs.NE

    The Relational Bottleneck as an Inductive Bias for Efficient Abstraction

    Authors: Taylor W. Webb, Steven M. Frankland, Awni Altabaa, Simon Segert, Kamesh Krishnamurthy, Declan Campbell, Jacob Russin, Tyler Giallanza, Zack Dulberg, Randall O'Reilly, John Lafferty, Jonathan D. Cohen

    Abstract: A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck…

    Submitted 1 May, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

  6. arXiv:2304.00195  [pdf, other]

    stat.ML cs.LG

    Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers

    Authors: Awni Altabaa, Taylor Webb, Jonathan Cohen, John Lafferty

    Abstract: An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit rel…

    Submitted 12 April, 2024; v1 submitted 31 March, 2023; originally announced April 2023.

    Comments: Published at ICLR 2024

  7. arXiv:2302.10392  [pdf, other]

    q-bio.NC cs.LG

    From seeing to remembering: Images with harder-to-reconstruct representations leave stronger memory traces

    Authors: Qi Lin, Zifan Li, John Lafferty, Ilker Yildirim

    Abstract: Much of what we remember is not due to intentional selection, but simply a by-product of perceiving. This raises a foundational question about the architecture of the mind: How does perception interface with and influence memory? Here, inspired by a classic proposal relating perceptual processing to memory durability, the level-of-processing theory, we present a sparse coding model for compressing…

    Submitted 20 February, 2023; originally announced February 2023.

  8. arXiv:2205.13614  [pdf, other]

    q-bio.NC cs.LG

    Emergent organization of receptive fields in networks of excitatory and inhibitory neurons

    Authors: Leon Lufkin, Ashish Puri, Ganlin Song, Xinyi Zhong, John Lafferty

    Abstract: Local patterns of excitation and inhibition that can generate neural waves are studied as a computational mechanism underlying the organization of neuronal tunings. Sparse coding algorithms based on networks of excitatory and inhibitory neurons are proposed that exhibit topographic maps as the receptive fields are adapted to input stimuli. Motivated by a leaky integrate-and-fire model of neural wa…

    Submitted 26 May, 2022; originally announced May 2022.

  9. arXiv:2106.06044  [pdf, other]

    stat.ML cs.LG

    Convergence and Alignment of Gradient Descent with Random Backpropagation Weights

    Authors: Ganlin Song, Ruitu Xu, John Lafferty

    Abstract: Stochastic gradient descent with backpropagation is the workhorse of artificial neural networks. It has long been recognized that backpropagation fails to be a biologically plausible algorithm. Fundamentally, it is a non-local procedure -- updating one neuron's synaptic weights requires knowledge of synaptic weights or receptive fields of downstream neurons. This limits the use of artificial neura…

    Submitted 22 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: 35 pages

  10. arXiv:2006.14781  [pdf, other]

    stat.ML cs.LG math.OC

    The huge Package for High-dimensional Undirected Graph Estimation in R

    Authors: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

    Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fort…

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: Published in JMLR in 2012

  11. arXiv:1907.08653  [pdf, other]

    stat.ML cs.LG

    Surfing: Iterative optimization over incrementally trained deep networks

    Authors: Ganlin Song, Zhou Fan, John Lafferty

    Abstract: We investigate a sequential optimization procedure to minimize the empirical risk functional $f_{\hat{θ}}(x) = \frac{1}{2}\|G_{\hat{θ}}(x) - y\|^2$ for certain families of deep networks $G_θ(x)$. The approach is to optimize a sequence of objective functions that use network parameters obtained during different stages of the training process. When initialized with random parameters $θ_0$, we show that…

    Submitted 19 July, 2019; originally announced July 2019.

  12. arXiv:1907.08646  [pdf, other]

    math.ST cs.LG stat.ML

    Fair quantile regression

    Authors: Dana Yang, John Lafferty, David Pollard

    Abstract: Quantile regression is a tool for learning conditional distributions. In this paper we study quantile regression in the setting where a protected attribute is unavailable when fitting the model. This can lead to "unfair" quantile estimators for which the effective quantiles are very different for the subpopulations defined by the protected attribute. We propose a procedure for adjusting the estim…

    Submitted 19 July, 2019; originally announced July 2019.

  13. arXiv:1902.06034  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts

    Authors: Michihiro Yasunaga, John Lafferty

    Abstract: Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture…

    Submitted 25 April, 2019; v1 submitted 15 February, 2019; originally announced February 2019.

    Comments: AAAI 2019

  14. arXiv:1805.06439  [pdf, other]

    stat.ML cs.LG

    Prediction Rule Reshaping

    Authors: Matt Bonakdarpour, Sabyasachi Chatterjee, Rina Foygel Barber, John Lafferty

    Abstract: Two methods are proposed for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The first method can be applied to any pre-trained prediction rule, while the second method deals specifically with random forests. In both cases, efficient algorithms are developed for comput…

    Submitted 16 May, 2018; originally announced May 2018.

  15. arXiv:1803.01302  [pdf, other]

    stat.ML cs.LG math.ST

    Distributed Nonparametric Regression under Communication Constraints

    Authors: Yuancheng Zhu, John Lafferty

    Abstract: This paper studies the problem of nonparametric estimation of a smooth function with data distributed across multiple machines. We assume an independent sample from a white noise model is collected at each machine, and an estimator of the underlying true function needs to be constructed at a central machine. We place limits on the number of bits that each machine can use to transmit information to…

    Submitted 23 June, 2018; v1 submitted 4 March, 2018; originally announced March 2018.

  16. arXiv:1710.00862  [pdf, other]

    stat.ME cs.SI math.ST stat.AP

    Testing for Global Network Structure Using Small Subgraph Statistics

    Authors: Chao Gao, John Lafferty

    Abstract: We study the problem of testing for community structure in networks using relations between the observed frequencies of small subgraphs. We propose a simple test for the existence of communities based only on the frequencies of three-node subgraphs. The test statistic is shown to be asymptotically normal under a null assumption of no community structure, and to have power approaching one under a c…

    Submitted 16 October, 2017; v1 submitted 2 October, 2017; originally announced October 2017.

  17. arXiv:1704.06742  [pdf, other]

    stat.ME cs.SI math.ST

    Testing Network Structure Using Relations Between Small Subgraph Probabilities

    Authors: Chao Gao, John Lafferty

    Abstract: We study the problem of testing for structure in networks using relations between the observed frequencies of small subgraphs. We consider the statistics \begin{align*} T_3 & =(\text{edge frequency})^3 - \text{triangle frequency}\\ T_2 & =3(\text{edge frequency})^2(1-\text{edge frequency}) - \text{V-shape frequency} \end{align*} and prove a central limit theorem for $(T_2, T_3)$ under an Erdős-Rén…

    Submitted 21 April, 2017; originally announced April 2017.

  18. arXiv:1605.07051  [pdf, other]

    stat.ML cs.LG

    Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent

    Authors: Qinqing Zheng, John Lafferty

    Abstract: We address the rectangular matrix completion problem by lifting the unknown matrix to a positive semidefinite matrix in higher dimension, and optimizing a nonconvex objective over the semidefinite factor using a simple gradient descent scheme. With $O( μr^2 κ^2 n \max(μ, \log n))$ random observations of a $n_1 \times n_2$ $μ$-incoherent matrix of rank $r$ and condition number $κ$, where…

    Submitted 21 November, 2016; v1 submitted 23 May, 2016; originally announced May 2016.

  19. arXiv:1506.06081  [pdf, other]

    stat.ML cs.LG

    A Convergent Gradient Descent Algorithm for Rank Minimization and Semidefinite Programming from Random Linear Measurements

    Authors: Qinqing Zheng, John Lafferty

    Abstract: We propose a simple, scalable, and fast gradient descent algorithm to optimize a nonconvex objective for the rank minimization problem and a closely related family of semidefinite programs. With $O(r^3 κ^2 n \log n)$ random measurements of a positive semidefinite $n \times n$ matrix of rank $r$ and condition number $κ$, our method is guaranteed to converge linearly to the global optimum.

    Submitted 24 March, 2016; v1 submitted 19 June, 2015; originally announced June 2015.

    Comments: Fix a minor error in Appendix E

  20. arXiv:1301.2286  [pdf]

    cs.LG stat.ML

    Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk

    Authors: John Lafferty, Larry A. Wasserman

    Abstract: We present an iterative Markov chain Monte Carlo algorithm for computing reference priors and minimax risk for general parametric families. Our approach uses MCMC techniques based on the Blahut-Arimoto algorithm for computing channel capacity in information theory. We give a statistical analysis of the algorithm, bounding the number of samples required for the stochastic algorithm to closely approximate th…

    Submitted 10 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)

    Report number: UAI-P-2001-PG-293-300

  21. arXiv:1301.0588  [pdf]

    cs.LG cs.IR stat.ML

    Expectation-Propagation for the Generative Aspect Model

    Authors: Thomas P. Minka, John Lafferty

    Abstract: The generative aspect model is an extension of the multinomial model for text that allows word probabilities to vary stochastically across documents. Previous results with aspect models have been promising, but hindered by the computational difficulty of carrying out inference and learning. This paper demonstrates that the simple variational methods of Blei et al (2001) can lead to inaccurate in…

    Submitted 12 December, 2012; originally announced January 2013.

    Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

    Report number: UAI-P-2002-PG-352-359

  22. arXiv:1207.4172  [pdf]

    cs.LG stat.ML

    Variational Chernoff Bounds for Graphical Models

    Authors: Pradeep Ravikumar, John Lafferty

    Abstract: Recent research has made significant progress on the problem of bounding log partition functions for exponential family graphical models. Such bounds have associated dual parameters that are often used as heuristic estimates of the marginal probabilities required in inference and learning. However these variational estimates do not give rigorous bounds on marginal probabilities, nor do they give e…

    Submitted 11 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

    Report number: UAI-P-2004-PG-462-469

  23. arXiv:1206.6488  [pdf]

    stat.ME cs.LG stat.ML

    The Nonparanormal SKEPTIC

    Authors: Han Liu, Fang Han, Ming Yuan, John Lafferty, Larry Wasserman

    Abstract: We propose a semiparametric approach, named nonparanormal skeptic, for estimating high dimensional undirected graphical models. In terms of modeling, we consider the nonparanormal family proposed by Liu et al (2009). In terms of estimation, we exploit nonparametric rank-based correlation coefficient estimators including the Spearman's rho and Kendall's tau. In high dimensional settings, we prove t…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  24. arXiv:1206.6450  [pdf]

    cs.LG stat.ML

    Conditional Sparse Coding and Grouped Multivariate Regression

    Authors: Min Xu, John Lafferty

    Abstract: We study the problem of multivariate regression where the data are naturally grouped, and a regression matrix is to be estimated for each group. We propose an approach in which a dictionary of low rank parameter matrices is estimated across groups, and a sparse linear combination of the dictionary elements is estimated to form a model within each group. We refer to the method as conditional sparse…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  25. arXiv:1206.6408  [pdf]

    stat.ME astro-ph.IM cs.LG

    Sequential Nonparametric Regression

    Authors: Haijie Gu, John Lafferty

    Abstract: We present algorithms for nonparametric regression in settings where the data are obtained sequentially. While traditional estimators select bandwidths that depend upon the sample size, for sequential data the effective sample size is dynamically changing. We propose a linear time algorithm that adjusts the bandwidth for each new data point, and show that the estimator achieves the optimal minimax…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  26. arXiv:1206.4669  [pdf]

    cs.LG stat.ML

    Sparse Additive Functional and Kernel CCA

    Authors: Sivaraman Balakrishnan, Kriti Puniyani, John Lafferty

    Abstract: Canonical Correlation Analysis (CCA) is a classical tool for finding correlations among the components of two random vectors. In recent years, CCA has been widely applied to the analysis of genomic data, where it is common for researchers to perform multiple assays on a single set of patient samples. Recent work has proposed sparse variants of CCA to address the high dimensionality of such data. H…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012

  27. arXiv:1201.0794  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Sparse Nonparametric Graphical Models

    Authors: John Lafferty, Han Liu, Larry Wasserman

    Abstract: We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributio…

    Submitted 7 January, 2013; v1 submitted 3 January, 2012; originally announced January 2012.

    Comments: Published at http://dx.doi.org/10.1214/12-STS391 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS391

    Journal ref: Statistical Science 2012, Vol. 27, No. 4, 519-537

  28. arXiv:0706.0534  [pdf, ps, other]

    stat.ML cs.IT

    Compressed Regression

    Authors: Shuheng Zhou, John Lafferty, Larry Wasserman

    Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. This line of work shows that $\ell_1$-regularized least squares regression can accurately estimate a sparse linear model from $n$ noisy examples in $p$ dimensions, even if $p$ is much larger than $n$. In this pap…

    Submitted 11 January, 2008; v1 submitted 4 June, 2007; originally announced June 2007.

    Comments: 59 pages, 5 figures, submitted for review

    Journal ref: IEEE Transactions on Information Theory, Volume 55, No.2, pp 846--866, 2009

  29. A Model of Lexical Attraction and Repulsion

    Authors: Doug Beeferman, Adam Berger, John Lafferty

    Abstract: This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as co…

    Submitted 16 June, 1997; v1 submitted 12 June, 1997; originally announced June 1997.

    Comments: 8 pages, LaTeX source and postscript figures for ACL/EACL'97 paper

  30. Text Segmentation Using Exponential Models

    Authors: Doug Beeferman, Adam Berger, John Lafferty

    Abstract: This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large co…

    Submitted 12 June, 1997; v1 submitted 11 June, 1997; originally announced June 1997.

    Comments: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper

  31. arXiv:cmp-lg/9509003  [pdf, ps]

    cs.CL

    Cluster Expansions and Iterative Scaling for Maximum Entropy Language Models

    Authors: John D. Lafferty, Bernhard Suhm

    Abstract: The maximum entropy method has recently been successfully introduced to a variety of natural language applications. In each of these applications, however, the power of the maximum entropy method is achieved at the cost of a considerable increase in computational requirements. In this paper we present a technique, closely related to the classical cluster expansion from statistical mechanics, for…

    Submitted 9 September, 1995; originally announced September 1995.

    Comments: 8 pages, uuencoded and compressed postscript

  32. arXiv:cmp-lg/9508003  [pdf, ps]

    cs.CL

    A Robust Parsing Algorithm For Link Grammars

    Authors: Dennis Grinberg, John Lafferty, Daniel Sleator

    Abstract: In this paper we present a robust parsing algorithm based on the link grammar formalism for parsing natural languages. Our algorithm is a natural extension of the original dynamic programming recognition algorithm which recursively counts the number of linkages between two words in the input sentence. The modified algorithm uses the notion of a null link in order to allow a connection between an…

    Submitted 2 August, 1995; originally announced August 1995.

    Comments: 17 pages, compressed postscript

    Report number: CMU-CS-TR-95-125

  33. arXiv:cmp-lg/9506014  [pdf, ps]

    cs.CL

    Inducing Features of Random Fields

    Authors: S. Della Pietra, V. Della Pietra, J. Lafferty

    Abstract: We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data…

    Submitted 13 June, 1995; originally announced June 1995.

    Comments: 34 pages, compressed postscript

    Report number: CMU-CS-95-144

  34. Towards History-based Grammars: Using Richer Models for Probabilistic Parsing

    Authors: Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, Salim Roukos

    Abstract: We describe a generative probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree…

    Submitted 3 May, 1994; originally announced May 1994.

    Comments: 6 pages

    Journal ref: Proceedings, DARPA Speech and Natural Language Workshop, 1992