Showing 1–15 of 15 results for author: Bélanger, D

Searching in archive stat.
  1. arXiv:2009.14794  [pdf, other]

    cs.LG cs.CL stat.ML

    Rethinking Attention with Performers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random featu…

    Submitted 19 November, 2022; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Published as a conference paper + oral presentation at ICLR 2021. 38 pages. See https://github.com/google-research/google-research/tree/master/protein_lm for protein language model code, and https://github.com/google-research/google-research/tree/master/performer for Performer code. See https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for Google AI Blog

  2. arXiv:2006.03555  [pdf, other]

    cs.LG cs.CL stat.ML

    Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequen…

    Submitted 30 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: This arXiv submission has been deprecated. Please see "Rethinking Attention with Performers" at arXiv:2009.14794 for the most updated version of the paper

  3. Population-Based Black-Box Optimization for Biological Sequence Design

    Authors: Christof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley

    Abstract: The use of black-box optimization for the design of new biological sequences is an emerging research area with potentially revolutionary impact. The cost and latency of wet-lab experiments require methods that find good sequences in few experimental rounds of large batches of sequences--a setting that off-the-shelf black-box optimization methods are ill-equipped to handle. We find that the perfor…

    Submitted 10 July, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

  4. arXiv:1811.08545  [pdf, other]

    physics.chem-ph stat.ML

    Rapid Prediction of Electron-Ionization Mass Spectrometry using Neural Networks

    Authors: Jennifer N. Wei, David Belanger, Ryan P. Adams, D. Sculley

    Abstract: When confronted with a substance of unknown identity, researchers often perform mass spectrometry on the sample and compare the observed spectrum to a library of previously-collected spectra to identify the molecule. While popular, this approach will fail to identify molecules that are not in the existing library. In response, we propose to improve the library's coverage by augmenting it with synt…

    Submitted 17 March, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: 12 pages, 5 figures

    Journal ref: ACS Cent. Sci. 2019 5 (4) 700-708

  5. A new look at weather-related health impacts through functional regression

    Authors: Pierre Masselot, Fateh Chebana, Taha B. M. J. Ouarda, Diane Bélanger, André St-Hilaire, Pierre Gosselin

    Abstract: A major challenge of climate change adaptation is to assess the effect of changing weather on human health. In spite of an increasing literature on the weather-related health subject, many aspects of the relationship are not known, limiting the predictive power of epidemiologic models. The present paper proposes new models to improve the performances of the currently used ones. The proposed models…

    Submitted 25 October, 2018; originally announced October 2018.

    Journal ref: Masselot, P., Chebana, F., Ouarda, T.B.M.J., Bélanger, D., St-Hilaire, A., Gosselin, P., 2018. A new look at weather-related health impacts through functional regression. Scientific Reports 8, 15241

  6. arXiv:1802.08665  [pdf, other]

    stat.ML cs.LG

    Learning Latent Permutations with Gumbel-Sinkhorn Networks

    Authors: Gonzalo Mena, David Belanger, Scott Linderman, Jasper Snoek

    Abstract: Permutations and matchings are core building blocks in a variety of latent variable models, as they allow us to align, canonicalize, and sort data. Learning in such models is difficult, however, because exact marginalization over these combinatorial objects is intractable. In response, this paper introduces a collection of new methods for end-to-end learning in such models that approximate discret…

    Submitted 23 February, 2018; originally announced February 2018.

    Journal ref: ICLR 2018

  7. Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality

    Authors: Pierre Masselot, Fateh Chebana, Diane Bélanger, André St-Hilaire, Belkacem Abdous, Pierre Gosselin, Taha B. M. J. Ouarda

    Abstract: In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. From aggregated seri…

    Submitted 21 February, 2018; originally announced February 2018.

  8. EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

    Authors: Pierre Masselot, Fateh Chebana, Diane Bélanger, André St-Hilaire, Belkacem Abdous, Pierre Gosselin, Taha B. M. J. Ouarda

    Abstract: In a number of environmental studies, relationships between natural processes are often assessed through regression analyses, using time series data. Such data are often multi-scale and non-stationary, leading to a poor accuracy of the resulting regression models and therefore to results with moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology…

    Submitted 7 September, 2017; originally announced September 2017.

    Journal ref: Science of The Total Environment 612 (2018): 1018-1029

  9. arXiv:1703.05667  [pdf, other]

    stat.ML cs.LG

    End-to-End Learning for Structured Prediction Energy Networks

    Authors: David Belanger, Bishan Yang, Andrew McCallum

    Abstract: Structured Prediction Energy Networks (SPENs) are a simple, yet expressive family of structured prediction models (Belanger and McCallum, 2016). An energy function over candidate structured outputs is given by a deep network, and predictions are formed by gradient-based optimization. This paper presents end-to-end learning for SPENs, where the energy function is discriminatively trained by back-pr…

    Submitted 15 July, 2017; v1 submitted 16 March, 2017; originally announced March 2017.

    Comments: ICML 2017

  10. arXiv:1701.04851  [pdf, other]

    cs.CV stat.ML

    Synthesizing Normalized Faces from Facial Identity Features

    Authors: Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman

    Abstract: We present a method for synthesizing a frontal, neutral-expression image of a person's face given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this invariance,…

    Submitted 17 October, 2017; v1 submitted 17 January, 2017; originally announced January 2017.

  11. arXiv:1609.02116  [pdf, other]

    stat.ML cs.CL cs.LG

    Ask the GRU: Multi-Task Learning for Deep Text Recommendations

    Authors: Trapit Bansal, David Belanger, Andrew McCallum

    Abstract: In a variety of application domains the content to be recommended to users is associated with text. This includes research papers, movies with associated plot summaries, news articles, blog posts, etc. Recommendation approaches based on latent factor models can be extended naturally to leverage text by employing an explicit mapping from text to factors. This enables recommendations for new, unseen…

    Submitted 9 September, 2016; v1 submitted 7 September, 2016; originally announced September 2016.

    Comments: 8 pages

    ACM Class: I.2.7; I.2.6

  12. arXiv:1511.06350  [pdf, other]

    cs.LG stat.ML

    Structured Prediction Energy Networks

    Authors: David Belanger, Andrew McCallum

    Abstract: We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture is used to define an energy function of candidate labels, and then predictions are produced by using back-propagation to iteratively optimize the energy with respect to the labels. This deep architecture captures dependencies between labels that would lead to intractable…

    Submitted 23 June, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICML 2016

  13. arXiv:1503.01397  [pdf, other]

    stat.ML cs.CL cs.LG

    Bethe Projections for Non-Local Inference

    Authors: Luke Vilnis, David Belanger, Daniel Sheldon, Andrew McCallum

    Abstract: Many inference problems in structured prediction are naturally solved by augmenting a tractable dependency structure with complex, non-local auxiliary objectives. This includes the mean field family of variational inference algorithms, soft- or hard-constrained inference using Lagrangian relaxation or linear programming, collective graphical models, and forms of semi-supervised learning such as po…

    Submitted 28 November, 2016; v1 submitted 4 March, 2015; originally announced March 2015.

    Comments: minor bug fix to appendix. appeared in UAI 2015

  14. arXiv:1503.01228  [pdf, other]

    cs.LG cs.CV stat.ML

    Bethe Learning of Conditional Random Fields via MAP Decoding

    Authors: Kui Tang, Nicholas Ruozzi, David Belanger, Tony Jebara

    Abstract: Many machine learning tasks can be formulated in terms of predicting structured outputs. In frameworks such as the structured support vector machine (SVM-Struct) and the structured perceptron, discriminative functions are learned by iteratively applying efficient maximum a posteriori (MAP) decoding. However, maximum likelihood estimation (MLE) of probabilistic models over these same structured spa…

    Submitted 4 March, 2015; originally announced March 2015.

    Comments: 19 pages (9 supplementary), 10 figures (3 supplementary)

  15. arXiv:1502.04081  [pdf, other]

    stat.ML cs.CL cs.LG

    A Linear Dynamical System Model for Text

    Authors: David Belanger, Sham Kakade

    Abstract: Low dimensional representations of words allow accurate NLP models to be trained on limited annotated data. While most representations ignore words' local context, a natural way to induce context-dependent representations is to perform inference in a probabilistic latent-variable sequence model. Given the recent success of continuous vector space word representations, we provide such an inference…

    Submitted 31 May, 2015; v1 submitted 13 February, 2015; originally announced February 2015.

    Comments: Accepted at International Conference of Machine Learning 2015