
Showing 1–21 of 21 results for author: De Cao, N

  1. arXiv:2404.03381  [pdf, other]

    cs.CL

    Learning to Plan and Generate Text with Citations

    Authors: Constanza Fierro, Reinald Kim Amplayo, Fantine Huot, Nicola De Cao, Joshua Maynez, Shashi Narayan, Mirella Lapata

    Abstract: The increasing demand for the deployment of LLMs in information-seeking scenarios has spurred efforts in creating verifiable systems, which generate responses to queries along with supporting evidence. In this paper, we explore the attribution capabilities of plan-based models which have been recently shown to improve the faithfulness, grounding, and controllability of generated text. We conceptua…

    Submitted 23 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted at ACL 2024

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1326 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2304.00472  [pdf, other]

    cs.DB cs.AI

    Querying Large Language Models with SQL

    Authors: Mohammed Saeed, Nicola De Cao, Paolo Papotti

    Abstract: In many use-cases, information is stored in text but not available in structured data. However, extracting data from natural language text to precisely fit a schema, and thus enable querying, is a challenging task. With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents. Thus, we env…

    Submitted 25 October, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted for presentation at EDBT 2024 as Vision paper

  5. arXiv:2112.08340  [pdf, other]

    cs.CL cs.LG stat.ML

    GenIE: Generative Information Extraction

    Authors: Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West

    Abstract: Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealis…

    Submitted 13 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted at NAACL 2022

  6. arXiv:2112.06837  [pdf, other]

    cs.CL cs.LG

    Sparse Interventions in Language Models with Differentiable Masking

    Authors: Nicola De Cao, Leon Schmid, Dieuwke Hupkes, Ivan Titov

    Abstract: There has been a lot of interest in understanding what information is captured by hidden representations of language models (LMs). Typically, interpretation methods i) do not guarantee that the model actually uses the encoded information, and ii) do not discover small subsets of neurons responsible for a considered phenomenon. Inspired by causal mediation analysis, we propose a method that discove…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 12 pages, 4 figures, 6 tables

  7. arXiv:2109.03792  [pdf, other]

    cs.CL cs.AI stat.ML

    Highly Parallel Autoregressive Entity Linking with Discriminative Correction

    Authors: Nicola De Cao, Wilker Aziz, Ivan Titov

    Abstract: Generative approaches have been recently shown to be effective for both Entity Disambiguation and Entity Linking (i.e., joint mention detection and disambiguation). However, the previously proposed autoregressive formulation for EL suffers from i) high computational cost due to a complex (deep) decoder, ii) non-parallelizable decoding that scales with the source sequence length, and iii) the need…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021 (Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing). Code at https://github.com/nicola-decao/efficient-autoregressive-EL . 8 pages, 1 figure, 3 tables

  8. arXiv:2104.08164  [pdf, other]

    cs.CL cs.AI cs.LG

    Editing Factual Knowledge in Language Models

    Authors: Nicola De Cao, Wilker Aziz, Ivan Titov

    Abstract: The factual knowledge acquired during pre-training and stored in the parameters of Language Models (LMs) can be useful in downstream tasks (e.g., question answering or textual inference). However, some facts can be incorrectly induced or become obsolete over time. We present KnowledgeEditor, a method which can be used to edit this knowledge and, thus, fix 'bugs' or unexpected predictions without t…

    Submitted 8 September, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Accepted at EMNLP 2021 (Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing). Code at https://github.com/nicola-decao/KnowledgeEditor . 16 pages, 6 figures, 2 tables

  9. arXiv:2103.12528  [pdf, other]

    cs.CL cs.AI stat.ML

    Multilingual Autoregressive Entity Linking

    Authors: Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

    Abstract: We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem -- the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 20 pages, 8 figures, and 11 tables

  10. arXiv:2101.00133  [pdf, other]

    cs.CL cs.AI

    NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

    Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

    Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage conte…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

  11. arXiv:2012.15156  [pdf, other]

    cs.CL

    A Memory Efficient Baseline for Open Domain Question Answering

    Authors: Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Sebastian Riedel, Edouard Grave

    Abstract: Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks. While very effective, this approach is also memory intensive, as the dense vectors for the whole knowledge source need to be kept in memory. In this paper, we study how the memory footprint of dense retriever-reader systems can be reduced. We consider…

    Submitted 30 December, 2020; originally announced December 2020.

  12. arXiv:2010.00904  [pdf, other]

    cs.CL cs.IR cs.LG stat.ML

    Autoregressive Entity Retrieval

    Authors: Nicola De Cao, Gautier Izacard, Sebastian Riedel, Fabio Petroni

    Abstract: Entities are at the center of how we represent and aggregate knowledge. For instance, Encyclopedias such as Wikipedia are structured by entities (e.g., one per Wikipedia article). The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering. Current approaches can be understood as classifiers among atomic…

    Submitted 24 March, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Accepted (spotlight) at International Conference on Learning Representations (ICLR) 2021. Code at https://github.com/facebookresearch/GENRE. 20 pages, 9 figures, 8 tables

  13. arXiv:2010.00577  [pdf, other]

    cs.CL cs.LG stat.ML

    Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking

    Authors: Michael Sejr Schlichtkrull, Nicola De Cao, Ivan Titov

    Abstract: Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models. However, there has been little work on interpreting them, and specifically on understanding which parts of the graphs (e.g. syntactic trees or co-reference structures) contribute to a prediction. In this work, we introduce a post-hoc method for interpreting the predictions of GNN…

    Submitted 3 October, 2022; v1 submitted 1 October, 2020; originally announced October 2020.

  14. arXiv:2009.02252  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    KILT: a Benchmark for Knowledge Intensive Language Tasks

    Authors: Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel, Sebastian Riedel

    Abstract: Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research…

    Submitted 27 May, 2021; v1 submitted 4 September, 2020; originally announced September 2020.

    Comments: Accepted at NAACL 2021

  15. arXiv:2006.04437  [pdf, other]

    stat.ML cs.LG

    The Power Spherical distribution

    Authors: Nicola De Cao, Wilker Aziz

    Abstract: There is a growing interest in probabilistic models defined in hyper-spherical spaces, be it to accommodate observed data or latent structure. The von Mises-Fisher (vMF) distribution, often regarded as the Normal distribution on the hyper-sphere, is a standard modeling choice: it is an exponential family and thus enjoys important statistical results, for example, known Kullback-Leibler (KL) diverg…

    Submitted 15 June, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 8 pages, 2 figures, 1 table. Code at: https://github.com/nicola-decao/power_spherical

  16. arXiv:2004.14992  [pdf, other]

    cs.CL stat.ML

    How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking

    Authors: Nicola De Cao, Michael Schlichtkrull, Wilker Aziz, Ivan Titov

    Abstract: Attribution methods assess the contribution of inputs to the model prediction. One way to do so is erasure: a subset of inputs is considered irrelevant if it can be removed without affecting the prediction. Though conceptually simple, erasure's objective is intractable and approximate search remains expensive with modern deep NLP models. Erasure is also susceptible to the hindsight bias: the fact…

    Submitted 2 March, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: Accepted at the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Source code available at https://github.com/nicola-decao/diffmask . 18 pages, 15 figures, 4 tables

  17. arXiv:1904.04676  [pdf, other]

    stat.ML cs.LG

    Block Neural Autoregressive Flow

    Authors: Nicola De Cao, Ivan Titov, Wilker Aziz

    Abstract: Normalising flows (NFS) map two density functions via a differentiable bijection whose Jacobian determinant can be computed efficiently. Recently, as an alternative to hand-crafted bijections, Huang et al. (2018) proposed neural autoregressive flow (NAF) which is a universal approximator for density functions. Their flow is a neural network (NN) whose parameters are predicted by another NN. The la…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: 12 pages, 3 figures, 3 tables

  18. arXiv:1808.09920  [pdf, other]

    cs.CL stat.ML

    Question Answering by Reasoning Across Documents with Graph Convolutional Networks

    Authors: Nicola De Cao, Wilker Aziz, Ivan Titov

    Abstract: Most research in reading comprehension has focused on answering questions based on individual documents or even single paragraphs. We introduce a neural model which integrates and reasons relying on information spread within documents and across multiple documents. We frame it as an inference problem on a graph. Mentions of entities are nodes of this graph while edges encode relations between diff…

    Submitted 27 September, 2022; v1 submitted 29 August, 2018; originally announced August 2018.

    Journal ref: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 2306--2317

  19. arXiv:1807.04689  [pdf, other]

    stat.ML cs.LG

    Explorations in Homeomorphic Variational Auto-Encoding

    Authors: Luca Falorsi, Pim de Haan, Tim R. Davidson, Nicola De Cao, Maurice Weiler, Patrick Forré, Taco S. Cohen

    Abstract: The manifold hypothesis states that many kinds of high-dimensional data are concentrated near a low-dimensional manifold. If the topology of this data manifold is non-trivial, a continuous encoder network cannot embed it in a one-to-one manner without creating holes of low density in the latent space. This is at odds with the Gaussian prior assumption typically made in Variational Auto-Encoders (V…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: 16 pages, 8 figures, ICML workshop on Theoretical Foundations and Applications of Deep Generative Models

  20. arXiv:1805.11973  [pdf, other]

    stat.ML cs.LG

    MolGAN: An implicit generative model for small molecular graphs

    Authors: Nicola De Cao, Thomas Kipf

    Abstract: Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumve…

    Submitted 27 September, 2022; v1 submitted 30 May, 2018; originally announced May 2018.

    Comments: Code at https://github.com/nicola-decao/MolGAN

    Journal ref: ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models

  21. arXiv:1804.00891  [pdf, other]

    stat.ML cs.LG

    Hyperspherical Variational Auto-Encoders

    Authors: Tim R. Davidson, Luca Falorsi, Nicola De Cao, Thomas Kipf, Jakub M. Tomczak

    Abstract: The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we p…

    Submitted 27 September, 2022; v1 submitted 3 April, 2018; originally announced April 2018.

    Comments: Code at http://github.com/nicola-decao/s-vae-tf and https://github.com/nicola-decao/s-vae-pytorch, Blogpost: https://nicola-decao.github.io/s-vae

    Journal ref: Uncertainty in Artificial Intelligence (UAI). Proceedings of the Thirty-Fourth Conference (2018) 856-865