Showing 1–9 of 9 results for author: Valenzuela-Escarcega, M A

Searching in archive cs.
  1. arXiv:2202.00475  [pdf, ps, other]

    cs.CL cs.IR cs.LG

    From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction

    Authors: Robert Vacareanu, Marco A. Valenzuela-Escarcega, George C. G. Barbosa, Rebecca Sharp, Mihai Surdeanu

    Abstract: While deep learning approaches to information extraction have had many successes, they can be difficult to augment or maintain as needs shift. Rule-based methods, on the other hand, can be more easily modified. However, crafting rules requires expertise in linguistics and the domain of interest, making it infeasible for most users. Here we attempt to combine the advantages of these two directions…

    Submitted 16 January, 2022; originally announced February 2022.

  2. arXiv:2001.07295  [pdf, other]

    cs.AI cs.MM cs.SE

    AutoMATES: Automated Model Assembly from Text, Equations, and Software

    Authors: Adarsh Pyarelal, Marco A. Valenzuela-Escarcega, Rebecca Sharp, Paul D. Hein, Jon Stephens, Pratik Bhandari, HeuiChan Lim, Saumya Debray, Clayton T. Morrison

    Abstract: Models of complicated systems can be represented in different ways - in scientific papers, they are represented using natural language text as well as equations. But to be of real use, they must also be implemented as software, thus making code a third form of representing models. We introduce the AutoMATES project, which aims to build semantically-rich unified representations of models from scien…

    Submitted 20 January, 2020; originally announced January 2020.

    Comments: 8 pages, 6 figures, accepted to Modeling the World's Systems 2019

    ACM Class: D.3.3; D.3.4; H.1.0; I.2.2; I.2.5; I.2.7; I.6.4; I.6.5

  3. arXiv:1805.11545  [pdf, other]

    cs.CL

    Lightly-supervised Representation Learning with Global Interpretability

    Authors: Marco A. Valenzuela-Escárcega, Ajay Nagesh, Mihai Surdeanu

    Abstract: We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction patterns, with the robust learning approaches proposed in representation learning. Our algorithm iteratively learns custom embeddings for both the multi-word enti…

    Submitted 29 May, 2018; originally announced May 2018.

  4. arXiv:1711.00529  [pdf, other]

    cs.CL

    Text Annotation Graphs: Annotating Complex Natural Language Phenomena

    Authors: Angus G. Forbes, Kristine Lee, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

    Abstract: This paper introduces a new web-based software tool for annotating text, Text Annotation Graphs, or TAG. It provides functionality for representing complex relationships between words and word phrases that are not available in other software tools, including the ability to define and visualize relationships between the relationships themselves (semantic hypergraphs). Additionally, we include an ap…

    Submitted 1 March, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: Accepted to LREC'18, http://lrec2018.lrec-conf.org/en/conference-programme/accepted-papers/

  5. arXiv:1709.00149  [pdf, other]

    cs.AI cs.CL cs.IR cs.LG

    Learning what to read: Focused machine reading

    Authors: Enrique Noriega-Atala, Marco A. Valenzuela-Escarcega, Clayton T. Morrison, Mihai Surdeanu

    Abstract: Recent efforts in bioinformatics have achieved tremendous progress in the machine reading of biomedical literature, and the assembly of the extracted biochemical interactions into large-scale models such as protein signaling pathways. However, batch machine reading of literature at today's scale (PubMed alone indexes over 1 million papers per year) is unfeasible due to both cost and processing ove…

    Submitted 1 September, 2017; originally announced September 2017.

    Comments: 6 pages, 1 figure, 1 algorithm, 2 tables, accepted to EMNLP 2017

    ACM Class: H.3.3; I.2.6; I.2.7

  6. arXiv:1606.09604  [pdf, other]

    cs.CL

    SnapToGrid: From Statistical to Interpretable Models for Biomedical Information Extraction

    Authors: Marco A. Valenzuela-Escarcega, Gus Hahn-Powell, Dane Bell, Mihai Surdeanu

    Abstract: We propose an approach for biomedical information extraction that marries the advantages of machine learning models, e.g., learning directly from data, with the benefits of rule-based approaches, e.g., interpretability. Our approach starts by training a feature-based statistical model, then converts this model to a rule-based variant by converting its features to rules, and "snapping to grid" the…

    Submitted 30 June, 2016; originally announced June 2016.

  7. arXiv:1606.08089  [pdf, other]

    cs.CL

    This before That: Causal Precedence in the Biomedical Domain

    Authors: Gus Hahn-Powell, Dane Bell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

    Abstract: Causal precedence between biochemical interactions is crucial in the biomedical domain, because it transforms collections of individual interactions, e.g., bindings and phosphorylations, into the causal mechanisms needed to inform meaningful search and inference. Here, we analyze causal precedence in the biomedical domain as distinct from open-domain, temporal precedence. First, we describe a nove…

    Submitted 26 June, 2016; originally announced June 2016.

    Comments: To appear in the proceedings of the 2016 Workshop on Biomedical Natural Language Processing (BioNLP 2016)

  8. arXiv:1603.03758  [pdf, other]

    cs.CL

    Sieve-based Coreference Resolution in the Biomedical Domain

    Authors: Dane Bell, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

    Abstract: We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents, because the cues they rely on such as gender are largely absent in this domain, and because they do n…

    Submitted 2 September, 2016; v1 submitted 11 March, 2016; originally announced March 2016.

    Comments: This paper appears in LREC 2016

  9. arXiv:1509.07513  [pdf, other]

    cs.CL

    Description of the Odin Event Extraction Framework and Rule Language

    Authors: Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu

    Abstract: This document describes the Odin framework, which is a domain-independent platform for developing rule-based event extraction models. Odin aims to be powerful (the rule language allows the modeling of complex syntactic structures) and robust (to recover from syntactic parsing errors, syntactic patterns can be freely mixed with surface, token-based patterns), while remaining simple (some domain gra…

    Submitted 24 September, 2015; originally announced September 2015.