
Showing 1–39 of 39 results for author: Vilares, D

  1. arXiv:2505.16855  [pdf, ps, other]

    cs.CL

    Nested Named Entity Recognition as Single-Pass Sequence Labeling

    Authors: Alberto Muñoz-Ortiz, David Vilares, Caio Corro, Carlos Gómez-Rodríguez

    Abstract: We cast nested named entity recognition (NNER) as a sequence labeling task by leveraging prior work that linearizes constituency structures, effectively reducing the complexity of this structured prediction problem to straightforward token classification. By combining these constituency linearizations with pretrained encoders, our method captures nested entities while performing exactly $n$ taggin…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Submitted to EMNLP 2025

    MSC Class: 68T50 ACM Class: I.2.7

  2. arXiv:2505.11693  [pdf, ps, other]

    cs.CL

    Hierarchical Bracketing Encodings for Dependency Parsing as Tagging

    Authors: Ana Ezquerro, David Vilares, Anssi Yli-Jyrä, Carlos Gómez-Rodríguez

    Abstract: We present a family of encodings for sequence labeling dependency parsing, based on the concept of hierarchical bracketing. We prove that the existing 4-bit projective encoding belongs to this family, but it is suboptimal in the number of labels used to encode a tree. We derive an optimal hierarchical bracketing, which minimizes the number of symbols used and encodes projective trees using only 12…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL 2025. Original submission; camera-ready coming soon

  3. arXiv:2502.20866  [pdf, other]

    cs.CL

    Better Benchmarking LLMs for Zero-Shot Dependency Parsing

    Authors: Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares

    Abstract: While LLMs excel in zero-shot tasks, their performance in linguistic challenges like syntactic parsing has been less scrutinized. This paper studies state-of-the-art open-weight LLMs on the task by comparing them to baselines that do not have access to the input sentence, including baselines that have not been used in this context such as random projective trees or optimal linear arrangements. The…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: Accepted at NoDaLiDa/Baltic-HLT 2025

  4. arXiv:2410.17972  [pdf, other]

    cs.CL

    Dependency Graph Parsing as Sequence Labeling

    Authors: Ana Ezquerro, David Vilares, Carlos Gómez-Rodríguez

    Abstract: Various linearizations have been proposed to cast syntactic dependency parsing as sequence labeling. However, these approaches do not support more complex graph-based representations, such as semantic dependencies or enhanced universal dependencies, as they cannot handle reentrancy or cycles. By extending them, we define a range of unbounded and bounded linearizations that can be used to cast grap…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted at EMNLP-2024

  5. arXiv:2406.16071  [pdf, other]

    cs.CL

    Dancing in the syntax forest: fast, accurate and explainable sentiment analysis with SALSA

    Authors: Carlos Gómez-Rodríguez, Muhammad Imran, David Vilares, Elena Solera, Olga Kellert

    Abstract: Sentiment analysis is a key technology for companies and institutions to gauge public opinion on products, services or events. However, for large-scale sentiment analysis to be accessible to entities with modest computational resources, it needs to be performed in a resource-efficient way. While some efficient sentiment analysis systems exist, they tend to apply shallow heuristics, which do not ta…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted for publication at SEPLN-CEDI2024: Seminar of the Spanish Society for Natural Language Processing at the 7th Spanish Conference on Informatics

    MSC Class: 68T50 ACM Class: I.2.7

  6. arXiv:2405.06483  [pdf, other]

    cs.CL

    LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing

    Authors: Ana Ezquerro, David Vilares

    Abstract: This paper describes our participation in SemEval 2024 Task 3, which focused on Multimodal Emotion Cause Analysis in Conversations. We developed an early prototype for an end-to-end system that uses graph-based methods from dependency parsing to identify causal emotion relations in multi-party conversations. Our model comprises a neural transformer-based encoder for contextualizing multimodal conv…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted at SemEval 2024

  7. arXiv:2402.02782  [pdf, other]

    cs.CL

    From Partial to Strictly Incremental Constituent Parsing

    Authors: Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares

    Abstract: We study incremental constituent parsers to assess their capacity to output trees based on prefix representations alone. Guided by strictly left-to-right generative language models and tree-decoding modules, we build parsers that adhere to a strong definition of incrementality across languages. This builds upon work that asserted incrementality, but that mostly only enforced it on either the encod…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at EACL 2024

  8. arXiv:2310.14319  [pdf, other]

    cs.CL cs.FL

    4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees

    Authors: Carlos Gómez-Rodríguez, Diego Roca, David Vilares

    Abstract: We introduce an encoding for parsing as sequence labeling that can represent any projective dependency tree as a sequence of 4-bit labels, one per word. The bits in each word's label represent (1) whether it is a right or left dependent, (2) whether it is the outermost (left/right) dependent of its parent, (3) whether it has any left children and (4) whether it has any right children. We show that…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted for publication at EMNLP 2023

    MSC Class: 68T50 ACM Class: I.2.7
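
The four properties listed in the abstract of entry 8 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `four_bit_labels`, the head-array input format (1-based heads, 0 for the root), the order of the tuple fields, and the treatment of the root as a right dependent of a dummy node at position 0 are all assumptions made for illustration. The sketch only computes the four per-word properties; it does not cover decoding or the truncated part of the abstract.

```python
def four_bit_labels(heads):
    """Compute four per-word properties of a dependency tree.

    heads[i] is the 1-based position of the head of word i+1; 0 marks the root.
    Returns one tuple per word:
    (is_left_dependent, is_outermost, has_left_children, has_right_children).
    """
    n = len(heads)
    labels = []
    for i in range(1, n + 1):
        head = heads[i - 1]
        # Property 1: the word is a left dependent if it precedes its head.
        # Assumption: the root (head 0) counts as a right dependent.
        is_left_dep = head > i
        # Siblings attached to the same head on the same side as word i.
        same_side = [
            j for j in range(1, n + 1)
            if heads[j - 1] == head and (head > j) == is_left_dep
        ]
        # Property 2: outermost means farthest from the head on that side.
        if is_left_dep:
            is_outermost = i == min(same_side)
        else:
            is_outermost = i == max(same_side)
        # Properties 3 and 4: does word i have dependents to its left or right?
        has_left = any(heads[j - 1] == i for j in range(1, i))
        has_right = any(heads[j - 1] == i for j in range(i + 1, n + 1))
        labels.append((is_left_dep, is_outermost, has_left, has_right))
    return labels


# Hypothetical example: a three-word chain where word 1 depends on word 2,
# word 2 depends on word 3, and word 3 is the root.
example = four_bit_labels([2, 3, 0])
```

Each word receives exactly one tuple, which matches the "one label per word" property stated in the abstract.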

  9. arXiv:2309.16254  [pdf, other]

    cs.CL

    On the Challenges of Fully Incremental Neural Dependency Parsing

    Authors: Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares

    Abstract: Since the popularization of BiLSTMs and Transformer-based bidirectional encoders, state-of-the-art syntactic parsers have lacked incrementality, requiring access to the whole sentence and deviating from human language processing. This paper explores whether fully incremental dependency parsing with modern architectures can be competitive. We build parsers combining strictly left-to-right neural en…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted at IJCNLP-AACL 2023

  10. arXiv:2309.11165  [pdf, other]

    cs.CL

    Assessment of Pre-Trained Models Across Languages and Grammars

    Authors: Alberto Muñoz-Ortiz, David Vilares, Carlos Gómez-Rodríguez

    Abstract: We present an approach for assessing how multilingual large language models (LLMs) learn syntax in terms of multi-formalism syntactic structures. We aim to recover constituent and dependency structures by casting parsing as sequence labeling. To do so, we select a few LLMs and study them on 13 diverse UD treebanks for dependency parsing and 10 treebanks for constituent parsing. Our results show th…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at IJCNLP-AACL 2023

  11. Contrasting Linguistic Patterns in Human and LLM-Generated News Text

    Authors: Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, David Vilares

    Abstract: We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from six different LLMs that cover three different families and four sizes in total. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric, and sociolinguistic aspects. The results reveal various measurable differen…

    Submitted 2 September, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Published at Artificial Intelligence Review vol. 57, 265

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: Artificial Intelligence Review 57, 265 (2024)

  12. arXiv:2305.15119  [pdf, other]

    cs.CL

    Another Dead End for Morphological Tags? Perturbed Inputs and Parsing

    Authors: Alberto Muñoz-Ortiz, David Vilares

    Abstract: The usefulness of part-of-speech tags for parsing has been heavily questioned due to the success of word-contextualized parsers. Yet, most studies are limited to coarse-grained tags and high quality written content; while we know little about their influence when it comes to models in production that face lexical errors. We expand these setups and design an adversarial attack to verify if the use…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at Findings of ACL 2023

  13. arXiv:2210.15219  [pdf, other]

    cs.CL

    Parsing linearizations appreciate PoS tags - but some are fussy about errors

    Authors: Alberto Muñoz-Ortiz, Mark Anderson, David Vilares, Carlos Gómez-Rodríguez

    Abstract: PoS tags, once taken for granted as a useful resource for syntactic parsing, have become more situational with the popularization of deep learning. Recent work on the impact of PoS tags on graph- and transition-based parsers suggests that they are only useful when tagging accuracy is prohibitively high, or in low-resource scenarios. However, such an analysis is lacking for the emerging sequence la…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted at AACL 2022

  14. arXiv:2209.06699  [pdf, other]

    cs.CL

    The Fragility of Multi-Treebank Parsing Evaluation

    Authors: Iago Alonso-Alonso, David Vilares, Carlos Gómez-Rodríguez

    Abstract: Treebank selection for parsing evaluation and the spurious effects that might arise from a biased choice have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers, and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typolo…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: Accepted at COLING 2022

  15. arXiv:2205.09350  [pdf, other]

    cs.CL

    Cross-lingual Inflection as a Data Augmentation Method for Parsing

    Authors: Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, David Vilares

    Abstract: We propose a morphology-based method for low-resource (LR) dependency parsing. We train a morphological inflector for target LR languages, and apply it to related rich-resource (RR) treebanks to create cross-lingual (x-inflected) treebanks that resemble the target LR language. We use such inflected treebanks to train parsers in zero- (training on x-inflected treebanks) and few-shot (training on x-…

    Submitted 20 May, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: 10 pages, 7 tables, 5 figures. Workshop on Insights from Negative Results in NLP 2022 (co-located with ACL)

  16. arXiv:2204.12820  [pdf, other]

    cs.CL

    LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools for Sentiment Analysis as Semantic Dependency Parsing

    Authors: Iago Alonso-Alonso, David Vilares, Carlos Gómez-Rodríguez

    Abstract: This paper addressed the problem of structured sentiment analysis using a bi-affine semantic dependency parser, large pre-trained language models, and publicly available translation models. For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages that can be adequately processed by cross-lingua…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: To be published at SemEval-2022

  17. arXiv:2108.07556  [pdf, other]

    cs.CL

    Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing

    Authors: Alberto Muñoz-Ortiz, Michalina Strzyz, David Vilares

    Abstract: Different linearizations have been proposed to cast dependency parsing as sequence labeling and solve the task as: (i) a head selection problem, (ii) finding a representation of the token arcs as bracket strings, or (iii) associating partial transition sequences of a transition-based parser to words. Yet, there is little understanding about how these linearizations behave in low-resource setups. H…

    Submitted 17 August, 2021; originally announced August 2021.

    Comments: Accepted at RANLP 2021 (https://ranlp.org/ranlp2021)

  18. arXiv:2105.02947  [pdf, other]

    cs.CL cs.LG

    On the logistical difficulties and findings of Jopara Sentiment Analysis

    Authors: Marvin M. Agüero-Torales, David Vilares, Antonio G. López-Herrera

    Abstract: This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data for even relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether…

    Submitted 11 May, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: Accepted in the CALCS 2021 (co-located with NAACL 2021) - Fifth Workshop on Computational Approaches to Linguistic Code Switching, to appear (June 2021)

    MSC Class: 68-02 68T50 68T07 91D30

    Journal ref: Proceedings on CALCS 2021 (co-located with NAACL 2021) - Fifth Workshop on Computational Approaches to Linguistic Code Switching

  19. Bertinho: Galician BERT Representations

    Authors: David Vilares, Marcos Garcia, Carlos Gómez-Rodríguez

    Abstract: This paper presents a monolingual BERT model for Galician. We follow the recent trend that shows that it is feasible to build robust monolingual BERT models even for relatively low-resource languages, while performing better than the well-known official multilingual BERT (mBERT). More particularly, we release two monolingual Galician BERT models, built using 6 and 12 transformer layers, respective…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: Accepted in the journal Procesamiento del Lenguaje Natural

    Journal ref: Procesamiento del Lenguaje Natural. 66 (2021) 13-26

  20. arXiv:2011.00596  [pdf, ps, other]

    cs.CL

    Bracketing Encodings for 2-Planar Dependency Parsing

    Authors: Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez

    Abstract: We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels, hence providing almost total coverage of crossing arcs in sequence labeling parsing. First, we show that existing bracketing encodings for parsing as labeling can only handle a very mild extension of projective trees. Second, we overcome this limi…

    Submitted 22 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: COLING2020 (long papers), 13 pages (incl. appendix) with corrected parsing speeds for Danish and Gothic

    MSC Class: 68T50 ACM Class: I.2.7

  21. arXiv:2011.00584  [pdf, ps, other]

    cs.CL cs.FL

    A Unifying Theory of Transition-based and Sequence Labeling Parsing

    Authors: Carlos Gómez-Rodríguez, Michalina Strzyz, David Vilares

    Abstract: We define a mapping from transition-based parsing algorithms that read sentences from left to right to sequence labeling encodings of syntactic trees. This not only establishes a theoretical relation between transition-based parsing and sequence-labeling parsing, but also provides a method to obtain new encodings for fast and simple sequence labeling parsing from the many existing transition-based…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: Camera-ready version (final peer-reviewed manuscript) to appear at proceedings of COLING 2020. 18 pages (incl. appendices)

    MSC Class: 68T50; 68Q45 ACM Class: F.4.3; I.2.7

  22. arXiv:2010.00633  [pdf, other]

    cs.CL

    Discontinuous Constituent Parsing as Sequence Labeling

    Authors: David Vilares, Carlos Gómez-Rodríguez

    Abstract: This paper reduces discontinuous parsing to sequence labeling. It first shows that existing reductions for constituent parsing as labeling do not support discontinuities. Second, it fills this gap and proposes to encode tree discontinuities as nearly ordered permutations of the input sequence. Third, it studies whether such discontinuous representations are learnable. The experiments show that des…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: To appear in EMNLP 2020

  23. arXiv:2002.01685  [pdf, other]

    cs.CL cs.LG

    Parsing as Pretraining

    Authors: David Vilares, Michalina Strzyz, Anders Søgaard, Carlos Gómez-Rodríguez

    Abstract: Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures -- and no decoding. We first cast cons…

    Submitted 5 February, 2020; originally announced February 2020.

    Comments: AAAI 2020 - The Thirty-Fourth AAAI Conference on Artificial Intelligence

  24. arXiv:1909.01053  [pdf, other]

    cs.CL cs.LG

    Towards Making a Dependency Parser See

    Authors: Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez

    Abstract: We explore whether it is possible to leverage eye-tracking data in an RNN dependency parser (for English) when such information is only available during training, i.e., no aggregated or token-level gaze features are used at inference time. To do so, we train a multitask learning model that parses sentences as sequence labeling and leverages gaze features as auxiliary tasks. Our method also learns…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: Camera-ready version to appear at EMNLP 2019 (final peer-reviewed manuscript). 8 pages (incl. appendix)

    MSC Class: 68T50 ACM Class: I.2.7

  25. arXiv:1908.03480  [pdf, other]

    cs.CL

    Artificially Evolved Chunks for Morphosyntactic Analysis

    Authors: Mark Anderson, David Vilares, Carlos Gómez-Rodríguez

    Abstract: We introduce a language-agnostic evolutionary technique for automatically extracting chunks from dependency treebanks. We evaluate these chunks on a number of morphosyntactic tasks, namely POS tagging, morphological feature tagging, and dependency parsing. We test the utility of these chunks in a host of different ways. We first learn chunking as one task in a shared multi-task framework together…

    Submitted 21 August, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

    Comments: To be published in proceedings of the 18th International Workshop on Treebanks and Linguistic Theories

  26. Sequence Labeling Parsing by Learning Across Representations

    Authors: Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez

    Abstract: We use parsing as sequence labeling as a common framework to learn across constituency and dependency syntactic abstractions. To do so, we cast the problem as multitask learning (MTL). First, we show that adding a parsing paradigm as an auxiliary loss consistently improves the performance on the other paradigm. Secondly, we explore an MTL sequence labeling model that parses both representations, a…

    Submitted 7 January, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

    Comments: Proc. of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). Revised version after fixing evaluation bug

  27. arXiv:1906.04701  [pdf, other]

    cs.CL

    HEAD-QA: A Healthcare Dataset for Complex Reasoning

    Authors: David Vilares, Carlos Gómez-Rodríguez

    Abstract: We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show t…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: ACL 2019 (short papers)

  28. arXiv:1905.11037  [pdf, other]

    cs.CL

    Harry Potter and the Action Prediction Challenge from Natural Language

    Authors: David Vilares, Carlos Gómez-Rodríguez

    Abstract: We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. As a case study, we consider the world of the Harry Potter fantasy novels and inferring what spell will be cast next given a fragment of a story. Spells act as keywords that abstract actions (e.g. 'Alohomora' to open a door)…

    Submitted 27 May, 2019; originally announced May 2019.

    Comments: NAACL 2019 (short papers)

  29. arXiv:1902.10985  [pdf, other]

    cs.CL

    Better, Faster, Stronger Sequence Tagging Constituent Parsers

    Authors: David Vilares, Mostafa Abdou, Anders Søgaard

    Abstract: Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model…

    Submitted 14 October, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

    Comments: NAACL 2019 (long papers). Contains corrigendum

  30. arXiv:1902.10505  [pdf, other]

    cs.CL cs.LG

    Viable Dependency Parsing as Sequence Labeling

    Authors: Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez

    Abstract: We recast dependency parsing as a sequence labeling problem, exploring several encodings of dependency trees as labels. While dependency parsing by means of sequence labeling had been attempted in existing work, results suggested that the technique was impractical. We show instead that with a conventional BiLSTM-based model it is possible to obtain fast and accurate parsers. These parsers are conc…

    Submitted 29 March, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: Camera-ready version to appear at NAACL 2019 (final peer-reviewed manuscript). 8 pages (incl. appendix)

    MSC Class: 68T50 ACM Class: I.2.7

  31. arXiv:1810.08997  [pdf, ps, other]

    cs.CL

    Transition-based Parsing with Lighter Feed-Forward Networks

    Authors: David Vilares, Carlos Gómez-Rodríguez

    Abstract: We explore whether it is possible to build lighter parsers, that are statistically equivalent to their corresponding standard version, for a wide set of languages showing different structures and morphologies. As testbed, we use the Universal Dependencies and transition-based dependency parsers trained on feed-forward networks. For these, most existing research assumes de facto standard embedded f…

    Submitted 21 October, 2018; originally announced October 2018.

    Comments: UD Workshop (co-located with EMNLP 2018)

  32. arXiv:1810.08994  [pdf, other]

    cs.CL

    Constituent Parsing as Sequence Labeling

    Authors: Carlos Gómez-Rodríguez, David Vilares

    Abstract: We introduce a method to reduce constituent parsing to sequence labeling. For each word w_t, it generates a label that encodes: (1) the number of ancestors in the tree that the words w_t and w_{t+1} have in common, and (2) the nonterminal symbol at the lowest common ancestor. We first prove that the proposed encoding function is injective for any tree without unary branches. In practice, the appro…

    Submitted 17 September, 2019; v1 submitted 21 October, 2018; originally announced October 2018.

    Comments: EMNLP 2018 (Long Papers). Revised version with improved results after fixing evaluation bug
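
The two label components named in the abstract of entry 32 can be sketched in Python as follows. This is an illustrative reading of the abstract, not the authors' code: the nested-tuple tree format, the omission of part-of-speech preterminals, and the helper names `collect` and `encode` are assumptions. The abstract excerpt does not say how the last word (which has no w_{t+1}) is labeled, so the sketch emits labels only for the first n-1 words.

```python
def collect(tree, position=(), ancestors=()):
    """Yield (word, ancestors) for each leaf, left to right.

    A tree is a tuple (label, child, child, ...); a leaf is a plain string.
    Each ancestor is stored as (position, label), where position is the
    tuple of child indices from the root, so nodes with equal labels stay
    distinguishable.
    """
    label, *children = tree
    here = ancestors + ((position, label),)
    for k, child in enumerate(children):
        if isinstance(child, str):
            yield child, here
        else:
            yield from collect(child, position + (k,), here)


def encode(tree):
    """Return (word, n_common_ancestors, lowest_common_label) per word pair."""
    leaves = list(collect(tree))
    labels = []
    for (word, anc), (_, next_anc) in zip(leaves, leaves[1:]):
        common = 0
        for a, b in zip(anc, next_anc):
            if a != b:
                break
            common += 1
        # The root is always shared, so common >= 1 and the index is valid.
        labels.append((word, common, anc[common - 1][1]))
    return labels


# Hypothetical example tree without unary branches.
tree = ("S", ("NP", "the", "cat"), ("VP", "chased", ("NP", "a", "mouse")))
labels = encode(tree)
# labels == [("the", 2, "NP"), ("cat", 1, "S"), ("chased", 2, "VP"), ("a", 3, "NP")]
```

Storing absolute counts keeps the sketch short; it is a design choice for illustration only and says nothing about how the paper represents the counts in practice.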

  33. arXiv:1805.09055  [pdf, ps, other]

    cs.CL

    Grounding the Semantics of Part-of-Day Nouns Worldwide using Twitter

    Authors: David Vilares, Carlos Gómez-Rodríguez

    Abstract: The usage of part-of-day nouns, such as 'night', and their time-specific greetings ('good night'), varies across languages and cultures. We show the possibilities that Twitter offers for studying the semantics of these terms and its variability between countries. We mine a worldwide sample of multilingual tweets with temporal greetings, and study how their frequencies vary in relation with local t…

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: In PEOPLES2018 short papers (NAACL workshop), 6 pages, 5 figures, 1 table

  34. arXiv:1805.09007  [pdf, other]

    cs.CL

    A Transition-based Algorithm for Unrestricted AMR Parsing

    Authors: David Vilares, Carlos Gómez-Rodríguez

    Abstract: Non-projective parsing can be useful to handle cycles and reentrancy in AMR graphs. We explore this idea and introduce a greedy left-to-right non-projective transition-based parser. At each parsing configuration, an oracle decides whether to create a concept or whether to connect a pair of existing concepts. The algorithm handles reentrancy and arbitrary cycles natively, i.e. within the transition…

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: In NAACL 2018 (short papers): 8 pages, 1 Figure

  35. arXiv:1708.05269  [pdf, other]

    cs.CL

    Towards Syntactic Iberian Polarity Classification

    Authors: David Vilares, Marcos Garcia, Miguel A. Alonso, Carlos Gómez-Rodríguez

    Abstract: Lexicon-based methods using syntactic rules for polarity classification rely on parsers that are dependent on the language and on treebank guidelines. Thus, rules are also dependent and require adaptation, especially in multilingual scenarios. We tackle this challenge in the context of the Iberian Peninsula, releasing the first symbolic syntax-based Iberian system with rules shared across five off…

    Submitted 17 August, 2017; originally announced August 2017.

    Comments: 7 pages, 5 tables. Contribution to the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA-2017) at EMNLP 2017

  36. arXiv:1707.03228  [pdf, other]

    cs.CL

    A non-projective greedy dependency parser with bidirectional LSTMs

    Authors: David Vilares, Carlos Gómez-Rodríguez

    Abstract: The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of the Covington (2001) algorithm for non-projective dependency parsing. The bidirectional LSTM approach by Kiperwasser and Goldberg (2016) is used to train a greedy parser with a dynamic oracle to mitigate error propagation. The model participated in the CoNLL 2017 UD Shared Task. In spite of not using any ensemble methods a…

    Submitted 11 July, 2017; originally announced July 2017.

    Comments: 12 pages, 2 figures, 5 tables

    Journal ref: In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 152-162, Vancouver, Canada, 2017

  37. How Important is Syntactic Parsing Accuracy? An Empirical Evaluation on Rule-Based Sentiment Analysis

    Authors: Carlos Gómez-Rodríguez, Iago Alonso-Alonso, David Vilares

    Abstract: Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of pa…

    Submitted 24 October, 2017; v1 submitted 7 June, 2017; originally announced June 2017.

    Comments: 19 pages. Accepted for publication in Artificial Intelligence Review. This update only adds the DOI link to comply with journal's terms

    MSC Class: 68T50; 97R40 ACM Class: I.2.7

  38. Universal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis

    Authors: David Vilares, Carlos Gómez-Rodríguez, Miguel A. Alonso

    Abstract: We present a novel unsupervised approach for multilingual sentiment analysis driven by compositional syntax-based rules. On the one hand, we exploit some of the main advantages of unsupervised algorithms: (1) the interpretability of their output, in contrast with most supervised models, which behave as a black box and (2) their robustness across different corpora and domains. On the other hand, by…

    Submitted 5 January, 2017; v1 submitted 17 June, 2016; originally announced June 2016.

    Comments: 19 pages, 5 Tables, 6 Figures. This is the authors' version of a work that was accepted for publication in Knowledge-Based Systems

    Journal ref: Knowledge-Based Systems, 118:45-55, 2017

  39. arXiv:1507.08449  [pdf, ps, other]

    cs.CL

    One model, two languages: training bilingual parsers with harmonized treebanks

    Authors: David Vilares, Carlos Gómez-Rodríguez, Miguel A. Alonso

    Abstract: We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingual p…

    Submitted 19 May, 2016; v1 submitted 30 July, 2015; originally announced July 2015.

    Comments: 7 pages, 4 tables, 1 figure