-
CLaC at SemEval-2025 Task 6: A Multi-Architecture Approach for Corporate Environmental Promise Verification
Abstract: This paper presents our approach to the SemEval-2025 Task 6 (PromiseEval), which focuses on verifying promises in corporate ESG (Environmental, Social, and Governance) reports. We explore three model architectures to address the four subtasks of promise identification, supporting evidence assessment, clarity evaluation, and verification timing. Our first model utilizes ESG-BERT with task-specific…
Submitted 29 May, 2025; originally announced May 2025.
Comments: Accepted to SemEval-2025 Task 6 (ACL 2025)
-
Deep Reinforcement Learning Algorithms for Option Hedging
Abstract: Dynamic hedging is a financial strategy that consists in periodically transacting one or multiple financial assets to offset the risk associated with a correlated liability. Deep Reinforcement Learning (DRL) algorithms have been used to find optimal solutions to dynamic hedging problems by framing them as sequential decision-making problems. However, most previous work assesses the performance of…
Submitted 16 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.
-
arXiv:2408.08971 [pdf, ps, other]
A Multi-Task and Multi-Label Classification Model for Implicit Discourse Relation Recognition
Abstract: We propose a novel multi-label classification approach to implicit discourse relation recognition (IDRR). Our approach features a multi-task model that jointly learns multi-label representations of implicit discourse relations across all three sense levels in the PDTB 3.0 framework. The model can also be adapted to the traditional single-label IDRR setting by selecting the sense with the highest p…
Submitted 8 July, 2025; v1 submitted 16 August, 2024; originally announced August 2024.
Comments: Accepted at SIGDIAL 2025
-
Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment
Abstract: This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts. Our model, developed as a part of the recent SemEval task, is based on fine-tuning individual language models (BERT, XLM-RoBERTa, and mBERT) and leveraging a mean-based ensemble model in addition to dataset augmentation through paraphrase generation from ChatGPT. The scope of the study e…
Submitted 1 July, 2024; originally announced July 2024.
Comments: 15 pages, 8 figures, 1 table, Proceedings of 5th International Conference on Natural Language Processing and Applications (NLPA 2024)
Journal ref: Computer Science & Information Technology (CS & IT), ISSN: 2231-5403, Volume 14, Number 11, June 2024
-
Deep Hedging with Market Impact
Abstract: Dynamic hedging is the practice of periodically transacting financial instruments to offset the risk caused by an investment or a liability. Dynamic hedging optimization can be framed as a sequential decision problem; thus, Reinforcement Learning (RL) models were recently proposed to tackle this task. However, existing RL works for hedging do not consider market impact caused by the finite liquidi…
Submitted 22 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.
Comments: 13 pages, 5 figures
-
Toponym Identification in Epidemiology Articles - A Deep Learning Approach
Abstract: When analyzing the spread of viruses, epidemiologists often need to identify the location of infected hosts. This information can be found in public databases such as GenBank; however, the information provided in these databases is usually limited to the country or state level. More fine-grained localization information requires phylogeographers to manually read relevant scientific articles. In this…
Submitted 27 April, 2019; v1 submitted 24 April, 2019; originally announced April 2019.
Comments: 12 pages. pre-print from Proceedings of CICLing 2019: 20th International Conference on Computational Linguistics and Intelligent Text Processing
-
arXiv:1709.02843 [pdf, ps, other]
CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification
Abstract: This paper describes the system deployed by the CLaC-EDLK team to the "SemEval 2016, Complex Word Identification task". The goal of the task is to identify if a given word in a given context is "simple" or "complex". Our system relies on linguistic features and cognitive complexity. We used several supervised models; however, the Random Forest model outperformed the others. Overall our best configu…
Submitted 8 September, 2017; originally announced September 2017.
Comments: In Proceedings of the International Workshop on Semantic Evaluation (SemEval-2016), a workshop of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-2016) pp 982-985. June 16-17, San Diego, California
-
The CLaC Discourse Parser at CoNLL-2015
Abstract: This paper describes our submission (kosseim15) to the CoNLL-2015 shared task on shallow discourse parsing. We used the UIMA framework to develop our parser and used ClearTK to add machine learning functionality to the UIMA framework. Overall, our parser achieves a result of 17.3 F1 on the identification of discourse relations on the blind CoNLL-2015 test set, ranking in sixth place.
Submitted 19 August, 2017; originally announced August 2017.
Comments: Proceedings of the Nineteenth Conference on Computational Natural Language Learning Shared Task (CoNLL 2015). Beijing, China
-
arXiv:1708.05803 [pdf, ps, other]
Measuring the Effect of Discourse Relations on Blog Summarization
Abstract: The work presented in this paper attempts to evaluate and quantify the use of discourse relations in the context of blog summarization and compare their use to more traditional and factual texts. Specifically, we measured the usefulness of 6 discourse relations (namely comparison, contingency, illustration, attribution, topic-opinion, and attributive) for the task of text summarization from blogs.…
Submitted 19 August, 2017; originally announced August 2017.
Comments: In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pages 1401-1409, October 2013, Nagoya, Japan
-
ClaC: Semantic Relatedness of Words and Phrases
Abstract: The measurement of phrasal semantic relatedness is an important metric for many natural language processing applications. In this paper, we present three approaches for measuring phrasal semantics, one based on a semantic network model, another on a distributional similarity model, and a hybrid between the two. Our hybrid approach achieved an F-measure of 77.4% on the task of evaluating the semant…
Submitted 18 August, 2017; originally announced August 2017.
Comments: In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), June, Atlanta, Georgia, USA, pp. 108-113
-
arXiv:1708.05800 [pdf, ps, other]
On the Contribution of Discourse Structure on Text Complexity Assessment
Abstract: This paper investigates the influence of discourse features on text complexity assessment. To do so, we created two data sets based on the Penn Discourse Treebank and the Simple English Wikipedia corpora and compared the influence of coherence, cohesion, surface, lexical and syntactic features to assess text complexity. Results show that with both data sets coherence features are more correlated…
Submitted 18 August, 2017; originally announced August 2017.
Comments: In Proceedings of the 17th Annual SigDial Meeting on Discourse and Dialogue (SigDial 2016). pp 166-174. September 13-15. Los Angeles, USA
-
arXiv:1708.05798 [pdf, ps, other]
The CLaC Discourse Parser at CoNLL-2016
Abstract: This paper describes our submission "CLaC" to the CoNLL-2016 shared task on shallow discourse parsing. We used two complementary approaches for the task: a standard machine learning approach for the parsing of explicit relations, and a deep learning approach for non-explicit relations. Overall, our parser achieves an F1-score of 0.2106 on the identification of discourse relations (0.3110 for expli…
Submitted 18 August, 2017; originally announced August 2017.
Comments: In Proceedings of the Twentieth Conference on Computational Natural Language Learning: Shared Task. pp 92-99. July 7-12, 2016. Berlin, Germany
-
CLaC @ QATS: Quality Assessment for Text Simplification
Abstract: This paper describes our approach to the 2016 QATS quality assessment shared task. We trained three independent Random Forest classifiers in order to assess the quality of the simplified texts in terms of grammaticality, meaning preservation and simplicity. We used the language model of Google-Ngram as a feature to predict grammaticality. Meaning preservation is predicted using two complementary…
Submitted 18 August, 2017; originally announced August 2017.
Comments: In Proceedings of the Workshop Shared task on Quality Assessment for Text Simplification (QATS-2016), a workshop of the 10th Language Resources and Evaluation Conference (LREC-2016), pp. 53-56, May 23-28, Portoroz, Slovenia
-
arXiv:1708.03541 [pdf, ps, other]
Automatic Identification of AltLexes using Monolingual Parallel Corpora
Abstract: The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as "since" or "but", are the most informative cues to identify explicit relations; however, discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signaled by markers outside these inventories (i.e. AltLexes) ar…
Submitted 11 August, 2017; originally announced August 2017.
Comments: 6 pages, Proceedings of Recent Advances in Natural Language Processing (RANLP 2017)
-
Argument Labeling of Explicit Discourse Relations using LSTM Neural Networks
Abstract: Argument labeling of explicit discourse relations is a challenging task. State-of-the-art systems achieve slightly above 55% F-measure but require hand-crafted features. In this paper, we propose a Long Short-Term Memory (LSTM) based model for argument labeling. We experimented with multiple configurations of our model. Using the PDTB dataset, our best model achieved an F1 measure of 23.05% wi…
Submitted 7 September, 2017; v1 submitted 10 August, 2017; originally announced August 2017.
Comments: Proceedings of Recent Advances in Natural Language Processing (RANLP 2017), pp. 309-315, 4-6 September, Varna, Bulgaria
-
N-gram and Neural Language Models for Discriminating Similar Languages
Abstract: This paper describes our submission (named clac) to the 2016 Discriminating Similar Languages (DSL) shared task. We participated in the closed Sub-task 1 (Set A) with two separate machine learning techniques. The first approach is a character-based Convolutional Neural Network with a bidirectional long short-term memory (BiLSTM) layer (CLSTM), which achieved an accuracy of 78.45% with minimal tuning…
Submitted 10 August, 2017; originally announced August 2017.
Comments: 8 pages
Journal ref: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3). A workshop of the 26th International Conference on Computational Linguistics (COLING 2016, Osaka, Japan), pp 243-250 (2016)
-
arXiv:1707.06357 [pdf, ps, other]
Improving Discourse Relation Projection to Build Discourse Annotated Corpora
Abstract: The naive approach to annotation projection is not effective for projecting discourse annotations from one language to another because implicit discourse relations are often changed to explicit ones and vice versa in the translation. In this paper, we propose a novel approach based on the intersection between statistical word-alignment models to identify unsupported discourse annotations. This approac…
Submitted 19 July, 2017; originally announced July 2017.
-
arXiv:1706.09856 [pdf, ps, other]
Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations
Abstract: In this paper, we present an approach to exploit phrase tables generated by statistical machine translation in order to map French discourse connectives to discourse relations. Using this approach, we created ConcoLeDisCo, a lexicon of French discourse connectives and their PDTB relations. When evaluated against LEXCONN, ConcoLeDisCo achieves a recall of 0.81 and an Average Precision of 0.68 for t…
Submitted 29 June, 2017; originally announced June 2017.
-
Automatic Disambiguation of French Discourse Connectives
Abstract: Discourse connectives (e.g. however, because) are terms that can explicitly convey a discourse relation within a text. While discourse connectives have been shown to be an effective clue to automatically identify discourse relations, they are not always used to convey such relations; thus, they should first be disambiguated between discourse usage and non-discourse usage. In this paper, we investigate…
Submitted 17 April, 2017; originally announced April 2017.
Journal ref: International Journal of Computational Linguistics and Applications, vol. 7, no. 1, 2016, pp. 11-30