Showing 1–8 of 8 results for author: Beelen, K

  1. arXiv:2211.10086  [pdf, ps, other]

    cs.CL cs.DL

    Metadata Might Make Language Models Better

    Authors: Kaspar Beelen, Daniel van Strien

    Abstract: This paper discusses the benefits of including metadata when training language models on historical collections. Using 19th-century newspapers as a case study, we extend the time-masking approach proposed by Rosin et al., 2022 and compare different strategies for inserting temporal, political and geographical information into a Masked Language Model. After fine-tuning several DistilBERT on enhance…

    Submitted 18 November, 2022; originally announced November 2022.

  2. arXiv:2111.15592  [pdf, other]

    cs.CV cs.LG

    MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale

    Authors: Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, Katherine McDonough

    Abstract: We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital). This library transforms the way historians can use maps by turning extensive, homogeneous map sets into searchable primary sources. MapReader allows users with little or no computer vision expertise to i) retrieve maps via web-servers; ii) preprocess and divid…

    Submitted 30 November, 2021; originally announced November 2021.

    Comments: 13 pages, 9 figures

  3. arXiv:2105.11321  [pdf, other]

    cs.CL cs.LG

    Neural Language Models for Nineteenth-Century English

    Authors: Kasra Hosseini, Kaspar Beelen, Giovanni Colavizza, Mariona Coll Ardanuy

    Abstract: We present four types of neural language models trained on a large historical dataset of books in English, published between 1760-1900 and comprised of ~5.1 billion tokens. The language model architectures include static (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained a model instance using the whole dataset. Additionally, we trained separate i…

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: 5 pages, 1 figure

  4. arXiv:2005.11140  [pdf, other]

    cs.CL

    Living Machines: A study of atypical animacy

    Authors: Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Kasra Hosseini, Ruth Ahnert, Jon Lawrence, Katherine McDonough, Giorgia Tolfo, Daniel CS Wilson, Barbara McGillivray

    Abstract: This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it, we have created the first dataset for atypical animacy detection, based on n…

    Submitted 19 November, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: 12 pages, 1 figure

  5. arXiv:1711.05603  [pdf, other]

    cs.CL

    Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

    Authors: Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps

    Abstract: Recently, researchers started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restricted their efforts to uncovering change over time, thus neglecting other valuable dimensions such as social or political variability. We propose an approach for detecting semantic shifts between different viewpoints--broadly defined as a s…

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM 2017)

  6. arXiv:1710.01127  [pdf, other]

    cs.IR cs.DL

    Finding Talk About the Past in the Discourse of Non-Historians

    Authors: Alex Olieman, Kaspar Beelen, Jaap Kamps

    Abstract: A heightened interest in the presence of the past has given rise to the new field of memory studies, but there is a lack of search and research tools to support studying how and why the past is evoked in diachronic discourses. Searching for temporal references is not straightforward. It entails bridging the gap between conceptually-based information needs on one side, and term-based inverted index…

    Submitted 3 October, 2017; originally announced October 2017.

    Comments: Presented at Drift-a-LOD 2017

  7. arXiv:1708.01162  [pdf, other]

    cs.IR

    Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

    Authors: Alex Olieman, Kaspar Beelen, Milan van Lange, Jaap Kamps, Maarten Marx

    Abstract: Over the last decade we have made great progress in entity linking (EL) systems, but performance may vary depending on the context and, arguably, there are even principled limitations preventing a "perfect" EL system. This also suggests that there may be applications for which current "imperfect" EL is already very useful, and makes finding the "right" application as important as building the "rig…

    Submitted 3 August, 2017; originally announced August 2017.

    Comments: Accepted for presentation at SEMANTiCS '17

  8. arXiv:1706.07643  [pdf, other]

    cs.CY

    Computational Controversy

    Authors: Benjamin Timmermans, Tobias Kuhn, Kaspar Beelen, Lora Aroyo

    Abstract: Climate change, vaccination, abortion, Trump: Many topics are surrounded by fierce controversies. The nature of such heated debates and their elements have been studied extensively in the social science literature. More recently, various computational approaches to controversy analysis have appeared, using new data sources such as Wikipedia, which help us now better understand these phenomena. How…

    Submitted 30 August, 2017; v1 submitted 23 June, 2017; originally announced June 2017.

    Comments: In Proceedings of the 9th International Conference on Social Informatics (SocInfo) 2017