Showing 1–12 of 12 results for author: Niekler, A

  1. arXiv:2305.02350  [pdf, other]

    cs.CL cs.LG

    Using Language Models on Low-end Hardware

    Authors: Fabian Ziegner, Janos Borst, Andreas Niekler, Martin Potthast

    Abstract: This paper evaluates the viability of using fixed language models for training text classification networks on low-end hardware. We combine language models with a CNN architecture and put together a comprehensive benchmark with 8 datasets covering single-label and multi-label classification of topic, sentiment, and genre. Our observations are distilled into a list of trade-offs, concluding that th…

    Submitted 8 May, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: 5+4 pages, 6 tables; fixed affiliation

  2. arXiv:2211.16947  [pdf, other]

    cs.CY cs.AI

    Using Text Classification with a Bayesian Correction for Estimating Overreporting in the Creditor Reporting System on Climate Adaptation Finance

    Authors: Janos Borst, Thomas Wencker, Andreas Niekler

    Abstract: Development funds are essential to finance climate change adaptation and are thus an important part of international climate policy. However, the absence of a common reporting practice makes it difficult to assess the amount and distribution of such funds. Research has questioned the credibility of reported figures, indicating that adaptation financing is in fact lower than published figures sug…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: 9+4 Pages, 3 figures, 4 tables

    ACM Class: J.1; I.2.7

  3. arXiv:2110.02708  [pdf, other]

    cs.CL

    Application of the interactive Leipzig Corpus Miner as a generic research platform for the use in the social sciences

    Authors: Christian Kahmann, Andreas Niekler, Gregor Wiedemann

    Abstract: This article introduces the interactive Leipzig Corpus Miner (iLCM), a newly released, open-source software to perform automatic content analysis. Since the iLCM is based on the R programming language, its generic text mining procedures provided via a user-friendly graphical user interface (GUI) can easily be extended using the integrated IDE RStudio-Server or numerous other interfaces in the…

    Submitted 6 October, 2021; originally announced October 2021.

  4. Small-Text: Active Learning for Text Classification in Python

    Authors: Christopher Schröder, Lydia Müller, Andreas Niekler, Martin Potthast

    Abstract: We introduce small-text, an easy-to-use active learning library, which offers pool-based active learning for single- and multi-label text classification in Python. It features numerous pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow the combination of a variety of classifiers, query strategies, and stopping criteria, facilitati…

    Submitted 7 October, 2023; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: This revision fixes the number of query strategies for modAL, which had remained unchanged from an earlier iteration of the table that did not yet include multi-label strategies

  5. arXiv:2107.05687  [pdf, other]

    cs.CL cs.LG

    Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers

    Authors: Christopher Schröder, Andreas Niekler, Martin Potthast

    Abstract: Active learning is the iterative construction of a classification model through targeted labeling, enabling significant labeling cost savings. As most research on active learning has been carried out before transformer-based language models ("transformers") became popular, despite its practical importance, comparably few papers have investigated how transformers can be combined with active learnin…

    Submitted 20 March, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: ACL 2022 Findings

  6. Supporting Land Reuse of Former Open Pit Mining Sites using Text Classification and Active Learning

    Authors: Christopher Schröder, Kim Bürgl, Yves Annanias, Andreas Niekler, Lydia Müller, Daniel Wiegreffe, Christian Bender, Christoph Mengs, Gerik Scheuermann, Gerhard Heyer

    Abstract: Open pit mines left many regions worldwide inhospitable or uninhabitable. To put these regions back into use, entire stretches of land must be renaturalized. For the sustainable subsequent use or transfer to a new primary use, many contaminated sites and soil information have to be permanently managed. In most cases, this information is available in the form of expert reports in unstructured data…

    Submitted 22 March, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

    Journal ref: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021

  7. arXiv:2008.07267  [pdf, other]

    cs.CL cs.LG

    A Survey of Active Learning for Text Classification using Deep Neural Networks

    Authors: Christopher Schröder, Andreas Niekler

    Abstract: Natural language processing (NLP) and neural networks (NNs) have both undergone significant changes in recent years. For active learning (AL) purposes, NNs are, however, less commonly used -- despite their current popularity. By using the superior text classification performance of NNs for AL, we can either increase a model's performance using the same amount of data or reduce the data and therefo…

    Submitted 17 August, 2020; originally announced August 2020.

  8. arXiv:1805.11404  [pdf, other]

    cs.IR cs.CL

    iLCM - A Virtual Research Infrastructure for Large-Scale Qualitative Data

    Authors: Andreas Niekler, Arnim Bleier, Christian Kahmann, Lisa Posch, Gregor Wiedemann, Kenan Erdogan, Gerhard Heyer, Markus Strohmaier

    Abstract: The iLCM project pursues the development of an integrated research environment for the analysis of structured and unstructured data in a "Software as a Service" architecture (SaaS). The research environment addresses requirements for the quantitative evaluation of large amounts of qualitative data with text mining methods as well as requirements for the reproducibility of data-driven research desi…

    Submitted 11 May, 2018; originally announced May 2018.

    Comments: 11th edition of the Language Resources and Evaluation Conference (LREC)

  9. arXiv:1711.05538  [pdf, other]

    cs.CL

    Detecting and assessing contextual change in diachronic text documents using context volatility

    Authors: Christian Kahmann, Andreas Niekler, Gerhard Heyer

    Abstract: Terms in diachronic text corpora may exhibit a high degree of semantic dynamics that is only partially captured by the common notion of semantic change. The new measure of context volatility that we propose models the degree by which terms change context in a text collection over time. The computation of context volatility for a word relies on the significance-values of its co-occurrent terms and…

    Submitted 15 November, 2017; originally announced November 2017.

  10. arXiv:1707.03255  [pdf]

    cs.CL

    Modeling the dynamics of domain specific terminology in diachronic corpora

    Authors: Gerhard Heyer, Cathleen Kantner, Andreas Niekler, Max Overbeck, Gregor Wiedemann

    Abstract: In terminology work, natural language processing, and digital humanities, several studies address the analysis of variations in context and meaning of terms in order to detect semantic change and the evolution of terms. We distinguish three different approaches to describe contextual variations: methods based on the analysis of patterns and linguistic clues, methods exploring the latent semantic s…

    Submitted 11 July, 2017; originally announced July 2017.

    Comments: http://openarchive.cbs.dk/handle/10398/9323; Proceedings of the 12th International conference on Terminology and Knowledge Engineering (TKE 2016)

  11. arXiv:1707.03253  [pdf, other]

    cs.CL

    Leipzig Corpus Miner - A Text Mining Infrastructure for Qualitative Data Analysis

    Authors: Andreas Niekler, Gregor Wiedemann, Gerhard Heyer

    Abstract: This paper presents the "Leipzig Corpus Miner", a technical infrastructure for supporting qualitative and quantitative content analysis. The infrastructure aims at the integration of 'close reading' procedures on individual documents with procedures of 'distant reading', e.g. lexical characteristics of large document collections. Therefore information retrieval systems, lexicometric statistics and…

    Submitted 11 July, 2017; originally announced July 2017.

    Comments: https://hal.archives-ouvertes.fr/hal-01005878; Proceedings of Terminology and Knowledge Engineering 2014 (TKE'14), Berlin

  12. arXiv:1707.03217  [pdf, other]

    cs.IR

    Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries

    Authors: Gregor Wiedemann, Andreas Niekler

    Abstract: This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g. in social sciences. Document retrieval for this purpose needs to take account of the fact that analysts often cannot describe their research objective with a small set of key terms, especially when dealing with theoretical or rather abstract research interests. Instead,…

    Submitted 11 July, 2017; originally announced July 2017.

    Comments: https://hal.archives-ouvertes.fr/hal-01005879; Proceedings of Terminology and Knowledge Engineering 2014 (TKE'14), Berlin