Showing 1–12 of 12 results for author: Leidner, J L

Searching in archive cs.
  1. arXiv:2412.11835  [pdf, other]

    cs.CL

    Improved Models for Media Bias Detection and Subcategorization

    Authors: Tim Menzner, Jochen L. Leidner

    Abstract: We present improved models for the granular detection and sub-classification of news media bias in English news articles. We compare the performance of zero-shot versus fine-tuned large pre-trained neural transformer language models, explore how the level of detail of the classes affects performance on a novel taxonomy of 27 news bias-types, and demonstrate how using synthetically generated example d…

    Submitted 16 December, 2024; originally announced December 2024.

  2. arXiv:2407.10829  [pdf, other]

    cs.CL cs.AI cs.IR

    BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy

    Authors: Tim Menzner, Jochen L. Leidner

    Abstract: The increasing consumption of news online in the 21st century coincided with increased publication of disinformation, biased reporting, hate speech and other unwanted Web content. We describe BiasScanner, an application that aims to strengthen democracy by supporting news consumers with scrutinizing news articles they are reading online. BiasScanner contains a server-side pre-trained large languag…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 10 pages, 3 figures, 1 table

    ACM Class: I.2.7; H.3.3

  3. arXiv:2406.09938  [pdf, ps, other]

    cs.CL cs.AI

    Experiments in News Bias Detection with Pre-Trained Neural Transformers

    Authors: Tim Menzner, Jochen L. Leidner

    Abstract: The World Wide Web provides unrivalled access to information globally, including factual news reporting and commentary. However, state actors and commercial players increasingly spread biased (distorted) or fake (non-factual) information to promote their agendas. We compare several large, pre-trained language models on the task of sentence-level news bias detection and sub-type classification, pro…

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2406.07227  [pdf, other]

    cs.CV cs.IR

    Which Country Is This? Automatic Country Ranking of Street View Photos

    Authors: Tim Menzner, Jochen L. Leidner, Florian Mittag

    Abstract: In this demonstration, we present Country Guesser, a live system that guesses the country that a photo is taken in. In particular, given a Google Street View image, our federated ranking model uses a combination of computer vision, machine learning and text retrieval methods to compute a ranking of likely countries of the location shown in a given image from Street View. Interestingly, using text-…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2405.07766  [pdf, other]

    cs.CL cs.AI

    Challenges and Opportunities of NLP for HR Applications: A Discussion Paper

    Authors: Jochen L. Leidner, Mark Stevenson

    Abstract: Over the course of the recent decade, tremendous progress has been made in the areas of machine learning and natural language processing, which opened up vast areas of potential application use cases, including hiring and human resource management. We review the use cases for text analytics in the realm of human resources/personnel management, including actually realized as well as potential but n…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures, 1 table

    ACM Class: I.2.7; I.2.1

  6. arXiv:2311.11701  [pdf, other]

    cs.IR cs.AI cs.CL cs.HC

    Control in Hybrid Chatbots

    Authors: Thomas Rüdel, Jochen L. Leidner

    Abstract: Customer data typically is held in database systems, which can be seen as a rule-based knowledge base, whereas businesses increasingly want to benefit from the capabilities of large, pre-trained language models. In this technical report, we describe a case study of how a commercial rule engine and an integrated neural chatbot may be integrated, and what level of control that particular integration…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 12 pages, 3 figures

    Report number: Kauz-TR-2023-1

    MSC Class: 68T50; 68T07

    ACM Class: I.2.7; H.3.3

  7. arXiv:2201.07725  [pdf, other]

    cs.CL stat.ME

    Data-to-Value: An Evaluation-First Methodology for Natural Language Projects

    Authors: Jochen L. Leidner

    Abstract: Big data, i.e. collecting, storing and processing of data at scale, has recently become possible due to the arrival of clusters of commodity computers powered by application-level distributed parallel operating systems like HDFS/Hadoop/Spark, and such infrastructures have revolutionized data mining at scale. For data mining projects to succeed more consistently, some methodologies were developed (e.g…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: 9 pages, 6 figures, 4 tables

    MSC Class: 91B02; 68U15; 68T50; 62H99

    ACM Class: I.2.7; D.2.9; I.7.m; H.0

  8. arXiv:2010.08319  [pdf, other]

    cs.CL cs.IR cs.LG

    Detecting ESG topics using domain-specific language models and data augmentation approaches

    Authors: Tim Nugent, Nicole Stelea, Jochen L. Leidner

    Abstract: Despite recent advances in deep learning-based language modelling, many natural language processing (NLP) tasks in the financial domain remain challenging due to the paucity of appropriately labelled data. Other issues that can limit task performance are differences in word distribution between the general corpora - typically used to pre-train language models - and financial corpora, which often e…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: 11 pages, 5 tables, 1 figure

    ACM Class: I.2.7

  9. arXiv:1904.06483  [pdf, other]

    cs.IR cs.LG

    Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling

    Authors: Daniel Pfeifer, Jochen L. Leidner

    Abstract: We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning of the training vocabulary in a stepwise manner such that resulting partitions represent topics. It is governed by a simple generative model, where the likelihood to generate the training documents via topics is optimized. The algorithm starts with o…

    Submitted 13 April, 2019; originally announced April 2019.

  10. arXiv:1807.00257  [pdf]

    cs.IR cs.CY

    Information Retrieval in the Cloud

    Authors: Jochen L. Leidner

    Abstract: There has been a recent trend to migrate IT infrastructure into the cloud. In this paper, we discuss the impact of this trend on searching for textual and other data, i.e. the distributed indexing and retrieval of information, from an organizational context. Keywords: information retrieval (IR); federated search; cloud search.

    Submitted 30 June, 2018; originally announced July 2018.

    Comments: 6 pages, 1 figure, 1 table

    ACM Class: H.3.0; C.2.4

  11. arXiv:0911.5438  [pdf]

    cs.DC

    Building and Installing a Hadoop/MapReduce Cluster from Commodity Components

    Authors: Jochen L. Leidner, Gary Berosik

    Abstract: This tutorial presents a recipe for the construction of a compute cluster for processing large volumes of data, using cheap, easily available personal computer hardware (Intel/AMD based PCs) and freely available open source software (Ubuntu Linux, Apache Hadoop).

    Submitted 28 November, 2009; originally announced November 2009.

    Comments: Technical Report; 15 pages, 1 figure

    ACM Class: C.1.4

  12. arXiv:cs/0207058   

    cs.CL cs.IR

    Question Answering over Unstructured Data without Domain Restrictions

    Authors: Jochen L. Leidner

    Abstract: Information needs are naturally represented as questions. Automatic Natural-Language Question Answering (NLQA) has only recently become a practical task on a larger scale and without domain constraints. This paper gives a brief introduction to the field, its history and the impact of systematic evaluation competitions. It is then demonstrated that an NLQA system for English can be built and…

    Submitted 18 July, 2002; v1 submitted 14 July, 2002; originally announced July 2002.

    Comments: 8 pages, 6 figures, 5 tables. To appear in Proc. TaCoS'02, Potsdam, Germany

    ACM Class: I.2.7; H.3.1