Showing 1–20 of 20 results for author: Fafalios, P

Searching in archive cs.
  1. arXiv:2302.02635  [pdf, other]

    cs.DL cs.DB

    FastCat Catalogues: Interactive Entity-based Exploratory Analysis of Archival Documents

    Authors: Georgios Rinakakis, Kostas Petrakis, Yannis Tzitzikas, Pavlos Fafalios

    Abstract: We describe FastCat Catalogues, a Web application that supports researchers studying archival material, such as historians, in exploring and quantitatively analysing the data (transcripts) of archival documents. The application was designed based on real information needs provided by a large group of researchers, makes use of JSON technology, and is configurable for use over any type of archival d…

    Submitted 20 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: This is a preprint of a paper accepted for publication at the 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL)

  2. A Workflow Model for Holistic Data Management and Semantic Interoperability in Quantitative Archival Research

    Authors: Pavlos Fafalios, Yannis Marketakis, Anastasia Axaridou, Yannis Tzitzikas, Martin Doerr

    Abstract: Archival research is a complicated task that involves several diverse activities for the extraction of evidence and knowledge from a set of archival documents. The involved activities are usually unconnected, in terms of data connection and flow, making difficult their recursive revision and execution, as well as the inspection of provenance information at data element level. This paper proposes a…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: This is a preprint of an article accepted for publication in Digital Scholarship in the Humanities (DSH), published by Oxford University Press

  3. The SeaLiT Ontology -- An Extension of CIDOC-CRM for the Modeling and Integration of Maritime History Information

    Authors: Pavlos Fafalios, Athina Kritsotaki, Martin Doerr

    Abstract: We describe the construction and use of the SeaLiT Ontology, an extension of the ISO standard CIDOC-CRM for the modelling and integration of maritime history information. The ontology has been developed gradually, following a bottom-up approach that required the analysis of large amounts of real primary data (archival material) as well as knowledge and validation by domain experts (maritime histor…

    Submitted 11 January, 2023; originally announced January 2023.

    Comments: This is a preprint of an article accepted for publication at the ACM Journal on Computing and Cultural Heritage (JOCCH)

  4. arXiv:2210.09100  [pdf, other]

    cs.DB

    Estimating the Cost of Executing Link Traversal based SPARQL Queries

    Authors: Antonis Sklavos, Pavlos Fafalios, Yannis Tzitzikas

    Abstract: An increasing number of organisations in almost all fields have started adopting semantic web technologies for publishing their data as open, linked and interoperable (RDF) datasets, queryable through the SPARQL language and protocol. Link traversal has emerged as a SPARQL query processing method that exploits the Linked Data principles and the dynamic nature of the Web to dynamically discover dat…

    Submitted 17 October, 2022; originally announced October 2022.

  5. Towards Semantic Interoperability in Historical Research: Documenting Research Data and Knowledge with Synthesis

    Authors: Pavlos Fafalios, Konstantina Konsolaki, Lida Charami, Kostas Petrakis, Manos Paterakis, Dimitris Angelakis, Yannis Tzitzikas, Chrysoula Bekiari, Martin Doerr

    Abstract: A vast area of research in historical science concerns the documentation and study of artefacts and related evidence. Current practice mostly uses spreadsheets or simple relational databases to organise the information as rows with multiple columns of related attributes. This form offers itself for data analysis and scholarly interpretation, however it also poses problems including i) the difficul…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: This is a preprint of an article accepted for publication at the 20th International Semantic Web Conference (ISWC 2021)

  6. arXiv:2105.13733  [pdf, other]

    cs.DL cs.DB

    FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities

    Authors: Pavlos Fafalios, Kostas Petrakis, Georgios Samaritakis, Korina Doerr, Athina Kritsotaki, Yannis Tzitzikas, Martin Doerr

    Abstract: Descriptive and empirical sciences, such as History, are the sciences that collect, observe and describe phenomena in order to explain them and draw interpretative conclusions about influences, driving forces and impacts under given circumstances. Spreadsheet software and relational database management systems are still the dominant tools for quantitative analysis and overall data management in th…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: This is a preprint of an article accepted for publication at the ACM Journal on Computing and Cultural Heritage (JOCCH)

  7. Better Together -- An Ensemble Learner for Combining the Results of Ready-made Entity Linking Systems

    Authors: Renato Stoffalette João, Pavlos Fafalios, Stefan Dietze

    Abstract: Entity linking (EL) is the task of automatically identifying entity mentions in text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. Throughout the past decade, a plethora of EL systems and pipelines have become available, where performance of individual systems varies heavily across corpora, languages or domains. Linking performance varies even between d…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

  8. Exploiting stance hierarchies for cost-sensitive stance detection of Web documents

    Authors: Arjun Roy, Pavlos Fafalios, Asif Ekbal, Xiaofei Zhu, Stefan Dietze

    Abstract: Fact checking is an essential challenge when combating fake news. Identifying documents that agree or disagree with a particular statement (claim) is a core task in this process. In this context, stance detection aims at identifying the position (stance) of a document towards a claim. Most approaches address this task through a 4-class classification model where the class distribution is highly im…

    Submitted 17 May, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: This is a pre-print version of the Journal paper published in J Intell Inf Syst (2021) (Springer). https://rdcu.be/ckLiC

  9. TweetsCOV19 -- A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic

    Authors: Dimitar Dimitrov, Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, Stefan Dietze

    Abstract: Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. With respect to the recent outbreak of the Coronavirus disease 2019 (COVID-19), online discourse on Twitter reflects public opinion and perception related to the pandemic itself as well as mitigati…

    Submitted 15 August, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

  10. arXiv:2001.09762  [pdf, other]

    cs.CY

    Bias in Data-driven AI Systems -- An Introductory Survey

    Authors: Eirini Ntoutsi, Pavlos Fafalios, Ujwal Gadiraju, Vasileios Iosifidis, Wolfgang Nejdl, Maria-Esther Vidal, Salvatore Ruggieri, Franco Turini, Symeon Papadopoulos, Emmanouil Krasanakis, Ioannis Kompatsiaris, Katharina Kinder-Kurlanda, Claudia Wagner, Fariba Karimi, Miriam Fernandez, Harith Alani, Bettina Berendt, Tina Kruegel, Christian Heinze, Klaus Broelemann, Gjergji Kasneci, Thanassis Tiropanis, Steffen Staab

    Abstract: AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their desig…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: 19 pages, 1 figure

  11. How Many and What Types of SPARQL Queries can be Answered through Zero-Knowledge Link Traversal?

    Authors: Pavlos Fafalios, Yannis Tzitzikas

    Abstract: The current de-facto way to query the Web of Data is through the SPARQL protocol, where a client sends queries to a server through a SPARQL endpoint. Contrary to an HTTP server, providing and maintaining a robust and reliable endpoint requires a significant effort that not all publishers are willing or able to make. An alternative query evaluation method is through link traversal, where a query is…

    Submitted 13 December, 2018; originally announced January 2019.

    Comments: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019)

  12. arXiv:1812.10387  [pdf, ps, other]

    cs.CL cs.IR cs.LG stat.ML

    Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty

    Authors: Renato Stoffalette João, Pavlos Fafalios, Stefan Dietze

    Abstract: Entity Linking (EL) is the task of automatically identifying entity mentions in a piece of text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. There is a large number of EL tools available for different types of documents and domains, yet EL remains a challenging task where the lack of precision on particularly ambiguous mentions often spoils the usefuln…

    Submitted 13 December, 2018; originally announced December 2018.

    Comments: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019)

  13. Towards a Ranking Model for Semantic Layers over Digital Archives

    Authors: Pavlos Fafalios, Vaibhav Kasturia, Wolfgang Nejdl

    Abstract: Archived collections of documents (like newspaper archives) serve as important information sources for historians, journalists, sociologists and other interested parties. Semantic Layers over such digital archives allow describing and publishing metadata and semantic information about the archived documents in a standard format (RDF), which in turn can be queried through a structured query languag…

    Submitted 23 October, 2018; originally announced October 2018.

  14. Ranking Archived Documents for Structured Queries on Semantic Layers

    Authors: Pavlos Fafalios, Vaibhav Kasturia, Wolfgang Nejdl

    Abstract: Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that…

    Submitted 23 October, 2018; originally announced October 2018.

  15. Viewpoint Discovery and Understanding in Social Networks

    Authors: Mainul Quraishi, Pavlos Fafalios, Eelco Herder

    Abstract: The Web has evolved to a dominant platform where everyone has the opportunity to express their opinions, to interact with other users, and to debate on emerging events happening around the world. On the one hand, this has enabled the presence of different viewpoints and opinions about a - usually controversial - topic (like Brexit), but at the same time, it has led to phenomena like media bias, ec…

    Submitted 23 October, 2018; originally announced October 2018.

  16. arXiv:1810.11017  [pdf, ps, other]

    cs.SI cs.DL cs.IR

    Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives

    Authors: Pavlos Fafalios, Vasileios Iosifidis, Kostas Stefanidis, Eirini Ntoutsi

    Abstract: How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In par…

    Submitted 24 October, 2018; originally announced October 2018.

    Comments: This is a preprint of an article accepted for publication in the International Journal on Digital Libraries (2018)

  17. Building and Querying Semantic Layers for Web Archives (Extended Version)

    Authors: Pavlos Fafalios, Helge Holzmann, Vaibhav Kasturia, Wolfgang Nejdl

    Abstract: Web archiving is the process of collecting portions of the Web to ensure that the information is preserved for future exploitation. However, despite the increasing number of web archives worldwide, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into a usable and useful information source. In this paper, we focus on this problem a…

    Submitted 24 October, 2018; originally announced October 2018.

    Comments: This is a preprint of an article accepted for publication in the International Journal on Digital Libraries (2018)

    Journal ref: International Journal on Digital Libraries, ISSN: 1432-5012 (Print) 1432-1300 (Online), 2018

  18. TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets

    Authors: Pavlos Fafalios, Vasileios Iosifidis, Eirini Ntoutsi, Stefan Dietze

    Abstract: Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, span…

    Submitted 23 October, 2018; originally announced October 2018.

  19. arXiv:1810.10004  [pdf, ps, other]

    cs.IR cs.LG stat.ML

    Time-Aware and Corpus-Specific Entity Relatedness

    Authors: Nilamadhaba Mohapatra, Vasileios Iosifidis, Asif Ekbal, Stefan Dietze, Pavlos Fafalios

    Abstract: Entity relatedness has emerged as an important feature in a plethora of applications such as information retrieval, entity recommendation and entity linking. Given an entity, for instance a person or an organization, entity relatedness measures can be exploited for generating a list of highly-related entities. However, the relation of an entity to some other entity depends on several factors, with…

    Submitted 23 October, 2018; originally announced October 2018.

  20. arXiv:1810.09780  [pdf, ps, other]

    cs.DB

    Heuristics-based Query Reordering for Federated Queries in SPARQL 1.1 and SPARQL-LD

    Authors: Thanos Yannakis, Pavlos Fafalios, Yannis Tzitzikas

    Abstract: The federated query extension of SPARQL 1.1 allows executing queries distributed over different SPARQL endpoints. SPARQL-LD is a recent extension of SPARQL 1.1 which enables to directly query any HTTP web source containing RDF data, like web pages embedded with RDFa, JSON-LD or Microformats, without requiring the declaration of named graphs. This makes possible to query a large number of data sour…

    Submitted 23 October, 2018; originally announced October 2018.