Showing 1–6 of 6 results for author: Gossen, G

  1. arXiv:2001.07524  [pdf, other]

    cs.LG cs.AI stat.ML

    Node Masking: Making Graph Neural Networks Generalize and Scale Better

    Authors: Pushkar Mishra, Aleksandra Piktus, Gerard Goossen, Fabrizio Silvestri

    Abstract: Graph Neural Networks (GNNs) have received a lot of interest in the recent times. From the early spectral architectures that could only operate on undirected graphs per a transductive learning paradigm to the current state of the art spatial ones that can apply inductively to arbitrary graphs, GNNs have seen significant contributions from the research community. In this paper, we utilize some theo…

    Submitted 16 May, 2021; v1 submitted 17 January, 2020; originally announced January 2020.

  2. arXiv:1707.09217  [pdf, ps, other]

    cs.DL cs.IR

    Extracting Event-Centric Document Collections from Large-Scale Web Archives

    Authors: Gerhard Gossen, Elena Demidova, Thomas Risse

    Abstract: Web archives are typically very broad in scope and extremely large in scale. This makes data analysis appear daunting, especially for non-computer scientists. These collections constitute an increasingly important source for researchers in the social sciences, the historical sciences and journalists interested in studying past events. However, there are currently no access methods that help users…

    Submitted 28 July, 2017; originally announced July 2017.

    Comments: To be published in the proceedings of the Conference on Theory and Practice of Digital Libraries (TPDL) 2017

  3. Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives

    Authors: Tarcisio Souza, Elena Demidova, Thomas Risse, Helge Holzmann, Gerhard Gossen, Julian Szymanski

    Abstract: Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is provid…

    Submitted 2 February, 2017; originally announced February 2017.

  4. iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling

    Authors: Gerhard Gossen, Elena Demidova, Thomas Risse

    Abstract: Researchers in the Digital Humanities and journalists need to monitor, collect and analyze fresh online content regarding current events such as the Ebola outbreak or the Ukraine crisis on demand. However, existing focused crawling approaches only consider topical aspects while ignoring temporal aspects and therefore cannot achieve thematically coherent and fresh Web collections. Especially Social…

    Submitted 19 December, 2016; originally announced December 2016.

    Comments: Published in the Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries 2015

    Journal ref: Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 75--84) (2015)

  5. The iCrawl Wizard -- Supporting Interactive Focused Crawl Specification

    Authors: Gerhard Gossen, Elena Demidova, Thomas Risse

    Abstract: Collections of Web documents about specific topics are needed for many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs). These are also used to describe the expected topic of the collection. The choice of seed URLs influences the quality of the resu…

    Submitted 19 December, 2016; originally announced December 2016.

    Comments: Published in the Proceedings of the European Conference on Information Retrieval (ECIR) 2015

  6. Analyzing Web Archives Through Topic and Event Focused Sub-collections

    Authors: Gerhard Gossen, Elena Demidova, Thomas Risse

    Abstract: Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodol…

    Submitted 16 December, 2016; originally announced December 2016.

    Comments: Published in the proceedings of the 8th ACM Conference on Web Science 2016

    Journal ref: Proceedings of the 8th ACM Conference on Web Science (pp. 291--295) (2016)