-
Analysis of the Usability of Automatically Enriched Cultural Heritage Data
Authors:
Julien Antoine Raemy,
Robert Sanderson
Abstract:
This chapter presents the potential of interoperability and standardised data publication for cultural heritage resources, with a focus on community-driven approaches and web standards for usability. The Linked Open Usable Data (LOUD) design principles, which rely on JSON-LD as lingua franca, serve as the foundation.
We begin by exploring the significant advances made by the International Image Interoperability Framework (IIIF) in promoting interoperability for image-based resources. The principles and practices of IIIF have paved the way for Linked Art, which expands the use of linked data by demonstrating how it can easily facilitate the integration and sharing of semantic cultural heritage data across portals and institutions.
To provide a practical demonstration of the concepts discussed, the chapter highlights the implementation of LUX, the Yale Collections Discovery platform. LUX serves as a compelling case study for the use of linked data at scale, demonstrating the real-world application of automated enrichment in the cultural heritage domain.
Rooted in empirical study, the analysis presented in this chapter delves into the broader context of community practices and semantic interoperability. By examining the collaborative efforts and integration of diverse cultural heritage resources, the research sheds light on the potential benefits and challenges associated with LOUD.
Submitted 28 September, 2023;
originally announced September 2023.
-
Real-Time Notification for Resource Synchronization
Authors:
Martin Klein,
Robert Sanderson,
Herbert Van de Sompel,
Michael L. Nelson
Abstract:
Web applications frequently leverage resources made available by remote web servers. As resources are created, updated, deleted, or moved, these applications face challenges to remain in lockstep with the server's change dynamics. Several approaches exist to help meet this challenge for use cases where "good enough" synchronization is acceptable. But when strict resource coverage or low synchronization latency is required, commonly accepted Web-based solutions remain elusive. This paper details characteristics of an approach that aims at decreasing synchronization latency while maintaining desired levels of accuracy. The approach builds on pushing change notifications and pulling changed resources and it is explored with an experiment based on a DBpedia Live instance.
Submitted 11 February, 2014;
originally announced February 2014.
-
Web Synchronization Simulations using the ResourceSync Framework
Authors:
Bernhard Haslhofer,
Simeon Warner,
Carl Lagoze,
Martin Klein,
Robert Sanderson,
Herbert van de Sompel,
Michael L. Nelson
Abstract:
Maintenance of multiple, distributed up-to-date copies of collections of changing Web resources is important in many application contexts and is often achieved using ad hoc or proprietary synchronization solutions. ResourceSync is a resource synchronization framework that integrates with the Web architecture and leverages XML sitemaps. We define a model for the ResourceSync framework as a basis for understanding its properties. We then describe experiments in which simulations of a variety of synchronization scenarios illustrate the effects of model configuration on consistency, latency, and data transfer efficiency. These results provide insight into which configurations are appropriate for various application scenarios.
Submitted 5 June, 2013;
originally announced June 2013.
-
ResourceSync: Leveraging Sitemaps for Resource Synchronization
Authors:
Bernhard Haslhofer,
Simeon Warner,
Carl Lagoze,
Martin Klein,
Robert Sanderson,
Michael L. Nelson,
Herbert van de Sompel
Abstract:
Many applications need up-to-date copies of collections of changing Web resources. Such synchronization is currently achieved using ad-hoc or proprietary solutions. We propose ResourceSync, a general Web resource synchronization protocol that leverages XML Sitemaps. It provides a set of capabilities that can be combined in a modular manner to meet local or community requirements. We report on work to implement this protocol for arXiv.org and also provide an experimental prototype for the English Wikipedia as well as a client API.
Submitted 7 May, 2013;
originally announced May 2013.
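The sitemap-based idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain XML Sitemap with `lastmod` timestamps in one consistent ISO 8601 UTC format; the function name `changed_since`, the sample document, and the `example.org` URLs are hypothetical, and the ResourceSync-specific extensions are not modelled here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(sitemap_xml, since):
    """Return the URLs whose <lastmod> is later than `since`.

    Assumption: all timestamps share one ISO 8601 UTC format, so a plain
    string comparison orders them correctly.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        if loc and lastmod > since:
            changed.append(loc)
    return changed


# Hypothetical sitemap listing two resources.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/a</loc><lastmod>2013-05-01T00:00:00Z</lastmod></url>
  <url><loc>http://example.org/b</loc><lastmod>2013-05-07T09:30:00Z</lastmod></url>
</urlset>"""

result = changed_since(sample, "2013-05-02T00:00:00Z")
print(result)
```

A client that remembers the time of its last synchronization could pass that time as `since` and fetch only the URLs returned.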
-
Designing the W3C Open Annotation Data Model
Authors:
Robert Sanderson,
Paolo Ciccarese,
Herbert Van de Sompel
Abstract:
The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource. This paper presents the W3C Open Annotation Community Group specification and the rationale behind the scoping and technical decisions that were made. It also motivates interoperable Annotations via use cases, and provides a brief analysis of the advantages over previous specifications.
Submitted 24 April, 2013;
originally announced April 2013.
-
Open Annotations on Multimedia Web Resources
Authors:
Bernhard Haslhofer,
Robert Sanderson,
Rainer Simon,
Herbert van de Sompel
Abstract:
Many Web portals allow users to associate additional information with existing multimedia resources such as images, audio, and video. However, these portals are usually closed systems and user-generated annotations are almost always kept locked up and remain inaccessible to the Web of Data. We believe that an important step to take is the integration of multimedia annotations and the Linked Data principles. We present the current state of the Open Annotation Model, explain our design rationale, and describe how the model can represent user annotations on multimedia Web resources. Applying this model in Web portals and devices that support user annotations should allow clients to easily publish and consume, and thus exchange, annotations on multimedia Web resources via common Web standards.
Submitted 28 February, 2012;
originally announced February 2012.
-
Evaluating the SharedCanvas Manuscript Data Model in CATCHPlus
Authors:
Robert Sanderson,
Hennie Brugman,
Benjamin Albritton,
Herbert Van de Sompel
Abstract:
In this paper, we present the SharedCanvas model for describing the layout of culturally important, hand-written objects such as medieval manuscripts, which is intended to be used as a common input format to presentation interfaces. The model is evaluated using two collections from CATCHPlus not consulted during the design phase, each with their own complex requirements, in order to determine if further development is required or if the model is ready for general usage. The model is applied to the new collections, revealing several new areas of concern for user interface production and discovery of the constituent resources. However, the fundamental information modelling aspects of SharedCanvas and the underlying Open Annotation Collaboration ontology are demonstrated to be sufficient to cover the challenging new requirements. The distributed, Linked Open Data approach is validated as an important methodology to seamlessly allow simultaneous interaction with multiple repositories, and at the same time to facilitate both scholarly commentary and crowd-sourcing of the production of transcriptions.
Submitted 17 October, 2011;
originally announced October 2011.
-
The Open Annotation Collaboration (OAC) Model
Authors:
Bernhard Haslhofer,
Rainer Simon,
Robert Sanderson,
Herbert van de Sompel
Abstract:
Annotations allow users to associate additional information with existing resources. Using proprietary and closed systems on the Web, users are already able to annotate multimedia resources such as images, audio and video. So far, however, this information is almost always kept locked up and inaccessible to the Web of Data. We believe that an important step to take is the integration of multimedia annotations and the Linked Data principles. This should allow clients to easily publish and consume, and thus exchange, annotations about resources via common Web standards. We first present the current status of the Open Annotation Collaboration, an international initiative that is currently working on annotation interoperability specifications based on best practices from the Linked Data effort. Then we present two use cases and early prototypes that make use of the proposed annotation model, present lessons learned, and discuss technical issues that remain open.
Submitted 25 June, 2011;
originally announced June 2011.
-
Analyzing the Persistence of Referenced Web Resources with Memento
Authors:
Robert Sanderson,
Mark Phillips,
Herbert Van de Sompel
Abstract:
In this paper we present the results of a study into the persistence and availability of web resources referenced from papers in scholarly repositories. Two repositories with different characteristics, arXiv and the UNT digital library, are studied to determine if the nature of the repository, or of its content, has a bearing on the availability of the web resources cited by that content. Memento makes it possible to automate discovery of archived resources and to consider the time between the publication of the research and the archiving of the referenced URLs. This automation allows us to process more than 160000 URLs, the largest known such study, and the repository metadata allows consideration of the results by discipline. The results are startling: 45% (66096) of the URLs referenced from arXiv still exist, but are not preserved for future generations, and 28% of resources referenced by UNT papers have been lost. Moving forwards, we provide some initial recommendations, including that repositories should publish URL lists extracted from papers that could be used as seeds for web archiving systems.
Submitted 17 May, 2011;
originally announced May 2011.
-
SharedCanvas: A Collaborative Model for Medieval Manuscript Layout Dissemination
Authors:
Robert Sanderson,
Benjamin Albritton,
Rafael Schwemmer,
Herbert Van de Sompel
Abstract:
In this paper we present a model based on the principles of Linked Data that can be used to describe the interrelationships of images, texts and other resources to facilitate the interoperability of repositories of medieval manuscripts or other culturally important handwritten documents. The model is designed from a set of requirements derived from the real-world use cases of some of the largest digitized medieval content holders, and instantiations of the model are intended as the input to collection-independent page turning and scholarly presentation interfaces. A canvas painting paradigm, such as in PDF and SVG, was selected based on the lack of a one-to-one correlation between image and page, and to fulfill complex requirements such as when the full text of a page is known, but only fragments of the physical object remain. The model is implemented using technologies such as OAI-ORE Aggregations and OAC Annotations, as the fundamental building blocks of emerging Linked Digital Libraries. The model and implementation are evaluated through prototypes of both content providing and consuming applications. Although the system was designed from requirements drawn from the medieval manuscript domain, it is applicable to any layout-oriented presentation of images of text.
Submitted 14 April, 2011;
originally announced April 2011.
-
An HTTP-Based Versioning Mechanism for Linked Data
Authors:
Herbert Van de Sompel,
Robert Sanderson,
Michael L. Nelson,
Lyudmila L. Balakireva,
Harihar Shankar,
Scott Ainsworth
Abstract:
Dereferencing a URI returns a representation of the current state of the resource identified by that URI. But on the Web, representations of prior states of a resource are also available, for example, as resource versions in Content Management Systems or archival resources in Web Archives such as the Internet Archive. This paper introduces a resource versioning mechanism that is fully based on HTTP and uses datetime as a global version indicator. The approach allows "follow your nose" style navigation both from the current time-generic resource to associated time-specific version resources as well as among version resources. The proposed versioning mechanism is congruent with the Architecture of the World Wide Web, and is based on the Memento framework that extends HTTP with transparent content negotiation in the datetime dimension. The paper shows how the versioning approach applies to Linked Data, and by means of a demonstrator built for DBpedia, it also illustrates how it can be used to conduct a time-series analysis across versions of Linked Data descriptions.
Submitted 18 March, 2010;
originally announced March 2010.
-
Making Web Annotations Persistent over Time
Authors:
Robert Sanderson,
Herbert Van de Sompel
Abstract:
As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
Submitted 19 March, 2010; v1 submitted 12 March, 2010;
originally announced March 2010.
-
Memento: Time Travel for the Web
Authors:
Herbert Van de Sompel,
Michael L. Nelson,
Robert Sanderson,
Lyudmila L. Balakireva,
Scott Ainsworth,
Harihar Shankar
Abstract:
The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actual representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are protocol-wise disconnected from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in the most common Web protocol, HTTP, prevents getting to an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or searching archives for the original URI. This paper proposes the protocol-based Memento solution to address this problem, and describes a proof-of-concept experiment that includes major servers of archival content, including Wikipedia and the Internet Archive. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is a framework in which archived resources can seamlessly be reached via the URI of their original: protocol-based time travel for the Web.
Submitted 6 November, 2009; v1 submitted 5 November, 2009;
originally announced November 2009.
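As a rough illustration of the datetime negotiation described in the abstract above, the sketch below builds the request header a client might send to ask for a prior state of a resource. The header name `Accept-Datetime` is taken from the later Memento specification (RFC 7089), not from this abstract; the function name and the example date are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Build request headers asking for a resource's state near `when`.

    `when` should be a timezone-aware datetime; it is converted to UTC
    because HTTP dates are expressed in GMT.
    """
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


# Hypothetical request time: noon UTC on 5 November 2009.
headers = accept_datetime_header(datetime(2009, 11, 5, 12, 0, tzinfo=timezone.utc))
print(headers)
```

A client would send these headers with an ordinary HTTP GET; which archived version comes back is decided by the server and is outside the scope of this sketch.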
-
Adding eScience Assets to the Data Web
Authors:
Herbert Van de Sompel,
Carl Lagoze,
Michael L. Nelson,
Simeon Warner,
Robert Sanderson,
Pete Johnston
Abstract:
Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and network-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) project. The OAI-ORE specifications are based on the principles of the Architecture of the World Wide Web, the Semantic Web, and the Linked Data effort. Therefore, their incorporation into the cyberinfrastructure that supports eScholarship will ensure the integration of the products of scholarly research into the Data Web.
Submitted 11 June, 2009;
originally announced June 2009.
-
A Web-Based Resource Model for eScience: Object Reuse & Exchange
Authors:
Carl Lagoze,
Herbert Van de Sompel,
Michael Nelson,
Simeon Warner,
Robert Sanderson,
Pete Johnston
Abstract:
Work in the Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) focuses on an important aspect of infrastructure for eScience: the specification of the data model and a suite of implementation standards to identify and describe compound objects. These are objects that aggregate multiple sources of content including text, images, data, visualization tools, and the like. These aggregations are an essential product of eScience, and will become increasingly common in the age of data-driven scholarship. The OAI-ORE specifications conform to the core concepts of the Web architecture and the semantic Web, ensuring that applications that use them will integrate well into the general Web environment.
Submitted 4 November, 2008;
originally announced November 2008.
-
Object Re-Use & Exchange: A Resource-Centric Approach
Authors:
Carl Lagoze,
Herbert Van de Sompel,
Michael L. Nelson,
Simeon Warner,
Robert Sanderson,
Pete Johnston
Abstract:
The OAI Object Reuse and Exchange (OAI-ORE) framework recasts the repository-centric notion of digital object to a bounded aggregation of Web resources. In this manner, digital library content is more integrated with the Web architecture, and thereby more accessible to Web applications and clients. This generalized notion of an aggregation that is independent of repository containment conforms more closely with notions in eScience and eScholarship, where content is distributed across multiple services and databases. We provide a motivation for the OAI-ORE project, review previous interoperability efforts, describe draft ORE specifications and report on promising results from early experimentation that illustrate improved interoperability and reuse of digital objects.
Submitted 14 April, 2008;
originally announced April 2008.