-
Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections
Authors:
Orfeas Menis Mastromichalakis,
Jason Liartis,
Kristina Rose,
Antoine Isaac,
Giorgos Stamou
Abstract:
Cultural Heritage (CH) data hold invaluable knowledge, reflecting the history, traditions, and identities of societies, and shaping our understanding of the past and present. However, many CH collections contain outdated or offensive descriptions that reflect historical biases. CH Institutions (CHIs) face significant challenges in curating these data due to the vast scale and complexity of the task. To address this, we develop an AI-powered tool that detects offensive terms in CH metadata and provides contextual insights into their historical background and contemporary perception. We leverage a multilingual vocabulary co-created with marginalized communities, researchers, and CH professionals, along with traditional NLP techniques and Large Language Models (LLMs). Available as a standalone web app and integrated with major CH platforms, the tool has processed over 7.9 million records, contextualizing the contentious terms detected in their metadata. Rather than erasing these terms, our approach seeks to inform, making biases visible and providing actionable insights for creating more inclusive and accessible CH collections.
Submitted 30 May, 2025;
originally announced May 2025.
-
Toward using GANs in astrophysical Monte-Carlo simulations
Authors:
Ahab Isaac,
Wesley Armour,
Karel Adámek
Abstract:
Accurate modelling of spectra produced by X-ray sources requires the use of Monte-Carlo simulations. These simulations need to evaluate physical processes, such as those occurring in accretion processes around compact objects by sampling a number of different probability distributions. This is computationally time-consuming and could be sped up if replaced by neural networks. We demonstrate, on an example of the Maxwell-Jüttner distribution that describes the speed of relativistic electrons, that the generative adversarial network (GAN) is capable of statistically replicating the distribution. The average value of the Kolmogorov-Smirnov test is 0.5 for samples generated by the neural network, showing that the generated distribution cannot be distinguished from the true distribution.
Submitted 16 February, 2024;
originally announced February 2024.
-
Design and Performance of a Novel Low Energy Multi-Species Beamline for the ALPHA Antihydrogen Experiment
Authors:
C. J. Baker,
W. Bertsche,
A. Capra,
C. L. Cesar,
M. Charlton,
A. J. Christensen,
R. Collister,
A. Cridland Mathad,
S. Eriksson,
A. Evans,
N. Evetts,
S. Fabbri,
J. Fajans,
T. Friesen,
M. C. Fujiwara,
D. R. Gill,
P. Grandemange,
P. Granum,
J. S. Hangst,
M. E. Hayden,
D. Hodgkinson,
C. A. Isaac,
M. A. Johnson,
J. M. Jones,
S. A. Jones
, et al. (25 additional authors not shown)
Abstract:
The ALPHA Collaboration, based at the CERN Antiproton Decelerator, has recently implemented a novel beamline for low-energy ($\lesssim$ 100 eV) positron and antiproton transport between cylindrical Penning traps that have strong axial magnetic fields. Here, we describe how a combination of semianalytical and numerical calculations were used to optimise the layout and design of this beamline. Using experimental measurements taken during the initial commissioning of the instrument, we evaluate its performance and validate the models used for its development. By combining data from a range of sources, we show that the beamline has a high transfer efficiency, and estimate that the percentage of particles captured in the experiments from each bunch is (78 $\pm$ 3)% for up to $10^{5}$ antiprotons, and (71 $\pm$ 5)% for bunches of up to $10^{7}$ positrons.
Submitted 17 November, 2022;
originally announced November 2022.
-
ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer
Authors:
Yue Ju,
Alka Isac,
Yimin Nie
Abstract:
The analysis of long sequence data remains challenging in many real-world applications. We propose a novel architecture, ChunkFormer, that improves the existing Transformer framework to handle the challenges while dealing with long time series. Original Transformer-based models adopt an attention mechanism to discover global information along a sequence to leverage the contextual data. Long sequential data traps local information such as seasonality and fluctuations in short data sequences. In addition, the original Transformer consumes more resources by carrying the entire attention matrix during the training course. To overcome these challenges, ChunkFormer splits the long sequences into smaller sequence chunks for the attention calculation, progressively applying different chunk sizes in each stage. In this way, the proposed model gradually learns both local and global information without changing the total length of the input sequences. We have extensively tested the effectiveness of this new architecture on different business domains and have proved the advantage of such a model over the existing Transformer-based models.
Submitted 30 December, 2021;
originally announced December 2021.
-
Limit on the Electric Charge of Antihydrogen
Authors:
A. Capra,
C. Amole,
M. D. Ashkezari,
M. Baquero-Ruiz,
W. Bertsche,
E. Butler,
C. L. Cesar,
M. Charlton,
S. Eriksson,
J. Fajans,
T. Friesen,
M. C. Fujiwara,
D. R. Gill,
A. Gutierrez,
J. S. Hangst,
W. N. Hardy,
M. E. Hayden,
C. A. Isaac,
S. Jonsell,
L. Kurchaninov,
A. Little,
J. T. K. McKenna,
S. Menary,
S. C. Napoli,
P. Nolan
, et al. (15 additional authors not shown)
Abstract:
The ALPHA collaboration has successfully demonstrated the production and the confinement of cold antihydrogen, $\overline{\mathrm{H}}$. An analysis of trapping data allowed a stringent limit to be placed on the electric charge of the simplest antiatom. Charge neutrality of matter is known to a very high precision, hence a neutrality limit of $\overline{\mathrm{H}}$ provides a test of CPT invariance. The experimental technique is based on the measurement of the deflection of putatively charged $\overline{\mathrm{H}}$ in an electric field. The tendency for trapped $\overline{\mathrm{H}}$ atoms to be displaced by electrostatic fields is measured and compared to the results of a detailed simulation of $\overline{\mathrm{H}}$ dynamics in the trap. An extensive survey of the systematic errors is performed, with particular attention to those due to the silicon vertex detector, which is the device used to determine the $\overline{\mathrm{H}}$ annihilation position. The limit obtained on the charge of the $\overline{\mathrm{H}}$ atom is $Q = (-1.3\pm1.8\pm0.4)\times10^{-8}$, representing the first precision measurement with $\overline{\mathrm{H}}$.
Submitted 16 July, 2021;
originally announced July 2021.
-
Permutohedral Attention Module for Efficient Non-Local Neural Networks
Authors:
Samuel Joutard,
Reuben Dorent,
Amanda Isaac,
Sebastien Ourselin,
Tom Vercauteren,
Marc Modat
Abstract:
Medical image processing tasks such as segmentation often require capturing non-local information. As organs, bones, and tissues share common characteristics such as intensity, shape, and texture, the contextual information plays a critical role in correctly labeling them. Segmentation and labeling is now typically done with convolutional neural networks (CNNs) but the context of the CNN is limited by the receptive field which itself is limited by memory requirements and other properties. In this paper, we propose a new attention module, that we call Permutohedral Attention Module (PAM), to efficiently capture non-local characteristics of the image. The proposed method is both memory and computationally efficient. We provide a GPU implementation of this module suitable for 3D medical imaging problems. We demonstrate the efficiency and scalability of our module with the challenging task of vertebrae segmentation and labeling where context plays a crucial role because of the very similar appearance of different vertebrae.
Submitted 27 August, 2019; v1 submitted 1 July, 2019;
originally announced July 2019.
-
Knowledge Graphs in the Libraries and Digital Humanities Domain
Authors:
Bernhard Haslhofer,
Antoine Isaac,
Rainer Simon
Abstract:
Knowledge graphs represent concepts (e.g., people, places, events) and their semantic relationships. As a data structure, they underpin a digital information system, support users in resource discovery and retrieval, and are useful for navigation and visualization purposes. Within the libraries and humanities domain, knowledge graphs are typically rooted in knowledge organization systems, which have a century-old tradition and have undergone their digital transformation with the advent of the Web and Linked Data. Being exposed to the Web, metadata and concept definitions are now forming an interconnected and decentralized global knowledge network that can be curated and enriched by community-driven editorial processes. In the future, knowledge graphs could be vehicles for formalizing and connecting findings and insights derived from the analysis of possibly large-scale corpora in the libraries and digital humanities domain.
Submitted 8 March, 2018;
originally announced March 2018.
-
Rightsstatements.org White Paper: Requirements for the Technical Infrastructure for Standardized International Rights Statements
Authors:
Sascha Adler,
Plaban Kumar Bhowmik,
Valentine Charles,
Esmé Cowles,
Karen Estlund,
Antoine Isaac,
Tom Johnson,
M. A. Matienzo,
Patrick Peiffer,
Mark Raadgever,
Richard J. Urban,
Maarten Zeinstra
Abstract:
This document is part of the deliverables created by the RightsStatements.org consortium. It provides the technical requirements for implementation of the Standardized International Rights Statements. These requirements are based on the principles and specifications found in the normative Recommendations for Standardized International Rights Statements. This document replaces and supersedes the previously released Recommendations for the Technical Infrastructure for Standardized Rights Statements, released by this working group. The Requirements for the Technical Infrastructure for Standardized International Rights Statements describes the expected behaviours for a service that enables the delivery of human and machine-readable representations of the rights statements. It documents the fundamental decisions that informed the development of a data model grounded in Linked Data approaches. This document also provides proposed implementation guidelines and a non-normative set of examples for incorporating rights statements into provider metadata.
Submitted 24 August, 2022; v1 submitted 1 December, 2015;
originally announced July 2016.
-
Recommendations for the Technical Infrastructure for Standardized International Rights Statements
Authors:
Valentine Charles,
Esmé Cowles,
Karen Estlund,
Antoine Isaac,
Tom Johnson,
M. A. Matienzo,
Patrick Peiffer,
Richard J. Urban,
Maarten Zeinstra
Abstract:
This white paper is the product of a joint Digital Public Library of America (DPLA)-Europeana working group organized to develop minimum rights statement metadata standards for organizations that contribute to DPLA and Europeana. This white paper deals specifically with the technical infrastructure of a common namespace (rightsstatements.org) that hosts the rights statements to be used by (at minimum) the DPLA and Europeana. These recommendations for a common technical infrastructure for rights statements outline a simple, flexible, and extensible framework to host the rights statements at rightsstatements.org. This white paper specifically outlines the management of rights statements as linked open data. The rights statements are published according to Best Practices for Publishing RDF Vocabularies. They are encoded into dereferenceable URIs, express further information encoded in RDF, and link to existing vocabularies and standards. The rights statements adhere to expressions of existing rights vocabularies. Furthermore the paper reviews the publication and implementation to make the rights statements available through human-readable web pages augmented with machine-readable formats.
Submitted 24 August, 2022; v1 submitted 1 December, 2015;
originally announced December 2015.
-
In situ electromagnetic field diagnostics with an electron plasma in a Penning-Malmberg trap
Authors:
C. Amole,
M. D. Ashkezari,
M. Baquero-Ruiz,
W. Bertsche,
E. Butler,
A. Capra,
C. L. Cesar,
M. Charlton,
A. Deller,
N. Evetts,
S. Eriksson,
J. Fajans,
T. Friesen,
M. C. Fujiwara,
D. R. Gill,
A. Gutierrez,
J. S. Hangst,
W. N. Hardy,
M. E. Hayden,
C. A. Isaac,
S. Jonsell,
L. Kurchaninov,
A. Little,
N. Madsen,
J. T. K. McKenna
, et al. (15 additional authors not shown)
Abstract:
We demonstrate a novel detection method for the cyclotron resonance frequency of an electron plasma in a Penning-Malmberg trap. With this technique, the electron plasma is used as an in situ diagnostic tool for measurement of the static magnetic field and the microwave electric field in the trap. The cyclotron motion of the electron plasma is excited by microwave radiation and the temperature change of the plasma is measured non-destructively by monitoring the plasma's quadrupole mode frequency. The spatially-resolved microwave electric field strength can be inferred from the plasma temperature change and the magnetic field is found through the cyclotron resonance frequency. These measurements were used extensively in the recently reported demonstration of resonant quantum interactions with antihydrogen.
Submitted 4 May, 2014;
originally announced May 2014.
-
Achieving interoperability between the CARARE schema for monuments and sites and the Europeana Data Model
Authors:
Valentine Charles,
Antoine Isaac,
Kate Fernie,
Costis Dallas,
Dimitris Gavrilis,
Stavros Angelis
Abstract:
Mapping between different data models in a data aggregation context always presents significant interoperability challenges. In this paper, we describe the challenges faced and solutions developed when mapping the CARARE schema designed for archaeological and architectural monuments and sites to the Europeana Data Model (EDM), a model based on Linked Data principles, for the purpose of integrating more than two million metadata records from national monument collections and databases across Europe into the Europeana digital library.
Submitted 27 December, 2013; v1 submitted 12 June, 2013;
originally announced June 2013.
-
Hierarchical structuring of Cultural Heritage objects within large aggregations
Authors:
Shenghui Wang,
Antoine Isaac,
Valentine Charles,
Rob Koopman,
Anthi Agoropoulou,
Titia van der Werf
Abstract:
Huge amounts of cultural content have been digitised and are available through digital libraries and aggregators like Europeana.eu. However, it is not easy for a user to have an overall picture of what is available nor to find related objects. We propose a method for hierarchically structuring cultural objects at different similarity levels. We describe a fast, scalable clustering algorithm with an automated field selection method for finding semantic clusters. We report a qualitative evaluation on the cluster categories based on records from the UK and a quantitative one on the results from the complete Europeana dataset.
Submitted 27 December, 2013; v1 submitted 12 June, 2013;
originally announced June 2013.
-
Key Choices in the Design of Simple Knowledge Organization System (SKOS)
Authors:
Thomas Baker,
Sean Bechhofer,
Antoine Isaac,
Alistair Miles,
Guus Schreiber,
Ed Summers
Abstract:
Simple Knowledge Organization System (SKOS) provides a data model and vocabulary for expressing Knowledge Organization Systems (KOSs) such as thesauri and classification schemes in Semantic Web applications. This paper presents the main components of SKOS and their formal expression in Web Ontology Language (OWL), providing an extensive account of the design decisions taken by the Semantic Web Deployment (SWD) Working Group of the World Wide Web Consortium (W3C), which between 2006 and 2009 brought SKOS to the status of W3C Recommendation. The paper explains key design principles such as "minimal ontological commitment" and systematically cites the requirements and issues that influenced the design of SKOS components.
By reconstructing the discussion around alternative features and design options and presenting the rationale for design decisions, the paper aims at providing insight into how SKOS turned out as it did, and why. Assuming that SKOS, like any other successful technology, may eventually be subject to revision and improvement, the critical account offered here may help future editors approach such a task with deeper understanding.
Submitted 5 February, 2013;
originally announced February 2013.
-
Finding Quality Issues in SKOS Vocabularies
Authors:
Christian Mader,
Bernhard Haslhofer,
Antoine Isaac
Abstract:
The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.
Submitted 6 June, 2012;
originally announced June 2012.
-
Informatics Issues Used in the Production Dashboard
Authors:
Alin Isac,
Claudia Isac
Abstract:
The aim of this paper is to present some computing aspects of implementing and using a dashboard in production activity. The paper begins with a theoretical presentation of the managerial perspective on the need for a dashboard. The second part presents the main functions of the dashboard in production activity and the way it is used.
Submitted 29 May, 2009;
originally announced May 2009.
-
LCSH, SKOS and Linked Data
Authors:
Ed Summers,
Antoine Isaac,
Clay Redding,
Dan Krech
Abstract:
A technique for converting Library of Congress Subject Headings MARCXML to Simple Knowledge Organization System (SKOS) RDF is described. Strengths of the SKOS vocabulary are highlighted, as well as possible points for extension, and the integration of other semantic web vocabularies such as Dublin Core. An application for making the vocabulary available as linked-data on the Web is also described.
Submitted 3 July, 2008; v1 submitted 19 May, 2008;
originally announced May 2008.