-
"Forgetting" in Machine Learning and Beyond: A Survey
Authors:
Alyssa Shuang Sha,
Bernardo Pereira Nunes,
Armin Haller
Abstract:
This survey investigates the multifaceted nature of forgetting in machine learning, drawing insights from neuroscientific research that posits forgetting as an adaptive function rather than a defect, enhancing the learning process and preventing overfitting. This survey focuses on the benefits of forgetting and its applications across various machine learning sub-fields that can help improve model performance and enhance data privacy. Moreover, the paper discusses current challenges, future directions, and ethical considerations regarding the integration of forgetting mechanisms into machine learning models.
Submitted 31 May, 2024;
originally announced May 2024.
-
Syntactic Complexity Identification, Measurement, and Reduction Through Controlled Syntactic Simplification
Authors:
Muhammad Salman,
Armin Haller,
Sergio J. Rodríguez Méndez
Abstract:
Text simplification is one of the domains in Natural Language Processing (NLP) that offers an opportunity to understand text in a simplified manner for exploration. However, it is always hard to understand and retrieve knowledge from unstructured text, which is usually in the form of compound and complex sentences. There are state-of-the-art neural network-based methods that simplify sentences for improved readability by replacing words with plain English substitutes and summarising sentences and paragraphs. In the Knowledge Graph (KG) creation process from unstructured text, summarising long sentences and substituting words is undesirable since this may lead to information loss: KG creation from text requires the extraction of all possible facts (triples) with the same mentions as in the text. In this work, we propose a controlled simplification based on the factual information in a sentence, i.e., a triple. We present a classical syntactic dependency-based approach to split and rephrase a compound and complex sentence into a set of simplified sentences. This simplification process retains the original wording with a simple structure of possible domain facts in each sentence, i.e., triples. The paper also introduces an algorithm to identify and measure a sentence's syntactic complexity (SC), followed by reduction through a controlled syntactic simplification process. Last, an experiment on dataset re-annotation is conducted through GPT3; we aim to publish this refined corpus as a resource. This work was accepted and presented at the International Workshop on Learning with Knowledge Graphs (IWLKG) at the WSDM-2023 Conference. The code and data are available at www.github.com/sallmanm/SynSim.
Submitted 16 April, 2023;
originally announced April 2023.
-
A System's Approach Taxonomy for User-Centred XAI: A Survey
Authors:
Ehsan Emamirad,
Pouya Ghiasnezhad Omran,
Armin Haller,
Shirley Gregor
Abstract:
Recent advancements in AI have coincided with ever-increasing efforts in the research community to investigate, classify and evaluate various methods aimed at making AI models explainable. However, most existing attempts present a method-centric view of eXplainable AI (XAI), which is typically meaningful only for domain experts. There is an apparent lack of a robust qualitative and quantitative performance framework that evaluates the suitability of explanations for different types of users. We survey relevant efforts and then propose a unified, inclusive and user-centred taxonomy for XAI based on the principles of General System's Theory, which serves as a basis for evaluating the appropriateness of XAI approaches for all user types, including both developers and end users.
Submitted 5 March, 2023;
originally announced March 2023.
-
TNNT: The Named Entity Recognition Toolkit
Authors:
Sandaru Seneviratne,
Sergio J. Rodríguez Méndez,
Xuecheng Zhang,
Pouya G. Omran,
Kerry Taylor,
Armin Haller
Abstract:
Extraction of categorised named entities from text is a complex task given the availability of a variety of Named Entity Recognition (NER) models and the unstructured information encoded in different source document formats. Processing the documents to extract text, identifying suitable NER models for a task, and obtaining statistical information are important in data analysis to make informed decisions. This paper presents TNNT, a toolkit that automates the extraction of categorised named entities from unstructured information encoded in source documents, using diverse state-of-the-art Natural Language Processing (NLP) tools and NER models. TNNT integrates 21 different NER models as part of a Knowledge Graph Construction Pipeline (KGCP) that takes a document set as input and processes it based on the defined settings, applying the selected blocks of NER models to output the results. The toolkit generates all results with an integrated summary of the extracted entities, enabling enhanced data analysis to support the KGCP and to aid further NLP tasks.
Submitted 31 August, 2021;
originally announced August 2021.
-
How question quality drives Web performance in community question answering sites
Authors:
Alyssa Shuang Sha,
Yingnan Shi,
Armin Haller
Abstract:
Users post millions of questions on community question answering (CQA) sites each day. The quality of those questions significantly affects the satisfaction of the sites' users and, therefore, the sites' traffic. We gathered 15 question-quality-related features from one of the largest CQA sites, together with the site's pageview data, to estimate the scale of the effect in the corresponding time series. Using Grey Relational Analysis, we rank those question-quality features and estimate the relative strength of these factors on a page's view numbers. Our results show that the features of question quality have a significant influence on web performance. We generate a ranked list of features and find that digital popularity and textual features can drive page traffic more than questioner-related features and question difficulty. The implications of the findings for Web growth and future research are discussed.
Submitted 22 December, 2020; v1 submitted 11 December, 2020;
originally announced December 2020.
-
SOSA: A Lightweight Ontology for Sensors, Observations, Samples, and Actuators
Authors:
Krzysztof Janowicz,
Armin Haller,
Simon J D Cox,
Danh Le Phuoc,
Maxime Lefrancois
Abstract:
The Sensor, Observation, Sample, and Actuator (SOSA) ontology provides a formal but lightweight general-purpose specification for modeling the interaction between the entities involved in the acts of observation, actuation, and sampling. SOSA is the result of rethinking the W3C-XG Semantic Sensor Network (SSN) ontology based on changes in scope and target audience, technical developments, and lessons learned over the past years. SOSA also acts as a replacement of SSN's Stimulus Sensor Observation (SSO) core. It has been developed by the first joint working group of the Open Geospatial Consortium (OGC) and the World Wide Web Consortium (W3C) on Spatial Data on the Web. In this work, we motivate the need for SOSA, provide an overview of the main classes and properties, and briefly discuss its integration with the new release of the SSN ontology as well as various other alignments to specifications such as OGC's Observations and Measurements (O&M), Dolce-Ultralite (DUL), and other prominent ontologies. We will also touch upon common modeling problems and application areas related to publishing and searching observation, sampling, and actuation data on the Web. The SOSA ontology and standard can be accessed at https://www.w3.org/TR/vocab-ssn/.
Submitted 25 December, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
An Ontology based System for Cloud Infrastructure Services Discovery
Authors:
Miranda Zhang,
Rajiv Ranjan,
Armin Haller,
Dimitrios Georgakopoulos,
Michael Menzel,
Surya Nepal
Abstract:
The Cloud infrastructure services landscape advances steadily, leaving users in the agony of choice. As a result, Cloud service identification and discovery remain a hard problem due to different service descriptions, non-standardised naming conventions and heterogeneous types and features of Cloud services. In this paper, we present an OWL-based ontology, the Cloud Computing Ontology (CoCoOn), that defines functional and non-functional concepts, attributes and relations of infrastructure services. We also present a system...
Submitted 1 December, 2012;
originally announced December 2012.
-
Investigating Decision Support Techniques for Automating Cloud Service Selection
Authors:
Miranda Zhang,
Rajiv Ranjan,
Armin Haller,
Dimitrios Georgakopoulos,
Peter Strazdins
Abstract:
The compass of Cloud infrastructure services advances steadily, leaving users in the agony of choice. To be able to select the best mix of service offerings from an abundance of possibilities, users must consider complex dependencies and heterogeneous sets of criteria. Therefore, we present a PhD thesis proposal on investigating an intelligent decision support system for selecting Cloud-based infrastructure services (e.g. storage, network, CPU).
Submitted 10 October, 2012;
originally announced October 2012.
-
A Declarative Recommender System for Cloud Infrastructure Services Selection
Authors:
Miranda Zhang,
Rajiv Ranjan,
Surya Nepal,
Michael Menzel,
Armin Haller
Abstract:
The cloud infrastructure services landscape advances steadily leaving users in the agony of choice...
Submitted 29 November, 2012; v1 submitted 7 October, 2012;
originally announced October 2012.