-
Tiny Transformers for Environmental Sound Classification at the Edge
Authors:
David Elliott,
Carlos E. Otero,
Steven Wyatt,
Evan Martino
Abstract:
With the growth of the Internet of Things and the rise of Big Data, data processing and machine learning applications are being moved to cheap and low size, weight, and power (SWaP) devices at the edge, often in the form of mobile phones, embedded systems, or microcontrollers. The field of Cyber-Physical Measurements and Signature Intelligence (MASINT) makes use of these devices to analyze and exploit data in ways not otherwise possible, which results in increased data quality, increased security, and decreased bandwidth. However, methods to train and deploy models at the edge are limited, and models with sufficient accuracy are often too large for the edge device. Therefore, there is a clear need for techniques to create efficient AI/ML at the edge. This work presents training techniques for audio models in the field of environmental sound classification at the edge. Specifically, we design and train Transformers to classify office sounds in audio clips. Results show that a BERT-based Transformer, trained on Mel spectrograms, can outperform a CNN using 99.85% fewer parameters. To achieve this result, we first tested several audio feature extraction techniques designed for Transformers, using ESC-50 for evaluation, along with various augmentations. Our final model outperforms the state-of-the-art MFCC-based CNN on the office sounds dataset, using just over 6,000 parameters -- small enough to run on a microcontroller.
Submitted 22 March, 2021;
originally announced March 2021.
-
Mining Functionally Related Genes with Semi-Supervised Learning
Authors:
Kaiyu Shen,
Razvan Bunescu,
Sarah E. Wyatt
Abstract:
The study of biological processes can greatly benefit from tools that automatically predict gene functions or directly cluster genes based on shared functionality. Existing data mining methods predict protein functionality by exploiting data obtained from high-throughput experiments or meta-scale information from public databases. Most existing prediction tools are targeted at predicting protein functions that are described in the gene ontology (GO). However, in many cases biologists wish to discover functionally related genes for which GO terms are inadequate. In this paper, we introduce a rich set of features and use them in conjunction with semi-supervised learning approaches in order to expand an initial set of seed genes to a larger cluster of functionally related genes. Among all the semi-supervised methods that were evaluated, the framework of learning with positive and unlabeled examples (LPU) is shown to be especially appropriate for mining functionally related genes. When evaluated on experimentally validated benchmark data, the LPU approaches significantly outperform a standard supervised learning algorithm as well as an established state-of-the-art method. Given an initial set of seed genes, our best performing approach could be used to mine functionally related genes in a wide range of organisms.
Submitted 5 November, 2020;
originally announced November 2020.
-
Lost or found? Discovering data needed for research
Authors:
Kathleen Gregory,
Paul Groth,
Andrea Scharnhorst,
Sally Wyatt
Abstract:
Finding data is a necessary precursor to being able to reuse data, although relatively little large-scale empirical evidence exists about how researchers discover, make sense of and (re)use data for research. This study presents evidence from the largest known survey investigating how researchers discover and use data that they do not create themselves. We examine the data needs and discovery strategies of respondents, propose a typology for data reuse and probe the role of social interactions and literature search in data discovery. We consider how data communities can be conceptualized according to data uses and propose practical applications of our findings for designers of data discovery systems and repositories. Specifically, we consider how to design for a diversity of practices, how communities of use can serve as an entry point for design and the role of metadata in supporting both sensemaking and social interactions.
Submitted 2 April, 2020; v1 submitted 1 September, 2019;
originally announced September 2019.
-
Understanding Data Search as a Socio-technical Practice
Authors:
Kathleen Gregory,
Helena Cousijn,
Paul Groth,
Andrea Scharnhorst,
Sally Wyatt
Abstract:
Open research data are heralded as having the potential to increase effectiveness, productivity, and reproducibility in science, but little is known about the actual practices involved in data search. The socio-technical problem of locating data for reuse is often reduced to the technological dimension of designing data search systems. We combine a bibliometric study of the current academic discourse around data search with interviews with data seekers. In this article, we explore how adopting a contextual, socio-technical perspective can help to understand user practices and behavior and ultimately help to improve the design of data discovery systems.
Submitted 18 February, 2019; v1 submitted 15 January, 2018;
originally announced January 2018.
-
Searching Data: A Review of Observational Data Retrieval Practices in Selected Disciplines
Authors:
Kathleen Gregory,
Paul Groth,
Helena Cousijn,
Andrea Scharnhorst,
Sally Wyatt
Abstract:
A cross-disciplinary examination of the user behaviours involved in seeking and evaluating data is surprisingly absent from the research data discussion. This review explores the data retrieval literature to identify commonalities in how users search for and evaluate observational research data. Two analytical frameworks rooted in information retrieval and science technology studies are used to identify key similarities in practices as a first step toward developing a model describing data retrieval.
Submitted 12 March, 2020; v1 submitted 21 July, 2017;
originally announced July 2017.
-
Mapping EINS -- An exercise in mapping the Network of Excellence in Internet Science
Authors:
Almila Akdag Salah,
Sally Wyatt,
Samir Passi,
Andrea Scharnhorst
Abstract:
This paper demonstrates the application of bibliometric mapping techniques in the area of funded research networks. We discuss how science maps can be used to facilitate communication inside newly formed communities, but also to account for their activities to funding agencies. As a case, we present the mapping of EINS, an FP7-funded Network of Excellence. Finally, we discuss how these techniques can serve as knowledge maps for experts working across disciplines.
Submitted 16 July, 2013; v1 submitted 21 April, 2013;
originally announced April 2013.