-
Methods for Linking Data to Online Resources and Ontologies with Applications to Neurophysiology
Authors:
Matthew Avaylon,
Ryan Ly,
Andrew Tritt,
Benjamin Dichter,
Kristofer E. Bouchard,
Christopher J. Mungall,
Oliver Ruebel
Abstract:
Across many domains, large swaths of digital assets are being stored across distributed data repositories, e.g., the DANDI Archive [8]. The distribution and diversity of these repositories impede researchers from formally defining terminology within experiments, integrating information across datasets, and easily querying, reusing, and analyzing data that follow the FAIR principles [15]. As such, it has become increasingly important to have a standardized method to attach contextual metadata to datasets. Neuroscience is an exemplary use case of this issue due to the complex multimodal nature of experiments. Here, we present the HDMF External Resources Data (HERD) standard and related tools, enabling researchers to annotate new and existing datasets by mapping external references to the data without requiring modification of the original dataset. We integrated HERD closely with Neurodata Without Borders (NWB) [2], a widely used data standard for sharing and storing neurophysiology data. By integrating with NWB, our tools provide neuroscientists with the capability to more easily create and manage neurophysiology data in compliance with controlled sets of terms, enhancing rigor and accuracy of data and facilitating data reuse.
Submitted 30 May, 2024;
originally announced June 2024.
-
SciOps: Achieving Productivity and Reliability in Data-Intensive Research
Authors:
Erik C. Johnson,
Thinh T. Nguyen,
Benjamin K. Dichter,
Frank Zappulla,
Montgomery Kosma,
Kabilar Gunalan,
Yaroslav O. Halchenko,
Shay Q. Neufeld,
Kristen Ratan,
Nicholas J. Edwards,
Susanne Ressl,
Sarah R. Heilbronner,
Michael Schirner,
Petra Ritter,
Brock Wester,
Satrajit Ghosh,
Maryann E. Martone,
Franco Pestilli,
Dimitri Yatsenko
Abstract:
Scientists are increasingly leveraging advances in instruments, automation, and collaborative tools to scale up their experiments and research goals, leading to new bursts of discovery. Various scientific disciplines, including neuroscience, have adopted key technologies to enhance collaboration, reproducibility, and automation. Drawing inspiration from advancements in the software industry, we present a roadmap to enhance the reliability and scalability of scientific operations for diverse research teams tackling large and complex projects. We introduce a five-level Capability Maturity Model describing the principles of rigorous scientific operations in projects ranging from small-scale exploratory studies to large-scale, multi-disciplinary research endeavors. Achieving higher levels of operational maturity necessitates the adoption of new, technology-enabled methodologies, which we refer to as SciOps. This concept is derived from the DevOps methodologies that have revolutionized the software industry. SciOps involves digital research environments that seamlessly integrate computational, automation, and AI-driven efforts throughout the research cycle, from experimental design and data collection to analysis and dissemination, ultimately leading to closed-loop discovery. This maturity model offers a framework for assessing and improving operational practices in multidisciplinary research teams, guiding them towards greater efficiency and effectiveness in scientific inquiry.
Submitted 6 November, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
A Comparison of Neuroelectrophysiology Databases
Authors:
Priyanka Subash,
Alex Gray,
Misque Boswell,
Samantha L. Cohen,
Rachael Garner,
Sana Salehi,
Calvary Fisher,
Samuel Hobel,
Satrajit Ghosh,
Yaroslav Halchenko,
Benjamin Dichter,
Russell A. Poldrack,
Chris Markiewicz,
Dora Hermes,
Arnaud Delorme,
Scott Makeig,
Brendan Behan,
Alana Sparks,
Stephen R. Arnott,
Zhengjia Wang,
John Magnotti,
Michael S. Beauchamp,
Nader Pouratian,
Arthur W. Toga,
Dominique Duncan
Abstract:
As data sharing has become more prevalent, three pillars - archives, standards, and analysis tools - have emerged as critical components in facilitating effective data sharing and collaboration. This paper compares four freely available intracranial neuroelectrophysiology data repositories: Data Archive for the BRAIN Initiative (DABI), Distributed Archives for Neurophysiology Data Integration (DANDI), OpenNeuro, and Brain-CODE. The aim of this review is to describe archives that provide researchers with tools to store, share, and reanalyze both human and non-human neurophysiology data based on criteria that are of interest to the neuroscientific community. The Brain Imaging Data Structure (BIDS) and Neurodata Without Borders (NWB) are utilized by these archives to make data more accessible to researchers by implementing a common standard. As the necessity for integrating large-scale analysis into data repository platforms continues to grow within the neuroscientific community, this article will highlight the various analytical and customizable tools developed within the chosen archives that may advance the field of neuroinformatics.
Submitted 30 August, 2023; v1 submitted 26 June, 2023;
originally announced June 2023.
-
Recurrent Exponential-Family Harmoniums without Backprop-Through-Time
Authors:
Joseph G. Makin,
Benjamin K. Dichter,
Philip N. Sabes
Abstract:
Exponential-family harmoniums (EFHs), which extend restricted Boltzmann machines (RBMs) from Bernoulli random variables to other exponential families (Welling et al., 2005), are generative models that can be trained with unsupervised-learning techniques, like contrastive divergence (Hinton et al., 2006; Hinton, 2002), as density estimators for static data. Methods for extending RBMs, and likewise EFHs, to data with temporal dependencies have been proposed previously (Sutskever and Hinton, 2007; Sutskever et al., 2009), the learning procedure being validated by qualitative assessment of the generative model. Here we propose and justify, from a very different perspective, an alternative training procedure, proving sufficient conditions for optimal inference under that procedure. The resulting algorithm can be learned with only forward passes through the data; unlike previous approaches, it does not require backprop-through-time. The proof exploits a recent result about information retention in density estimators (Makin and Sabes, 2015), and applies it to a "recurrent EFH" (rEFH) by induction. Finally, we demonstrate optimality by simulation, testing the rEFH: (1) as a filter on training data generated with a linear dynamical system, the position of which is noisily reported by a population of "neurons" with Poisson-distributed spike counts; and (2) with the qualitative experiments proposed by Sutskever et al. (2009).
Submitted 18 May, 2016;
originally announced May 2016.