-
AI-assisted summary of suicide risk Formulation
Authors:
Rajib Rana,
Niall Higgins,
Kazi N. Haque,
John Reilly,
Kylie Burke,
Kathryn Turner,
Anthony R. Pisani,
Terry Stedman
Abstract:
Background: Formulation, associated with suicide risk assessment, is an individualised process that seeks to understand the idiosyncratic nature and development of an individual's problems. Auditing clinical documentation on an electronic health record (EHR) is challenging as it requires resource-intensive manual efforts to identify keywords in relevant sections of specific forms. Furthermore, clinicians and healthcare professionals often do not use keywords; their clinical language can vary greatly and may contain various jargon and acronyms. Also, the relevant information may be recorded elsewhere. This study describes how we developed advanced Natural Language Processing (NLP) algorithms, a branch of Artificial Intelligence (AI), to analyse EHR data automatically. Method: Advanced Optical Character Recognition techniques were used to process unstructured data sets, such as portable document format (PDF) files. Free text data was cleaned and pre-processed using Normalisation of Free Text techniques. We developed algorithms and tools to unify the free text. Finally, the formulation was checked for the presence of each concept based on similarity using NLP-powered semantic matching techniques. Results: We extracted information indicative of formulation and assessed it to cover the relevant concepts. This was achieved using a Weighted Score to obtain a Confidence Level. Conclusion: The rigour with which formulation is completed is crucial to effectively using EHRs, ensuring correct and timely identification, engagement and interventions that may potentially avoid many suicide attempts and suicides.
Submitted 19 December, 2024; v1 submitted 29 November, 2024;
originally announced December 2024.
-
Feasibility of Mental Health Triage Call Priority Prediction Using Machine Learning
Authors:
Rajib Rana,
Niall Higgins,
Kazi Nazmul Haque,
John Reilly,
Kylie Burke,
Kathryn Turner,
Terry Stedman
Abstract:
Ensuring accurate call prioritisation is essential for optimising the efficiency and responsiveness of mental health helplines. Currently, call operators rely entirely on the caller's statements to determine the priority of the calls. It has been shown that entirely subjective assessment can lead to errors. Furthermore, it is a missed opportunity not to utilise the voice properties readily available during the call to aid in the evaluation. Incorrect prioritisation can result in delayed assistance for high-risk individuals, resource misallocation, increased mental health deterioration, loss of trust, and potential legal consequences. It is vital to address these risks to guarantee the reliability and effectiveness of mental health services. This study delves into the potential of using machine learning, a branch of Artificial Intelligence, to estimate call priority from the callers' voices for users of mental health phone helplines. After analysing 459 call records from a mental health helpline, we achieved a balanced accuracy of 92%, showing promise in aiding the call operators' efficiency in call handling processes and improving customer satisfaction.
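The abstract reports its result as balanced accuracy. As a minimal sketch of that metric, balanced accuracy can be computed as the mean of per-class recall, which keeps a rare class from being swamped by a common one. The labels and predictions below are hypothetical illustrations, not data from the study, and the function name is an assumption.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical priority labels (not from the paper): 2 high-priority, 4 low-priority calls.
y_true = ["high", "high", "low", "low", "low", "low"]
y_pred = ["high", "low", "low", "low", "low", "low"]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.2f}, plain accuracy = {plain:.2f}")
```

In this toy case one of the two high-priority calls is missed, so recall is 0.5 for "high" and 1.0 for "low"; the balanced figure is therefore lower than the plain accuracy, which is dominated by the larger class.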
Submitted 24 November, 2024;
originally announced December 2024.
-
Bellybutton: Accessible and Customizable Deep-Learning Image Segmentation
Authors:
Sam Dillavou,
Jesse M. Hanlan,
Anthony T. Chieco,
Hongyi Xiao,
Sage Fulco,
Kevin T. Turner,
Douglas J. Durian
Abstract:
The conversion of raw images into quantifiable data can be a major hurdle in experimental research, and typically involves identifying region(s) of interest, a process known as segmentation. Machine learning tools for image segmentation are often specific to a set of tasks, such as tracking cells, or require substantial compute or coding knowledge to train and use. Here we introduce an easy-to-use (no coding required) image segmentation method using a 15-layer convolutional neural network that can be trained on a laptop: Bellybutton. The algorithm trains on user-provided segmentation of example images, but, as we show, just one or even a portion of one training image can be sufficient in some cases. We detail the machine learning method and give three use cases where Bellybutton correctly segments images despite substantial lighting, shape, size, focus, and/or structure variation across the region(s) of interest. Instructions for easy download and use, with further details and the datasets used in this paper, are available at pypi.org/project/Bellybuttonseg.
Submitted 31 August, 2023;
originally announced September 2023.
-
Interpretable Visual Understanding with Cognitive Attention Network
Authors:
Xuejiao Tang,
Wenbin Zhang,
Yi Yu,
Kea Turner,
Tyler Derr,
Mengyu Wang,
Eirini Ntoutsi
Abstract:
While image understanding on recognition-level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding not only on recognition-level but also on cognition-level, which calls for exploiting the multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN
Submitted 7 December, 2023; v1 submitted 5 August, 2021;
originally announced August 2021.
-
Towards Stratified Space Learning: Linearly Embedded Graphs
Authors:
Yossi Bokor,
Katharine Turner,
Christopher Williams
Abstract:
In this paper, we consider the simplest class of stratified spaces -- linearly embedded graphs. We present an algorithm that learns the abstract structure of an embedded graph and models the specific embedding from a point cloud sampled from it. We use tools and inspiration from computational geometry, algebraic topology, and topological data analysis and prove the correctness of the identified abstract structure under assumptions on the embedding. The algorithm is implemented in the Julia package http://github.com/yossibokor/Skyler.jl , which we used for the numerical simulations in this paper.
Submitted 12 January, 2021;
originally announced January 2021.
-
Stratified Space Learning: Reconstructing Embedded Graphs
Authors:
Yossi Bokor,
Daniel Grixti-Cheng,
Markus Hegland,
Stephen Roberts,
Katharine Turner
Abstract:
Many data-rich industries are interested in the efficient discovery and modelling of structures underlying large data sets, as it allows for the fast triage and dimension reduction of large volumes of data embedded in high dimensional spaces. The modelling of these underlying structures is also beneficial for the creation of simulated data that better represents real data, in particular for systems testing in cases where the use of real data streams might prove impractical or otherwise undesirable. We seek to discover and model the structure by combining methods from topological data analysis with numerical modelling. As a first step in combining these two areas, we examine the recovery of the structure of the abstract graph $G$, and model a linear embedding $|G|$ given only a noisy point cloud sample $X$ of $|G|$.
Submitted 26 September, 2019;
originally announced September 2019.
-
Intrinsic Interleaving Distance for Merge Trees
Authors:
Ellen Gasparovic,
Elizabeth Munch,
Steve Oudot,
Katharine Turner,
Bei Wang,
Yusu Wang
Abstract:
Merge trees are a type of graph-based topological summary that tracks the evolution of connected components in the sublevel sets of scalar functions. They enjoy widespread applications in data analysis and scientific visualization. In this paper, we consider the problem of comparing two merge trees via the notion of interleaving distance in the metric space setting. We investigate various theoretical properties of such a metric. In particular, we show that the interleaving distance is intrinsic on the space of labeled merge trees and provide an algorithm to construct metric 1-centers for collections of labeled merge trees. We further prove that the intrinsic property of the interleaving distance also holds for the space of unlabeled merge trees. Our results are a first step toward performing statistics on graph-based topological summaries.
Submitted 2 February, 2022; v1 submitted 31 July, 2019;
originally announced August 2019.
-
Computing Environments for Reproducibility: Capturing the "Whole Tale"
Authors:
Adam Brinckman,
Kyle Chard,
Niall Gaffney,
Mihael Hategan,
Matthew B. Jones,
Kacper Kowalik,
Sivakumar Kulasekaran,
Bertram Ludäscher,
Bryce D. Mecum,
Jarek Nabrzyski,
Victoria Stodden,
Ian J. Taylor,
Matthew J. Turk,
Kandace Turner
Abstract:
The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations to the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale project aims to address these barriers by connecting computational, data-intensive research efforts with the larger research process--transforming the knowledge discovery and dissemination process into one where data products are united with research articles to create "living publications" or "tales". The Whole Tale focuses on the full spectrum of science, empowering users in the long tail of science, and power users with demands for access to big data and compute resources. We report here on the design, architecture, and implementation of the Whole Tale environment.
Submitted 1 May, 2018;
originally announced May 2018.
-
The quantitative and qualitative content analysis of marketing literature for innovative information systems: the Aldrich Archive
Authors:
Sebastian Fass,
Kevin Turner
Abstract:
The Aldrich Archive is a collection of technical and marketing material covering the period from 1977 to 2000; the physical documents are in the process of being digitised and made available on the internet. The Aldrich Archive includes contemporaneous case studies of end-user computer systems that were used for marketing purposes. This paper analyses these case studies of innovative information systems from 1980 to 1990 using a quantitative and qualitative content analysis. The major aim of this research paper is to find out how innovative information systems were marketed in the decade from 1980 to 1990. The paper uses a double-step content analysis and does not focus on one method of content analysis only. The reason for choosing this approach is to combine the advantages of both quantitative and qualitative content analysis. The results of the quantitative content analysis indicated that the focus of the marketing material would be on information management / information supply. But the qualitative analysis revealed that the focus is on monetary advantages. The strong focus on monetary advantages of information technology seems typical for the 1980s and 1990s. In 1987, Robert Solow stated that you can see the computer age everywhere but in the productivity statistics. This paradox caused a lot of discussion: since the introduction of the IT productivity paradox the business value of information technology has been the topic of many debates by practitioners as well as by academics.
Submitted 17 May, 2015;
originally announced May 2015.
-
Hypothesis Testing for Topological Data Analysis
Authors:
Andrew Robinson,
Katharine Turner
Abstract:
Persistent homology is a vital tool for topological data analysis. Previous work has developed some statistical estimators for characteristics of collections of persistence diagrams. However, tools that provide statistical inference for observations that are persistence diagrams are limited. Specifically, there is a need for tests that can assess the strength of evidence against a claim that two samples arise from the same population or process. We propose the use of randomization-style null hypothesis significance tests (NHST) for these situations. The test is based on a loss function that comprises pairwise distances between the elements of each sample and all the elements in the other sample. We use this method to analyze a range of simulated and experimental data. Through these examples we experimentally explore the power of the p-values. Our results show that the randomization-style NHST based on pairwise distances can distinguish between samples from different processes, which suggests that its use for hypothesis tests upon persistence diagrams is reasonable. We demonstrate its application on a real dataset of fMRI data of patients with ADHD.
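A randomization-style test driven by pairwise distances can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: real numbers stand in for persistence diagrams, absolute difference stands in for a diagram metric, and the statistic (the sum of cross-sample distances) is one reading of the loss described above. All function and variable names are hypothetical.

```python
import random


def cross_distance(a, b, dist):
    """Sum of distances between every element of a and every element of b."""
    return sum(dist(x, y) for x in a for y in b)


def permutation_test(a, b, dist, n_perm=2000, seed=0):
    """Estimate a p-value by shuffling group labels over the pooled sample.

    A large cross-sample distance suggests the groups differ, so the p-value
    is the share of relabelings whose statistic is at least the observed one.
    """
    rng = random.Random(seed)
    observed = cross_distance(a, b, dist)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if cross_distance(perm_a, perm_b, dist) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def abs_dist(x, y):
    """Stand-in metric; a real analysis would use a distance between diagrams."""
    return abs(x - y)


# Hypothetical, clearly separated samples.
sample_a = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05]
sample_b = [2.1, 2.2, 1.9, 2.3, 2.0, 2.4]
p_separated = permutation_test(sample_a, sample_b, abs_dist)
print(f"p-value for separated samples: {p_separated:.4f}")
```

Because only a distance function is required, the same skeleton would apply to any objects with a metric; swapping `abs_dist` for a diagram distance is the only change the sketch anticipates.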
Submitted 21 February, 2016; v1 submitted 28 October, 2013;
originally announced October 2013.
-
Medians of populations of persistence diagrams
Authors:
Katharine Turner
Abstract:
Persistence diagrams are common objects in the field of Topological Data Analysis. They are topological summaries that capture both topological and geometric structure within data. Recently there has been a surge of interest in developing tools to statistically analyse populations of persistence diagrams, a process hampered by the complicated geometry of the space of persistence diagrams. In this paper we study the median of a set of diagrams, defined as the minimizer of an appropriate cost function analogous to the sum of distances used for samples of real numbers. We then characterize the local minima of this cost function and in doing so characterize the median. We also do some comparative analysis of the properties of the median and the mean.
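The real-number analogy mentioned in the abstract can be made concrete with a small sketch: for a sample of reals, the median minimizes the sum of absolute distances, while the mean minimizes the sum of squared distances. The data and the grid search below are illustrative assumptions only; the paper's cost function on diagrams is not reproduced here.

```python
import statistics


def sum_abs(points, m):
    """Cost minimized by the median: total absolute distance to m."""
    return sum(abs(x - m) for x in points)


def sum_sq(points, m):
    """Cost minimized by the mean: total squared distance to m."""
    return sum((x - m) ** 2 for x in points)


# Hypothetical sample with one outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
med = statistics.median(data)
mean = statistics.fmean(data)

# Brute-force search over candidate centers 0.0, 0.1, ..., 100.0.
grid = [i / 10 for i in range(0, 1001)]
best_abs = min(grid, key=lambda m: sum_abs(data, m))
best_sq = min(grid, key=lambda m: sum_sq(data, m))

print(f"median = {med}, minimizer of absolute cost = {best_abs}")
print(f"mean = {mean}, minimizer of squared cost = {best_sq}")
```

The outlier drags the squared-cost minimizer far from most of the data, while the absolute-cost minimizer stays with the bulk of the sample, which is the usual motivation for comparing the two.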
Submitted 5 February, 2019; v1 submitted 31 July, 2013;
originally announced July 2013.
-
Probabilistic Fréchet Means for Time Varying Persistence Diagrams
Authors:
Elizabeth Munch,
Katharine Turner,
Paul Bendich,
Sayan Mukherjee,
Jonathan Mattingly,
John Harer
Abstract:
In order to use persistence diagrams as a true statistical tool, it would be very useful to have a good notion of mean and variance for a set of diagrams. In 2011, Mileyko and his collaborators made the first study of the properties of the Fréchet mean in $(\mathcal{D}_p,W_p)$, the space of persistence diagrams equipped with the p-th Wasserstein metric. In particular, they showed that the Fréchet mean of a finite set of diagrams always exists, but is not necessarily unique. The means of a continuously-varying set of diagrams do not themselves (necessarily) vary continuously, which presents obvious problems when trying to extend the Fréchet mean definition to the realm of vineyards.
We fix this problem by altering the original definition of Fréchet mean so that it now becomes a probability measure on the set of persistence diagrams; in a nutshell, the mean of a set of diagrams will be a weighted sum of atomic measures, where each atom is itself a persistence diagram determined using a perturbation of the input diagrams. This definition gives for each $N$ a map $(\mathcal{D}_p)^N \to \mathbb{P}(\mathcal{D}_p)$. We show that this map is Hölder continuous on finite diagrams and thus can be used to build a useful statistic on time-varying persistence diagrams, better known as vineyards.
Submitted 17 November, 2014; v1 submitted 24 July, 2013;
originally announced July 2013.
-
Cone fields and topological sampling in manifolds with bounded curvature
Authors:
Katharine Turner
Abstract:
Often noisy point clouds are given as an approximation of a particular compact set of interest. A finite point cloud is a compact set. This paper proves a reconstruction theorem which gives a sufficient condition, as a bound on the Hausdorff distance between two compact sets, for when certain offsets of these two sets are homotopic in terms of the absence of μ-critical points in an annular region. Since an offset of a set deformation retracts to the set itself provided that there are no critical points of the distance function nearby, we can use this theorem to show when the offset of a point cloud is homotopy equivalent to the set it is sampled from. The ambient space can be any Riemannian manifold but we focus on ambient manifolds which have nowhere negative curvature. In the process, we prove stability theorems for μ-critical points when the ambient space is a manifold.
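The reconstruction condition is stated as a bound on the Hausdorff distance between two compact sets. For finite point clouds in Euclidean space that distance can be computed directly, as in the sketch below. The point sets are hypothetical, and the Euclidean setting is an assumption made for simplicity; the paper works in Riemannian manifolds.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical sets: the corners of a unit square and a noisy sample
# that also contains one stray interior point.
shape = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
noisy = [(0.1, 0.0), (1.0, 0.1), (0.9, 1.0), (0.0, 0.9), (0.5, 0.5)]

d = hausdorff(shape, noisy)
print(f"Hausdorff distance = {d:.4f}")
```

Every corner has a noisy point within about 0.1, but the stray interior point is far from all corners, so the symmetric distance is set by that single point. This sensitivity to the worst-case point is why the distance works as a sampling-quality bound.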
Submitted 3 August, 2013; v1 submitted 28 December, 2011;
originally announced December 2011.