-
Graph polynomials: some questions on the edge
Authors:
Graham Farr,
Kerri Morgan
Abstract:
We raise some questions about graph polynomials, highlighting concepts and phenomena that may merit consideration in the development of a general theory. Our questions are mainly of three types: When do graph polynomials have reduction relations (simple linear recursions based on local operations), perhaps in a wider class of combinatorial objects? How many levels of reduction relations does a graph polynomial need in order to express it in terms of trivial base cases? For a graph polynomial, how are properties such as equivalence and factorisation reflected in the structure of a graph? We illustrate our discussion with a variety of graph polynomials and other invariants. This leads us to reflect on the historical origins of graph polynomials. We also introduce some new polynomials based on partial colourings of graphs and establish some of their basic properties.
Submitted 22 June, 2024;
originally announced June 2024.
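The abstract above speaks of reduction relations, meaning simple linear recursions based on local operations that bottom out in trivial base cases. The textbook instance is deletion-contraction for the chromatic polynomial, P(G; k) = P(G - e; k) - P(G / e; k), with edgeless graphs as base cases. The sketch below is a minimal illustration of that standard recursion and is not taken from the paper; the function name `chromatic_value` and the input format (a simple graph given as a vertex collection and a list of 2-element edges) are assumptions made for this example.

```python
def chromatic_value(vertices, edges, k):
    """Count proper k-colourings of a simple graph by deletion-contraction.

    Assumes a simple graph: no loops, each edge a pair of distinct vertices.
    """
    vertices = set(vertices)
    edges = {frozenset(e) for e in edges}

    # Base case: an edgeless graph on n vertices has k**n colourings.
    if not edges:
        return k ** len(vertices)

    e = next(iter(edges))
    u, v = tuple(e)

    # Deletion: remove the chosen edge.
    deleted = edges - {e}

    # Contraction: merge v into u; parallel edges collapse in the set.
    contracted = {frozenset(u if x == v else x for x in f) for f in deleted}

    return (chromatic_value(vertices, deleted, k)
            - chromatic_value(vertices - {v}, contracted, k))


triangle = [(0, 1), (1, 2), (0, 2)]
print(chromatic_value({0, 1, 2}, triangle, 3))
```

The recursion is exponential in the number of edges, so this is only suitable for very small graphs.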
-
Extracting Biomedical Entities from Noisy Audio Transcripts
Authors:
Nima Ebadi,
Kellen Morgan,
Adrian Tan,
Billy Linares,
Sheri Osborn,
Emma Majors,
Jeremy Davis,
Anthony Rios
Abstract:
Automatic Speech Recognition (ASR) technology is fundamental in transcribing spoken language into text, with considerable applications in the clinical realm, including streamlining medical transcription and integrating with Electronic Health Record (EHR) systems. Nevertheless, challenges persist, especially when transcriptions contain noise, leading to significant drops in performance when Natural Language Processing (NLP) models are applied. Named Entity Recognition (NER), an essential clinical task, is particularly affected by such noise, often termed the ASR-NLP gap. Prior works have primarily studied ASR's efficiency in clean recordings, leaving a research gap concerning the performance in noisy environments. This paper introduces a novel dataset, BioASR-NER, designed to bridge the ASR-NLP gap in the biomedical domain, focusing on extracting adverse drug reactions and mentions of entities from the Brief Test of Adult Cognition by Telephone (BTACT) exam. Our dataset offers a comprehensive collection of almost 2,000 clean and noisy recordings. In addressing the noise challenge, we present an innovative transcript-cleaning method using GPT4, investigating both zero-shot and few-shot methodologies. Our study further delves into an error analysis, shedding light on the types of errors in transcription software, corrections by GPT4, and the challenges GPT4 faces. This paper aims to foster improved understanding and potential solutions for the ASR-NLP gap, ultimately supporting enhanced healthcare documentation practices.
Submitted 25 March, 2024;
originally announced March 2024.
-
Duke Spleen Data Set: A Publicly Available Spleen MRI and CT dataset for Training Segmentation
Authors:
Yuqi Wang,
Jacob A. Macdonald,
Katelyn R. Morgan,
Danielle Hom,
Sarah Cubberley,
Kassi Sollace,
Nicole Casasanto,
Islam H. Zaki,
Kyle J. Lafata,
Mustafa R. Bashir
Abstract:
Spleen volumetry is primarily associated with patients suffering from chronic liver disease and portal hypertension, as they often have spleens with abnormal shapes and sizes. However, manually segmenting the spleen to obtain its volume is a time-consuming process. Deep learning algorithms have proven to be effective in automating spleen segmentation, but a suitable dataset is necessary for training such algorithms. To our knowledge, the few publicly available datasets for spleen segmentation lack confounding features such as ascites and abdominal varices. To address this issue, the Duke Spleen Data Set (DSDS) has been developed, which includes 109 CT and MRI volumes from patients with chronic liver disease and portal hypertension. The dataset includes a diverse range of image types, vendors, planes, and contrasts, as well as varying spleen shapes and sizes due to underlying disease states. The DSDS aims to facilitate the creation of robust spleen segmentation models that can take into account these variations and confounding factors.
Submitted 9 May, 2023;
originally announced May 2023.
-
Finding Maximum Cliques in Large Networks
Authors:
S. Y. Chan,
K. Morgan,
J. Ugon
Abstract:
There are many methods to find a maximum (or maximal) clique in large networks. Due to the nature of combinatorics, computation becomes exponentially expensive as the number of vertices in a graph increases. Thus, there is a need for efficient algorithms to find a maximum clique. In this paper, we present a graph reduction method that significantly reduces the order of a graph, and so enables the identification of a maximum clique in graphs of large order for which finding the maximum would otherwise be computationally infeasible. We find bounds on the maximum (or maximal) clique using this reduction. We demonstrate our method on real-life social networks and also on Erdős-Rényi random graphs.
Submitted 26 July, 2022;
originally announced July 2022.
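The abstract above does not spell out its reduction method. As a rough illustration of how a graph's order can be reduced before a clique search, the sketch below uses a different, well-known technique: degree-based peeling. A vertex in a clique of a given size needs at least size - 1 neighbours, so vertices of lower degree can be deleted repeatedly. The function name `clique_core` and the adjacency-dictionary input are assumptions for this example, and the graph is assumed undirected.

```python
def clique_core(adj, size):
    """Return the subgraph left after repeatedly deleting every vertex
    whose degree is below size - 1 (it cannot lie in a clique of `size`).

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    queue = [v for v, ns in adj.items() if len(ns) < size - 1]

    while queue:
        v = queue.pop()
        if v not in adj:  # already deleted via an earlier queue entry
            continue
        for w in adj.pop(v):
            adj[w].discard(v)
            if len(adj[w]) < size - 1:
                queue.append(w)
    return adj


# Triangle 0-1-2 with a path 2-3-4 hanging off it.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(clique_core(graph, 3)))
```

Any clique of the requested size survives the peeling, so an exact search can then run on the smaller remaining graph.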
-
Supernodes
Authors:
Su Yuan Chan,
Kerri Morgan,
Nick Parsons,
Julien Ugon
Abstract:
In this paper, we present two new concepts related to subgraph counting where the focus is not on the number of subgraphs that are isomorphic to some fixed graph $H$, but on the frequency with which a vertex or an edge belongs to such subgraphs. In particular, we are interested in the case where $H$ is a complete graph. These new concepts are termed vertex participation and edge participation respectively. We combine these concepts with that of the rich-club to identify what we call a Super rich-club and rich edge-club. We show that the concept of vertex participation is a generalisation of the rich-club. We present experimental results on randomised Erdős-Rényi and Watts-Strogatz small-world networks. We further demonstrate both concepts on a complex brain network and compare our results to the rich-club of the brain.
Submitted 23 August, 2021;
originally announced August 2021.
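Vertex participation, as described in the abstract above, counts how often a vertex belongs to subgraphs isomorphic to a fixed complete graph. A brute-force sketch of that idea is given below, assuming the count is the raw number of k-cliques containing each vertex; the paper may normalise or define it differently. The function name `vertex_participation` and the adjacency-dictionary input are hypothetical choices for this example.

```python
from itertools import combinations


def vertex_participation(adj, k=3):
    """For each vertex, count the k-cliques that contain it.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    Brute force over all k-subsets, so only suitable for small graphs.
    """
    counts = {v: 0 for v in adj}
    for group in combinations(adj, k):
        # A k-subset is a clique when every pair in it is adjacent.
        if all(b in adj[a] for a, b in combinations(group, 2)):
            for v in group:
                counts[v] += 1
    return counts


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(vertex_participation(graph, k=3))
```

With k = 2 the count reduces to vertex degree, which gives one intuition for how a clique-based participation measure can extend degree-based notions such as the rich-club.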
-
Observement as Universal Measurement
Authors:
David G. Green,
Kerri Morgan,
Marc Cheong
Abstract:
Measurement theory is the cornerstone of science, but no equivalent theory underpins the huge volumes of non-numerical data now being generated. In this study, we show that replacing numbers with alternative mathematical models, such as strings and graphs, generalises traditional measurement to provide rigorous, formal systems ('observement') for recording and interpreting non-numerical data. Moreover, we show that these representations are already widely used and identify general classes of interpretive methodologies implicit in representations based on character strings and graphs (networks). This implies that a generalised concept of measurement has the potential to reveal new insights as well as deep connections between different fields of research.
Submitted 7 December, 2020;
originally announced December 2020.
-
Instance Space Analysis for the Car Sequencing Problem
Authors:
Yuan Sun,
Samuel Esler,
Dhananjay Thiruvady,
Andreas T. Ernst,
Xiaodong Li,
Kerri Morgan
Abstract:
We investigate an important research question for solving the car sequencing problem, that is, which characteristics make an instance hard to solve? To do so, we carry out an instance space analysis for the car sequencing problem, by extracting a vector of problem features to characterize an instance. In order to visualize the instance space, the feature vectors are projected onto a two-dimensional space using dimensionality reduction techniques. The resulting two-dimensional visualizations provide new insights into the characteristics of the instances used for testing and how these characteristics influence the behaviours of an optimization algorithm. This analysis guides us in constructing a new set of benchmark instances with a range of instance properties. We demonstrate that these new instances are more diverse than the previous benchmarks, including some instances that are significantly more difficult to solve. We introduce two new algorithms for solving the car sequencing problem and compare them with four existing methods from the literature. Our new algorithms are shown to perform competitively for this problem but no single algorithm can outperform all others over all instances. This observation motivates us to build an algorithm selection model based on machine learning, to identify the niche in the instance space that an algorithm is expected to perform well on. Our analysis helps to understand problem hardness and select an appropriate algorithm for solving a given car sequencing problem instance.
Submitted 20 August, 2021; v1 submitted 18 December, 2020;
originally announced December 2020.
-
A survey of repositories in graph theory
Authors:
Srinibas Swain,
C. Paul Bonnington,
Graham Farr,
Kerri Morgan
Abstract:
Since the pioneering work of R. M. Foster in the 1930s, many graph repositories have been created to support research in graph theory. This survey reviews many of these graph repositories and summarises the scope and contents of each repository. We identify opportunities for the development of repositories that can be queried in more flexible ways.
Submitted 23 June, 2020;
originally announced June 2020.
-
Factorisation of Greedoid Polynomials of Rooted Digraphs
Authors:
Kai Siong Yow,
Kerri Morgan,
Graham Farr
Abstract:
Gordon and McMahon defined a two-variable greedoid polynomial $ f(G;t,z) $ for any greedoid $ G $. They studied greedoid polynomials for greedoids associated with rooted graphs and rooted digraphs. They proved that greedoid polynomials of rooted digraphs have the multiplicative direct sum property. In addition, these polynomials are divisible by $ 1 + z $ under certain conditions. We compute the greedoid polynomials for all rooted digraphs up to order six. A greedoid polynomial $ f(D) $ of a rooted digraph $ D $ of order $ n $ GM-factorises if $ f(D) = f(G) \cdot f(H) $ such that $ G $ and $ H $ are rooted digraphs of order at most $ n $ and $ f(G),f(H) \ne 1 $. We study the GM-factorability of greedoid polynomials of rooted digraphs, particularly those that are not divisible by $ 1 + z $. We give some examples and an infinite family of rooted digraphs that are not direct sums but their greedoid polynomials GM-factorise.
Submitted 3 May, 2019; v1 submitted 9 September, 2018;
originally announced September 2018.
-
Tutte Invariants for Alternating Dimaps
Authors:
Kai Siong Yow,
Graham Farr,
Kerri Morgan
Abstract:
An alternating dimap is an orientably embedded Eulerian directed graph where the edges incident with each vertex are directed inwards and outwards alternately. Three reduction operations for alternating dimaps were investigated by Farr. A minor of an alternating dimap can be obtained by reducing some of its edges using the reduction operations. Unlike classical minor operations, these reduction operations do not commute in general. A Tutte invariant for alternating dimaps is a function $ P $ defined on every alternating dimap and taking values in a field such that $ P $ is invariant under isomorphism and obeys a linear recurrence relation involving reduction operations. It is well known that if a graph $ G $ is planar, then the Tutte polynomial $ T $ satisfies $ T(G;x,y)=T(G^{*};y,x) $. We note an analogous relation for the extended Tutte invariants for alternating dimaps introduced by Farr. We then characterise the Tutte invariant for alternating dimaps of genus zero under several conditions. As a result of the non-commutativity of the reduction operations, the recursions based on them cannot always be satisfied. We investigate the properties of alternating dimaps of genus zero that are required in order to obtain a well defined Tutte invariant. Some excluded minor characterisations for these alternating dimaps are also given.
Submitted 23 March, 2020; v1 submitted 14 March, 2018;
originally announced March 2018.
-
Improved Optimal and Approximate Power Graph Compression for Clearer Visualisation of Dense Graphs
Authors:
Tim Dwyer,
Christopher Mears,
Kerri Morgan,
Todd Niven,
Kim Marriott,
Mark Wallace
Abstract:
Drawings of highly connected (dense) graphs can be very difficult to read. Power Graph Analysis offers an alternate way to draw a graph in which sets of nodes with common neighbours are shown grouped into modules. An edge connected to the module then implies a connection to each member of the module. Thus, the entire graph may be represented with much less clutter and without loss of detail. A recent experimental study has shown that such lossless compression of dense graphs makes it easier to follow paths. However, computing optimal power graphs is difficult. In this paper, we show that computing the optimal power-graph with only one module is NP-hard and therefore likely NP-hard in the general case. We give an ILP model for power graph computation and discuss why ILP and CP techniques are poorly suited to the problem. Instead, we are able to find optimal solutions much more quickly using a custom search method. We also show how to restrict this type of search to allow only limited back-tracking to provide a heuristic that has better speed and better results than previously known heuristics.
Submitted 13 November, 2013;
originally announced November 2013.