-
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset
Authors:
Stefano Puliti,
Emily R. Lines,
Jana Müllerová,
Julian Frey,
Zoe Schindler,
Adrian Straker,
Matthew J. Allen,
Lukas Winiwarter,
Nataliia Rehush,
Hristina Hristova,
Brent Murray,
Kim Calders,
Louise Terryn,
Nicholas Coops,
Bernhard Höfle,
Samuli Junttila,
Martin Krůček,
Grzegorz Krok,
Kamil Král,
Shaun R. Levick,
Linda Luck,
Azim Missarov,
Martin Mokroš,
Harry J. F. Owen,
Krzysztof Stereńczak, et al. (8 additional authors not shown)
Abstract:
Proximally-sensed laser scanning offers significant potential for automated forest data capture, but challenges remain in automatically identifying tree species without additional ground data. Deep learning (DL) shows promise for automation, yet progress is slowed by the lack of large, diverse, openly available labeled datasets of single tree point clouds. This has impacted the robustness of DL models and the ability to establish best practices for species classification.
To overcome these challenges, the FOR-species20K benchmark dataset was created, comprising over 20,000 tree point clouds from 33 species, captured using terrestrial (TLS), mobile (MLS), and drone laser scanning (ULS) across various European forests, with some data from other regions. This dataset enables the benchmarking of DL models for tree species classification, including both point cloud-based (PointNet++, MinkNet, MLP-Mixer, DGCNNs) and multi-view image-based methods (SimpleView, DetailView, YOLOv5).
2D image-based models generally performed better (average OA = 0.77) than 3D point cloud-based models (average OA = 0.72), with consistent results across different scanning platforms and sensors. The top model, DetailView, was particularly robust, handling data imbalances well and generalizing effectively across tree sizes.
The FOR-species20K dataset, available at https://zenodo.org/records/13255198, is a key resource for developing and benchmarking DL models for tree species classification using laser scanning data, providing a foundation for future advancements in the field.
Submitted 12 August, 2024;
originally announced August 2024.
-
Fiber optic computing using distributed feedback
Authors:
Brandon Redding,
Joseph B. Murray,
Joseph D. Hart,
Zheyuan Zhu,
Shuo S. Pang,
Raktim Sarma
Abstract:
The widespread adoption of machine learning and other matrix intensive computing algorithms has inspired renewed interest in analog optical computing, which has the potential to perform large-scale matrix multiplications with superior energy scaling and lower latency than digital electronics. However, most existing optical techniques rely on spatial multiplexing to encode and process data in parallel, requiring a large number of high-speed modulators and detectors. More importantly, most of these architectures are restricted to performing a single kernel convolution operation per layer. Here, we introduce a fiber-optic computing architecture based on temporal multiplexing and distributed feedback that performs multiple convolutions on the input data in a single layer (i.e. grouped convolutions). Our approach relies on temporally encoding the input data as an optical pulse train and injecting it into an optical fiber where partial reflectors create a series of delayed copies of the input vector. In this work, we used Rayleigh backscattering in standard single mode fiber as the partial reflectors to encode a series of random kernel transforms. We show that this technique effectively performs a random non-linear projection of the input data into a higher dimensional space which can facilitate a variety of computing tasks, including non-linear principal component analysis, support vector machines, or extreme learning machines. By using a passive fiber to perform the kernel transforms, this approach enables efficient energy scaling with orders of magnitude lower power consumption than GPUs, while using a high-speed modulator and detector maintains low latency and high data-throughput. Finally, our approach is readily integrated with fiber-optic communication links, enabling additional applications such as processing remote sensing data transmitted in the analog domain.
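As a rough illustration of the idea described in this abstract, the delayed, partially reflected copies of a pulse train can be modelled as a discrete convolution of the input vector with a random kernel. The sketch below is a toy numerical model, not the authors' implementation: the function names, the Gaussian kernel weights, and the use of square-law (intensity) detection as the non-linearity are all assumptions made for illustration.

```python
import random


def random_kernels(num_kernels, kernel_len, seed=0):
    """Hypothetical stand-in for the fiber's random reflectivity profile."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(kernel_len)]
            for _ in range(num_kernels)]


def convolve_full(signal, kernel):
    """Full discrete convolution: each kernel tap adds a delayed, scaled copy."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for delay, weight in enumerate(kernel):
        for i, sample in enumerate(signal):
            out[i + delay] += weight * sample
    return out


def random_projection(signal, kernels):
    """Concatenate the squared outputs of all kernels.

    Squaring is an assumed model of square-law detection; the abstract
    does not say where the non-linearity comes from.
    """
    features = []
    for kernel in kernels:
        features.extend(v * v for v in convolve_full(signal, kernel))
    return features


pulse_train = [0.2, 0.9, 0.4, 0.0]
kernels = random_kernels(num_kernels=3, kernel_len=5)
features = random_projection(pulse_train, kernels)
print(len(pulse_train), "->", len(features))  # 4 -> 24
```

Each kernel yields 4 + 5 - 1 = 8 output samples, so three kernels map the 4-element input to 24 features. A linear readout (for example ridge regression) trained on `features` would correspond to the extreme learning machine use case mentioned in the abstract.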
Submitted 28 August, 2023;
originally announced August 2023.
-
PharmacyGPT: The AI Pharmacist
Authors:
Zhengliang Liu,
Zihao Wu,
Mengxuan Hu,
Bokai Zhao,
Lin Zhao,
Tianyi Zhang,
Haixing Dai,
Xianyan Chen,
Ye Shen,
Sheng Li,
Quanzheng Li,
Xiang Li,
Brian Murray,
Tianming Liu,
Andrea Sikora
Abstract:
In this study, we introduce PharmacyGPT, a novel framework to assess the capabilities of large language models (LLMs) such as ChatGPT and GPT-4 in emulating the role of clinical pharmacists. Our methodology encompasses the utilization of LLMs to generate comprehensible patient clusters, formulate medication plans, and forecast patient outcomes. We conduct our investigation using real data acquired from the intensive care unit (ICU) at the University of North Carolina Chapel Hill (UNC) Hospital. Our analysis offers valuable insights into the potential applications and limitations of LLMs in the field of clinical pharmacy, with implications for both patient care and the development of future AI-driven healthcare solutions. By evaluating the performance of PharmacyGPT, we aim to contribute to the ongoing discourse surrounding the integration of artificial intelligence in healthcare settings, ultimately promoting the responsible and efficacious use of such technologies.
Submitted 3 October, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Exploration of Convolutional Neural Network Architectures for Large Region Map Automation
Authors:
R. M. Tsenov,
C. J. Henry,
J. L. Storie,
C. D. Storie,
B. Murray,
M. Sokolov
Abstract:
Deep learning semantic segmentation algorithms have provided improved frameworks for the automated production of Land-Use and Land-Cover (LULC) maps, which significantly increase the frequency of map generation as well as the consistency of production quality. In this research, a total of 28 different model variations were examined to improve the accuracy of LULC maps. The experiments were carried out using Landsat 5/7 or Landsat 8 satellite images with the North American Land Change Monitoring System (NALCMS) labels. The performance of various CNNs and extension combinations was assessed; VGGNet with an output stride of 4 and a modified U-Net architecture provided the best results. Additional expanded analysis of the generated LULC maps was also provided. Using a deep neural network, this work achieved 92.4% accuracy for 13 LULC classes within southern Manitoba, representing a 15.8% improvement over published results for the NALCMS. Based on the large regions of interest, the higher radiometric resolution of Landsat 8 data resulted in better overall accuracy (88.04%) compared to Landsat 5/7 (80.66%) for 16 LULC classes. This represents an 11.44% and 4.06% increase in overall accuracy compared to previously published NALCMS results, including a larger land area and a higher number of LULC classes incorporated into the models compared to other published LULC map automation methods.
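The reported gains are consistent with a single baseline if they are read as percentage-point differences. The short check below assumes that reading (the abstract does not state it explicitly) and subtracts each reported gain from its reported accuracy; the dictionary labels are illustrative.

```python
# (reported accuracy in %, reported improvement) pairs taken from the abstract
reported = {
    "southern Manitoba, 13 classes": (92.4, 15.8),
    "Landsat 8, 16 classes": (88.04, 11.44),
    "Landsat 5/7, 16 classes": (80.66, 4.06),
}

# Assumption: improvements are percentage points, so baseline = accuracy - gain
implied_baselines = {
    name: round(accuracy - gain, 2)
    for name, (accuracy, gain) in reported.items()
}

for name, baseline in implied_baselines.items():
    print(f"{name}: implied baseline {baseline}%")
```

All three rows give 76.6%, which suggests that the same previously published NALCMS accuracy underlies each comparison.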
Submitted 7 November, 2022;
originally announced November 2022.
-
Automated Extraction of Energy Systems Information from Remotely Sensed Data: A Review and Analysis
Authors:
Simiao Ren,
Wei Hu,
Kyle Bradbury,
Dylan Harrison-Atlas,
Laura Malaguzzi Valeri,
Brian Murray,
Jordan M. Malof
Abstract:
High quality energy systems information is a crucial input to energy systems research, modeling, and decision-making. Unfortunately, actionable information about energy systems is often of limited availability, incomplete, or only accessible for a substantial fee or through a non-disclosure agreement. Recently, remotely sensed data (e.g., satellite imagery, aerial photography) have emerged as a potentially rich source of energy systems information. However, the use of these data is frequently challenged by their sheer volume and complexity, precluding manual analysis. Recent breakthroughs in machine learning have enabled automated and rapid extraction of useful information from remotely sensed data, facilitating large-scale acquisition of critical energy system variables. Here we present a systematic review of the literature on this emerging topic, providing an in-depth survey and review of papers published within the past two decades. We first taxonomize the existing literature into ten major areas, spanning the energy value chain. Within each research area, we distill and critically discuss major features that are relevant to energy researchers, including, for example, key challenges regarding the accessibility and reliability of the methods. We then synthesize our findings to identify limitations and trends in the literature as a whole, and discuss opportunities for innovation. These include the opportunity to extend the methods beyond electricity to broader energy systems and wider geographic areas; and the ability to expand the use of these methods in research and decision making as satellite data become cheaper and easier to access. We also find that there are persistent challenges: limited standardization and rigor of performance assessments; limited sharing of code, which would improve replicability; and a limited consideration of the ethics and privacy of data.
Submitted 2 October, 2022; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Accessible Data Curation and Analytics for International-Scale Citizen Science Datasets
Authors:
Benjamin Murray,
Eric Kerfoot,
Mark S. Graham,
Carole H. Sudre,
Erika Molteni,
Liane S. Canas,
Michela Antonelli,
Kerstin Klaser,
Alessia Visconti,
Andrew T. Chan,
Paul W. Franks,
Richard Davies,
Jonathan Wolf,
Tim Spector,
Claire J. Steves,
Marc Modat,
Sebastien Ourselin
Abstract:
The Covid Symptom Study, a smartphone-based surveillance study on COVID-19 symptoms in the population, is an exemplar of big data citizen science. Over 4.7 million participants and 189 million unique assessments have been logged since its introduction in March 2020. The success of the Covid Symptom Study creates technical challenges around effective data curation for two reasons. Firstly, the scale of the dataset means that it can no longer be easily processed using standard software on commodity hardware. Secondly, the size of the research group means that replicability and consistency of key analytics used across multiple publications become an issue. We present ExeTera, an open source data curation software designed to address scalability challenges and to enable reproducible research across an international research group for datasets such as the Covid Symptom Study dataset.
Submitted 17 February, 2021; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Extending the Morphological Hit-or-Miss Transform to Deep Neural Networks
Authors:
Muhammad Aminul Islam,
Bryce Murray,
Andrew Buck,
Derek T. Anderson,
Grant Scott,
Mihail Popescu,
James Keller
Abstract:
While most deep learning architectures are built on convolution, alternative foundations like morphology are being explored for purposes like interpretability and its connection to the analysis and processing of geometric structures. The morphological hit-or-miss operation has the advantage that it takes into account both foreground and background information when evaluating target shape in an image. Herein, we identify limitations in existing hit-or-miss neural definitions and we formulate an optimization problem to learn the transform relative to deeper architectures. To this end, we model the semantically important condition that the intersection of the hit and miss structuring elements (SEs) should be empty and we present a way to express Don't Care (DNC), which is important for denoting regions of an SE that are not relevant to detecting a target pattern. Our analysis shows that convolution, in fact, acts like a hit-miss transform through semantic interpretation of its filter differences. On these premises, we introduce an extension that outperforms conventional convolution on benchmark data. Quantitative experiments are provided on synthetic and benchmark data, showing that the direct encoding hit-or-miss transform provides better interpretability on learned shapes consistent with objects, whereas our morphologically inspired generalized convolution yields higher classification accuracy. Last, qualitative hit and miss filter visualizations are provided relative to a single morphological layer.
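For readers unfamiliar with the operation, the following is a minimal sketch of the classical binary hit-or-miss transform, not the learnable neural formulation proposed in the paper. The function name, the encoding of the structuring element (1 = hit, 0 = miss, None = Don't Care), and the choice to treat pixels outside the image as background are illustrative assumptions.

```python
def hit_or_miss(image, se):
    """Classical binary hit-or-miss transform on a 2D list of 0/1 pixels.

    se uses 1 for hit (must be foreground), 0 for miss (must be background),
    and None for Don't Care. Pixels outside the image count as background.
    """
    rows, cols = len(image), len(image[0])
    se_rows, se_cols = len(se), len(se[0])
    r_off, c_off = se_rows // 2, se_cols // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            match = True
            for i in range(se_rows):
                for j in range(se_cols):
                    want = se[i][j]
                    if want is None:
                        continue  # Don't Care: this position is ignored
                    rr, cc = r + i - r_off, c + j - c_off
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    pixel = image[rr][cc] if inside else 0
                    if pixel != want:
                        match = False
            out[r][c] = 1 if match else 0
    return out


# Detect foreground pixels with no 4-connected neighbours; corners are Don't Care.
isolated = [
    [None, 0, None],
    [0,    1, 0],
    [None, 0, None],
]
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
result = hit_or_miss(image, isolated)
for row in result:
    print(row)
```

Only the pixel at row 1, column 1 is marked, because the two pixels on the right are neighbours of each other. Encoding hit and miss positions in a single array makes their intersection empty by construction, which mirrors the condition on the structuring elements stated in the abstract.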
Submitted 27 September, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.