-
Increasing trust in new data sources: crowdsourcing image classification for ecology
Authors:
Edgar Santos-Fernandez,
Julie Vercelloni,
Aiden Price,
Grace Heron,
Bryce Christensen,
Erin E. Peterson,
Kerrie Mengersen
Abstract:
Crowdsourcing methods facilitate the production of scientific information by non-experts. This form of citizen science (CS) is becoming a key source of complementary data in many fields to inform data-driven decisions and study challenging problems. However, concerns about the validity of these data often constrain their utility. In this paper, we focus on the use of citizen science data in addressing complex challenges in environmental conservation. We consider this issue from three perspectives. First, we present a literature scan of papers that have employed Bayesian models with citizen science in ecology. Second, we compare several popular majority vote algorithms and introduce a Bayesian item response model that estimates and accounts for participants' abilities after adjusting for the difficulty of the images they have classified. The model also enables participants to be clustered into groups based on ability. Third, we apply the model in a case study involving the classification of corals from underwater images from the Great Barrier Reef, Australia. We show that the model achieved superior results in general and, for difficult tasks, a weighted consensus method that uses only groups of experts and experienced participants produced better performance measures. Moreover, we found that participants learn as they have more classification opportunities, which substantially increases their abilities over time. Overall, the paper demonstrates the feasibility of CS for answering complex and challenging ecological questions when these data are appropriately analysed. This serves as motivation for future work to increase the efficacy and trustworthiness of this emerging source of data.
Submitted 1 May, 2023;
originally announced May 2023.
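The contrast drawn in the abstract above between a plain majority vote and a consensus weighted by participant ability can be illustrated with a minimal sketch. This is not the paper's Bayesian item response model: the ability weights here are hypothetical fixed numbers, whereas the paper estimates abilities from the data after adjusting for image difficulty. The function names, labels and weights are all assumptions made for illustration.

```python
from collections import defaultdict


def majority_vote(votes):
    """Return the label chosen by the most participants.

    votes: list of (participant_id, label) pairs for one image point.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = defaultdict(float)
    for _, label in votes:
        counts[label] += 1.0
    return max(sorted(counts), key=counts.get)


def weighted_vote(votes, weights):
    """Return the label with the largest total ability weight.

    weights: dict mapping participant_id to a non-negative weight
    (hypothetical values here; participants without a weight count as 0).
    """
    counts = defaultdict(float)
    for participant, label in votes:
        counts[label] += weights.get(participant, 0.0)
    return max(sorted(counts), key=counts.get)


# One experienced participant disagrees with two novices.
votes = [("expert", "coral"), ("novice_1", "algae"), ("novice_2", "algae")]
weights = {"expert": 0.95, "novice_1": 0.40, "novice_2": 0.40}

print(majority_vote(votes))           # the two novices outvote the expert
print(weighted_vote(votes, weights))  # the expert's weight dominates
```

With these assumed weights the two rules disagree, which is the situation in which weighting by ability can matter for difficult images.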
-
Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics
Authors:
Joshua J. Bon,
Adam Bretherton,
Katie Buchhorn,
Susanna Cramb,
Christopher Drovandi,
Conor Hassan,
Adrianne L. Jenner,
Helen J. Mayfield,
James M. McGree,
Kerrie Mengersen,
Aiden Price,
Robert Salomone,
Edgar Santos-Fernandez,
Julie Vercelloni,
Xiaoyu Wang
Abstract:
Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products.
Submitted 17 January, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
A Bayesian latent allocation model for clustering compositional data with application to the Great Barrier Reef
Authors:
Luiza Piancastelli,
Nial Friel,
Julie Vercelloni,
Kerrie Mengersen,
Antonietta Mira
Abstract:
Relative abundance is a common metric to estimate the composition of species in ecological surveys, reflecting patterns of commonness and rarity of biological assemblages. Measurements of coral reef compositions formed by four communities along Australia's Great Barrier Reef (GBR) gathered between 2012 and 2017 are the focus of this paper. We undertake the task of finding clusters of transect locations with similar community composition and investigate changes in clustering dynamics over time. During these years, an unprecedented sequence of extreme weather events (cyclones and coral bleaching) impacted the 58 surveyed locations. The dependence between constituent parts of a composition presents a challenge for existing multivariate clustering approaches. In this paper, we introduce a finite mixture of Dirichlet distributions with group-specific parameters, where cluster memberships are dictated by unobserved latent variables. The inference is carried out in a Bayesian framework, where MCMC strategies are outlined to sample from the posterior model. Simulation studies are presented to illustrate the performance of the model in a controlled setting. The application of the model to the 2012 coral reef data reveals that clusters were spatially distributed in similar ways across reefs, which indicates a potential influence of wave exposure at the origin of coral reef community composition. The number of clusters estimated by the model decreased from four in 2012 to two from 2014 until 2017. Posterior probabilities of transect allocations to the same cluster substantially increase through time, showing a potential homogenization of community composition across the whole GBR. The Bayesian model highlights the diversity of coral reef community composition within a coral reef and rapid changes across large spatial scales that may contribute to undermining the future of the GBR's biodiversity.
Submitted 5 May, 2021;
originally announced May 2021.
-
Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective
Authors:
Edgar Santos-Fernandez,
Erin E. Peterson,
Julie Vercelloni,
Em Rushworth,
Kerrie Mengersen
Abstract:
Many research domains use data elicited from "citizen scientists" when a direct measure of a process is expensive or infeasible. However, participants may report incorrect estimates or classifications due to their lack of skill. We demonstrate how Bayesian hierarchical models can be used to learn about latent variables of interest, while accounting for the participants' abilities. The model is described in the context of an ecological application that involves crowdsourced classifications of georeferenced coral-reef images from the Great Barrier Reef, Australia. The latent variable of interest is the proportion of coral cover, which is a common indicator of coral reef health. The participants' abilities are expressed in terms of sensitivity and specificity of a correctly classified set of points on the images. The model also incorporates a spatial component, which allows prediction of the latent variable in locations that have not been surveyed. We show that the model outperforms traditional weighted-regression approaches used to account for uncertainty in citizen science data. Our approach produces more accurate regression coefficients and provides a better characterization of the latent process of interest. This new method is implemented in the probabilistic programming language Stan and can be applied to a wide range of problems that rely on uncertain citizen science data.
Submitted 1 June, 2020;
originally announced June 2020.
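The role that sensitivity and specificity play in the abstract above can be seen in a small worked example. The sketch below is not the paper's hierarchical Stan model; it only shows the standard point-estimate relationship between a true proportion and the proportion an imperfect classifier reports, and its algebraic inverse (often called the Rogan-Gladen correction). The function names and the numeric values are assumptions chosen for illustration.

```python
def apparent_proportion(true_p, sensitivity, specificity):
    """Proportion of points a participant is expected to label as coral.

    True coral points are detected with probability `sensitivity`;
    non-coral points are wrongly labelled coral with probability
    1 - `specificity`.
    """
    return sensitivity * true_p + (1.0 - specificity) * (1.0 - true_p)


def corrected_proportion(observed_p, sensitivity, specificity):
    """Invert apparent_proportion to recover an estimate of the true proportion.

    Requires sensitivity + specificity > 1, i.e. the participant performs
    better than chance. The result is clipped to the interval [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_p + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Assumed values: 30% true coral cover, sensitivity 0.8, specificity 0.9.
observed = apparent_proportion(0.30, 0.8, 0.9)
recovered = corrected_proportion(observed, 0.8, 0.9)
print(round(observed, 4), round(recovered, 4))
```

A hierarchical model differs from this sketch in that it treats the abilities and the latent cover as unknowns with uncertainty, rather than plugging in fixed values.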
-
Monitoring through many eyes: Integrating disparate datasets to improve monitoring of the Great Barrier Reef
Authors:
Erin E Peterson,
Edgar Santos-Fernández,
Carla Chen,
Sam Clifford,
Julie Vercelloni,
Alan Pearse,
Ross Brown,
Bryce Christensen,
Allan James,
Ken Anthony,
Jennifer Loder,
Manuel González-Rivero,
Chris Roelfsema,
M. Julian Caley,
Tomasz Bednarz,
Kerrie Mengersen
Abstract:
Numerous organisations collect data in the Great Barrier Reef (GBR), but they are rarely analysed together due to different program objectives, methods, and data quality. We developed a weighted spatiotemporal Bayesian model and used it to integrate image-based hard coral data collected by professional and citizen scientists, who captured and/or classified underwater images. We used the model to predict coral cover across the GBR with estimates of uncertainty, thus filling gaps in space and time where no data exist. Additional data increased the model's predictive ability by 43 percent, but did not affect model inferences about pressures (e.g. bleaching and cyclone damage). Thus, effective integration of professional and high-volume citizen data could enhance the capacity and cost efficiency of monitoring programs. This general approach is equally viable for other variables collected in the marine environment or other ecosystems, opening up new opportunities to integrate data and provide pathways for community engagement and stewardship.
Submitted 27 March, 2019; v1 submitted 15 August, 2018;
originally announced August 2018.