-
Satellite Monitoring of Terrestrial Plastic Waste
Authors:
Caleb Kruse,
Edward Boyda,
Sully Chen,
Krishna Karra,
Tristan Bou-Nahra,
Dan Hammer,
Jennifer Mathis,
Taylor Maddalene,
Jenna Jambeck,
Fabien Laurier
Abstract:
Plastic waste is a significant environmental pollutant that is difficult to monitor. We created a system of neural networks to analyze spectral, spatial, and temporal components of Sentinel-2 satellite data to identify terrestrial aggregations of waste. The system works at continental scale. We evaluated performance in Indonesia and detected 374 waste aggregations, more than double the number of sites found in public databases. The same system deployed across twelve countries in Southeast Asia identifies 996 subsequently confirmed waste sites. For each detected site, we algorithmically monitor waste site footprints through time and cross-reference other datasets to generate physical and social metadata. 19% of detected waste sites are located within 200 m of a waterway. Numerous sites sit directly on riverbanks, with high risk of ocean leakage.
Submitted 24 March, 2022;
originally announced April 2022.
-
Interpretable contrastive word mover's embedding
Authors:
Ruijie Jiang,
Julia Gouvea,
Eric Miller,
David Hammer,
Shuchin Aeron
Abstract:
This paper shows that a popular approach to the supervised embedding of documents for classification, namely, contrastive Word Mover's Embedding, can be significantly enhanced by adding interpretability. This interpretability is achieved by incorporating a clustering-promoting mechanism into the contrastive loss. On several public datasets, we show that our method improves significantly upon existing baselines while providing an interpretation of the clusters by identifying a set of keywords that are the most representative of a particular class. Our approach was motivated in part by the need to develop Natural Language Processing (NLP) methods for the novel problem of assessing student work for scientific writing and thinking, a problem that is central to the area of (educational) Learning Sciences (LS). In this context, we show that our approach leads to a meaningful assessment of the student work related to lab reports from a biology class and can help LS researchers gain insights into student understanding and assess evidence of scientific thought processes.
Submitted 1 November, 2021;
originally announced November 2021.
-
Automatic coding of students' writing via Contrastive Representation Learning in the Wasserstein space
Authors:
Ruijie Jiang,
Julia Gouvea,
David Hammer,
Eric Miller,
Shuchin Aeron
Abstract:
Qualitative analysis of verbal data is of central importance in the learning sciences. It is labor-intensive and time-consuming, however, which limits the amount of data researchers can include in studies. This work is a step towards building a statistical machine learning (ML) method for automated support of qualitative analyses of students' writing, here specifically scoring laboratory reports in introductory biology for sophistication of argumentation and reasoning. We start with a set of lab reports from an undergraduate biology course, scored by a four-level scheme that considers the complexity of argument structure, the scope of evidence, and the care and nuance of conclusions. Using this set of labeled data, we show that a popular natural language processing pipeline, namely vector representation of words, a.k.a. word embeddings, followed by a Long Short-Term Memory (LSTM) model for capturing language generation as a state-space model, is able to quantitatively capture the scoring, with a high Quadratic Weighted Kappa (QWK) prediction score, when trained via a novel contrastive learning set-up. We show that the ML algorithm approached the inter-rater reliability of human analysis. Ultimately, we conclude that machine learning (ML) for natural language processing (NLP) holds promise for assisting learning sciences researchers in conducting qualitative studies at much larger scales than is currently possible.
Submitted 1 December, 2020; v1 submitted 26 November, 2020;
originally announced November 2020.
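
The Quadratic Weighted Kappa (QWK) named in the abstract above is a standard agreement statistic between two raters on an ordinal scale. The following is a minimal sketch of how it can be computed, not the authors' code. It assumes the four score levels are encoded as integers 0 to 3; the function name and the example scores are hypothetical.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Cohen's kappa with quadratic weights for integer labels 0..num_levels-1.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level agreement,
    and negative values for systematic disagreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if num_levels < 2:
        raise ValueError("need at least two score levels")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic penalty: disagreeing by two levels costs 4x one level.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters gave one identical constant score: treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on an assumed 0..3 encoding of a four-level scheme.
human_scores = [0, 1, 2, 3]
model_scores = [3, 2, 1, 0]
kappa_same = quadratic_weighted_kappa(human_scores, human_scores, 4)
kappa_reversed = quadratic_weighted_kappa(human_scores, model_scores, 4)
```

Comparing a model's QWK against the QWK between two human raters is one way to read the abstract's statement that the algorithm approached inter-rater reliability.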
-
Simulating Population Protocols in Sub-Constant Time per Interaction
Authors:
Petra Berenbrink,
David Hammer,
Dominik Kaaser,
Ulrich Meyer,
Manuel Penschuck,
Hung Tran
Abstract:
We consider the problem of efficiently simulating population protocols. In the population model, we are given a distributed system of $n$ agents modeled as identical finite-state machines. In each time step, a pair of agents is selected uniformly at random to interact. In an interaction, agents update their states according to a common transition function. We empirically and analytically analyze two classes of simulators for this model.
First, we consider sequential simulators executing one interaction after the other. Key to the performance of these simulators is the data structure storing the agents' states. For our analysis, we consider plain arrays, binary search trees, and a novel Dynamic Alias Table data structure.
Second, we consider batch processing to efficiently update the states of multiple independent agents in one step. For many protocols considered in the literature, our simulator requires amortized sub-constant time per interaction and is fast in practice: given a fixed time budget, the implementation of our batched simulator is able to simulate population protocols several orders of magnitude larger than the sequential competitors can, and can carry out $2^{50}$ interactions among the same number of agents in less than 400s.
Submitted 7 May, 2020;
originally announced May 2020.
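
The population model described in the abstract above can be illustrated with a minimal sequential simulator that stores agent states in a plain array. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the pair is treated as ordered (initiator, responder), and the one-way epidemic is only an example transition function. It does not implement the Dynamic Alias Table or the batched simulator.

```python
import random


def simulate_sequential(states, transition, num_interactions, rng):
    """Run interactions one after the other on a plain list of agent states.

    Each step picks an ordered pair of distinct agents uniformly at random
    and replaces their states with the output of the common transition
    function. Cost is constant time per interaction.
    """
    n = len(states)
    if n < 2:
        raise ValueError("need at least two agents")
    for _ in range(num_interactions):
        i, j = rng.sample(range(n), 2)  # two distinct agent indices
        states[i], states[j] = transition(states[i], states[j])
    return states


def epidemic(initiator, responder):
    """Example protocol (assumed for illustration): state 1 spreads to both."""
    infected = max(initiator, responder)
    return infected, infected


rng = random.Random(0)       # fixed seed so runs are reproducible
population = [1] + [0] * 49  # one informed agent among 50
simulate_sequential(population, epidemic, 2000, rng)
```

A simulator of this kind spends one step per interaction, which is the baseline that the batched approach in the abstract improves on.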
-
Fragile Complexity of Comparison-Based Algorithms
Authors:
Peyman Afshani,
Rolf Fagerberg,
David Hammer,
Riko Jacob,
Irina Kostitsyna,
Ulrich Meyer,
Manuel Penschuck,
Nodari Sitchinava
Abstract:
We initiate a study of algorithms with a focus on the computational complexity of individual elements, and introduce the fragile complexity of comparison-based algorithms as the maximal number of comparisons any individual element takes part in. We give a number of upper and lower bounds on the fragile complexity for fundamental problems, including Minimum, Selection, Sorting and Heap Construction. The results include both deterministic and randomized upper and lower bounds, and demonstrate a separation between the two settings for a number of problems. The depth of a comparator network is a straightforward upper bound on the worst-case fragile complexity of the corresponding fragile algorithm. We prove that fragile complexity is a different and strictly easier property than the depth of comparator networks, in the sense that for some problems a fragile complexity equal to the best network depth can be achieved with less total work and that with randomization, even a lower fragile complexity is possible.
Submitted 3 September, 2019; v1 submitted 9 January, 2019;
originally announced January 2019.
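
The definition of fragile complexity in the abstract above (the maximal number of comparisons any single element takes part in) can be made concrete by counting comparisons per element. The sketch below is an illustration with hypothetical function names, not an algorithm taken from the paper: it contrasts a linear scan for the minimum with a balanced tournament, in which each element is compared at most once per round.

```python
def linear_scan_minimum(values):
    """Find the minimum by scanning; returns (minimum, per-element counts)."""
    if not values:
        raise ValueError("need at least one element")
    counts = [0] * len(values)
    best = 0
    for k in range(1, len(values)):
        counts[best] += 1  # the current candidate is compared every step
        counts[k] += 1
        if values[k] < values[best]:
            best = k
    return values[best], counts


def tournament_minimum(values):
    """Find the minimum by a knockout tournament; returns (minimum, counts)."""
    if not values:
        raise ValueError("need at least one element")
    counts = [0] * len(values)
    contenders = list(range(len(values)))
    while len(contenders) > 1:
        next_round = []
        # Pair up neighbours; each element takes part in at most one
        # comparison per round.
        for k in range(0, len(contenders) - 1, 2):
            a, b = contenders[k], contenders[k + 1]
            counts[a] += 1
            counts[b] += 1
            next_round.append(a if values[a] <= values[b] else b)
        if len(contenders) % 2 == 1:
            next_round.append(contenders[-1])  # unpaired element gets a bye
        contenders = next_round
    return values[contenders[0]], counts


data = list(range(16))  # minimum sits at index 0
scan_min, scan_counts = linear_scan_minimum(data)
tour_min, tour_counts = tournament_minimum(data)
```

On this input both functions return the same minimum, but the largest per-element count is 15 for the scan and 4 for the tournament, since the number of rounds is the ceiling of log2 of the input size.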
-
Detecting Human Interventions on the Landscape: KAZE Features, Poisson Point Processes, and a Construction Dataset
Authors:
Edward Boyda,
Colin McCormick,
Dan Hammer
Abstract:
We present an algorithm capable of identifying a wide variety of human-induced change on the surface of the planet by analyzing matches between local features in time-sequenced remote sensing imagery. We evaluate feature sets, match protocols, and the statistical modeling of feature matches. With application of KAZE features, k-nearest-neighbor descriptor matching, and geometric proximity and bi-directional match consistency checks, average match rates increase more than two-fold over the previous standard. In testing our platform, we developed a small, labeled benchmark dataset expressing large-scale residential, industrial, and civic construction, along with null instances, in California between the years 2010 and 2012. On the benchmark set, our algorithm makes precise, accurate change proposals on two-thirds of scenes. Further, the detection threshold can be tuned so that all or almost all proposed detections are true positives.
Submitted 29 March, 2017;
originally announced March 2017.