-
Adapting the re-ID challenge for static sensors
Authors:
Avirath Sundaresan,
Jason R. Parham,
Jonathan Crall,
Rosemary Warungu,
Timothy Muthami,
Margaret Mwangi,
Jackson Miliko,
Jason Holmberg,
Tanya Y. Berger-Wolf,
Daniel Rubenstein,
Charles V. Stewart,
Sara Beery
Abstract:
In both 2016 and 2018, a census of the highly endangered Grevy's zebra population was enabled by the Great Grevy's Rally (GGR), a citizen science event that produces population estimates via expert and algorithmic curation of volunteer-captured images. A complementary, scalable, and long-term Grevy's population monitoring approach involves deploying camera trap networks. However, in both scenarios, a substantial majority of zebra images are not usable for individual identification due to poor in-the-wild imaging conditions; camera trap images in particular present high rates of occlusion and high spatio-temporal similarity within image bursts. Our proposed filtering pipeline incorporates animal detection, species identification, viewpoint estimation, quality evaluation, and temporal subsampling to obtain individual crops suitable for re-ID, which are subsequently curated by the LCA decision management algorithm. Our method processed images taken during GGR-16 and GGR-18 in Meru County, Kenya, into 4,142 highly comparable annotations, requiring only 120 contrastive human decisions to produce a population estimate within 4.6% of the ground-truth count. Our method also efficiently processed 8.9M unlabeled camera trap images from 70 cameras at the Mpala Research Centre in Laikipia County, Kenya, over two years into 685 encounters of 173 individuals, requiring only 331 contrastive human decisions.
Submitted 29 November, 2024;
originally announced December 2024.
-
The Animal ID Problem: Continual Curation
Authors:
Charles V. Stewart,
Jason R. Parham,
Jason Holmberg,
Tanya Y. Berger-Wolf
Abstract:
Hoping to stimulate new research in individual animal identification from images, we propose to formulate the problem as the human-machine Continual Curation of images and animal identities. This is an open world recognition problem, where most new animals enter the system after its algorithms are initially trained and deployed. Continual Curation, as defined here, requires (1) an improvement in the effectiveness of current recognition methods, (2) a pairwise verification algorithm that allows the possibility of no decision, and (3) an algorithmic decision mechanism that seeks human input to guide the curation process. Error metrics must evaluate the ability of recognition algorithms not only to identify animals that have been seen just once or twice but also to recognize new animals not in the database. An important measure of overall system performance is accuracy as a function of the amount of human input required.
Submitted 18 June, 2021;
originally announced June 2021.
-
Mining and modeling complex leadership-followership dynamics of movement data
Authors:
Chainarong Amornbunchornvej,
Tanya Y. Berger-Wolf
Abstract:
Leadership and followership are essential parts of collective decision-making and organization in social animals, including humans. In nature, relationships of leaders and followers are dynamic and vary with context or temporal factors. Understanding the dynamics of leadership and followership, such as how leaders and followers change, emerge, or converge, allows scientists to gain more insight into group decision-making and collective behavior in general. However, given only data of individual activities, it is challenging to infer the dynamics of leaders and followers. In this paper, we focus on mining and modeling frequent patterns of leading and following. We formalize new computational problems and propose a framework that can be used to address several questions regarding group movement. We use the leadership inference framework, mFLICA, to infer the time series of leaders and their factions from movement datasets and then propose an approach to mine and model frequent patterns of both leadership and followership dynamics. We evaluate our framework's performance on several simulated datasets, as well as on a real-world dataset of baboon movement, to demonstrate the applications of our framework. These are novel computational problems and, to the best of our knowledge, there are no existing comparable methods to address them. Thus, we modify and extend an existing leadership inference framework to provide a non-trivial baseline for comparison. Our framework performs better than this baseline on all datasets. Our framework opens opportunities for scientists to generate testable scientific hypotheses about the dynamics of leadership in movement data.
Submitted 4 October, 2020;
originally announced October 2020.
-
Privacy Shadow: Measuring Node Predictability and Privacy Over Time
Authors:
Ivan Brugere,
Tanya Y. Berger-Wolf
Abstract:
The structure of network data enables simple predictive models to leverage local correlations between nodes and achieve high accuracy on tasks such as attribute and link prediction. While this is useful for building better user models, it introduces the privacy concern that a user's data may be re-inferred from the network structure after they leave the application. We propose the privacy shadow for measuring how long a user remains predictive from an arbitrary time within the network. Furthermore, we demonstrate that the length of the privacy shadow can be predicted for individual users in three real-world datasets.
Submitted 4 April, 2020;
originally announced April 2020.
-
Inferring Network Structure From Data
Authors:
Ivan Brugere,
Tanya Y. Berger-Wolf
Abstract:
Networks are complex models for underlying data in many application domains. In most instances, raw data is not natively in the form of a network, but derived from sensors, logs, images, or other data. Yet, the impact of the various choices in translating this data to a network has been largely unexamined. In this work, we propose a network model selection methodology that focuses on evaluating a network's utility for varying tasks, together with an efficiency measure which selects the most parsimonious model. We demonstrate that this network definition matters in several ways for modeling the behavior of the underlying system.
Submitted 4 April, 2020;
originally announced April 2020.
-
Variable-lag Granger Causality for Time Series Analysis
Authors:
Chainarong Amornbunchornvej,
Elena Zheleva,
Tanya Y. Berger-Wolf
Abstract:
Granger causality is a fundamental technique for causal inference in time series data, commonly used in the social and biological sciences. Typical operationalizations of Granger causality make a strong assumption that every time point of the effect time series is influenced by a combination of other time series with a fixed time delay. However, the assumption of the fixed time delay does not hold in many applications, such as collective behavior, financial markets, and many natural phenomena. To address this issue, we develop variable-lag Granger causality, a generalization of Granger causality that relaxes the assumption of the fixed time delay and allows causes to influence effects with arbitrary time delays. In addition, we propose a method for inferring variable-lag Granger causality relations. We demonstrate our approach on an application for studying coordinated collective behavior and show that it performs better than several existing methods in both simulated and real-world datasets. Our approach can be applied in any domain of time series analysis.
Submitted 18 December, 2019;
originally announced December 2019.
-
Wildbook: Crowdsourcing, computer vision, and data science for conservation
Authors:
Tanya Y. Berger-Wolf,
Daniel I. Rubenstein,
Charles V. Stewart,
Jason A. Holmberg,
Jason Parham,
Sreejith Menon,
Jonathan Crall,
Jon Van Oast,
Emre Kiciman,
Lucas Joppa
Abstract:
Photographs, taken by field scientists, tourists, automated cameras, and incidental photographers, are the most abundant source of data on wildlife today. Wildbook is an autonomous computational system that starts from massive collections of images and, by detecting various species of animals and identifying individuals, combined with sophisticated data management, turns them into a high-resolution information database, enabling scientific inquiry, conservation, and citizen science.
We have built Wildbooks for whales (flukebook.org), sharks (whaleshark.org), two species of zebras (Grevy's and plains), and several others. In January 2016, Wildbook enabled the first-ever full-species census (of the endangered Grevy's zebra) using photographs taken by ordinary citizens in Kenya. The resulting numbers are now the official species census used by the IUCN Red List: http://www.iucnredlist.org/details/7950/0. In 2016, Wildbook partnered with WWF to build Wildbook for Sea Turtles, Internet of Turtles (IoT), as well as systems for seals and lynx. Most recently, we have demonstrated that we can now use publicly available social media images to count and track wild animals.
In this paper we present and discuss both the impact and challenges that the use of crowdsourced images can have on wildlife conservation.
Submitted 24 October, 2017;
originally announced October 2017.
-
Network Model Selection Using Task-Focused Minimum Description Length
Authors:
Ivan Brugere,
Tanya Y. Berger-Wolf
Abstract:
Networks are fundamental models for data used in practically every application domain. In most instances, several implicit or explicit choices about the network definition impact the translation of underlying data to a network representation, and the subsequent question(s) about the underlying system being represented. Users of downstream network data may not even be aware of these choices or their impacts. We propose a task-focused network model selection methodology which addresses several key challenges. Our approach constructs network models from underlying data and uses minimum description length (MDL) criteria for selection. Our methodology measures efficiency, a general and comparable measure of the network's performance on a local (i.e., node-level) predictive task of interest. Selection on efficiency favors parsimonious (e.g., sparse) models to avoid overfitting and can be applied across arbitrary tasks and representations. We show stability, sensitivity, and significance testing in our methodology.
Submitted 10 January, 2018; v1 submitted 14 October, 2017;
originally announced October 2017.
-
Network Model Selection for Task-Focused Attributed Network Inference
Authors:
Ivan Brugere,
Chris Kanich,
Tanya Y. Berger-Wolf
Abstract:
Networks are models representing relationships between entities. Often these relationships are explicitly given, or we must learn a representation which generalizes and predicts observed behavior in underlying individual data (e.g. attributes or labels). Whether given or inferred, choosing the best representation affects subsequent tasks and questions on the network. This work focuses on model selection to evaluate network representations from data, with an emphasis on fundamental predictive tasks on networks. We present a modular methodology using general, interpretable network models, task neighborhood functions found across domains, and several criteria for robust model selection. We demonstrate our methodology on three online user activity datasets and show that network model selection for the appropriate network task vs. an alternate task increases performance by an order of magnitude in our experiments.
Submitted 16 September, 2017; v1 submitted 21 August, 2017;
originally announced August 2017.
-
Evaluating Social Networks Using Task-Focused Network Inference
Authors:
Ivan Brugere,
Chris Kanich,
Tanya Y. Berger-Wolf
Abstract:
Networks are representations of complex underlying social processes. However, the same given network may be more suitable to model one behavior of individuals than another. In many cases, aggregate population models may be more effective than modeling on the network. We present a general framework for evaluating the suitability of given networks for a set of predictive tasks of interest, compared against alternative networks inferred from data. We present several interpretable network models and measures for our comparison. We apply this general framework to a case study on collective classification of music preferences in a newly available dataset of the Last.fm social network.
Submitted 7 July, 2017;
originally announced July 2017.
-
A General Framework For Task-Oriented Network Inference
Authors:
Ivan Brugere,
Chris Kanich,
Tanya Y. Berger-Wolf
Abstract:
We present a brief introduction to a flexible, general network inference framework which models data as a network space, sampled to optimize network structure to a particular task. We introduce a formal problem statement related to influence maximization in networks, where the network structure is not given as input, but learned jointly with an influence maximization solution.
Submitted 1 May, 2017;
originally announced May 2017.
-
Network Structure Inference, A Survey: Motivations, Methods, and Applications
Authors:
Ivan Brugere,
Brian Gallagher,
Tanya Y. Berger-Wolf
Abstract:
Networks represent relationships between entities in many complex systems, spanning from online social interactions to biological cell development and brain connectivity. In many cases, relationships between entities are unambiguously known: are two users 'friends' in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These are directly observable in the system in question. In most cases, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals who physically co-locate have a social bond? Who infected whom in a disease outbreak in a population?
Existing approaches for inferring networks from data are found across many application domains and use specialized knowledge to infer and measure the quality of the inferred network for a specific task or hypothesis. However, current research lacks a rigorous methodology which employs standard statistical validation on inferred models. In this survey, we examine (1) how network representations are constructed from underlying data, (2) the variety of questions and tasks on these representations over several domains, and (3) validation strategies for measuring the inferred network's capability of answering questions on the system of interest.
Submitted 19 January, 2018; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Coordination Event Detection and Initiator Identification in Time Series Data
Authors:
Chainarong Amornbunchornvej,
Ivan Brugere,
Ariana Strandburg-Peshkin,
Damien Farine,
Margaret C. Crofoot,
Tanya Y. Berger-Wolf
Abstract:
Behavior initiation is a form of leadership and is an important aspect of social organization that affects the processes of group formation, dynamics, and decision-making in human societies and other social animal species. In this work, we formalize the "Coordination Initiator Inference Problem" and propose a simple yet powerful framework for extracting periods of coordinated activity and determining individuals who initiated this coordination, based solely on the activity of individuals within a group during those periods. The proposed approach, given arbitrary individual time series, automatically (1) identifies times of coordinated group activity, (2) determines the identities of initiators of those activities, and (3) classifies the likely mechanism by which the group coordination occurred, all of which are novel computational tasks. We demonstrate our framework on both simulated and real-world data: tracked trajectories of animals as well as stock market data. Our method is competitive with existing global leadership inference methods but provides the first approaches for local leadership and coordination mechanism classification. Our results are consistent with ground-truthed biological data, and the framework finds many known events in financial data which are not otherwise reflected in the aggregate NASDAQ index. Our method is easily generalizable to any coordinated time-series data from interacting entities.
Submitted 23 November, 2019; v1 submitted 4 March, 2016;
originally announced March 2016.
-
Benefits of Bias: Towards Better Characterization of Network Sampling
Authors:
Arun S. Maiya,
Tanya Y. Berger-Wolf
Abstract:
From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they "push" the sampling process towards inclusion of desired properties. Finally, we describe how these sampling biases can be exploited in several real-world applications, including disease outbreak detection and market research.
Submitted 18 September, 2011;
originally announced September 2011.
-
An Implicit Cover Problem in Wild Population Study
Authors:
Mary V. Ashley,
Tanya Y. Berger-Wolf,
Wanpracha Chaovalitwongse,
Bhaskar DasGupta,
Ashfaq Khokhar,
Saad Sheikh
Abstract:
In an implicit combinatorial optimization problem, the constraints are not enumerated explicitly but rather stated implicitly through equations, other constraints or auxiliary algorithms. An important subclass of such problems is the implicit set cover (or, equivalently, hitting set) problem in which the sets are not given explicitly but rather defined implicitly. For example, the well-known minimum feedback arc set problem is such a problem. In this paper, we consider such a cover problem that arises in the study of wild populations in biology in which the sets are defined implicitly via the Mendelian constraints and prove approximability results for this problem.
Submitted 26 February, 2011;
originally announced February 2011.
-
Expansion and Search in Networks
Authors:
Arun S. Maiya,
Tanya Y. Berger-Wolf
Abstract:
Borrowing from concepts in expander graphs, we study the expansion properties of real-world, complex networks (e.g. social networks, unstructured peer-to-peer or P2P networks) and the extent to which these properties can be exploited to understand and address the problem of decentralized search. We first produce samples that concisely capture the overall expansion properties of an entire network, which we collectively refer to as the expansion signature. Using these signatures, we find a correspondence between the magnitude of maximum expansion and the extent to which a network can be efficiently searched. We further find evidence that standard graph-theoretic measures, such as average path length, fail to fully explain the level of "searchability" or ease of information diffusion and dissemination in a network. Finally, we demonstrate that this high expansion can be leveraged to facilitate decentralized search in networks and show that an expansion-based search strategy outperforms typical search methods.
Submitted 1 September, 2011; v1 submitted 22 September, 2010;
originally announced September 2010.
-
Sharp Bounds for Bandwidth of Clique Products
Authors:
Tanya Y. Berger-Wolf,
Mitchell A. Harris
Abstract:
The bandwidth of a graph is the minimum, over all labelings of its vertices, of the maximum edge difference. For many graph families, computing it is NP-complete. A classic result computes the bandwidth for the hypercube. We generalize this result to give sharp lower bounds for products of cliques. This problem turns out to be equivalent to one in communication over multiple channels in which channels can fail and the information sent over those channels is lost. The goal is to create an encoding that minimizes the difference between the received and the original information while having as little redundancy as possible. Berger-Wolf and Reingold [2] have considered the problem for equal-size cliques (or equal-capacity channels). This paper presents a tight lower bound and an algorithm for constructing the labeling for the product of any number of arbitrary-size cliques.
Submitted 28 May, 2003;
originally announced May 2003.
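To make the bandwidth definition in this abstract concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes "product of cliques" means the Cartesian product (vertices are coordinate tuples, adjacent when they differ in exactly one coordinate); the helper names `clique_product` and `bandwidth` are hypothetical, and the exhaustive search is only feasible for very small graphs.

```python
from itertools import permutations, product


def clique_product(sizes):
    """Build the Cartesian product of cliques K_s for each s in sizes.

    Assumption: vertices are coordinate tuples, and two vertices are
    adjacent when they differ in exactly one coordinate.
    """
    vertices = list(product(*(range(s) for s in sizes)))
    edges = [
        (u, v)
        for i, u in enumerate(vertices)
        for v in vertices[i + 1:]
        if sum(a != b for a, b in zip(u, v)) == 1
    ]
    return vertices, edges


def bandwidth(vertices, edges):
    """Try every labeling and keep the smallest maximum edge difference."""
    best = None
    for perm in permutations(range(len(vertices))):
        label = dict(zip(vertices, perm))
        worst = max((abs(label[u] - label[v]) for u, v in edges), default=0)
        if best is None or worst < best:
            best = worst
    return best
```

For example, `bandwidth(*clique_product([2, 2]))` evaluates the 4-cycle (the 2-dimensional hypercube) and returns 2, while a single clique `clique_product([4])` gives 3.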
-
Index Assignment for Multichannel Communication under Failure
Authors:
Tanya Y. Berger-Wolf,
Edward M. Reingold
Abstract:
We consider the problem of multiple description scalar quantizers and of describing the achievable rate-distortion tuples in that setting. We formulate it as a combinatorial optimization problem of arranging numbers in a matrix to minimize the maximum difference between the largest and the smallest number in any row or column. We develop a technique for deriving lower bounds on the distortion at given channel rates. The approach is constructive, thus allowing an algorithm that gives a closely matching upper bound. For the case of two communication channels with equal rates, the bounds coincide, thus giving the precise lowest achievable distortion at fixed rates. The bounds are within a small constant for a higher number of channels. To the best of our knowledge, this is the first result concerning systems with more than two communication channels. The problem is also equivalent to the bandwidth minimization problem of Hamming graphs.
Submitted 30 November, 2000;
originally announced November 2000.
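The matrix formulation in this abstract can be illustrated with a small sketch of the objective and an exhaustive search. This is a simplified illustration, not the paper's constructive method: it assumes the numbers 0 to rows*cols - 1 fill every cell of a two-dimensional matrix, which may differ from the paper's setting, and the function names `max_line_spread` and `best_arrangement` are hypothetical.

```python
from itertools import permutations


def max_line_spread(matrix):
    """Largest (max - min) over all rows and columns of the matrix."""
    lines = [list(row) for row in matrix] + [list(col) for col in zip(*matrix)]
    return max(max(line) - min(line) for line in lines)


def best_arrangement(rows, cols):
    """Exhaustively place 0..rows*cols-1 to minimize the maximum spread.

    Assumption: every cell is filled; only practical for tiny matrices.
    """
    best_spread, best_matrix = None, None
    for perm in permutations(range(rows * cols)):
        matrix = [list(perm[r * cols:(r + 1) * cols]) for r in range(rows)]
        spread = max_line_spread(matrix)
        if best_spread is None or spread < best_spread:
            best_spread, best_matrix = spread, matrix
    return best_spread, best_matrix
```

For a 2 x 2 matrix, `best_arrangement(2, 2)` returns a spread of 2 with the arrangement `[[0, 1], [2, 3]]`; no arrangement can reach 1, because the cell holding 0 would need the value 1 as both its row neighbor and its column neighbor.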