-
Forecasting Local Ionospheric Parameters Using Transformers
Authors:
Daniel J. Alford-Lago,
Christopher W. Curtis,
Alexander T. Ihler,
Katherine A. Zawdie,
Douglas P. Drob
Abstract:
We present a novel method for forecasting key ionospheric parameters using transformer-based neural networks. The model provides accurate forecasts and uncertainty quantification of the F2-layer peak plasma frequency (foF2), the F2-layer peak density height (hmF2), and total electron content (TEC) for a given geographic location. It supports a number of exogenous variables, including F10.7 cm solar flux and the disturbance storm time (Dst) index. We demonstrate how transformers can be trained in a data-assimilation-like fashion that uses these exogenous variables, along with naïve predictions from climatology, to generate 24-hour forecasts with non-parametric uncertainty bounds. We call this method the Local Ionospheric Forecast Transformer (LIFT). We demonstrate that the trained model can generalize to geographic locations and time periods not seen during training, and we compare its performance to that of the International Reference Ionosphere (IRI).
Submitted 20 February, 2025;
originally announced February 2025.
-
Graph-based Complexity for Causal Effect by Empirical Plug-in
Authors:
Rina Dechter,
Annie Raichev,
Alexander Ihler,
Jin Tian
Abstract:
This paper focuses on the computational complexity of computing empirical plug-in estimates for causal effect queries. Given a causal graph and observational data, any identifiable causal query can be estimated from an expression over the observed variables, called the estimand. The estimand can then be evaluated by plugging in probabilities computed empirically from data. Contrary to conventional wisdom, which assumes that high-dimensional probabilistic functions lead to exponential evaluation time of the estimand, we show that computation can be done efficiently, potentially in time linear in the data size, depending on the estimand's hypergraph.
In particular, we show that both the treewidth and hypertree width of the estimand's structure bound the evaluation complexity of the plug-in estimands, analogous to their role in the complexity of probabilistic inference in graphical models. Often, the hypertree width provides a more effective bound, since the empirical distributions are sparse.
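To make the plug-in idea concrete, here is a minimal Python sketch for one familiar estimand, the back-door adjustment $P(Y=1|do(X=x)) = \sum_z P(Y=1|x,z)P(z)$. It assumes binary Y and a single discrete adjustment variable Z (both assumptions for illustration; the paper treats general estimands and their hypergraph structure). The empirical probabilities come from one counting pass over the data, and the sum runs only over z values that actually occur, so the cost tracks the data size rather than the full domain of Z.

```python
from collections import Counter


def plugin_backdoor(samples, x):
    """Plug-in estimate of P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z).

    `samples` is a list of (z, x, y) tuples with y in {0, 1}; the back-door
    estimand and this tuple layout are illustrative assumptions.
    """
    n = len(samples)
    count_z = Counter()    # N(z)
    count_zx = Counter()   # N(z, x)
    count_zx1 = Counter()  # N(z, x, y=1)
    for z, xv, y in samples:  # single linear pass over the data
        count_z[z] += 1
        count_zx[(z, xv)] += 1
        if y == 1:
            count_zx1[(z, xv)] += 1

    estimate = 0.0
    # Sparse sum: only z values observed in the data contribute.
    for z, nz in count_z.items():
        nzx = count_zx[(z, x)]
        if nzx == 0:
            raise ValueError(f"no samples with X={x} in stratum Z={z}")
        estimate += (count_zx1[(z, x)] / nzx) * (nz / n)
    return estimate


data = [(0, 1, 1), (0, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 1)]
print(plugin_backdoor(data, x=1))
```

Raising an error on an empty stratum is a design choice for the sketch: it flags a positivity violation instead of silently dropping that stratum from the sum.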
Submitted 15 November, 2024;
originally announced November 2024.
-
Estimating Causal Effects from Learned Causal Networks
Authors:
Anna Raichev,
Alexander Ihler,
Jin Tian,
Rina Dechter
Abstract:
The standard approach to answering an identifiable causal-effect query (e.g., $P(Y|do(X))$) given a causal diagram and observational data is to first generate an estimand, a probabilistic expression over the observable variables, which is then evaluated using the observational data. In this paper, we propose an alternative paradigm for answering causal-effect queries over discrete observable variables: instead, we learn the causal Bayesian network and its confounding latent variables directly from the observational data. Efficient probabilistic graphical model (PGM) algorithms can then be applied to the learned model to answer queries. Perhaps surprisingly, we show that this \emph{model completion} learning approach can be more effective than estimand approaches, particularly for larger models in which the estimand expressions become computationally difficult.
We illustrate our method's potential using a benchmark collection of Bayesian networks and synthetically generated causal models.
Submitted 27 August, 2024; v1 submitted 26 August, 2024;
originally announced August 2024.
-
A Hybrid Reactive Routing Protocol for Decentralized UAV Networks
Authors:
Shivam Garg,
Alexander Ihler,
Elizabeth Serena Bentley,
Sunil Kumar
Abstract:
Wireless networks consisting of low-SWaP (size, weight, and power) fixed-wing UAVs (FW-UAVs) are used in many applications, such as monitoring, search, and surveillance of inaccessible areas. A decentralized and autonomous approach ensures robustness to failures; the UAVs explore and sense within the area and forward their information, in a multihop manner, to nearby aerial gateway nodes. However, the unpredictable nature of the events, the relatively high speed of UAVs, and dynamic UAV trajectories cause the network topology to change significantly over time, resulting in frequent route breaks. A holistic routing approach is needed to support multiple traffic flows in these networks: one that provides mobility- and congestion-aware, high-quality routes when needed, with low control and computational overheads, using information collected in a distributed manner. Existing routing schemes do not address all of these issues.
We present a hybrid reactive routing protocol for decentralized UAV networks. Our scheme searches routes on-demand, monitors a region around the selected route (the pipe), and proactively switches to an alternative route before the current route's quality degrades below a threshold. We empirically evaluate the impact of pipe width and node density on our ability to find alternate high-quality routes within the pipe and on the overhead required to maintain the pipe. Compared to existing reactive routing schemes, our approach achieves higher throughput and reduces the number of route discoveries, the overhead, and the resulting flow interruptions across different traffic loads, node densities, and speeds. Despite using limited network topology information and incurring low overhead and route computation complexity, our scheme achieves higher throughput than the proactive Optimized Link State Routing (OLSR) scheme across different network and traffic settings. We also evaluate the relative performance of reactive and proactive routing schemes.
Submitted 23 January, 2025; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Pipe Routing with Topology Control for UAV Networks
Authors:
Shreyas Devaraju,
Shivam Garg,
Alexander Ihler,
Sunil Kumar
Abstract:
Routing protocols help transmit the sensed data from UAVs monitoring the targets (called target UAVs) to the base station (BS). However, the highly dynamic nature of an autonomous, decentralized UAV network leads to frequent route breaks or traffic disruptions. Traditional routing schemes cannot quickly adapt to dynamic UAV networks and/or incur large control overhead and delays. To establish stable, high-quality routes from target UAVs to the BS, we design a hybrid reactive routing scheme called pipe routing that is mobility-, congestion-, and energy-aware. The pipe routing scheme discovers routes on-demand and proactively switches to alternate high-quality routes within a limited region around the active routes (called the pipe) when needed, reducing the number of route breaks and increasing data throughput. We then design a novel topology control-based pipe routing scheme to maintain robust connectivity in the pipe region around the active routes, leading to improved route stability and increased throughput with minimal impact on the coverage performance of the UAV network.
Submitted 7 May, 2024;
originally announced May 2024.
-
A Deep Q-Learning based, Base-Station Connectivity-Aware, Decentralized Pheromone Mobility Model for Autonomous UAV Networks
Authors:
Shreyas Devaraju,
Alexander Ihler,
Sunil Kumar
Abstract:
UAV networks consisting of low SWaP (size, weight, and power), fixed-wing UAVs are used in many applications, including area monitoring, search and rescue, surveillance, and tracking. Performing these operations efficiently requires a scalable, decentralized, autonomous UAV network architecture with high network connectivity. Whereas fast area coverage is needed for quickly sensing the area, strong node degree and base station (BS) connectivity are needed for UAV control and coordination and for transmitting sensed information to the BS in real time. However, area coverage and connectivity exhibit a fundamental trade-off: maintaining connectivity restricts the UAVs' ability to explore. In this paper, we first present a node degree and BS connectivity-aware distributed pheromone (BS-CAP) mobility model to autonomously coordinate the UAV movements in a decentralized UAV network. This model maintains a desired connectivity among 1-hop neighbors and to the BS while achieving fast area coverage. Next, we propose a BS-CAP model with a deep Q-learning policy (BSCAP-DQN) to further tune and improve the coverage and connectivity trade-off. Since it is not practical to know the complete topology of such a network in real time, the proposed mobility models work online, are fully distributed, and rely on neighborhood information. Our simulations demonstrate that both proposed models achieve efficient area coverage and the desired node degree and BS connectivity, improving significantly over existing schemes.
Submitted 27 November, 2023;
originally announced November 2023.
-
Boosting AND/OR-Based Computational Protein Design: Dynamic Heuristics and Generalizable UFO
Authors:
Bobak Pezeshki,
Radu Marinescu,
Alexander Ihler,
Rina Dechter
Abstract:
Scientific computing has experienced a surge driven by advances in technologies such as neural networks. However, certain important tasks are less amenable to these technologies and benefit instead from innovations to traditional inference schemes. One such task is protein re-design. Recently, a new re-design algorithm, AOBB-K*, was introduced and was competitive with the state-of-the-art BBK* on small protein re-design problems. However, AOBB-K* did not scale well. In this work we focus on scaling up AOBB-K* and introduce three new versions: AOBB-K*-b (boosted), AOBB-K*-DH (with dynamic heuristics), and AOBB-K*-UFO (with underflow optimization) that significantly enhance scalability.
Submitted 31 August, 2023;
originally announced September 2023.
-
Connectivity-Aware Pheromone Mobility Model for Autonomous UAV Networks
Authors:
Shreyas Devaraju,
Alexander Ihler,
Sunil Kumar
Abstract:
UAV networks consisting of reduced size, weight, and power (low SWaP) fixed-wing UAVs are used for civilian and military applications such as search and rescue, surveillance, and tracking. To carry out these operations efficiently, there is a need to develop scalable, decentralized autonomous UAV network architectures with high network connectivity. However, the area coverage and the network connectivity requirements exhibit a fundamental trade-off. In this paper, we design a connectivity-aware pheromone mobility (CAP) model for search and rescue operations that is capable of maintaining connectivity among the UAVs in the network. We use stigmergy-based digital pheromone maps along with distance-based local connectivity information to autonomously coordinate the UAV movements, improving map coverage efficiency while maintaining high network connectivity.
Submitted 12 October, 2022;
originally announced October 2022.
-
Design Amortization for Bayesian Optimal Experimental Design
Authors:
Noble Kennamer,
Steven Walton,
Alexander Ihler
Abstract:
Bayesian optimal experimental design is a subfield of statistics focused on developing methods to make efficient use of experimental resources. Any potential design is evaluated in terms of a utility function, such as the (theoretically well-justified) expected information gain (EIG); unfortunately, under most circumstances the EIG is intractable to evaluate. In this work we build on successful variational approaches, which optimize a parameterized variational model with respect to bounds on the EIG. Past work focused on learning a new variational model from scratch for each new design considered. Here we present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs. To further improve computational efficiency, we also propose to train the variational model on a significantly cheaper-to-evaluate lower bound, and show empirically that the resulting model provides an excellent guide for more accurate, but more expensive-to-evaluate, bounds on the EIG. We demonstrate the effectiveness of our technique on generalized linear models, a class of statistical models that is widely used in the analysis of controlled experiments. Experiments show that our method greatly improves accuracy over existing approximation strategies and achieves these results with far better sample efficiency.
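The intractability mentioned above comes from the marginal likelihood $p(y \mid d)$ inside the EIG. The Python sketch below is not the paper's amortized variational method; it uses a toy linear-Gaussian model (an assumption, chosen because its EIG has the closed form $\frac{1}{2}\log(1 + d^2/\sigma^2)$) and compares that value with a plain nested Monte Carlo estimate. The inner loop over fresh prior samples is exactly the per-design cost that variational bounds aim to avoid.

```python
import math
import random


def gaussian_logpdf(y, mean, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (y - mean) ** 2 / (2.0 * sigma**2)


def eig_exact(design, sigma):
    """Closed-form EIG for y = theta * design + noise, theta ~ N(0, 1)."""
    return 0.5 * math.log(1.0 + design**2 / sigma**2)


def eig_nested_mc(design, sigma, n_outer=1000, n_inner=200, seed=0):
    """Nested Monte Carlo estimate of EIG = E[log p(y | theta) - log p(y)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 1.0)
        y = theta * design + rng.gauss(0.0, sigma)
        log_lik = gaussian_logpdf(y, theta * design, sigma)
        # Inner loop: estimate the marginal p(y) with fresh prior samples.
        inner = [gaussian_logpdf(y, rng.gauss(0.0, 1.0) * design, sigma)
                 for _ in range(n_inner)]
        m = max(inner)  # log-sum-exp for numerical stability
        log_marginal = m + math.log(sum(math.exp(v - m) for v in inner) / n_inner)
        total += log_lik - log_marginal
    return total / n_outer


print(eig_exact(1.0, 1.0), eig_nested_mc(1.0, 1.0))
```

The nested estimator is slightly biased upward for finite `n_inner`, and its cost is `n_outer * n_inner` likelihood evaluations for every design considered.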
Submitted 19 October, 2022; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks
Authors:
Litian Liang,
Yaosheng Xu,
Stephen McAleer,
Dailin Hu,
Alexander Ihler,
Pieter Abbeel,
Roy Fox
Abstract:
In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble methods; however, none has shown success in sample-efficient learning by addressing estimation variance as the root cause of overestimation. In this paper, we propose MeanQ, a simple ensemble method that estimates target values as ensemble means. Despite its simplicity, MeanQ shows remarkable sample efficiency in experiments on the Atari benchmark of the Arcade Learning Environment. Importantly, we find that an ensemble of size 5 sufficiently reduces estimation variance to obviate the lagging target network, eliminating it as a source of bias and further gaining sample efficiency. We justify the design choices in MeanQ intuitively and empirically, including the necessity of independent experience sampling. On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average. MeanQ also outperforms Rainbow DQN at 500K steps in 21/26 environments, and by 49% on average, and achieves average human-level performance using 200K ($\pm$100K) interaction steps. Our implementation is available at https://github.com/indylab/MeanQ.
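The central MeanQ idea, estimating the target from the ensemble mean, fits in a few lines. The Python sketch below assumes a discrete action space and ensemble Q-values already computed for the next state; network training, replay sampling, and the paper's other design choices are omitted, so treat it as an illustration rather than the linked implementation.

```python
def meanq_target(reward, next_q_ensemble, gamma=0.99, done=False):
    """TD target using the ensemble-mean Q-value at the next state.

    next_q_ensemble[k][a] is ensemble member k's estimate of Q(s', a).
    """
    if done:
        return reward
    k = len(next_q_ensemble)
    n_actions = len(next_q_ensemble[0])
    # Average first, then maximise: the mean has lower variance than any
    # single member, which reduces the upward bias introduced by the max.
    mean_q = [sum(q[a] for q in next_q_ensemble) / k for a in range(n_actions)]
    return reward + gamma * max(mean_q)


ensemble = [[1.0, 3.0], [3.0, 1.0]]  # two members that disagree
print(meanq_target(1.0, ensemble, gamma=0.5))  # 2.0
```

Either member alone would bootstrap from a max of 3.0 here, while the ensemble mean is 2.0 for both actions, which is the overestimation the abstract attributes to estimation variance.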
Submitted 15 September, 2022;
originally announced September 2022.
-
Scale-Separated Dynamic Mode Decomposition and Ionospheric Forecasting
Authors:
Daniel J. Alford-Lago,
Christopher W. Curtis,
Alexander T. Ihler,
Katherine A. Zawdie
Abstract:
We present a method for forecasting the foF2 and hmF2 parameters using modal decompositions of ionospheric electron density profile time series. Our method is based on the Dynamic Mode Decomposition (DMD), which provides a means of determining spatiotemporal modes from measurements alone. DMD models are easily updated as new data is recorded and do not require any physics to inform the dynamics. However, in the case of ionospheric profiles, we find a wide range of oscillations, including some far above the diurnal frequency. Therefore, we propose nontrivial extensions to DMD using wavelet decompositions. We call this method the Scale-Separated Dynamic Mode Decomposition (SSDMD) since the wavelets isolate fluctuations at different time scales in the data into separated components. We show that this method provides a stable reconstruction of the peak plasma density and can be used to predict the state of foF2 and hmF2 at future time steps. We demonstrate the SSDMD method on data sets covering periods of high and low solar activity as well as low, mid, and high latitude locations.
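DMD, which SSDMD builds on, fits a linear operator that advances the measured state by one time step and reads oscillation frequencies and growth rates off its eigenvalues. The pure-Python sketch below shows that step on a toy two-dimensional state using a plain least-squares fit. This is an assumption for illustration: real ionospheric profiles are high-dimensional and would use an SVD-based (exact) DMD, and the wavelet scale separation that defines SSDMD is not shown.

```python
import cmath
import math


def fit_dmd_operator(snapshots):
    """Least-squares linear operator A with x_{k+1} ~= A x_k for 2-D states.

    Solves A = (Y X^T)(X X^T)^{-1}, where the columns of X are x_0..x_{m-1}
    and the columns of Y are x_1..x_m. Assumes X X^T is invertible.
    """
    xs, ys = snapshots[:-1], snapshots[1:]
    sxx = [[sum(x[i] * x[j] for x in xs) for j in range(2)] for i in range(2)]
    syx = [[sum(y[i] * x[j] for x, y in zip(xs, ys)) for j in range(2)] for i in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eigenvalues_2x2(a):
    """DMD eigenvalues: |lambda| is the per-step growth/decay, arg(lambda) the frequency."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def forecast(a, x, steps):
    """Advance the state by repeatedly applying the fitted operator."""
    for _ in range(steps):
        x = [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]
    return x


# Synthetic data: a damped oscillation, 0.95 * rotation by 0.3 rad per step.
r, theta = 0.95, 0.3
true_a = [[r * math.cos(theta), -r * math.sin(theta)],
          [r * math.sin(theta), r * math.cos(theta)]]
data = [[1.0, 0.0]]
for _ in range(20):
    data.append(forecast(true_a, data[-1], 1))

a = fit_dmd_operator(data)
lam = eigenvalues_2x2(a)[0]
print(abs(lam), abs(cmath.phase(lam)))  # close to 0.95 and 0.3
```

Because the model is fitted from measurements alone, updating it as new data arrive only requires adding terms to the two running sums.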
Submitted 5 September, 2022; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Accurate Link Lifetime Computation in Autonomous Airborne UAV Networks
Authors:
Shivam Garg,
Alexander Ihler,
Sunil Kumar
Abstract:
An autonomous airborne network (AN) consists of multiple unmanned aerial vehicles (UAVs), which can self-configure to provide seamless, low-cost and secure connectivity. ANs are preferred for applications in civilian and military sectors because they can improve network reliability and fault tolerance, reduce mission completion time through collaboration, and adapt to dynamic mission requirements. However, facilitating seamless communication in such ANs is challenging due to their fast node mobility, which results in frequent link disruptions. To reduce the high uncertainty in the mobility pattern and simplify the calculation of link lifetime (LLT), many existing AN-specific mobility-aware schemes restrictively assume that UAVs fly in straight lines. Here, LLT is the duration after which the link between a node pair terminates. This assumption severely limits the applicability of such schemes and makes them unsuitable for practical autonomous ANs.
In this report, we describe a mathematical framework to accurately compute the LLT value for a UAV node pair in which each node flies independently along a randomly selected smooth trajectory. The impact of random trajectory changes on LLT accuracy is also discussed.
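For contrast with the report's random-trajectory framework, the classic straight-line computation that the abstract calls restrictive is a single quadratic. With relative position $p$, relative velocity $v$, and communication range $R$, the link breaks at the positive root of $|p + vt| = R$. The Python sketch below implements only that baseline; it is not the report's method, and the sharp disk-shaped communication range is an assumption.

```python
import math


def straight_line_llt(p_i, p_j, v_i, v_j, comm_range):
    """Link lifetime for two nodes moving in straight lines at constant velocity.

    Solves |p + v t| = R for the positive root t, where p and v are the
    relative position and velocity (node i minus node j).
    """
    p = [a - b for a, b in zip(p_i, p_j)]
    v = [a - b for a, b in zip(v_i, v_j)]
    pp = sum(c * c for c in p)
    pv = sum(a * b for a, b in zip(p, v))
    vv = sum(c * c for c in v)
    r2 = comm_range ** 2
    if pp > r2:
        return 0.0  # already out of range
    if vv == 0.0:
        return math.inf  # identical velocities: the distance never changes
    disc = pv * pv - vv * (pp - r2)  # non-negative because pp <= r2
    return (-pv + math.sqrt(disc)) / vv


# Node i starts 50 m behind a stationary node j and flies past it at 10 m/s;
# the communication range is 100 m.
print(straight_line_llt((-50.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                        (10.0, 0.0, 0.0), (0.0, 0.0, 0.0), 100.0))  # 15.0
```

Any turn or speed change invalidates this closed form, which is why trajectory changes matter for LLT accuracy.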
Submitted 31 January, 2022;
originally announced February 2022.
-
Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates
Authors:
Litian Liang,
Yaosheng Xu,
Stephen McAleer,
Dailin Hu,
Alexander Ihler,
Pieter Abbeel,
Roy Fox
Abstract:
Temporal-Difference (TD) learning methods, such as Q-Learning, have proven effective at learning a policy to perform control tasks. One issue with methods like Q-Learning is that the value update introduces bias when predicting the TD target of an unfamiliar state. Estimation noise becomes a bias after the max operator in the policy improvement step, and carries over to value estimations of other states, causing Q-Learning to overestimate the Q value. Algorithms like Soft Q-Learning (SQL) introduce the notion of a soft-greedy policy, which reduces the estimation bias via soft updates in early stages of training. However, the inverse temperature $β$ that controls the softness of an update is usually set by a hand-designed heuristic, which can be inaccurate at capturing the uncertainty in the target estimate. Under the belief that $β$ is closely related to the (state-dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $β$ by maintaining a collection of the model parameters that characterizes model uncertainty. In this paper, we present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two-action, finite-state spaces to multi-action, infinite-state-space Markov Decision Processes. We also provide a principled numerical scheduling of $β$, extended from SQL and using model uncertainty, during the optimization process. We present theoretical guarantees for this update method and demonstrate its effectiveness in experiments on several discrete control environments.
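The role of the inverse temperature $β$ is easiest to see in the soft state value used by soft Q-learning. The Python sketch below assumes a uniform prior over a discrete action set (an assumption; priors and the $β$ schedule differ across SQL, EQL, and UQL) and shows how $β$ interpolates between averaging and maximizing over next-state Q-values. This is why a low $β$ damps max-induced overestimation early in training.

```python
import math


def soft_value(q_values, beta):
    """Soft state value (1/beta) * log(mean_a exp(beta * Q(s, a))).

    Assumes a uniform prior over actions. Small beta approaches the mean of
    the Q-values (a cautious target); large beta approaches the max.
    """
    m = max(q_values)  # subtract the max before exponentiating for stability
    total = sum(math.exp(beta * (q - m)) for q in q_values)
    return m + math.log(total / len(q_values)) / beta


q = [1.0, 2.0, 3.0]
for beta in (1e-6, 1.0, 1e3):
    print(beta, soft_value(q, beta))
```

A schedule for $β$, whether a hand-designed heuristic or one driven by model uncertainty as in EQL and UQL, decides where on this mean-to-max range each update sits.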
Submitted 27 October, 2021;
originally announced October 2021.
-
Deep Learning Enhanced Dynamic Mode Decomposition
Authors:
Daniel J. Alford-Lago,
Christopher W. Curtis,
Alexander T. Ihler,
Opal Issan
Abstract:
Koopman operator theory shows how nonlinear dynamical systems can be represented as an infinite-dimensional, linear operator acting on a Hilbert space of observables of the system. However, determining the relevant modes and eigenvalues of this infinite-dimensional operator can be difficult. The extended dynamic mode decomposition (EDMD) is one such method for generating approximations to Koopman spectra and modes, but EDMD faces its own set of challenges because it requires user-defined observables. To address this issue, we explore the use of autoencoder networks to simultaneously find optimal families of observables that generate both accurate embeddings of the flow into a space of observables and submersions of the observables back into flow coordinates. This network yields a global transformation of the flow and affords future state prediction via the EDMD and the decoder network. We call this method the deep learning dynamic mode decomposition (DLDMD). The method is tested on canonical nonlinear data sets and is shown to produce results that outperform a standard DMD approach and enable data-driven prediction where the standard DMD fails.
Submitted 15 March, 2022; v1 submitted 9 August, 2021;
originally announced August 2021.
-
Active learning with RESSPECT: Resource allocation for extragalactic astronomical transients
Authors:
Noble Kennamer,
Emille E. O. Ishida,
Santiago Gonzalez-Gaitan,
Rafael S. de Souza,
Alexander Ihler,
Kara Ponder,
Ricardo Vilalta,
Anais Moller,
David O. Jones,
Mi Dai,
Alberto Krone-Martins,
Bruno Quint,
Sreevarsha Sreejith,
Alex I. Malz,
Lluis Galbany
Abstract:
The recent increase in volume and complexity of available astronomical data has led to widespread use of supervised machine learning techniques. Active learning strategies have been proposed as an alternative to optimize the distribution of scarce labeling resources. However, due to the specific conditions in which labels can be acquired, fundamental assumptions, such as sample representativeness and labeling cost stability, cannot be fulfilled. The Recommendation System for Spectroscopic follow-up (RESSPECT) project aims to enable the construction of optimized training samples for the Rubin Observatory Legacy Survey of Space and Time (LSST), taking into account a realistic description of the astronomical data environment. In this work, we test the robustness of active learning techniques in a realistic simulated astronomical data scenario. Our experiment takes into account the evolution of training and pool samples, different costs per object, and two different sources of budget. Results show that traditional active learning strategies significantly outperform random sampling. Nevertheless, more complex batch strategies do not significantly outperform simple uncertainty sampling techniques. Our findings illustrate three important points: 1) active learning strategies are a powerful tool to optimize the label-acquisition task in astronomy, 2) for upcoming large surveys like LSST, such techniques allow us to tailor the construction of the training sample for the first day of the survey, and 3) the peculiar data environment related to the detection of astronomical transients is a fertile ground that calls for the development of tailored machine learning algorithms.
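Uncertainty sampling, the simple strategy that the batch methods fail to beat, can be sketched in a few lines once labeling costs are included. The Python example below is illustrative only. It assumes a binary classifier's predicted probabilities, a single labeling budget (the paper considers two budget sources), and a greedy rule that skips objects too expensive for the remaining budget; none of this is claimed to match the RESSPECT pipeline.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a binary classifier's predicted probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_under_budget(candidates, budget):
    """Greedy uncertainty sampling with per-object labeling costs.

    candidates: list of (object_id, predicted_probability, cost).
    Picks the most uncertain objects first, skipping any whose cost would
    exceed the remaining budget.
    """
    ranked = sorted(candidates, key=lambda c: binary_entropy(c[1]), reverse=True)
    chosen, spent = [], 0.0
    for obj_id, _, cost in ranked:
        if spent + cost <= budget:
            chosen.append(obj_id)
            spent += cost
    return chosen


pool = [("a", 0.50, 2.0), ("b", 0.90, 1.0), ("c", 0.45, 3.0), ("d", 0.60, 1.0)]
print(select_under_budget(pool, budget=4.0))  # ['a', 'd', 'b']
```

Object "c" is the second most uncertain but is skipped because its cost no longer fits, which shows how varying per-object costs break the usual assumption that every label is equally cheap.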
Submitted 26 October, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Learning Infinite RBMs with Frank-Wolfe
Authors:
Wei Ping,
Qiang Liu,
Alexander Ihler
Abstract:
In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.
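The sparsity that lets each Frank-Wolfe iteration add one hidden unit already shows up in a toy problem. The Python sketch below runs Frank-Wolfe with the standard $2/(t+2)$ step size on a quadratic over the probability simplex, an assumed stand-in for the RBM likelihood. The linear subproblem is solved by a single vertex, so each iteration adds at most one new nonzero coordinate, mirroring how the infinite RBM grows one hidden unit at a time.

```python
def frank_wolfe_simplex(target, iters):
    """Minimise 0.5 * ||p - target||^2 over the probability simplex."""
    n = len(target)
    p = [0.0] * n
    p[0] = 1.0  # start at a vertex, i.e. a single active component
    for t in range(iters):
        grad = [pk - ck for pk, ck in zip(p, target)]
        # Linear minimisation over the simplex is solved by one vertex:
        # the coordinate with the smallest gradient entry.
        i = min(range(n), key=lambda k: grad[k])
        gamma = 2.0 / (t + 2.0)
        p = [(1.0 - gamma) * pk for pk in p]
        p[i] += gamma  # move toward vertex e_i, adding at most one new atom
    return p


def objective(p, target):
    return 0.5 * sum((pk - ck) ** 2 for pk, ck in zip(p, target))


target = [0.4, 0.3, 0.2, 0.1, 0.0]
for iters in (1, 3, 200):
    p = frank_wolfe_simplex(target, iters)
    support = sum(1 for pk in p if pk > 0.0)
    print(iters, support, round(objective(p, target), 4))
```

Stopping early yields a small, sparse model; in the RBM setting, that is how the number of hidden units can be chosen during the optimization.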
Submitted 14 October, 2017;
originally announced October 2017.
-
Multi-Person Pose Estimation via Column Generation
Authors:
Shaofei Wang,
Chong Zhang,
Miguel A. Gonzalez-Ballester,
Alexander Ihler,
Julian Yarkony
Abstract:
We study the problem of multi-person pose estimation in natural images. A pose estimate describes the spatial position and identity (head, foot, knee, etc.) of every non-occluded body part of a person. Pose estimation is difficult due to issues such as deformation and variation in body configurations and occlusion of parts, while multi-person settings add complications such as an unknown number of people, with unknown appearance and possible interactions in their poses and part locations. We give a novel integer program formulation of the multi-person pose estimation problem, in which variables correspond to assignments of parts in the image to poses in a two-tier, hierarchical way. This enables us to develop an efficient custom optimization procedure based on column generation, where columns are produced by exact optimization of very small scale integer programs. We demonstrate improved accuracy and speed for our method on the MPII multi-person pose estimation benchmark.
Submitted 18 September, 2017;
originally announced September 2017.
-
Belief Propagation in Conditional RBMs for Structured Prediction
Authors:
Wei Ping,
Alexander Ihler
Abstract:
Restricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants. Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]), and not as good as CD when applied in learning (e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units. We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems. We also include practical guidelines on training CRBMs with BP, and some insights on the interaction of learning and inference algorithms for CRBMs.
Submitted 2 March, 2017;
originally announced March 2017.
-
Decomposition Bounds for Marginal MAP
Authors:
Wei Ping,
Qiang Liu,
Alexander Ihler
Abstract:
Marginal MAP inference involves making MAP predictions in systems defined with latent variables or missing information. It is significantly more difficult than pure marginalization and MAP tasks, for which a large class of efficient and convergent variational algorithms, such as dual decomposition, exist. In this work, we generalize dual decomposition to a generic power sum inference task, which includes marginal MAP, along with pure marginalization and MAP, as special cases. Our method is based on a block coordinate descent algorithm on a new convex decomposition bound, that is guaranteed to converge monotonically, and can be parallelized efficiently. We demonstrate our approach on marginal MAP queries defined on real-world problems from the UAI approximate inference challenge, showing that our framework is faster and more reliable than previous methods.
Submitted 9 November, 2015;
originally announced November 2015.
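A minimal sketch of the power sum operator mentioned in the abstract, assuming the common definition for nonnegative values, (Σ_x f(x)^(1/w))^w. A weight w = 1 gives ordinary summation and w → 0+ approaches maximization, so a marginal MAP value (max over some variables, sum over the rest) can be written as nested power sums with mixed weights. The function `power_sum` and the table `f` are hypothetical illustrations. The paper's decomposition bound and block coordinate descent are not shown.

```python
import math

def power_sum(values, w):
    """Power sum (sum_i v_i ** (1/w)) ** w for nonnegative values and weight w > 0.

    w = 1 gives the ordinary sum; as w -> 0+ the result approaches max(values).
    """
    if w <= 0:
        raise ValueError("w must be positive; use max() for the w -> 0 limit")
    m = max(values)
    if m == 0:
        return 0.0
    # Factor out the maximum so each ratio is <= 1 and the 1/w power cannot overflow.
    return m * sum((v / m) ** (1.0 / w) for v in values) ** w

vals = [0.2, 0.5, 0.3]
print(power_sum(vals, 1.0))   # ordinary sum, about 1.0
print(power_sum(vals, 0.01))  # close to max(vals) = 0.5

# Hypothetical table f[x][y]: max over x of the sum over y (a marginal MAP value),
# written as an outer power sum with a small weight over inner power sums with weight 1.
f = [[0.10, 0.05, 0.05],
     [0.02, 0.30, 0.08]]
mmap_value = power_sum([power_sum(row, 1.0) for row in f], 0.01)
print(mmap_value)  # close to max(0.20, 0.40) = 0.40
```

Intermediate weights between 0 and 1 interpolate between the two operations, which is what lets one framework cover summation, maximization and their mixture.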
-
An Evaluation of Sparse Inverse Covariance Models for Group Functional Connectivity in Molecular Imaging
Authors:
David B. Keator,
Alexander Ihler
Abstract:
Evaluating the functional relationships between brain regions measured with neuroimaging provides insight into how the brain is sharing information at a macro scale. Many functional connectivity methods have been developed for dynamic imaging modalities such as functional MRI (fMRI), while less work has focused on models for static molecular imaging techniques such as FDG-PET and Tc-99m HMPAO SPECT across groups of individuals. In this work we provide a quantitative assessment of how well three functional connectivity models based on sparse inverse covariance estimation can accurately recover gold standard connectivity patterns across multiple cohorts and data set sizes. We compare the accuracies of learning regularized inverse covariance matrices across cohorts independently with those learned using two different group-based regularization models. By using large cohorts of SPECT scans, we are able to provide a quantitative assessment of the accuracy of the models in recovering the gold standard functional conn…
Submitted 28 October, 2015;
originally announced October 2015.
-
Distributed Estimation, Information Loss and Exponential Families
Authors:
Qiang Liu,
Alexander Ihler
Abstract:
Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE given the whole dataset. We study this framework's statistical properties, showing that the efficiency loss compared to the global setting relates to how much the underlying distribution families deviate from full exponential families, drawing a connection to the theory of information loss by Fisher, Rao and Efron. We show that the "full-exponential-family-ness" represents the lower bound of the error rate of arbitrary combinations of local MLEs, and is achieved by a KL-divergence-based combination method but not by a more common linear combination method. We also study the empirical properties of both methods, showing that the KL method significantly outperforms linear combination in practical settings with issues such as model misspecification, non-convexity, and heterogeneous data partitions.
Submitted 9 October, 2014;
originally announced October 2014.
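A minimal sketch contrasting the two combination rules for a one-dimensional Gaussian, which is a full exponential family. It assumes the KL-based rule minimizes Σ_i w_i KL(p_i || p_θ) with weights w_i = n_i / n, which for an exponential family reduces to matching the averaged expected sufficient statistics (here x and x²). The function names, the seed and the partition sizes are hypothetical; this is an illustration, not the authors' code.

```python
import random
import statistics

def local_mle(xs):
    """Gaussian MLE on one data subset: (mean, variance with 1/n normalization)."""
    return statistics.fmean(xs), statistics.pvariance(xs)

def linear_combination(estimates, sizes):
    """Size-weighted average of the local (mean, variance) parameters."""
    n = sum(sizes)
    mean = sum(k * mu for k, (mu, _) in zip(sizes, estimates)) / n
    var = sum(k * v for k, (_, v) in zip(sizes, estimates)) / n
    return mean, var

def kl_combination(estimates, sizes):
    """Moment matching, i.e. the minimizer of sum_i w_i KL(p_i || p_theta) in an exponential family.

    For a Gaussian the sufficient statistics are x and x**2, so we average
    E_i[x] = mu_i and E_i[x**2] = v_i + mu_i**2 with weights w_i = n_i / n.
    """
    n = sum(sizes)
    ex = sum(k * mu for k, (mu, _) in zip(sizes, estimates)) / n
    ex2 = sum(k * (v + mu * mu) for k, (mu, v) in zip(sizes, estimates)) / n
    return ex, ex2 - ex * ex

# Hypothetical heterogeneous partitions: each machine sees data with a different mean.
rng = random.Random(0)
parts = [[rng.gauss(mu, 1.0) for _ in range(n)] for mu, n in [(-2.0, 50), (0.0, 80), (3.0, 70)]]
sizes = [len(p) for p in parts]
estimates = [local_mle(p) for p in parts]

global_mle = local_mle([x for p in parts for x in p])
print("global:", global_mle)
print("KL    :", kl_combination(estimates, sizes))
print("linear:", linear_combination(estimates, sizes))
```

Because the Gaussian is a full exponential family, the moment-matching combination reproduces the pooled MLE up to rounding, while averaging the local variances drops the spread between the subset means.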
-
Marginal Structured SVM with Hidden Variables
Authors:
Wei Ping,
Qiang Liu,
Alexander Ihler
Abstract:
In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables. MSSVM properly accounts for the uncertainty of hidden variables, and can significantly outperform the previously proposed latent structured SVM (LSSVM; Yu & Joachims (2009)) and other state-of-the-art methods, especially when that uncertainty is large. Our method also results in a smoother objective function, making gradient-based optimization of MSSVMs converge significantly faster than for LSSVMs. We also show that our method consistently outperforms hidden conditional random fields (HCRFs; Quattoni et al. (2007)) on both simulated and real-world datasets. Furthermore, we propose a unified framework that includes both our and several other existing methods as special cases, and provides insights into the comparison of different models in practice.
Submitted 5 September, 2014; v1 submitted 4 September, 2014;
originally announced September 2014.
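A minimal sketch of why marginalizing hidden variables can change a prediction, assuming (as the name suggests) that the MSSVM scores a label by log Σ_h exp(w·φ(x, y, h)) where the LSSVM uses max_h w·φ(x, y, h). The scores below are made-up numbers, not results from the paper, and the loss and training procedure are not shown.

```python
import math

def lssvm_score(hidden_scores):
    """Latent-SSVM-style score of a label: max over hidden configurations h."""
    return max(hidden_scores)

def mssvm_score(hidden_scores):
    """Marginal score: log sum_h exp(score_h), computed stably by factoring out the max."""
    m = max(hidden_scores)
    return m + math.log(sum(math.exp(s - m) for s in hidden_scores))

# Hypothetical scores w . phi(x, y, h) for two candidate labels over four hidden values h.
label_a = [2.0, 0.0, 0.0, 0.0]  # one strong hidden explanation
label_b = [1.9, 1.9, 1.9, 1.9]  # many equally good hidden explanations

for name, s in [("a", label_a), ("b", label_b)]:
    print(name, lssvm_score(s), round(mssvm_score(s), 3))
```

Here the max score prefers label a while the marginal score prefers label b. The log-sum-exp also lies between max and max + log|H| and is differentiable in the scores, which is consistent with the smoother objective described above.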
-
Variational Algorithms for Marginal MAP
Authors:
Qiang Liu,
Alexander Ihler
Abstract:
The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of "mixed-product" message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel "argmax-product" message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods.
Submitted 17 July, 2013; v1 submitted 26 February, 2013;
originally announced February 2013.
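To make the problem definition concrete, a brute-force sketch on a hypothetical two-variable table shows that the marginal MAP assignment of x need not agree with the x-component of the joint MAP. Enumeration like this is exponential in general, which is why variational methods are of interest; the paper's algorithms are not implemented here.

```python
from itertools import product

X, Y = (0, 1), (0, 1, 2)

# Hypothetical joint distribution p(x, y); missing entries have probability 0.
p = {(0, 0): 0.4, (1, 1): 0.3, (1, 2): 0.3}

def joint_map(p):
    """Joint MAP: the single most probable (x, y) pair."""
    return max(product(X, Y), key=lambda xy: p.get(xy, 0.0))

def marginal_map(p):
    """Marginal MAP over x: argmax_x sum_y p(x, y), with y summed out."""
    return max(X, key=lambda x: sum(p.get((x, y), 0.0) for y in Y))

print(joint_map(p))     # (0, 0)
print(marginal_map(p))  # 1
```

The joint MAP picks x = 0 because of one heavy entry, while x = 1 carries more total mass (0.6 versus 0.4) once y is marginalized.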
-
A Cluster-Cumulant Expansion at the Fixed Points of Belief Propagation
Authors:
Max Welling,
Andrew E. Gelfand,
Alexander T. Ihler
Abstract:
We introduce a new cluster-cumulant expansion (CCE) based on the fixed points of iterative belief propagation (IBP). This expansion is similar in spirit to the loop-series (LS) recently introduced in [1]. However, in contrast to the latter, the CCE enjoys the following important qualities: 1) it is defined for arbitrary state spaces, 2) it is easily extended to fixed points of generalized belief propagation (GBP), 3) disconnected groups of variables will not contribute to the CCE, and 4) the accuracy of the expansion empirically improves upon that of the LS. The CCE is based on the same Möbius transform as the Kikuchi approximation, but unlike GBP does not require storing the beliefs of the GBP-clusters nor does it suffer from convergence issues during belief updating.
Submitted 16 October, 2012;
originally announced October 2012.
-
Belief Propagation for Structured Decision Making
Authors:
Qiang Liu,
Alexander T. Ihler
Abstract:
Variational inference algorithms such as belief propagation have had tremendous impact on our ability to learn and use graphical models, and give many insights for developing or understanding exact and approximate inference. However, variational approaches have not been widely adopted for decision making in graphical models, often formulated through influence diagrams and including both centralized and decentralized (or multi-agent) decisions. In this work, we present a general variational framework for solving structured cooperative decision-making problems, use it to propose several belief propagation-like algorithms, and analyze them both theoretically and empirically.
Submitted 16 October, 2012;
originally announced October 2012.
-
Join-graph based cost-shifting schemes
Authors:
Alexander T. Ihler,
Natalia Flerova,
Rina Dechter,
Lars Otten
Abstract:
We develop several algorithms taking advantage of two common approaches for bounding MPE queries in graphical models: minibucket elimination and message-passing updates for linear programming relaxations. Both methods are quite similar, and offer useful perspectives for the other; our hybrid approaches attempt to balance the advantages of each. We demonstrate the power of our hybrid algorithms through extensive empirical evaluation. Most notably, a Branch and Bound search guided by the heuristic function calculated by one of our new algorithms has recently won first place in the PASCAL2 inference challenge.
Submitted 16 October, 2012;
originally announced October 2012.
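The inequality behind mini-bucket elimination for max-product (MPE) queries is max_x f·g ≤ (max_x f)·(max_x g) for nonnegative factors. A minimal sketch with two hypothetical factor tables over binary variables shows that splitting a bucket yields smaller functions whose product upper-bounds the exact elimination. Cost shifting and the join-graph message passing that the paper combines with it are not shown.

```python
from itertools import product

# Hypothetical nonnegative factors over binary variables; both mention x, the variable to eliminate.
f = [[1.0, 4.0], [3.0, 1.0]]  # f[x][a]
g = [[2.0, 1.0], [1.0, 5.0]]  # g[x][b]

def exact_bucket(a, b):
    """Exact max-elimination of x: produces one new function of both a and b."""
    return max(f[x][a] * g[x][b] for x in (0, 1))

# Mini-bucket: split the bucket {f, g} and maximize x out of each part separately,
# producing two smaller functions h1(a) and h2(b) instead of one function of (a, b).
h1 = [max(f[x][a] for x in (0, 1)) for a in (0, 1)]
h2 = [max(g[x][b] for x in (0, 1)) for b in (0, 1)]

for a, b in product((0, 1), repeat=2):
    print(a, b, exact_bucket(a, b), h1[a] * h2[b])
```

The bound is loose where the two factors disagree about the best x (for a = b = 1 the exact value is 5.0 but the bound is 20.0); tightening such gaps is the role of the cost-shifting updates.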
-
Fast Planar Correlation Clustering for Image Segmentation
Authors:
Julian Yarkony,
Alexander T. Ihler,
Charless C. Fowlkes
Abstract:
We describe a new optimization scheme for finding high-quality correlation clusterings in planar graphs that uses weighted perfect matching as a subroutine. Our method provides lower-bounds on the energy of the optimal correlation clustering that are typically fast to compute and tight in practice. We demonstrate our algorithm on the problem of image segmentation where this approach outperforms existing global optimization techniques in minimizing the objective and is competitive with the state of the art in producing high-quality segmentations.
Submitted 1 August, 2012;
originally announced August 2012.
-
Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation
Authors:
Ian Porteous,
Alexander T. Ihler,
Padhraic Smyth,
Max Welling
Abstract:
Nonparametric Bayesian approaches to clustering, information retrieval, language modeling and object recognition have recently shown great promise as a new paradigm for unsupervised data analysis. Most contributions have focused on the Dirichlet process mixture models or extensions thereof for which efficient Gibbs samplers exist. In this paper we explore Gibbs samplers for infinite complexity mixture models in the stick breaking representation. The advantage of this representation is improved modeling flexibility. For instance, one can design the prior distribution over cluster sizes or couple multiple infinite mixture models (e.g. over time) at the level of their parameters (i.e. the dependent Dirichlet process model). However, Gibbs samplers for infinite mixture models (as recently introduced in the statistics literature) seem to mix poorly over cluster labels. Among other issues, this can have the adverse effect that labels for the same cluster in coupled mixture models are mixed up. We introduce additional moves in these samplers to improve mixing over cluster labels and to bring clusters into correspondence. An application to modeling of storm trajectories is used to illustrate these ideas.
Submitted 27 June, 2012;
originally announced June 2012.
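For reference, a minimal sketch of the standard stick-breaking construction of Dirichlet process mixture weights, truncated after k components for illustration: β_j ~ Beta(1, α) and π_j = β_j Π_{l<j}(1 − β_l). The concentration α = 2.0, the truncation k = 20 and the seed are hypothetical choices, and the paper's additional Gibbs moves are not shown.

```python
import random

def stick_breaking_weights(alpha, k, rng):
    """First k mixture weights of the stick-breaking construction.

    beta_j ~ Beta(1, alpha) and pi_j = beta_j * prod_{l<j} (1 - beta_l).
    Also returns the stick length left over for all remaining components.
    """
    weights, remaining = [], 1.0
    for _ in range(k):
        beta = rng.betavariate(1.0, alpha)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights, remaining

weights, rest = stick_breaking_weights(alpha=2.0, k=20, rng=random.Random(1))
print([round(w, 3) for w in weights[:5]], round(rest, 6))
```

Since E[β_j] = 1/(1 + α), a smaller α breaks off larger pieces early and concentrates mass on the first few components. Because the weights are tied to an ordering of the components, label-switching moves of the kind described above are needed for good mixing.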
-
Distributed Parameter Estimation via Pseudo-likelihood
Authors:
Qiang Liu,
Alexander Ihler
Abstract:
Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on combining local estimators defined by pseudo-likelihood components, encompassing a number of combination methods, and provide both theoretical and experimental analysis. We show that simple linear combination or max-voting methods, when combined with second-order information, are statistically competitive with more advanced and costly joint optimization. Our algorithms have many attractive properties including low communication and computational cost and "any-time" behavior.
Submitted 27 June, 2012;
originally announced June 2012.
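Pseudo-likelihood replaces the joint likelihood with the sum of per-variable log-conditionals, Σ_i log p(x_i | x_−i), which avoids the partition function and decomposes naturally across variables. A minimal sketch for a ±1 Ising model (an assumed model choice) computes it in closed form and checks it against conditionals obtained by flipping one variable at a time. The parameters are hypothetical, and the distributed combination step studied in the paper is not shown.

```python
import math

def log_weight(x, h, J):
    """Unnormalized log-probability sum_i h_i x_i + sum_{i<j} J_ij x_i x_j."""
    n = len(x)
    return sum(h[i] * x[i] for i in range(n)) + sum(
        J[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))

def log_pseudo_likelihood(x, h, J):
    """sum_i log p(x_i | x_-i) for +/-1 spins and a symmetric J with zero diagonal."""
    total = 0.0
    for i in range(len(x)):
        phi = h[i] + sum(J[i][j] * x[j] for j in range(len(x)) if j != i)  # local field
        total += x[i] * phi - math.log(2.0 * math.cosh(phi))
    return total

def brute_force_log_pl(x, h, J):
    """Same quantity from unnormalized weights: compare x with x having spin i flipped."""
    total = 0.0
    for i in range(len(x)):
        flipped = list(x)
        flipped[i] = -x[i]
        a, b = log_weight(x, h, J), log_weight(flipped, h, J)
        total += a - math.log(math.exp(a) + math.exp(b))
    return total

h = [0.2, -0.1, 0.4]
J = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.8],
     [-0.3, 0.8, 0.0]]
x = [1, -1, 1]
print(log_pseudo_likelihood(x, h, J), brute_force_log_pl(x, h, J))
```

Each term depends only on a variable and its neighbors, which is what makes pseudo-likelihood components natural local estimators in a sensor network.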
-
Accuracy Bounds for Belief Propagation
Authors:
Alexander T. Ihler
Abstract:
The belief propagation (BP) algorithm is widely applied to perform approximate inference on arbitrary graphical models, in part due to its excellent empirical properties and performance. However, little is known theoretically about when this algorithm will perform well. Using recent analysis of convergence and stability properties in BP and new results on approximations in binary systems, we derive a bound on the error in BP's estimates for pairwise Markov random fields over discrete valued random variables. Our bound is relatively simple to compute, and compares favorably with a previous method of bounding the accuracy of BP.
Submitted 20 June, 2012;
originally announced June 2012.
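The quantity whose error is bounded here is BP's marginal estimate on a pairwise MRF over discrete variables. A minimal sum-product sketch for binary variables, using a synchronous (flooding) schedule as an assumption, is checked against brute-force enumeration on a small chain, where BP is exact. The potentials are hypothetical, and the bound itself is not implemented.

```python
import math
from itertools import product

def loopy_bp(unary, pairwise, iters=50):
    """Sum-product BP on a pairwise MRF over binary variables (synchronous updates).

    unary[i] = [psi_i(0), psi_i(1)]; pairwise[(i, j)][a][b] = psi_ij(x_i = a, x_j = b).
    Returns normalized beliefs; these are exact marginals on trees once messages converge.
    """
    n = len(unary)
    nbrs = {i: [] for i in range(n)}
    for i, j in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, a, b):
        # Look up the edge potential in either orientation.
        return pairwise[(i, j)][a][b] if (i, j) in pairwise else pairwise[(j, i)][b][a]

    msgs = {(i, j): [1.0, 1.0] for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            out = []
            for b in (0, 1):
                total = 0.0
                for a in (0, 1):
                    incoming = unary[i][a]
                    for k in nbrs[i]:
                        if k != j:
                            incoming *= msgs[(k, i)][a]
                    total += incoming * psi(i, j, a, b)
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]  # normalize for numerical stability
        msgs = new

    beliefs = []
    for i in range(n):
        bel = list(unary[i])
        for k in nbrs[i]:
            bel = [bel[a] * msgs[(k, i)][a] for a in (0, 1)]
        z = sum(bel)
        beliefs.append([v / z for v in bel])
    return beliefs

def brute_force_marginals(unary, pairwise):
    """Exact marginals by enumerating all 2**n joint states (feasible only for tiny n)."""
    n = len(unary)
    marg = [[0.0, 0.0] for _ in range(n)]
    for x in product((0, 1), repeat=n):
        w = math.prod(unary[i][x[i]] for i in range(n))
        for (i, j), table in pairwise.items():
            w *= table[x[i]][x[j]]
        for i in range(n):
            marg[i][x[i]] += w
    return [[v / sum(row) for v in row] for row in marg]

# Hypothetical chain 0 - 1 - 2 (a tree), where BP should match brute force.
unary = [[1.0, 2.0], [1.5, 0.5], [1.0, 1.0]]
pairwise = {(0, 1): [[2.0, 0.5], [0.5, 2.0]], (1, 2): [[1.0, 3.0], [3.0, 1.0]]}
print(loopy_bp(unary, pairwise))
print(brute_force_marginals(unary, pairwise))
```

Adding an edge such as (0, 2) makes the graph loopy. BP still runs, but its beliefs then generally differ from the brute-force marginals, and that discrepancy is the kind of error an accuracy bound for BP addresses.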
-
Adaptive Inference on General Graphical Models
Authors:
Umut A. Acar,
Alexander T. Ihler,
Ramgopal Mettu,
Ozgur Sumer
Abstract:
Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional probabilities and dependencies in logarithmic time. We give experimental results for an implementation of our algorithm, and demonstrate its potential performance benefit in the study of protein structure.
Submitted 13 June, 2012;
originally announced June 2012.
-
Negative Tree Reweighted Belief Propagation
Authors:
Qiang Liu,
Alexander T. Ihler
Abstract:
We introduce a new class of lower bounds on the log partition function of a Markov random field which makes use of a reversed Jensen's inequality. In particular, our method approximates the intractable distribution using a linear combination of spanning trees with negative weights. This technique is a lower-bound counterpart to the tree-reweighted belief propagation algorithm, which uses a convex combination of spanning trees with positive weights to provide corresponding upper bounds. We develop algorithms to optimize and tighten the lower bounds over the non-convex set of valid parameter values. Our algorithm generalizes mean field approaches (including naive and structured mean field approximations), which it includes as a limiting case.
Submitted 15 March, 2012;
originally announced March 2012.
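The abstract notes that naive mean field is a limiting case of the method. As a reference point, a minimal sketch computes the naive mean-field lower bound on log Z for a ±1 Ising model, log Z ≥ Σ_i h_i m_i + Σ_{i<j} J_ij m_i m_j + Σ_i H(q_i), using coordinate-wise updates m_i = tanh(h_i + Σ_j J_ij m_j), and checks it against brute-force enumeration. The model parameters are hypothetical, and the negative tree weights that distinguish the paper's bounds are not implemented.

```python
import math
from itertools import product

def exact_log_z(h, J):
    """Brute-force log partition function over all +/-1 configurations."""
    n = len(h)
    energies = []
    for x in product((-1, 1), repeat=n):
        e = sum(h[i] * x[i] for i in range(n))
        e += sum(J[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
        energies.append(e)
    m = max(energies)
    return m + math.log(sum(math.exp(e - m) for e in energies))

def spin_entropy(m):
    """Entropy of a +/-1 variable with mean m, i.e. q(+1) = (1 + m) / 2."""
    total = 0.0
    for p in ((1 + m) / 2, (1 - m) / 2):
        if p > 0:
            total -= p * math.log(p)
    return total

def mean_field_lower_bound(h, J, iters=200):
    """Naive mean field: fully factorized q, coordinate-wise fixed-point updates."""
    n = len(h)
    m = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            m[i] = math.tanh(h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i))
    bound = sum(h[i] * m[i] for i in range(n))
    bound += sum(J[i][j] * m[i] * m[j] for i in range(n) for j in range(i + 1, n))
    return bound + sum(spin_entropy(mi) for mi in m), m

h = [0.3, -0.2, 0.1, 0.0]
J = [[0.0, 0.5, 0.0, -0.4],
     [0.5, 0.0, 0.6, 0.0],
     [0.0, 0.6, 0.0, 0.3],
     [-0.4, 0.0, 0.3, 0.0]]
bound, m = mean_field_lower_bound(h, J)
print(bound, exact_log_z(h, J))
```

The inequality holds for any means in [−1, 1] by Jensen's inequality; the updates only make it tighter. Tree-reweighted methods instead give upper bounds, and the negative-weight construction above provides the lower-bound counterpart.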
-
Tightening MRF Relaxations with Planar Subproblems
Authors:
Julian Yarkony,
Ragib Morshed,
Alexander T. Ihler,
Charless C. Fowlkes
Abstract:
We describe a new technique for computing lower-bounds on the minimum energy configuration of a planar Markov Random Field (MRF). Our method successively adds large numbers of constraints and enforces consistency over binary projections of the original problem state space. These constraints are represented in terms of subproblems in a dual-decomposition framework that is optimized using subgradient techniques. The complete set of constraints we consider enforces cycle consistency over the original graph. In practice we find that the method converges quickly on most problems with the addition of a few subproblems and outperforms existing methods for some interesting classes of hard potentials.
Submitted 14 February, 2012;
originally announced February 2012.
-
Variational Algorithms for Marginal MAP
Authors:
Qiang Liu,
Alexander T. Ihler
Abstract:
Marginal MAP problems are notoriously difficult tasks for graphical models. We derive a general variational framework for solving marginal MAP problems, in which we apply analogues of the Bethe, tree-reweighted, and mean field approximations. We then derive a "mixed" message passing algorithm and a convergent alternative using CCCP to solve the BP-type approximations. Theoretically, we give conditions under which the decoded solution is a global or local optimum, and obtain novel upper bounds on solutions. Experimentally we demonstrate that our algorithms outperform related approaches. We also show that EM and variational EM comprise a special case of our framework.
Submitted 14 February, 2012;
originally announced February 2012.
-
Planar Cycle Covering Graphs
Authors:
Julian Yarkony,
Alexander T. Ihler,
Charless C. Fowlkes
Abstract:
We describe a new variational lower-bound on the minimum energy configuration of a planar binary Markov Random Field (MRF). Our method is based on adding auxiliary nodes to every face of a planar embedding of the graph in order to capture the effect of unary potentials. A ground state of the resulting approximation can be computed efficiently by reduction to minimum-weight perfect matching. We show that optimization of variational parameters achieves the same lower-bound as dual-decomposition into the set of all cycles of the original graph. We demonstrate that our variational optimization converges quickly and provides high-quality solutions to hard combinatorial problems 10-100x faster than competing algorithms that optimize the same bound.
Submitted 6 April, 2011;
originally announced April 2011.
-
Fault Identification via Non-parametric Belief Propagation
Authors:
Danny Bickson,
Dror Baron,
Alex T. Ihler,
Harel Avissar,
Danny Dolev
Abstract:
We consider the problem of identifying a pattern of faults from a set of noisy linear measurements. Unfortunately, maximum a posteriori probability estimation of the fault pattern is computationally intractable. To solve the fault identification problem, we propose a non-parametric belief propagation approach. We show empirically that our belief propagation solver is more accurate than recent state-of-the-art algorithms including interior point methods and semidefinite programming. Our superior performance is explained by the fact that we take into account both the binary nature of the individual faults and the sparsity of the fault pattern arising from their rarity.
Submitted 1 February, 2011; v1 submitted 13 August, 2009;
originally announced August 2009.
-
A Low Density Lattice Decoder via Non-Parametric Belief Propagation
Authors:
Danny Bickson,
Alexander T. Ihler,
Danny Dolev
Abstract:
The recent work of Sommer, Feder and Shalvi presented a new family of codes called low density lattice codes (LDLC) that can be decoded efficiently and approach the capacity of the AWGN channel. A linear time iterative decoding scheme which is based on a message-passing formulation on a factor graph is given.
In the current work we report our theoretical findings regarding the relation between the LDLC decoder and belief propagation. We show that the LDLC decoder is an instance of non-parametric belief propagation and further connect it to the Gaussian belief propagation algorithm. Our new results enable borrowing knowledge from the non-parametric and Gaussian belief propagation domains into the LDLC domain. Specifically, we give more general conditions for convergence of the LDLC decoder (under the same assumptions as the original LDLC convergence analysis). We discuss how to extend the LDLC decoder from Latin square to full rank, non-square matrices. We propose an efficient construction of a sparse generator matrix and its matching decoder. We report preliminary experimental results which show that our decoder has a symbol error rate comparable to that of the original LDLC decoder.
Submitted 7 October, 2009; v1 submitted 21 January, 2009;
originally announced January 2009.
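The abstract connects the LDLC decoder to Gaussian belief propagation. For orientation only, a minimal sketch of Gaussian BP in information form (synchronous schedule) solves a small symmetric system Ax = b. This is neither the LDLC decoder nor non-parametric BP. The matrix and vector are hypothetical, and A is chosen diagonally dominant because that is a known sufficient condition for the means to converge to the exact solution.

```python
def gabp_solve(A, b, iters=100):
    """Gaussian BP for A x = b with A symmetric, in information (precision) form.

    Each message i -> j is a pair (precision, potential) describing a Gaussian in x_j.
    """
    n = len(A)
    nbrs = [[j for j in range(n) if j != i and A[i][j] != 0.0] for i in range(n)]
    prec = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}  # precision messages
    pot = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}   # potential messages
    for _ in range(iters):
        new_prec, new_pot = {}, {}
        for i, j in prec:
            # Cavity: node i's own term plus all incoming messages except the one from j.
            p_cav = A[i][i] + sum(prec[(k, i)] for k in nbrs[i] if k != j)
            h_cav = b[i] + sum(pot[(k, i)] for k in nbrs[i] if k != j)
            # Integrate out x_i against the coupling exp(-A_ij x_i x_j).
            new_prec[(i, j)] = -A[i][j] ** 2 / p_cav
            new_pot[(i, j)] = -A[i][j] * h_cav / p_cav
        prec, pot = new_prec, new_pot
    x = []
    for i in range(n):
        p_i = A[i][i] + sum(prec[(k, i)] for k in nbrs[i])
        h_i = b[i] + sum(pot[(k, i)] for k in nbrs[i])
        x.append(h_i / p_i)  # marginal mean of x_i
    return x

A = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 1.0],
     [1.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
print(gabp_solve(A, b))
```

For this A and b the exact solution is x = (0, 1/3, 2/3). The graph is a triangle, so this is loopy Gaussian BP, and the means still converge because A is diagonally dominant.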