-
SKOLR: Structured Koopman Operator Linear RNN for Time-Series Forecasting
Authors:
Yitian Zhang,
Liheng Ma,
Antonios Valkanas,
Boris N. Oreshkin,
Mark Coates
Abstract:
Koopman operator theory provides a framework for nonlinear dynamical system analysis and time-series forecasting by mapping dynamics to a space of real-valued measurement functions, enabling a linear operator representation. Despite the advantage of linearity, the operator is generally infinite-dimensional. Therefore, the objective is to learn measurement functions that yield a tractable finite-di…
▽ More
Koopman operator theory provides a framework for nonlinear dynamical system analysis and time-series forecasting by mapping dynamics to a space of real-valued measurement functions, enabling a linear operator representation. Despite the advantage of linearity, the operator is generally infinite-dimensional. Therefore, the objective is to learn measurement functions that yield a tractable finite-dimensional Koopman operator approximation. In this work, we establish a connection between Koopman operator approximation and linear Recurrent Neural Networks (RNNs), which have recently demonstrated remarkable success in sequence modeling. We show that by considering an extended state consisting of lagged observations, we can establish an equivalence between a structured Koopman operator and linear RNN updates. Building on this connection, we present SKOLR, which integrates a learnable spectral decomposition of the input signal with a multilayer perceptron (MLP) as the measurement functions and implements a structured Koopman operator via a highly parallel linear RNN stack. Numerical experiments on various forecasting benchmarks and dynamical systems show that this streamlined, Koopman-theory-based design delivers exceptional performance.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Quantile Regression with Large Language Models for Price Prediction
Authors:
Nikhita Vedula,
Dushyanta Dhyani,
Laleh Jalali,
Boris Oreshkin,
Mohsen Bayati,
Shervin Malmasi
Abstract:
Large Language Models (LLMs) have shown promise in structured prediction tasks, including regression, but existing approaches primarily focus on point estimates and lack systematic comparison across different methods. We investigate probabilistic regression using LLMs for unstructured inputs, addressing challenging text-to-distribution prediction tasks such as price estimation where both nuanced t…
▽ More
Large Language Models (LLMs) have shown promise in structured prediction tasks, including regression, but existing approaches primarily focus on point estimates and lack systematic comparison across different methods. We investigate probabilistic regression using LLMs for unstructured inputs, addressing challenging text-to-distribution prediction tasks such as price estimation where both nuanced text understanding and uncertainty quantification are critical. We propose a novel quantile regression approach that enables LLMs to produce full predictive distributions, improving upon traditional point estimates. Through extensive experiments across three diverse price prediction datasets, we demonstrate that a Mistral-7B model fine-tuned with quantile heads significantly outperforms traditional approaches for both point and distributional estimations, as measured by three established metrics each for prediction accuracy and distributional calibration. Our systematic comparison of LLM approaches, model architectures, training approaches, and data scaling reveals that Mistral-7B consistently outperforms encoder architectures, embedding-based methods, and few-shot learning methods. Our experiments also reveal the effectiveness of LLM-assisted label correction in achieving human-level accuracy without systematic bias. Our curated datasets are made available at https://github.com/vnik18/llm-price-quantile-reg/ to support future research.
△ Less
Submitted 7 June, 2025;
originally announced June 2025.
-
Enhanced N-BEATS for Mid-Term Electricity Demand Forecasting
Authors:
Mateusz Kasprzyk,
Paweł Pełka,
Boris N. Oreshkin,
Grzegorz Dudek
Abstract:
This paper presents an enhanced N-BEATS model, N-BEATS*, for improved mid-term electricity load forecasting (MTLF). Building on the strengths of the original N-BEATS architecture, which excels in handling complex time series data without requiring preprocessing or domain-specific knowledge, N-BEATS* introduces two key modifications. (1) A novel loss function -- combining pinball loss based on MAPE…
▽ More
This paper presents an enhanced N-BEATS model, N-BEATS*, for improved mid-term electricity load forecasting (MTLF). Building on the strengths of the original N-BEATS architecture, which excels in handling complex time series data without requiring preprocessing or domain-specific knowledge, N-BEATS* introduces two key modifications. (1) A novel loss function -- combining pinball loss based on MAPE with normalized MSE, the new loss function allows for a more balanced approach by capturing both L1 and L2 loss terms. (2) A modified block architecture -- the internal structure of the N-BEATS blocks is adjusted by introducing a destandardization component to harmonize the processing of different time series, leading to more efficient and less complex forecasting tasks. Evaluated on real-world monthly electricity consumption data from 35 European countries, N-BEATS* demonstrates superior performance compared to its predecessor and other established forecasting methods, including statistical, machine learning, and hybrid models. N-BEATS* achieves the lowest MAPE and RMSE, while also exhibiting the lowest dispersion in forecast errors.
△ Less
Submitted 2 December, 2024;
originally announced December 2024.
-
$\spadesuit$ SPADE $\spadesuit$ Split Peak Attention DEcomposition
Authors:
Malcolm Wolff,
Kin G. Olivares,
Boris Oreshkin,
Sunny Ruan,
Sitan Yang,
Abhinav Katoch,
Shankar Ramasubramanian,
Youxin Zhang,
Michael W. Mahoney,
Dmitry Efimov,
Vincent Quenneville-Bélair
Abstract:
Demand forecasting faces challenges induced by Peak Events (PEs) corresponding to special periods such as promotions and holidays. Peak events create significant spikes in demand followed by demand ramp down periods. Neural networks like MQCNN and MQT overreact to demand peaks by carrying over the elevated PE demand into subsequent Post-Peak-Event (PPE) periods, resulting in significantly over-bia…
▽ More
Demand forecasting faces challenges induced by Peak Events (PEs) corresponding to special periods such as promotions and holidays. Peak events create significant spikes in demand followed by demand ramp down periods. Neural networks like MQCNN and MQT overreact to demand peaks by carrying over the elevated PE demand into subsequent Post-Peak-Event (PPE) periods, resulting in significantly over-biased forecasts. To tackle this challenge, we introduce a neural forecasting model called Split Peak Attention DEcomposition, SPADE. This model reduces the impact of PEs on subsequent forecasts by modeling forecasting as consisting of two separate tasks: one for PEs; and the other for the rest. Its architecture then uses masked convolution filters and a specialized Peak Attention module. We show SPADE's performance on a worldwide retail dataset with hundreds of millions of products. Our results reveal an overall PPE improvement of 4.5%, a 30% improvement for most affected forecasts after promotions and holidays, and an improvement in PE accuracy by 3.9%, relative to current production models.
△ Less
Submitted 21 January, 2025; v1 submitted 6 November, 2024;
originally announced November 2024.
-
Dynamic layer selection in decoder-only transformers
Authors:
Theodore Glavas,
Joud Chataoui,
Florence Regol,
Wassim Jabbour,
Antonios Valkanas,
Boris N. Oreshkin,
Mark Coates
Abstract:
The vast size of Large Language Models (LLMs) has prompted a search to optimize inference. One effective approach is dynamic inference, which adapts the architecture to the sample-at-hand to reduce the overall computational cost. We empirically examine two common dynamic inference methods for natural language generation (NLG): layer skipping and early exiting. We find that a pre-trained decoder-on…
▽ More
The vast size of Large Language Models (LLMs) has prompted a search to optimize inference. One effective approach is dynamic inference, which adapts the architecture to the sample-at-hand to reduce the overall computational cost. We empirically examine two common dynamic inference methods for natural language generation (NLG): layer skipping and early exiting. We find that a pre-trained decoder-only model is significantly more robust to layer removal via layer skipping, as opposed to early exit. We demonstrate the difficulty of using hidden state information to adapt computation on a per-token basis for layer skipping. Finally, we show that dynamic computation allocation on a per-sequence basis holds promise for significant efficiency gains by constructing an oracle controller. Remarkably, we find that there exists an allocation which achieves equal performance to the full model using only 23.3% of its layers on average.
△ Less
Submitted 25 October, 2024;
originally announced October 2024.
-
Online Posterior Sampling with a Diffusion Prior
Authors:
Branislav Kveton,
Boris Oreshkin,
Youngsuk Park,
Aniket Deshmukh,
Rui Song
Abstract:
Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of appr…
▽ More
Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse process, which are estimated in a closed form using the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.
△ Less
Submitted 4 October, 2024;
originally announced October 2024.
-
MODL: Multilearner Online Deep Learning
Authors:
Antonios Valkanas,
Boris N. Oreshkin,
Mark Coates
Abstract:
Online deep learning tackles the challenge of learning from data streams by balancing two competing goals: fast learning and deep learning. However, existing research primarily emphasizes deep learning solutions, which are more adept at handling the ``deep'' aspect than the ``fast'' aspect of online learning. In this work, we introduce an alternative paradigm through a hybrid multilearner approach…
▽ More
Online deep learning tackles the challenge of learning from data streams by balancing two competing goals: fast learning and deep learning. However, existing research primarily emphasizes deep learning solutions, which are more adept at handling the ``deep'' aspect than the ``fast'' aspect of online learning. In this work, we introduce an alternative paradigm through a hybrid multilearner approach. We begin by developing a fast online logistic regression learner, which operates without relying on backpropagation. It leverages closed-form recursive updates of model parameters, efficiently addressing the fast learning component of the online learning challenge. This approach is further integrated with a cascaded multilearner design, where shallow and deep learners are co-trained in a cooperative, synergistic manner to solve the online learning problem. We demonstrate that this approach achieves state-of-the-art performance on standard online learning datasets. We make our code available: https://github.com/AntonValk/MODL
△ Less
Submitted 20 March, 2025; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Any-Quantile Probabilistic Forecasting of Short-Term Electricity Demand
Authors:
Slawek Smyl,
Boris N. Oreshkin,
Paweł Pełka,
Grzegorz Dudek
Abstract:
Power systems operate under uncertainty originating from multiple factors that are impossible to account for deterministically. Distributional forecasting is used to control and mitigate risks associated with this uncertainty. Recent progress in deep learning has helped to significantly improve the accuracy of point forecasts, while accurate distributional forecasting still presents a significant…
▽ More
Power systems operate under uncertainty originating from multiple factors that are impossible to account for deterministically. Distributional forecasting is used to control and mitigate risks associated with this uncertainty. Recent progress in deep learning has helped to significantly improve the accuracy of point forecasts, while accurate distributional forecasting still presents a significant challenge. In this paper, we propose a novel general approach for distributional forecasting capable of predicting arbitrary quantiles. We show that our general approach can be seamlessly applied to two distinct neural architectures leading to the state-of-the-art distributional forecasting results in the context of short-term electricity demand forecasting task. We empirically validate our method on 35 hourly electricity demand time-series for European countries. Our code is available here: https://github.com/boreshkinai/any-quantile.
△ Less
Submitted 4 October, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
3D Human Pose and Shape Estimation via HybrIK-Transformer
Authors:
Boris N. Oreshkin
Abstract:
HybrIK relies on a combination of analytical inverse kinematics and deep learning to produce more accurate 3D pose estimation from 2D monocular images. HybrIK has three major components: (1) pretrained convolution backbone, (2) deconvolution to lift 3D pose from 2D convolution features, (3) analytical inverse kinematics pass correcting deep learning prediction using learned distribution of plausib…
▽ More
HybrIK relies on a combination of analytical inverse kinematics and deep learning to produce more accurate 3D pose estimation from 2D monocular images. HybrIK has three major components: (1) pretrained convolution backbone, (2) deconvolution to lift 3D pose from 2D convolution features, (3) analytical inverse kinematics pass correcting deep learning prediction using learned distribution of plausible twist and swing angles. In this paper we propose an enhancement of the 2D to 3D lifting module, replacing deconvolution with Transformer, resulting in accuracy and computational efficiency improvement relative to the original HybrIK method. We demonstrate our results on commonly used H36M, PW3D, COCO and HP3D datasets. Our code is publicly available https://github.com/boreshkinai/hybrik-transformer.
△ Less
Submitted 22 April, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
SMPL-IK: Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows
Authors:
Vikram Voleti,
Boris N. Oreshkin,
Florent Bocquelet,
Félix G. Harvey,
Louis-Simon Ménard,
Christopher Pal
Abstract:
Inverse Kinematics (IK) systems are often rigid with respect to their input character, thus requiring user intervention to be adapted to new skeletons. In this paper we aim at creating a flexible, learned IK solver applicable to a wide variety of human morphologies. We extend a state-of-the-art machine learning IK solver to operate on the well known Skinned Multi-Person Linear model (SMPL). We cal…
▽ More
Inverse Kinematics (IK) systems are often rigid with respect to their input character, thus requiring user intervention to be adapted to new skeletons. In this paper we aim at creating a flexible, learned IK solver applicable to a wide variety of human morphologies. We extend a state-of-the-art machine learning IK solver to operate on the well known Skinned Multi-Person Linear model (SMPL). We call our model SMPL-IK, and show that when integrated into real-time 3D software, this extended system opens up opportunities for defining novel AI-assisted animation workflows. For example, pose authoring can be made more flexible with SMPL-IK by allowing users to modify gender and body shape while posing a character. Additionally, when chained with existing pose estimation algorithms, SMPL-IK accelerates posing by allowing users to bootstrap 3D scenes from 2D images while allowing for further editing. Finally, we propose a novel SMPL Shape Inversion mechanism (SMPL-SI) to map arbitrary humanoid characters to the SMPL space, allowing artists to leverage SMPL-IK on custom characters. In addition to qualitative demos showing proposed tools, we present quantitative SMPL-IK baselines on the H36M and AMASS datasets.
△ Less
Submitted 16 August, 2022;
originally announced August 2022.
-
N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting
Authors:
Cristian Challu,
Kin G. Olivares,
Boris N. Oreshkin,
Federico Garza,
Max Mergenthaler-Canseco,
Artur Dubrawski
Abstract:
Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpol…
▽ More
Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable the proposed method to assemble its predictions sequentially, emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We prove that the hierarchical interpolation technique can efficiently approximate arbitrarily long horizons in the presence of smoothness. Additionally, we conduct extensive large-scale dataset experiments from the long-horizon forecasting literature, demonstrating the advantages of our method over the state-of-the-art methods, where N-HiTS provides an average accuracy improvement of almost 20% over the latest Transformer architectures while reducing the computation time by an order of magnitude (50 times). Our code is available at bit.ly/3VA5DoT
△ Less
Submitted 29 November, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Motion Inbetweening via Deep $Δ$-Interpolator
Authors:
Boris N. Oreshkin,
Antonios Valkanas,
Félix G. Harvey,
Louis-Simon Ménard,
Florent Bocquelet,
Mark J. Coates
Abstract:
We show that the task of synthesizing human motion conditioned on a set of key frames can be solved more accurately and effectively if a deep learning based interpolator operates in the delta mode using the spherical linear interpolator as a baseline. We empirically demonstrate the strength of our approach on publicly available datasets achieving state-of-the-art performance. We further generalize…
▽ More
We show that the task of synthesizing human motion conditioned on a set of key frames can be solved more accurately and effectively if a deep learning based interpolator operates in the delta mode using the spherical linear interpolator as a baseline. We empirically demonstrate the strength of our approach on publicly available datasets achieving state-of-the-art performance. We further generalize these results by showing that the $Δ$-regime is viable with respect to the reference of the last known frame (also known as the zero-velocity model). This supports the more general conclusion that operating in the reference frame local to input frames is more accurate and robust than in the global (world) reference frame advocated in previous work. Our code is publicly available at https://github.com/boreshkinai/delta-interpolator.
△ Less
Submitted 16 August, 2022; v1 submitted 17 January, 2022;
originally announced January 2022.
-
Neural forecasting at scale
Authors:
Philippe Chatigny,
Shengrui Wang,
Jean-Marc Patenaude,
Boris N. Oreshkin
Abstract:
We study the problem of efficiently scaling ensemble-based deep neural networks for multi-step time series (TS) forecasting on a large set of time series. Current state-of-the-art deep ensemble models have high memory and computational requirements, hampering their use to forecast millions of TS in practical scenarios. We propose N-BEATS(P), a global parallel variant of the N-BEATS model designed…
▽ More
We study the problem of efficiently scaling ensemble-based deep neural networks for multi-step time series (TS) forecasting on a large set of time series. Current state-of-the-art deep ensemble models have high memory and computational requirements, hampering their use to forecast millions of TS in practical scenarios. We propose N-BEATS(P), a global parallel variant of the N-BEATS model designed to allow simultaneous training of multiple univariate TS forecasting models. Our model addresses the practical limitations of related models, reducing the training time by half and memory requirement by a factor of 5, while keeping the same level of accuracy in all TS forecasting settings. We have performed multiple experiments detailing the various ways to train our model and have obtained results that demonstrate its capacity to generalize in various forecasting conditions and setups.
△ Less
Submitted 28 January, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
ProtoRes: Proto-Residual Network for Pose Authoring via Learned Inverse Kinematics
Authors:
Boris N. Oreshkin,
Florent Bocquelet,
Félix G. Harvey,
Bay Raitt,
Dominic Laflamme
Abstract:
Our work focuses on the development of a learnable neural representation of human pose for advanced AI assisted animation tooling. Specifically, we tackle the problem of constructing a full static human pose based on sparse and variable user inputs (e.g. locations and/or orientations of a subset of body joints). To solve this problem, we propose a novel neural architecture that combines residual c…
▽ More
Our work focuses on the development of a learnable neural representation of human pose for advanced AI assisted animation tooling. Specifically, we tackle the problem of constructing a full static human pose based on sparse and variable user inputs (e.g. locations and/or orientations of a subset of body joints). To solve this problem, we propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose from the learned latent space. We show that our architecture outperforms a baseline based on Transformer, both in terms of accuracy and computational efficiency. Additionally, we develop a user interface to integrate our neural model in Unity, a real-time 3D development platform. Furthermore, we introduce two new datasets representing the static human pose modeling problem, based on high-quality human motion capture data, which will be released publicly along with model code.
△ Less
Submitted 16 August, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Adaptive filters for the moving target indicator system
Authors:
Boris N. Oreshkin
Abstract:
Adaptive algorithms belong to an important class of algorithms used in radar target detection to overcome prior uncertainty of interference covariance. The contamination of the empirical covariance matrix by the useful signal leads to significant degradation of performance of this class of adaptive algorithms. Regularization, also known in radar literature as sample covariance loading, can be used…
▽ More
Adaptive algorithms belong to an important class of algorithms used in radar target detection to overcome prior uncertainty of interference covariance. The contamination of the empirical covariance matrix by the useful signal leads to significant degradation of performance of this class of adaptive algorithms. Regularization, also known in radar literature as sample covariance loading, can be used to combat both ill conditioning of the original problem and contamination of the empirical covariance by the desired signal for the adaptive algorithms based on sample covariance matrix inversion. However, the optimum value of loading factor cannot be derived unless strong assumptions are made regarding the structure of covariance matrix and useful signal penetration model. Similarly, least mean square algorithm with linear constraint or without constraint, is also sensitive to the contamination of the learning sample with the target signal. We synthesize two approaches to improve the convergence of adaptive algorithms and protect them from the contamination of the learning sample with the signal from the target. The proposed approach is based on the maximization of empirical signal to interference plus noise ratio (SINR). Its effectiveness is demonstrated using simulated data.
△ Less
Submitted 30 December, 2020;
originally announced December 2020.
-
Optimization of loading factor preventing target cancellation
Authors:
Boris N. Oreshkin,
Peter A. Bakulev
Abstract:
Adaptive algorithms based on sample matrix inversion belong to an important class of algorithms used in radar target detection to overcome prior uncertainty of interference covariance. Sample matrix inversion problem is generally ill conditioned. Moreover, the contamination of the empirical covariance matrix by the useful signal leads to significant degradation of performance of this class of adap…
▽ More
Adaptive algorithms based on sample matrix inversion belong to an important class of algorithms used in radar target detection to overcome prior uncertainty of interference covariance. Sample matrix inversion problem is generally ill conditioned. Moreover, the contamination of the empirical covariance matrix by the useful signal leads to significant degradation of performance of this class of adaptive algorithms. Regularization, also known in radar literature as sample covariance loading, can be used to combat both ill conditioning of the original problem and contamination of the empirical covariance by the desired signal. However, the optimum value of loading factor cannot be derived unless strong assumptions are made regarding the structure of covariance matrix and useful signal penetration model. In this paper an iterative algorithm for loading factor optimization based on the maximization of empirical signal to interference plus noise ratio (SINR) is proposed. The proposed solution does not rely on any assumptions regarding the structure of empirical covariance matrix and signal penetration model. The paper also presents simulation examples showing the effectiveness of the proposed solution.
△ Less
Submitted 9 October, 2020;
originally announced October 2020.
-
Optimization over Random and Gradient Probabilistic Pixel Sampling for Fast, Robust Multi-Resolution Image Registration
Authors:
Boris N. Oreshkin,
Tal Arbel
Abstract:
This paper presents an approach to fast image registration through probabilistic pixel sampling. We propose a practical scheme to leverage the benefits of two state-of-the-art pixel sampling approaches: gradient magnitude based pixel sampling and uniformly random sampling. Our framework involves learning the optimal balance between the two sampling schemes off-line during training, based on a smal…
▽ More
This paper presents an approach to fast image registration through probabilistic pixel sampling. We propose a practical scheme to leverage the benefits of two state-of-the-art pixel sampling approaches: gradient magnitude based pixel sampling and uniformly random sampling. Our framework involves learning the optimal balance between the two sampling schemes off-line during training, based on a small training dataset, using particle swarm optimization. We then test the proposed sampling approach on 3D rigid registration against two state-of-the-art approaches based on the popular, publicly available, Vanderbilt RIRE dataset. Our results indicate that the proposed sampling approach yields much faster, accurate and robust registration results when compared against the state-of-the-art.
△ Less
Submitted 2 October, 2020;
originally announced October 2020.
-
Uncertainty driven probabilistic voxel selection for image registration
Authors:
Boris N. Oreshkin,
Tal Arbel
Abstract:
This paper presents a novel probabilistic voxel selection strategy for medical image registration in time-sensitive contexts, where the goal is aggressive voxel sampling (e.g. using less than 1% of the total number) while maintaining registration accuracy and low failure rate. We develop a Bayesian framework whereby, first, a voxel sampling probability field (VSPF) is built based on the uncertaint…
▽ More
This paper presents a novel probabilistic voxel selection strategy for medical image registration in time-sensitive contexts, where the goal is aggressive voxel sampling (e.g. using less than 1% of the total number) while maintaining registration accuracy and low failure rate. We develop a Bayesian framework whereby, first, a voxel sampling probability field (VSPF) is built based on the uncertainty on the transformation parameters. We then describe a practical, multi-scale registration algorithm, where, at each optimization iteration, different voxel subsets are sampled based on the VSPF. The approach maximizes accuracy without committing to a particular fixed subset of voxels. The probabilistic sampling scheme developed is shown to manage the tradeoff between the robustness of traditional random voxel selection (by permitting more exploration) and the accuracy of fixed voxel selection (by permitting a greater proportion of informative voxels).
△ Less
Submitted 2 October, 2020;
originally announced October 2020.
-
N-BEATS neural network for mid-term electricity load forecasting
Authors:
Boris N. Oreshkin,
Grzegorz Dudek,
Paweł Pełka,
Ekaterina Turkina
Abstract:
This paper addresses the mid-term electricity load forecasting problem. Solving this problem is necessary for power system operation and planning as well as for negotiating forward contracts in deregulated energy markets. We show that our proposed deep neural network modeling approach based on the deep neural architecture is effective at solving the mid-term electricity load forecasting problem. P…
▽ More
This paper addresses the mid-term electricity load forecasting problem. Solving this problem is necessary for power system operation and planning as well as for negotiating forward contracts in deregulated energy markets. We show that our proposed deep neural network modeling approach based on the deep neural architecture is effective at solving the mid-term electricity load forecasting problem. Proposed neural network has high expressive power to solve non-linear stochastic forecasting problems with time series including trends, seasonality and significant random fluctuations. At the same time, it is simple to implement and train, it does not require signal preprocessing, and it is equipped with a forecast bias reduction mechanism. We compare our approach against ten baseline methods, including classical statistical methods, machine learning and hybrid approaches, on 35 monthly electricity demand time series for European countries. The empirical study shows that proposed neural network clearly outperforms all competitors in terms of both accuracy and forecast bias. Code is available here: https://github.com/boreshkinai/nbeats-midterm.
△ Less
Submitted 2 April, 2021; v1 submitted 24 September, 2020;
originally announced September 2020.
-
FC-GAGA: Fully Connected Gated Graph Architecture for Spatio-Temporal Traffic Forecasting
Authors:
Boris N. Oreshkin,
Arezou Amini,
Lucy Coyle,
Mark J. Coates
Abstract:
Forecasting of multivariate time-series is an important problem that has applications in traffic management, cellular network configuration, and quantitative finance. A special case of the problem arises when there is a graph available that captures the relationships between the time-series. In this paper we propose a novel learning architecture that achieves performance competitive with or better…
▽ More
Forecasting of multivariate time-series is an important problem that has applications in traffic management, cellular network configuration, and quantitative finance. A special case of the problem arises when there is a graph available that captures the relationships between the time-series. In this paper we propose a novel learning architecture that achieves performance competitive with or better than the best existing algorithms, without requiring knowledge of the graph. The key element of our proposed architecture is the learnable fully connected hard graph gating mechanism that enables the use of the state-of-the-art and highly computationally efficient fully connected time-series forecasting architecture in traffic forecasting applications. Experimental results for two public traffic network datasets illustrate the value of our approach, and ablation studies confirm the importance of each element of the architecture. The code is available here: https://github.com/boreshkinai/fc-gaga.
△ Less
Submitted 14 December, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Meta-learning framework with applications to zero-shot time-series forecasting
Authors:
Boris N. Oreshkin,
Dmitri Carpov,
Nicolas Chapados,
Yoshua Bengio
Abstract:
Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets? This work provides positive evidence to this using a broad meta-learning framework which we show subsumes many existing meta-learning algorithms. Our theoretical analysis suggests that residual connections act as a meta-lear…
▽ More
Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets? This work provides positive evidence to this using a broad meta-learning framework which we show subsumes many existing meta-learning algorithms. Our theoretical analysis suggests that residual connections act as a meta-learning adaptation mechanism, generating a subset of task-specific parameters based on a given TS input, thus gradually expanding the expressive power of the architecture on-the-fly. The same mechanism is shown via linearization analysis to have the interpretation of a sequential update of the final linear layer. Our empirical results on a wide range of data emphasize the importance of the identified meta-learning mechanisms for successful zero-shot univariate forecasting, suggesting that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining, resulting in performance that is at least as good as that of state-of-practice univariate forecasting models.
△ Less
Submitted 14 December, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings
Authors:
Mennatullah Siam,
Naren Doraiswamy,
Boris N. Oreshkin,
Hengshuai Yao,
Martin Jagersand
Abstract:
Significant progress has been made recently in developing few-shot object segmentation methods. Learning is shown to be successful in few-shot segmentation settings, using pixel-level, scribbles and bounding box supervision. This paper takes another approach, i.e., only requiring image-level label for few-shot object segmentation. We propose a novel multi-modal interaction module for few-shot obje…
▽ More
Significant progress has been made recently in developing few-shot object segmentation methods. Learning is shown to be successful in few-shot segmentation settings, using pixel-level, scribbles and bounding box supervision. This paper takes another approach, i.e., only requiring image-level label for few-shot object segmentation. We propose a novel multi-modal interaction module for few-shot object segmentation that utilizes a co-attention mechanism using both visual and word embedding. Our model using image-level labels achieves 4.8% improvement over previously proposed image-level few-shot object segmentation. It also outperforms state-of-the-art methods that use weak bounding box supervision on PASCAL-5i. Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual semantic processing with weak image-level labels. We further propose a novel setup, Temporal Object Segmentation for Few-shot Learning (TOSFL) for videos. TOSFL can be used on a variety of public video data such as Youtube-VOS, as demonstrated in both instance-level and category-level TOSFL experiments.
△ Less
Submitted 17 May, 2020; v1 submitted 26 January, 2020;
originally announced January 2020.
-
One-Shot Weakly Supervised Video Object Segmentation
Authors:
Mennatullah Siam,
Naren Doraiswamy,
Boris N. Oreshkin,
Hengshuai Yao,
Martin Jagersand
Abstract:
Conventional few-shot object segmentation methods learn object segmentation from a few labelled support images with strongly labelled segmentation masks. Recent work has shown to perform on par with weaker levels of supervision in terms of scribbles and bounding boxes. However, there has been limited attention given to the problem of few-shot object segmentation with image-level supervision. We pr…
▽ More
Conventional few-shot object segmentation methods learn object segmentation from a few labelled support images with strongly labelled segmentation masks. Recent work has shown to perform on par with weaker levels of supervision in terms of scribbles and bounding boxes. However, there has been limited attention given to the problem of few-shot object segmentation with image-level supervision. We propose a novel multi-modal interaction module for few-shot object segmentation that utilizes a co-attention mechanism using both visual and word embeddings. It enables our model to achieve 5.1% improvement over previously proposed image-level few-shot object segmentation. Our method compares relatively close to the state of the art methods that use strong supervision, while ours use the least possible supervision. We further propose a novel setup for few-shot weakly supervised video object segmentation(VOS) that relies on image-level labels for the first frame. The proposed setup uses weak annotation unlike semi-supervised VOS setting that utilizes strongly labelled segmentation masks. The setup evaluates the effectiveness of generalizing to novel classes in the VOS setting. The setup splits the VOS data into multiple folds with different categories per fold. It provides a potential setup to evaluate how few-shot object segmentation methods can benefit from additional object poses, or object interactions that is not available in static frames as in PASCAL-5i benchmark.
△ Less
Submitted 18 December, 2019;
originally announced December 2019.
-
DECoVaC: Design of Experiments with Controlled Variability Components
Authors:
Thomas Boquet,
Laure Delisle,
Denis Kochetkov,
Nathan Schucher,
Parmida Atighehchian,
Boris Oreshkin,
Julien Cornebise
Abstract:
Reproducible research in Machine Learning has seen a salutary abundance of progress lately: workflows, transparency, and statistical analysis of validation and test performance. We build on these efforts and take them further. We offer a principled experimental design methodology, based on linear mixed models, to study and separate the effects of multiple factors of variation in machine learning e…
▽ More
Reproducible research in Machine Learning has seen a salutary abundance of progress lately: workflows, transparency, and statistical analysis of validation and test performance. We build on these efforts and take them further. We offer a principled experimental design methodology, based on linear mixed models, to study and separate the effects of multiple factors of variation in machine learning experiments. This approach allows to account for the effects of architecture, optimizer, hyper-parameters, intentional randomization, as well as unintended lack of determinism across reruns. We illustrate that methodology by analyzing Matching Networks, Prototypical Networks and TADAM on the miniImagenet dataset.
△ Less
Submitted 21 September, 2019;
originally announced September 2019.
-
CLAREL: Classification via retrieval loss for zero-shot learning
Authors:
Boris N. Oreshkin,
Negar Rostamzadeh,
Pedro O. Pinheiro,
Christopher Pal
Abstract:
We address the problem of learning fine-grained cross-modal representations. We propose an instance-based deep metric learning approach in joint visual and textual space. The key novelty of this paper is that it shows that using per-image semantic supervision leads to substantial improvement in zero-shot performance over using class-only supervision. On top of that, we provide a probabilistic just…
▽ More
We address the problem of learning fine-grained cross-modal representations. We propose an instance-based deep metric learning approach in joint visual and textual space. The key novelty of this paper is that it shows that using per-image semantic supervision leads to substantial improvement in zero-shot performance over using class-only supervision. On top of that, we provide a probabilistic justification for a metric rescaling approach that solves a very common problem in the generalized zero-shot learning setting, i.e., classifying test images from unseen classes as one of the classes seen during training. We evaluate our approach on two fine-grained zero-shot learning datasets: CUB and FLOWERS. We find that on the generalized zero-shot classification task CLAREL consistently outperforms the existing approaches on both datasets.
△ Less
Submitted 5 April, 2020; v1 submitted 31 May, 2019;
originally announced June 2019.
-
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting
Authors:
Boris N. Oreshkin,
Dmitri Carpov,
Nicolas Chapados,
Yoshua Bengio
Abstract:
We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the…
▽ More
We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrarily to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.
△ Less
Submitted 20 February, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Adaptive Masked Proxies for Few-Shot Segmentation
Authors:
Mennatullah Siam,
Boris Oreshkin,
Martin Jagersand
Abstract:
Deep learning has thrived by training on large-scale datasets. However, in robotics applications sample efficiency is critical. We propose a novel adaptive masked proxies method that constructs the final segmentation layer weights from few labelled samples. It utilizes multi-resolution average pooling on base embeddings masked with the label to act as a positive proxy for the new class, while fusi…
▽ More
Deep learning has thrived by training on large-scale datasets. However, in robotics applications sample efficiency is critical. We propose a novel adaptive masked proxies method that constructs the final segmentation layer weights from few labelled samples. It utilizes multi-resolution average pooling on base embeddings masked with the label to act as a positive proxy for the new class, while fusing it with the previously learned class signatures. Our method is evaluated on PASCAL-$5^i$ dataset and outperforms the state-of-the-art in the few-shot semantic segmentation. Unlike previous methods, our approach does not require a second branch to estimate parameters or prototypes, which enables it to be used with 2-stream motion and appearance based segmentation networks. We further propose a novel setup for evaluating continual learning of object segmentation which we name incremental PASCAL (iPASCAL) where our method outperforms the baseline method. Our code is publicly available at https://github.com/MSiam/AdaptiveMaskedProxies.
△ Less
Submitted 14 October, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
Adaptive Cross-Modal Few-Shot Learning
Authors:
Chen Xing,
Negar Rostamzadeh,
Boris N. Oreshkin,
Pedro O. Pinheiro
Abstract:
Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic feature spaces have different structures by definition. For certain concepts, visual features might be richer and more discriminative than text ones. While for o…
▽ More
Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic feature spaces have different structures by definition. For certain concepts, visual features might be richer and more discriminative than text ones. While for others, the inverse might be true. Moreover, when the support from visual information is limited in image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to help learning. Based on these two intuitions, we propose a mechanism that can adaptively combine information from both modalities according to new image categories to be learned. Through a series of experiments, we show that by this adaptive combination of the two modalities, our model outperforms current uni-modality few-shot learning methods and modality-alignment methods by a large margin on all benchmarks and few-shot scenarios tested. Experiments also show that our model can effectively adjust its focus on the two modalities. The improvement in performance is particularly large when the number of shots is very small.
△ Less
Submitted 17 February, 2020; v1 submitted 19 February, 2019;
originally announced February 2019.
-
Uncertainty in Multitask Transfer Learning
Authors:
Alexandre Lacoste,
Boris Oreshkin,
Wonchang Chung,
Thomas Boquet,
Negar Rostamzadeh,
David Krueger
Abstract:
Using variational Bayes neural networks, we develop an algorithm capable of accumulating knowledge into a prior from multiple different tasks. The result is a rich and meaningful prior capable of few-shot learning on new tasks. The posterior can go beyond the mean field approximation and yields good uncertainty on the performed experiments. Analysis on toy tasks shows that it can learn from signif…
▽ More
Using variational Bayes neural networks, we develop an algorithm capable of accumulating knowledge into a prior from multiple different tasks. The result is a rich and meaningful prior capable of few-shot learning on new tasks. The posterior can go beyond the mean field approximation and yields good uncertainty on the performed experiments. Analysis on toy tasks shows that it can learn from significantly different tasks while finding similarities among them. Experiments of Mini-Imagenet yields the new state of the art with 74.5% accuracy on 5 shot learning. Finally, we provide experiments showing that other existing methods can fail to perform well in different benchmarks.
△ Less
Submitted 6 July, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
TADAM: Task dependent adaptive metric for improved few-shot learning
Authors:
Boris N. Oreshkin,
Pau Rodriguez,
Alexandre Lacoste
Abstract:
Few-shot learning has become essential for producing models that generalize from few examples. In this work, we identify that metric scaling and metric task conditioning are important to improve the performance of few-shot algorithms. Our analysis reveals that simple metric scaling completely changes the nature of few-shot algorithm parameter updates. Metric scaling provides improvements up to 14%…
▽ More
Few-shot learning has become essential for producing models that generalize from few examples. In this work, we identify that metric scaling and metric task conditioning are important to improve the performance of few-shot algorithms. Our analysis reveals that simple metric scaling completely changes the nature of few-shot algorithm parameter updates. Metric scaling provides improvements up to 14% in accuracy for certain metrics on the mini-Imagenet 5-way 5-shot classification task. We further propose a simple and effective way of conditioning a learner on the task sample set, resulting in learning a task-dependent metric space. Moreover, we propose and empirically test a practical end-to-end optimization procedure based on auxiliary task co-training to learn a task-dependent metric space. The resulting few-shot learning model based on the task-dependent scaled metric achieves state of the art on mini-Imagenet. We confirm these results on another few-shot dataset that we introduce in this paper based on CIFAR100. Our code is publicly available at https://github.com/ElementAI/TADAM.
△ Less
Submitted 25 January, 2019; v1 submitted 23 May, 2018;
originally announced May 2018.
-
Deep Prior
Authors:
Alexandre Lacoste,
Thomas Boquet,
Negar Rostamzadeh,
Boris Oreshkin,
Wonchang Chung,
David Krueger
Abstract:
The recent literature on deep learning offers new tools to learn a rich probability distribution over high dimensional data such as images or sounds. In this work we investigate the possibility of learning the prior distribution over neural network parameters using such tools. Our resulting variational Bayes algorithm generalizes well to new tasks, even when very few training examples are provided…
▽ More
The recent literature on deep learning offers new tools to learn a rich probability distribution over high dimensional data such as images or sounds. In this work we investigate the possibility of learning the prior distribution over neural network parameters using such tools. Our resulting variational Bayes algorithm generalizes well to new tasks, even when very few training examples are provided. Furthermore, this learned prior allows the model to extrapolate correctly far from a given task's training data on a meta-dataset of periodic signals.
△ Less
Submitted 15 December, 2017; v1 submitted 13 December, 2017;
originally announced December 2017.
-
Efficient delay-tolerant particle filtering
Authors:
Boris N. Oreshkin,
Xuan Liu,
Mark J. Coates
Abstract:
This paper proposes a novel framework for delay-tolerant particle filtering that is computationally efficient and has limited memory requirements. Within this framework the informativeness of a delayed (out-of-sequence) measurement (OOSM) is estimated using a lightweight procedure and uninformative measurements are immediately discarded. The framework requires the identification of a threshold tha…
▽ More
This paper proposes a novel framework for delay-tolerant particle filtering that is computationally efficient and has limited memory requirements. Within this framework the informativeness of a delayed (out-of-sequence) measurement (OOSM) is estimated using a lightweight procedure and uninformative measurements are immediately discarded. The framework requires the identification of a threshold that separates informative from uninformative; this threshold selection task is formulated as a constrained optimization problem, where the goal is to minimize tracking error whilst controlling the computational requirements. We develop an algorithm that provides an approximate solution for the optimization problem. Simulation experiments provide an example where the proposed framework processes less than 40% of all OOSMs with only a small reduction in tracking accuracy.
△ Less
Submitted 22 September, 2010;
originally announced September 2010.
-
Greedy Gossip with Eavesdropping
Authors:
Deniz Ustebay,
Boris Oreshkin,
Mark Coates,
Michael Rabbat
Abstract:
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average consensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, go…
▽ More
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average consensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.
△ Less
Submitted 9 September, 2009;
originally announced September 2009.
-
Optimization and Analysis of Distributed Averaging with Short Node Memory
Authors:
Boris N. Oreshkin,
Mark J. Coates,
Michael G. Rabbat
Abstract:
In this paper, we demonstrate, both theoretically and by numerical examples, that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's two previous values (i.e., two memory taps), and our update rule computes a combination of t…
▽ More
In this paper, we demonstrate, both theoretically and by numerical examples, that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's two previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighbouring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and carry out a theoretical analysis of the improvement in convergence rate that can be obtained using this acceleration methodology. For a chain topology on n nodes, this leads to a factor of n improvement over the one-step algorithm, and for a two-dimensional grid, our approach achieves a factor of n^1/2 improvement, in terms of the number of iterations required to reach a prescribed level of accuracy.
△ Less
Submitted 5 February, 2010; v1 submitted 20 March, 2009;
originally announced March 2009.