-
CMIP X-MOS: Improving Climate Models with Extreme Model Output Statistics
Authors:
Vsevolod Morozov,
Artem Galliamov,
Aleksandr Lukashevich,
Antonina Kurdukova,
Yury Maximov
Abstract:
Climate models are essential for assessing the impact of greenhouse gas emissions on our changing climate and the resulting increase in the frequency and severity of natural disasters. Despite the widespread acceptance of climate models produced by the Coupled Model Intercomparison Project (CMIP), they still struggle to accurately predict climate extremes, which pose the most significant threats to both people and the environment. To address this limitation and improve predictions of natural disaster risks, we introduce Extreme Model Output Statistics (X-MOS). This approach uses deep regression techniques to precisely map CMIP model outputs to real measurements obtained from weather stations, which yields a more accurate analysis of 21st-century climate extremes. In contrast to previous research, our study places a strong emphasis on improving the estimation of the tails of future climate parameter distributions. This supports decision-makers, enabling them to better assess climate-related risks across the globe.
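The abstract does not describe the deep regression model itself. As a point of reference only, the sketch below shows empirical quantile mapping. This is a classical statistical bias-correction step for model outputs against station data. It is a swapped-in baseline, not X-MOS, and the function name `quantile_map` is hypothetical.

```python
from bisect import bisect_right


def quantile_map(model_hist, obs_hist, values):
    """Empirical quantile mapping (classical baseline, not X-MOS): replace each
    model value by the observed value at the same empirical quantile of a
    common historical period."""
    m = sorted(model_hist)
    o = sorted(obs_hist)
    n_m, n_o = len(m), len(o)
    mapped = []
    for v in values:
        k = bisect_right(m, v)          # empirical model CDF at v is k / n_m
        idx = -(-k * n_o // n_m) - 1    # ceil(k * n_o / n_m) - 1, exact integer arithmetic
        mapped.append(o[max(idx, 0)])   # inverse empirical CDF of the observations
    return mapped


print(quantile_map([1, 2, 3, 4], [10, 20, 30, 40], [0, 2.5, 4, 100]))
```

A value above the largest historical model output maps to the largest observed value. Plain quantile mapping therefore cannot extrapolate beyond the observed tail, which is the part of the distribution the abstract singles out.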
Submitted 24 October, 2023;
originally announced November 2023.
-
A-Priori Reduction of Scenario Approximation for Automated Generation Control in High-Voltage Power Grids with Renewable Energy
Authors:
Aleksander Lukashevich,
Aleksander Bulkin,
Yury Maximov
Abstract:
Renewable energy sources (RES) are increasingly integrated into power systems to support the United Nations' Sustainable Development Goals of decarbonization and energy security. However, their low inertia and high uncertainty pose challenges to grid stability and increase the risk of blackouts. Stochastic chance-constrained optimization, particularly data-driven methods, offers solutions but can be time-consuming, especially when handling multiple system snapshots. This paper addresses a dynamic joint chance-constrained Direct Current Optimal Power Flow (DC-OPF) problem with Automated Generation Control (AGC) to facilitate cost-effective power generation while ensuring that balance and security constraints are met. We propose a data-driven approximation that includes a priori sample reduction, which reduces the size of the data-driven approximation while maintaining solution reliability. Both theoretical analysis and empirical results demonstrate the superiority of this approach in handling generation uncertainty: it requires up to half as much data while preserving solution reliability.
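For context, the classical scenario approximation replaces a chance constraint by one deterministic constraint per sampled scenario. Take a convex program with $d$ decision variables. The result of Campi and Garatti guarantees a violation probability of at most $\varepsilon$, with confidence $1-\beta$, once $\Pr[\mathrm{Bin}(N,\varepsilon) \le d-1] \le \beta$. The sketch below computes the smallest such $N$. It is this generic baseline, not the a-priori reduction proposed in the paper, and the function names are illustrative.

```python
import math


def violation_bound(n, d, eps):
    """P[Bin(n, eps) <= d - 1]: Campi-Garatti bound on the probability that the
    scenario solution built from n samples, with d decision variables, violates
    the chance constraint with probability larger than eps."""
    if n < d:
        return 1.0
    log_keep = math.log1p(-eps)
    total = 0.0
    for i in range(d):
        # log of C(n, i) * eps^i * (1 - eps)^(n - i), computed in log space
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(eps) + (n - i) * log_keep)
        total += math.exp(log_term)
    return total


def min_scenarios(d, eps, beta):
    """Smallest n with violation_bound(n, d, eps) <= beta (the bound decreases in n)."""
    lo, hi = d, max(d, 1)
    while violation_bound(hi, d, eps) > beta:  # grow an upper bracket
        hi *= 2
    while lo < hi:                             # binary search inside the bracket
        mid = (lo + hi) // 2
        if violation_bound(mid, d, eps) <= beta:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Take a single decision variable with $\varepsilon = 0.05$ and $\beta = 0.01$: the sketch gives $N = 90$. $N$ grows roughly linearly with $d$, which is why reducing the number of required samples matters when many system snapshots are handled.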
Submitted 5 October, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
GP CC-OPF: Gaussian Process based optimization tool for Chance-Constrained Optimal Power Flow
Authors:
Mile Mitrovic,
Ognjen Kundacina,
Aleksandr Lukashevich,
Petr Vorobev,
Vladimir Terzija,
Yury Maximov,
Deepjyoti Deka
Abstract:
The Gaussian Process (GP) based Chance-Constrained Optimal Power Flow (CC-OPF) is an open-source Python code developed for solving the economic dispatch (ED) problem in modern power grids. In recent years, integrating a significant amount of renewables into power grids has caused high fluctuations and thus brought considerable uncertainty to power grid operations. This makes the conventional model-based CC-OPF problem non-convex and computationally complex to solve. The developed tool presents a novel data-driven approach based on the GP regression model for solving the CC-OPF problem with a trade-off between complexity and accuracy. The proposed approach and developed software can help system operators effectively perform ED optimization in the presence of large uncertainties in the power grid.
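The tool's own interface is not described here. As a hedged illustration of the underlying technique, the sketch below implements plain GP regression with a squared-exponential kernel on scalar inputs. It returns the posterior mean and variance. In an OPF setting the input could be, for example, an uncertain power injection and the output a line flow or voltage. That mapping, the kernel choice and all names are assumptions.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel on scalar inputs (assumed kernel choice)."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, ys, queries, noise=1e-4, length=1.0):
    """Posterior mean and variance of a zero-mean GP at each query point."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # (K + noise I)^{-1} y
    means, variances = [], []
    for q in queries:
        k = [rbf(q, a, length) for a in xs]
        means.append(sum(ki * ai for ki, ai in zip(k, alpha)))
        v = solve_lower(L, k)                     # L^{-1} k
        variances.append(rbf(q, q, length) - sum(vi * vi for vi in v))
    return means, variances
```

The posterior variance is what distinguishes this from a plain regressor. It stays near the noise level at training inputs and returns to the prior variance far from the data. A chance-constrained formulation can use this quantity to account for learning error.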
Submitted 16 February, 2023;
originally announced February 2023.
-
Data-Driven Chance Constrained AC-OPF using Hybrid Sparse Gaussian Processes
Authors:
Mile Mitrovic,
Aleksandr Lukashevich,
Petr Vorobev,
Vladimir Terzija,
Yury Maximov,
Deepjyoti Deka
Abstract:
The alternating current (AC) chance-constrained optimal power flow (CC-OPF) problem addresses the economic efficiency of electricity generation and delivery under generation uncertainty. The latter is intrinsic to modern power grids because of their high share of renewables. Despite its academic success, the AC CC-OPF problem is highly nonlinear and computationally demanding, which limits its practical impact. To improve the complexity/accuracy trade-off of the AC-OPF problem, the paper proposes a fast data-driven setup that uses the sparse and hybrid Gaussian process (GP) framework to model the power flow equations with input uncertainty. We demonstrate the efficiency of the proposed approach through a numerical study over multiple IEEE test cases, showing up to two times faster and more accurate solutions than state-of-the-art methods.
Submitted 30 August, 2022;
originally announced August 2022.
-
Data-Driven Stochastic AC-OPF using Gaussian Processes
Authors:
Mile Mitrovic,
Aleksandr Lukashevich,
Petr Vorobev,
Vladimir Terzija,
Semen Budenny,
Yury Maximov,
Deepjyoti Deka
Abstract:
In recent years, electricity generation has been responsible for more than a quarter of the greenhouse gas emissions in the US. Integrating a significant amount of renewables into a power grid is probably the most accessible way to reduce carbon emissions from power grids and slow down climate change. Unfortunately, the most accessible renewable power sources, such as wind and solar, are highly fluctuating. They thus bring considerable uncertainty to power grid operations and challenge existing optimization and control policies. The chance-constrained alternating current (AC) optimal power flow (OPF) framework finds the minimum-cost generation dispatch that keeps power grid operations within security limits with a prescribed probability. Unfortunately, the chance-constrained extension of the AC-OPF problem is non-convex and computationally challenging. It also requires knowledge of system parameters and additional assumptions about the distribution of renewable generation. Known linear and convex approximations to this problem, though tractable, are too conservative for operational practice and do not consider uncertainty in system parameters. This paper presents an alternative data-driven approach based on Gaussian process (GP) regression to close this gap. The GP approach learns a simple yet non-convex data-driven approximation to the AC power flow equations that can incorporate uncertain inputs. This approximation is then used to solve the CC-OPF problem efficiently, accounting for both input and parameter uncertainty. The practical efficiency of the proposed approach, using different approximations for GP uncertainty propagation, is illustrated on numerous IEEE test cases.
Submitted 21 July, 2022;
originally announced July 2022.
-
Learning over No-Preferred and Preferred Sequence of Items for Robust Recommendation (Extended Abstract)
Authors:
Aleksandra Burashnikova,
Yury Maximov,
Marianne Clausel,
Charlotte Laclau,
Franck Iutzeler,
Massih-Reza Amini
Abstract:
This paper is an extended version of [Burashnikova et al., 2021, arXiv: 2012.06910], where we proposed a theoretically supported sequential strategy for training a large-scale Recommender System (RS) over implicit feedback, mainly in the form of clicks. The proposed approach consists of minimizing a pairwise ranking loss over blocks of consecutive items for each user. Each block is a sequence of non-clicked items followed by a clicked one. We present two variants of this strategy where model parameters are updated using either the momentum method or a gradient-based approach. To prevent updating the parameters for an abnormally high number of clicks on some targeted items (mainly due to bots), we introduce an upper and a lower threshold on the number of updates for each user. These thresholds are estimated over the distribution of the number of blocks in the training set. They affect the decisions of the RS by shifting the distribution of items that are shown to the users. Furthermore, we provide a convergence analysis of both algorithms and demonstrate their practical efficiency over six large-scale collections with respect to various ranking measures.
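Below is a minimal sketch of the momentum variant for a single block. It assumes a dot-product score between a user vector and item vectors and a logistic pairwise surrogate loss. The abstract only specifies a pairwise ranking loss over blocks, so the model, the loss and all names here are assumptions.

```python
import math


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def block_loss_and_grads(params, negatives, positive):
    """Mean pairwise logistic loss over one block (assumed surrogate).

    params: dict with key "user" (user vector) and item ids (item vectors).
    Each (positive, negative) pair contributes log(1 + exp(-margin)),
    with margin = u . v_pos - u . v_neg."""
    u, v_pos = params["user"], params[positive]
    d = len(u)
    grads = {"user": [0.0] * d, positive: [0.0] * d}
    loss = 0.0
    scale = 1.0 / len(negatives)
    for neg in negatives:
        v_neg = params[neg]
        margin = sum(ui * (p - q) for ui, p, q in zip(u, v_pos, v_neg))
        loss += scale * _softplus(-margin)
        g = -scale * _sigmoid(-margin)  # d loss / d margin
        g_neg = grads.setdefault(neg, [0.0] * d)
        for i in range(d):
            grads["user"][i] += g * (v_pos[i] - v_neg[i])
            grads[positive][i] += g * u[i]
            g_neg[i] -= g * u[i]
    return loss, grads


def momentum_step(params, velocity, block, lr=0.1, beta=0.9):
    """One heavy-ball momentum update on the parameters touched by a block."""
    negatives, positive = block
    loss, grads = block_loss_and_grads(params, negatives, positive)
    for key, g in grads.items():
        vel = velocity.setdefault(key, [0.0] * len(g))
        for i, gi in enumerate(g):
            vel[i] = beta * vel[i] - lr * gi
            params[key][i] += vel[i]
    return loss  # loss before the update
```

Only the user vector and the items in the block receive updates, which keeps each step cheap when blocks are processed sequentially.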
Submitted 26 February, 2022;
originally announced February 2022.
-
Importance sampling approach to chance-constrained DC optimal power flow
Authors:
Aleksander Lukashevich,
Vyacheslav Gorchakov,
Petr Vorobev,
Deepjyoti Deka,
Yury Maximov
Abstract:
Despite significant economic and ecological benefits, a higher level of renewable energy generation leads to increased uncertainty and variability in power injections, thus compromising grid reliability. To improve power grid security, we investigate a joint chance-constrained (CC) direct current (DC) optimal power flow (OPF) problem. The problem aims to find economically optimal power generation while guaranteeing, with a pre-defined probability, that all power generation, line flows and voltages simultaneously remain within their bounds. Unfortunately, the problem is computationally intractable even if the distribution of renewable fluctuations is specified. Moreover, existing approximate solutions to the joint CC OPF problem are overly conservative and therefore of limited value for operational practice. This paper proposes an importance sampling approach to the CC DC OPF problem, which achieves better complexity and accuracy than current state-of-the-art methods. The algorithm efficiently reduces the number of scenarios by generating and using only the most important ones, thus enabling real-time solutions for test cases with up to several hundred buses.
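For reference, one generic way to write a joint chance-constrained DC-OPF is shown below. The notation is assumed for illustration and need not match the paper's:

- $p$ is the scheduled generation, $\xi$ the renewable fluctuation and $d$ the demand (all indexed by buses for simplicity).
- $\alpha$ holds the participation factors of an assumed affine balancing response.
- $\Phi$ is the power transfer distribution factor matrix.
- $\eta$ is the allowed violation probability.

```latex
\begin{aligned}
\min_{p,\,\alpha}\quad & c(p) \\
\text{s.t.}\quad & \mathbf{1}^\top p = \mathbf{1}^\top d, \qquad \mathbf{1}^\top \alpha = 1, \\
& p(\xi) = p - \alpha\,(\mathbf{1}^\top \xi), \\
& \Pr_{\xi}\Big[\, \underline{p} \le p(\xi) \le \overline{p}
  \ \text{ and } \ \underline{f} \le \Phi\big(p(\xi) + \xi - d\big) \le \overline{f} \,\Big] \ \ge\ 1 - \eta .
\end{aligned}
```

Because $\mathbf{1}^\top \alpha = 1$, the affine response keeps total injections balanced for every $\xi$. The joint constraint requires all bounds to hold simultaneously. Equivalently, the probability of the union of the individual violation events must be at most $\eta$.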
Submitted 23 November, 2021;
originally announced November 2021.
-
Tractable Minor-free Generalization of Planar Zero-field Ising Models
Authors:
Valerii Likhosherstov,
Yury Maximov,
Michael Chertkov
Abstract:
We present a new family of zero-field Ising models over $N$ binary variables/spins obtained by consecutive "gluing" of planar and $O(1)$-sized components and subsets of at most three vertices into a tree. Our polynomial-time dynamic-programming algorithm performs exact inference (computing the partition function) and exact sampling (generating i.i.d. samples). It applies, sequentially and as a black box, efficient (for planar) or brute-force (for $O(1)$-sized) inference and sampling to the components. To illustrate the utility of the new family of tractable graphical models, we first build a polynomial algorithm for inference and sampling of zero-field Ising models over $K_{3,3}$-minor-free topologies and over $K_{5}$-minor-free topologies. Both are extensions of the planar zero-field Ising models that are neither genus- nor treewidth-bounded. Second, we demonstrate empirically an improvement in the approximation quality of the NP-hard problem of inference over the square-grid Ising model in a node-dependent non-zero "magnetic" field.
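The brute-force routine applied to $O(1)$-sized components can be sketched as below. It enumerates all $2^n$ configurations to get the partition function and draws an exact sample proportional to the Boltzmann weight. This is only the black-box step for tiny components; the gluing/dynamic programming over the tree and the efficient planar routine are not shown. Couplings are passed as a dict `{(i, j): J_ij}`, an assumed representation, and the function names are hypothetical.

```python
import itertools
import math
import random


def ising_weight(spins, couplings):
    """Boltzmann weight exp(sum_{(i,j)} J_ij s_i s_j) of a zero-field Ising model."""
    return math.exp(sum(J * spins[i] * spins[j] for (i, j), J in couplings.items()))


def brute_force_partition(n, couplings):
    """Exact partition function by enumerating all 2^n spin configurations."""
    return sum(ising_weight(s, couplings) for s in itertools.product((-1, 1), repeat=n))


def brute_force_sample(n, couplings, rng):
    """Exact sample: pick a configuration with probability weight / Z."""
    configs = list(itertools.product((-1, 1), repeat=n))
    weights = [ising_weight(s, couplings) for s in configs]
    return rng.choices(configs, weights=weights)[0]
```

For a single edge with coupling $J$, the partition function is $4\cosh J$. For an open chain of $n$ spins with equal couplings it is $2(2\cosh J)^{n-1}$, which gives a quick check of the enumeration.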
Submitted 22 October, 2019;
originally announced October 2019.
-
A New Family of Tractable Ising Models
Authors:
Valerii Likhosherstov,
Yury Maximov,
Michael Chertkov
Abstract:
We present a new family of zero-field Ising models over $N$ binary variables/spins obtained by consecutive "gluing" of planar and $O(1)$-sized components along with subsets of at most three vertices into a tree. Our polynomial-time dynamic-programming algorithm performs exact inference (partition function computation) and sampling. It applies, sequentially and as a black box, efficient (for planar) or brute-force (for $O(1)$-sized) inference and sampling to the components. To illustrate the utility of the new family of tractable graphical models, we first build an $O(N^{3/2})$ algorithm for inference and sampling of $K_5$-minor-free zero-field Ising models. These extend the planar zero-field Ising models and are neither genus- nor treewidth-bounded. Second, we demonstrate empirically an improvement in the approximation quality for the NP-hard problem of inference in the square-grid Ising model with a non-zero field.
Submitted 14 June, 2019;
originally announced June 2019.
-
Sequential Learning over Implicit Feedback for Robust Large-Scale Recommender Systems
Authors:
Alexandra Burashnikova,
Yury Maximov,
Massih-Reza Amini
Abstract:
In this paper, we propose a robust sequential learning strategy for training large-scale Recommender Systems (RS) over implicit feedback, mainly in the form of clicks. Our approach minimizes a pairwise ranking loss over blocks of consecutive items for each user. Each block is a sequence of non-clicked items followed by a clicked one. Parameter updates are discarded if, for a given user, the number of sequential blocks is below or above given thresholds estimated over the distribution of the number of blocks in the training set. This prevents updates driven by very few user interactions or by an abnormal number of clicks on some targeted items, mainly due to bots. Both scenarios affect the decisions of the RS and shift the distribution of items shown to the users. We provide a theoretical analysis covering the case where the ranking loss is convex. It shows that the deviation between the loss with respect to the sequence of weights found by the proposed algorithm and its minimum is bounded. Furthermore, experimental results on five large-scale collections demonstrate the efficiency of the proposed algorithm relative to state-of-the-art approaches, in terms of both different ranking measures and computation time.
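The block construction and the per-user filter can be sketched as follows. The abstract does not say how the thresholds are estimated from the distribution of block counts. The use of empirical quantiles (`lower_q`, `upper_q`) is therefore an assumption, as is skipping a click that has no preceding non-clicked item. All names are hypothetical.

```python
def make_blocks(interactions):
    """Split one user's chronological (item, clicked) stream into blocks,
    each a run of non-clicked items followed by the click that ends it."""
    blocks, negatives = [], []
    for item, clicked in interactions:
        if not clicked:
            negatives.append(item)
            continue
        if negatives:  # assumption: a click with no preceding non-click yields no pair
            blocks.append((negatives, item))
        negatives = []
    return blocks  # trailing non-clicked items without a click are dropped


def quantile_thresholds(counts, lower_q=0.05, upper_q=0.95):
    """Hypothetical rule: lower/upper thresholds as empirical quantiles of block counts."""
    s = sorted(counts)
    return s[int(lower_q * (len(s) - 1))], s[int(upper_q * (len(s) - 1))]


def users_to_update(user_streams, lower_q=0.05, upper_q=0.95):
    """Keep only users whose number of blocks lies within the thresholds."""
    blocks = {u: make_blocks(stream) for u, stream in user_streams.items()}
    lo, hi = quantile_thresholds([len(b) for b in blocks.values()], lower_q, upper_q)
    return {u: b for u, b in blocks.items() if lo <= len(b) <= hi}
```

A user whose stream produces an unusually large number of blocks, such as a bot clicking every other item, falls above the upper threshold. That user's blocks are then never used for parameter updates.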
Submitted 20 February, 2019;
originally announced February 2019.
-
Learning a Generator Model from Terminal Bus Data
Authors:
Nikolay Stulov,
Dejan J Sobajic,
Yury Maximov,
Deepjyoti Deka,
Michael Chertkov
Abstract:
In this work, we investigate approaches that use machine learning (ML) techniques to reconstruct generator models from measurements available at the generator terminal bus. The goal is to develop an emulator that is trained online and is capable of fast predictive computations. Training is illustrated on synthetic data generated from an available open-source dynamical generator model. Two ML techniques were developed and tested: (a) a standard vector auto-regressive (VAR) model; and (b) a novel customized long short-term memory (LSTM) deep learning model. Trade-offs in reconstruction ability between the computationally light but linear VAR model and the powerful but computationally demanding LSTM model are established and analyzed.
Submitted 3 January, 2019;
originally announced January 2019.
-
Inference and Sampling of $K_{33}$-free Ising Models
Authors:
Valerii Likhosherstov,
Yury Maximov,
Michael Chertkov
Abstract:
We call an Ising model tractable when its partition function (statistical inference) can be computed in polynomial time. Tractability also implies the ability to sample configurations of the model in polynomial time. The notion of tractability extends the basic case of planar zero-field Ising models. Our starting point is to describe algorithms that efficiently compute the partition function and sample in the basic case. To derive the algorithms, we use an equivalent linear transition to perfect matching counting and sampling on an expanded dual graph. We then extend our tractable inference and sampling algorithms to models whose triconnected components are either planar or graphs of $O(1)$ size. In particular, this yields polynomial-time inference and sampling algorithms for $K_{3,3}$-minor-free topologies of zero-field Ising models, a generalization of planar graphs with potentially unbounded genus.
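The claim that tractable inference also yields sampling can be illustrated with the generic sequential-conditioning argument. Each spin is drawn from its conditional marginal, computed as a ratio of partition functions of conditioned models. This is a textbook reduction, not necessarily the paper's sampling procedure. The brute-force `conditional_partition` is a hypothetical placeholder for a polynomial-time oracle. Fixing spins in a zero-field model introduces local fields on their neighbors, so the oracle must handle those conditioned models.

```python
import itertools
import math
import random


def conditional_partition(n, couplings, fixed):
    """Brute-force Z of a zero-field Ising model restricted to configurations that
    agree with `fixed` (dict: spin index -> +1 or -1). Placeholder oracle."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        if all(s[i] == v for i, v in fixed.items()):
            total += math.exp(sum(J * s[i] * s[j] for (i, j), J in couplings.items()))
    return total


def sample_with_oracle(n, couplings, partition, rng):
    """Exact sample by sequential conditioning on spins 0, 1, ..., n - 1:
    P(s_k = +1 | fixed) = Z(fixed, s_k = +1) / Z(fixed)."""
    fixed = {}
    z_fixed = partition(n, couplings, fixed)
    for k in range(n):
        z_plus = partition(n, couplings, {**fixed, k: 1})
        if rng.random() < z_plus / z_fixed:
            fixed[k], z_fixed = 1, z_plus
        else:
            fixed[k], z_fixed = -1, z_fixed - z_plus  # Z(fixed) = Z(+) + Z(-)
    return [fixed[k] for k in range(n)]
```

The sampler makes $N + 1$ oracle calls, so a polynomial-time partition function yields a polynomial-time exact sampler.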
Submitted 21 May, 2019; v1 submitted 22 December, 2018;
originally announced December 2018.
-
Gauges, Loops, and Polynomials for Partition Functions of Graphical Models
Authors:
Michael Chertkov,
Vladimir Chernyak,
Yury Maximov
Abstract:
Graphical models represent multivariate and generally unnormalized probability distributions. Computing the normalization factor, called the partition function (PF), is the main inference challenge relevant to multiple statistical and optimization applications. The problem has exponential complexity in the number of variables. In this manuscript, aimed at approximating the PF, we consider Multi-Graph Models. In these models, binary variables are associated with the edges and multivariable factors with the nodes of an undirected multi-graph. We suggest a new methodology for analysis and computations that combines the Gauge Function technique with techniques from the field of real stable polynomials. We show that the Gauge Function has a natural polynomial representation in terms of gauges/variables associated with edges of the multi-graph. Moreover, it can be used to recover the Partition Function through a sequence of transformations allowing appealing algebraic and graphical interpretations. Algebraically, one step in the sequence consists of applying a differential operator over gauges associated with an edge. Graphically, the sequence is interpreted as a repetitive elimination of edges. This results in a sequence of models on graphs of decreasing size with the same Partition Function. The complexity of computing factors in the sequence models grows exponentially with the number of eliminated edges. Even so, polynomials associated with the new factors remain bi-stable if the original factors have this property. Moreover, we show that Belief Propagation estimates in the sequence do not decrease, each lower-bounding the Partition Function.
Submitted 28 August, 2020; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Belief Propagation Min-Sum Algorithm for Generalized Min-Cost Network Flow
Authors:
Andrii Riazanov,
Yury Maximov,
Michael Chertkov
Abstract:
Belief Propagation algorithms are instruments used broadly to solve graphical model optimization and statistical inference problems. In the general case of a loopy Graphical Model, Belief Propagation is a heuristic that is quite successful in practice, even though its empirical success typically lacks theoretical guarantees. This paper extends the short list of special cases where correctness and/or convergence of a Belief Propagation algorithm is proven. We generalize the formulation of the Min-Sum Network Flow problem by relaxing the flow conservation (balance) constraints. We then prove that the Belief Propagation algorithm converges to the exact result.
Submitted 12 July, 2018; v1 submitted 20 October, 2017;
originally announced October 2017.
-
Importance sampling the union of rare events with an application to power systems analysis
Authors:
Art B. Owen,
Yury Maximov,
Michael Chertkov
Abstract:
We consider importance sampling to estimate the probability $\mu$ of a union of $J$ rare events $H_j$ defined by a random variable $\boldsymbol{x}$. The sampler we study has been used in spatial statistics, genomics and combinatorics going back at least to Karp and Luby (1983). It first samples one event at random and then samples $\boldsymbol{x}$ conditionally on that event happening. It constructs an unbiased estimate of $\mu$ by multiplying an inverse moment of the number of occurring events by the union bound. We prove some variance bounds for this sampler. For a sample size of $n$, it has a variance no larger than $\mu(\bar\mu-\mu)/n$, where $\bar\mu$ is the union bound. It also has a coefficient of variation no larger than $\sqrt{(J+J^{-1}-2)/(4n)}$ regardless of the overlap pattern among the $J$ events. Our motivating problem comes from power system reliability. There, the phase differences between connected nodes have a joint Gaussian distribution, and the $J$ rare events arise from unacceptably large phase differences. In the grid reliability problems, even some events defined by $5772$ constraints in $326$ dimensions, with probability below $10^{-22}$, are estimated with a coefficient of variation of about $0.0024$ with only $n=10{,}000$ sample values.
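Below is a minimal sketch of the sampler described above. It assumes each rare event is a half-space $H_j = \{w_j^\top \boldsymbol{x} > \tau_j\}$ of a standard Gaussian $\boldsymbol{x}$ with unit-norm $w_j$. A correlated Gaussian, such as the joint phase differences, can be reduced to this form by whitening. The function and variable names are illustrative.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_tail(t):
    """P(Z > t) for Z ~ N(0, 1), via erfc for accuracy deep in the tail."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def union_importance_sample(ws, taus, n, seed=0):
    """Estimate mu = P(x in union_j H_j), H_j = {w_j . x > tau_j}, x ~ N(0, I).

    Assumes every w_j has unit Euclidean norm, so P(H_j) = gaussian_tail(tau_j).
    Returns (estimate, union_bound)."""
    rng = random.Random(seed)
    d = len(ws[0])
    probs = [gaussian_tail(t) for t in taus]
    mu_bar = sum(probs)  # the union bound
    acc = 0.0
    for _ in range(n):
        # 1. pick event j with probability P(H_j) / mu_bar
        j = rng.choices(range(len(ws)), weights=probs)[0]
        w = ws[j]
        # 2. draw x ~ N(0, I) conditioned on H_j: the coordinate along w is a
        #    standard normal truncated to (tau_j, inf), drawn by inverse CDF;
        #    the component orthogonal to w is left unconditioned
        u = 1.0 - rng.random()                   # uniform on (0, 1]
        y = -_STD_NORMAL.inv_cdf(u * probs[j])   # y >= tau_j
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        proj = sum(wi * zi for wi, zi in zip(w, z))
        x = [zi + (y - proj) * wi for wi, zi in zip(w, z)]
        # 3. S(x) = number of events that occur; H_j holds by construction
        s = 1 + sum(
            1 for k, (wk, tk) in enumerate(zip(ws, taus))
            if k != j and sum(a * b for a, b in zip(wk, x)) > tk
        )
        acc += 1.0 / s
    return mu_bar * acc / n, mu_bar
```

Every draw satisfies at least one event, so the weight $1/S(\boldsymbol{x})$ lies in $[1/J, 1]$. This keeps the estimate between $\bar\mu/J$ and $\bar\mu$ no matter how rare the events are.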
Submitted 18 December, 2018; v1 submitted 18 October, 2017;
originally announced October 2017.
-
Representation Learning and Pairwise Ranking for Implicit Feedback in Recommendation Systems
Authors:
Sumit Sidana,
Mikhail Trofimov,
Oleg Horodnitskii,
Charlotte Laclau,
Yury Maximov,
Massih-Reza Amini
Abstract:
In this paper, we propose a novel ranking framework for collaborative filtering with the overall aim of learning user preferences over items by minimizing a pairwise ranking loss. We show that the minimization problem involves dependent random variables. We provide a theoretical analysis proving the consistency of empirical risk minimization in the worst case, where all users choose a minimal number of positive and negative items. We further derive a Neural-Network model that jointly learns a new representation of users and items in an embedded space as well as the preference relation of users over pairs of items. The learning objective is based on three scenarios of ranking losses. These control the ability of the model to maintain the ordering over the items induced from the users' preferences, as well as the capacity of the dot product defined in the learned embedded space to produce that ordering. The proposed model is by nature suitable for implicit feedback and involves the estimation of only very few parameters. Through extensive experiments on several real-world benchmarks on implicit data, we show the benefit of learning the preference and the embedding simultaneously rather than separately. We also demonstrate that our approach is very competitive with the best state-of-the-art collaborative filtering techniques proposed for implicit feedback.
Submitted 12 July, 2018; v1 submitted 28 April, 2017;
originally announced May 2017.
-
Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification
Authors:
Bikash Joshi,
Massih-Reza Amini,
Ioannis Partalas,
Franck Iutzeler,
Yury Maximov
Abstract:
We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The sampling strategy has two aims. The first is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems. The second is to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes. Compared with state-of-the-art approaches, they show the efficiency of the proposed approach in terms of training and prediction time, memory consumption and predictive performance.
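Below is a hedged sketch of how two sampling stages could sit in front of a pairwise reduction. The first stage caps the number of examples per class to flatten a long-tailed class distribution. The second pairs each kept example with a few randomly drawn wrong classes instead of all of them. The abstract does not specify the sampling distributions, so uniform sampling, the cap `max_per_class` and `n_neg_classes` are assumptions.

```python
import random
from collections import defaultdict


def double_sample_pairs(examples, max_per_class, n_neg_classes, seed=0):
    """Hypothetical two-stage sampling before a multi-class to binary reduction.

    examples: list of (x, y) with class label y.
    Stage 1: keep at most `max_per_class` examples per class (uniformly at random).
    Stage 2: pair each kept (x, y) with `n_neg_classes` random wrong classes y',
             giving binary examples ((x, y), (x, y')) whose target is that the
             first element carries the correct class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append(x)
    classes = sorted(by_class)
    pairs = []
    for y in classes:
        xs = by_class[y]
        kept = rng.sample(xs, min(max_per_class, len(xs)))     # stage 1
        others = [c for c in classes if c != y]
        for x in kept:
            for y_neg in rng.sample(others, min(n_neg_classes, len(others))):  # stage 2
                pairs.append(((x, y), (x, y_neg)))
    return pairs
```

With $K$ classes, the full reduction would create $K - 1$ pairs per example. Here the count is bounded by `max_per_class * n_neg_classes` per class.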
Submitted 14 September, 2017; v1 submitted 23 January, 2017;
originally announced January 2017.
-
Rademacher Complexity Bounds for a Penalized Multiclass Semi-Supervised Algorithm
Authors:
Yury Maximov,
Massih-Reza Amini,
Zaid Harchaoui
Abstract:
We propose Rademacher complexity bounds for multiclass classifiers trained with a two-step semi-supervised model. In the first step, the algorithm partitions the partially labeled data. It then uses the labeled training examples to identify dense clusters containing $\kappa$ predominant classes, such that the proportion of their non-predominant classes is below a fixed threshold. In the second step, a classifier is trained by minimizing a margin empirical loss over the labeled training set plus a penalization term. This term measures the inability of the learner to predict the $\kappa$ predominant classes of the identified clusters. The resulting data-dependent generalization error bound involves three elements: the margin distribution of the classifier, the stability of the clustering technique used in the first step, and Rademacher complexity terms corresponding to partially labeled training data. Our theoretical results exhibit convergence rates extending those proposed in the literature for the binary case. Experimental results on different multiclass classification problems provide empirical evidence that supports the theory.
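The cluster-selection part of the first step can be sketched as follows, given any clustering of the partially labeled data. The threshold rule follows the description above. The data layout (`None` for unlabeled points) and the names are assumptions.

```python
from collections import Counter


def predominant_clusters(cluster_ids, labels, kappa, threshold):
    """Keep clusters whose labeled points are dominated by `kappa` classes.

    cluster_ids: cluster index of every point (from any clustering method).
    labels: class label of every point, or None if the point is unlabeled.
    A cluster is kept when the labeled points outside its `kappa` most frequent
    classes make up less than `threshold` of its labeled points.
    Returns {cluster: set of predominant classes}."""
    counts = {}
    for c, y in zip(cluster_ids, labels):
        if y is not None:
            counts.setdefault(c, Counter())[y] += 1
    kept = {}
    for c, cnt in counts.items():
        top = cnt.most_common(kappa)
        n_labeled = sum(cnt.values())
        non_predominant = n_labeled - sum(k for _, k in top)
        if non_predominant / n_labeled < threshold:
            kept[c] = {y for y, _ in top}
    return kept
```

Clusters with no labeled points are never selected, since the predominant classes can only be identified from labeled examples.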
Submitted 25 January, 2018; v1 submitted 2 July, 2016;
originally announced July 2016.
-
Tight Risk Bounds for Multi-Class Margin Classifiers
Authors:
Yury Maximov,
Daria Reshetova
Abstract:
We consider the problem of risk estimation for large-margin multi-class classifiers. We propose a novel risk bound for the multi-class classification problem. The bound involves the marginal distribution of the classifier and the Rademacher complexity of the hypothesis class. We prove that our bound is tight in the number of classes. Finally, we compare our bound with related ones and provide a simplified version of the bound for multi-class classification with kernel-based hypotheses.
Submitted 2 July, 2016; v1 submitted 10 July, 2015;
originally announced July 2015.