-
$(ε, δ)$-Differentially Private Partial Least Squares Regression
Authors:
Ramin Nikzad-Langerodi,
Mohit Kumar,
Du Nguyen Duy,
Mahtab Alghasi
Abstract:
As data-privacy requirements are becoming increasingly stringent and statistical models based on sensitive data are being deployed and used more routinely, protecting data-privacy becomes pivotal. Partial Least Squares (PLS) regression is the premier tool for building such models in analytical chemistry, yet it does not inherently provide privacy guarantees, leaving sensitive (training) data vulnerable to privacy attacks. To address this gap, we propose an $(ε, δ)$-differentially private PLS (edPLS) algorithm, which integrates well-studied and theoretically motivated Gaussian noise-adding mechanisms into the PLS algorithm to ensure the privacy of the data underlying the model. Our approach involves adding carefully calibrated Gaussian noise to the outputs of four key functions in the PLS algorithm: the weights, scores, $X$-loadings, and $Y$-loadings. The noise variance is determined based on the global sensitivity of each function, ensuring that the privacy loss is controlled according to the $(ε, δ)$-differential privacy framework. Specifically, we derive the sensitivity bounds for each function and use these bounds to calibrate the noise added to the model components. Experimental results demonstrate that edPLS effectively renders privacy attacks, aimed at recovering unique sources of variability in the training data, ineffective. Application of edPLS to the NIR corn benchmark dataset shows that the root mean squared error of prediction (RMSEP) remains competitive even at strong privacy levels (i.e., $ε=1$), given proper pre-processing of the corresponding spectra. These findings highlight the practical utility of edPLS in creating privacy-preserving multivariate calibrations and for the analysis of their privacy-utility trade-offs.
Submitted 12 December, 2024;
originally announced December 2024.
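The noise calibration mentioned in the edPLS abstract above can be illustrated with the textbook Gaussian mechanism. This is a minimal sketch, not the authors' implementation: it assumes the classical bound σ = Δ·sqrt(2 ln(1.25/δ))/ε (usually stated for 0 < ε < 1), and the names `gaussian_sigma` and `privatize` are hypothetical. The paper derives its own sensitivity bounds for the weights, scores, and loadings, and those bounds are not reproduced here.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise standard deviation of the classical Gaussian mechanism.

    Assumption: textbook calibration, which is usually stated for
    0 < epsilon < 1. The paper may calibrate its noise differently.
    """
    if epsilon <= 0 or not (0 < delta < 1) or sensitivity < 0:
        raise ValueError("need epsilon > 0, 0 < delta < 1, sensitivity >= 0")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize(vector, sensitivity, epsilon, delta, rng=None):
    """Add i.i.d. zero-mean Gaussian noise to every entry of `vector`.

    `sensitivity` is the global L2 sensitivity of the function that
    produced `vector`; it has to be supplied by the caller.
    """
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in vector]


if __name__ == "__main__":
    # Hypothetical weight vector and sensitivity, for illustration only.
    weights = [0.12, -0.40, 0.33]
    print(privatize(weights, sensitivity=1.0, epsilon=0.5, delta=1e-5,
                    rng=random.Random(0)))
```

A smaller ε or δ, or a larger sensitivity, increases σ, which is the privacy-utility trade-off the abstract refers to.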
-
Interval-Valued Fuzzy Fault Tree Analysis through Qualitative Data Processing and its Applications in Marine Operations
Authors:
Hitesh Khungla,
Kulbir Singh,
Mohit Kumar
Abstract:
Marine accidents highlight the crucial need for human safety. They result in loss of life, environmental harm, and significant economic costs, emphasizing the importance of being proactive and taking precautionary steps. This study aims to identify the root causes of accidents in order to develop effective strategies for preventing them. Due to the lack of accurate quantitative data or reliable probability information, we employ qualitative approaches to assess the reliability of complex systems. We collect expert judgments regarding the failure likelihood of each basic event and aggregate those opinions using the Similarity-based Aggregation Method (SAM) to form a collective assessment. In SAM, we convert expert opinions into failure probabilities using interval-valued triangular fuzzy numbers. Since the experts differ in knowledge and level of experience, we assign weights to their opinions to reflect their relative expertise. We employ the Best-Worst Method (BWM) to calculate the weights of each criterion, and then use the weighting scores to determine the weights of each expert. Ranking basic events according to their criticality is a crucial step, and in this study we use the FVI measure to prioritize these events. To demonstrate the effectiveness and validity of the proposed methodology, we apply it to two case studies: (1) chemical cargo contamination, and (2) the loss of ship steering ability. These case studies illustrate the practicality and utility of our approach in evaluating criticality and assessing risk in complex systems.
Submitted 22 November, 2024;
originally announced November 2024.
-
Modularity aided consistent attributed graph clustering via coarsening
Authors:
Samarth Bhatia,
Yukti Makhija,
Manoj Kumar,
Sandeep Kumar
Abstract:
Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectively leveraging both adjacency and node features to enhance clustering accuracy. We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique, resulting in superior clustering outcomes. The method is theoretically consistent under the Degree-Corrected Stochastic Block Model (DC-SBM), ensuring asymptotic error-free performance and complete label recovery. Our provably convergent and time-efficient algorithm seamlessly integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and deliver exceptional clustering performance. Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs.
Submitted 17 November, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
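The modularity term mentioned in the graph-clustering abstract above refers to a standard quantity that is easy to compute directly. The sketch below evaluates Newman's modularity for a hard partition of an undirected, unweighted graph without self-loops. It shows only that one ingredient; the paper's full loss (log-determinant, smoothness, and modularity terms optimized by block majorization-minimization) is not reproduced, and the function name `modularity` and the toy graph are illustrative assumptions.

```python
def modularity(edges, community):
    """Newman modularity Q of a hard partition.

    edges:     list of (u, v) pairs, undirected, no self-loops,
               each edge listed once.
    community: dict mapping every node to its community label.

    Q = sum over communities c of  e_c / m - (d_c / (2 m))^2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    degree = {}
    internal = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    total_degree = {}
    for node, d in degree.items():
        c = community[node]
        total_degree[c] = total_degree.get(c, 0) + d
    return sum(internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


if __name__ == "__main__":
    # Toy graph: two disjoint triangles, one community per triangle.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, labels))
```

Putting every node in a single community gives Q = 0, which is why modularity maximization rewards partitions with more internal edges than expected by degree alone.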
-
Design of variable acceptance sampling plan for exponential distribution under uncertainty
Authors:
Mahesh Kumar,
Ashlyn Maria Mathai
Abstract:
In an acceptance monitoring system, acceptance sampling techniques are used to increase production, enhance control, and deliver higher-quality products at a lower cost. It might not always be possible to define the acceptance sampling plan parameters as exact values, especially when the data are uncertain. In this work, acceptance sampling plans for a large number of identical units with exponential lifetimes are obtained by treating acceptable quality life, rejectable quality life, consumer's risk, and producer's risk as fuzzy parameters. To obtain the plan parameters of sequential sampling plans and repetitive group sampling plans, a fuzzy hypothesis test is considered. To validate the sampling plans obtained in this work, some examples are presented. Our results are compared with existing results in the literature. Finally, to demonstrate the application of the resulting sampling plans, a real-life case study is presented.
Submitted 28 November, 2023;
originally announced November 2023.
-
Information Geometry for the Working Information Theorist
Authors:
Kumar Vijay Mishra,
M. Ashok Kumar,
Ting-Kam Leonard Wong
Abstract:
Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community.
Submitted 5 October, 2023;
originally announced October 2023.
-
Detecting Errors in a Numerical Response via any Regression Model
Authors:
Hang Zhou,
Jonas Mueller,
Mayank Kumar,
Jane-Ling Wang,
Jing Lei
Abstract:
Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values for reasons including erroneous sensors, data entry/processing mistakes, or imperfect human estimates. We consider general regression settings with covariates and a potentially corrupted response whose observed values may contain errors. By accounting for various uncertainties, we introduce veracity scores that distinguish between genuine errors and natural data fluctuations, conditioned on the available covariate information in the dataset. We propose a simple yet efficient filtering procedure for eliminating potential errors, and establish theoretical guarantees for our method. We also contribute a new error detection benchmark involving 5 regression datasets with real-world numerical errors (for which the true values are also known). In this benchmark and additional simulation studies, our method identifies incorrect values with better precision/recall than other approaches.
Submitted 12 March, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Expressive Losses for Verified Robustness via Convex Combinations
Authors:
Alessandro De Palma,
Rudy Bunel,
Krishnamurthy Dvijotham,
M. Pawan Kumar,
Robert Stanforth,
Alessio Lomuscio
Abstract:
In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
Submitted 18 March, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
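The idea of an expressive loss in the abstract above can be sketched as a single-parameter interpolation between a lower bound on the worst-case loss (from an adversarial attack) and an upper bound (from IBP). The code below is an illustrative assumption, not the authors' formulation: it combines two already-computed scalar losses, whereas the paper may form the convex combination at a different point in the computation. The name `expressive_loss` and the numeric values are hypothetical.

```python
def expressive_loss(adversarial_loss, ibp_loss, alpha):
    """Convex combination of a lower and an upper bound on the worst-case loss.

    adversarial_loss: loss at an adversarial example (lower bound).
    ibp_loss:         loss computed from IBP bounds (upper bound).
    alpha:            over-approximation coefficient in [0, 1];
                      0 recovers the adversarial loss, 1 the IBP loss.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * adversarial_loss + alpha * ibp_loss


if __name__ == "__main__":
    # Hypothetical loss values, for illustration only.
    for alpha in (0.0, 0.25, 1.0):
        print(alpha, expressive_loss(1.0, 3.0, alpha))
```

Sweeping `alpha` from 0 to 1 spans the range of trade-offs between the two bounds, which is the property the abstract calls expressivity.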
-
A Unified Framework for Optimization-Based Graph Coarsening
Authors:
Manoj Kumar,
Anurag Sharma,
Sandeep Kumar
Abstract:
Graph coarsening is a widely used dimensionality reduction technique for approaching large-scale graph machine learning problems. Given a large graph, graph coarsening aims to learn a smaller, tractable graph while preserving the properties of the original graph. Graph data consist of node features and a graph matrix (e.g., adjacency or Laplacian). Existing graph coarsening methods ignore the node features and rely solely on a graph matrix to simplify graphs. In this paper, we introduce a novel optimization-based framework for graph dimensionality reduction. The proposed framework unifies graph learning and dimensionality reduction. It takes both the graph matrix and the node features as input and learns the coarsened graph matrix and the coarsened feature matrix jointly while ensuring desired properties. The proposed optimization formulation is a multi-block non-convex optimization problem, which is solved efficiently by leveraging block majorization-minimization, $\log$ determinant, Dirichlet energy, and regularization frameworks. The proposed algorithms are provably convergent and practically amenable to numerous tasks. It is also established that the learned coarsened graph is $ε\in(0,1)$ similar to the original graph. Extensive experiments elucidate the efficacy of the proposed framework for real-world applications.
Submitted 2 October, 2022;
originally announced October 2022.
-
Lookback for Learning to Branch
Authors:
Prateek Gupta,
Elias B. Khalil,
Didier Chételat,
Maxime Gasse,
Yoshua Bengio,
Andrea Lodi,
M. Pawan Kumar
Abstract:
The expressive and computationally inexpensive bipartite Graph Neural Networks (GNN) have been shown to be an important component of deep learning based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained, offline and on a collection of MILPs, to imitate a very good but computationally expensive branching heuristic, strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether there are strong dependencies exhibited by the target heuristic among the neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them in our training procedure. Specifically, we find that with the strong branching heuristic, a child node's best choice was often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon in GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) Lookback regularizer term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives such as solving time in the final models. Through extensive experimentation on standard benchmark instances, we show that our proposal results in up to 22% decrease in the size of the B&B tree and up to 15% improvement in the solving times.
Submitted 29 December, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
IBP Regularization for Verified Adversarial Robustness via Branch-and-Bound
Authors:
Alessandro De Palma,
Rudy Bunel,
Krishnamurthy Dvijotham,
M. Pawan Kumar,
Robert Stanforth
Abstract:
Recent works have tried to increase the verifiability of adversarially trained networks by running the attacks over domains larger than the original perturbations and adding various regularization terms to the objective. However, these algorithms either underperform or require complex and expensive stage-wise training procedures, hindering their practical applicability. We present IBP-R, a novel verified training algorithm that is both simple and effective. IBP-R induces network verifiability by coupling adversarial attacks on enlarged domains with a regularization term, based on inexpensive interval bound propagation, that minimizes the gap between the non-convex verification problem and its approximations. By leveraging recent branch-and-bound frameworks, we show that IBP-R obtains state-of-the-art verified robustness-accuracy trade-offs for small perturbations on CIFAR-10 while training significantly faster than relevant previous work. Additionally, we present UPB, a novel branching strategy that, relying on a simple heuristic based on $β$-CROWN, reduces the cost of state-of-the-art branching algorithms while yielding splits of comparable quality.
Submitted 31 May, 2023; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Fragility curves for power transmission towers in Odisha, India, based on observed damage during 2019 Cyclone Fani
Authors:
Surender V Raj,
Manish Kumar,
Udit Bhatia
Abstract:
Lifeline infrastructure systems such as power transmission networks in coastal regions are vulnerable to strong winds generated during tropical cyclones. Understanding the fragility of individual towers is helpful in improving the resilience of such systems. Fragility curves have been developed in the past for some regions, but without considering relevant epistemic uncertainties. Further, risk and resilience studies are best performed using fragility curves specific to a region. Such studies become particularly important if the region is frequently exposed to cyclones. This paper presents the development of fragility curves for high-voltage power transmission towers in the state of Odisha, India, based on macro-level damage data from the 2019 cyclone Fani, which were obtained through the concerned government offices. Two types of damage were identified, namely, collapse and partial damage. Accordingly, fragility curves for the collapse and functionality disruption damage states were developed considering relevant aleatory and epistemic uncertainties. The latter class of uncertainties included that associated with wind speed estimation at a location and the finite sample uncertainty. The most significant contribution to the epistemic uncertainty was due to the wind speed estimation at a location. The median and logarithmic standard deviation for the 50th percentile fragility curve associated with collapse were close to those for the functionality disruption damage state. These curves also compared reasonably well with those reported for similar structures in other parts of the world.
Submitted 26 June, 2021;
originally announced July 2021.
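The abstract above characterizes each fragility curve by a median and a logarithmic standard deviation, which matches the common lognormal parameterization of fragility. The sketch below assumes that form, P(damage | v) = Φ(ln(v/θ)/β), with θ the median wind speed and β the logarithmic standard deviation. The function name `fragility` and the parameter values are hypothetical and are not taken from the paper.

```python
import math


def fragility(wind_speed, median, beta):
    """Probability of reaching a damage state at a given wind speed.

    Assumes the lognormal form Phi(ln(v / median) / beta), where Phi is
    the standard normal CDF, `median` is the wind speed with a 50% damage
    probability, and `beta` is the logarithmic standard deviation.
    """
    if median <= 0 or beta <= 0:
        raise ValueError("median and beta must be positive")
    if wind_speed <= 0:
        return 0.0
    z = math.log(wind_speed / median) / beta
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical parameters: median 50 m/s, beta 0.3.
    for v in (30.0, 50.0, 70.0):
        print(v, round(fragility(v, median=50.0, beta=0.3), 3))
```

By construction the curve passes through 0.5 at the median wind speed, and a larger β flattens the curve, reflecting greater uncertainty.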
-
Cyclone preparedness strategies for regional power transmission systems in data-scarce coastal regions of India
Authors:
Surender V Raj,
Udit Bhatia,
Manish Kumar
Abstract:
As the frequency and intensity of tropical cyclones and the degree of urbanization increase, a systematic strengthening of power transmission networks in coastal regions becomes imperative. One effective strategy is to strengthen selected transmission towers, which requires considering the network at a holistic scale and its orientation relative to the coastline, the fragility of towers, and the properties of cyclones. Since the necessary information is often missing, actionable frameworks for prioritization remain elusive. Based on publicly available data, we assess the efficacy of strategic interventions in a network that serves 40 million people. After evaluating 72 strategies for prioritization, we find that strategies that consider rather simplistic properties of the network and its orientation with respect to the coastline work much better than those based purely on the network's properties, in spite of minor variations in the fragilities of towers. This integrated approach opens avenues for actionable engineering and policy interventions in resource-constrained and data-deprived settings.
Submitted 3 May, 2021;
originally announced May 2021.
-
Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition
Authors:
Alessandro De Palma,
Rudy Bunel,
Alban Desmaison,
Krishnamurthy Dvijotham,
Pushmeet Kohli,
Philip H. S. Torr,
M. Pawan Kumar
Abstract:
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks. First, we propose novel bounding algorithms based on Lagrangian Decomposition. Previous works have used off-the-shelf solvers to solve relaxations at each node of the BaB tree, or constructed weaker relaxations that can be solved efficiently, but lead to unnecessarily weak bounds. Our formulation restricts the optimization to a subspace of the dual domain that is guaranteed to contain the optimum, resulting in accelerated convergence. Furthermore, it allows for a massively parallel implementation, which is amenable to GPU acceleration via modern deep learning frameworks. Second, we present a novel activation-based branching strategy. By coupling an inexpensive heuristic with fast dual bounding, our branching scheme greatly reduces the size of the BaB tree compared to previous heuristic methods. Moreover, it performs competitively with a recent strategy based on learning algorithms, without its large offline training cost. Finally, we design a BaB framework, named Branch and Dual Network Bound (BaDNB), based on our novel bounding and branching algorithms. We show that BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial robustness properties.
Submitted 14 April, 2021;
originally announced April 2021.
-
Information Geometry and Classical Cramér-Rao Type Inequalities
Authors:
Kumar Vijay Mishra,
M. Ashok Kumar
Abstract:
We examine the role of information geometry in the context of classical Cramér-Rao (CR) type inequalities. In particular, we focus on Eguchi's theory of obtaining dualistic geometric structures from a divergence function and then applying Amari-Nagaoka's theory to obtain a CR type inequality. The classical deterministic CR inequality is derived from Kullback-Leibler (KL)-divergence. We show that this framework could be generalized to other CR type inequalities through four examples: $α$-version of CR inequality, generalized CR inequality, Bayesian CR inequality, and Bayesian $α$-CR inequality. These are obtained from, respectively, $I_α$-divergence (or relative $α$-entropy), generalized Csiszár divergence, Bayesian KL divergence, and Bayesian $I_α$-divergence.
Submitted 21 August, 2021; v1 submitted 2 April, 2021;
originally announced April 2021.
-
Chemical speciation and source apportionment of ambient PM2.5 in New Delhi before, during, and after the Diwali fireworks
Authors:
Chirag Manchanda,
Mayank Kumar,
Vikram Singh,
Naba Hazarika,
Mohd Faisal,
Vipul Lalchandani,
Ashutosh Shukla,
Jay Dave,
Neeraj Rastogi,
Sachchida Nand Tripathi
Abstract:
Diwali is among the most important Indian festivals, and elaborate firework displays mark the evening's festivities. This study assesses the impact of Diwali on the concentration, composition, and sources of ambient PM2.5. We observed the total PM2.5 concentrations to rise to 16 times the pre-firework levels, while each of the elemental, organic, and black carbon fractions of ambient PM2.5 increased by a factor of 46.1, 3.7, and 5.6, respectively. The concentration of species like K, Al, Sr, Ba, S, and Bi displayed distinct peaks during the firework event and were identified as tracers. The average concentrations of potential carcinogens, like As, exceeded US EPA screening levels for industrial air by a factor of ~9.6, while peak levels reached up to 16.1 times the screening levels. The source apportionment study, undertaken using positive matrix factorization, revealed the fireworks to account for 95% of the total elemental PM2.5 during Diwali. The resolved primary organic emissions, too, were enhanced by a factor of 8 during Diwali. Delhi has encountered serious haze events following Diwali in recent years; this study highlights that biomass burning emissions rather than the fireworks drive the poor air quality in the days following Diwali.
Submitted 21 April, 2022; v1 submitted 29 November, 2020;
originally announced November 2020.
-
Variation in chemical composition and sources of PM2.5 during the COVID-19 lockdown in Delhi
Authors:
Chirag Manchanda,
Mayank Kumar,
Vikram Singh,
Mohd Faisal,
Naba Hazarika,
Ashutosh Shukla,
Vipul Lalchandani,
Vikas Goel,
Navaneeth Thamban,
Dilip Ganguly,
Sachchida Nand Tripathi
Abstract:
The Government of India (GOI) announced a nationwide lockdown starting 25th March 2020 to contain the spread of COVID-19, leading to an unprecedented decline in anthropogenic activities and in turn improvements in ambient air quality. This is the first study to focus on highly time-resolved chemical speciation and source apportionment of PM$_{2.5}$ to assess the impact of the lockdown and subsequent relaxations on the sources of ambient PM$_{2.5}$ in Delhi, India. The elemental, organic, and black carbon fractions of PM$_{2.5}$ were measured at the IIT Delhi campus from February 2020 to May 2020. We report source apportionment results using positive matrix factorization (PMF) of organic and elemental fractions of PM$_{2.5}$ during the different phases of the lockdown. The resolved sources such as vehicular emissions, domestic coal combustion, and semi-volatile oxygenated organic aerosol (SVOOA) were found to decrease by 96%, 95%, and 86%, respectively, during lockdown phase-1 as compared to pre-lockdown. An unforeseen rise in O$_3$ concentrations with declining NO$_x$ levels was observed, similar to other parts of the globe, leading to the low-volatility oxygenated organic aerosols (LVOOA) increasing to almost double the pre-lockdown concentrations during the last phase of the lockdown. The effect of the lockdown was found to be less pronounced on other resolved sources like secondary chloride, power plants, dust-related, hydrocarbon-like organic aerosols (HOA), and biomass burning related emissions, which were also swayed by the changing meteorological conditions during the four lockdown phases. The results presented in this study provide a basis for future emission control strategies, quantifying the extent to which constraining certain anthropogenic activities can ameliorate the ambient air.
Submitted 11 April, 2021; v1 submitted 16 November, 2020;
originally announced November 2020.
-
Machine Guides, Human Supervises: Interactive Learning with Global Explanations
Authors:
Teodora Popordanoska,
Mohit Kumar,
Stefano Teso
Abstract:
We introduce explanatory guided learning (XGL), a novel interactive learning strategy in which a machine guides a human supervisor toward selecting informative examples for a classifier. The guidance is provided by means of global explanations, which summarize the classifier's behavior on different regions of the instance space and expose its flaws. Compared to other explanatory interactive learning strategies, which are machine-initiated and rely on local explanations, XGL is designed to be robust against cases in which the explanations supplied by the machine oversell the classifier's quality. Moreover, XGL leverages global explanations to open up the black-box of human-initiated interaction, enabling supervisors to select informative examples that challenge the learned model. By drawing a link to interactive machine teaching, we show theoretically that global explanations are a viable approach for guiding supervisors. Our simulations show that explanatory guided learning avoids overselling the model's quality and performs comparably or better than machine- and human-initiated interactive learning strategies in terms of model quality.
Submitted 21 September, 2020;
originally announced September 2020.
-
Hybrid Models for Learning to Branch
Authors:
Prateek Gupta,
Maxime Gasse,
Elias B. Khalil,
M. Pawan Kumar,
Andrea Lodi,
Yoshua Bengio
Abstract:
A recent Graph Neural Network (GNN) approach for learning to branch has been shown to successfully reduce the running time of branch-and-bound algorithms for Mixed Integer Linear Programming (MILP). While the GNN relies on a GPU for inference, MILP solvers are purely CPU-based. This severely limits its application as many practitioners may not have access to high-end GPUs. In this work, we ask two key questions. First, in a more realistic setting where only a CPU is available, is the GNN model still competitive? Second, can we devise an alternate computationally inexpensive model that retains the predictive power of the GNN architecture? We answer the first question in the negative, and address the second question by proposing a new hybrid architecture for efficient branching on CPU machines. The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching. We evaluate our methods on four classes of MILP problems, and show that they lead to up to 26% reduction in solver running time compared to state-of-the-art methods without a GPU, while extrapolating to harder problems than it was trained on. The code for this project is publicly available at https://github.com/pg2455/Hybrid-learn2branch.
Submitted 23 October, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
A Weighted Mutual k-Nearest Neighbour for Classification Mining
Authors:
Joydip Dhar,
Ashaya Shukla,
Mukul Kumar,
Prashant Gupta
Abstract:
kNN is a very effective instance-based learning method, and it is easy to implement. Because data are heterogeneous, noise from many possible sources is widespread, especially in large-scale databases. To eliminate noise and the effect of pseudo neighbours, we propose a new learning algorithm that detects anomalies and removes pseudo neighbours from the dataset so as to provide comparatively better results. The algorithm also tries to minimize the effect of distant neighbours. A certainty measure is also introduced for the experimental results. The advantage of combining mutual neighbours with distance-weighted voting is that the dataset is refined by removing anomalies, while the weighting gives more consideration to closer neighbours. Finally, the performance of the proposed algorithm is evaluated.
Submitted 14 May, 2020;
originally announced May 2020.
-
Lagrangian Decomposition for Neural Network Verification
Authors:
Rudy Bunel,
Alessandro De Palma,
Alban Desmaison,
Krishnamurthy Dvijotham,
Pushmeet Kohli,
Philip H. S. Torr,
M. Pawan Kumar
Abstract:
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. Previous methods have either used off-the-shelf solvers, discarding the problem structure, or relaxed the problem even further, making the bounds unnecessarily loose. We propose a novel approach based on Lagrangian Decomposition. Our formulation admits an efficient supergradient ascent algorithm, as well as an improved proximal algorithm. Both the algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to forward/backward pass of neural networks layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds. Empirically, we show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time, and obtain tighter bounds in the same time as previous dual algorithms. This results in an overall speed-up when employing the bounds for formal verification. Code for our algorithms is available at https://github.com/oval-group/decomposition-plnn-bounds.
Submitted 17 June, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Generalized Bayesian Cramér-Rao Inequality via Information Geometry of Relative $α$-Entropy
Authors:
Kumar Vijay Mishra,
M. Ashok Kumar
Abstract:
The relative $α$-entropy is the Rényi analog of relative entropy and arises prominently in information-theoretic problems. Recent information geometric investigations on this quantity have enabled the generalization of the Cramér-Rao inequality, which provides a lower bound for the variance of an estimator of an escort of the underlying parametric probability distribution. However, this framework remains unexamined in the Bayesian framework. In this paper, we propose a general Riemannian metric based on relative $α$-entropy to obtain a generalized Bayesian Cramér-Rao inequality. This establishes a lower bound for the variance of an unbiased estimator for the $α$-escort distribution starting from an unbiased estimator for the underlying distribution. We show that in the limiting case when the entropy order approaches unity, this framework reduces to the conventional Bayesian Cramér-Rao inequality. Further, in the absence of priors, the same framework yields the deterministic Cramér-Rao inequality.
Submitted 11 February, 2020;
originally announced February 2020.
-
Scalable bundling via dense product embeddings
Authors:
Madhav Kumar,
Dean Eckles,
Sinan Aral
Abstract:
Bundling, the practice of jointly selling two or more products at a discount, is a widely used strategy in industry and a well examined concept in academia. Historically, the focus has been on theoretical studies in the context of monopolistic firms and assumed product relationships, e.g., complementarity in usage. We develop a new machine-learning-driven methodology for designing bundles in a large-scale, cross-category retail setting. We leverage historical purchases and consideration sets created from clickstream data to generate dense continuous representations of products called embeddings. We then put minimal structure on these embeddings and develop heuristics for complementarity and substitutability among products. Subsequently, we use the heuristics to create multiple bundles for each product and test their performance using a field experiment with a large retailer. We combine the results from the experiment with product embeddings using a hierarchical model that maps bundle features to their purchase likelihood, as measured by the add-to-cart rate. We find that our embeddings-based heuristics are strong predictors of bundle success, robust across product categories, and generalize well to the retailer's entire assortment.
Submitted 31 January, 2020;
originally announced February 2020.
-
Cramér-Rao Lower Bounds Arising from Generalized Csiszár Divergences
Authors:
M. Ashok Kumar,
Kumar Vijay Mishra
Abstract:
We study the geometry of probability distributions with respect to a generalized family of Csiszár $f$-divergences. A member of this family is the relative $α$-entropy which is also a Rényi analog of relative entropy in information theory and known as logarithmic or projective power divergence in statistics. We apply Eguchi's theory to derive the Fisher information metric and the dual affine connections arising from these generalized divergence functions. This enables us to arrive at a more widely applicable version of the Cramér-Rao inequality, which provides a lower bound for the variance of an estimator for an escort of the underlying parametric probability distribution. We then extend the Amari-Nagaoka's dually flat structure of the exponential and mixer models to other distributions with respect to the aforementioned generalized metric. We show that these formulations lead us to find unbiased and efficient estimators for the escort model. Finally, we compare our work with prior results on generalized Cramér-Rao inequalities that were derived from non-information-geometric frameworks.
Submitted 24 May, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Shareable Representations for Search Query Understanding
Authors:
Mukul Kumar,
Youna Hu,
Will Headden,
Rahul Goutam,
Heran Lin,
Bing Yin
Abstract:
Understanding search queries is critical for shopping search engines to deliver a satisfying customer experience. Popular shopping search engines receive billions of unique queries yearly, each of which can depict any of hundreds of user preferences or intents. To get the right results to customers, it must be known that queries like "inexpensive prom dresses" are intended not only to surface results of a certain product type but also products with a low price. Such preferences are referred to as query intents; other examples include preferences for author, brand, age group, or simply a need for customer service. Recent works such as BERT have demonstrated the success of a large transformer encoder architecture with language model pre-training on a variety of NLP tasks. We adapt such an architecture to learn intents for search queries and describe methods to account for the noisiness and sparseness of search query data. We also describe cost-effective ways of hosting transformer encoder models in contexts with low-latency requirements. With the right domain-specific training we can build a shareable deep learning model whose internal representation can be reused for a variety of query understanding tasks including query intent identification. Model sharing means fewer large models need to be served at inference time and provides a platform to quickly build and roll out new search query classifiers.
Submitted 20 December, 2019;
originally announced January 2020.
-
Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
Authors:
Abhinav Garg,
Dhananjaya Gowda,
Ankur Kumar,
Kwangyoun Kim,
Mehul Kumar,
Chanwoo Kim
Abstract:
In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity namely, character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two-levels of linguistic granularity namely, character and BPE, is used. We explore different pre-training strategies for the encoders including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show 35% and 10% relative improvement over their baselines for smaller and bigger models, respectively. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models respectively after fusion with long short-term memory (LSTM) based external language model (LM).
Submitted 27 December, 2019;
originally announced December 2019.
-
Power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
Authors:
Chanwoo Kim,
Mehul Kumar,
Kwangyoun Kim,
Dhananjaya Gowda
Abstract:
In this paper, we describe the Maximum Uniformity of Distribution (MUD) algorithm with the power-law nonlinearity. In this approach, we hypothesize that neural network training will become more stable if the feature distribution is not too skewed. We propose two different types of MUD approaches: power function-based MUD and histogram-based MUD. In these approaches, we first obtain the mel filterbank coefficients and apply nonlinearity functions for each filterbank channel. With the power function-based MUD, we apply a power-function based nonlinearity where power function coefficients are chosen to maximize the likelihood assuming that nonlinearity outputs follow the uniform distribution. With the histogram-based MUD, the empirical Cumulative Distribution Function (CDF) from the training database is employed to transform the original distribution into a uniform distribution. In MUD processing, we do not use any prior knowledge (e.g. logarithmic relation) about the energy of the incoming signal and the perceived intensity by a human. Experimental results using an end-to-end speech recognition system demonstrate that power-function based MUD yields better results than the conventional Mel Filterbank Cepstral Coefficients (MFCCs). On the LibriSpeech database, we could achieve 4.02% WER on test-clean and 13.34% WER on test-other without using any Language Models (LMs). The major contribution of this work is that we developed a new algorithm for designing the compressive nonlinearity in a data-driven way, which is much more flexible than the previous approaches and may be extended to other domains as well.
Submitted 21 December, 2019;
originally announced December 2019.
-
End-to-end training of a large vocabulary end-to-end speech recognition system
Authors:
Chanwoo Kim,
Sungsoo Kim,
Kwangyoun Kim,
Mehul Kumar,
Jiyeon Kim,
Kyungmin Lee,
Changwoo Han,
Abhinav Garg,
Eunhyang Kim,
Minkyoo Shin,
Shatrughan Singh,
Larry Heck,
Dhananjaya Gowda
Abstract:
In this paper, we present an end-to-end training framework for building state-of-the-art end-to-end speech recognition systems. Our training system utilizes a cluster of Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Data reading, large-scale data augmentation, and neural network parameter updates are all performed "on-the-fly". We use vocal tract length perturbation [1] and an acoustic simulator [2] for data augmentation. The processed features and labels are sent to the GPU cluster. The Horovod allreduce approach is employed to train neural network parameters. We evaluated the effectiveness of our system on the standard LibriSpeech corpus [3] and the 10,000-hr anonymized Bixby English dataset. Our end-to-end speech recognition system built using this training infrastructure showed a 2.44% WER on test-clean of the LibriSpeech test set after applying shallow fusion with a Transformer language model (LM). For the proprietary English Bixby open domain test set, we obtained a WER of 7.92% using a Bidirectional Full Attention (BFA) end-to-end model after applying shallow fusion with an RNN-LM. When the monotonic chunkwise attention (MoChA) based approach is employed for streaming speech recognition, we obtained a WER of 9.95% on the same Bixby open domain test set.
Submitted 21 December, 2019;
originally announced December 2019.
-
Neural Network Branching for Neural Network Verification
Authors:
Jingyue Lu,
M. Pawan Kumar
Abstract:
Formal verification of neural networks is essential for their deployment in safety-critical areas. Many available formal verification methods have been shown to be instances of a unified Branch and Bound (BaB) formulation. We propose a novel framework for designing an effective branching strategy for BaB. Specifically, we learn a graph neural network (GNN) to imitate the strong branching heuristic behaviour. Our framework differs from previous methods for learning to branch in two main aspects. Firstly, our framework directly treats the neural network we want to verify as a graph input for the GNN. Secondly, we develop an intuitive forward and backward embedding update schedule. Empirically, our framework achieves roughly $50\%$ reduction in both the number of branches and the time required for verification on various convolutional networks when compared to the best available hand-designed branching strategy. In addition, we show that our GNN model enjoys both horizontal and vertical transferability. Horizontally, the model trained on easy properties performs well on properties of increased difficulty levels. Vertically, the model trained on small neural networks achieves similar performance on large neural networks.
Submitted 3 December, 2019;
originally announced December 2019.
-
Branch and Bound for Piecewise Linear Neural Network Verification
Authors:
Rudy Bunel,
Jingyue Lu,
Ilker Turkaslan,
Philip H. S. Torr,
Pushmeet Kohli,
M. Pawan Kumar
Abstract:
The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. In this context, verification involves proving or disproving that an NN model satisfies certain input-output properties. Despite the reputation of learned NN models as black boxes, and the theoretical hardness of proving useful properties about them, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure and taking insights from formal methods such as Satisfiability Modulo Theory. However, these methods are still far from scaling to realistic neural networks. To facilitate progress on this crucial area, we exploit the Mixed Integer Linear Programming (MIP) formulation of verification to propose a family of algorithms based on Branch-and-Bound (BaB). We show that our family contains previous verification methods as special cases. With the help of the BaB framework, we make three key contributions. Firstly, we identify new methods that combine the strengths of multiple existing approaches, accomplishing significant performance improvements over previous state of the art. Secondly, we introduce an effective branching strategy on ReLU non-linearities. This branching strategy allows us to efficiently and successfully deal with high input dimensional problems with convolutional network architecture, on which previous methods fail frequently. Finally, we propose comprehensive test data sets and benchmarks, which include a collection of previously released test cases. We use the data sets to conduct a thorough experimental comparison of existing and new algorithms and to provide an inclusive analysis of the factors impacting the hardness of verification problems.
Submitted 26 October, 2020; v1 submitted 14 September, 2019;
originally announced September 2019.
-
Training Neural Networks for and by Interpolation
Authors:
Leonard Berrada,
Andrew Zisserman,
M. Pawan Kumar
Abstract:
In modern supervised learning, many deep neural networks are able to interpolate the data: the empirical loss can be driven to near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning, which we term Adaptive Learning-rates for Interpolation with Gradients (ALI-G). ALI-G retains the two main advantages of Stochastic Gradient Descent (SGD), which are (i) a low computational cost per iteration and (ii) good generalization performance in practice. At each iteration, ALI-G exploits the interpolation property to compute an adaptive learning-rate in closed form. In addition, ALI-G clips the learning-rate to a maximal value, which we prove to be helpful for non-convex problems. Crucially, in contrast to the learning-rate of SGD, the maximal learning-rate of ALI-G does not require a decay schedule, which makes it considerably easier to tune. We provide convergence guarantees of ALI-G in various stochastic settings. Notably, we tackle the realistic case where the interpolation property is satisfied up to some tolerance. We provide experiments on a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. ALI-G produces state-of-the-art results among adaptive methods, and even yields comparable performance with SGD, which requires manually tuned learning-rate schedules. Furthermore, ALI-G is simple to implement in any standard deep learning framework and can be used as a drop-in replacement in existing code.
Submitted 1 August, 2020; v1 submitted 13 June, 2019;
originally announced June 2019.
-
Fast Online "Next Best Offers" using Deep Learning
Authors:
Rekha Singhal,
Gautam Shroff,
Mukund Kumar,
Sharod Roy,
Sanket Kadarkar,
Rupinder Virk,
Siddharth Verma,
Vartika Tiwari
Abstract:
In this paper, we present iPrescribe, a scalable low-latency architecture for recommending 'next-best-offers' in an online setting. The paper presents the design of iPrescribe and compares its performance for implementations using different real-time streaming technology stacks. iPrescribe uses an ensemble of deep learning and machine learning algorithms for prediction. We describe the scalable real-time streaming technology stack and optimized machine-learning implementations to achieve a 90th percentile recommendation latency of 38 milliseconds. Optimizations include a novel mechanism to deploy recurrent Long Short Term Memory (LSTM) deep learning networks efficiently.
Submitted 30 May, 2019;
originally announced May 2019.
-
Deep Frank-Wolfe For Neural Network Optimization
Authors:
Leonard Berrada,
Andrew Zisserman,
M. Pawan Kumar
Abstract:
Learning a deep neural network requires solving a challenging optimization problem: it is a high-dimensional, non-convex and non-smooth minimization problem with a large number of terms. The current practice in neural network optimization is to rely on the stochastic gradient descent (SGD) algorithm or its adaptive variants. However, SGD requires a hand-designed schedule for the learning rate. In addition, its adaptive variants tend to produce solutions that generalize less well on unseen data than SGD with a hand-designed schedule. We present an optimization method that offers empirically the best of both worlds: our algorithm yields good generalization performance while requiring only one hyper-parameter. Our approach is based on a composite proximal framework, which exploits the compositional nature of deep neural networks and can leverage powerful convex optimization algorithms by design. Specifically, we employ the Frank-Wolfe (FW) algorithm for SVM, which computes an optimal step-size in closed-form at each time-step. We further show that the descent direction is given by a simple backward pass in the network, yielding the same computational cost per iteration as SGD. We present experiments on the CIFAR and SNLI data sets, where we demonstrate the significant superiority of our method over Adam, Adagrad, as well as the recently proposed BPGrad and AMSGrad. Furthermore, we compare our algorithm to SGD with a hand-designed learning rate schedule, and show that it provides similar generalization while converging faster. The code is publicly available at https://github.com/oval-group/dfw.
Submitted 21 February, 2021; v1 submitted 19 November, 2018;
originally announced November 2018.
-
A Statistical Approach to Assessing Neural Network Robustness
Authors:
Stefan Webb,
Tom Rainforth,
Yee Whye Teh,
M. Pawan Kumar
Abstract:
We present a new approach to assessing the robustness of neural networks based on estimating the proportion of inputs for which a property is violated. Specifically, we estimate the probability of the event that the property is violated under an input model. Our approach critically varies from the formal verification framework in that when the property can be violated, it provides an informative notion of how robust the network is, rather than just the conventional assertion that the network is not verifiable. Furthermore, it provides an ability to scale to larger networks than formal verification approaches. Though the framework still provides a formal guarantee of satisfiability whenever it successfully finds one or more violations, these advantages do come at the cost of only providing a statistical estimate of unsatisfiability whenever no violation is found. Key to the practical success of our approach is an adaptation of multi-level splitting, a Monte Carlo approach for estimating the probability of rare events, to our statistical robustness framework. We demonstrate that our approach is able to emulate formal verification procedures on benchmark problems, while scaling to larger networks and providing reliable additional information in the form of accurate estimates of the violation probability.
Submitted 21 February, 2019; v1 submitted 17 November, 2018;
originally announced November 2018.
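The quantity estimated in the abstract above, the probability that a property is violated under an input model, can be illustrated with a minimal sketch. Note that this is plain Monte Carlo sampling, not the multi-level splitting adaptation the abstract describes, so it is only workable when violations are not rare. The toy network, the property, the uniform input model, and all function names are hypothetical and chosen purely for illustration.

```python
import random


def estimate_violation_probability(violates, sample_input, n_samples=10_000, seed=0):
    """Naive Monte Carlo estimate of P(property violated) under an input model.

    violates:     callable mapping an input to True when the property is violated
    sample_input: callable drawing one input from the input model, given an RNG
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if violates(sample_input(rng)))
    return hits / n_samples


# Hypothetical toy setup (assumption, not from the paper):
# a one-dimensional "network" and a simple threshold property.
def toy_network(x):
    return 2.0 * x - 1.0


def toy_violates(x):
    # The property "output stays below 0.8" is violated when the output reaches 0.8.
    return toy_network(x) >= 0.8


def toy_input_model(rng):
    # Input model: uniform on [0, 1).
    return rng.random()


estimate = estimate_violation_probability(toy_violates, toy_input_model)
print(f"estimated violation probability: {estimate:.3f}")
```

In this toy setup the property is violated exactly when the input is at least 0.9, so the estimate should land near 0.1. For genuinely rare violations the number of samples needed by this naive estimator grows quickly, which is the situation rare-event methods such as multi-level splitting are designed for.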
-
Empowerment-driven Exploration using Mutual Information Estimation
Authors:
Navneet Madhu Kumar
Abstract:
Exploration is a difficult challenge in reinforcement learning and is of prime importance in sparse reward environments. However, many state-of-the-art deep reinforcement learning algorithms that rely on epsilon-greedy exploration fail in these environments. In such cases, empowerment can serve as an intrinsic reward signal that enables the agent to maximize the influence it has over the near future. We formulate empowerment as the channel capacity between states and actions, calculated by estimating the mutual information between the actions and the following states. The mutual information is estimated using the Mutual Information Neural Estimator and a forward dynamics model. We demonstrate that an empowerment-driven agent is able to significantly improve the score of a baseline DQN agent on the game of Montezuma's Revenge.
Submitted 11 October, 2018;
originally announced October 2018.
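The mutual information between actions and following states, mentioned in the abstract above, can be made concrete with a small sketch for the discrete case. This computes the exact mutual information of a known joint probability table. It is not the neural estimator used in the paper, and it does not perform the maximization over action distributions that channel capacity requires. The function name and the two example tables are assumptions for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    joint[a][s] is the probability of taking action a and then reaching state s.
    """
    p_action = [sum(row) for row in joint]        # marginal over actions
    p_state = [sum(col) for col in zip(*joint)]   # marginal over next states
    mi = 0.0
    for a, row in enumerate(joint):
        for s, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p_action[a] * p_state[s]))
    return mi


# Hypothetical tables: each action fully determines the next state ...
deterministic = [[0.5, 0.0],
                 [0.0, 0.5]]
# ... versus the next state being independent of the action.
uninformative = [[0.25, 0.25],
                 [0.25, 0.25]]

print(mutual_information(deterministic))
print(mutual_information(uninformative))
```

When the action fully determines the next state, the mutual information equals the entropy of the action distribution (log 2 nats here); when the next state ignores the action, it is zero, meaning the agent has no influence to exploit.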
-
Worst-case Optimal Submodular Extensions for Marginal Estimation
Authors:
Pankaj Pansari,
Chris Russell,
M. Pawan Kumar
Abstract:
Submodular extensions of an energy function can be used to efficiently compute approximate marginals via variational inference. The accuracy of the marginals depends crucially on the quality of the submodular extension. To identify the best possible extension, we show an equivalence between the submodular extensions of the energy and the objective functions of linear programming (LP) relaxations for the corresponding MAP estimation problem. This allows us to (i) establish the worst-case optimality of the submodular extension for the Potts model used in the literature; (ii) identify the worst-case optimal submodular extension for the more general class of metric labeling; and (iii) efficiently compute the marginals for the widely used dense CRF model with the help of a recently proposed Gaussian filtering method. Using synthetic and real data, we show that our approach provides comparable upper bounds on the log-partition function to those obtained using tree-reweighted message passing (TRW) in cases where the latter is computationally feasible. Importantly, unlike TRW, our approach provides the first practical algorithm to compute an upper bound on the dense CRF model.
Submitted 10 January, 2018;
originally announced January 2018.
-
Decomposition Strategies for Constructive Preference Elicitation
Authors:
Paolo Dragone,
Stefano Teso,
Mohit Kumar,
Andrea Passerini
Abstract:
We tackle the problem of constructive preference elicitation, that is, the problem of learning user preferences over very large decision problems involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning user preferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user's cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.
Submitted 6 May, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
Universal Adversarial Perturbations Against Semantic Image Segmentation
Authors:
Jan Hendrik Metzen,
Mummadi Chaithanya Kumar,
Thomas Brox,
Volker Fischer
Abstract:
While deep learning is remarkably successful on perceptual tasks, it was also shown to be vulnerable to adversarial perturbations of the input. These perturbations denote noise added to the input that was generated specifically to fool the system while being quasi-imperceptible for humans. More severely, there even exist universal perturbations that are input-agnostic but fool the network on the majority of inputs. While recent work has focused on image classification, this work proposes attacks against semantic image segmentation: we present an approach for generating (universal) adversarial perturbations that make the network yield a desired target segmentation as output. We show empirically that there exist barely perceptible universal noise patterns which result in nearly the same predicted segmentation for arbitrary inputs. Furthermore, we also show the existence of universal noise which removes a target class (e.g., all pedestrians) from the segmentation while leaving the segmentation mostly unchanged otherwise.
Submitted 31 July, 2017; v1 submitted 19 April, 2017;
originally announced April 2017.
-
FairJudge: Trustworthy User Prediction in Rating Platforms
Authors:
Srijan Kumar,
Bryan Hooi,
Disha Makhija,
Mohit Kumar,
Christos Faloutsos,
V. S. Subrahamanian
Abstract:
Rating platforms enable large-scale collection of user opinion about items (products, other users, etc.). However, many untrustworthy users give fraudulent ratings for excessive monetary gains. In this paper, we present FairJudge, a system to identify such fraudulent users. We propose three metrics: (i) the fairness of a user, which quantifies how trustworthy the user is in rating the products; (ii) the reliability of a rating, which measures how reliable the rating is; and (iii) the goodness of a product, which measures the quality of the product. Intuitively, a user is fair if they provide reliable ratings that are close to the goodness of the product. We formulate a mutually recursive definition of these metrics, and further address cold start problems and incorporate behavioral properties of users and products in the formulation. We propose an iterative algorithm, FairJudge, to predict the values of the three metrics. We prove that FairJudge is guaranteed to converge in a bounded number of iterations, with linear time complexity. By conducting five different experiments on five rating platforms, we show that FairJudge significantly outperforms nine existing algorithms in predicting fair and unfair users. We reported the 100 most unfair users in the Flipkart network to their review fraud investigators, and 80 users were correctly identified (80% accuracy). The FairJudge algorithm is already being deployed at Flipkart.
Submitted 30 March, 2017;
originally announced March 2017.
-
Adversarial Examples for Semantic Image Segmentation
Authors:
Volker Fischer,
Mummadi Chaithanya Kumar,
Jan Hendrik Metzen,
Thomas Brox
Abstract:
Machine learning methods in general and Deep Neural Networks in particular have been shown to be vulnerable to adversarial perturbations. So far this phenomenon has mainly been studied in the context of whole-image classification. In this contribution, we analyse how adversarial perturbations can affect the task of semantic segmentation. We show how existing adversarial attackers can be transferred to this task and that it is possible to create imperceptible adversarial perturbations that lead a deep network to misclassify almost all pixels of a chosen class while leaving network prediction nearly unchanged outside this class.
Submitted 3 March, 2017;
originally announced March 2017.
-
A New Intelligence Based Approach for Computer-Aided Diagnosis of Dengue Fever
Authors:
Vadrevu Sree Hari Rao,
Mallenahalli Naresh Kumar
Abstract:
Identification of the influential clinical symptoms and laboratory features that help in the diagnosis of dengue fever in the early phase of the illness would aid in designing effective public health management and virological surveillance strategies. With this as our main objective, we develop in this paper a new computational-intelligence-based methodology that predicts the diagnosis in real time, minimizing the number of false positives and false negatives. Our methodology consists of three major components: (i) a novel missing value imputation procedure that can be applied to any data set consisting of categorical (nominal) and/or numeric (real or integer) attributes; (ii) a wrapper-based feature selection method with genetic search for extracting a subset of the most influential symptoms that can diagnose the illness; and (iii) an alternating decision tree method that employs boosting for generating highly accurate decision rules. The predictive models developed using our methodology are found to be more accurate than the state-of-the-art methodologies used in the diagnosis of dengue fever.
Submitted 30 January, 2015;
originally announced February 2015.
-
Performance comparison of State-of-the-art Missing Value Imputation Algorithms on Some Bench mark Datasets
Authors:
M. Naresh Kumar
Abstract:
Decision making from data involves identifying a set of attributes that contribute to effective decision making through computational intelligence. The presence of missing values greatly influences the selection of the right set of attributes, which degrades the classification accuracy of the classifiers. Because missing values are quite common in the data collection phase of field experiments or clinical trials, handling them appropriately would improve classifier performance. In this paper we present a review of recently developed missing value imputation algorithms and compare their performance on some benchmark datasets.
Submitted 22 July, 2013;
originally announced July 2013.
-
Alternating Decision trees for early diagnosis of dengue fever
Authors:
M. Naresh Kumar
Abstract:
Dengue fever is a flu-like illness spread by the bite of an infected mosquito, and it is fast emerging as a major health problem. Timely and cost-effective diagnosis using clinical and laboratory features would reduce mortality rates and provide better grounds for clinical management and disease surveillance. We wish to develop a robust and effective decision-tree-based approach for predicting dengue disease. Our analysis is based on the clinical characteristics and laboratory measurements of the diseased individuals. We have developed and trained an alternating decision tree with boosting and compared its performance with the C4.5 algorithm for dengue disease diagnosis. Of the 65 patient records, 53 individuals were confirmed by diagnosis to have dengue fever. The alternating decision tree algorithm differentiated dengue fever using the clinical and laboratory data with 89% correctly classified instances, an F-measure of 0.86 and a receiver operating characteristic (ROC) value of 0.826, compared with 78% correctly classified instances, an F-measure of 0.738 and an ROC value of 0.617 for C4.5. The alternating decision tree approach with boosting was thus able to predict dengue fever with a higher degree of accuracy than the C4.5-based decision tree using simple clinical and laboratory features. Further analysis on larger data sets is required to improve the sensitivity and specificity of the alternating decision trees.
Submitted 5 June, 2013; v1 submitted 31 May, 2013;
originally announced May 2013.
-
Dual To Ratio Cum Product Estimator In Stratified Random Sampling
Authors:
Rajesh Singh,
Mukesh Kumar,
Manoj K. Chaudhary,
Cem Kadilar
Abstract:
Tracy et al.[8] have introduced a family of estimators using Srivenkataramana and Tracy ([6],[7]) transformation in simple random sampling. In this article, we have proposed a dual to ratio-cum-product estimator in stratified random sampling. The expressions of the mean square error of the proposed estimators are derived. Also, the theoretical findings are supported by a numerical example.
Submitted 2 April, 2013;
originally announced April 2013.
-
A note on transformations on auxiliary variable in survey sampling
Authors:
Rajesh Singh,
Mukesh Kumar
Abstract:
In this note, we address the doubts of Singh (2001) and Gupta and Shabbir (2008) on the transformations of auxiliary variables by adding unit free constants. The original contribution by Sisodia and Dwivedi (1981) is correct.
Submitted 15 October, 2012;
originally announced October 2012.
-
A general family of dual to ratio-cum-product estimator in sample surveys
Authors:
Rajesh Singh,
Mukesh Kumar,
P. Chauhan,
N. Sawan,
S. Florentin
Abstract:
This paper presents a family of dual to ratio-cum-product estimators for the finite population mean. Under the simple random sampling without replacement (SRSWOR) scheme, expressions for the bias and mean-squared error (MSE) up to the first order of approximation are derived. We show that the proposed family is more efficient than the usual unbiased estimator, the ratio estimator, the product estimator, the Singh (1967) estimator, the Srivenkataramana (1980) and Bandyopadhyaya (1980) estimators, and the Singh et al. (2005) estimator. An empirical study is carried out to illustrate the performance of the constructed estimator over the others.
Submitted 31 May, 2013; v1 submitted 10 October, 2012;
originally announced October 2012.
-
On the proficient use of GEV distribution: a case study of subtropical monsoon region in India
Authors:
Ripunjai K. Shukla,
M. Trivedi,
Manoj Kumar
Abstract:
The paper deals with probabilistic estimates of extreme maximum rainfall (on an annual basis) in Ranchi, Jharkhand (India). Models from the extreme value distribution family are fitted to capture the uncertainty in the data, and the Generalized Extreme Value (GEV) distribution is found to be the best-fitting model. The GEV model satisfied the selection criteria adopted in the present study [the Anderson-Darling test (A-D test or goodness-of-fit test) and the normality test (Q-Q plot)]. Return levels are estimated for return periods of 5, 10, 50, 100 and 200 years, and they increase consistently as the return period lengthens.
Submitted 3 March, 2012;
originally announced March 2012.
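The return levels mentioned in the abstract above follow from the standard GEV quantile formula: for location μ, scale σ and shape ξ, the level exceeded on average once every T years is μ − (σ/ξ)[1 − y^(−ξ)] with y = −log(1 − 1/T), and μ − σ log(y) in the Gumbel limit ξ = 0. A minimal sketch is given here. The parameter values are hypothetical placeholders, not the values fitted to the Ranchi rainfall data, and the function name is an assumption.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Return level of a GEV(mu, sigma, xi) distribution for a return period in years."""
    if sigma <= 0:
        raise ValueError("scale sigma must be positive")
    if return_period <= 1:
        raise ValueError("return period must exceed 1 year")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        # Gumbel limit (shape parameter equal to zero)
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameters (assumption, for illustration only), in mm of rainfall.
mu, sigma, xi = 100.0, 30.0, 0.1
periods = [5, 10, 50, 100, 200]
levels = [gev_return_level(mu, sigma, xi, t) for t in periods]
for t, z in zip(periods, levels):
    print(f"{t:>3}-year return level: {z:.1f} mm")
```

Because y shrinks as the return period grows, the computed levels rise monotonically with T, which matches the consistently increasing return levels the abstract reports.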