Search | arXiv e-print repository

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

Abstract: Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.

Submitted 12 May, 2021; originally announced May 2021.

Comments: To appear in ICML 2021

arXiv:2104.08652 [pdf, ps, other]

doi 10.1017/jfm.2021.359

Significance of the strain-dominated region around a vortex on induced aerodynamic loads

Authors: Karthik Menon, Rajat Mittal

Abstract: The ability of vortices to induce aerodynamic loads on proximal surfaces plays a significant role in a wide variety of flows. However, most studies of vortex-induced effects primarily focus on analyzing the influence of the rotation-dominated cores of vortices. In this work, we show that not only are vortices in viscous flows surrounded by strain-dominated regions, but that these regions are dynamically important and can sometimes even dictate the induced aerodynamic loads. We demonstrate this for a pitching airfoil, which exhibits dynamic stall and generates several force-inducing vortices. Using a data-driven force partitioning method, we quantify the influence of vortices as well as vortex-associated strain to show that our current understanding of vortex-dominated phenomena, such as dynamic stall, is incomplete without considering the substantial effect of strain-dominated regions that are associated with vortices.

Submitted 17 April, 2021; originally announced April 2021.

arXiv:2104.07932 [pdf, other]

Interval-censored Hawkes processes

Authors: Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie

Abstract: Interval-censored data solely records the aggregated counts of events during specific time intervals - such as the number of patients admitted to the hospital or the volume of vehicles passing traffic loop detectors - and not the exact occurrence time of the events. It is currently not understood how to fit the Hawkes point processes to this kind of data. Its typical loss function (the point process log-likelihood) cannot be computed without exact event times. Furthermore, it does not have the independent increments property to use the Poisson likelihood. This work builds a novel point process, a set of tools, and approximations for fitting Hawkes processes within interval-censored data scenarios. First, we define the Mean Behavior Poisson process (MBPP), a novel Poisson process with a direct parameter correspondence to the popular self-exciting Hawkes process. We fit MBPP in the interval-censored setting using an interval-censored Poisson log-likelihood (IC-LL). We use the parameter equivalence to uncover the parameters of the associated Hawkes process. Second, we introduce two novel exogenous functions to distinguish the exogenous from the endogenous events. We propose the multi-impulse exogenous function - for when the exogenous events are observed as event time - and the latent homogeneous Poisson process exogenous function - for when the exogenous events are presented as interval-censored volumes. Third, we provide several approximation methods to estimate the intensity and compensator function of MBPP when no analytical solution exists. Fourth and finally, we connect the interval-censored loss of MBPP to a broader class of Bregman divergence-based functions. Using the connection, we show that the popularity estimation algorithm Hawkes Intensity Process (HIP) is a particular case of the MBPP. We verify our models through empirical testing on synthetic data and real-world data.

Submitted 25 November, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

Journal ref: Journal of Machine Learning Research, 23(338):1-84, 2022. https://jmlr.org/papers/v23/21-0917.html

arXiv:2104.07274 [pdf, ps, other]

Pattern avoidance and dominating compositions

Authors: Krishna Menon, Anurag Singh

Abstract: Jelínek, Mansour, and Shattuck studied Wilf-equivalence among pairs of patterns of the form $\{σ,τ\}$ where $σ$ is a set partition of size $3$ with at least two blocks. They obtained an upper bound for the number of Wilf-equivalence classes for such pairs. We show that their upper bound is the exact number of equivalence classes, thus solving a problem posed by them.

Submitted 15 April, 2021; originally announced April 2021.

Comments: Feedback welcome

MSC Class: 05A15; 05A18; 05A19

Journal ref: ECA 2:1 (2022) Article S2R4, http://ecajournal.haifa.ac.il/Volume2022/ECA2022_S2A4.pdf

arXiv:2103.03865 [pdf, ps, other]

A combinatorial statistic for labeled threshold graphs

Authors: Priyavrat Deshpande, Krishna Menon, Anurag Singh

Abstract: Consider the collection of hyperplanes in $\mathbb{R}^n$ whose defining equations are given by $\{x_i + x_j = 0\mid 1\leq i<j\leq n\}$. This arrangement is called the threshold arrangement since its regions are in bijection with labeled threshold graphs on $n$ vertices. Zaslavsky's theorem implies that the number of regions of this arrangement is the sum of coefficients of the characteristic polynomial of the arrangement. In the present article we give a combinatorial meaning to these coefficients as the number of labeled threshold graphs with a certain property, thus answering a question posed by Stanley.

Submitted 7 August, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: Minor changes, final version

MSC Class: 52C35; 05C30; 11B73

Journal ref: ECA 1:3 (2021) Article S2R22, http://ecajournal.haifa.ac.il/Volume2021/ECA2021_S2A22.pdf

arXiv:2102.06849 [pdf, other]

Distilling Double Descent

Authors: Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

Abstract: Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset. The most common explanations for why distillation "works" are predicated on the assumption that the student is provided with soft labels, e.g., probabilities or confidences, from the teacher model. In this work, we show that, even when the teacher model is highly overparameterized and provides hard labels, using a very large held-out unlabeled dataset to train the student model can result in a model that outperforms more "traditional" approaches. Our explanation for this phenomenon is based on recent work on "double descent". It has been observed that, once a model's complexity roughly exceeds the amount required to memorize the training data, increasing the complexity further can, counterintuitively, result in better generalization. Researchers have identified several settings in which it takes place, while others have made various attempts to explain it (thus far, with only partial success). In contrast, we avoid these questions, and instead seek to exploit this phenomenon by demonstrating that a highly-overparameterized teacher can avoid overfitting via double descent, while a student trained on a larger independent dataset labeled by this teacher will avoid overfitting due to the size of its training set.

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2101.12060 [pdf, ps, other]

Counting regions of the boxed threshold arrangement

Authors: Priyavrat Deshpande, Krishna Menon, Anurag Singh

Abstract: In this paper we consider the hyperplane arrangement in $\mathbb{R}^n$ whose hyperplanes are $\{x_i + x_j = 1\mid 1\leq i < j\leq n\}\cup \{x_i=0,1\mid 1\leq i\leq n\}$. We call it the boxed threshold arrangement since we show that the bounded regions of this arrangement are contained in an $n$-cube and are in one-to-one correspondence with the labeled threshold graphs on $n$ vertices. The problem of counting regions of this arrangement was studied earlier by Joungmin Song. He determined the characteristic polynomial of this arrangement by relating its coefficients to the count of certain graphs. Here, we provide bijective arguments to determine the number of regions. In particular, we construct certain signed partitions of the set $\{-n,\dots, n\}\setminus\{0\}$ and also construct colored threshold graphs on $n$ vertices and show that both these objects are in bijection with the regions of the boxed threshold arrangement. We independently count these objects and provide a closed-form formula for the number of regions.

Submitted 24 February, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: Typos and grammatical errors fixed, OEIS references added, final version. Will appear in Journal of Integer Sequences

MSC Class: 52C35; 32S22; 05C30; 11B68

arXiv:2011.04632 [pdf, other]

doi 10.1016/j.jcp.2021.110515

Quantitative analysis of the kinematics and induced aerodynamic loading of individual vortices in vortex-dominated flows: a computation and data-driven approach

Authors: Karthik Menon, Rajat Mittal

Abstract: A physics-based data-driven computational framework for the quantitative analysis of vortex kinematics and vortex-induced loads in vortex-dominated problems is presented. Such flows are characterized by the dominant influence of a small number of vortex structures, but the complexity of these flows makes it difficult to conduct a quantitative analysis of this influence at the level of individual vortices. The method presented here combines machine learning-inspired clustering methods with a rigorous mathematical partitioning of aerodynamic loads to enable detailed quantitative analysis of vortex kinematics and vortex-induced aerodynamic loads. We demonstrate the utility of this approach by applying it to an ensemble of 165 distinct Navier-Stokes simulations of flow past a sinusoidally pitching airfoil. Insights enabled by the current methodology include the identification of a period-doubling route to chaos in this flow, and the precise quantification of the role that leading-edge vortices play in driving aeroelastic pitch oscillations.

Submitted 9 November, 2020; originally announced November 2020.

arXiv:2010.07447 [pdf, ps, other]

Semantic Label Smoothing for Sequence to Sequence Problems

Authors: Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Abstract: Label smoothing has been shown to be an effective regularization strategy in classification, that prevents overfitting and helps in label de-noising. However, extending such methods directly to seq2seq settings, such as Machine Translation, is challenging: the large target output space of such problems makes it intractable to apply label smoothing over all possible outputs. Most existing approaches for seq2seq settings either do token level smoothing, or smooth over sequences generated by randomly substituting tokens in the target sequence. Unlike these works, in this paper, we propose a technique that smooths over well formed relevant sequences that not only have sufficient n-gram overlap with the target sequence, but are also semantically similar. Our method shows a consistent and significant improvement over the state-of-the-art techniques on different datasets.

Submitted 14 October, 2020; originally announced October 2020.

arXiv:2010.02568 [pdf, other]

SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy

Authors: Umanga Bista, Alexander Patrick Mathews, Aditya Krishna Menon, Lexing Xie

Abstract: Most work on multi-document summarization has focused on generic summarization of information present in each individual document set. However, the under-explored setting of update summarization, where the goal is to identify the new information present in each set, is of equal practical interest (e.g., presenting readers with updates on an evolving news topic). In this work, we present SupMMD, a novel technique for generic and update summarization based on the maximum mean discrepancy from kernel two-sample testing. SupMMD combines both supervised learning for salience and unsupervised learning for coverage and diversity. Further, we adapt multiple kernel learning to make use of similarity across multiple information sources (e.g., text features and knowledge based concepts). We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.

Submitted 6 October, 2020; originally announced October 2020.

Comments: 15 pages

Journal ref: EMNLP 2020

arXiv:2007.07314 [pdf, other]

Long-tail learning via logit adjustment

Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Abstract: Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these challenges. Our techniques revisit the classic idea of logit adjustment based on the label frequencies, either applied post-hoc to a trained model, or enforced in the loss during training. Such adjustment encourages a large relative margin between logits of rare versus dominant labels. These techniques unify and generalise several recent proposals in the literature, while possessing firmer statistical grounding and empirical performance.

Submitted 9 July, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: Published as a conference paper in ICLR 2021
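
A minimal sketch of the post-hoc variant mentioned in the abstract above, under stated assumptions: the adjustment is taken to be subtracting a scaled log class frequency from each logit, and the names `adjust_logits`, `predict`, `class_priors`, and `tau` are hypothetical, chosen for illustration only. It is not the authors' implementation.

```python
import math

def adjust_logits(logits, class_priors, tau=1.0):
    """Subtract tau * log(prior) from each logit (assumed form of the adjustment)."""
    return [z - tau * math.log(p) for z, p in zip(logits, class_priors)]

def predict(logits, class_priors, tau=1.0):
    """Return the index of the largest adjusted logit."""
    adjusted = adjust_logits(logits, class_priors, tau)
    return max(range(len(adjusted)), key=adjusted.__getitem__)

# Toy example: class 0 is dominant (90% of samples), class 1 is rare (10%).
logits = [2.0, 1.5]
priors = [0.9, 0.1]
plain_prediction = max(range(len(logits)), key=logits.__getitem__)
adjusted_prediction = predict(logits, priors)
```

In this toy case the unadjusted argmax picks the dominant class, while the adjusted scores favour the rare class, because a rare label receives a larger additive boost.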

arXiv:2006.11649 [pdf, ps, other]

doi 10.1017/jfm.2020.854

On the initiation and sustenance of flow-induced vibration of cylinders: insights from force partitioning

Authors: Karthik Menon, Rajat Mittal

Abstract: The focus of this work is to dissect the physical mechanisms that drive and sustain flow-induced, transverse vibrations of cylinders. The influence of different mechanisms is quantified by using a method to partition the fluid dynamic force on the cylinder into distinct, physically relevant components. In conjunction with this force partitioning, calculations of the energy extracted by the oscillating body from the flow are used to make a direct connection between the phenomena responsible for force generation and their effect on driving flow-induced oscillations. These tools are demonstrated in a study of the effect of cylinder shape on flow-induced vibrations. Relatively small increases in cylinder aspect-ratio are found to have a significant influence on the amplitude of oscillation, resulting in a large drop in oscillation amplitude at reduced velocities that correspond to the upper range of the synchronization regime. By mapping out the energy transfer between the fluid and structure as a function of aspect-ratio, we identify the existence of a low-amplitude stationary state as the cause of the drop in amplitude. Partitioning the fluid dynamic forces on cylinders of varying aspect-ratio then allows us to uncover the physical mechanisms behind the appearance of the underlying bifurcation. The analysis also suggests that while vortex shedding in the wake is necessary to initiate oscillations, it is the vorticity associated with the boundary layer over the cylinder that is responsible for the sustenance of flow-induced vibrations.

Submitted 5 November, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

Journal ref: Journal of Fluid Mechanics, 907, A37 (2021)

arXiv:2005.10419 [pdf, other]

Why distillation helps: a statistical perspective

Authors: Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

Abstract: Knowledge distillation is a technique for improving the performance of a simple "student" model by replacing its one-hot training labels with a distribution over labels obtained from a complex "teacher" model. While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help? In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques. Our core observation is that the teacher seeks to estimate the underlying (Bayes) class-probability function. Building on this, we establish a fundamental bias-variance tradeoff in the student's objective: this quantifies how approximate knowledge of these class-probabilities can significantly aid learning. Finally, we show how distillation complements existing negative mining techniques for extreme multiclass retrieval, and propose a unified objective which combines these ideas.

Submitted 20 May, 2020; originally announced May 2020.

arXiv:2004.11460 [pdf, other]

Development of a Machine Learning Model and Mobile Application to Aid in Predicting Dosage of Vitamin K Antagonists Among Indian Patients

Authors: Amruthlal M, Devika S, Ameer Suhail P A, Aravind K Menon, Vignesh Krishnan, Alan Thomas, Manu Thomas, Sanjay G, Lakshmi Kanth L R, Jimmy Jose, Harikrishnan S

Abstract: Patients who undergo mechanical heart valve replacements or have conditions like Atrial Fibrillation have to take Vitamin K Antagonists (VKA) drugs to prevent coagulation of blood. These drugs have a narrow therapeutic range and need to be very closely monitored due to life-threatening side effects. The dosage of VKA drug is determined and revised by a physician based on Prothrombin Time - International Normalised Ratio (PT-INR) value obtained through a blood test. Our work aimed at predicting the maintenance dosage of warfarin, currently the most widely recommended anticoagulant drug, using the de-identified medical data collected from 109 patients from Kerala. A Support Vector Machine (SVM) Regression model was built to predict the maintenance dosage of warfarin, for patients who have been undergoing treatment from a physician and have reached stable INR values between 2.0 and 4.0.

Submitted 19 April, 2020; originally announced April 2020.

arXiv:2004.10915 [pdf, other]

Doubly-stochastic mining for heterogeneous retrieval

Authors: Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Abstract: Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge. The first challenge concerns scalability: with a large number of labels, standard losses are difficult to optimise even on a single example. The second challenge concerns uniformity: one ideally wants good performance on each subpopulation. While several solutions have been proposed to address the first challenge, the second challenge has received relatively less attention. In this paper, we propose doubly-stochastic mining (S2M), a stochastic optimization technique that addresses both challenges. In each iteration of S2M, we compute a per-example loss based on a subset of hardest labels, and then compute the minibatch loss based on the hardest examples. We show theoretically and empirically that by focusing on the hardest examples, S2M ensures that all data subpopulations are modelled well.

Submitted 22 April, 2020; originally announced April 2020.
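
The two-level selection described in the abstract above (hardest labels per example, then hardest examples per minibatch) can be sketched roughly as follows. This is an illustrative sketch only: the hinge loss, the use of "highest-scoring negative" as the notion of hardness, and the names `per_example_loss`, `minibatch_loss`, `k`, and `m` are all assumptions, not details taken from the paper.

```python
def per_example_loss(scores, positive, k):
    """Average hinge loss over the k highest-scoring (hardest) negative labels."""
    pos_score = scores[positive]
    negatives = [s for j, s in enumerate(scores) if j != positive]
    hardest = sorted(negatives, reverse=True)[:k]
    return sum(max(0.0, 1.0 + s - pos_score) for s in hardest) / len(hardest)

def minibatch_loss(batch_scores, positives, k, m):
    """Average the m largest per-example losses (the hardest examples)."""
    losses = [per_example_loss(s, p, k) for s, p in zip(batch_scores, positives)]
    hardest = sorted(losses, reverse=True)[:m]
    return sum(hardest) / len(hardest)

# Toy minibatch: three examples, three labels, one positive label each.
batch_scores = [
    [2.0, 0.5, 0.1],  # positive label 0, well separated
    [0.2, 1.0, 0.9],  # positive label 1, one close negative
    [1.0, 0.0, 3.0],  # positive label 2, well separated
]
positives = [0, 1, 2]
loss = minibatch_loss(batch_scores, positives, k=1, m=2)
```

Only the second example contributes a non-zero loss here, so keeping the two hardest examples concentrates the minibatch loss on it rather than diluting it across all three.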

arXiv:2004.10342 [pdf, ps, other]

Federated Learning with Only Positive Labels

Authors: Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

Abstract: We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class. As a result, during each federated learning round, the users need to locally update the classifier without having access to the features and the model parameters for the negative classes. Thus, naively employing conventional decentralized learning such as the distributed SGD or Federated Averaging may lead to trivial or extremely poor classifiers. In particular, for the embedding based classifiers, all the class embeddings might collapse to a single point. To address this problem, we propose a generic framework for training with only positive labels, namely Federated Averaging with Spreadout (FedAwS), where the server imposes a geometric regularizer after each round to encourage classes to be spreadout in the embedding space. We show, both theoretically and empirically, that FedAwS can almost match the performance of conventional learning where users have access to negative labels. We further extend the proposed method to the settings with large output spaces.

Submitted 21 April, 2020; originally announced April 2020.

arXiv:2003.12017 [pdf]

Prediction of number of cases expected and estimation of the final size of coronavirus epidemic in India using the logistic model and genetic algorithm

Authors: Ganesh Kumar M, Soman K. P, Gopalakrishnan E. A, Vijay Krishna Menon, Sowmya V

Abstract: In this paper, we have applied the logistic growth regression model and genetic algorithm to predict the number of coronavirus infected cases that can be expected in upcoming days in India and also estimated the final size and its peak time of the coronavirus epidemic in India.

Submitted 26 March, 2020; originally announced March 2020.

arXiv:2003.02819 [pdf, other]

Does label smoothing mitigate label noise?

Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors. Empirically, smoothing has been shown to improve both predictive performance and model calibration. In this paper, we study whether label smoothing is also effective as a means of coping with label noise. While label smoothing apparently amplifies this problem (being equivalent to injecting symmetric noise to the labels), we show how it relates to a general family of loss-correction techniques from the label noise literature. Building on this connection, we show that label smoothing is competitive with loss-correction under label noise. Further, we show that when distilling models from noisy data, label smoothing of the teacher is beneficial; this is in contrast to recent findings for noise-free problems, and sheds further light on settings where label smoothing is beneficial.

Submitted 5 March, 2020; originally announced March 2020.
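
As a small illustration of the mixing step described in the abstract above, the following sketch blends a one-hot label vector with the uniform distribution. The function name `smooth_labels` and the mixing weight `alpha` are hypothetical names used only for this example.

```python
def smooth_labels(one_hot, alpha):
    """Mix a one-hot label vector with the uniform vector using weight alpha."""
    num_classes = len(one_hot)
    uniform = 1.0 / num_classes
    return [(1.0 - alpha) * y + alpha * uniform for y in one_hot]

# Four classes, true class at index 1, 10% of the mass spread uniformly.
smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0], alpha=0.1)
```

The result is still a probability vector: the true class keeps most of the mass and every other class receives a small equal share.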

arXiv:2003.02402 [pdf, other]

doi 10.1016/j.jfluidstructs.2020.103078

Aeroelastic response of an airfoil to gusts: Prediction and control strategies from computed energy maps

Authors: Karthik Menon, Rajat Mittal

Abstract: A method to predict the aeroelastic pitch response of an airfoil to gusts is presented. The prediction is based on energy maps generated by high-fidelity fluid dynamic simulations of the airfoil with prescribed pitch oscillations. The energy maps quantify the exchange of energy between the pitching airfoil and the flow, and serve as manifolds over which the dynamical states of the aeroelastic airfoil system grow, decay and attain stationary states. This method allows us to study the full nonlinear response of the system to large gusts, and predict the growth and saturation of aeroelastic pitch instabilities. We also show that the manifold topology in these maps can be used to make informed modifications to the system parameters in order to control the response to gusts.

Submitted 26 October, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

Journal ref: Journal of Fluids and Structures 97 (2020) 103078

arXiv:2002.03555 [pdf, other]

Supervised Learning: No Loss No Cry

Authors: Richard Nock, Aditya Krishna Menon

Abstract: Supervised learning requires the specification of a loss function to minimise. While the theory of admissible losses from both a computational and statistical perspective is well-developed, these offer a panoply of different choices. In practice, this choice is typically made in an \emph{ad hoc} manner. In hopes of making this procedure more principled, the problem of \emph{learning the loss function} for a downstream task (e.g., classification) has garnered recent interest. However, works in this area have been generally empirical in nature. In this paper, we revisit the {\sc SLIsotron} algorithm of Kakade et al. (2011) through a novel lens, derive a generalisation based on Bregman divergences, and show how it provides a principled procedure for learning the loss. In detail, we cast {\sc SLIsotron} as learning a loss from a family of composite square losses. By interpreting this through the lens of \emph{proper losses}, we derive a generalisation of {\sc SLIsotron} based on Bregman divergences. The resulting {\sc BregmanTron} algorithm jointly learns the loss along with the classifier. It comes equipped with a simple guarantee of convergence for the loss it learns, and its set of possible outputs comes with a guarantee of agnostic approximability of Bayes rule. Experiments indicate that the {\sc BregmanTron} substantially outperforms the {\sc SLIsotron}, and that the loss it learns can be minimized by other algorithms for different tasks, thereby opening the interesting problem of \textit{loss transfer} between domains.

Submitted 10 February, 2020; originally announced February 2020.

ACM Class: I.2.6

arXiv:1911.05768 [pdf, other]

doi 10.1016/j.jfluidstructs.2020.102886

Dynamic mode decomposition based analysis of flow over a sinusoidally pitching airfoil

Authors: Karthik Menon, Rajat Mittal

Abstract: Dynamic mode decomposition (DMD) has proven to be a valuable tool for the analysis of complex flow-fields but the application of this technique to flows with moving boundaries is not straightforward. This is due to the difficulty in accounting in the DMD formulation, for a body of non-zero thickness moving through the field of interest. This work presents a method for decomposing the flow on or near a moving boundary by a change of reference frame, followed by a correction to the computed modes that is determined by the frequency spectrum of the motion. The correction serves to recover the modes of the underlying flow dynamics, while removing the effect of change in reference frame. This method is applied to flow over sinusoidally pitching airfoils, and the DMD analysis is used to derive useful insights regarding flow-induced pitch oscillations of these airfoils.

Submitted 24 July, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

Journal ref: Journal of Fluids and Structures, 94, 102886 (2020)

arXiv:1909.09667 [pdf, other]

Online Hierarchical Clustering Approximations

Authors: Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

Abstract: Hierarchical clustering is a widely used approach for clustering datasets at multiple levels of granularity. Despite its popularity, existing algorithms such as hierarchical agglomerative clustering (HAC) are limited to the offline setting, and thus require the entire dataset to be available. This prohibits their use on large datasets commonly encountered in modern learning applications. In this paper, we consider hierarchical clustering in the online setting, where points arrive one at a time. We propose two algorithms that seek to optimize the Moseley and Wang (MW) revenue function, a variant of the Dasgupta cost. These algorithms offer different tradeoffs between efficiency and MW revenue performance. The first algorithm, OTD, is a highly efficient Online Top Down algorithm which provably achieves a 1/3-approximation to the MW revenue under a data separation assumption. The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue, and produce good quality clusters in practice. We show that OHAC approximates offline HAC by leveraging a novel split-merge procedure. We empirically show that OTD and OHAC offer significant efficiency and cluster quality gains respectively over baselines.

Submitted 20 September, 2019; originally announced September 2019.

Comments: 17 pages, 3 figures

arXiv:1905.10394 [pdf]

Enhanced solar evaporation using a photo-thermal umbrella: towards zero liquid discharge wastewater management

Authors: Akanksha K. Menon, Iwan Haechler, Sumanjeet Kaur, Sean Lubner, Ravi S. Prasher

Abstract: Rising water demands and depleting freshwater resources have brought desalination and wastewater treatment technologies to the forefront. For sustainable water management, there is a global push towards Zero Liquid Discharge (ZLD) with the goal to maximize water recovery for reuse, and to produce solid waste that lowers the environmental impact of wastewater disposal. Evaporation ponds harvest solar energy as heat for ZLD, but require large land areas due to low evaporation rates. Here, we demonstrate a passive and non-contact approach to enhance evaporation by more than 100% using a photo-thermal umbrella. By converting sunlight into only mid-infrared radiation where water is strongly absorbing, efficient utilization of solar energy and heat localization at the water surface through radiative coupling are achieved. The non-contact nature of the device makes it uniquely suited to treat a wide range of wastewater, and the use of commercially available materials enables a potentially low cost and scalable technology for the sustainable disposal of wastewater, with the added benefit of salt recovery.

Submitted 24 May, 2019; originally announced May 2019.

arXiv:1901.10837 [pdf, other]

Noise-tolerant fair classification

Authors: Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma

Abstract: Fairness-aware learning involves designing algorithms that do not discriminate with respect to some sensitive feature (e.g., race or gender). Existing work on the problem operates under the assumption that the sensitive feature available in one's training sample is perfectly reliable. This assumption may be violated in many real-world cases: for example, respondents to a survey may choose to conceal or obfuscate their group identity out of fear of potential discrimination. This poses the question of whether one can still learn fair classifiers given noisy sensitive features. In this paper, we answer the question in the affirmative: we show that if one measures fairness using the mean-difference score, and sensitive features are subject to noise from the mutually contaminated learning model, then owing to a simple identity we only need to change the desired fairness-tolerance. The requisite tolerance can be estimated by leveraging existing noise-rate estimators from the label noise literature. We finally show that our procedure is empirically effective on two case-studies involving sensitive feature censoring.

Submitted 9 January, 2020; v1 submitted 30 January, 2019; originally announced January 2019.

arXiv:1901.08665 [pdf, other]

Fairness risk measures

Authors: Robert C. Williamson, Aditya Krishna Menon

Abstract: Ensuring that classifiers are non-discriminatory or fair with respect to a sensitive feature (e.g., race or gender) is a topical problem. Progress in this task requires fixing a definition of fairness, and there have been several proposals in this regard over the past few years. Several of these, however, assume either binary sensitive features (thus precluding categorical or real-valued sensitive groups), or result in non-convex objectives (thus adversely affecting the optimisation landscape). In this paper, we propose a new definition of fairness that generalises some existing proposals, while allowing for generic sensitive features and resulting in a convex objective. The key idea is to enforce that the expected losses (or risks) across each subgroup induced by the sensitive feature are commensurate. We show how this relates to the rich literature on risk measures from mathematical finance. As a special case, this leads to a new convex fairness-aware objective based on minimising the conditional value at risk (CVaR).

Submitted 24 January, 2019; originally announced January 2019.

arXiv:1901.06125 [pdf, other]

Cold-start Playlist Recommendation with Multitask Learning

Authors: Dawei Chen, Cheng Soon Ong, Aditya Krishna Menon

Abstract: Playlist recommendation involves producing a set of songs that a user might enjoy. We investigate this problem in three cold-start scenarios: (i) cold playlists, where we recommend songs to form new personalised playlists for an existing user; (ii) cold users, where we recommend songs to form new playlists for a new user; and (iii) cold songs, where we recommend newly released songs to extend users' existing playlists. We propose a flexible multitask learning method to deal with all three settings. The method learns from user-curated playlists, and encourages songs in a playlist to be ranked higher than those that are not by minimising a bipartite ranking loss. Inspired by an equivalence between bipartite ranking and binary classification, we show how one can efficiently approximate an optimal solution of the multitask learning objective by minimising a classification loss. Empirical results on two real playlist datasets show the proposed approach has good performance for cold-start playlist recommendation.

Submitted 18 January, 2019; originally announced January 2019.

Comments: 15 pages

MSC Class: 68T05

arXiv:1812.02171 [pdf, other]

doi 10.1609/aaai.v33i01.330120

Comparative Document Summarisation via Classification

Authors: Umanga Bista, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, Lexing Xie

Abstract: This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics over 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 14 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct view points.

Submitted 2 January, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

Comments: Accepted for AAAI 2019

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019

arXiv:1810.04327 [pdf, other]

Complementary-Label Learning for Arbitrary Losses and Models

Authors: Takashi Ishida, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

Abstract: In contrast to the standard classification paradigm where the true class is given to each training pattern, complementary-label learning only uses training patterns each equipped with a complementary label, which only specifies one of the classes that the pattern does not belong to. The goal of this paper is to derive a novel framework of complementary-label learning with an unbiased estimator of the classification risk, for arbitrary losses and models---all existing methods have failed to achieve this goal. Not only is this beneficial for the learning stage, it also makes model/hyper-parameter selection (through cross-validation) possible without the need of any ordinarily labeled validation data, while using any linear/non-linear models or convex/non-convex loss functions. We further improve the risk estimator by a non-negative correction and gradient ascent trick, and demonstrate its superiority through experiments.

Submitted 18 November, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

Comments: accepted to ICML 2019 (Added errata on Nov. 19, 2019)

arXiv:1808.10585 [pdf, other]

On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

Authors: Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

Abstract: Empirical risk minimization (ERM), with proper loss function and regularization, is the common practice of supervised classification. In this paper, we study training arbitrary (from linear to deep) binary classifier from only unlabeled (U) data by ERM. We prove that it is impossible to estimate the risk of an arbitrary binary classifier in an unbiased manner given a single set of U data, but it becomes possible given two sets of U data with different class priors. These two facts answer a fundamental question---what the minimal supervision is for training any binary classifier from only U data. Following these findings, we propose an ERM-based learning method from two sets of U data, and then prove it is consistent. Experiments demonstrate the proposed method could train deep models and outperform state-of-the-art methods for learning from two sets of U data.

Submitted 12 March, 2019; v1 submitted 30 August, 2018; originally announced August 2018.

arXiv:1806.02977 [pdf, other]

Monge blunts Bayes: Hardness Results for Adversarial Training

Authors: Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

Abstract: The last few years have seen a staggering number of empirical studies of the robustness of neural networks in a model of adversarial perturbations of their inputs. Most rely on an adversary which carries out local modifications within prescribed balls. None however has so far questioned the broader picture: how to frame a resource-bounded adversary so that it can be severely detrimental to learning, a non-trivial problem which entails at a minimum the choice of loss and classifiers. We suggest a formal answer for losses that satisfy the minimal statistical requirement of being proper. We pin down a simple sufficient property for any given class of adversaries to be detrimental to learning, involving a central measure of "harmfulness" which generalizes the well-known class of integral probability metrics. A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss. When classifiers are Lipschitz -- a now popular approach in adversarial training --, this optimisation resorts to optimal transport to make a low-budget compression of class marginals. Toy experiments reveal a finding recently separately observed: training against a sufficiently budgeted adversary of this kind improves generalization.

Submitted 7 May, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

ACM Class: I.2.6

arXiv:1802.06360 [pdf, other]

Anomaly Detection using One-Class Neural Networks

Authors: Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

Abstract: We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets. OC-NN combines the ability of deep networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data. The OC-NN approach breaks new ground for the following crucial reason: data representation in the hidden layer is driven by the OC-NN objective and is thus customized for anomaly detection. This is a departure from other approaches which use a hybrid approach of learning deep features using an autoencoder and then feeding the features into a separate anomaly detection method like one-class SVM (OC-SVM). The hybrid OC-SVM approach is sub-optimal because it is unable to influence representational learning in the hidden layers. A comprehensive set of experiments demonstrate that on complex data sets (like CIFAR and GTSRB), OC-NN performs on par with state-of-the-art methods and outperformed conventional shallow methods in some scenarios.

Submitted 10 January, 2019; v1 submitted 18 February, 2018; originally announced February 2018.

arXiv:1711.10219 [pdf, other]

doi 10.1103/PhysRevA.98.022130

Wave Particle Duality in Asymmetric Beam Interference

Authors: Keerthy K. Menon, Tabish Qureshi

Abstract: It is well known that in a two-slit interference experiment, acquiring which-path information about the particle leads to a degrading of the interference. It is argued that path-information has a meaning only when one can unambiguously tell which slit the particle went through. Using this idea, two duality relations are derived for the general case where the two paths may not be equally probable, and the two slits may be of unequal widths. These duality relations, which are inequalities in general, saturate for all pure states. Earlier known results are recovered in a suitable limit.

Submitted 23 August, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: Closest to the published version

Journal ref: Phys. Rev. A 98, 022130 (2018)

arXiv:1708.05165 [pdf, other]

Revisiting revisits in trajectory recommendation

Authors: Aditya Krishna Menon, Dawei Chen, Lexing Xie, Cheng Soon Ong

Abstract: Trajectory recommendation is the problem of recommending a sequence of places in a city for a tourist to visit. It is strongly desirable for the recommended sequence to avoid loops, as tourists typically would not wish to revisit the same location. Given some learned model that scores sequences, how can we then find the highest-scoring sequence that is loop-free? This paper studies this problem, with three contributions. First, we detail three distinct approaches to the problem -- graph-based heuristics, integer linear programming, and list extensions of the Viterbi algorithm -- and qualitatively summarise their strengths and weaknesses. Second, we explicate how two ostensibly different approaches to the list Viterbi algorithm are in fact fundamentally identical. Third, we conduct experiments on real-world trajectory recommendation datasets to identify the tradeoffs imposed by each of the three approaches. Overall, our results indicate that a greedy graph-based heuristic offers excellent performance and runtime, leading us to recommend its use for removing loops at prediction time.

Submitted 17 August, 2017; originally announced August 2017.

Comments: 6 pages

MSC Class: 68T05

arXiv:1707.04385 [pdf, other]

f-GANs in an Information Geometric Nutshell

Authors: Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

Abstract: Nowozin \textit{et al} showed last year how to extend the GAN \textit{principle} to all $f$-divergences. The approach is elegant but falls short of a full description of the supervised game, and says little about the key player, the generator: for example, what does the generator actually converge to if solving the GAN game means convergence in some space of parameters? How does that provide hints on the generator's design and compare to the flourishing but almost exclusively experimental literature on the subject? In this paper, we unveil a broad class of distributions for which such convergence happens --- namely, deformed exponential families, a wide superset of exponential families --- and show tight connections with the three other key GAN parameters: loss, game and architecture. In particular, we show that current deep architectures are able to factorize a very large number of such densities using an especially compact design, hence displaying the power of deep architectures and their concinnity in the $f$-GAN game. This result holds given a sufficient condition on \textit{activation functions} --- which turns out to be satisfied by popular choices. The key to our results is a variational generalization of an old theorem that relates the KL divergence between regular exponential families and divergences between their natural parameters. We complete this picture with additional results and experimental insights on how these results may be used to ground further improvements of GAN architectures, via (i) a principled design of the activation functions in the generator and (ii) an explicit integration of proper composite losses' link function in the discriminator.

Submitted 14 July, 2017; originally announced July 2017.

ACM Class: I.2.6; I.5.1

arXiv:1707.01627 [pdf, other]

PathRec: Visual Analysis of Travel Route Recommendations

Authors: Dawei Chen, Dongwoo Kim, Lexing Xie, Minjeong Shin, Aditya Krishna Menon, Cheng Soon Ong, Iman Avazpour, John Grundy

Abstract: We present an interactive visualisation tool for recommending travel trajectories. This system is based on new machine learning formulations and algorithms for the sequence recommendation problem. The system starts from a map-based overview, taking an interactive query as starting point. It then breaks down contributions from different geographical and user behavior features, and those from individual points-of-interest versus pairs of consecutive points on a route. The system also supports detailed quantitative interrogation by comparing a large number of features for multiple points. Effective trajectory visualisations can potentially benefit a large cohort of online map users and assist their decision-making. More broadly, the design of this system can inform visualisations of other structured prediction tasks, such as for sequences or trees.

Submitted 18 July, 2017; v1 submitted 5 July, 2017; originally announced July 2017.

Comments: 3 pages with appendix

MSC Class: 68T05; 68U35

arXiv:1706.09067 [pdf, other]

Structured Recommendation

Authors: Dawei Chen, Lexing Xie, Aditya Krishna Menon, Cheng Soon Ong

Abstract: Current recommender systems largely focus on static, unstructured content. In many scenarios, we would like to recommend content that has structure, such as a trajectory of points-of-interests in a city, or a playlist of songs. Dubbed Structured Recommendation, this problem differs from the typical structured prediction problem in that there are multiple correct answers for a given input. Motivated by trajectory recommendation, we focus on sequential structures but in contrast to classical Viterbi decoding we require that valid predictions are sequences with no repeated elements. We propose an approach to sequence recommendation based on the structured support vector machine. For prediction, we modify the inference procedure to avoid predicting loops in the sequence. For training, we modify the objective function to account for the existence of multiple ground truths for a given input. We also modify the loss-augmented inference procedure to exclude the known ground truths. Experiments on real-world trajectory recommendation datasets show the benefits of our approach over existing, non-structured recommendation approaches.

Submitted 27 June, 2017; originally announced June 2017.

Comments: 18 pages

MSC Class: 68T05

arXiv:1705.09055 [pdf, other]

The cost of fairness in classification

Authors: Aditya Krishna Menon, Robert C. Williamson

Abstract: We study the problem of learning classifiers with a fairness constraint, with three main contributions towards the goal of quantifying the problem's inherent tradeoffs. First, we relate two existing fairness measures to cost-sensitive risks. Second, we show that for cost-sensitive classification and fairness measures, the optimal classifier is an instance-dependent thresholding of the class-probability function. Third, we show how the tradeoff between accuracy and fairness is determined by the alignment between the class-probabilities for the target and sensitive features. Underpinning our analysis is a general framework that casts the problem of learning with a fairness requirement as one of minimising the difference of two statistical risks.

Submitted 25 May, 2017; originally announced May 2017.

arXiv:1704.06743 [pdf, other]

Robust, Deep and Inductive Anomaly Detection

Authors: Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

Abstract: PCA is a classical statistical technique whose simplicity and maturity have seen it find widespread use as an anomaly detection technique. However, it is limited in this regard by being sensitive to gross perturbations of the input, and by seeking a linear subspace that captures normal behaviour. The first issue has been dealt with by robust PCA, a variant of PCA that explicitly allows for some data points to be arbitrarily corrupted; however, this does not resolve the second issue, and indeed introduces the new issue that one can no longer inductively find anomalies on a test set. This paper addresses both issues in a single model, the robust autoencoder. This method learns a nonlinear subspace that captures the majority of data points, while allowing for some data to have arbitrary corruption. The model is simple to train and leverages recent advances in the optimisation of deep neural networks. Experiments on a range of real-world datasets highlight the model's effectiveness.

Submitted 30 July, 2017; v1 submitted 22 April, 2017; originally announced April 2017.

Comments: Accepted at ECML PKDD 2017 (the European Conference on Machine Learning & Principles and Practice of Knowledge Discovery), Skopje, Macedonia, 18-22 September

arXiv:1704.05611 [pdf]

Dependency resolution and semantic mining using Tree Adjoining Grammars for Tamil Language

Authors: Vijay Krishna Menon, S Rajendran, M Anandkumar, K P Soman

Abstract: Tree adjoining grammars (TAGs) provide an ample tool to capture the syntax of many Indian languages. Tamil represents a special challenge to computational formalisms as it has extensive agglutinative morphology and a comparatively difficult argument structure. Modelling Tamil syntax and morphology using TAG is an interesting problem which has not been in focus even though TAGs are over 4 decades old. Our research with Tamil TAGs has shown us that we can not only represent the syntax of the language, but also, to an extent, mine out semantics through dependency resolution of the sentence. But in order to demonstrate this phenomenal property, we need to parse Tamil language sentences using the TAGs we have built and, through parsing, obtain a derivation we could use to resolve dependencies, thus proving the semantic property. We use an in-house developed pseudo lexical TAG chart parser, based on the algorithm given by Schabes and Joshi (1988), for generating derivations of sentences. We do not use any statistics to rank out ambiguous derivations but rather use all of them to understand the mentioned semantic relation within TAGs for Tamil. We shall also present a brief parser analysis for the completeness of our discussions.

Submitted 19 April, 2017; originally announced April 2017.

Comments: 9 pages. arXiv admin note: text overlap with arXiv:1604.01235

arXiv:1612.08894 [pdf, other]

Unsupervised domain adaptation in brain lesion segmentation with adversarial networks

Authors: Konstantinos Kamnitsas, Christian Baumgartner, Christian Ledig, Virginia F. J. Newcombe, Joanna P. Simpson, Andrew D. Kane, David K. Menon, Aditya Nori, Antonio Criminisi, Daniel Rueckert, Ben Glocker

Abstract: Significant advances have been made towards building accurate automatic segmentation systems for a variety of biomedical applications using machine learning. However, the performance of these systems often degrades when they are applied on new data that differ from the training data, for example, due to variations in imaging protocols. Manually annotating new data for each test domain is not a feasible solution. In this work we investigate unsupervised domain adaptation using adversarial neural networks to train a segmentation method which is more invariant to differences in the input data, and which does not require any annotations on the test domain. Specifically, we learn domain-invariant features by learning to counter an adversarial network, which attempts to classify the domain of the input data by observing the activations of the segmentation network. Furthermore, we propose a multi-connected domain discriminator for improved adversarial training. Our system is evaluated using two MR databases of subjects with traumatic brain injuries, acquired using different scanners and imaging protocols. Using our unsupervised approach, we obtain segmentation accuracies which are close to the upper bound of supervised domain adaptation.

Submitted 28 December, 2016; originally announced December 2016.

arXiv:1609.02478 [pdf, other]

Phase Separation and Coexistence of Hydrodynamically Interacting Microswimmers

Authors: Johannes Blaschke, Maurice Maurer, Karthik Menon, Andreas Zöttl, Holger Stark

Abstract: A striking feature of the collective behavior of spherical microswimmers is that for sufficiently strong self-propulsion they phase-separate into a dense cluster coexisting with a low-density disordered surrounding. Extending our previous work, we use the squirmer as a model swimmer and the particle-based simulation method of multi-particle collision dynamics to explore the influence of hydrodynamics on their phase behavior in a quasi-two-dimensional geometry. The coarsening dynamics towards the phase-separated state is diffusive in an intermediate time regime followed by a final ballistic compactification of the dense cluster. We determine the binodal lines in a phase diagram of Péclet number versus density. Interestingly, the gas binodals are shifted to smaller densities for increasing mean density or dense-cluster size, which we explain using a recently introduced pressure balance [S. C. Takatori et al., Phys. Rev. Lett. 113, 028103 (2014)] extended by a hydrodynamic contribution. Furthermore, we find that for pushers and pullers the binodal line is shifted to larger Péclet numbers compared to neutral squirmers. Finally, when lowering the Péclet number, the dense phase transforms from a hexagonal "solid" to a disordered "fluid" state.

Submitted 28 October, 2016; v1 submitted 8 September, 2016; originally announced September 2016.

Comments: 10 pages, 15 figures

arXiv:1607.00360 [pdf, other]

A scaled Bregman theorem with applications

Authors: Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

Abstract: Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions" (employing a potentially non-convex generator) may be exactly re-written as a scaled Bregman divergence computed over transformed data. Admissible distortions include geodesic distances on curved manifolds and projections or gauge-normalisation, while admissible data include scalars, vectors and matrices. Our theorem allows one to leverage the wealth and convenience of Bregman divergences when analysing algorithms relying on the aforementioned Bregman distortions. We illustrate this with three novel applications of our theorem: a reduction from multi-class density ratio to class-probability estimation, a new adaptive projection free yet norm-enforcing dual norm mirror descent algorithm, and a reduction from clustering on flat manifolds to clustering on curved manifolds. Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

Submitted 1 July, 2016; originally announced July 2016.

arXiv:1605.08405 [pdf, other]

doi 10.1039/C6SM01719C

Attraction-induced jamming in the flow of foam through a channel

Authors: Karthik Menon, Rama Govindarajan, Shubha Tewari

Abstract: We study the flow of a pressure-driven foam through a straight channel using numerical simulations, and examine the effects of a tuneable attractive potential between bubbles. This potential, which accounts for the effects of disjoining pressure in the liquid films between separating bubbles, is shown here to introduce jamming and stick-slip flow in a straight channel. We report on the behaviour of these new regimes by varying the strength of the attractive potential. It is seen that there is a force threshold below which the flow jams, and on increasing the driving force, a crossover from intermittent (stick-slip) to smooth flow is observed. This threshold force below which the foam jams increases linearly with the strength of the attractive potential. By examining the spectra of energy fluctuations, we show that stick-slip flow is characterized by low frequency rearrangements and strongly local behaviour, whereas steady flow shows a broad spectrum of energy drop events and collective behaviour. Our work suggests that the stick-slip and the jamming regimes occur due to the increased stabilization of contact networks by the attractive potential - as the strength of attraction is increased, bubbles are increasingly trapped within networks, and there is a decrease in the number of contact changes.

Submitted 26 May, 2016; originally announced May 2016.

Comments: Supplementary materials available upon request - movies comparing the evolution of contact force networks as well as the fluctuations of elastic energy with the flow, for steady as well as stick-slip flow regimes

Journal ref: Soft Matter, 2016, 12, 7772-7781

arXiv:1605.00751 [pdf, other]

Learning from Binary Labels with Instance-Dependent Corruption

Authors: Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan

Abstract: Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise. With sufficiently many such samples, can we optimally classify and rank instances with respect to the noise-free distribution? We provide a theoretical analysis of this question, with three main contributions. First, we prove that for instance-dependent noise, any algorithm that is consistent for classification on the noisy distribution is also consistent on the clean distribution. Second, we prove that for a broad class of instance- and label-dependent noise, a similar consistency result holds for the area under the ROC curve. Third, for the latter noise model, when the noise-free class-probability function belongs to the generalised linear model family, we show that the Isotron can efficiently and provably learn from the corrupted sample.

Submitted 4 May, 2016; v1 submitted 3 May, 2016; originally announced May 2016.

arXiv:1604.01235 [pdf]

A new TAG Formalism for Tamil and Parser Analytics

Authors: Vijay Krishna Menon, S. Rajendran, M. Anand Kumar, K. P. Soman

Abstract: Tree adjoining grammar (TAG) is specifically suited for morph rich and agglutinated languages like Tamil due to its psycholinguistic features and parse time dependency and morph resolution. Though TAG and LTAG formalisms have been known for about 3 decades, efforts on designing TAG Syntax for Tamil have not been entirely successful due to the complexity of its specification and the rich morphology of the Tamil language. In this paper we present a minimalistic TAG for Tamil without many morphological considerations and also introduce a parser implementation with some obvious variations from the XTAG system.

Submitted 5 April, 2016; originally announced April 2016.

Comments: International Symposium for Dravidian Languages (iDravidian), co-located with ICON2014, Goa University, Dec 2014

arXiv:1603.05959 [pdf, other]

doi 10.1016/j.media.2016.10.004

Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate Brain Lesion Segmentation

Authors: Konstantinos Kamnitsas, Christian Ledig, Virginia F. J. Newcombe, Joanna P. Simpson, Andrew D. Kane, David K. Menon, Daniel Rueckert, Ben Glocker

Abstract: We propose a dual pathway, 11-layer deep, three-dimensional Convolutional Neural Network for the challenging task of brain lesion segmentation. The devised architecture is the result of an in-depth analysis of the limitations of current networks proposed for similar applications. To overcome the computational burden of processing 3D medical scans, we have devised an efficient and effective dense training scheme which joins the processing of adjacent image patches into one pass through the network while automatically adapting to the inherent class imbalance present in the data. Further, we analyze the development of deeper, thus more discriminative 3D CNNs. In order to incorporate both local and larger contextual information, we employ a dual pathway architecture that processes the input images at multiple scales simultaneously. For post-processing of the network's soft segmentation, we use a 3D fully connected Conditional Random Field which effectively removes false positives. Our pipeline is extensively evaluated on three challenging tasks of lesion segmentation in multi-channel MRI patient data with traumatic brain injuries, brain tumors, and ischemic stroke. We improve on the state-of-the-art for all three applications, with top ranking performance on the public benchmarks BRATS 2015 and ISLES 2015. Our method is computationally efficient, which allows its adoption in a variety of research and clinical settings. The source code of our implementation is made publicly available.

Submitted 8 January, 2017; v1 submitted 18 March, 2016; originally announced March 2016.

Comments: This version was accepted in the journal Medical Image Analysis (MedIA)

arXiv:1506.01520 [pdf, other]

An Average Classification Algorithm

Authors: Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

Abstract: Many classification algorithms produce a classifier that is a weighted average of kernel evaluations. When working with a high or infinite dimensional kernel, it is imperative for speed of evaluation and storage issues that as few training samples as possible are used in the kernel expansion. Popular existing approaches focus on altering standard learning algorithms, such as the Support Vector Machine, to induce sparsity, as well as post-hoc procedures for sparse approximations. Here we adopt the latter approach. We begin with a very simple classifier, given by the kernel mean $$ f(x) = \frac{1}{n} \sum\limits_{i=1}^{n} y_i K(x_i,x). $$ We then find a sparse approximation to this kernel mean via herding. The result is an accurate, easily parallelized algorithm for learning classifiers.

Submitted 15 December, 2015; v1 submitted 4 June, 2015; originally announced June 2015.
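
The kernel mean classifier in the abstract above can be illustrated with a minimal sketch. It evaluates $f(x) = \frac{1}{n} \sum_{i=1}^{n} y_i K(x_i, x)$ and predicts the sign of the result. The Gaussian (RBF) kernel, the `gamma` parameter, and all function names are assumptions chosen for illustration; the abstract does not specify a kernel, and the herding-based sparse approximation it mentions is not implemented here.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian kernel; the kernel choice is an assumption, not from the abstract.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def kernel_mean_score(x, xs, ys, kernel=rbf_kernel):
    # f(x) = (1/n) * sum_i y_i * K(x_i, x), with labels y_i in {-1, +1}.
    n = len(xs)
    return sum(y * kernel(xi, x) for xi, y in zip(xs, ys)) / n

def predict(x, xs, ys, kernel=rbf_kernel):
    # Classify by the sign of the kernel mean (ties go to +1 by convention here).
    return 1 if kernel_mean_score(x, xs, ys, kernel) >= 0 else -1
```

Every training point appears in the sum, so evaluation costs one kernel call per training sample; that cost is what a sparse approximation would aim to reduce.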

arXiv:1505.07634 [pdf, other]

Learning with Symmetric Label Noise: The Importance of Being Unhinged

Authors: Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

Abstract: Convex potential minimisation is the de facto approach to binary classification. However, Long and Servedio [2010] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing. This ostensibly shows that convex losses are not SLN-robust. In this paper, we propose a convex, classification-calibrated loss and prove that it is SLN-robust. The loss avoids the Long and Servedio [2010] result by virtue of being negatively unbounded. The loss is a modification of the hinge loss, where one does not clamp at zero; hence, we call it the unhinged loss. We show that the optimal unhinged solution is equivalent to that of a strongly regularised SVM, and is the limiting solution for any convex potential; this implies that strong l2 regularisation makes most standard learners SLN-robust. Experiments confirm the SLN-robustness of the unhinged loss.

Submitted 28 May, 2015; originally announced May 2015.
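
As a small illustration of the abstract above, the sketch below contrasts the hinge loss with the unhinged loss obtained by dropping the clamp at zero. It assumes labels y in {-1, +1}, a real-valued score v, and the standard hinge form max(0, 1 - yv); the function names are illustrative only.

```python
def hinge_loss(y, v):
    # Standard hinge loss: clamped at zero once the margin y*v reaches 1.
    return max(0.0, 1.0 - y * v)

def unhinged_loss(y, v):
    # Same expression without the clamp, so it is linear in v and
    # unbounded below (it goes negative for margins above 1).
    return 1.0 - y * v
```

The two losses agree whenever the margin y*v is at most 1 and differ only for confidently correct predictions, where the unhinged loss keeps decreasing.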

arXiv:1406.3044 [pdf]

Astronomy of two Indian tribes: Banjaras and Kolams

Authors: Mayank N Vahia, Ganesh Halkare, Kishore Menon, Harini Calamur

Abstract: We report field studies of the astronomical beliefs of two Indian tribes: the Banjaras and the Kolams. The Banjaras are an ancient tribe connected with the gypsies of Europe while the Kolams have been foragers until recently. They share their landscape with each other and also with the Gonds whose astronomy was reported previously (Vahia and Halkare, 2013). The primary profession of the Banjaras was trade, based on the large-scale movement of goods over long distances, but their services were taken over by the railways about one hundred years ago. Since then the Banjaras have begun the long journey to a sedentary lifestyle. Meanwhile, the Kolams were foragers until about fifty years ago when the Government of India began to help them lead a settled life. Here, we compare the astronomical beliefs of the Banjaras and the Kolams, which indicate the strong sense of identity that each community possesses. Our study also highlights their perspective about the sky and its relation to their daily lives. We show that apart from the absolute importance of the data on human perception of the sky, the data also reveal subtle aspects of interactions between physically co-located but otherwise isolated communities as well as their own lifestyles. We also show that there is a strong relationship between profession and perspective of the sky.

Submitted 10 June, 2014; originally announced June 2014.

Comments: 20 pages, 5 images and 5 tables

Journal ref: Journal of Astronomical History and Heritage, 17(1), 65-84 (2014)

arXiv:1209.3811 [pdf, other]

Textual Features for Programming by Example

Authors: Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, Adam Tauman Kalai

Abstract: In Programming by Example, a system attempts to infer a program from input and output examples, generally by searching for a composition of certain base functions. Performing a naive brute force search is infeasible for even mildly involved tasks. We note that the examples themselves often present clues as to which functions to compose, and how to rank the resulting programs. In text processing, which is our domain of interest, clues arise from simple textual features: for example, if parts of the input and output strings are permutations of one another, this suggests that sorting may be useful. We describe a system that learns the reliability of such clues, allowing for faster search and a principled ranking over programs. Experiments on a prototype of this system show that this learning scheme facilitates efficient inference on a range of text processing tasks.

Submitted 17 September, 2012; originally announced September 2012.

Showing 51–100 of 105 results for author: Menon, K