-
The Silent Majority: Demystifying Memorization Effect in the Presence of Spurious Correlations
Authors:
Chenyu You,
Haocheng Dai,
Yifei Min,
Jasjeet S. Sekhon,
Sarang Joshi,
James S. Duncan
Abstract:
Machine learning models often rely on simple spurious features -- patterns in training data that correlate with targets but are not causally related to them, like image backgrounds in foreground classification. This reliance typically leads to imbalanced test performance across minority and majority groups. In this work, we take a closer look at the fundamental cause of such imbalanced performance through the lens of memorization, which refers to the ability to predict accurately on \textit{atypical} examples (minority groups) in the training set while failing to achieve the same accuracy on the test set. This paper systematically shows the ubiquitous existence of spurious features in a small set of neurons within the network, providing the first-ever evidence that memorization may contribute to imbalanced group performance. Through three experimental sources of converging empirical evidence, we find that a small subset of neurons or channels memorizes minority group information. Inspired by these findings, we articulate the hypothesis: the imbalanced group performance is a byproduct of ``noisy'' spurious memorization confined to a small set of neurons. To further substantiate this hypothesis, we show that eliminating these unnecessary spurious memorization patterns via a novel framework during training can significantly affect the model performance on minority groups. Our experimental results across various architectures and benchmarks offer new insights on how neural networks encode core and spurious knowledge, laying the groundwork for future research in demystifying robustness to spurious correlation.
Submitted 15 January, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Authors:
Chenyu You,
Yifei Min,
Weicheng Dai,
Jasjeet S. Sekhon,
Lawrence Staib,
James S. Duncan
Abstract:
Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning entire pre-trained models becomes both time-intensive and computationally costly. Additionally, these tuned models tend to become highly specialized, limiting their practicality for real-world deployment; (ii) recent studies indicate that pre-trained vision-language classifiers may overly depend on spurious features -- patterns that correlate with the target in training data, but are not related to the true labeling function; and (iii) existing studies on mitigating the reliance on spurious features, largely based on the assumption that we can identify such features, do not provide definitive assurance for real-world applications. As a pilot study, this work focuses on mitigating the reliance on spurious features for CLIP without using any group annotation. To this end, we systematically study the existence of spurious correlation on CLIP and CLIP+ERM. We first, following recent work on Deep Feature Reweighting (DFR), verify that last-layer retraining can greatly improve group robustness on pretrained CLIP. In view of this, we advocate a lightweight representation calibration method for fine-tuning CLIP, by first generating a calibration set using the pretrained CLIP, and then calibrating representations of samples within this set through contrastive learning, all without the need for group labels. Extensive experiments and in-depth visualizations on several benchmarks validate the effectiveness of our proposals, largely reducing reliance on spurious features and significantly boosting the model generalization.
Submitted 1 November, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator
Authors:
Dennis Shen,
Dogyoon Song,
Peng Ding,
Jasjeet S. Sekhon
Abstract:
Deep learning research has uncovered the phenomenon of benign overfitting for overparameterized statistical models, which has drawn significant theoretical interest in recent years. Given its simplicity and practicality, the ordinary least squares (OLS) interpolator has become essential to gain foundational insights into this phenomenon. While properties of OLS are well established in classical, underparameterized settings, its behavior in high-dimensional, overparameterized regimes is less explored (unlike for ridge or lasso regression) though significant progress has been made of late. We contribute to this growing literature by providing fundamental algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In particular, we provide algebraic equivalents of (i) the leave-$k$-out residual formula, (ii) Cochran's formula, and (iii) the Frisch-Waugh-Lovell theorem in the overparameterized regime. These results aid in understanding the OLS interpolator's ability to generalize and have substantive implications for causal inference. Under the Gauss-Markov model, we present statistical results such as an extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors for the overparameterized regime. To substantiate our theoretical contributions, we conduct simulations that further explore the stochastic properties of the OLS interpolator.
Submitted 30 May, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
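For orientation, the object named in this abstract can be written down explicitly. The following is a sketch under an assumption the abstract does not state: the design matrix $X \in \mathbb{R}^{n \times p}$ has full row rank with $n \le p$, so that $X X^\top$ is invertible. The symbols $X$, $y$, and $\hat{\beta}$ are notation chosen here, not taken from the paper.

```latex
\hat{\beta}
  \;=\; \arg\min_{\beta \,:\, X\beta = y} \lVert \beta \rVert_2
  \;=\; X^{+} y
  \;=\; X^\top \left( X X^\top \right)^{-1} y,
\qquad\text{so that}\qquad
X \hat{\beta} \;=\; X X^\top \left( X X^\top \right)^{-1} y \;=\; y .
```

Here $X^{+}$ denotes the Moore-Penrose pseudoinverse; the last identity shows that the fitted values reproduce $y$ exactly, which is what "interpolator" refers to.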
-
ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast
Authors:
Chenyu You,
Weicheng Dai,
Yifei Min,
Lawrence Staib,
Jasjeet S. Sekhon,
James S. Duncan
Abstract:
Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they will perform in the labeled portion of data where class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature $τ$ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic $τ$ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on ACDC and LA benchmarks and show that it achieves state-of-the-art across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.
Submitted 17 July, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
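The abstract mentions a dynamic $τ$ driven by a simple cosine schedule but gives no formula. The sketch below shows one plausible form, purely as an illustration: the function name `cosine_temperature`, the bounds `tau_min` and `tau_max`, and the `period` parameter are all assumptions made here, not values or names from the paper.

```python
import math


def cosine_temperature(step, period, tau_min=0.1, tau_max=1.0):
    """Hypothetical cosine schedule: tau starts at tau_max, reaches tau_min
    half-way through each period, and returns to tau_max at the end."""
    phase = 2.0 * math.pi * step / period
    return tau_min + 0.5 * (tau_max - tau_min) * (1.0 + math.cos(phase))


if __name__ == "__main__":
    # Print the temperature at a few points of a 100-step period.
    for step in (0, 25, 50, 75, 100):
        print(step, round(cosine_temperature(step, period=100), 4))
```

A schedule of this shape alternates between coarse (large $τ$) and fine (small $τ$) separation during training; whether the paper uses exactly this parameterization is not stated in the abstract.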
-
Hybridized Threshold Clustering for Massive Data
Authors:
Jianmei Luo,
ChandraVyas Annakula,
Aruna Sai Kannamareddy,
Jasjeet S. Sekhon,
William Henry Hsu,
Michael Higgins
Abstract:
As the size $n$ of datasets becomes massive, many commonly-used clustering algorithms (for example, $k$-means or hierarchical agglomerative clustering (HAC)) require prohibitive computational cost and memory. In this paper, we propose a solution to these clustering problems by extending threshold clustering (TC) to problems of instance selection. TC is a recently developed clustering algorithm designed to partition data into many small clusters in linearithmic time (on average). Our proposed clustering method is as follows. First, TC is performed and clusters are reduced into single "prototype" points. Then, TC is applied repeatedly on these prototype points until sufficient data reduction has been obtained. Finally, a more sophisticated clustering algorithm is applied to the reduced prototype points, thereby obtaining a clustering on all $n$ data points. This entire procedure for clustering is called iterative hybridized threshold clustering (IHTC). Through simulation results and by applying our methodology to several real datasets, we show that IHTC combined with $k$-means or HAC substantially reduces the run time and memory usage of the original clustering algorithms while still preserving their performance. Additionally, IHTC helps prevent singular data points from being overfit by clustering algorithms.
Submitted 5 July, 2019;
originally announced July 2019.
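The reduce-then-cluster pipeline described in this abstract can be sketched in a few lines. This is only an illustration of the control flow on 1-D data, with two loudly labelled stand-ins: `chunk_reduce` replaces real threshold clustering with a crude "sort and cut into chunks" rule, and `largest_gap_split` replaces $k$-means or HAC with a two-cluster split at the widest gap. All function names and parameters are hypothetical and are not taken from the paper or its software.

```python
from statistics import mean


def chunk_reduce(points, min_size):
    """STAND-IN for one TC pass (not real threshold clustering): sort 1-D
    points and cut them into consecutive groups of at least min_size points.
    Returns one prototype (group mean) per group and the member indices."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    n_groups = max(1, len(points) // min_size)
    groups = [[] for _ in range(n_groups)]
    for rank, i in enumerate(order):
        # Leftover points are folded into the last group.
        groups[min(rank // min_size, n_groups - 1)].append(i)
    prototypes = [mean(points[i] for i in g) for g in groups]
    return prototypes, groups


def largest_gap_split(values):
    """STAND-IN for the final clusterer: two clusters, split at the widest
    gap between sorted values. Requires at least two values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for k, i in enumerate(order):
        labels[i] = 0 if k <= cut else 1
    return labels


def ihtc_sketch(points, min_size=2, passes=2):
    """Reduce repeatedly to prototypes, cluster the prototypes, then give
    every original point the label of the prototype it was merged into."""
    current = list(points)
    members = [[i] for i in range(len(points))]  # original indices per prototype
    for _ in range(passes):
        current, groups = chunk_reduce(current, min_size)
        members = [[i for j in g for i in members[j]] for g in groups]
    proto_labels = largest_gap_split(current)
    labels = [None] * len(points)
    for label, group in zip(proto_labels, members):
        for i in group:
            labels[i] = label
    return labels


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
    print(ihtc_sketch(data))
```

The point of the structure is that the expensive final clusterer only ever sees the prototypes, whose count shrinks by roughly a factor of `min_size` per pass.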
-
Linear Aggregation in Tree-based Estimators
Authors:
Sören R. Künzel,
Theo F. Saarinen,
Edward W. Liu,
Jasjeet S. Sekhon
Abstract:
Regression trees and their ensemble methods are popular methods for nonparametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm that finds the best axis-aligned split to fit linear aggregation functions on the corresponding nodes, and we offer a quasilinear time implementation. We demonstrate the algorithm's favorable performance on real-world benchmarks and in an extensive simulation study, and we demonstrate its improved interpretability using a large get-out-the-vote experiment. We provide an open-source software package that implements several tree-based estimators with linear aggregation functions.
Submitted 9 September, 2021; v1 submitted 15 June, 2019;
originally announced June 2019.
-
Causaltoolbox---Estimator Stability for Heterogeneous Treatment Effects
Authors:
Sören R. Künzel,
Simon J. S. Walter,
Jasjeet S. Sekhon
Abstract:
Estimating heterogeneous treatment effects has become increasingly important in many fields and life and death decisions are now based on these estimates: for example, selecting a personalized course of medical treatment. Recently, a variety of procedures relying on different assumptions have been suggested for estimating heterogeneous treatment effects. Unfortunately, there are no compelling approaches that allow identification of the procedure that has assumptions that hew closest to the process generating the data set under study and researchers often select one arbitrarily. This approach risks making inferences that rely on incorrect assumptions and gives the experimenter too much scope for $p$-hacking. A single estimator will also tend to overlook patterns other estimators could have picked up. We believe that the conclusion of many published papers might change had a different estimator been chosen and we suggest that practitioners should evaluate many estimators and assess their similarity when investigating heterogeneous treatment effects. We demonstrate this by applying 28 different estimation procedures to an emulated observational data set; this analysis shows that different estimation procedures may give starkly different estimates. We also provide an extensible \texttt{R} package which makes it straightforward for practitioners to follow our recommendations.
Submitted 28 March, 2019; v1 submitted 7 November, 2018;
originally announced November 2018.
-
Transfer Learning for Estimating Causal Effects using Neural Networks
Authors:
Sören R. Künzel,
Bradly C. Stadie,
Nikita Vemuri,
Varsha Ramakrishnan,
Jasjeet S. Sekhon,
Pieter Abbeel
Abstract:
We develop new algorithms for estimating heterogeneous treatment effects, combining recent developments in transfer learning for neural networks with insights from the causal inference literature. By taking advantage of transfer learning, we are able to efficiently use different data sources that are related to the same underlying causal mechanisms. We compare our algorithms with those in the extant literature using extensive simulation studies based on large-scale voter persuasion experiments and the MNIST database. Our methods can perform an order of magnitude better than existing benchmarks while using a fraction of the data.
Submitted 23 August, 2018;
originally announced August 2018.
-
Inference on a New Class of Sample Average Treatment Effects
Authors:
Jasjeet S. Sekhon,
Yotam Shem-Tov
Abstract:
We derive new variance formulas for inference on a general class of estimands of causal average treatment effects in a Randomized Control Trial (RCT). We generalize Robins (1988) and show that when the estimand of interest is the Sample Average Treatment Effect of the Treated (SATT or SATC for controls), a consistent variance estimator exists. Although these estimands are equal to the Sample Average Treatment Effect (SATE) in expectation, potentially large differences in both accuracy and coverage can occur by the change of estimand, even asymptotically. Inference on the SATE, even using a conservative confidence interval, provides incorrect coverage of the SATT or SATC. We derive the variance and limiting distribution of a new and general class of estimands---any mixing between SATT and SATC---for which the SATE is a specific case. We demonstrate the applicability of the new theoretical results using Monte-Carlo simulations and an empirical application with hundreds of online experiments with an average sample size of approximately one hundred million observations per experiment. An R package, estCI, that implements all the proposed estimation procedures is available.
Submitted 18 October, 2017; v1 submitted 7 August, 2017;
originally announced August 2017.
-
Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning
Authors:
Sören R. Künzel,
Jasjeet S. Sekhon,
Peter J. Bickel,
Bin Yu
Abstract:
There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of meta-algorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the Conditional Average Treatment Effect (CATE) function. Meta-algorithms build on base algorithms---such as Random Forests (RF), Bayesian Additive Regression Trees (BART) or neural networks---to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new meta-algorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the meta-learners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our new X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.
Submitted 23 April, 2019; v1 submitted 12 June, 2017;
originally announced June 2017.
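The abstract names the X-learner without listing its steps. The sketch below follows the commonly described recipe (fit a response model per arm, impute individual effects by crossing the arms, regress the imputed effects, then blend the two CATE estimates) and should be read as an illustration rather than the authors' implementation. Assumptions made here: a single covariate, a least-squares line as a stand-in base learner (the abstract mentions RF and BART), and a constant blending weight `g`. The names `fit_line` and `x_learner` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Stand-in base learner: 1-D least-squares line, returned as a function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def x_learner(x, y, w, g=0.5):
    """Return a CATE estimate tau_hat(x) from covariates x, outcomes y and
    binary treatment indicators w (1 = treated, 0 = control)."""
    x0 = [xi for xi, wi in zip(x, w) if wi == 0]
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    x1 = [xi for xi, wi in zip(x, w) if wi == 1]
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]

    # Stage 1: response functions for control and treatment.
    mu0 = fit_line(x0, y0)
    mu1 = fit_line(x1, y1)

    # Stage 2: imputed individual effects, crossing the arms.
    d1 = [yi - mu0(xi) for xi, yi in zip(x1, y1)]  # treated units
    d0 = [mu1(xi) - yi for xi, yi in zip(x0, y0)]  # control units
    tau1 = fit_line(x1, d1)
    tau0 = fit_line(x0, d0)

    # Stage 3: weighted combination of the two CATE estimates.
    return lambda q: g * tau0(q) + (1.0 - g) * tau1(q)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5, 6, 7]
    ws = [0, 1, 0, 1, 0, 1, 0, 1]
    # Noise-free toy data with true effect 2 + 0.5 * x.
    ys = [xi + wi * (2 + 0.5 * xi) for xi, wi in zip(xs, ws)]
    tau_hat = x_learner(xs, ys, ws)
    print(tau_hat(4))
```

In practice the weight `g` is often taken to be an estimate of the propensity score, so that the estimate built from the larger arm's response model receives more weight; the constant used here is only for brevity.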
-
Worth Weighting? How to Think About and Use Weights in Survey Experiments
Authors:
Luke W. Miratrix,
Jasjeet S. Sekhon,
Alexander G. Theodoridis,
Luis F. Campos
Abstract:
The popularity of online surveys has increased the prominence of using weights that capture units' probabilities of inclusion for claims of representativeness. Yet, much uncertainty remains regarding how these weights should be employed in the analysis of survey experiments: Should they be used or ignored? If they are used, which estimators are preferred? We offer practical advice, rooted in the Neyman-Rubin model, for researchers producing and working with survey experimental data. We examine simple, efficient estimators for analyzing these data, and give formulae for their biases and variances. We provide simulations that examine these estimators as well as real examples from experiments administered online through YouGov. We find that for examining the existence of population treatment effects using high-quality, broadly representative samples recruited by top online survey firms, sample quantities, which do not rely on weights, are often sufficient. We found that Sample Average Treatment Effect (SATE) estimates did not appear to differ substantially from their weighted counterparts, and they avoided the substantial loss of statistical power that accompanies weighting. When precise estimates of Population Average Treatment Effects (PATE) are essential, we analytically show post-stratifying on survey weights and/or covariates highly correlated with the outcome to be a conservative choice. While we show these substantial gains in simulations, we find limited evidence of them in practice.
Submitted 15 August, 2017; v1 submitted 20 March, 2017;
originally announced March 2017.
-
Generalized full matching and extrapolation of the results from a large-scale voter mobilization experiment
Authors:
Fredrik Sävje,
Michael J. Higgins,
Jasjeet S. Sekhon
Abstract:
Matching is an important tool in causal inference. The method provides a conceptually straightforward way to make groups of units comparable on observed characteristics. The use of the method is, however, limited to situations where the study design is fairly simple and the sample is moderately sized. We illustrate the issue by revisiting a large-scale voter mobilization experiment that took place in Michigan for the 2006 election. We ask what the causal effects would have been if the treatments in the experiment were scaled up to the full population. Matching could help us answer this question, but no existing matching method can accommodate the six treatment arms and the 6,762,701 observations involved in the study. To offer a solution to this and similar empirical problems, we introduce a generalization of the full matching method and an associated algorithm. The method can be used with any number of treatment conditions, and it is shown to produce near-optimal matchings. The worst case maximum within-group dissimilarity is no worse than four times the optimal solution, and simulation results indicate that its performance is considerably closer to the optimal solution on average. Despite its performance, the algorithm is fast and uses little memory. It terminates, on average, in linearithmic time using linear space. This enables investigators to construct well-performing matchings within minutes even in complex studies with samples of several million units.
Submitted 16 June, 2019; v1 submitted 10 March, 2017;
originally announced March 2017.
-
Blocking estimators and inference under the Neyman-Rubin model
Authors:
Michael J. Higgins,
Fredrik Sävje,
Jasjeet S. Sekhon
Abstract:
We derive the variances of estimators for sample average treatment effects under the Neyman-Rubin potential outcomes model for arbitrary blocking assignments and an arbitrary number of treatments.
Submitted 5 October, 2015;
originally announced October 2015.
-
Direct evidence of strong local ferroelectric ordering in a thermoelectric semiconductor
Authors:
Leena Aggarwal,
Jagmeet S. Sekhon,
Satya N. Guin,
Ashima Arora,
Devendra S. Negi,
Ranjan Datta,
Kanishka Biswas,
Goutam Sheet
Abstract:
It is thought that the proposed new family of multi-functional materials, namely the ferroelectric thermoelectrics, may exhibit enhanced functionalities due to the coupling of the thermoelectric parameters with ferroelectric polarization in solids. Therefore, the ferroelectric thermoelectrics are expected to be of immense technological and fundamental significance. As a first step in this direction, it is most important to identify the existing high performance thermoelectric materials exhibiting ferroelectricity. Herein, through the direct measurement of local polarization switching, we show that the recently discovered thermoelectric semiconductor $AgSbSe_{2}$ has local ferroelectric ordering. Using piezo-response force microscopy, we demonstrate the existence of nanometer scale ferroelectric domains that can be switched by an external electric field. These observations are intriguing as $AgSbSe_{2}$ crystallizes in a cubic rock salt structure with centro-symmetric space group (Fm-3m) and therefore no ferroelectricity is expected. However, from high resolution transmission electron microscopy (HRTEM) measurements we found evidence of local superstructure formation which, we believe, leads to local distortion of the centro-symmetric arrangement in $AgSbSe_{2}$ and gives rise to the observed ferroelectricity. The stereochemically active $5s^{2}$ lone pair of Sb can also give rise to local structural distortion, which creates ferroelectricity in $AgSbSe_{2}$.
Submitted 12 May, 2014;
originally announced May 2014.
-
Controlling the LSPR properties of Au triangular nanoprisms and nanoboxes by geometrical parameter: a numerical investigation
Authors:
Jagmeet Singh Sekhon,
S S Verma
Abstract:
We have simulated the extinction spectra of Au triangular nanoprisms and nanoboxes by the finite difference time domain (FDTD) method. It is found that the refractive index sensitivity increases linearly and near exponentially as the aspect ratio of nanoprisms increases and the wall thickness of nanoboxes decreases. Sensing figure of merit (FOM) calculations show that there is an optimum wall thickness for each edge length and height of the box, which makes them promising candidates for effective sensing applications. We have also shown that the higher FOM in triangular nanoboxes compared to the cubic nanoboxes and other solid structures is inherent in the shape of the nanoparticles.
Submitted 9 May, 2014;
originally announced May 2014.
-
Observation of hysteretic phase-switching in silicon by piezoresponse force microscopy
Authors:
Jagmeet S. Sekhon,
Leena Aggarwal,
Goutam Sheet
Abstract:
We report the observation of $180^\circ$ phase switching on silicon wafers by piezo-response force microscopy (PFM). The switching is hysteretic and shows remarkable similarities with polarization switching in ferroelectrics. This is always accompanied by a hysteretic amplitude vs. voltage curve which resembles the "butterfly loops" for piezoelectric materials. From a detailed analysis of the data obtained under different environmental and experimental conditions, we show that the hysteresis effects in phase and amplitude do not originate from ferroelectricity or piezoelectricity. This further indicates that mere observation of hysteresis effects in PFM does not confirm the existence of ferroelectric and/or piezoelectric ordering in materials. We also show that when samples are mounted on silicon for PFM measurements, the switching properties of silicon may appear on the sample even if the sample thickness is large.
Submitted 11 January, 2014;
originally announced January 2014.