arXiv:2502.20097 [pdf, other]

Qini curve estimation under clustered network interference

Authors: Rickard K. A. Karlsson, Bram van den Akker, Felipe Moraes, Hugo M. Proença, Jesse H. Krijthe

Abstract: Qini curves are a widely used tool for assessing treatment policies under allocation constraints as they visualize the incremental gain of a new treatment policy versus the cost of its implementation. Standard Qini curve estimation assumes no interference between units: that is, that treating one unit does not influence the outcome of any other unit. In many real-life applications such as public policy or marketing, however, the presence of interference is common. Ignoring interference in these scenarios can lead to systematically biased Qini curves that over- or under-estimate a treatment policy's cost-effectiveness. In this paper, we address the problem of Qini curve estimation under clustered network interference, where interfering units form independent clusters. We propose a formal description of the problem setting with an experimental study design under which we can account for clustered network interference. Within this framework, we introduce three different estimation strategies suited for different conditions. Moreover, we introduce a marketplace simulator that emulates clustered network interference in a typical e-commerce setting. From both theoretical and empirical insights, we provide recommendations in choosing the best estimation strategy by identifying an inherent bias-variance trade-off among the estimation strategies.

Submitted 27 February, 2025; originally announced February 2025.
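
As a point of reference for the problem this paper addresses, the standard no-interference Qini curve can be sketched as below. This is a minimal Python/NumPy sketch assuming randomized treatment assignment and the common formulation in which, among the top-k units ranked by predicted uplift, the treated outcome sum is compared with the control outcome sum rescaled to the treated group size. The function name `qini_curve` is hypothetical, and none of the paper's three clustered-interference estimators is implemented here.

```python
import numpy as np

def qini_curve(score, treated, outcome):
    """Standard (no-interference) Qini curve from randomized data.

    score   : predicted uplift per unit; higher means targeted earlier
    treated : 0/1 treatment indicator from the experiment
    outcome : observed outcome per unit
    Returns (fraction_targeted, incremental_gain), both of length n + 1.
    """
    score = np.asarray(score, dtype=float)
    treated = np.asarray(treated, dtype=int)
    outcome = np.asarray(outcome, dtype=float)
    order = np.argsort(-score, kind="stable")  # target highest scores first
    t = treated[order]
    y = outcome[order]

    # Cumulative counts and outcome sums among the top-k units, k = 1..n.
    n_t = np.cumsum(t)
    n_c = np.cumsum(1 - t)
    y_t = np.cumsum(y * t)
    y_c = np.cumsum(y * (1 - t))

    # Rescale the control sum to the treated group size (0 while no controls seen).
    ratio = np.divide(n_t, n_c, out=np.zeros_like(y_t), where=n_c > 0)
    gain = y_t - y_c * ratio

    n = len(score)
    frac = np.arange(n + 1) / n
    return frac, np.concatenate([[0.0], gain])
```

Under clustered interference a unit's outcome also depends on the treatment of other units in its cluster, which is why this per-unit comparison can become biased.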

arXiv:2502.06231 [pdf, ps, other]

Falsification of Unconfoundedness by Testing Independence of Causal Mechanisms

Authors: Rickard K. A. Karlsson, Jesse H. Krijthe

Abstract: A major challenge in estimating treatment effects in observational studies is the reliance on untestable conditions such as the assumption of no unmeasured confounding. In this work, we propose an algorithm that can falsify the assumption of no unmeasured confounding in a setting with observational data from multiple heterogeneous sources, which we refer to as environments. Our proposed falsification strategy leverages a key observation that unmeasured confounding can cause observed causal mechanisms to appear dependent. Building on this observation, we develop a novel two-stage procedure that detects these dependencies with high statistical power while controlling false positives. The algorithm does not require access to randomized data and, in contrast to other falsification approaches, functions even under transportability violations when the environment has a direct effect on the outcome of interest. To showcase the practical relevance of our approach, we show that our method is able to efficiently detect confounding on both simulated and semi-synthetic data.

Submitted 2 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

Comments: ICML 2025 camera-ready version; 20 pages, including 5 figures, 2 tables, and appendices

MSC Class: 62F03 (Primary) 68T01 (Secondary)

arXiv:2501.11449 [pdf, other]

Irregular measurement times in estimating time-varying treatment effects: Categorizing biases and comparing adjustment methods

Authors: Wouter M. R. Kant, Jesse H. Krijthe

Abstract: To estimate the causal effect of treatments that vary over time from observational data, one must adjust for time-varying confounding. A common procedure to address confounding is the use of inverse probability of treatment weighting methods. However, the timing of covariate measurements is often irregular, which may introduce additional confounding bias as well as selection bias into the causal effect estimate. Two reweighting methods have been proposed to adjust for these biases: time-as-confounder and reweighting by measurement time. However, it is currently not well understood in which situations these irregularly timed measurements induce bias, and how the available reweighting methods compare to each other in different situations. In this work, we provide a complete inventory of all possible backdoor paths through which bias is induced. Based on these paths, we distinguish three categories of confounding bias by measurement time: direct confounding (DC), confounding through measured variables (CMV), and confounding through unmeasured variables (CUV). These categories differ in the assumptions and reweighting methods necessary to adjust for bias and may occur simultaneously with selection bias. Through simulation studies, we illustrate: 1. Reweighting by measurement time may be used to adjust for selection bias and confounding through measured variables; 2. Time-as-confounder may be used to adjust for all categories of confounding bias, but not selection bias; 3. In some cases, a combination of both techniques may be used to adjust for both confounding and selection bias. We finally apply the categorization and reweighting methods to the pre-DIVA data set. Adjusting for measurement times is crucial in order to avoid bias, and the categorization of biases and techniques that we introduce may help researchers to choose the appropriate analysis.

Submitted 20 January, 2025; originally announced January 2025.

Comments: 20 pages, 10 figures
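
The inverse probability of treatment weighting step the abstract refers to can be sketched for a time-varying binary treatment as below: each individual's weight is the inverse of the product, over time points, of the probability of the treatment actually received given the past. This is a generic Python sketch; the propensities are assumed to come from separately fitted models, `iptw_weights` is a hypothetical name, and neither time-as-confounder nor reweighting by measurement time is implemented.

```python
import numpy as np

def iptw_weights(treat, propensity):
    """Cumulative inverse probability of treatment weights (unstabilized).

    treat      : (n, T) array of observed 0/1 treatments A_1..A_T per individual
    propensity : (n, T) array of P(A_t = 1 | treatment and covariate history),
                 assumed to come from separately fitted models
    Returns one weight per individual: 1 / prod_t P(A_t = a_t | history).
    """
    treat = np.asarray(treat)
    propensity = np.asarray(propensity, dtype=float)
    # Probability of the treatment actually received at each time point.
    p_obs = np.where(treat == 1, propensity, 1.0 - propensity)
    return 1.0 / np.prod(p_obs, axis=1)
```

If the covariate history feeding these propensities is only recorded at irregular times, the weights inherit whatever bias that timing introduces, which is the issue the paper categorizes.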

arXiv:2406.17971 [pdf, other]

Robust integration of external control data in randomized trials

Authors: Rickard Karlsson, Guanbo Wang, Piersilvio De Bartolomeis, Jesse H. Krijthe, Issa J. Dahabreh

Abstract: One approach for increasing the efficiency of randomized trials is the use of "external controls" -- individuals who received the control treatment studied in the trial during routine practice or in prior experimental studies. Existing external control methods, however, can be biased if the populations underlying the trial and the external control data are not exchangeable. Here, we characterize a randomization-aware class of treatment effect estimators in the population underlying the trial that remain consistent and asymptotically normal when using external control data, even when exchangeability does not hold. We consider two members of this class of estimators: the well-known augmented inverse probability weighting trial-only estimator, which is the efficient estimator when only trial data are used; and a potentially more efficient member of the class when exchangeability holds and external control data are available, which we refer to as the optimized randomization-aware estimator. To achieve robust integration of external control data in trial analyses, we then propose a combined estimator based on the efficient trial-only estimator and the optimized randomization-aware estimator. We show that the combined estimator is consistent and no less efficient than the most efficient of the two component estimators, whether the exchangeability assumption holds or not. We examine the estimators' performance in simulations and we illustrate their use with data from two trials of paliperidone extended-release for schizophrenia.

Submitted 25 March, 2025; v1 submitted 25 June, 2024; originally announced June 2024.
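
The augmented inverse probability weighting (AIPW) trial-only estimator mentioned in the abstract combines outcome-model predictions with inverse-probability-weighted residuals. The sketch below assumes a binary randomized treatment, linear outcome models fitted by ordinary least squares within each arm, and a hypothetical function name `aipw_trial_only`; it covers only the trial-only estimator, not the optimized randomization-aware or combined estimators.

```python
import numpy as np

def aipw_trial_only(X, A, Y, p_treat=None):
    """AIPW estimate of the average treatment effect using trial data only.

    X : (n, d) baseline covariates; A : 0/1 randomized arm; Y : outcomes.
    p_treat : randomization probability; estimated as mean(A) if not given.
    Outcome models are OLS fits within each arm (an illustrative assumption).
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=int)
    Y = np.asarray(Y, dtype=float)
    Z = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    e = A.mean() if p_treat is None else p_treat

    def fit_predict(mask):
        beta, *_ = np.linalg.lstsq(Z[mask], Y[mask], rcond=None)
        return Z @ beta

    m1 = fit_predict(A == 1)  # predicted outcome under treatment, for everyone
    m0 = fit_predict(A == 0)  # predicted outcome under control, for everyone
    psi = (m1 - m0
           + A * (Y - m1) / e
           - (1 - A) * (Y - m0) / (1 - e))
    return psi.mean()
```

One design note: with OLS outcome models that include an intercept, the residuals within each arm sum to zero, so the augmentation terms cancel and the estimate coincides with the outcome-regression estimator; the augmentation matters once other outcome models are used.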

arXiv:2402.17366 [pdf]

The risks of risk assessment: causal blind spots when using prediction models for treatment decisions

Authors: Nan van Geloven, Ruth H Keogh, Wouter van Amsterdam, Giovanni Cinà, Jesse H. Krijthe, Niels Peek, Kim Luijken, Sara Magliacane, Paweł Morzywołek, Thijs van Ommen, Hein Putter, Matthew Sperrin, Junfeng Wang, Daniala L. Weir, Vanessa Didelez

Abstract: Prediction models are increasingly proposed for guiding treatment decisions, but most fail to address the special role of treatments, leading to inappropriate use. This paper highlights the limitations of using standard prediction models for treatment decision support. We identify 'causal blind spots' in three common approaches to handling treatments in prediction modelling: including treatment as a predictor, restricting data based on treatment status and ignoring treatments. When predictions are used to inform treatment decisions, confounders, colliders and mediators, as well as changes in treatment protocols over time may lead to misinformed decision-making. We illustrate potential harmful consequences in several medical applications. We advocate for an extension of guidelines for development, reporting and evaluation of prediction models to ensure that the intended use of the model is matched to an appropriate risk estimand. When prediction models are intended to inform treatment decisions, prediction models should specify upfront the treatment decisions they aim to support and target a prediction estimand in line with that goal. This requires a shift towards developing predictions under the specific treatment options under consideration ('predictions under interventions'). Predictions under interventions need causal reasoning and inference techniques during development and validation. We argue that this will improve the efficacy of prediction models in guiding treatment decisions and prevent potential negative effects on patient outcomes.

Submitted 6 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2312.01210 [pdf, other]

When accurate prediction models yield harmful self-fulfilling prophecies

Authors: Wouter A. C. van Amsterdam, Nan van Geloven, Jesse H. Krijthe, Rajesh Ranganath, Giovanni Cinà

Abstract: Prediction models are popular in medical research and practice. By predicting an outcome of interest for specific patients, these models may help inform difficult treatment decisions, and are often hailed as the poster children for personalized, data-driven healthcare. We show, however, that using prediction models for decision making can lead to harmful decisions, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients but the worse outcome of these patients does not invalidate the predictive power of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision making, as they make no change in the data distribution. These results point to the need to revise standard practices for validation, deployment and evaluation of prediction models that are used in medical decisions.

Submitted 26 August, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

arXiv:2205.13935 [pdf, other]

Detecting hidden confounding in observational data using multiple environments

Authors: Rickard K. A. Karlsson, Jesse H. Krijthe

Abstract: A common assumption in causal inference from observational data is that there is no hidden confounding. Yet it is, in general, impossible to verify this assumption from a single dataset. Under the assumption of independent causal mechanisms underlying the data-generating process, we demonstrate a way to detect unobserved confounders when having multiple observational datasets coming from different environments. We present a theory for testable conditional independencies that are only absent when there is hidden confounding and examine cases where we violate its assumptions: degenerate & dependent mechanisms, and faithfulness violations. Additionally, we propose a procedure to test these independencies and study its empirical finite-sample behavior using simulation studies and semi-synthetic data based on a real-world dataset. In most cases, the proposed procedure correctly predicts the presence of hidden confounding, particularly when the confounding bias is large.

Submitted 3 November, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: NeurIPS 2023 camera-ready version; 30 pages including references and appendix

arXiv:2012.01172 [pdf, other]

doi 10.1007/978-3-030-76423-4_1

ReproducedPapers.org: Openly teaching and structuring machine learning reproducibility

Authors: Burak Yildiz, Hayley Hung, Jesse H. Krijthe, Cynthia C. S. Liem, Marco Loog, Gosia Migut, Frans Oliehoek, Annibale Panichella, Przemyslaw Pawelczak, Stjepan Picek, Mathijs de Weerdt, Jan van Gemert

Abstract: We present ReproducedPapers.org: an open online repository for teaching and structuring machine learning reproducibility. We evaluate doing a reproduction project among students and the added value of an online reproduction repository among AI researchers. We used anonymous self-assessment surveys and obtained 144 responses. Results suggest that students who do a reproduction project place more value on scientific reproductions and become more critical thinkers. Students and AI researchers agree that our online reproduction repository is valuable.

Submitted 1 December, 2020; originally announced December 2020.

Comments: Accepted to RRPR 2020: Third Workshop on Reproducible Research in Pattern Recognition

arXiv:2004.04328 [pdf, ps, other]

doi 10.1073/pnas.2001875117

A Brief Prehistory of Double Descent

Authors: Marco Loog, Tom Viering, Alexander Mey, Jesse H. Krijthe, David M. J. Tax

Abstract: In their thought-provoking paper [1], Belkin et al. illustrate and discuss the shape of risk curves in the context of modern high-complexity learners. Given a fixed training sample size $n$, such curves show the risk of a learner as a function of some (approximate) measure of its complexity $N$. With $N$ the number of features, these curves are also referred to as feature curves. A salient observation in [1] is that these curves can display, what they call, double descent: with increasing $N$, the risk initially decreases, attains a minimum, and then increases until $N$ equals $n$, where the training data is fitted perfectly. Increasing $N$ even further, the risk decreases a second and final time, creating a peak at $N=n$. This twofold descent may come as a surprise, but as opposed to what [1] reports, it has not been overlooked historically. Our letter draws attention to some original, earlier findings, of interest to contemporary machine learning.

Submitted 7 April, 2020; originally announced April 2020.
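
To make the feature-curve terminology concrete, the sketch below computes the average risk of minimum-norm least squares regression as the number of features N grows while the training size n stays fixed. The setting (isotropic Gaussian features, a true coefficient vector spread evenly over d = 60 features, n = 20, noise level 0.5) and the function name `feature_curve` are illustrative assumptions, not taken from the letter; in such settings the curve typically peaks near N = n.

```python
import numpy as np

def feature_curve(n=20, d=60, noise=0.5, trials=20, seed=0):
    """Average risk of min-norm least squares versus number of features N.

    Assumes isotropic Gaussian features and a true linear model spread
    evenly over all d features. For such features the expected squared
    test error of an estimate b equals ||b - beta||^2 + noise^2, so no
    separate test set is needed.
    """
    rng = np.random.default_rng(seed)
    beta = np.full(d, 1 / np.sqrt(d))  # ||beta|| = 1
    risks = np.zeros(d)
    for _ in range(trials):
        X = rng.normal(size=(n, d))
        y = X @ beta + noise * rng.normal(size=n)
        for N in range(1, d + 1):
            b = np.zeros(d)
            # pinv gives ordinary least squares for N < n and the
            # minimum-norm interpolating solution for N >= n.
            b[:N] = np.linalg.pinv(X[:, :N]) @ y
            risks[N - 1] += np.sum((b - beta) ** 2) + noise ** 2
    return risks / trials  # risks[N - 1] is the risk using the first N features
```

Plotting `feature_curve()` against N = 1..60 should show the rise towards N = n = 20 and the second descent beyond it.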

arXiv:1710.06514 [pdf, ps, other]

Robust importance-weighted cross-validation under sample selection bias

Authors: Wouter M. Kouw, Jesse H. Krijthe, Marco Loog

Abstract: Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increase its robustness to problematically large weights.

Submitted 27 August, 2019; v1 submitted 17 October, 2017; originally announced October 2017.

Comments: 6 pages, 8 figures, Accepted to the IEEE International Workshop on Machine Learning for Signal Processing 2019
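
The control-variate idea in the abstract can be sketched as follows. This generic Python sketch uses the importance weights themselves as the control variate, relying on the fact that true importance weights have expectation 1 under the source (training) distribution; the function names are hypothetical and the paper's exact estimator may differ.

```python
import numpy as np

def iw_risk(losses, weights):
    """Plain importance-weighted empirical risk: (1/n) * sum_i w_i * l_i."""
    return np.mean(np.asarray(weights, dtype=float) * np.asarray(losses, dtype=float))

def iw_risk_cv(losses, weights):
    """Importance-weighted risk with the weights as a control variate.

    Since E[w] = 1 under the source distribution, subtracting
    beta * (mean(w) - 1) leaves the expectation unchanged while reducing
    variance when w * l and w are correlated. beta is the
    variance-minimizing coefficient estimated from the same sample.
    """
    w = np.asarray(weights, dtype=float)
    wl = w * np.asarray(losses, dtype=float)
    var_w = np.var(w)
    beta = np.cov(wl, w, bias=True)[0, 1] / var_w if var_w > 0 else 0.0
    return wl.mean() - beta * (w.mean() - 1.0)
```

For example, with a constant loss the corrected estimator returns that constant exactly, whereas the plain estimator is scaled by the sample mean of the weights, which is where a single large weight does the most damage.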

arXiv:1707.04025 [pdf, other]

On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL

Authors: Marco Loog, Jesse H. Krijthe, Are C. Jensen

Abstract: In various approaches to learning, notably in domain adaptation, active learning, learning under covariate shift, semi-supervised learning, learning with concept drift, and the like, one often wants to compare a baseline classifier to one or more advanced (or at least different) strategies. In this chapter, we basically argue that if such classifiers, in their respective training phases, optimize a so-called surrogate loss, it may also be valuable to compare the behavior of this loss on the test set, next to the regular classification error rate. It can provide us with an additional view on the classifiers' relative performances that error rates cannot capture. As an example, limited but convincing empirical results demonstrate that we may be able to find semi-supervised learning strategies that can guarantee performance improvements with increasing numbers of unlabeled data in terms of log-likelihood. In contrast, the latter may be impossible to guarantee for the classification error rate.

Submitted 13 July, 2017; originally announced July 2017.

Journal ref: In Handbook of Pattern Recognition and Computer Vision (pp. 53-68) (2016)

arXiv:1706.02645 [pdf, other]

Nuclear Discrepancy for Active Learning

Authors: Tom J. Viering, Jesse H. Krijthe, Marco Loog

Abstract: Active learning algorithms propose which unlabeled objects should be queried for their labels to improve a predictive model the most. We study active learners that minimize generalization bounds and uncover relationships between these bounds that lead to an improved approach to active learning. In particular we show the relation between the bound of the state-of-the-art Maximum Mean Discrepancy (MMD) active learner, the bound of the Discrepancy, and a new and looser bound that we refer to as the Nuclear Discrepancy bound. We motivate this bound by a probabilistic argument: we show it considers situations which are more likely to occur. Our experiments indicate that active learning using the tightest Discrepancy bound performs the worst in terms of the squared loss. Overall, our proposed loosest Nuclear Discrepancy generalization bound performs the best. We confirm our probabilistic argument empirically: the other bounds focus on more pessimistic scenarios that are rarer in practice. We conclude that tightness of bounds is not always of main importance and that active learning methods should concentrate on realistic scenarios in order to improve performance.

Submitted 8 June, 2017; originally announced June 2017.

Comments: 32 pages, 5 figures, 4 tables

arXiv:1612.08875 [pdf, other]

The Pessimistic Limits and Possibilities of Margin-based Losses in Semi-supervised Learning

Authors: Jesse H. Krijthe, Marco Loog

Abstract: Consider a classification problem where we have both labeled and unlabeled data available. We show that for linear classifiers defined by convex margin-based surrogate losses that are decreasing, it is impossible to construct any semi-supervised approach that is able to guarantee an improvement over the supervised classifier measured by this surrogate loss on the labeled and unlabeled data. For convex margin-based loss functions that also increase, we demonstrate safe improvements are possible.

Submitted 8 January, 2019; v1 submitted 28 December, 2016; originally announced December 2016.

Comments: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, Canada

arXiv:1612.08650 [pdf, other]

Reproducible Pattern Recognition Research: The Case of Optimistic SSL

Authors: Jesse H. Krijthe, Marco Loog

Abstract: In this paper, we discuss the approaches we took and trade-offs involved in making a paper on a conceptual topic in pattern recognition research fully reproducible. We discuss our definition of reproducibility, the tools used, how the analysis was set up, show some examples of alternative analyses the code enables and discuss our views on reproducibility.

Submitted 27 December, 2016; originally announced December 2016.

Comments: Presented at RRPR 2016: 1st Workshop on Reproducible Research in Pattern Recognition

arXiv:1612.07993 [pdf, other]

RSSL: Semi-supervised Learning in R

Authors: Jesse H. Krijthe

Abstract: In this paper, we introduce a package for semi-supervised learning research in the R programming language called RSSL. We cover the purpose of the package, the methods it includes and comment on their use and implementation. We then show, using several code examples, how the package can be used to replicate well-known results from the semi-supervised learning literature.

Submitted 23 December, 2016; originally announced December 2016.

Comments: Presented at RRPR 2016: 1st Workshop on Reproducible Research in Pattern Recognition

arXiv:1610.05160 [pdf, other]

The Peaking Phenomenon in Semi-supervised Learning

Authors: Jesse H. Krijthe, Marco Loog

Abstract: For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This, possibly counterintuitive, phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-supervised setting, where instead of labeled objects, unlabeled objects are added to the training set. We explain why the learning curve has a steeper incline and a more gradual decline in this setting through simulation studies and by applying an approximation of the learning curve based on the work by Raudys & Duin.

Submitted 17 October, 2016; originally announced October 2016.

Comments: 11 pages, 5 figures. S+SSPR 2016, Mérida, Mexico

arXiv:1610.03713 [pdf, other]

Optimistic Semi-supervised Least Squares Classification

Authors: Jesse H. Krijthe, Marco Loog

Abstract: The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares classifier. We show that a soft-label and a hard-label variant of self-learning can be derived by applying block coordinate descent to two related but slightly different objective functions. The resulting soft-label approach is related to an idea about dealing with missing data that dates back to the 1930s. We show that the soft-label variant typically outperforms the hard-label variant on benchmark datasets and partially explain this behaviour by studying the relative difficulty of finding good local minima for the corresponding objective functions.

Submitted 12 October, 2016; originally announced October 2016.

Comments: 6 pages, 6 figures. International Conference on Pattern Recognition (ICPR) 2016, Cancun, Mexico
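
The block coordinate descent view of self-learning described in the abstract can be sketched as below. The assumed objective is the squared loss on the labeled points plus the squared loss on the unlabeled points against imputed labels q, minimized alternately over the weights and over q; restricting q to [0, 1] gives a soft-label variant and to {0, 1} a hard-label variant. The 0/1 label encoding, the intercept column, and the name `self_learning_ls` are assumptions for illustration, not the paper's code.

```python
import numpy as np

def _with_intercept(A):
    A = np.asarray(A, dtype=float)
    return np.column_stack([np.ones(len(A)), A])

def self_learning_ls(X_lab, y_lab, X_unl, soft=True, n_iter=100):
    """Self-learning least squares classifier via block coordinate descent.

    Objective: sum_lab (x w - y)^2 + sum_unl (x w - q)^2 over weights w and
    imputed labels q, with q in [0, 1] (soft) or q in {0, 1} (hard).
    """
    Xl, Xu = _with_intercept(X_lab), _with_intercept(X_unl)
    P_all = np.linalg.pinv(np.vstack([Xl, Xu]))  # reused in every LS step
    y = np.asarray(y_lab, dtype=float)
    w = np.linalg.pinv(Xl) @ y  # supervised starting point
    for _ in range(n_iter):
        pred = Xu @ w
        # Block 1: best labels for fixed w (projection onto [0,1] or {0,1}).
        q = np.clip(pred, 0.0, 1.0) if soft else (pred >= 0.5).astype(float)
        # Block 2: best weights for fixed labels (ordinary least squares).
        w_new = P_all @ np.concatenate([y, q])
        converged = np.allclose(w_new, w)
        w = w_new
        if converged:
            break
    return w
```

Each block step can only lower the objective, so both variants converge to a local minimum; the soft-label objective is the smoother of the two, consistent with the abstract's remark about finding good local minima.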

arXiv:1602.07865 [pdf, other]

Projected Estimators for Robust Semi-supervised Classification

Authors: Jesse H. Krijthe, Marco Loog

Abstract: For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. Unlike other approaches to semi-supervised learning, the procedure does not rely on assumptions that are not intrinsic to the classifier at hand. It is theoretically demonstrated that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a higher quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over the supervised solution. The characteristics of our approach are explicated using benchmark datasets to further understand the similarities and differences between the quadratic loss criterion used in the theoretical results and the classification accuracy often considered in practice.

Submitted 25 February, 2016; originally announced February 2016.

Comments: 13 pages, 2 figures, 1 table

arXiv:1512.08240 [pdf, other]

doi 10.1016/j.patcog.2016.09.009

Robust Semi-supervised Least Squares Classification by Implicit Constraints

Authors: Jesse H. Krijthe, Marco Loog

Abstract: We introduce the implicitly constrained least squares (ICLS) classifier, a novel semi-supervised version of the least squares classifier. This classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, this approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. This method can be formulated as a quadratic programming problem and its solution can be found using a simple gradient descent procedure. We prove that, in a limited 1-dimensional setting, this approach never leads to performance worse than the supervised classifier. Experimental results show that also in the general multidimensional case performance improvements can be expected, both in terms of the squared loss that is intrinsic to the classifier, as well as in terms of the expected classification error.

Submitted 27 January, 2017; v1 submitted 27 December, 2015; originally announced December 2015.

Comments: Appeared as Pattern Recognition Volume 63, March 2017, Pages 115-126. This version of the manuscript fixes some typos in the equations on page 9 that are incorrect in the published version

Journal ref: Pattern Recognition Volume 63, March 2017, Pages 115-126
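The gradient descent procedure mentioned in the ICLS abstract can be sketched for a single feature with an intercept. In this hypothetical illustration (all function and variable names are our own), labels are assumed to be in {0, 1} and the soft labels u of the unlabeled points are the optimization variables: each u determines a least squares fit on labeled plus unlabeled data, and projected gradient descent on u in [0, 1]^U minimizes that fit's squared loss on the labeled data alone. The fixed step size 1 / (2 ||A||_F^2) is an assumption chosen so that no step increases the objective; the authors' own optimizer settings may differ.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def icls_1d(x_lab, y_lab, x_unl, iters=2000):
    """Hypothetical ICLS sketch with labels in {0, 1} and soft labels in [0, 1].

    Returns (weights [intercept, slope], soft labels u, labeled squared loss).
    """
    lab = [[1.0, x] for x in x_lab]  # design rows with an intercept column
    unl = [[1.0, x] for x in x_unl]
    # M = X_e^T X_e over labeled and unlabeled rows together
    m_inv = inv2([[sum(r[i] * r[j] for r in lab + unl) for j in range(2)]
                  for i in range(2)])
    xty = [sum(r[i] * y for r, y in zip(lab, y_lab)) for i in range(2)]

    def weights(u):
        # least squares fit on all data when the unlabeled points carry labels u
        rhs = [xty[i] + sum(q[i] * uj for q, uj in zip(unl, u)) for i in range(2)]
        return matvec(m_inv, rhs)

    def labeled_loss(w):
        return sum((dot(r, w) - y) ** 2 for r, y in zip(lab, y_lab))

    # A[i][j] = r_i^T M^{-1} q_j measures how labeled prediction i moves with u_j.
    # The objective's Hessian is 2 A^T A, so 1 / (2 ||A||_F^2) is a safe step.
    frob = sum(dot(r, matvec(m_inv, q)) ** 2 for r in lab for q in unl)
    step = 1.0 / (2.0 * frob) if frob > 0 else 0.0

    u = [0.5] * len(unl)
    for _ in range(iters):
        w = weights(u)
        res = [dot(r, w) - y for r, y in zip(lab, y_lab)]
        # g = M^{-1} X_l^T res, so dL/du_j = 2 * q_j . g (M^{-1} is symmetric)
        g = matvec(m_inv, [sum(r[i] * e for r, e in zip(lab, res)) for i in range(2)])
        # gradient step on the soft labels, then projection back onto [0, 1]
        u = [min(1.0, max(0.0, uj - step * 2.0 * dot(q, g))) for uj, q in zip(u, unl)]
    w = weights(u)
    return w, u, labeled_loss(w)
```

Because the supervised least squares fit is the unconstrained minimizer of the labeled loss, the loss returned here can never fall below the supervised one; the unlabeled data only restrict which parameters are reachable, and the objective is a convex quadratic in u over a box, consistent with the quadratic programming formulation in the abstract.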

arXiv:1512.04829 [pdf, other]

Feature-Level Domain Adaptation

Authors: Wouter M. Kouw, Jesse H. Krijthe, Marco Loog, Laurens J. P. van der Maaten

Abstract: Domain adaptation is the supervised learning setting in which the training and test data are sampled from different distributions: training data is sampled from a source domain, whilst test data is sampled from a target domain. This paper proposes and studies an approach, called feature-level domain adaptation (FLDA), that models the dependence between the two domains by means of a feature-level transfer model that is trained to describe the transfer from source to target domain. Subsequently, we train a domain-adapted classifier by minimizing the expected loss under the resulting transfer model. For linear classifiers and a large family of loss functions and transfer models, this expected loss can be computed or approximated analytically, and minimized efficiently. Our empirical evaluation of FLDA focuses on problems comprising binary and count data in which the transfer can be naturally modeled via a dropout distribution, which allows the classifier to adapt to differences in the marginal probability of features in the source and the target domain. Our experiments on several real-world problems show that FLDA performs on par with state-of-the-art domain-adaptation techniques.

Submitted 7 June, 2016; v1 submitted 15 December, 2015; originally announced December 2015.

Comments: 32 pages, 13 figures, 9 tables

Journal ref: JMLR 17:171 (2016) 1-32
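The FLDA abstract states that, for linear classifiers, the expected loss under the transfer model can be computed analytically. For the squared loss this can be checked directly. The sketch assumes an unbiased dropout transfer model in which feature d of a source sample is kept with probability 1 - theta_d and rescaled by 1 / (1 - theta_d), and set to zero otherwise; this parameterization and the function names are assumptions for illustration and need not match FLDA's exact formulation. Under this model, E[(w^T x_tilde - y)^2] = (w^T x - y)^2 + sum_d w_d^2 x_d^2 theta_d / (1 - theta_d), and the closed form is compared against brute-force enumeration of all dropout masks.

```python
from itertools import product


def expected_sq_loss_dropout(w, x, y, theta):
    """Closed-form E[(w . x_tilde - y)^2] under the assumed unbiased dropout model:
    x_tilde_d = x_d * z_d / (1 - theta_d), z_d ~ Bernoulli(1 - theta_d), independent.
    Requires 0 <= theta_d < 1 for every feature."""
    mean_pred = sum(wd * xd for wd, xd in zip(w, x))  # E[w . x_tilde] = w . x
    var_pred = sum(wd * wd * xd * xd * t / (1.0 - t)  # variance of each rescaled feature
                   for wd, xd, t in zip(w, x, theta))
    return (mean_pred - y) ** 2 + var_pred


def expected_sq_loss_enumerated(w, x, y, theta):
    """Brute-force reference: average the loss over all 2^D dropout masks."""
    total = 0.0
    for mask in product((0, 1), repeat=len(x)):
        prob, pred = 1.0, 0.0
        for keep, wd, xd, t in zip(mask, w, x, theta):
            prob *= (1.0 - t) if keep else t
            if keep:
                pred += wd * xd / (1.0 - t)
        total += prob * (pred - y) ** 2
    return total
```

The extra variance term acts like a data-dependent diagonal ridge penalty, so under these assumptions the summed expected loss over the source samples stays quadratic in w and can be minimized in closed form.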

arXiv:1507.06802 [pdf, other]

Implicitly Constrained Semi-Supervised Least Squares Classification

Authors: Jesse H. Krijthe, Marco Loog

Abstract: We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. We show this approach can be formulated as a quadratic programming problem and its solution can be found using a simple gradient descent procedure. We prove that, in a certain way, our method never leads to performance worse than the supervised classifier. Experimental results corroborate this theoretical result in the multidimensional case on benchmark datasets, also in terms of the error rate.

Submitted 24 July, 2015; originally announced July 2015.

Comments: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium on Intelligent Data Analysis (2015), Saint-Etienne, France

arXiv:1411.4521 [pdf, other]

Implicitly Constrained Semi-Supervised Linear Discriminant Analysis

Authors: Jesse H. Krijthe, Marco Loog

Abstract: Semi-supervised learning is an important and active topic of research in pattern recognition. For classification using linear discriminant analysis specifically, several semi-supervised variants have been proposed. However, none of these methods is guaranteed to outperform the supervised classifier, which does not take the additional unlabeled data into account. In this work we compare traditional Expectation Maximization-type approaches for semi-supervised linear discriminant analysis with approaches based on intrinsic constraints and propose a new principled approach for semi-supervised linear discriminant analysis, using so-called implicit constraints. We explore the relationships between these methods and consider whether, and in what sense, we can expect improvement in performance over the supervised procedure. The constraint-based approaches are more robust to misspecification of the model, and may outperform alternatives that make more assumptions on the data, in terms of the log-likelihood of unseen objects.

Submitted 17 November, 2014; originally announced November 2014.

Comments: 6 pages, 3 figures and 3 tables. International Conference on Pattern Recognition (ICPR) 2014, Stockholm, Sweden

Showing 1–22 of 22 results for author: Krijthe, J H