Showing 1–29 of 29 results for author: Karvanen, J

  1. arXiv:2505.15215

    stat.ML cs.LG stat.ME

    Clustering and Pruning in Causal Data Fusion

    Authors: Otto Tabell, Santtu Tikka, Juha Karvanen

    Abstract: Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. H…

    Submitted 21 May, 2025; originally announced May 2025.

  2. arXiv:2411.03848

    stat.ME

    Monotone Missing Data: A Blessing and a Curse

    Authors: Santtu Tikka, Juha Karvanen

    Abstract: Monotone missingness is commonly encountered in practice where a missing measurement compels another measurement to be missing. In graphical missing data models, monotonicity has implications for the identifiability of the full law, i.e., the joint distribution of actual variables and response indicators. In the general nonmonotone case, the full law is known to be nonparametrically identifiable i…

    Submitted 6 November, 2024; originally announced November 2024.

  3. arXiv:2403.02245

    stat.ME

    Dynamic programming principle in cost-efficient sequential design: application to switching measurements

    Authors: Jeongmin Han, Juha Karvanen, Mikko Parviainen

    Abstract: We study sequential cost-efficient design in a situation where each update of covariates involves a fixed time cost typically considerable compared to a single measurement time. The problem arises from parameter estimation in switching measurements on superconducting Josephson junctions which are components needed in quantum computers and other superconducting electronics. In switching measurement…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 28 pages, 3 figures

  4. arXiv:2402.05633

    stat.ME

    Full Law Identification under Missing Data with Categorical Variables

    Authors: Santtu Tikka, Juha Karvanen

    Abstract: Missing data may be disastrous for the identifiability of causal and statistical estimands. In graphical missing data models, colluders are dependence structures that have a special importance for identification considerations. It has been shown that the presence of a colluder makes the full law, i.e., the joint distribution of variables and response indicators, non-parametrically non-identifiable…

    Submitted 3 July, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  5. arXiv:2306.15328

    stat.ML cs.CY cs.LG stat.CO

    Simulating counterfactuals

    Authors: Juha Karvanen, Santtu Tikka, Matti Vihola

    Abstract: Counterfactual inference considers a hypothetical intervention in a parallel world that shares some evidence with the factual world. If the evidence specifies a conditional distribution on a manifold, counterfactuals may be analytically intractable. We present an algorithm for simulating values from a counterfactual distribution where conditions can be set on both discrete and continuous variables…

    Submitted 26 March, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

    Journal ref: Journal of Artificial Intelligence Research 80, 835-857, 2024

  6. Price Optimization Combining Conjoint Data and Purchase History: A Causal Modeling Approach

    Authors: Lauri Valkonen, Santtu Tikka, Jouni Helske, Juha Karvanen

    Abstract: Pricing decisions of companies require an understanding of the causal effect of a price change on the demand. When real-life pricing experiments are infeasible, data-driven decision-making must be based on alternative data sources such as purchase history (sales data) and conjoint studies where a group of customers is asked to make imaginary purchases in an artificial setup. We present an approach…

    Submitted 30 April, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

    Journal ref: Observational Studies, 10(1), 37-53, 2024

  7. arXiv:2206.06699

    stat.ME cs.LG

    Generalizing experimental findings: identification beyond adjustments

    Authors: Juha Karvanen

    Abstract: We aim to generalize the results of a randomized controlled trial (RCT) to a target population with the help of some observational data. This is a problem of causal effect identification with multiple data sources. Challenges arise when the RCT is conducted in a context that differs from the target population. Earlier research has focused on cases where the estimates from the RCT can be adjusted b…

    Submitted 14 June, 2022; originally announced June 2022.

    MSC Class: 62D20; 62H12; 62H22

  8. arXiv:2111.15233

    stat.ME

    Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency

    Authors: Tetiana Gorbach, Xavier de Luna, Juha Karvanen, Ingeborg Waernbaum

    Abstract: Semiparametric inference on average causal effects from observational data is based on assumptions yielding identification of the effects. In practice, several distinct identifying assumptions may be plausible; an analyst has to make a delicate choice between these models. In this paper, we study three identifying assumptions based on the potential outcome framework: the back-door assumption, whic…

    Submitted 17 February, 2023; v1 submitted 30 November, 2021; originally announced November 2021.

    Journal ref: Journal of Machine Learning Research 24 (197), 1-65, 2023

  9. arXiv:2111.04513

    stat.ML cs.LG stat.ME

    Clustering and Structural Robustness in Causal Diagrams

    Authors: Santtu Tikka, Jouni Helske, Juha Karvanen

    Abstract: Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagr…

    Submitted 15 August, 2023; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: This is the version published in JMLR

    Journal ref: Journal of Machine Learning Research, 24(195):1-32, 2023

  10. arXiv:2008.13558

    stat.ME stat.AP

    Simulation Framework for Realistic Large-scale Individual-level Data Generation with an Application in the Health Domain

    Authors: Santtu Tikka, Jussi Hakanen, Mirka Saarela, Juha Karvanen

    Abstract: We propose a framework for realistic data generation and simulation of complex systems and demonstrate its capabilities in the health domain. The main use cases of the framework are predicting the development of risk factors and disease occurrence, evaluating the impact of interventions and policy decisions, and statistical method development. We present the fundamentals of the framework using rig…

    Submitted 5 June, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

  11. Do-search -- a tool for causal inference and study design with multiple data sources

    Authors: Juha Karvanen, Santtu Tikka, Antti Hyttinen

    Abstract: Epidemiological evidence is based on multiple data sources including clinical trials, cohort studies, surveys, registries and expert opinions. Merging information from different sources opens up new possibilities for the estimation of causal effects. We show how causal effects can be identified and estimated by combining experiments and observations in real and realistic scenarios. As a new tool,…

    Submitted 16 July, 2020; originally announced July 2020.

    Journal ref: Epidemiology, 32(1), 111-119, 2020

  12. arXiv:2003.03187

    stat.ME stat.CO

    Estimation of causal effects with small data in the presence of trapdoor variables

    Authors: Jouni Helske, Santtu Tikka, Juha Karvanen

    Abstract: We consider the problem of estimating causal effects of interventions from observational data when well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias in small samples. This…

    Submitted 24 March, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: 25 pages, 8 figures

    Journal ref: Journal of Royal Statistical Society: Series A. 2021, 184:1030-1051

  13. arXiv:1902.01073

    stat.ML cs.AI cs.LG

    Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

    Authors: Santtu Tikka, Antti Hyttinen, Juha Karvanen

    Abstract: Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been consi…

    Submitted 27 August, 2021; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: This is the version published in the Journal of Statistical Software

    Journal ref: Journal of Statistical Software, 99(5):1-40, 2021

  14. Surrogate Outcomes and Transportability

    Authors: Santtu Tikka, Juha Karvanen

    Abstract: Identification of causal effects is one of the most fundamental tasks of causal inference. We consider an identifiability problem where some experimental and observational data are available but neither data alone is sufficient for the identification of the causal effect of interest. Instead of the outcome of interest, surrogate outcomes are measured in the experiments. This problem is a generaliz…

    Submitted 12 March, 2019; v1 submitted 19 June, 2018; originally announced June 2018.

    Comments: This is the version published in the International Journal of Approximate Reasoning

    Journal ref: International Journal of Approximate Reasoning, 2019; 108: 21-37

  15. Identifying Causal Effects with the R Package causaleffect

    Authors: Santtu Tikka, Juha Karvanen

    Abstract: Do-calculus is concerned with estimating the interventional distribution of an action from the observed joint probability distribution of the variables in a given causal structure. All identifiable causal effects can be derived using the rules of do-calculus, but the rules themselves do not give any direct indication whether the effect in question is identifiable or not. Shpitser and Pearl constru…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: This is the version published in the Journal of Statistical Software

    Journal ref: Journal of Statistical Software, 76(12):1-30, 2017

  16. arXiv:1806.07085

    stat.ML cs.LG

    Enhancing Identification of Causal Effects by Pruning

    Authors: Santtu Tikka, Juha Karvanen

    Abstract: Causal models communicate our assumptions about causes and effects in real-world phenomena. Often the interest lies in the identification of the effect of an action which means deriving an expression from the observed probability distribution for the interventional distribution resulting from the action. In many cases an identifiability algorithm may return a complicated expression that contains…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: This is the version published in JMLR

    Journal ref: Journal of Machine Learning Research (JMLR), 18(194):1-23, 2018

  17. arXiv:1806.07082

    stat.ML cs.AI cs.LG

    Simplifying Probabilistic Expressions in Causal Inference

    Authors: Santtu Tikka, Juha Karvanen

    Abstract: Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are i…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: This is the version published in JMLR

    Journal ref: Journal of Machine Learning Research (JMLR), 18(36):1-30, 2017

  18. Adjusting for selective non-participation with re-contact data in the FINRISK 2012 survey

    Authors: Juho Kopra, Tommi Härkänen, Hanna Tolonen, Pekka Jousilahti, Kari Kuulasmaa, Jaakko Reinikainen, Juha Karvanen

    Abstract: Aims: A common objective of epidemiological surveys is to provide population-level estimates of health indicators. Survey results tend to be biased under selective non-participation. One approach to bias reduction is to collect information about non-participants by contacting them again and asking them to fill in a questionnaire. This information is called re-contact data, and it allows to adjust…

    Submitted 16 November, 2017; originally announced November 2017.

    Comments: 16 pages, 4 tables, 0 figures

    Journal ref: Scandinavian Journal of Public Health, 2017

  19. arXiv:1610.03687

    stat.ME stat.AP

    Bayesian models for data missing not at random in health examination surveys

    Authors: Juho Kopra, Juha Karvanen, Tommi Härkänen

    Abstract: In epidemiological surveys, data missing not at random (MNAR) due to survey nonresponse may potentially lead to a bias in the risk factor estimates. We propose an approach based on Bayesian data augmentation and survival modelling to reduce the nonresponse bias. The approach requires additional information based on follow-up data. We present a case study of smoking prevalence using FINRISK data co…

    Submitted 28 August, 2017; v1 submitted 12 October, 2016; originally announced October 2016.

    Comments: 19 pages, 2 figures

  20. arXiv:1609.08347

    math.ST stat.ME

    Optimal design of observational studies: overview and synthesis

    Authors: Juha Karvanen, Jarno Vanhatalo, Kari Auranen, Sangita Kulathinal, Samu Mäntyniemi

    Abstract: We review typical design problems encountered in the planning of observational studies and propose a unifying framework that allows us to use the same concepts and notation for different problems. In the framework, the design is defined as a probability measure in the space of observational processes that determine whether the value of a variable is observed for a specific unit at the given time.…

    Submitted 1 November, 2017; v1 submitted 27 September, 2016; originally announced September 2016.

    Comments: Submitted

  21. Bayesian subcohort selection for longitudinal covariate measurements in follow-up studies

    Authors: Jaakko Reinikainen, Juha Karvanen

    Abstract: We consider planning longitudinal covariate measurements in follow-up studies where covariates are time-varying. We assume that the entire cohort cannot be selected for longitudinal measurements due to financial limitations and study how a subset of the cohort should be selected optimally in order to obtain precise estimates of covariate effects in a survival model. In our approach, the study will…

    Submitted 6 September, 2016; originally announced September 2016.

    Journal ref: Statistica Neerlandica, 76(4), 372-390, 2022

  22. Prioritizing covariates in the planning of future studies in the meta-analytic framework

    Authors: Juha Karvanen, Mikko J. Sillanpää

    Abstract: Science can be seen as a sequential process where each new study augments evidence to the existing knowledge. To have the best prospects to make an impact in this process, a new study should be designed optimally taking into account the previous studies and other prior information. We propose a formal approach for the covariate prioritization, i.e., the decision about the covariates to be measured…

    Submitted 8 August, 2016; originally announced August 2016.

    Journal ref: Biometrical Journal, Volume 59, Issue 1, Pages 110-125, 2017

  23. arXiv:1502.03609

    stat.AP stat.ME

    Correcting for non-ignorable missingness in smoking trends

    Authors: Juho Kopra, Tommi Härkänen, Hanna Tolonen, Juha Karvanen

    Abstract: Data missing not at random (MNAR) is a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration we use data on smoking prevalence in Finnish National FINRISK study conducted in 1…

    Submitted 12 February, 2015; originally announced February 2015.

    Comments: in Stat, 2015

  24. arXiv:1403.1124

    stat.ME cs.LG math.ST stat.ML

    Estimating complex causal effects from incomplete observational data

    Authors: Juha Karvanen

    Abstract: Despite the major advances taken in causal modeling, causality is still an unfamiliar topic for many statisticians. In this paper, it is demonstrated from the beginning to the end how causal effects can be estimated from observational data assuming that the causal structure is known. To make the problem more challenging, the causal effects are highly nonlinear and the data are missing at random. T…

    Submitted 2 July, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

  25. arXiv:1304.5380

    stat.AP q-fin.GN

    Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity

    Authors: Juha Karvanen, Ari Rantanen, Lasse Luoma

    Abstract: We present a Bayesian framework for estimating the customer lifetime value (CLV) and the customer equity (CE) based on the purchasing behavior deducible from the market surveys on customer purchasing behavior. The proposed framework systematically addresses the challenges faced when the future value of customers is estimated based on survey data. The scarcity of the survey data and the sampling va…

    Submitted 30 May, 2014; v1 submitted 19 April, 2013; originally announced April 2013.

    MSC Class: 62N02; 62-07; 62F15 ACM Class: G.3; J.1

    Journal ref: Quantitative Marketing and Economics, Volume 12, Issue 3, Pages 305-329, 2014

  26. arXiv:1211.2958

    stat.ME stat.AP stat.ML

    Study design in causal models

    Authors: Juha Karvanen

    Abstract: The causal assumptions, the study design and the data are the elements required for scientific inference in empirical research. The research is adequately communicated only if all of these elements and their relations are described precisely. Causal models with design describe the study design and the missing data mechanism together with the causal structure and allow the direct application of cau…

    Submitted 24 April, 2014; v1 submitted 13 November, 2012; originally announced November 2012.

    Comments: The example on the MORGAM Project extended is in this version

    MSC Class: 62A01; 62-09; 62F99; 62D05; 62P10; 62K99; 68T30 ACM Class: G.3; G.2.2

    Journal ref: Scandinavian Journal of Statistics, Volume 42, Issue 2, pages 361-377, 2015

  27. Characterizing the generalized lambda distribution by L-moments

    Authors: Juha Karvanen, Arto Nuutinen

    Abstract: The generalized lambda distribution (GLD) is a flexible four parameter distribution with many practical applications. L-moments of the GLD can be expressed in closed form and are good alternatives for the central moments. The L-moments of the GLD up to an arbitrary order are presented, and a study of L-skewness and L-kurtosis that can be achieved by the GLD is provided. The boundaries of L-skewn…

    Submitted 26 June, 2007; v1 submitted 15 January, 2007; originally announced January 2007.

    Comments: Revised version, accepted for publication

    MSC Class: 60E05; 62E10; 62G30

    Journal ref: Computational Statistics & Data Analysis 2008, Vol. 52, 1971-1983

  28. Efficient initial designs for binary response data

    Authors: Juha Karvanen

    Abstract: In this paper we introduce a binary search algorithm that efficiently finds initial maximum likelihood estimates for sequential experiments where a binary response is modeled by a continuous factor. The problem is motivated by switching measurements on superconducting Josephson junctions. In this quantum mechanical experiment, the current is the factor controlled by the experimenter and a binary…

    Submitted 6 February, 2008; v1 submitted 1 November, 2006; originally announced November 2006.

    MSC Class: 62L05; 62K05; 62P35

    Journal ref: Statistical Methodology 2008, Vol. 5, 462-473

  29. arXiv:cond-mat/0610507

    cond-mat.supr-con physics.data-an stat.AP

    Experimental Designs for Binary Data in Switching Measurements on Superconducting Josephson Junctions

    Authors: Juha Karvanen, Juha J. Vartiainen, Andrey Timofeev, Jukka Pekola

    Abstract: We study the optimal design of switching measurements of small Josephson junction circuits which operate in the macroscopic quantum tunnelling regime. Starting from the D-optimality criterion we derive the optimal design for the estimation of the unknown parameters of the underlying Gumbel type distribution. As a practical method for the measurements, we propose a sequential design that combines…

    Submitted 18 October, 2006; originally announced October 2006.

    Journal ref: Journal of the Royal Statistical Society: Series C (Applied Statistics) 2007, Vol. 56, 167-181