-
Doubly-Robust Functional Average Treatment Effect Estimation
Authors:
Lorenzo Testa,
Tobia Boschi,
Francesca Chiaromonte,
Edward H. Kennedy,
Matthew Reimherr
Abstract:
Understanding causal relationships in the presence of complex, structured data remains a central challenge in modern statistics and science in general. While traditional causal inference methods are well-suited for scalar outcomes, many scientific applications demand tools capable of handling functional data -- outcomes observed as functions over continuous domains such as time or space. Motivated by this need, we propose DR-FoS, a novel method for estimating the Functional Average Treatment Effect (FATE) in observational studies with functional outcomes. DR-FoS exhibits double robustness properties, ensuring consistent estimation of FATE even if either the outcome or the treatment assignment model is misspecified. By leveraging recent advances in functional data analysis and causal inference, we establish the asymptotic properties of the estimator, proving its convergence to a Gaussian process. This guarantees valid inference with simultaneous confidence bands across the entire functional domain. Through extensive simulations, we show that DR-FoS achieves robust performance under a wide range of model specifications. Finally, we illustrate the utility of DR-FoS in a real-world application, analyzing functional outcomes to uncover meaningful causal insights in the SHARE (Survey of Health, Aging and Retirement in Europe) dataset.
Submitted 2 May, 2025; v1 submitted 10 January, 2025;
originally announced January 2025.
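The double robustness idea can be illustrated with a minimal sketch of the standard augmented inverse probability weighting (AIPW) estimator, applied pointwise on a common evaluation grid. This is an illustration under stated assumptions, not the DR-FoS implementation. It assumes curves are observed on a shared grid. It treats the propensity scores and outcome-regression curves as already estimated, with no cross-fitting. It builds a simultaneous band with a Gaussian multiplier bootstrap of the sup-t statistic. The function names `aipw_fate` and `sup_t_band` are hypothetical.

```python
import numpy as np


def aipw_fate(Y, A, e_hat, mu1_hat, mu0_hat):
    """Pointwise AIPW estimate of a functional average treatment effect.

    Y       : (n, T) outcome curves evaluated on a shared grid of T points
    A       : (n,)   binary treatment indicators (0/1)
    e_hat   : (n,)   estimated propensity scores P(A = 1 | X)
    mu1_hat : (n, T) estimated outcome curves under treatment, E[Y | A = 1, X]
    mu0_hat : (n, T) estimated outcome curves under control,   E[Y | A = 0, X]
    Returns the (T,) effect curve and the (n, T) per-unit AIPW terms.
    """
    a = A[:, None]      # broadcast treatment indicator across the grid
    e = e_hat[:, None]  # broadcast propensity score across the grid
    # Outcome-model contrast plus inverse-probability-weighted residual corrections
    phi = (mu1_hat - mu0_hat
           + a * (Y - mu1_hat) / e
           - (1 - a) * (Y - mu0_hat) / (1 - e))
    return phi.mean(axis=0), phi


def sup_t_band(phi, alpha=0.05, n_boot=2000, seed=0):
    """Simultaneous (1 - alpha) band via a Gaussian multiplier bootstrap of the sup-t statistic."""
    rng = np.random.default_rng(seed)
    n = phi.shape[0]
    est = phi.mean(axis=0)
    centered = phi - est
    se = centered.std(axis=0, ddof=1) / np.sqrt(n)      # pointwise standard errors
    xi = rng.standard_normal((n_boot, n))                 # multiplier weights
    boot = (xi @ centered) / n / se                       # (n_boot, T) standardized draws
    q = np.quantile(np.abs(boot).max(axis=1), 1 - alpha)  # critical value for the sup
    return est - q * se, est + q * se
```

If either the propensity scores or the outcome curves are correctly specified, the average of the AIPW terms stays centered on the true effect curve. This is the double robustness property the abstract refers to.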
-
A new computationally efficient algorithm to solve Feature Selection for Functional Data Classification in high-dimensional spaces
Authors:
Tobia Boschi,
Francesca Bonin,
Rodrigo Ordonez-Hurtado,
Alessandra Pascale,
Jonathan Epperlein
Abstract:
This paper introduces a novel methodology for Feature Selection for Functional Classification (FSFC) that addresses the challenge of jointly performing feature selection and classification of functional data in scenarios with categorical responses and multivariate longitudinal features. FSFC tackles a newly defined optimization problem that integrates logistic loss and functional features to identify the variables most relevant for classification. To solve the resulting minimization problem, we employ functional principal components and develop a new adaptive version of the Dual Augmented Lagrangian algorithm. The computational efficiency of FSFC enables handling high-dimensional scenarios where the number of features may considerably exceed the number of statistical units. Simulation experiments demonstrate that FSFC outperforms other machine learning and deep learning methods in both computational time and classification accuracy. Furthermore, the feature selection capability of FSFC can be leveraged to significantly reduce the problem's dimensionality and enhance the performance of other classification algorithms. The efficacy of FSFC is also demonstrated through a real data application, analyzing relationships between four chronic diseases and other health and demographic factors.
Submitted 5 March, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
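The two ingredients named in the FSFC abstract, functional principal components and a logistic loss, can be combined in a short sketch. Several assumptions here are not stated in the abstract. Curves are taken to be observed on a common grid. FPC scores come from an SVD of the centered data matrix. Sparsity across features is induced by a group-lasso-style penalty on each feature's block of coefficients. The actual FSFC penalty and its adaptive Dual Augmented Lagrangian solver are not reproduced. The names `fpc_scores` and `penalized_logistic_objective` are hypothetical.

```python
import numpy as np


def fpc_scores(curves, k):
    """Scores of discretized curves (n, T) on their first k principal components."""
    centered = curves - curves.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered curves
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T  # (n, k)


def penalized_logistic_objective(scores, y, beta, b0, lam):
    """Mean logistic loss plus a group penalty over functional features.

    scores : (n, p, k) FPC scores for p functional features
    y      : (n,) labels in {0, 1}
    beta   : (p, k) coefficients, one row (group) per feature
    b0     : intercept
    lam    : penalty level; a feature is dropped when its whole row is zero
    """
    eta = b0 + np.einsum("npk,pk->n", scores, beta)
    # log(1 + exp(eta)) - y * eta, computed stably with logaddexp
    loss = np.mean(np.logaddexp(0.0, eta) - y * eta)
    penalty = lam * np.linalg.norm(beta, axis=1).sum()
    return loss + penalty
```

Stacking per-feature scores with `np.stack([fpc_scores(c, k) for c in features], axis=1)` gives the `(n, p, k)` array the objective expects. With all coefficients and the intercept at zero, the objective equals log 2, the loss of a classifier that always predicts probability 0.5.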
-
Contrasting pre-vaccine COVID-19 waves in Italy through Functional Data Analysis
Authors:
Tobia Boschi,
Jacopo Di Iorio,
Lorenzo Testa,
Marzia A. Cremona,
Francesca Chiaromonte
Abstract:
We use data from 107 Italian provinces to characterize and compare mortality patterns in the first two COVID-19 epidemic waves, which occurred prior to the introduction of vaccines. We also associate these patterns with mobility, timing of government restrictions, and socio-demographic, infrastructural, and environmental covariates. Notwithstanding limitations in the accuracy and reliability of publicly available data, we are able to exploit information in curves and shapes through Functional Data Analysis techniques. Specifically, we document differences in magnitude and variability between the two waves; while both were characterized by a co-occurrence of 'exponential' and 'mild' mortality patterns, the second spread much more broadly and asynchronously through the country. Moreover, we find evidence of a significant positive association between local mobility and mortality in both epidemic waves and corroborate the effectiveness of timely restrictions in curbing mortality. The techniques we describe could capture additional signals of interest if applied, for instance, to data on cases and positivity rates. However, we show that the quality of such data, at least in the case of Italian provinces, was too poor to support meaningful analyses.
Submitted 19 July, 2023;
originally announced July 2023.
-
FAStEN: An Efficient Adaptive Method for Feature Selection and Estimation in High-Dimensional Functional Regressions
Authors:
Tobia Boschi,
Lorenzo Testa,
Francesca Chiaromonte,
Matthew Reimherr
Abstract:
Functional regression analysis is an established tool for many contemporary scientific applications. Regression problems involving large and complex data sets are ubiquitous, and feature selection is crucial for avoiding overfitting and achieving accurate predictions. We propose a new, flexible, and ultra-efficient approach to perform feature selection in a sparse high-dimensional function-on-function regression problem, and we show how to extend it to the scalar-on-function framework. Our method, called FAStEN, combines functional data, optimization, and machine learning techniques to perform feature selection and parameter estimation simultaneously. We exploit the properties of Functional Principal Components and the sparsity inherent to the Dual Augmented Lagrangian problem to significantly reduce computational cost, and we introduce an adaptive scheme to improve selection accuracy. In addition, we derive asymptotic oracle properties, which guarantee estimation and selection consistency for the proposed FAStEN estimator. Through an extensive simulation study, we benchmark our approach against the best existing competitors and demonstrate a massive gain in terms of CPU time and selection performance, without sacrificing the quality of the coefficient estimates. The theoretical derivations and the simulation study provide a strong motivation for our approach. Finally, we present an application to brain fMRI data from the AOMIC PIOP1 study. Complete FAStEN code is provided at https://github.com/IBM/funGCN.
Submitted 17 October, 2024; v1 submitted 26 March, 2023;
originally announced March 2023.
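Feature-level selection of this kind typically relies on a group soft-thresholding (proximal) operator, which zeroes out an entire feature's block of FPC coefficients at once. The sketch below pairs that operator with adaptive-lasso-style weights computed from an initial estimate, one common way to build an adaptive scheme. Whether FAStEN uses exactly these weights and this penalty is an assumption. The names `adaptive_weights` and `group_soft_threshold` are hypothetical. Each row of `B` is assumed to hold the flattened FPC coefficients of one functional predictor.

```python
import numpy as np


def adaptive_weights(beta_init, gamma=1.0, eps=1e-8):
    """Adaptive-lasso-style weights: features with small initial norms are penalized more."""
    norms = np.linalg.norm(beta_init, axis=1)
    return 1.0 / (norms + eps) ** gamma


def group_soft_threshold(B, thresholds):
    """Proximal operator of sum_j thresholds[j] * ||B[j]||_2, applied row by row.

    B          : (p, m) coefficient blocks, one row per functional feature
    thresholds : (p,) per-feature thresholds, e.g. lam * adaptive weights
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Shrink each row toward zero; rows whose norm is below the threshold vanish
    scale = np.maximum(1.0 - thresholds[:, None] / np.maximum(norms, 1e-12), 0.0)
    return scale * B
```

Because the weights scale the thresholds, a feature with a weak initial estimate needs a much larger signal to survive the shrinkage. This is how such reweighting tends to sharpen selection.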
-
The shapes of an epidemic: using Functional Data Analysis to characterize COVID-19 in Italy
Authors:
Tobia Boschi,
Jacopo Di Iorio,
Lorenzo Testa,
Marzia A. Cremona,
Francesca Chiaromonte
Abstract:
We investigate patterns of COVID-19 mortality across 20 Italian regions and their association with mobility, positivity, and socio-demographic, infrastructural, and environmental covariates. Notwithstanding limitations in the accuracy and resolution of the data available from public sources, we pinpoint significant trends by exploiting information in curves and shapes with Functional Data Analysis techniques. These depict two starkly different epidemics: an "exponential" one unfolding in Lombardia and the worst-hit areas of the north, and a milder, "flat(tened)" one in the rest of the country, including Veneto, where cases appeared concurrently with Lombardia but aggressive testing was implemented early on. We find that mobility and positivity can predict COVID-19 mortality, even when controlling for relevant covariates. Among the latter, primary care appears to mitigate mortality, and contacts in hospitals, schools, and workplaces to aggravate it. The techniques we describe could capture additional and potentially sharper signals if applied to richer data.
Submitted 11 August, 2020;
originally announced August 2020.
-
An Efficient Semi-smooth Newton Augmented Lagrangian Method for Elastic Net
Authors:
Tobia Boschi,
Matthew Reimherr,
Francesca Chiaromonte
Abstract:
Feature selection is an important and active research area in statistics and machine learning. The Elastic Net is often used to perform selection when the features exhibit non-negligible collinearity or when practitioners wish to incorporate additional known structure. In this article, we propose a new Semi-smooth Newton Augmented Lagrangian Method to efficiently solve the Elastic Net in ultra-high-dimensional settings. Our new algorithm exploits both the sparsity induced by the Elastic Net penalty and the sparsity due to the second-order information of the augmented Lagrangian. This greatly reduces the computational cost of the problem. Using simulations on both synthetic and real datasets, we demonstrate that our approach outperforms its best competitors by at least an order of magnitude in terms of CPU time. We also apply our approach to a Genome-Wide Association Study on childhood obesity.
Submitted 6 June, 2020;
originally announced June 2020.
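The two sources of sparsity mentioned in this abstract can be seen in a small sketch. For the Elastic Net penalty lam1*||x||_1 + (lam2/2)*||x||_2^2, the proximal operator is soft-thresholding followed by a rescaling. Its generalized Jacobian is diagonal, with entries 1/(1 + sigma*lam2) on the active coordinates and 0 elsewhere. In a semi-smooth Newton step of the form used by SsNAL methods for Lasso-type problems, the matrix I + sigma * A P A^T therefore involves only the active columns of A. This sketches that standard structure under these assumptions and is not the authors' code. The names `prox_elastic_net` and `newton_matrix` are hypothetical.

```python
import numpy as np


def prox_elastic_net(z, sigma, lam1, lam2):
    """argmin_x 0.5*||x - z||^2 + sigma*(lam1*||x||_1 + 0.5*lam2*||x||^2)."""
    return np.sign(z) * np.maximum(np.abs(z) - sigma * lam1, 0.0) / (1.0 + sigma * lam2)


def newton_matrix(A, z, sigma, lam1, lam2):
    """I + sigma * A P A^T, where P is the generalized Jacobian of the prox at z.

    P is diagonal: 1 / (1 + sigma*lam2) where |z_i| > sigma*lam1, else 0,
    so only the active columns A[:, active] are ever multiplied.
    """
    active = np.abs(z) > sigma * lam1
    A_J = A[:, active]
    M = np.eye(A.shape[0]) + sigma / (1.0 + sigma * lam2) * (A_J @ A_J.T)
    return M, active
```

With a strong penalty only a small fraction of coordinates is active, so forming `A_J @ A_J.T` can cost far less than working with all columns of `A`. This illustrates why the Newton step can stay cheap when the number of features is very large.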
-
The relationship between human mobility and viral transmissibility during the COVID-19 epidemics in Italy
Authors:
Paolo Cintia,
Luca Pappalardo,
Salvatore Rinzivillo,
Daniele Fadda,
Tobia Boschi,
Fosca Giannotti,
Francesca Chiaromonte,
Pietro Bonato,
Francesco Fabbri,
Francesco Penone,
Marcello Savarese,
Francesco Calabrese,
Giorgio Guzzetta,
Flavia Riccardo,
Valentina Marziano,
Piero Poletti,
Filippo Trentini,
Antonino Bella,
Xanthi Andrianou,
Martina Del Manso,
Massimo Fabiani,
Stefania Bellino,
Stefano Boros,
Alberto Mateo Urdiales,
Maria Fenicia Vescio,
et al. (7 additional authors not shown)
Abstract:
In 2020, countries affected by the COVID-19 pandemic implemented various non-pharmaceutical interventions to counter the spread of the virus and its impact on their healthcare systems and economies. Using Italian data at different geographic scales, we investigate the relationship between human mobility, which subsumes many facets of the population's response to the changing situation, and the spread of COVID-19. Leveraging mobile phone data from February through September 2020, we find a striking relationship between the decrease in mobility flows and the net reproduction number. We find that the time needed to switch off mobility and bring the net reproduction number below the critical threshold of 1 is about one week. Moreover, we observe a strong relationship between the number of days spent above this threshold before the lockdown-induced drop in mobility flows and the total number of infections per 100k inhabitants. Estimating the statistical effect of mobility flows on the net reproduction number over time, we document a positive association at a 2-week lag, strong in March and April, and weaker but still significant in June. Our study demonstrates the value of big mobility data for monitoring the epidemic and informing control interventions as it unfolds.
Submitted 1 April, 2021; v1 submitted 4 June, 2020;
originally announced June 2020.
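A lagged association like the one reported above can be explored, in its simplest form, by correlating a mobility series with the net reproduction number shifted over a range of lags. The study itself estimated a statistical model over time, so this sketch is only a simple exploratory proxy. The function names `lagged_correlation` and `best_lag` are hypothetical. The sketch assumes two aligned daily series of equal length.

```python
import numpy as np


def lagged_correlation(mobility, rt, lag):
    """Pearson correlation between mobility[t - lag] and rt[t] for aligned daily series."""
    if lag > 0:
        mobility, rt = mobility[:-lag], rt[lag:]
    return np.corrcoef(mobility, rt)[0, 1]


def best_lag(mobility, rt, max_lag=21):
    """Lag (in days) in 0..max_lag with the largest correlation."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(mobility, rt, k))
```

Real mobility and reproduction-number series are strongly autocorrelated, so the peak of such a scan is typically broad and should be read cautiously rather than as an effect estimate.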