-
Machine learning approaches to identify thresholds in a heat-health warning system context
Authors:
Pierre Masselot,
Fateh Chebana,
Céline Campagna,
Eric Lavigne,
Taha B. M. J. Ouarda,
Pierre Gosselin
Abstract:
During the last two decades, a number of countries and cities have established heat-health warning systems to alert public health authorities when a heat indicator exceeds a predetermined threshold. Different methods have been used around the world to establish thresholds, each with its own strengths and weaknesses. Their common ground is that they rely on exposure-response function estimates, which can fail in many situations. The present paper proposes several data-driven methods to establish thresholds using historical data on health issues and environmental indicators. The proposed methods are model-based regression trees (MOB), multivariate adaptive regression splines (MARS), the patient rule-induction method (PRIM) and adaptive index models (AIM). These methods all seek relevant splits in the association between indicators and the health outcome, but do so in different ways. The methods are compared in a simulation study and a real-world case study. Results show that the proposed methods are better at predicting adverse days than current thresholds and benchmark methods. The results nonetheless suggest that PRIM is overall the most reliable method, with low variability of results across scenarios and cases.
Submitted 20 October, 2021;
originally announced October 2021.
-
A new look at weather-related health impacts through functional regression
Authors:
Pierre Masselot,
Fateh Chebana,
Taha B. M. J. Ouarda,
Diane Bélanger,
André St-Hilaire,
Pierre Gosselin
Abstract:
A major challenge of climate change adaptation is to assess the effect of changing weather on human health. Despite a growing literature on weather-related health, many aspects of the relationship remain unknown, limiting the predictive power of epidemiologic models. The present paper proposes new models to improve the performance of those currently in use. The proposed models are based on functional data analysis (FDA), a statistical framework dealing with continuous curves instead of scalar time series. The models are applied to temperature-related cardiovascular mortality in Montreal. By making use of all the available information, the proposed models improve the prediction of cardiovascular mortality from temperature. In addition, the results shed new light on the relationship by quantifying physiological adaptation effects. These results, not found with classical models, illustrate the potential of FDA approaches.
Submitted 25 October, 2018;
originally announced October 2018.
-
Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality
Authors:
Pierre Masselot,
Fateh Chebana,
Diane Bélanger,
André St-Hilaire,
Belkacem Abdous,
Pierre Gosselin,
Taha B. M. J. Ouarda
Abstract:
In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by temporally aggregating the health data and then using the aggregated series as the response in regression analysis. A general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with the regression models commonly applied in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that the quality of fit increases when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than with the classical DLNM. More precisely, among the aggregation schemes investigated, aggregation with an asymmetric Epanechnikov kernel was found to be the best suited for studying the temperature-mortality relationship.
Submitted 21 February, 2018;
originally announced February 2018.
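The temporal aggregation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-sided (forward-looking) window, the function names `epanechnikov_weights` and `aggregate_forward`, and the window length are all assumptions made for illustration. Each day's aggregated response is a weighted average of that day and the following days, with weights from the Epanechnikov kernel K(u) = 0.75 (1 - u^2).

```python
def epanechnikov_weights(window):
    """One-sided Epanechnikov weights over `window` days, normalised to sum to 1.

    Assumption: day offset k is mapped to u = k / window, so the weight is
    largest on the current day and decays over the following days.
    """
    raw = [0.75 * (1.0 - (k / window) ** 2) for k in range(window)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate_forward(series, window):
    """Kernel-weighted average of each day and the `window - 1` days after it.

    The output is shorter than the input by `window - 1` values, because the
    last days do not have a complete forward window.
    """
    weights = epanechnikov_weights(window)
    n_out = len(series) - window + 1
    return [
        sum(w * series[t + k] for k, w in enumerate(weights))
        for t in range(n_out)
    ]


# Hypothetical daily death counts, aggregated over a 4-day forward window.
daily_counts = [12, 15, 9, 20, 18, 11, 14, 16, 13, 10]
smoothed = aggregate_forward(daily_counts, window=4)
```

The aggregated series would then replace the raw counts as the response in the regression model; the ARMA treatment of residuals mentioned in the abstract is not shown here.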
-
EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality
Authors:
Pierre Masselot,
Fateh Chebana,
Diane Bélanger,
André St-Hilaire,
Belkacem Abdous,
Pierre Gosselin,
Taha B. M. J. Ouarda
Abstract:
In a number of environmental studies, relationships between natural processes are assessed through regression analyses using time series data. Such data are often multi-scale and non-stationary, which leads to poor accuracy of the resulting regression models and therefore to results of moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists of applying the empirical mode decomposition (EMD) algorithm to the data series and then using the resulting components in regression models. The proposed methodology has a number of advantages. First, it accounts for the non-stationarity of the data series. Second, it acts as a scan of the relationship between a response variable and the predictors at different time scales, providing new insights into this relationship. To illustrate the proposed methodology, it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results provide new knowledge about the studied relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, a scale that is not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more detail about the relationship than classical models.
Submitted 7 September, 2017;
originally announced September 2017.
-
Streamflow forecasting using functional regression
Authors:
Pierre Masselot,
Sophie Dabo-Niang,
Fateh Chebana,
Taha B. M. J. Ouarda
Abstract:
Streamflow, as a natural phenomenon, is continuous in time, and so are the meteorological variables that influence its variability. In practice, it can be of interest to forecast the whole flow curve instead of individual points (daily or hourly). To this end, this paper introduces functional linear models and adapts them to hydrological forecasting. More precisely, functional linear models are regression models based on curves instead of single values. They make it possible to consider the whole process instead of a limited number of time points or features. We apply these models to analyse the flow volume and the whole streamflow curve during a given period by using precipitation curves. The functional model is shown to lead to encouraging results. The potential of functional linear models to detect special features that would otherwise have been hard to see is pointed out. The functional model is also compared to the artificial neural network approach, and the advantages and disadvantages of both models are discussed. Finally, future research directions involving the functional model in hydrology are presented.
Submitted 19 October, 2016;
originally announced October 2016.
-
Fast and direct nonparametric procedures in the L-moment homogeneity test
Authors:
Pierre Masselot,
Fateh Chebana,
Taha B. M. J. Ouarda
Abstract:
Regional frequency analysis is an important tool to properly estimate hydrological characteristics at ungauged or partially gauged sites in order to prevent hydrological disasters. The delineation of homogeneous groups of sites is an important first step in order to transfer information and obtain accurate quantile estimates at the target site. The Hosking-Wallis (HW) homogeneity test is usually used to test the homogeneity of the selected sites. Despite its usefulness and good power, it has some drawbacks, including the subjective choice of a parametric distribution for the data and a poorly justified rejection threshold. The present paper addresses these drawbacks by integrating nonparametric procedures in the L-moment homogeneity test. To assess the rejection threshold, three resampling methods (permutation, bootstrap and Pólya resampling) are considered. Results indicate that the permutation and bootstrap methods perform better than the parametric HW test in terms of power as well as computation time and procedural simplicity. A real-world case study shows that the nonparametric tests agree with the HW test on homogeneity for the volume and bivariate cases but disagree for the peak case, and that the assumptions of the HW test are not well respected.
Submitted 18 October, 2016;
originally announced October 2016.
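As a rough illustration of the permutation idea in the abstract above, the sketch below computes a dispersion statistic of the sample L-CV across sites and derives a p-value by reshuffling the pooled observations. It is a simplified sketch, not the paper's procedure: the function names, the rescaling of each site by its mean before pooling, the number of permutations and the choice of the L-CV dispersion as the only statistic are assumptions made here. The sample L-CV uses the standard probability-weighted-moment estimators, with l1 = b0 and l2 = 2 b1 - b0.

```python
import random


def l_cv(sample):
    """Sample L-CV (l2 / l1) from unbiased probability-weighted moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    # b1 = (1/n) * sum_j ((j - 1) / (n - 1)) * x_(j), with j starting at 1
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    return (2.0 * b1 - b0) / b0


def dispersion(sites):
    """Record-length-weighted standard deviation of the at-site L-CVs."""
    sizes = [len(s) for s in sites]
    t = [l_cv(s) for s in sites]
    total = sum(sizes)
    t_reg = sum(n * ti for n, ti in zip(sizes, t)) / total
    return (sum(n * (ti - t_reg) ** 2 for n, ti in zip(sizes, t)) / total) ** 0.5


def permutation_p_value(sites, n_perm=999, seed=0):
    """P-value of the observed dispersion under random reassignment to sites.

    Assumption: observations are positive and each site is rescaled by its
    own mean before pooling, so that only differences in shape remain.
    """
    rng = random.Random(seed)
    scaled = [[v / (sum(s) / len(s)) for v in s] for s in sites]
    observed = dispersion(scaled)
    pooled = [v for s in scaled for v in s]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, resampled = 0, []
        for s in scaled:
            resampled.append(pooled[start:start + len(s)])
            start += len(s)
        if dispersion(resampled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical annual flood peaks at three sites (arbitrary units).
sites = [
    [120.0, 95.0, 143.0, 110.0, 101.0, 160.0],
    [60.0, 72.0, 55.0, 81.0, 66.0],
    [210.0, 190.0, 260.0, 175.0, 230.0, 205.0, 198.0],
]
p_value = permutation_p_value(sites)
```

A small p-value would suggest that the sites are more heterogeneous than expected under exchangeability. Because the reference distribution comes from resampling, no parametric distribution has to be chosen, which is the drawback of the HW test that the abstract highlights.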
-
Usefulness of the Reversible Jump Markov Chain Monte Carlo Model in Regional Flood Frequency Analysis
Authors:
Mathieu Ribatet,
Eric Sauquet,
Jean-Michel Grésillon,
Taha B. M. J. Ouarda
Abstract:
Regional flood frequency analysis is a convenient way to reduce estimation uncertainty when few data are available at the gauging site. In this work, a model that assigns a non-null probability to a regional fixed shape parameter is presented. This methodology is integrated within a Bayesian framework and uses reversible jump techniques. The performance of this new estimator on stochastic data is compared to that of two other models: a conventional Bayesian analysis and the index flood approach. Results show that the proposed estimator is well suited to regional estimation when only a few data are available at the target site. Moreover, errors in the estimation of the target site index flood seem to have less impact on the Bayesian estimators than on the index flood estimator. Some suggestions about the configuration of the pooling groups are also presented to increase the performance of each estimator.
Submitted 4 February, 2008;
originally announced February 2008.
-
Modeling All Exceedances Above a Threshold Using an Extremal Dependence Structure: Inferences on Several Flood Characteristics
Authors:
Mathieu Ribatet,
Taha B. M. J. Ouarda,
Eric Sauquet,
Jean-Michel Grésillon
Abstract:
Flood quantile estimation is of great importance for many engineering studies and policy decisions. However, practitioners must often deal with limited data, so the available information must be used optimally. In the last decades, to reduce the waste of data, inferential methodology has evolved from annual maxima modeling to peaks-over-threshold modeling. To mitigate the lack of data, peaks over a threshold are sometimes combined with additional information, mostly regional and historical. However, whatever the extra information is, the most valuable information for the practitioner is found at the target site. In this study, a model that allows inferences on the whole time series is introduced. In particular, the proposed model takes into account the dependence between successive extreme observations using an appropriate extremal dependence structure. Results show that this model leads to more accurate flood peak quantile estimates than conventional estimators. In addition, as the time dependence is taken into account, inferences on other flood characteristics can be performed. An illustration is given for flood duration. Our analysis shows that the accuracy of the proposed models in estimating flood duration is related to specific catchment characteristics. Some suggestions to improve flood duration predictions are introduced.
Submitted 4 February, 2008;
originally announced February 2008.
-
A regional Bayesian POT model for flood frequency analysis
Authors:
Mathieu Ribatet,
Eric Sauquet,
Jean-Michel Grésillon,
Taha B. M. J. Ouarda
Abstract:
Flood frequency analysis is usually based on the fitting of an extreme value distribution to the local streamflow series. However, when the local data series is short, frequency analysis results become unreliable. Regional frequency analysis is a convenient way to reduce the estimation uncertainty. In this work, we propose a regional Bayesian model for short record length sites. This model is less restrictive than the index flood model while preserving the formalism of "homogeneous regions". The performance of the proposed model is assessed on a set of gauging stations in France. The accuracy of quantile estimates as a function of the degree of homogeneity of the pooling group is also analysed. The results indicate that the regional Bayesian model outperforms the index flood model and local estimators. Furthermore, it seems that working with relatively large and homogeneous regions may lead to more accurate results than working with smaller and highly homogeneous regions.
Submitted 4 February, 2008;
originally announced February 2008.