-
Testing the Missing Completely at Random Assumption for Functional Data
Authors:
Maximilian Ofner,
Siegfried Hörmann,
David Kraus,
Dominik Liebl
Abstract:
We consider functional data which have only been observed on a subset of their domain. This paper aims to develop statistical tests to determine whether the function and the domain over which it is observed are independent. The assumption that data is missing completely at random (MCAR) is essential for many functional data methods handling incomplete observations. However, no general testing procedures have been established to validate this assumption. We address this critical gap by introducing such tests, along with their asymptotic theory and real data applications.
Submitted 13 May, 2025;
originally announced May 2025.
-
Vine copula based post-processing of ensemble forecasts for temperature
Authors:
Annette Möller,
Ludovica Spazzini,
Daniel Kraus,
Thomas Nagler,
Claudia Czado
Abstract:
Today, weather forecasting is conducted using numerical weather prediction (NWP) models, consisting of a set of differential equations describing the dynamics of the atmosphere. The outputs of such NWP models are single deterministic forecasts of future atmospheric states. To assess uncertainty in NWP forecasts, so-called forecast ensembles are utilized. They are generated by employing an NWP model for distinct variants. However, as forecast ensembles are not able to capture the full amount of uncertainty in an NWP model, they often exhibit biases and dispersion errors. Therefore, it has become common practice to employ statistical post-processing models which correct for biases and improve calibration. We propose a novel post-processing approach based on D-vine copulas, representing the predictive distribution by its quantiles. These models allow for much more general dependence structures than the state-of-the-art EMOS model and are highly data adapted. Our D-vine quantile regression approach shows excellent predictive performance in comparative studies of temperature forecasts over Europe with different forecast horizons based on the 52-member ensemble of the European Centre for Medium-Range Weather Forecasts (ECMWF). Specifically for larger forecast horizons, the method clearly improves over the benchmark EMOS model.
Submitted 6 November, 2018;
originally announced November 2018.
-
Classification of functional fragments by regularized linear classifiers with domain selection
Authors:
David Kraus,
Marco Stefanucci
Abstract:
We consider the problem of classification of functional data into two groups by linear classifiers based on one-dimensional projections of functions. We reformulate the task of finding the best classifier as an optimization problem and solve it by regularization techniques, namely the conjugate gradient method with early stopping, the principal component method and the ridge method. We study the empirical version with finite training samples consisting of incomplete functions observed on different subsets of the domain and show that the optimal, possibly zero, misclassification probability can be achieved in the limit along a possibly non-convergent empirical regularization path. Being able to work with fragmentary training data, we propose a domain extension and selection procedure that finds the best domain beyond the common observation domain of all curves. In a simulation study, we compare the different regularization methods and investigate the performance of domain selection. Our methodology is illustrated on a medical data set, where we observe a substantial improvement of classification accuracy due to domain extension.
Submitted 28 August, 2017;
originally announced August 2017.
-
D-vine quantile regression with discrete variables
Authors:
Niklas Schallhorn,
Daniel Kraus,
Thomas Nagler,
Claudia Czado
Abstract:
Quantile regression, the prediction of conditional quantiles, finds applications in various fields. Often, some or all of the variables are discrete. The authors propose two new quantile regression approaches to handle such mixed discrete-continuous data. Both of them generalize the continuous D-vine quantile regression, where the dependence between the response and the covariates is modeled by a parametric D-vine. D-vine quantile regression provides very flexible models that enable accurate and fast predictions. Moreover, it automatically takes care of major issues of classical quantile regression, such as quantile crossing and interactions between the covariates. The first approach keeps the parametric estimation of the D-vines, but modifies the formulas to account for the discreteness. The second approach uses continuous convolution to make the discrete variables continuous and then estimates the D-vine nonparametrically. A simulation study is presented examining the scenarios in which the discrete-continuous D-vine quantile regression can provide superior prediction abilities. Lastly, the functionality of the two introduced methods is demonstrated by a real-world example predicting the number of bike rentals.
Submitted 23 May, 2017;
originally announced May 2017.
-
Stress Testing German Industry Sectors: Results from a Vine Copula Based Quantile Regression
Authors:
Matthias Fischer,
Daniel Kraus,
Marius Pfeuffer,
Claudia Czado
Abstract:
Measuring interdependence between probabilities of default (PDs) in different industry sectors of an economy plays a crucial role in financial stress testing. To this end, regression approaches may be employed to model the impact of stressed industry sectors as covariates on other response sectors. We identify vine copula based quantile regression as a suitable tool for conducting such stress tests, as this method has good robustness properties, takes into account potential nonlinearities of conditional quantile functions and ensures that no quantile crossing effects occur. We illustrate its performance on a data set of sector-specific PDs for the German economy. Empirical results are provided for a rough and a fine-grained industry sector classification scheme. Among other findings, we confirm that a stressed automobile industry has a severe impact on the German economy as a whole at different quantile levels, whereas, e.g., for a stressed financial sector the impact is rather moderate. Moreover, the vine copula based quantile regression approach is benchmarked against both classical linear quantile regression and expectile regression in order to illustrate its methodological effectiveness in the scenarios evaluated.
Submitted 12 April, 2017; v1 submitted 4 April, 2017;
originally announced April 2017.
-
Growing simplified vine copula trees: improving Dißmann's algorithm
Authors:
Daniel Kraus,
Claudia Czado
Abstract:
Vine copulas are pair-copula constructions enabling multivariate dependence modeling in terms of bivariate building blocks. One of the main tasks of fitting a vine copula is the selection of a suitable tree structure. For this the prevalent method is a heuristic called Dißmann's algorithm. It sequentially constructs the vine's trees by maximizing dependence at each tree level, where dependence is measured in terms of absolute Kendall's $\tau$. However, the algorithm disregards any implications of the tree structure on the simplifying assumption that is usually made for vine copulas to keep inference tractable. We develop two new algorithms that select tree structures focused on producing simplified vine copulas for which the simplifying assumption is violated as little as possible. For this we make use of a recently developed statistical test of the simplifying assumption. In a simulation study we show that our proposed methods outperform the benchmark given by Dißmann's algorithm by a great margin. Several real data applications emphasize their practical relevance.
Submitted 15 March, 2017;
originally announced March 2017.
-
Using model distances to investigate the simplifying assumption, model selection and truncation levels for vine copulas
Authors:
Matthias Killiches,
Daniel Kraus,
Claudia Czado
Abstract:
Vine copulas are a useful statistical tool to describe the dependence structure between several random variables, especially when the number of variables is very large. When modeling data with vine copulas, one is often confronted with a set of candidate models out of which the best one is supposed to be selected. For example, this may arise in the context of non-simplified vine copulas, truncations of vines and other simplifications regarding pair-copula families or the vine structure. With the help of distance measures, we develop a parametric bootstrap based testing procedure to decide between copulas from nested model classes. In addition, we use distance measures to select among different candidate models. All commonly used distance measures, e.g. the Kullback-Leibler distance, suffer from the curse of dimensionality due to high-dimensional integrals. As a remedy for this problem, Killiches, Kraus and Czado (2017) propose several modifications of the Kullback-Leibler distance. We apply these distance measures to the above-mentioned model selection problems and substantiate their usefulness.
Submitted 9 May, 2017; v1 submitted 27 October, 2016;
originally announced October 2016.
-
Examination and visualisation of the simplifying assumption for vine copulas in three dimensions
Authors:
Matthias Killiches,
Daniel Kraus,
Claudia Czado
Abstract:
Vine copulas are a highly flexible class of dependence models, which are based on the decomposition of the density into bivariate building blocks. For applications, one usually makes the simplifying assumption that copulas of conditional distributions are independent of the variables on which they are conditioned. However, this assumption has been criticised for being too restrictive. We examine both simplified and non-simplified vine copulas in three dimensions and investigate conceptual differences. We show and compare contour surfaces of three-dimensional vine copula models, which prove to be much more informative than the contour lines of the bivariate marginals. Our investigation shows that non-simplified vine copulas can exhibit arbitrarily irregular shapes, whereas simplified vine copulas appear to be smooth extrapolations of their bivariate margins to three dimensions. In addition to a variety of constructed examples, we also investigate a three-dimensional subset of the well-known uranium data set and visually detect that a non-simplified vine copula is necessary to capture its complex dependence structure.
Submitted 28 October, 2016; v1 submitted 18 February, 2016;
originally announced February 2016.
-
D-vine copula based quantile regression
Authors:
Daniel Kraus,
Claudia Czado
Abstract:
Quantile regression, that is, the prediction of conditional quantiles, has steadily gained importance in statistical modeling and financial applications. The authors introduce a new semiparametric quantile regression method based on sequentially fitting a likelihood-optimal D-vine copula to given data, resulting in highly flexible models with easily extractable conditional quantiles. As a subclass of regular vine copulas, D-vines enable the modeling of multivariate copulas in terms of bivariate building blocks, a so-called pair-copula construction (PCC). The proposed algorithm is fast and accurate even in high dimensions and incorporates an automatic variable selection by maximizing the conditional log-likelihood. Further, typical issues of quantile regression such as quantile crossing or transformations, interactions and collinearity of variables are automatically taken care of. In a simulation study, the improved accuracy and reduced computational time of the approach in comparison with established quantile regression methods are highlighted. An extensive financial application to international credit default swap (CDS) data including stress testing and Value-at-Risk (VaR) prediction demonstrates the usefulness of the proposed method.
Submitted 16 November, 2016; v1 submitted 14 October, 2015;
originally announced October 2015.
-
Model distances for vine copulas in high dimensions
Authors:
Matthias Killiches,
Daniel Kraus,
Claudia Czado
Abstract:
Vine copulas are a flexible class of dependence models consisting of bivariate building blocks and have proven to be particularly useful in high dimensions. Classical model distance measures require multivariate integration and thus suffer from the curse of dimensionality. In this paper, we provide numerically tractable methods to measure the distance between two vine copulas even in high dimensions. For this purpose, we consecutively develop three new distance measures based on the Kullback-Leibler distance, using the result that it can be expressed as the sum over expectations of KL distances between univariate conditional densities, which can be easily obtained for vine copulas. To reduce numerical calculations, we approximate these expectations on adequately designed grids, outperforming Monte Carlo integration with respect to computational time. In numerous examples and applications, we illustrate the strengths and weaknesses of the developed distance measures.
Submitted 21 April, 2016; v1 submitted 13 October, 2015;
originally announced October 2015.