-
Can Generalized Extreme Value Model Fit the Real Stocks
Authors:
Sen Lin,
Ao Kong,
Robert Azencott
Abstract:
The Generalized Extreme Value (GEV) distribution plays a critical role in risk assessment across various domains, such as hydrology, climate science, and finance. In this study, we investigate its application in analyzing intraday trading risks within the Chinese stock market, focusing on abrupt price movements influenced by unique trading regulations. To address limitations of traditional GEV parameter estimators, we leverage recently developed robust and asymptotically normal estimators, enabling accurate modeling of extreme intraday price fluctuations. We introduce two risk indicators: the mean risk level (mEVI) and a Stability Indicator (STI) to evaluate the stability of the shape parameter over time. Using data from 261 Chinese and 32 U.S. stocks (2015-2017), we find that Chinese stocks exhibit higher mEVI, corresponding to greater tail risk, while maintaining high model stability. Additionally, we show that Value at Risk (VaR) estimates derived from our GEV models outperform traditional GP and normal-based VaR methods in terms of variance and portfolio optimization. These findings underscore the versatility and efficiency of GEV modeling for intraday risk management and portfolio strategies.
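As a rough illustration of how a VaR-type figure can be read off a fitted GEV model, the sketch below implements the standard GEV distribution function and its inverse (the quantile function). It is not the authors' estimator or their VaR definition: the function names `gev_cdf` and `gev_quantile`, the parameter values, and the choice to treat the p-quantile of block maxima as the risk figure are all assumptions made for this example.

```python
import math

def gev_cdf(x, xi, sigma, mu):
    """Standard GEV distribution function with shape xi, scale sigma > 0, location mu."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit (xi -> 0)
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # x lies outside the support: below it when xi > 0, above it when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_quantile(p, xi, sigma, mu):
    """Inverse of gev_cdf for a probability level 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi

# Hypothetical parameter values, chosen only to show the call pattern.
xi, sigma, mu = 0.3, 1.0, 0.0
level_99 = gev_quantile(0.99, xi, sigma, mu)  # 99% quantile of the modeled block maximum
```

A larger shape parameter gives a heavier upper tail, so for fixed scale and location the high quantiles grow with the shape parameter, which is the sense in which a higher shape value corresponds to greater tail risk.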
Submitted 9 December, 2024;
originally announced December 2024.
-
Multi-Quantile Estimators for the parameters of Generalized Extreme Value distribution
Authors:
Sen Lin,
Ao Kong,
Robert Azencott
Abstract:
We introduce and study Multi-Quantile estimators for the parameters $(ξ, σ, μ)$ of Generalized Extreme Value (GEV) distributions to provide a robust approach to extreme value modeling. Unlike classical estimators, such as the Maximum Likelihood Estimator (MLE) and the Probability Weighted Moments (PWM) estimator, which impose strict constraints on the shape parameter $ξ$, our estimators are always asymptotically normal and consistent across all values of the GEV parameters. The asymptotic variances of our estimators decrease as the number of quantiles increases and can approach the Cramér-Rao lower bound very closely whenever it exists. Our Multi-Quantile estimators thus offer a more flexible and efficient alternative for practical applications. We also discuss how they can be implemented in the context of the Block Maxima method.
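The Block Maxima method mentioned in the abstract starts by cutting a series into consecutive blocks and keeping the largest value in each; a GEV model is then fitted to those maxima. The sketch below shows only that preprocessing step. The helper name `block_maxima`, the choice to discard a trailing incomplete block, and the sample data are assumptions for illustration, not details taken from the paper.

```python
def block_maxima(series, block_size):
    """Return the maximum of each consecutive, non-overlapping block.

    A trailing block shorter than block_size is dropped (an assumption of this sketch).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(series) // block_size
    return [
        max(series[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]

# Made-up observations, blocks of three values each.
maxima = block_maxima([1, 5, 2, 7, 3, 4, 9], 3)
```

The block size trades bias against variance: larger blocks make the GEV approximation more plausible but leave fewer maxima to estimate from.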
Submitted 27 February, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
Participation bias in the estimation of heritability and genetic correlation
Authors:
Shuang Song,
Stefania Benonisdottir,
Jun S. Liu,
Augustine Kong
Abstract:
It is increasingly recognized that participation bias can pose problems for genetic studies. Recently, to overcome the challenge that genetic information of non-participants is unavailable, it has been shown that by comparing the IBD (identity by descent) shared and not-shared segments among the participants, one can estimate the genetic component underlying participation. That, however, does not directly address how to adjust estimates of heritability and genetic correlation for phenotypes correlated with participation. Here, for phenotypes whose mean differences between population and sample are known, we demonstrate a way to do so by adopting a statistical framework that separates out the genetic and non-genetic correlations between participation and these phenotypes. Crucially, our method avoids making the assumption that the effect of the genetic component underlying participation is manifested entirely through these other phenotypes. Applying the method to 12 UK Biobank phenotypes, we found that 8 have significant genetic correlations with participation, including body mass index, educational attainment, and smoking status. For most of these phenotypes, without adjustments, estimates of heritability and the absolute value of genetic correlation would have underestimation biases.
Submitted 20 November, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Markov Random Fields and Mass Spectra Discrimination
Authors:
Ao Kong,
Robert Azencott
Abstract:
For mass spectra acquired from cancer patients by MALDI or SELDI techniques, automated discrimination between cancer types or stages has often been implemented by machine learning methods. These techniques typically generate "black-box" classifiers, which are difficult to interpret biologically. We develop new and efficient signature discovery algorithms leading to interpretable signatures combining the discriminating power of explicitly selected small groups of biomarkers, identified by their m/z ratios. Our approach is based on rigorous stochastic modeling of "homogeneous" datasets of mass spectra by a versatile class of parameterized Markov Random Fields. We present detailed algorithms validated by precise theoretical results. We also outline the successful tests of our approach to generate efficient explicit signatures for six benchmark discrimination tasks, based on mass spectra acquired from colorectal cancer patients, as well as from ovarian cancer patients.
Submitted 13 October, 2014;
originally announced October 2014.
-
Rejoinder: Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies
Authors:
Dan L. Nicolae,
Xiao-Li Meng,
Augustine Kong
Abstract:
Rejoinder to "Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies" [arXiv:1102.2774]
Submitted 15 February, 2011;
originally announced February 2011.
-
Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies
Authors:
Dan L. Nicolae,
Xiao-Li Meng,
Augustine Kong
Abstract:
Many practical studies rely on hypothesis testing procedures applied to data sets with missing information. An important part of the analysis is to determine the impact of the missing data on the performance of the test, and this can be done by properly quantifying the relative (to complete data) amount of available information. The problem is directly motivated by applications to studies, such as linkage analyses and haplotype-based association projects, designed to identify genetic contributions to complex diseases. In the genetic studies the relative information measures are needed for the experimental design, technology comparison, interpretation of the data, and for understanding the behavior of some of the inference tools. The central difficulties in constructing such information measures arise from the multiple, and sometimes conflicting, aims in practice. For large samples, we show that a satisfactory, likelihood-based general solution exists by using appropriate forms of the relative Kullback--Leibler information, and that the proposed measures are computationally inexpensive given the maximized likelihoods with the observed data. Two measures are introduced, under the null and alternative hypothesis respectively. We exemplify the measures on data coming from mapping studies on the inflammatory bowel disease and diabetes. For small-sample problems, which appear rather frequently in practice and sometimes in disguised forms (e.g., measuring individual contributions to a large study), the robust Bayesian approach holds great promise, though the choice of a general-purpose "default prior" is a very challenging problem.
Submitted 14 February, 2011;
originally announced February 2011.
-
Unsupervised empirical Bayesian multiple testing with external covariates
Authors:
Egil Ferkingstad,
Arnoldo Frigessi,
Håvard Rue,
Gudmar Thorleifsson,
Augustine Kong
Abstract:
In an empirical Bayesian setting, we provide a new multiple testing method, useful when an additional covariate is available, that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a list of significant hypotheses which differs in length and order from the list obtained by methods not taking covariate-information into account. Covariate-modulated posterior probabilities of each null hypothesis are estimated using a fast approximate algorithm. The new method is applied to expression quantitative trait loci (eQTL) data.
Submitted 29 July, 2008;
originally announced July 2008.