-
Improving Genomic Prediction using High-dimensional Secondary Phenotypes: the Genetic Latent Factor Approach
Authors:
Killian A. C. Melsen,
Jonathan F. Kunst,
José Crossa,
Margaret R. Krause,
Fred A. van Eeuwijk,
Willem Kruijer,
Carel F. W. Peeters
Abstract:
Decreasing costs and new technologies have led to an increase in the amount of data available to plant breeding programs. High-throughput phenotyping (HTP) platforms routinely generate high-dimensional datasets of secondary features that may be used to improve genomic prediction accuracy. However, integrating these data comes with challenges such as multicollinearity, parameter estimation in $p > n$ settings, and the computational complexity of many standard approaches. Several methods have emerged to analyze such data, but interpretation of model parameters often remains challenging. We propose genetic latent factor best linear unbiased prediction (glfBLUP), a prediction pipeline that reduces the dimensionality of the original secondary HTP data using generative factor analysis. In short, glfBLUP uses redundancy-filtered and regularized genetic and residual correlation matrices to fit a maximum likelihood factor model and estimate genetic latent factor scores. These latent factors are subsequently used in multi-trait genomic prediction. Our approach performs better than alternatives in extensive simulations and a real-world application, while producing easily interpretable and biologically relevant parameters. We discuss several possible extensions and highlight glfBLUP as the basis for a flexible and modular multi-trait genomic prediction framework.
Submitted 4 March, 2025; v1 submitted 19 August, 2024;
originally announced August 2024.
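The glfBLUP abstract above refers to regularized correlation matrices in $p > n$ settings. The abstract does not say which regularizer is used, so the following is only a minimal sketch under an assumption: it applies simple linear shrinkage toward the identity matrix, and the function name `shrink_correlation` and the penalty value `lam` are hypothetical. It illustrates why some form of regularization is needed at all: with more features than samples, the sample correlation matrix is singular, which is a problem for likelihood-based factor models.

```python
import numpy as np

def shrink_correlation(R, lam):
    """Linear shrinkage of a correlation matrix toward the identity.

    ASSUMPTION: this is a generic ridge-type regularizer chosen for
    illustration; it is not claimed to be the one used by glfBLUP.
    """
    p = R.shape[0]
    return (1.0 - lam) * R + lam * np.eye(p)

rng = np.random.default_rng(1)
n, p = 20, 50                       # fewer samples than features (p > n)
X = rng.normal(size=(n, p))         # simulated secondary features
R = np.corrcoef(X, rowvar=False)    # p x p sample correlation matrix

R_reg = shrink_correlation(R, lam=0.2)

# Smallest eigenvalues before and after shrinkage.
min_eig_raw = np.linalg.eigvalsh(R).min()
min_eig_reg = np.linalg.eigvalsh(R_reg).min()
print(f"smallest eigenvalue, raw:         {min_eig_raw:.2e}")
print(f"smallest eigenvalue, regularized: {min_eig_reg:.2e}")
```

The raw matrix has rank at most n - 1, so its smallest eigenvalue is numerically zero. After shrinkage every eigenvalue is bounded below by roughly `lam`, so the matrix is positive definite while keeping a unit diagonal.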
-
Misspecification in mixed-model based association analysis
Authors:
Willem Kruijer
Abstract:
Additive genetic variance in natural populations is commonly estimated using mixed models, in which the covariance of the genetic effects is modeled by a genetic similarity matrix derived from a dense set of markers. An important but usually implicit assumption is that the presence of any non-additive genetic effect only increases the residual variance and does not affect estimates of additive genetic variance. Here we show that this is only true for panels of unrelated individuals. When individuals are genetically related, the combination of population structure and epistatic interactions can lead to inflated estimates of additive genetic variance.
Submitted 8 September, 2015;
originally announced September 2015.
-
Marker-based estimation of heritability in immortal populations
Authors:
Willem Kruijer,
Martin Boer,
Marcos Malosetti,
Padraic J. Flood,
Bas Engel,
Rik Kooke,
Joost Keurentjes,
Fred van Eeuwijk
Abstract:
Heritability is a central parameter in quantitative genetics, both from an evolutionary and a breeding perspective. For plant traits, heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for differences in genetic relatedness. With the availability of high-density markers, there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. The alternative we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data, we investigate the feasibility of both approaches and how they affect genomic prediction with G-BLUP and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For GWAS on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.
Submitted 16 February, 2015; v1 submitted 21 December, 2014;
originally announced December 2014.
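The heritability abstract above describes the traditional estimate of broad-sense heritability from within- and between-genotype variability. As a minimal sketch of that classical idea (not the marker-based mixed-model method the paper proposes), the following computes the one-way ANOVA estimator on a balanced design with replicated genotypes. The function name `broad_sense_heritability` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def broad_sense_heritability(y):
    """Plot-level broad-sense heritability from a balanced one-way ANOVA.

    y has shape (n_genotypes, n_replicates). This is the classical
    variance-component estimator; it ignores genetic relatedness.
    """
    y = np.asarray(y, dtype=float)
    g, r = y.shape
    genotype_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Mean squares between and within genotypes.
    ms_between = r * np.sum((genotype_means - grand_mean) ** 2) / (g - 1)
    ms_within = np.sum((y - genotype_means[:, None]) ** 2) / (g * (r - 1))
    # Method-of-moments variance components, truncated at zero.
    var_g = max((ms_between - ms_within) / r, 0.0)
    var_e = ms_within
    return var_g / (var_g + var_e)

# Simulated trait (ASSUMPTION: genetic and residual variance both 1,
# so the true plot-level heritability is 0.5).
rng = np.random.default_rng(0)
genetic_effects = rng.normal(0.0, 1.0, size=200)
y = genetic_effects[:, None] + rng.normal(0.0, 1.0, size=(200, 4))

h2 = broad_sense_heritability(y)
print(f"estimated broad-sense heritability: {h2:.2f}")
```

Because the estimator uses the replicate-level observations directly, it needs at least two replicates per genotype; averaging to genotypic means first would discard the within-genotype information that it relies on.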