-
REML Implementations of Kernel-based Multi-environment Genomic Prediction Models
Authors:
Killian A. C. Melsen,
Salvador Gezan,
Fred van Eeuwijk,
Carel F. W. Peeters
Abstract:
High-throughput pheno-, geno-, and envirotyping allows routine characterization of plant varieties and the trials they are evaluated in. These datasets can be integrated into statistical models for genomic prediction in several ways. One approach is to create linear or non-linear kernels, which are subsequently used in reproducing kernel Hilbert space (RKHS) regression. Software packages implementing a Bayesian approach are typically used for these RKHS models. However, they often lack some of the flexibility offered by dedicated linear mixed model software such as ASReml-R. Furthermore, a Bayesian approach is often computationally more demanding than a frequentist one. Here we show how frequentist RKHS models can be implemented in ASReml-R and extend these models to allow for heterogeneous (i.e., trial-specific) genetic variances. We also show how an alternative to the typically Bayesian kernel averaging approach can be implemented by treating the bandwidth associated with the non-linear kernel as a parameter to be estimated using restricted maximum likelihood (REML). We show that these REML implementations with homo- or heterogeneous variances perform similarly to or better than the Bayesian models. We also show that the REML implementation is considerably more computationally efficient, being up to 12 times faster than the Bayesian models while using less memory. Finally, we discuss the flexibility provided by this approach and the options for further customization of variance models.
Submitted 12 January, 2025;
originally announced January 2025.
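The idea of estimating the kernel bandwidth by REML, instead of averaging over several fixed bandwidths, can be illustrated with a small sketch. It assumes a single environment with homogeneous variances, a Gaussian kernel K_ij = exp(-h d_ij / median(d)) built from squared Euclidean marker distances (the median scaling is a common convention, assumed here), and a plain grid search over the bandwidth h and the variance ratio tau = sigma2_g / sigma2_e. The helper names are hypothetical; this is not the authors' ASReml-R code.

```python
import numpy as np


def gaussian_kernel(M, h):
    """K_ij = exp(-h * d_ij / median(d)), with d_ij the squared Euclidean
    distance between marker profiles i and j (median scaling is an assumption)."""
    sq = np.sum(M ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * M @ M.T, 0.0)
    np.fill_diagonal(D, 0.0)                      # exact zeros on the diagonal
    scale = np.median(D[np.triu_indices_from(D, k=1)])
    return np.exp(-h * D / scale)


def neg2_reml(ys, Xs, s, tau):
    """-2 * REML log-likelihood (up to a constant), profiled over sigma2_e,
    in the eigenbasis of K where Var(y) = sigma2_e * (tau * K + I)."""
    n, p = Xs.shape
    d = tau * s + 1.0                             # eigenvalues of V / sigma2_e
    XtVX = Xs.T @ (Xs / d[:, None])
    beta = np.linalg.solve(XtVX, Xs.T @ (ys / d))  # GLS fixed effects
    res = ys - Xs @ beta
    s2e = np.sum(res ** 2 / d) / (n - p)          # profiled residual variance
    val = (n - p) * np.log(s2e) + np.sum(np.log(d)) + np.linalg.slogdet(XtVX)[1]
    return val, s2e


def fit_bandwidth(y, X, M, h_grid, tau_grid):
    """Joint grid search over bandwidth h and variance ratio tau."""
    best = None
    for h in h_grid:
        s, U = np.linalg.eigh(gaussian_kernel(M, h))
        s = np.clip(s, 0.0, None)                 # drop tiny negative round-off
        ys, Xs = U.T @ y, U.T @ X
        for tau in tau_grid:
            val, s2e = neg2_reml(ys, Xs, s, tau)
            if best is None or val < best[0]:
                best = (val, h, tau, s2e)
    val, h, tau, s2e = best
    return {"h": h, "sigma2_g": tau * s2e, "sigma2_e": s2e, "neg2logL": val}


# Simulated example: genetic values drawn from a Gaussian kernel with h = 1.
rng = np.random.default_rng(1)
n, m = 120, 400
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
K_true = gaussian_kernel(M, 1.0)
g = np.linalg.cholesky(K_true + 1e-6 * np.eye(n)) @ rng.normal(size=n)
y = 10.0 + g + rng.normal(scale=1.0, size=n)
X = np.ones((n, 1))                               # intercept only
fit = fit_bandwidth(y, X, M, h_grid=[0.25, 0.5, 1.0, 2.0, 4.0],
                    tau_grid=np.logspace(-3, 3, 61))
print(fit)
```

Because the fixed-effects design is the same for every h, the restricted likelihoods are comparable across bandwidths. A production implementation would optimize h and the variance components jointly with a gradient-based routine rather than a grid, and would add the trial-specific variance structures described above.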
-
Improving Genomic Prediction using High-dimensional Secondary Phenotypes: the Genetic Latent Factor Approach
Authors:
Killian A. C. Melsen,
Jonathan F. Kunst,
José Crossa,
Margaret R. Krause,
Fred A. van Eeuwijk,
Willem Kruijer,
Carel F. W. Peeters
Abstract:
Decreasing costs and new technologies have led to an increase in the amount of data available to plant breeding programs. High-throughput phenotyping (HTP) platforms routinely generate high-dimensional datasets of secondary features that may be used to improve genomic prediction accuracy. However, integration of these data comes with challenges such as multicollinearity, parameter estimation in $p > n$ settings, and the computational complexity of many standard approaches. Several methods have emerged to analyze such data, but interpretation of model parameters often remains challenging. We propose genetic latent factor best linear unbiased prediction (glfBLUP), a prediction pipeline that reduces the dimensionality of the original secondary HTP data using generative factor analysis. In short, glfBLUP uses redundancy-filtered and regularized genetic and residual correlation matrices to fit a maximum likelihood factor model and estimate genetic latent factor scores. These latent factors are subsequently used in multi-trait genomic prediction. Our approach outperforms alternative methods in extensive simulations and a real-world application, while producing easily interpretable and biologically relevant parameters. We discuss several possible extensions and highlight glfBLUP as the basis for a flexible and modular multi-trait genomic prediction framework.
Submitted 4 March, 2025; v1 submitted 19 August, 2024;
originally announced August 2024.
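The first stages of a pipeline like glfBLUP (separating genetic from residual covariance of the secondary features, regularizing, then extracting a few genetic latent dimensions) can be sketched as follows. This is a simplified stand-in under stated assumptions, not the authors' method: it assumes a balanced design with r replicates per genotype, uses a MANOVA method-of-moments estimator (MS_between = r Sg + Se, MS_within = Se), applies a fixed, hypothetical shrinkage weight alpha instead of a tuned penalty, omits redundancy filtering and the residual correlation matrix, and replaces maximum likelihood factor analysis with an eigendecomposition (PCA-style loadings and scores). The multi-trait GBLUP step is not shown.

```python
import numpy as np


def genetic_residual_cov(Y, geno, r):
    """MANOVA moment estimates for a balanced design (r reps per genotype):
    Sg = (MS_between - MS_within) / r, Se = MS_within."""
    levels = np.unique(geno)
    ng = len(levels)
    means = np.vstack([Y[geno == g].mean(axis=0) for g in levels])
    dev_b = means - Y.mean(axis=0)
    msb = r * dev_b.T @ dev_b / (ng - 1)
    dev_w = Y - means[np.searchsorted(levels, geno)]
    msw = dev_w.T @ dev_w / (ng * (r - 1))
    return (msb - msw) / r, msw, means


def to_regularized_corr(S, alpha):
    """Clip negative eigenvalues (moment estimates can be indefinite),
    convert to a correlation matrix and shrink it towards the identity."""
    w, V = np.linalg.eigh(S)
    S_psd = (V * np.clip(w, 1e-8, None)) @ V.T
    sd = np.sqrt(np.diag(S_psd))
    R = S_psd / np.outer(sd, sd)
    return (1.0 - alpha) * R + alpha * np.eye(len(sd))


def leading_axes(R, k):
    """Loadings and eigenvectors of the k largest eigenvalues of R
    (a PCA-style stand-in for ML factor analysis)."""
    w, V = np.linalg.eigh(R)
    idx = np.argsort(w)[::-1][:k]
    return V[:, idx] * np.sqrt(w[idx]), V[:, idx]


# Simulated secondary features driven by k = 2 genetic latent factors.
rng = np.random.default_rng(7)
ng, r, p, k = 60, 3, 20, 2
Lam = rng.normal(size=(p, k))
F = rng.normal(size=(ng, k))                  # true genetic latent factors
geno = np.repeat(np.arange(ng), r)
Y = (F @ Lam.T)[geno] + rng.normal(size=(ng * r, p))

Sg, Se, means = genetic_residual_cov(Y, geno, r)
Rg = to_regularized_corr(Sg, alpha=0.2)       # alpha = 0.2 is arbitrary
loadings, V = leading_axes(Rg, k)
Z = (means - means.mean(axis=0)) / means.std(axis=0)
scores = Z @ V                                # genotype-level latent scores
print(loadings.shape, scores.shape)
```

The scores play the role of low-dimensional genetic covariates that could then enter a multi-trait prediction model alongside the target trait.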
-
Spatial Models for Field Trials
Authors:
María Xosé Rodríguez-Álvarez,
Martin P. Boer,
Fred A. van Eeuwijk,
Paul H. C. Eilers
Abstract:
An important aim of the analysis of agricultural field trials is to obtain good predictions for genotypic performance by correcting for spatial effects. In practice, these corrections turn out to be complicated, since there can be different types of spatial effects: those due to management interventions applied to the field plots and those due to various kinds of erratic spatial trends. This paper presents models for field trials in which the random spatial component consists of tensor-product penalized splines (P-splines). A special ANOVA-type reformulation leads to five smooth additive spatial components, which form the basis of a mixed model with five unknown variance components. On top of this spatial field, effects of genotypes, blocks, replicates, and/or other sources of spatial variation are described by a mixed model in the standard way. We show the relation between several definitions of heritability and the effective dimension, or effective degrees of freedom, associated with the genetic component. The approach is illustrated with large-scale field trial experiments. An R package is provided.
Submitted 27 July, 2016;
originally announced July 2016.
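The link between the effective dimension of the genetic component and heritability can be seen in a deliberately stripped-down model: y = Z g + e with g ~ N(0, sigma2_g I), e ~ N(0, sigma2_e I), no spatial P-spline terms and no fixed effects (all assumptions of this sketch, not the paper's full model). The effective dimension is ED = trace(Z (Z'Z + lambda I)^{-1} Z') with lambda = sigma2_e / sigma2_g. With r replicates per genotype, each genotype contributes r / (r + lambda) = sigma2_g / (sigma2_g + sigma2_e / r), which is the usual line-mean heritability; with unequal replication, ED / m averages these per-genotype contributions.

```python
import numpy as np


def effective_dimension(Z, lam):
    """ED = trace(Z (Z'Z + lam I)^{-1} Z') = trace((Z'Z + lam I)^{-1} Z'Z)
    for a random effect with shrinkage ratio lam = sigma2_e / sigma2_g."""
    ZtZ = Z.T @ Z
    return np.trace(np.linalg.solve(ZtZ + lam * np.eye(Z.shape[1]), ZtZ))


sigma2_g, sigma2_e = 1.0, 3.0
lam = sigma2_e / sigma2_g

# Balanced case: m genotypes, each observed r times.
m, r = 50, 4
Z = np.kron(np.eye(m), np.ones((r, 1)))        # plot-to-genotype incidence
ed = effective_dimension(Z, lam)
h2_line_mean = sigma2_g / (sigma2_g + sigma2_e / r)
print(ed / m, h2_line_mean)                    # both equal 4/7

# Unbalanced case: ED / m is the mean of r_i / (r_i + lam).
reps = np.array([1, 2, 4, 8] * 5)
Zu = np.zeros((reps.sum(), len(reps)))
Zu[np.arange(reps.sum()), np.repeat(np.arange(len(reps)), reps)] = 1.0
ed_u = effective_dimension(Zu, lam)
print(ed_u / len(reps))
```

Adding an intercept or spatial terms changes the trace (for example, an intercept absorbs one dimension of the genotype effects), so this toy identity should not be read as the paper's exact heritability definition.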
-
Marker-based estimation of heritability in immortal populations
Authors:
Willem Kruijer,
Martin Boer,
Marcos Malosetti,
Padraic J. Flood,
Bas Engel,
Rik Kooke,
Joost Keurentjes,
Fred van Eeuwijk
Abstract:
Heritability is a central parameter in quantitative genetics, from both an evolutionary and a breeding perspective. For plant traits, heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for differences in genetic relatedness. With the availability of high-density markers, there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. The alternative we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data, we investigate the feasibility of both approaches and how they affect genomic prediction with G-BLUP and genome-wide association studies (GWAS). Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits, standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For GWAS on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.
Submitted 16 February, 2015; v1 submitted 21 December, 2014;
originally announced December 2014.
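The two approaches compared above (a marker-based mixed model fitted to genotypic means versus one fitted at the plot level) can be contrasted in a small sketch. Assumptions of this sketch, not taken from the paper: a balanced design with r replicates per genotype, a kinship matrix K = W W' / (number of markers) from column-standardized markers, REML by grid search over tau = sigma2_g / sigma2_e, and back-scaling of the means-level residual by r (Var of a genotype mean's error = sigma2_e / r). Standard errors, which the paper uses to compare the approaches, are not computed here.

```python
import numpy as np


def kinship(M):
    """Kinship from column-standardized (polymorphic) markers."""
    keep = M.std(axis=0) > 0
    W = (M[:, keep] - M[:, keep].mean(axis=0)) / M[:, keep].std(axis=0)
    return W @ W.T / W.shape[1], W


def reml_grid(y, X, G, tau_grid):
    """Grid-search REML for y = X b + g + e, Var(g) = tau*s2*G, Var(e) = s2*I.
    Returns (sigma2_g, sigma2_e) at the best tau."""
    s, U = np.linalg.eigh(G)
    s = np.clip(s, 0.0, None)
    ys, Xs = U.T @ y, U.T @ X
    n, p = X.shape
    best = None
    for tau in tau_grid:
        d = tau * s + 1.0
        XtVX = Xs.T @ (Xs / d[:, None])
        beta = np.linalg.solve(XtVX, Xs.T @ (ys / d))
        res = ys - Xs @ beta
        s2 = np.sum(res ** 2 / d) / (n - p)       # profiled residual variance
        obj = (n - p) * np.log(s2) + np.sum(np.log(d)) + np.linalg.slogdet(XtVX)[1]
        if best is None or obj < best[0]:
            best = (obj, tau * s2, s2)
    return best[1], best[2]


rng = np.random.default_rng(3)
ng, r, nm = 100, 4, 500
sigma2_g, sigma2_e = 1.0, 2.0                     # true plot-level h2 = 1/3
M = rng.binomial(2, 0.3, size=(ng, nm)).astype(float)
K, W = kinship(M)
u = W @ rng.normal(scale=np.sqrt(sigma2_g / W.shape[1]), size=W.shape[1])
geno = np.repeat(np.arange(ng), r)
y = 5.0 + u[geno] + rng.normal(scale=np.sqrt(sigma2_e), size=ng * r)
taus = np.logspace(-3, 3, 121)

# (a) plot-level model: Var(y) = sigma2_g * Z K Z' + sigma2_e * I
g_plot, e_plot = reml_grid(y, np.ones((ng * r, 1)), K[np.ix_(geno, geno)], taus)
h2_plot = g_plot / (g_plot + e_plot)

# (b) genotypic-means model: one value per genotype
ybar = y.reshape(ng, r).mean(axis=1)
g_mean, e_mean = reml_grid(ybar, np.ones((ng, 1)), K, taus)
h2_mean = g_mean / (g_mean + r * e_mean)          # rescaled to the plot level
print(round(h2_plot, 3), round(h2_mean, 3))
```

Repeating the simulation over many seeds and comparing the spread of h2_plot and h2_mean is one simple way to see the difference in precision that the abstract reports.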