Jinfeng Zhang, Harsimardeep S. Gill, Navreet K. Brar, Jyotirmoy Halder, Shaukat Ali, Xiaotian Liu, Amy Bernardo, Paul St. Amand, Guihua Bai, Upinder S. Gill, Brent Turnipseed, Sunish K. Sehgal*
a Department of Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD 57007, USA
b School of Computing, Queen’s University, Kingston, ON K7L 3N6, Canada
c USDA-ARS, Hard Winter Wheat Genetics Research Unit, Manhattan, KS 66506, USA
d Department of Plant Pathology, North Dakota State University, Fargo, ND 58108, USA
Keywords: FHB; GBS; Genomic selection; Multi-trait models; Winter wheat; Wheat scab
ABSTRACT Fusarium head blight (FHB), also known as scab, is a devastating fungal disease of wheat that causes significant losses in grain yield and quality. Quantitative inheritance and cumbersome phenotyping make FHB resistance a challenging trait for direct selection in wheat breeding. Genomic selection to predict FHB resistance traits has shown promise in several studies. Here, we used univariate and multivariate genomic prediction models to evaluate the prediction accuracy (PA) for different FHB traits using 476 elite and advanced breeding lines developed by the South Dakota State University hard winter wheat breeding program. These breeding lines were assessed for FHB disease index (DIS) and percentage of Fusarium-damaged kernels (FDK) in three FHB nurseries in 2018, 2019, and 2020 (TP18, TP19, and TP20) and were evaluated as training populations (TP) for genomic prediction (GP) of FHB traits. We observed a moderate PA using univariate models for DIS (0.39 and 0.35) and FDK (0.35 and 0.37) using TP19 and TP20, respectively, while a slightly higher PA was observed (0.41 for DIS and 0.38 for FDK) when TP19 and TP20 were combined (TP19+20) to leverage the advantage of a larger training population. Although GP with a multivariate approach including plant height (PH) and days to heading (DTH) as covariates did not significantly improve PA for DIS and FDK over univariate models, PA for deoxynivalenol (DON) increased by 20% when DIS, FDK, and DTH were used as covariates in a multi-trait model in 2020. Finally, we used TP19, TP20, and TP19+20 in forward prediction to calculate genomic-estimated breeding values (GEBVs) for DIS and FDK in preliminary breeding lines at an early stage of the breeding program. We observed moderate PA of up to 0.59 for DIS and 0.54 for FDK, demonstrating the promise of genomic prediction for FHB resistance in earlier stages using advanced lines. Our results suggest that GP for expensive FHB traits like DON and FDK can facilitate the rejection of highly susceptible materials at an early stage in a breeding program.
Several fungal pathogens continuously constrain global wheat production and food security.Fusarium head blight(FHB),also known as scab,is a devastating fungal disease of wheat that causes significant losses in grain yield and quality[1].FHB is expanding its horizons throughout major wheat-producing areas due to climate change,an increased wheat acreage under no-till cultivation,and adoption of maize-wheat rotations[2,3].Though several Fusarium species can cause FHB,Fusarium graminearum is the prominent pathogen for FHB in the United States,Canada,China,and some European countries[4].In addition to grain yield losses,FHB results in reduced quality due to contamination by mycotoxins,primarily deoxynivalenol(DON),posing serious health consequences to humans and animals if ingested in certain quantities[4,5].In 2014,the revenue losses for hard wheat due to FHB were estimated to be around $600 million in the Great Plains region of the US[6].
Even though fungicides are frequently used to reduce FHB damage, the use of resistant varieties is considered the most effective and economical approach to combat diseases like FHB [7,8]. Being quantitative in nature, resistance to FHB is governed by multiple quantitative trait loci (QTL) and is highly influenced by changing environments. Resistance to FHB has been categorized as type I (resistance to initial infection), type II (resistance to spread of FHB symptoms within a spike), or type III (low mycotoxin accumulation) [9]. However, type II resistance is more stable and more widely used in breeding programs than type I and type III resistance [7,9]. Conventional QTL mapping and genome-wide association studies (GWAS) have been used to dissect the genetic basis of FHB resistance, and a large number of QTL have been identified across all the wheat chromosomes, including seven named QTL, Fhb1 to Fhb7 [10-13]. However, only a handful of QTL have a major effect on type II resistance and have been effectively utilized in wheat breeding globally, in particular Fhb1 [9,14]. In the US hard winter wheat region, most of the variation in FHB resistance comes from native sources, including the cultivars ‘Everest’, ‘Overland’, ‘Lyman’, and ‘Expedition’. Furthermore, phenotypic selection for FHB resistance in both field and greenhouse is complicated, and some measurements, such as DON content, can only be obtained after harvest and are costly. Thus, genomic selection (GS) could be a promising approach to improve FHB resistance in wheat with reduced phenotyping effort and cost.
GS is an approach that employs linkage disequilibrium (LD) and estimates the genetic worth of an individual using genome-wide markers [15,16]. GS addresses the primary limitation of QTL mapping and marker-assisted selection by using a joint estimate of all marker effects [17]. Thus, GS is useful for predicting and selecting complex traits controlled by several minor QTL that are difficult to map using QTL mapping [18]. The genomic prediction (GP) models are developed by using genotypic and phenotypic data in a training population (TP) to predict the genomic-estimated breeding value (GEBV) of individuals in the breeding population (BP) [16]. GS has shown immense potential in plant breeding, and several studies have reported successful implementation of these strategies in different crops in recent years [19-21]. GS is particularly useful for traits where phenotyping is cumbersome or costly [22,23].
Predictive ability of the GS model refers to the correlation between estimated GEBVs and the actual phenotypic values of the individuals in the validation set.The PA mainly depends on heritability of the traits,TP nature and size,and choice and optimization of the statistical models[23,24].Several studies have evaluated the GP models for predicting different FHB traits including disease index(DIS),fusarium damaged kernels(FDK),and DON in wheat.Most of these studies employed a cross-validation approach to evaluate the PA of GP in spring and soft winter wheat using various prediction models and strategies[25-30].However,the focus of these studies was limited to improving the PA within the TP being evaluated,rather than validating the improved models in forward prediction.Unlike other traits,such as yield,only a few studies reported the implementation of GP to select preliminary breeding lines for FHB resistance[31,32].So far,the potential of GS in improving the FHB resistance in early generations of a hard winter wheat breeding program has not been examined.
Most of the previous studies compared univariate GP approaches,including ridge-regression best linear unbiased prediction(rrBLUP),and genomic best linear unbiased prediction(GBLUP),LASSO,Random Forest(RF),and several Bayesian approaches[25,26,29].In most cases,the performance of these GP models varied with FHB traits and cross-validation schemes used in the analyses.On the other hand,multi-trait(MT)approaches are used to improve the prediction ability for a primary trait,when secondary traits genetically correlated to the primary trait are available[33].MT models are of particular importance when the primary trait is difficult or expensive to phenotype and has low heritability.Plant height(PH)and days to heading(DTH)are often associated with FHB resistance in bread wheat and durum wheat[30,32].Thus,several studies have used PH and DTH as covariates to predict FHB resistance[14,30,32,34],and only two studies suggested improvement in PA using PH or DTH as covariates[30,32].Furthermore,different FHB traits,including DIS,FDK,and DON are known to have moderate positive genetic correlations[35].Recent studies have evaluated the use of different combinations of FHB traits along with DTH and PH as secondary traits to utilize this genetic correlation for improving prediction ability of MT models[30].For instance,FHB traits like DIS and FDK can be used as secondary traits for improving the prediction of DON,which is cumbersome and costly to phenotype.Thus,there is a need to evaluate the usefulness of these covariates or to figure out the best combinations of secondary traits in MT models to predict FHB traits,especially in hard winter wheat.
Another aspect of the successful application of GS in a breeding program is establishing a training population.Previous studies have evaluated strategies to optimize training populations using advanced breeding lines to predict GEBVs of preliminary breeding lines for various traits in wheat,including FHB[28,31,36,37].The application of GS in this scenario can be handy as it is challenging to phenotype a large set of preliminary lines in expensive FHB nurseries.
The primary objective of this study was to use different sets of advanced breeding lines as training populations to predict FHB traits in preliminary breeding lines of our hard winter wheat breeding program.Secondly,we wanted to evaluate the performance of MT models to predict FHB traits,including DIS,FDK,and DON,using different combinations of secondary traits.For this,we evaluated FHB nurseries comprising advanced breeding lines from three years for their usability as training populations to predict these traits.Thus,our specific objectives for this study were to(a)evaluate the usability of FHB nurseries comprising advanced breeding lines for predicting FHB traits and assessing the improvement in predictive ability if the best-performing TPs from individual years were combined based on the lines shared among the FHB nurseries over years,(b)compare predictive abilities of MT models when different combinations of secondary traits are used to predict DIS,FDK,and DON,and(c)to validate the selected models and TPs in a forward prediction scheme to calculate the GEBVs for FHB traits in an independent breeding population comprising the preliminary breeding lines.
A panel of 476 wheat breeding lines from the winter wheat breeding program at South Dakota State University was used in this study. The set of breeding lines tested every year in the advanced yield trial (AYT) and the elite yield trial (EYT) was evaluated for FHB resistance in a mist-irrigated field nursery. Among the 476 breeding lines, 153 were evaluated in the 2018 nursery, 169 in the 2019 nursery, and 154 in the 2020 nursery to optimize a training population for predicting preliminary breeding lines. Further, 65 lines were shared between the 2018 and 2019 nurseries, and 58 lines were shared between the 2019 and 2020 nurseries. The majority of the breeding lines were in either the F4:7 or F4:8 filial generation. Lines without genotypic data and those lacking consistency between replications were excluded, and the final analysis was conducted on 152, 161, and 153 lines from the 2018, 2019, and 2020 nurseries, respectively. In addition, a set of 200 breeding lines from the preliminary yield trial (PYT) was used as a breeding population (BP) for the prediction of FHB indices (independent validation) using the optimized training population. A random set of 60 lines was selected from the BP and evaluated in the 2020 nursery to validate the prediction accuracy (PA) in forward selection.
Plant materials were planted in the FHB nurseries at Brookings,South Dakota(44.3114°N,96.7984°W)during the 2018,2019,and 2020 growing seasons(Table S1)using a randomized complete block design with 2 or 3 replicates for different sets of lines with corresponding checks.We used cultivars‘Lyman’and‘Emerson’as resistant checks while‘Overley’and‘Flourish’were used as susceptible checks.The experimental unit was a single-row plot(40 plants per meter row)for each line.FHB resistance and related traits were evaluated in 2018,2019,and 2020.DTH was recorded using Julian date when 50% of the main tillers in the row had completely emerged heads.PH was measured from the soil surface to the top of main tiller spikes excluding awns when the plant materials matured.
All FHB nurseries were artificially inoculated using both corn spawn and a sprayed spore suspension of F. graminearum isolates (SD-FG1) as described in Halder et al. [13]. Briefly, Fusarium-infested corn kernels were scattered on the soil surface once at the boot stage (Feekes 10) and again at the heading stage (Feekes 10.1) to enhance the chances of infection in the field. A conidial suspension containing 100,000 spores/ml was sprayed on the heads of each line at 50% anthesis to avoid any escape. The nursery was misted using sprinklers every night (19:00-7:00) to maintain the humidity for disease development. Disease incidence and severity were recorded by scoring FHB symptoms on 20 heads/replication/line 21 days post-flowering using a visual scale described by Stack and McMullen [38]. The DIS was calculated as (Incidence (INC) × Severity (SEV))/100. The percentage of Fusarium-damaged kernels was evaluated using grain samples harvested by a low-airspeed harvester. Sampled kernels were compared against a set of known FDK standards (https://agcrops.osu.edu/newsletter/corn-newsletter/2015-21/rating-fusarium-damaged-kernels-fdkscabby-wheat) to estimate the FDK values in two replications per sample. We used DIS data from three seasons but FDK data only from the 2019 and 2020 seasons for evaluating GP. For DON estimation, the samples were analysed at the Department of Plant Science at North Dakota State University using a gas chromatography-mass spectrometry method. As only a limited number of unreplicated samples were analysed for DON during 2018 and 2019, we used DON data only from the 2020 season.
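The disease index formula above can be illustrated with a minimal Python sketch. The function name `disease_index` and the example plot values are hypothetical and are not taken from the study; both inputs are assumed to be percentages.

```python
def disease_index(incidence_pct, severity_pct):
    """FHB disease index: (incidence x severity) / 100, both inputs in percent."""
    for name, value in (("incidence", incidence_pct), ("severity", severity_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return incidence_pct * severity_pct / 100.0


# Hypothetical plot: 16 of the 20 scored heads show symptoms (80% incidence)
# and the mean severity on the visual scale is 45%.
dis = disease_index(incidence_pct=16 / 20 * 100, severity_pct=45.0)
print(dis)
```

Because both inputs are percentages, the index also stays on a 0-100 scale, which matches the range of DIS values reported for the nurseries.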
Fresh leaf tissues were taken from each line for DNA isolation using the hexadecyltrimethylammonium bromide (CTAB) method [39]. The seedlings for each line in advanced or elite trials in their respective year of evaluation were grown in small pots/cones for tissue sampling and DNA extraction. A genotyping-by-sequencing (GBS) library was constructed using the PstI and MspI restriction enzymes [20] and sequenced in an IonTorrent Proton sequencer (Thermo Fisher Scientific, Waltham, MA, USA) at the USDA Central Small Grain Genotyping Lab, Manhattan, KS, USA. DNA sequence data were used to call single-nucleotide polymorphisms (SNPs) using the previously described approach employing TASSEL v5.0 (Trait Analysis by aSSociation, Evolution and Linkage) [40]. After removal of SNPs with more than 30% missing genotypes, a minor allele frequency (MAF) of less than 0.05, or no assigned chromosome position, 9321 high-quality SNPs were imputed using BEAGLE v4.1 [41] for further analysis.
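The missing-data and MAF thresholds described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genotypes are coded as allele dosages (0, 1, 2) with `None` for a missing call, the function name `filter_snps` is hypothetical, and the chromosome-position filter and the actual TASSEL/BEAGLE steps are not reproduced.

```python
def filter_snps(genotypes, max_missing=0.30, min_maf=0.05):
    """Return indices of SNP columns passing the missing-rate and MAF filters.

    genotypes: list of lines, each a list of allele dosages (0, 1, 2) or None.
    """
    n_lines = len(genotypes)
    n_snps = len(genotypes[0])
    kept = []
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        missing_rate = 1.0 - len(calls) / n_lines
        if not calls or missing_rate > max_missing:
            continue  # too many missing genotypes
        alt_freq = sum(calls) / (2.0 * len(calls))
        maf = min(alt_freq, 1.0 - alt_freq)
        if maf < min_maf:
            continue  # allele too rare to be informative
        kept.append(j)
    return kept


# Toy matrix: 5 lines x 4 SNPs (values are illustrative only).
toy = [
    [0, 2, None, 0],
    [2, 2, None, 0],
    [0, 2, 0, 0],
    [2, 0, None, 0],
    [0, 2, 2, 0],
]
print(filter_snps(toy))
```

In the toy matrix, the third SNP fails the missing-data filter and the fourth is monomorphic, so only the first two columns are retained.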
The phenotypic data for FHB DIS and FDK were analyzed as best linear unbiased estimates (BLUEs) for individual nurseries. The following model was used to estimate the BLUEs:

$$y_{ij} = \mu + R_i + G_j + e_{ij}$$
where $y_{ij}$ is the trait of interest, $\mu$ is the overall mean, $R_i$ is the effect of the $i$th replicate, $G_j$ is the effect of the $j$th genotype, and $e_{ij}$ is the residual error associated with the $i$th replicate and $j$th genotype. The lines shared between the 2019 and 2020 seasons were used to combine the DIS and FDK data, and BLUEs were estimated across environments using the following statistical model:

$$y_{ijk} = \mu + E_i + R_{j(i)} + G_k + GE_{ik} + e_{ijk}$$
where $y_{ijk}$ is the trait of interest, $\mu$ is the overall mean, $E_i$ is the effect of the $i$th environment, $R_{j(i)}$ is the effect of the $j$th replicate nested within the $i$th environment, $G_k$ is the effect of the $k$th genotype, $GE_{ik}$ is the effect of the genotype × environment (G × E) interaction, and $e_{ijk}$ is the residual error associated with the replication and genotype effects.
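For intuition about the single-nursery model, note that when every genotype is observed in every replicate (a balanced design), the least-squares estimate for a genotype reduces to its mean across replicates. The sketch below assumes that balanced case only; the function name `balanced_blues` and the plot values are hypothetical, and the study itself relied on META-R rather than on code like this.

```python
from collections import defaultdict


def balanced_blues(records):
    """Genotype means across replicates; equals the BLUE only for balanced data.

    records: iterable of (genotype, replicate, value) tuples.
    """
    values = defaultdict(dict)
    for genotype, replicate, value in records:
        values[genotype][replicate] = value
    replicate_sets = {frozenset(reps) for reps in values.values()}
    if len(replicate_sets) != 1:
        # With missing plots the simple mean is no longer the BLUE.
        raise ValueError("design is unbalanced; fit the linear model instead")
    return {g: sum(reps.values()) / len(reps) for g, reps in values.items()}


# Hypothetical DIS scores for two lines in two replicates.
plots = [
    ("LineA", 1, 30.0), ("LineA", 2, 34.0),
    ("LineB", 1, 55.0), ("LineB", 2, 61.0),
]
print(balanced_blues(plots))
```

For unbalanced data, such as nurseries where some sets of lines have 2 and others 3 replicates, the replicate effects no longer cancel and the full linear model has to be fitted.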
The broad-sense heritability ($H^2$) for DIS and FDK was estimated for independent nurseries as follows:

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2 / n_{rep}}$$

where $\sigma_g^2$ and $\sigma_e^2$ are the genotype and error variance components, respectively, and $n_{rep}$ is the number of replicates. We used META-R [42], based on the lme4 R package [43], for the linear mixed model analysis and heritability estimation. The Pearson correlations among traits and environments were estimated based on the BLUEs for each trait in the R environment [44].
The principal component analysis (PCA) was conducted using the genotypic data from 457 lines (257 lines from the 2019 and 2020 nurseries and 200 lines from the breeding population) to study the relationship between the training and breeding populations. The ‘prcomp’ function in R was used to perform PCA on 9321 SNP markers, and the first two principal components were used for the scatterplot.
The univariate genomic prediction for DIS and FDK was performed using four different algorithms. The rrBLUP model [45] is a widely used GS model in plant breeding. In rrBLUP, we assume a normal distribution of marker effects with equal variance. The GEBVs for DIS and FDK were estimated for each training population using the trait BLUEs. A linear mixed model was implemented as follows:

$$y = \mu + Zu + \varepsilon$$
where $y$ is the vector ($n \times 1$) of adjusted means (BLUEs) from $n$ genotypes for a given trait; $\mu$ is the overall mean; $Z$ is the design matrix ($n \times p$) with known values of $p$ markers for $n$ genotypes; $u$ is a genotypic predictor with $u \sim N(0, G\sigma_g^2)$, where $G$ is a positive semidefinite matrix obtained from markers using the additive relationship matrix function ‘A.mat’ and $\sigma_g^2$ is the additive genetic variance; and $\varepsilon$ is the residual error with $\varepsilon \sim N(0, I\sigma_e^2)$, where $\sigma_e^2$ is the residual variance. The model was implemented in the ‘rrBLUP’ R package [45] for one trait at a time.
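To make the equal-variance shrinkage behind rrBLUP concrete, the following pure-Python sketch solves the ridge-regression form of the model for marker effects. It is a simplified illustration, not the ‘rrBLUP’ implementation: the penalty `lam` (the ratio of residual to marker-effect variance) is fixed by hand here as an assumption, whereas the package estimates the variance components from the data, and all function names and toy values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_marker_effects(Z, y, lam):
    """Return (mu, u, col_means) for y = mu + Z u + e with ridge penalty lam."""
    n, p = len(Z), len(Z[0])
    mu = sum(y) / n
    col_means = [sum(Z[i][j] for i in range(n)) / n for j in range(p)]
    Zc = [[Z[i][j] - col_means[j] for j in range(p)] for i in range(n)]
    # Dual form: u = Zc' (Zc Zc' + lam I)^-1 (y - mu), an n x n solve
    # that stays cheap when markers (p) far outnumber lines (n).
    K = [[sum(Zc[i][k] * Zc[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [yi - mu for yi in y])
    u = [sum(Zc[i][j] * alpha[i] for i in range(n)) for j in range(p)]
    return mu, u, col_means


def predict(Z_new, mu, u, col_means):
    """Predicted values for new lines from their marker scores."""
    return [mu + sum((z - m) * e for z, m, e in zip(row, col_means, u))
            for row in Z_new]


# Toy data: 4 lines x 3 markers coded -1/0/1, with hypothetical DIS BLUEs.
Z = [[1, -1, 0], [-1, 1, 1], [1, 1, -1], [-1, -1, 1]]
y = [40.0, 55.0, 35.0, 60.0]
mu, u, means = ridge_marker_effects(Z, y, lam=1.0)
print(predict([[1, 0, -1]], mu, u, means))
```

All markers share the same penalty, which is the "equal variance" assumption stated above; a larger `lam` pulls every marker effect more strongly toward zero.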
The rrBLUP model assumes a common variance across marker effects, which causes underestimation of large-effect QTL. In contrast, Bayesian methods assume unequal variances across marker effects and use different priors to estimate these variances, overcoming this limitation of rrBLUP [16]. We used two Bayesian algorithms, BayesA (BA) and BayesB (BB), to estimate GEBVs for the given traits. BA assumes that all markers have a non-zero effect by treating the proportion of markers with no effect (π) as zero. Each marker effect receives a normal prior with its own variance, and these marker-specific variances follow a scaled inverse chi-square distribution. BB is an extension of BA that assumes that some markers have no effect and excludes them from the model. Thus, BB allows for the presence of some large-effect QTL controlling the given trait [16,46]. A detailed description of the Bayesian models can be found in Pérez and de los Campos [47]. The Bayesian models were implemented in the ‘BGLR’ package using a Gibbs sampler with 5000 burn-in and 15,000 iterations for each run [47].

Random Forest (RF) is an ensemble method that uses a collection of classification or regression trees for prediction. The idea is that by combining a large number of smaller decision trees, RF can reduce the variance of prediction. The bias of an RF model converges to a limiting value as the number of trees grows. RF methods have been applied for both genetic association studies and phenotype prediction [48]. In this study, we extended the method used in Grinberg et al. [49] with 1000 iterations and 64 random states. The algorithm was implemented in Python using the scikit-learn library.

A multivariate model was used to predict DIS and FDK by including DTH and PH as secondary traits in the model. Furthermore, we evaluated the performance of the multivariate model in predicting DON content using different combinations of secondary traits, including DIS, FDK, DTH, and PH. A Bayesian multivariate Gaussian model with an unstructured variance-covariance matrix was used for the MT model [50]. The MT model predicts FDK/DIS/DON using the secondary traits as follows:

$$y = \mu + Zu + \varepsilon$$
where $y$ is the vector with a length of $n \times t$ ($n$ genotypes and $t$ traits); $\mu$ is the vector of means; $Z$ represents the incidence matrix of order $[(n \times t) \times p]$; $u$ $[(n \times t) \times p]$ is the genotypic predictor for all individuals and traits, with $u \sim N(0, \Sigma \otimes G)$. The matrix $G$ represents the positive semidefinite matrix obtained from markers. The residuals of the MT model are represented by the vector $\varepsilon$, with $\varepsilon \sim N(0, R \otimes I)$. The matrices $\Sigma$ and $R$ are the variance-covariance matrices for the genetic and residual effects, respectively, for each individual across all traits. $\Sigma$ was estimated as an unstructured matrix and $R$ as a diagonal matrix following Lado et al. [50]. The variance-covariance matrices were estimated using a Gibbs sampler with 15,000 iterations, where the first 5000 iterations were used for burn-in. The MT model was implemented in the R package ‘MTM’ [51].
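The Kronecker structure of the genetic covariance can be made concrete with a small sketch. The values of the trait covariance matrix and the relationship matrix below are invented for illustration, the helper `kron` is hypothetical, and the trait-major ordering of the stacked vector (all lines for trait 1, then all lines for trait 2) is an assumption.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


# Hypothetical genetic (co)variances between two traits, e.g. DON and FDK.
sigma = [[1.0, 0.5],
         [0.5, 2.0]]
# Hypothetical genomic relationship matrix for two lines.
g = [[1.0, 0.2],
     [0.2, 1.0]]

cov_u = kron(sigma, g)  # 4 x 4 covariance of the stacked genetic effects
for row in cov_u:
    print(row)
```

The off-diagonal blocks are what let a secondary trait inform the target trait: the covariance between trait 1 in one line and trait 2 in a related line is the product of the trait covariance and the genomic relationship.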
We evaluated the breeding lines from 2018,2019,and 2020 nurseries as training populations to predict the performance of preliminary breeding lines.Furthermore,we combined and evaluated the lines from 2019 and 2020 as one training population.Though the data for these two nurseries come from different years,a set of 58 lines that were common between the two nurseries was used to obtain the trait BLUEs.The resulting TPs were referred to as TP18,TP19,TP20,and TP19+20 for further genomic prediction.
We first used each of these TPs for predicting single-year DIS and FDK.Briefly,each TP was randomly divided into five sets of equal size.Four of the five sets(80%)were used as a training set(phenotyped and genotyped)to train the model,and the remaining set(20%)was used as a testing set(genotyped only)for prediction(Fig.S1).Predictive ability was estimated as Pearson’s correlation between the GEBVs and observed phenotypes for the testing set.The predictions were assessed using five different models,as discussed earlier.The cross-validation process was repeated 500 times for rrBLUP and 100 times for Bayesian models,where each iteration included different lines in the training and testing sets.Cross-validation was used to evaluate the ability of four TPs using different models to estimate GEBVs for DIS and FDK.
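The five-fold scheme described above can be sketched as follows. The fold assignment and the Pearson correlation follow the description in the text; the predictor `covariance_score` is only a simple stand-in (a marker-phenotype covariance score) and not one of the models used in the study, and the simulated markers and phenotypes are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def k_fold_splits(n, k=5, seed=0):
    """Shuffle line indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def covariance_score(train_x, train_y, test_x):
    """Stand-in predictor: weight each marker by its covariance with the trait."""
    n, p = len(train_y), len(train_x[0])
    my = sum(train_y) / n
    means = [sum(row[j] for row in train_x) / n for j in range(p)]
    weights = [sum((row[j] - means[j]) * (yv - my)
                   for row, yv in zip(train_x, train_y)) / n for j in range(p)]
    return [sum(w * (x - m) for w, x, m in zip(weights, row, means))
            for row in test_x]


def cross_validate(markers, phenotypes, fit_predict, k=5, seed=0):
    """Mean Pearson correlation between predictions and masked phenotypes."""
    scores = []
    for test in k_fold_splits(len(phenotypes), k, seed):
        held_out = set(test)
        train = [i for i in range(len(phenotypes)) if i not in held_out]
        preds = fit_predict([markers[i] for i in train],
                            [phenotypes[i] for i in train],
                            [markers[i] for i in test])
        scores.append(pearson(preds, [phenotypes[i] for i in test]))
    return sum(scores) / k


# Simulated example: 50 lines, 20 markers, 5 of which affect the trait.
rng = random.Random(1)
markers = [[rng.choice((-1, 1)) for _ in range(20)] for _ in range(50)]
phenotypes = [sum(row[:5]) + rng.gauss(0, 1) for row in markers]
print(round(cross_validate(markers, phenotypes, covariance_score), 2))
```

Repeating the call with different `seed` values gives the repeated random partitions used to obtain a distribution of predictive abilities.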
For the MT model,the lines were randomly split into a training set(80%)and a testing set(20%).To train the model,we used phenotypic data of secondary traits(PH and DTH)from both the training and testing sets,but the phenotypic data of the target trait(DIS or FDK)from the training set only(Fig.S1).
As mentioned earlier, we used phenotypic data for DON from the 2020 season to evaluate the performance of MT models and compare it with the standard single-trait (ST) model (rrBLUP). The lines were randomly split into a training set (80%) and a testing set (20%) as in the earlier case (Fig. S1). Here, we included the secondary traits (DIS, FDK, DTH, and PH) in different combinations to evaluate the predictive ability of the MT model for DON. For instance, we used an MT model with all four secondary traits to predict DON, then models with only three secondary traits, and so on, evaluating eight such combinations overall.
Based on cross-validation analysis,we selected TP19,TP20,and TP19+20 for independent prediction of DIS and FDK in the preliminary lines from the breeding population.TP18 was not used for forward predictions as we did not observe a good PA for DIS and FDK in cross-validation.As all the prediction models yielded comparable results for DIS and FDK using cross-validation,we selected rrBLUP over other models for independent predictions based on its easy and less-intensive implementation.The model was trained using genotypic and phenotypic data from TP19,TP20,and TP19+20 in the‘rrBLUP’package to predict the GEBVs of 200 individuals in the breeding population.To assess the predictive ability,we randomly selected 60 lines of the BP,phenotyped these lines for DIS and FDK in the 2020 nursery,and used the observed phenotypic values to compare to the GEBVs from the TPs.
Significant genotypic variation in DIS and FDK (P < 0.001) was observed in the FHB nurseries from three seasons (Table S2). The largest variation in DIS was observed in 2019, ranging from 16.0 to 91.2 with a mean DIS of 49.1 (Table 1), while the smallest variation was observed in 2018, ranging from 21.0 to 59.7 (Table 1; Fig. 1). The mean percent FDK also differed between years, with 80% in 2019 and 58.6% in 2020. Despite high mean FDK values, sufficient phenotypic variation was observed among the evaluated lines in both years (Table 1). Overall, 2019 had the highest disease occurrence and 2018 the lowest.
Significant phenotypic correlations (0.45 and 0.46) were observed between DIS and FDK in both 2019 and 2020. DIS and FDK exhibited negative correlations with plant height in all three nurseries (Figs. S1, S2 and S3). A negative correlation was observed between DIS and days to heading in 2018. However, both DIS and FDK showed weak but positive correlations with days to heading in 2019 and 2020 (Fig. S2). As DON was also estimated in 2020, we observed significant positive phenotypic correlations for DON with DIS (0.34), FDK (0.33), and DTH (0.33) (Fig. S4). Broad-sense heritability for both FHB traits was moderate (Table 1), with the highest heritability for DIS (0.77) in 2020 and for FDK (0.75) in 2019.
The principal component analysis was conducted using 9321 SNP markers and 457 lines. The first two principal components (PCs) explained 6.9% (PC1) and 5.0% (PC2) of the genetic variance (Fig. 1). The PCA revealed two primary clusters for the 457 lines, which included 257 unique lines from the training populations (TP19 and TP20) and 200 lines of the breeding population (BP). The mixed distribution of lines from both the TPs and the BP across the two clusters suggested a close relationship between the two populations; the TPs should therefore be useful for forward prediction in the breeding program.
Table 1. Descriptive statistics of two Fusarium head blight (FHB) resistance traits, disease index (DIS) and Fusarium-damaged kernels (FDK), for advanced winter wheat breeding lines evaluated in three independent FHB nurseries from 2018, 2019, and 2020.
Fig. 1. The scatterplot for principal component analysis (PCA) of 457 lines based on 9321 SNP markers. The 457 breeding lines include 257 lines from the training populations (TP19 and TP20) and 200 lines of the breeding population (BP) used in the forward prediction. The blue triangles represent the lines from the TPs, and the green circles represent the lines from the BP. The first two PCs explained 6.9% and 5.0% of the total variation, respectively.
Various GP models were used to evaluate the prediction accuracy within each TP using a cross-validation approach. We masked the observed phenotype in 20% of the lines in each TP and treated them as untested new lines. The PA of DIS and FDK was measured as Pearson's correlation between the predicted and the observed phenotypes of the masked lines. A general comparison of DIS and FDK prediction using different models is presented as boxplots in Figs. 2 and 3. TP18 provided a low mean PA of 0.20 for DIS, ranging from 0.15 to 0.23 across GP models (Table 2). TP20 provided a moderate mean PA of 0.35 for DIS, and TP19 yielded the highest mean PA (0.39) for DIS, ranging from 0.37 to 0.41 across models. For FDK, the mean PA was 0.35 (range 0.32 to 0.37) in TP19 and slightly higher in TP20, with a mean of 0.37 (range 0.34 to 0.40) (Table 2). Among the four ST models, rrBLUP performed slightly better than the other models in all the TPs for predicting DIS and FDK (Figs. 2 and 3).
To improve prediction accuracies for different traits, we combined two individual populations (TP19 and TP20) into one large population (TP19+20). The BLUE values based on the shared set of lines evaluated in both years were used to assess PA for DIS and FDK using a cross-validation procedure. TP18 performed poorly in the cross-validation and was therefore excluded from the combined TP. The resulting population produced a slightly higher average PA of 0.41 for DIS and 0.38 for FDK across GP models (Table 2; Figs. 2, 3) than either TP19 or TP20. Apart from the slight increase in PA with the larger TP, we observed that the PA of all the models was quite similar, indicating a stable PA irrespective of the model.
PA also varied with the type of GP model used for predicting DIS and FDK. Among the four univariate models used for both DIS and FDK predictions, the rrBLUP model performed slightly better than the BA, BB, and RF models in all TP scenarios (Table 2). When the univariate GP models were compared with an MT model with plant height and days to heading as covariates, the MT model did not show much improvement in DIS predictions over univariate models using TP18, TP19, and TP19+20. However, the prediction accuracy for DIS in TP20 improved by 20% (0.35 to 0.42) when the two secondary traits (DTH and PH) were included in the MT model (Table 2). The improvement using the MT model in TP20 can be attributed to the moderate correlation between DIS and the secondary traits in the model (Fig. S4). For FDK, the MT model did not improve prediction accuracy in any of the TPs (Table 2), as FDK seemed to be less correlated with DTH or PH in all of the growing seasons.
The lines from the 2020 nursery were also evaluated for DON along with other FHB traits. As DON is a costly trait to phenotype and only a small number of samples are analyzed for DON each year, we examined whether other FHB traits could be used in MT models to predict DON. For this, we used the MT model with different combinations of DIS, FDK, DTH, and PH as covariates to predict DON. We used the ST rrBLUP model as a benchmark for the performance of the MT model (Fig. 4). Using TP20, the rrBLUP model yielded a predictive accuracy of 0.49 for DON, which was higher than that for DIS and FDK using the same model. The MT model showed an improvement of up to 20%, yielding PA ranging from 0.54 to 0.59 with different combinations of secondary traits (Table S3). The MT model with DIS, FDK, DTH, and PH as covariates had a PA of 0.56, whereas the MT model with DIS, FDK, and DTH had the highest PA (0.59) among all the combinations (Table S3). We also evaluated the MT model with only a single trait (DIS, FDK, or DTH) as the covariate to predict DON. Interestingly, the MT model with only FDK as the covariate had a PA of 0.56, which is comparable to the combination with all traits as covariates (Table S3).
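As a quick check of the reported gain, the relative improvement of the best MT model over the ST benchmark follows directly from the two accuracies quoted above, and the same arithmetic applies to the models with a PA of 0.56:

```latex
\frac{0.59 - 0.49}{0.49} \approx 0.204 \quad (\text{about } 20\%),
\qquad
\frac{0.56 - 0.49}{0.49} \approx 0.143 \quad (\text{about } 14\%)
```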
Fig. 2. The predictive ability (PA) for Fusarium head blight (FHB) resistance disease index (DIS) in different sets of training populations (TPs) used in the study. Boxplots compare the PA using five genomic prediction models: rrBLUP, ridge-regression best linear unbiased prediction; BA, BayesA; BB, BayesB; RF, Random Forest; and MT, multi-trait model. TP18, training population based on 2018 FHB nursery; TP19, training population based on 2019 FHB nursery; TP20, training population based on 2020 FHB nursery; TP19+20, training population combining 2019 and 2020 FHB nurseries.
Fig.3.The predictive ability(PA)for Fusarium-damaged kernels(FDK)in different sets of training populations(TPs)used in the study.Boxplots compare the PA using five genomic prediction models:rrBLUP,ridge-regression best linear unbiased prediction;BA,BayesA;BB,BayesB;RF,Random Forest;and MT,multi-trait model.TP19,training population based on 2019 FHB nursery;TP20,training population based on 2020 FHB nursery;TP19+20,training population combining 2019 and 2020 FHB nurseries.
Table 2. Mean prediction accuracy and standard error for two Fusarium head blight (FHB) resistance traits, disease index (DIS) and Fusarium-damaged kernels (FDK), with cross-validation in different training populations using different genomic prediction models.
Fig. 4. Boxplots comparing the predictive ability of the ST model and the MT model with different combinations of secondary traits. DIS, FHB disease index; FDK, Fusarium-damaged kernel percentage; DTH, days to heading; PH, plant height; DON, deoxynivalenol content.
To validate the usefulness of different FHB nurseries as possible TPs, GEBVs for DIS and FDK of a random set of preliminary breeding lines from the PYT were predicted using TP19, TP20, and TP19+20 with the rrBLUP model. Moderate prediction accuracies (Table 3) were observed for DIS and FDK using the three TPs. TP19 provided the highest prediction accuracy (0.59) for DIS, following the trend observed in our cross-validation. TP19+20 produced the highest prediction accuracy of 0.54 for FDK, followed by TP19 and TP20 (0.50 and 0.49, Table 3). Overall, independent predictions provided better PA than the cross-validation. Furthermore, we used a scatterplot to identify the breeding lines that were rejected based on estimated GEBVs but retained based on observed data (Figs. 5, S5). Interestingly, there was a low probability of rejecting lines with low DIS or FDK ratings, as most of the lines rejected based on GEBVs had a moderate observed DIS or FDK value in both 2019 and 2020 (refer to the top-left quadrant of the scatterplots in Figs. 5, S5). These results demonstrate that genomic prediction can be implemented to improve FHB resistance in wheat breeding programs.
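The comparison of GEBV-based and phenotype-based rejection can be tabulated with a short sketch. The rejection threshold is not stated in the text, so the 30% cut-off used below is purely an assumption, as are the function name `selection_agreement` and the toy values.

```python
def selection_agreement(gebv, observed, reject_fraction=0.3):
    """Cross-tabulate lines rejected on GEBV against lines rejected on phenotype.

    Higher DIS/FDK means more disease, so the highest values are rejected.
    """
    n = len(gebv)
    n_reject = round(n * reject_fraction)

    def worst(values):
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        return set(order[:n_reject])

    by_gebv, by_pheno = worst(gebv), worst(observed)
    return {
        "rejected_by_both": len(by_gebv & by_pheno),
        "rejected_by_gebv_only": len(by_gebv - by_pheno),
        "rejected_by_phenotype_only": len(by_pheno - by_gebv),
        "retained_by_both": n - len(by_gebv | by_pheno),
    }


# Hypothetical GEBVs and observed DIS values for ten preliminary lines.
gebv = [50, 40, 30, 60, 20, 45, 35, 55, 25, 65]
observed = [55, 30, 35, 70, 15, 60, 20, 40, 25, 80]
print(selection_agreement(gebv, observed))
```

The count of lines rejected on GEBV alone corresponds to the quadrant discussed above: lines that genomic selection would discard although their observed rating would have kept them.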
Several studies have evaluated the inclusion of GS in wheat breeding programs to predict FHB resistance in recent years [25,26,29,31]; however, most of these studies were done in soft winter wheat. The current study used a cross-validation strategy to evaluate the potential of hard winter wheat breeding lines as training populations for GS of FHB resistance. The multi-trait GP model was also evaluated for predicting different FHB traits. Finally, the current study demonstrates the use of different sets of advanced breeding lines as training sets to predict preliminary breeding lines in a forward prediction scheme.
Table 3. Prediction accuracy for two Fusarium head blight (FHB) resistance traits, disease index (DIS) and Fusarium-damaged kernels (FDK), in the forward prediction scheme.
We used an FHB disease index, estimated from incidence and severity, and FDK percentage to evaluate FHB resistance in advanced breeding lines. We observed wide variation for FHB resistance in the advanced breeding lines from the South Dakota State University winter wheat breeding program in three disease nurseries from 2018 to 2020. For example, DIS ranged from 16.0% to 91.2% and FDK from 38.1% to 99.0% in the 2019 nursery (Table 1). A similar trend was observed in the two other nurseries (Table 1). The majority of the advanced and elite breeding material from our program does not carry Fhb1, likely due to yield drag and negative agronomic potential, suggesting that minor genes from native sources predominantly govern FHB resistance in our breeding program. This is similar to other hard winter wheat breeding programs in the region, as only one variety (TAM 205) carrying Fhb1 has been released to date. Therefore, GS seems to be a suitable approach to breed for FHB resistance.
Although the FHB nursery had a controlled mist system, the prevailing environment in the respective years is believed to play a significant role in varying disease pressure. The 2018 season was very dry in South Dakota, but 2019 was reasonably wet. These environmental factors lead to fluctuations in disease pressure that affect the spread of the data over the years, consistent with previous studies [28,30,31,36]. However, these fluctuations did not alter the ranking of the check genotypes across nurseries. For instance, DIS was 27.2, 29.8, and 32.6 in 2018, 2019, and 2020, respectively, for the resistant check 'Lyman', and 84.6 and 72.4 in 2019 and 2020, respectively, for the susceptible check 'Overley'. The consistent performance of the checks shows the uniformity and reliability of phenotyping across nurseries, which is supported by the moderate to high heritability for DIS (0.54-0.77) and FDK (0.66-0.75) (Table 1). These H² estimates were in a similar range to those of related studies using different types of populations in spring or winter wheat [30,36,52-54].
Fig. 5. Scatterplots showing observed vs. predicted values for FHB disease index (DIS). The observed DIS estimates were based on phenotypic evaluation of 60 independent lines, and the predicted values are GEBVs using (A) TP19, (B) TP20, and (C) TP19+20. The red dashed lines represent the cutoff (50%) to discard genotypes based on the observed data and estimated GEBVs. Genotypes to the right of the vertical red dashed line would be discarded based on observed phenotype, and genotypes above the horizontal red dashed line would be discarded based on GEBVs.
Several studies successfully used advanced or preliminary breeding lines from a breeding program [28,36,37] and from unrelated regional nurseries [31] as training populations to predict various traits in wheat. We evaluated advanced breeding lines in three FHB nurseries as three possible training sets for genomic prediction using cross-validation (Table S1), and obtained moderate prediction accuracies when TP19 and TP20 were used to predict DIS (up to 0.41 in TP19 and 0.42 in TP20) and FDK (up to 0.37 in TP19 and 0.40 in TP20) (Table 2). The prediction accuracy for both traits was consistent with several previous studies in soft winter or spring wheat [29,31,36,54]. Prediction accuracies for FDK in our study were also similar to those of Rutkoski et al. [25] and Adeyemo et al. [36], but lower than those reported by Arruda et al. [26] and Verges et al. [31]. The poor performance of TP18 for DIS prediction could result from the low disease pressure and phenotypic variation observed in the 2018 nursery; hence, such a nursery is not recommended for forward genomic prediction. Overall, based on multiple years of data, our study demonstrates the usefulness of advanced breeding lines as TPs for genomic prediction, provided that the quality of phenotyping is robust.
Training population size is another crucial factor that affects the PA of GP models. Previous studies have reported an increase in PA with increased TP size [55-57]. Lorenz et al. [57] and Arruda et al. [26] obtained higher PA for FHB traits in barley and wheat when the TP contained 250 to 300 lines. In the current study, the three populations, each with around 150 lines, were used as independent TPs for FHB trait prediction. In addition, the performance of a larger TP (TP19+20), developed by combining two TPs (TP19 and TP20), was evaluated. Although TP19 and TP20 were phenotyped in two independent nurseries, they shared a subset of 58 breeding lines. Thus, a large TP (TP19+20) was formed with 265 unique lines, and a higher PA was observed for DIS (0.41) and FDK (0.38) when TP19+20 was used as the TP. Apart from the improvement in PA using TP19+20, we observed that all prediction models yielded similar results, which was not the case when using smaller TPs (Table 2). Furthermore, a lower standard deviation for PA across repeats of cross-validation suggests more consistent prediction when using a large TP (Figs. 3 and 4). The results of this study suggest that selection of the right TP is crucial to improve PA. TP19+20 showed the best PA due to the large TP size and the correction of BLUEs for DIS and FDK across two different nurseries/years; thus, it can be used as the TP in forward genomic prediction of preliminary breeding lines.
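The bookkeeping behind a pooled TP can be sketched as follows: lines shared between two nurseries are counted once, so the pooled size is the sum of the two TP sizes minus the overlap. The line names and phenotype values below are hypothetical, and the simple averaging of shared lines is an assumption that stands in for the BLUE adjustment across nurseries mentioned above.

```python
def pool_training_sets(tp_a, tp_b):
    """Merge two {line_id: phenotype} dicts, averaging lines present in both.

    Averaging is a placeholder for a proper across-nursery adjustment.
    """
    pooled = {}
    for line in tp_a.keys() | tp_b.keys():
        values = [tp[line] for tp in (tp_a, tp_b) if line in tp]
        pooled[line] = sum(values) / len(values)
    return pooled

# Hypothetical DIS values for two small nurseries sharing lines L4 and L5.
tp19 = {"L1": 30.0, "L2": 55.0, "L3": 80.0, "L4": 42.0, "L5": 61.0}
tp20 = {"L4": 38.0, "L5": 65.0, "L6": 72.0, "L7": 25.0}

pooled = pool_training_sets(tp19, tp20)
shared = sorted(tp19.keys() & tp20.keys())

# Unique lines = |TP19| + |TP20| - |shared| = 5 + 4 - 2 = 7
print(len(pooled), shared)
```

Applied to the numbers reported above, the same identity implies that the two nurseries together contributed 265 + 58 = 323 line-by-nursery records.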
We compared the PA for DIS and FDK using four univariate GP models and one multivariate GP model. Among the univariate models, rrBLUP outperformed the other three models for predicting DIS and FDK in all individual TPs (Table 2). Further, rrBLUP is the preferred model for predicting FHB traits owing to its better performance and computational advantage. Previous studies have also reported that rrBLUP has better PA than Bayesian models [25,26], and it is one of the most frequently used methods in GS for FHB resistance.
Using an MT GP model is another approach to increase the PA for FHB traits. This model includes correlated secondary traits such as PH and DTH as covariates to predict DIS and FDK. Schulthess et al. [32] and Larkin et al. [30] reported an increase in PA by including PH or DTH in the GP model. In the current study, PH and DTH were included as secondary traits in an MT-GBLUP model to predict DIS and FDK; however, the improvement in PA was not considerable except for DIS in TP20, where the PA improved by at least 20% over the univariate models (Table 2). The better PA of the MT model in TP20 resulted from the moderate correlation of DIS with PH and DTH (Fig. S1). In the other TPs, a lower correlation was observed among the evaluated traits. MT models have been used to improve the PA of low-heritability traits by using information from correlated traits with high heritability [23,33,58]. The moderate to high heritability estimates for DIS and FDK in this study could therefore be another reason why the MT models showed no advantage over the univariate models. Thus, our results suggest that MT models could be useful in forward prediction in early generation nurseries if the observed correlations between FHB traits and the covariates are moderate to high.
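Why the trait correlation matters can be seen from a generic bivariate GBLUP formulation, given here as a textbook sketch rather than the exact specification used in this study. The symbols are assumptions for illustration: trait 1 is the target trait (e.g., DIS), trait 2 is a secondary trait (e.g., DTH), G is the genomic relationship matrix, and Σ_g and Σ_e are the 2 × 2 genetic and residual covariance matrices.

```latex
\[
\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{bmatrix}
\begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{bmatrix}
+
\begin{bmatrix} \mathbf{Z}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_2 \end{bmatrix}
\begin{bmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \end{bmatrix}
+
\begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix},
\qquad
\begin{bmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \end{bmatrix}
\sim N\!\left(\mathbf{0},\, \boldsymbol{\Sigma}_g \otimes \mathbf{G}\right),
\quad
\begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix}
\sim N\!\left(\mathbf{0},\, \boldsymbol{\Sigma}_e \otimes \mathbf{I}\right),
\]
\[
\boldsymbol{\Sigma}_g =
\begin{bmatrix}
\sigma^2_{g_1} & \sigma_{g_{12}} \\
\sigma_{g_{12}} & \sigma^2_{g_2}
\end{bmatrix}.
\]
```

Under this formulation, information flows from the secondary trait to the target trait only through the off-diagonal covariance terms. When the genetic covariance σ_g12 is close to zero, the two traits are predicted almost independently and the MT model behaves much like the univariate model, which is consistent with the limited gains observed in TPs where the trait correlations were low.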
DON is an important and primary FHB trait; however, most winter wheat breeders across the US are unable to make decisions based on DON estimates. For instance, the South Dakota State University winter wheat program harvests the FHB nursery in August and plants the next cycle in September, making the turnaround very short. In contrast, phenotyping for DON is mostly outsourced, and it takes several months before the breeder gets the data. Thus, it would be of great importance if we could predict DON and use the predictions to advance resistant lines to the next generation. Furthermore, previous studies have suggested the use of different pre-harvest traits, such as DIS, FDK, DTH, or PH, in MT models to predict DON with better accuracy [30]. We used the DON estimates available for TP20 to evaluate the predictive ability for DON using the ST model and different versions of the MT model based on cross-validation. The ST (rrBLUP) model predicted DON with a PA of 0.49, which was higher than the PA for DIS and FDK using any of the models (Table S3). Further, we compared MT models with different combinations of traits as covariates and observed that the combinations of DIS, FDK, DTH (0.59) and FDK, DTH (0.58) had the highest PA (Fig. 4), similar to the results reported by Larkin et al. [30]. Interestingly, using FDK as the only covariate in the MT model also gave a high PA (0.56), suggesting that FDK alone can improve the performance of the MT model over the ST model. Overall, the results suggest that we can use GP for predicting DON in earlier stages and that the PA can be further improved by using secondary traits such as FDK.
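The size of these gains can be made explicit as a relative improvement over the ST model. The worked arithmetic below uses only the PA values quoted above; the symbol Δ_rel is introduced here for illustration.

```latex
\[
\Delta_{\mathrm{rel}}
= \frac{\mathrm{PA}_{\mathrm{MT}} - \mathrm{PA}_{\mathrm{ST}}}{\mathrm{PA}_{\mathrm{ST}}} \times 100\%
\]
\[
\text{DIS, FDK, DTH:}\quad \frac{0.59 - 0.49}{0.49} \times 100\% \approx 20.4\%
\]
\[
\text{FDK, DTH:}\quad \frac{0.58 - 0.49}{0.49} \times 100\% \approx 18.4\%
\]
\[
\text{FDK only:}\quad \frac{0.56 - 0.49}{0.49} \times 100\% \approx 14.3\%
\]
```

The first value corresponds to the roughly 20% increase in PA for DON reported for the MT model in 2020, and the last shows that FDK alone accounts for most of that gain.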
Relatedness among the individuals in the TP and the breeding population (BP) is considered crucial for obtaining higher PA in genomic prediction. In this study, we aimed to evaluate the usefulness of advanced breeding lines as TPs to predict earlier generation lines from our breeding program. The PCA showed a good association between the lines from the TP and BP (Fig. 2). Hence, we obtained moderate prediction accuracy for DIS and FDK in an independent BP (Table 3), comparable to other reports [31,59,60]. Although only a moderate PA was achieved for DIS and FDK in the current study, the phenotypic data of advanced lines evaluated in FHB nurseries could be used to predict GEBVs for early generation breeding lines in wheat breeding programs (Table 3). The predicted GEBVs can be used to discard susceptible lines at the earlier stages of the breeding cycle (Figs. 3, 5). For instance, we achieved a PA of 0.59 for DIS using TP19, discarded the 50% most susceptible lines based on the TP19-based GEBVs, and retained the remaining 50% of lines for further selection (Table 3). Among the discarded lines, 87% were highly susceptible based on their observed disease index in the mist-irrigated, inoculated FHB nursery, and only 13% susceptible genotypes were carried forward based on GEBVs (Fig. 5). Similar results were observed for FDK (Fig. 3), which suggests that the estimated GEBVs from the TP can be used to discard the most susceptible lines at an early stage to reduce phenotyping costs. Our results also suggest that the lines discarded based on GEBVs had an extremely low chance of being highly resistant in terms of DIS and FDK.
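The agreement between GEBV-based and phenotype-based culling described above can be tallied by sorting lines into the four quadrants of an observed-versus-predicted scatterplot. The sketch below is illustrative only: the eight DIS values are made up, and the use of a fixed cutoff of 50 on both axes is an assumption about how the discard rule is applied.

```python
def selection_quadrants(observed, gebv, obs_cutoff=50.0, gebv_cutoff=50.0):
    """Count lines in each quadrant of an observed-vs-GEBV scatterplot.

    A line is 'discarded' on an axis when its value exceeds that cutoff,
    because higher DIS or FDK means a more susceptible line.
    """
    counts = {"both_discard": 0, "gebv_only": 0, "observed_only": 0, "both_keep": 0}
    for obs, pred in zip(observed, gebv):
        by_obs, by_gebv = obs > obs_cutoff, pred > gebv_cutoff
        if by_obs and by_gebv:
            counts["both_discard"] += 1
        elif by_gebv:
            counts["gebv_only"] += 1       # rejected by GEBV, kept by phenotype
        elif by_obs:
            counts["observed_only"] += 1   # kept by GEBV, rejected by phenotype
        else:
            counts["both_keep"] += 1
    return counts

# Hypothetical observed DIS values and GEBVs for eight lines.
observed = [20, 35, 45, 55, 60, 70, 85, 90]
gebv = [30, 52, 40, 48, 58, 65, 75, 80]

counts = selection_quadrants(observed, gebv)
discarded_by_gebv = counts["both_discard"] + counts["gebv_only"]
agreement = counts["both_discard"] / discarded_by_gebv
print(counts, agreement)
```

In this toy example, four of the five lines discarded by GEBV are also discarded by phenotype, so the agreement is 0.8; the "gebv_only" count corresponds to the top-left quadrant of such a scatterplot.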
In summary, our study demonstrated that advanced breeding lines evaluated in FHB nurseries can serve as TPs to predict GEBVs for untested earlier generation lines in breeding programs. Further, advanced lines from several years can be combined to increase the PA in forward breeding. However, we recommend evaluating the performance of individual advanced breeding nurseries through cross-validation before pooling multiple nurseries to develop larger TPs for forward prediction. Furthermore, MT models using secondary traits as covariates could be useful in predicting cumbersome FHB traits (DON and FDK). Finally, our results suggest that genomic prediction can be successfully applied in a wheat breeding program to discard the most FHB-susceptible lines at an early stage, as an alternative to laborious phenotypic selection in most years and especially in abnormal years when phenotypic evaluation is unreliable or unavailable due to environmental conditions.
CRediT authorship contribution statement
Jinfeng Zhang: Conceptualization; Investigation; Formal analysis; Visualization; Software implementation; Writing-original draft. Harsimardeep S. Gill: Conceptualization; Investigation; Formal analysis; Data curation; Visualization; Software implementation; Writing-original draft. Navreet K. Brar: Investigation. Jyotirmoy Halder: Formal analysis; Investigation. Shaukat Ali: Investigation; Writing-review & editing. Xiaotian Liu: Formal analysis. Amy Bernardo: Genotyping and SNP discovery analysis. Paul St. Amand: Genotyping and SNP discovery analysis. Guihua Bai: Genotyping and SNP discovery analysis; Interpretation of results; Writing-review & editing. Upinder S. Gill: Interpretation of results; Writing-review & editing. Brent Turnipseed: Interpretation of results; Writing-review & editing. Sunish K. Sehgal: Conceptualization; Data curation; Visualization; Methodology; Supervision; Resources; Writing-original draft, review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This project was collectively funded by the USDA hatch projects SD00H695-20, USDA-ARS agreement 59-0206-0-177 (USDA-USWBSI), the Agriculture and Food Research Initiative Competitive Grant 2022-68013-36439 (WheatCAP) from the USDA National Institute of Food and Agriculture, and South Dakota Wheat Commission Grant 3X1340. The funders had no role in the study design, data collection and analysis, decision to publish, or manuscript preparation. The authors would like to thank the South Dakota Agriculture Experimental Station (Brookings, SD, USA) for providing the resources to conduct the experiments.
Appendix A.Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.03.010.