TENG Jin-yan,YE Shao-pan,GAO Ning,CHEN Zi-tao,DIAO Shu-qi,LI Xiu-jin,YUAN Xiao-long,ZHANG Hao,LI Jia-qi,ZHANG Xi-quan,ZHANG Zhe
1 Guangdong Laboratory of Lingnan Modern Agriculture/Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, P.R.China
2 State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510006, P.R.China
3 Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Sciences and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, P.R.China
Abstract Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding because it combines information from both genotyped and ungenotyped individuals in a single model. With the increasing accessibility of whole-genome sequence (WGS) data at the population level, more attention is being paid to the use of WGS data in ssGBLUP. The predictive ability of ssGBLUP using WGS data might be improved by incorporating biological knowledge from public databases. Thus, we extended ssGBLUP by incorporating genomic annotation information into the model, and evaluated the extended models using a yellow-feathered chicken population as an example. The chicken population consisted of 1 338 birds with 23 traits, and imputed WGS data including 5 127 612 single nucleotide polymorphisms (SNPs) were available for 895 birds. Considering different combinations of annotation information and models, we evaluated the original ssGBLUP, haplotype-based ssGBLUP (ssGHBLUP), and four extended ssGBLUP models incorporating genomic annotation. Based on the genomic annotation (GRCg6a) of chickens, 3 155 524 and 94 837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs outperformed other models with respect to predictive ability in 15 out of 23 traits, with advantages ranging from 2.5 to 6.1% compared with the original ssGBLUP. In addition, to further enhance the performance of genomic prediction with imputed WGS data, we investigated how the genotyping strategy for the reference population affects ssGBLUP in the chicken population. Of the two strategies compared for selecting individuals to genotype in the reference population, even selection by family (SBF) performed slightly better than random selection in most situations. Overall, we extended genomic prediction models that can comprehensively utilize WGS data and genomic annotation information in the framework of ssGBLUP, and validated the idea that properly handling the genomic annotation information and WGS data increases the predictive ability of ssGBLUP. Moreover, when using WGS data, a genotyping strategy that maximizes the expected genetic relationship between the reference and candidate populations could further improve the predictive ability of ssGBLUP. The results of this study shed light on the comprehensive use of genomic annotation information in WGS-based single-step genomic prediction.
Keywords: genomic selection, prior information, sequencing data, genotype imputation, haplotype
The rapid development of molecular biological techniques has enabled the implementation of genomic selection (GS) (Meuwissen et al. 2001) in breeding. Since it was first proposed, GS has been widely implemented in the breeding programs of the main livestock species (e.g., Teng et al. 2020), crops, and aquaculture species. In many circumstances it can achieve large genetic gains and reduce breeding costs (Goddard and Hayes 2007) by accurately predicting the genetic and phenotypic values of genotyped candidate individuals (denoted as genomic prediction, GP). GS is currently a promising technical tool and is attracting breeders' attention worldwide.
In the breeding practice of the past decades, conventional best linear unbiased prediction (BLUP) (Henderson 1975) has successfully contributed to the goal of maximizing breeding profits with all available information. By incorporating genotypes into conventional BLUP, single-step genomic best linear unbiased prediction (ssGBLUP) was proposed (Legarra et al. 2009) and showed advantages over both conventional BLUP and standard genomic best linear unbiased prediction (GBLUP) in various situations (Christensen et al. 2012). Because of these beneficial features, ssGBLUP became the standard method in breeding practice for several species, such as pigs (Christensen et al. 2012). To further explore the potential of GS, several efforts have been made to increase its accuracy.
One simple means of improving the accuracy of GP is to increase the density of genetic markers.With the decreasing sequence price and increasing power of genotype imputation techniques,high density genotype and whole-genome sequence (WGS) data have become available for various livestock species.WGS is expected to improve the predictive ability of GP (Druetet al.2014),because it includes many more causal mutations (both rare and common variations) than single nucleotide polymorphism (SNP) chip data.In recent years,the use of WGS for improving the GP accuracy has received much attention (e.g.,Pérez-Encisoet al.2015;VanRadenet al.2017;Moghaddaret al.2019).
Another key means of improving the accuracy of GP is to modify the GP models. Because they pose idealized assumptions about the underlying genetic architecture, the classical GP models (GBLUP and ssGBLUP) might not always be optimal for relating simple statistical models to complex biological processes. Currently, public databases and resources provide growing biological knowledge on the relationship between genomes and phenotypes. By incorporating prior biological knowledge specifically related to the trait of interest, such as results or information from genome-wide association studies (GWAS) (Zhang et al. 2014) or the quantitative trait loci database (QTLdb) (Brøndum et al. 2015), or information not specifically related to the trait of interest, such as genomic annotation databases (Gao et al. 2017), several modified GP models have been proposed on the basis of GBLUP. With respect to ssGBLUP, its predictive ability was shown to be improved by incorporating major genes or GWAS results (Teissier et al. 2018) in a few studies with limited scenarios. For traits lacking trait-related prior information, the potential of incorporating public genomic annotation information into ssGBLUP is yet to be validated.
Although different types of prior information have been investigated for GBLUP, the potential of using public genomic annotation information in ssGBLUP remains to be investigated. Here, we hypothesized that using genomic annotation information would improve the performance of ssGBLUP with sequence data. Therefore, our study aimed to extend ssGBLUP models to incorporate genomic annotation information. Taking a Chinese indigenous chicken population as an example, extended ssGBLUP models were tested and compared with three standard models: conventional BLUP, GBLUP, and original ssGBLUP. In addition, strategies for selecting reference individuals were compared in ssGBLUP using imputed whole-genome sequence data. This may provide useful strategies to improve the performance of ssGBLUP and benefit the use of imputed whole-genome sequence data in future animal and plant breeding programs.
The yellow-feathered chicken population used in this study was derived from a Chinese indigenous breed maintained for 25 generations by Wens Nanfang Poultry Breeding Co., Ltd. (Yunfu, Guangdong, China) (Zhang et al. 2017; Ye et al. 2018). The population used was the 3rd batch of the 25th generation. It consisted of 1 338 individuals (721 males, 617 females) from full- or half-sib families produced by mating 30 males and 299 females of the 24th generation. The average sire family size was 44.6, with a range of 28 to 61. All 1 338 individuals were systematically phenotyped for three types of traits: growth traits, carcass traits, and feeding traits. The growth traits included average daily gain (ADG) between 45 and 84 days of age and body weights (BW) at the age of 45, 49, 56, 63, 70, 77, 84, and 91 days. The carcass traits included abdominal fat weight (AFW), abdominal fat percentage (AFP), breast muscle weight (BMW), carcass weight (CW), drumstick weight (DW), eviscerated weight (EW), eviscerated weight with giblets (EWG), gizzard weight (GW), and intestine length (IL). The feeding traits included average daily feed intake (ADFI), feed conversion ratio (FCR), mid-term body weight (MBW), mid-term metabolic body weight (MMBW), and residual feed intake (RFI). The RFI was calculated from the model $\mathrm{ADFI} = b_0 + b_1 \times \mathrm{MMBW} + b_2 \times \mathrm{ADG} + \mathrm{RFI}$, in which $b_0$ is the intercept, $b_1$ and $b_2$ are the regression coefficients, and RFI is the residual of the model.
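The RFI definition above amounts to taking the residuals of an ordinary least-squares regression. The following is a minimal sketch of that calculation, assuming NumPy; the function name `residual_feed_intake` and the simulated records are hypothetical and only illustrate the arithmetic, not the software actually used in the study.

```python
import numpy as np

def residual_feed_intake(adfi, mmbw, adg):
    """Fit ADFI = b0 + b1*MMBW + b2*ADG + RFI by least squares.

    Returns the coefficients (b0, b1, b2) and the residuals (RFI).
    """
    adfi = np.asarray(adfi, dtype=float)
    # Design matrix: intercept column, MMBW, ADG
    X = np.column_stack([np.ones_like(adfi), mmbw, adg])
    coef, *_ = np.linalg.lstsq(X, adfi, rcond=None)
    rfi = adfi - X @ coef
    return coef, rfi

# Simulated records (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
mmbw = rng.normal(100.0, 10.0, size=50)
adg = rng.normal(30.0, 3.0, size=50)
adfi = 5.0 + 0.8 * mmbw + 1.5 * adg + rng.normal(0.0, 2.0, size=50)

coef, rfi = residual_feed_intake(adfi, mmbw, adg)
```

Because the model contains an intercept, the resulting RFI values average to zero and are uncorrelated with MMBW and ADG in the sample.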
The phenotypic values of all traits were pre-adjusted for fixed effects. The fixed effects considered were sex (two levels) and pen (six levels), and they were estimated with the statistical model $y = Xb + Zu + e$, where $y$ is a vector of raw phenotypic records, $X$ and $Z$ are design matrices, $b$ is the vector of fixed effects (sex and pen), $u \sim N(0, A\sigma_u^2)$ is the vector of genetic values, $A$ is the pedigree-based numerator relationship matrix, $\sigma_u^2$ is the genetic variance, $e \sim N(0, I\sigma_e^2)$ is the vector of residuals, $\sigma_e^2$ is the residual variance, and $I$ is the identity matrix. The estimated fixed effects were subtracted from the original phenotypes, and the adjusted phenotypes $y_c = y - X\hat{b}$ were then used as the model response in all GP models. Descriptive statistics for the dataset and the considered traits are shown in Appendix A.
In this chicken population, 15 sires (24th generation) and 435 males (25th generation) were genotyped with the Affymetrix 600 K SNP array (Axiom Genome-Wide Chicken Genotyping Array, Affymetrix, USA). Of these individuals, 24 that maximized the expected genetic relationship with the whole population were selected for whole-genome re-sequencing following Druet et al. (2014). In addition, 463 individuals (25th generation, 194 males, 269 females) were genotyped with the Affymetrix 55 K genotyping array. WGS data with 11 645 758 SNPs were then obtained after quality control. Following our previous study (Ye et al. 2018), a two-step genotype imputation approach, from 55 K to 600 K and then from 600 K to WGS data, was performed using Beagle v4.1 (Browning and Browning 2016). A combined reference panel, which included the 24 individuals from our chicken population and 311 birds with WGS data downloaded from 10 BioProjects in ENA (https://www.ebi.ac.uk/ena) or NCBI (https://www.ncbi.nlm.nih.gov/), was used for the imputation from 600 K to WGS data (Ye et al. 2019). After genotype imputation and data quality control, three individuals were discarded. Finally, 895 individuals with WGS data in the 25th generation were used for further study. After quality control with the criteria of minor allele frequency>0.01 and linkage disequilibrium (LD)<1, 5 127 612 SNPs remained in the WGS data.
The conventional BLUP (Henderson 1975), GBLUP (VanRaden 2008), original ssGBLUP (Legarra et al. 2009), and extended ssGBLUP models (described below) were used in this study. The statistical model can in all cases be expressed as:
$$y_c = \mathbf{1}\mu + Zg + e$$
where $y_c$ is a vector of the adjusted phenotypes; $\mu$ is the overall mean and $\mathbf{1}$ is a vector of ones; $Z$ is a design matrix that allocates observations to genetic values; $g \sim N(0, K\sigma_g^2)$ is a vector of genetic values, in which $K$ is the kinship matrix and $\sigma_g^2$ is the genetic variance; and $e \sim N(0, I\sigma_e^2)$ is a vector of random residuals, in which $\sigma_e^2$ is the residual variance and $I$ is an identity matrix.
In the BLUP model, the kinship matrix $K$ is the pedigree-based numerator relationship matrix $A$. In the GBLUP model, the kinship matrix $K$ is the marker-based relationship matrix $G$, which was constructed as (VanRaden 2008):
$$G = \frac{MM'}{2\sum_i p_i(1-p_i)}$$
where $M$ is a matrix of centered genotypes, and $p_i$ is the minor allele frequency of SNP $i$. In the original ssGBLUP model, the kinship matrix $K$ is the combined relationship matrix $H$, which was constructed as (Legarra et al. 2009):
$$H = \begin{bmatrix} A_{11} + A_{12}A_{22}^{-1}(G_\omega - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G_\omega \\ G_\omega A_{22}^{-1}A_{21} & G_\omega \end{bmatrix}$$
where $A$ is the pedigree-based numerator relationship matrix, and subscripts 1 and 2 represent non-genotyped and genotyped individuals, respectively; $G_\omega = (1-\omega)G + \omega A_{22}$ is the relationship matrix of genotyped individuals, where $\omega$ is 0.6 and represents the relative weight on the polygenic effect (Christensen and Lund 2010), and $G$ is the kinship matrix as used in GBLUP. To ensure that the elements of both $A$ and $G$ were on the same scale, $G$ was adjusted and replaced according to the following formula (Christensen et al. 2012):
$$G^{*} = \beta G + \alpha$$
where the scalars $\beta$ and $\alpha$ are chosen so that the average elements of $G^{*}$ agree with those of $A_{22}$.
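The construction of $G$ and $H$ described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the pipeline used in the study: the function names `vanraden_g` and `build_h`, the four-animal pedigree matrix, and the random genotypes are all hypothetical, and the rescaling of $G$ to the scale of $A_{22}$ is omitted for brevity.

```python
import numpy as np

def vanraden_g(genotypes):
    """G = M M' / (2 * sum_i p_i (1 - p_i)) from 0/1/2 allele counts.

    Rows are individuals, columns are SNPs. The result does not depend
    on which allele is counted, so p is simply the counted-allele frequency.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p = genotypes.mean(axis=0) / 2.0
    M = genotypes - 2.0 * p                      # centered genotypes
    return M @ M.T / (2.0 * np.sum(p * (1.0 - p)))

def build_h(A, G, genotyped, omega=0.6):
    """Combine pedigree matrix A and genomic matrix G into H.

    `genotyped` lists the row indices of A for genotyped animals,
    in the same order as the rows of G.
    """
    A = np.asarray(A, dtype=float)
    idx2 = np.asarray(genotyped)
    idx1 = np.setdiff1d(np.arange(A.shape[0]), idx2)
    A11 = A[np.ix_(idx1, idx1)]
    A12 = A[np.ix_(idx1, idx2)]
    A22 = A[np.ix_(idx2, idx2)]
    Gw = (1.0 - omega) * G + omega * A22         # blended genomic matrix
    B = A12 @ np.linalg.inv(A22)                 # A12 * A22^{-1}
    H = np.empty_like(A)
    H[np.ix_(idx1, idx1)] = A11 + B @ (Gw - A22) @ B.T
    H[np.ix_(idx1, idx2)] = B @ Gw
    H[np.ix_(idx2, idx1)] = Gw @ B.T
    H[np.ix_(idx2, idx2)] = Gw
    return H

# Toy pedigree: two unrelated parents (0, 1) and two full sibs (2, 3)
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
genotyped = [2, 3]
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(2, 50))          # random 0/1/2 genotypes
G = vanraden_g(geno)
H = build_h(A, G, genotyped, omega=0.6)
```

A useful sanity check on any implementation of this kind is that $H$ reduces to $A$ when the genomic matrix equals $A_{22}$, since the correction term $G_\omega - A_{22}$ then vanishes.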
In this study, the original ssGBLUP, an SNP-based model, was extended to a haplotype-based model. Compared with the SNP-based original ssGBLUP, the difference in haplotype-based ssGBLUP (ssGHBLUP) is that the marker-based relationship matrix $G$ in $H$ was replaced by the haplotype-based relationship matrix $G_H = M_H M_H'/Q$, where $M_H$ is a matrix representing the number of copies of haplotype alleles, and $Q$ is the number of haploblocks (Meuwissen et al. 2014). Phased alleles were obtained using Beagle v4.1 (Browning and Browning 2016) and used to construct haploblocks following our previous study (Gao et al. 2017). Note that no more than 10 haplotype alleles were included in a haploblock.
To incorporate the annotation information into the single-step GP model, two SNP subsets (genic and exonic SNP sets) were extracted from the WGS data using the genomic annotation of chicken (GRCg6a) from Ensembl (http://www.ensembl.org/). The genic SNP set includes SNPs located between 5 kb upstream of the start and 5 kb downstream of the end of a protein-coding gene. The exonic SNP set includes SNPs within the exonic regions of protein-coding genes. The SNP subsets were extracted in the R platform (https://www.r-project.org/). Considering different combinations of annotated SNP subsets and models, four ssGBLUP models incorporating genomic annotation (~|GA models), ssGBLUPgene, ssGBLUPexon, ssGHBLUPgene, and ssGHBLUPexon, were used in this study. ssGBLUPgene and ssGBLUPexon were based on the SNP-coding model, in which the genic/exonic SNPs were used to construct the marker-based relationship matrix $G$ in the $H$ matrix. For ssGHBLUPgene and ssGHBLUPexon, the genic/exonic SNP sets were used, respectively, to construct the haplotype-based $M_H$ in the $H$ matrix.
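The SNP extraction step above is an interval lookup: a SNP is kept if it falls inside any annotated region, optionally widened by a flank. The study performed this step in R; the following Python sketch shows the same logic for a single chromosome under stated assumptions. The function name `snps_in_regions` and all coordinates are hypothetical; a flank of 5 000 bp corresponds to the genic set and a flank of 0 to regions used as given.

```python
import numpy as np

def snps_in_regions(snp_pos, starts, ends, flank=0):
    """Boolean mask of SNPs lying in [start - flank, end + flank] of any region.

    All positions are assumed to be on the same chromosome.
    """
    snp_pos = np.asarray(snp_pos)
    lo = np.asarray(starts) - flank
    hi = np.asarray(ends) + flank
    order = np.argsort(lo)
    lo, hi = lo[order], hi[order]
    # Running maximum of region ends, so overlapping or nested regions work
    hi_max = np.maximum.accumulate(hi)
    # Index of the last region whose (flanked) start is <= the SNP position
    k = np.searchsorted(lo, snp_pos, side="right") - 1
    inside = np.zeros(snp_pos.shape, dtype=bool)
    valid = k >= 0
    inside[valid] = snp_pos[valid] <= hi_max[k[valid]]
    return inside

# Hypothetical gene coordinates and SNP positions
gene_starts = [10000, 50000]
gene_ends = [20000, 60000]
snp_pos = [4000, 5000, 15000, 25000, 25001, 44999, 45000, 70000]

genic_mask = snps_in_regions(snp_pos, gene_starts, gene_ends, flank=5000)
strict_mask = snps_in_regions(snp_pos, gene_starts, gene_ends, flank=0)
```

Because the flank is applied symmetrically to both ends, gene strand does not affect which SNPs are retained in this sketch.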
Extended ssGBLUP models were compared with the conventional BLUP, GBLUP, and original ssGBLUP in the yellow-feathered chicken population. The predictive ability of all models was assessed using ten-fold cross-validation. Genotyped individuals were evenly divided into ten groups. Each group served as the test set ($N_{test}$=89 or 90) once, and the remaining individuals of the entire population were treated as the training set ($N_{train}$=895-$N_{test}$ for GBLUP, $N_{train}$=1 338-$N_{test}$ for the other models). This cross-validation procedure was repeated ten times. Variance components were estimated using restricted maximum likelihood via LDAK (http://dougspeed.com/ldak/), importing binary files of the corresponding relationship matrices generated in the R platform. Predictive ability was defined as the Pearson's correlation coefficient between the predicted genetic values and the adjusted phenotypic values in the test set.
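The cross-validation scheme above can be illustrated with a small sketch. It makes several simplifying assumptions that differ from the study: the heritability is supplied rather than estimated by restricted maximum likelihood in LDAK, the overall mean is estimated by the training-set average, and the kinship matrix and phenotypes are simulated. The helper names `kfold_indices` and `predict_blup` are hypothetical.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Randomly split indices 0..n-1 into k folds of nearly equal size."""
    return np.array_split(rng.permutation(n), k)

def predict_blup(K, y, train, test, h2):
    """Predict genetic values of test individuals from kinship matrix K.

    Assumes a known heritability h2 and uses the training mean for mu.
    """
    lam = (1.0 - h2) / h2                        # residual / genetic variance
    mu = y[train].mean()
    V = K[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(V, y[train] - mu)
    return K[np.ix_(test, train)] @ alpha

rng = np.random.default_rng(1)

# Splitting 895 genotyped birds into ten folds gives test sets of 89 or 90
folds = kfold_indices(895, 10, rng)
fold_sizes = sorted({len(f) for f in folds})

# Toy data (simulated, for illustration only)
n, m = 60, 200
geno = rng.integers(0, 3, size=(n, m)).astype(float)
Mc = geno - geno.mean(axis=0)
K = Mc @ Mc.T / m                                # simple marker-based kinship
g = Mc @ rng.normal(0.0, 1.0 / np.sqrt(m), size=m)
y = g + rng.normal(0.0, g.std(), size=n)

abilities = []
for test in kfold_indices(n, 10, rng):
    train = np.setdiff1d(np.arange(n), test)
    pred = predict_blup(K, y, train, test, h2=0.5)
    # Predictive ability: Pearson correlation in the test set
    abilities.append(np.corrcoef(pred, y[test])[0, 1])
mean_ability = float(np.mean(abilities))
```

In the single-step setting the training set would additionally contain the non-genotyped animals, linked to the test animals through the $H$ matrix; the sketch only shows the fold logic and the correlation-based measure.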
When using WGS data in ssGBLUP, the effect of the genotyping strategy for the reference population on predictive ability was investigated. We evaluated two strategies for selecting reference individuals for genotyping: even selection by family (SBF) and selection by random (SBR). In SBF, individuals in the reference population were selected evenly from each family for genotyping, so that all full- or half-sib families were covered as far as possible. In SBR, individuals in the reference population were selected completely at random for genotyping. The proportion of individuals selected for genotyping in the reference population ranged from 0.05 to 0.40 in this study. All individuals of the candidate population had WGS data. Note that the genotyping strategies for the reference population were evaluated in different scenarios, as follows.
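The two selection strategies can be contrasted with a short sketch. The exact SBF procedure is not spelled out in the text, so the round-robin draw used here (one animal per family in turn until the quota is met) is an assumption, as are the function names and the toy family structure.

```python
import numpy as np

def select_random(ids, n_select, rng):
    """SBR: draw n_select individuals completely at random."""
    return rng.choice(np.asarray(ids), size=n_select, replace=False)

def select_by_family(ids, family, n_select, rng):
    """SBF (assumed round-robin): spread the selection evenly over families."""
    ids, family = np.asarray(ids), np.asarray(family)
    # Shuffle the members of each family once
    pools = {f: list(rng.permutation(ids[family == f])) for f in np.unique(family)}
    chosen = []
    while len(chosen) < n_select:
        progressed = False
        # Visit the families in random order, taking one animal from each
        for f in rng.permutation(list(pools)):
            if pools[f] and len(chosen) < n_select:
                chosen.append(pools[f].pop())
                progressed = True
        if not progressed:                       # every family is exhausted
            break
    return np.array(chosen)

# Toy reference population: 6 families of 10 animals, genotype 20% of them
rng = np.random.default_rng(2)
ids = np.arange(60)
family = np.repeat(np.arange(6), 10)
n_select = int(0.20 * len(ids))

sbf = select_by_family(ids, family, n_select, rng)
sbr = select_random(ids, n_select, rng)
sbf_per_family = np.bincount(family[sbf], minlength=6)
sbr_per_family = np.bincount(family[sbr], minlength=6)
```

With equal family sizes, SBF picks the same number of animals from every family, whereas the per-family counts under SBR vary by chance and some families may be missed entirely at low genotyping proportions.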
To mimic the actual conditions of implementing ssGBLUP,three scenarios were considered to divide individuals with WGS data into reference and candidate population,based on their genetic relationships.Scenario I:Offspring from half of the sire families were randomly sampled as the reference population,and other sire families were used as the candidate population.Scenario II:For each sire family,offspring from half of the dam families were randomly sampled within each sire halfsib family as the reference population,and others as the candidate population.Scenario III:Offspring were randomly sampled within each sire-dam full-sib family as the reference population,and others were used as the candidate population.The above mimicking process was repeated 50 times to assess the predictive ability of the WGS-based ssGBLUP in different genotyping strategies(SBF and SBR) and scenarios.For each repetition,reference and candidate populations were resampled.Individuals with WGS data in the reference population were selected by SBF and SBR in different proportions.
Genotype imputation and quality control resulted in imputed WGS data with 5 127 612 SNPs for 895 birds. All of these imputed WGS data were used in the GBLUP, original ssGBLUP, and ssGHBLUP. The average LD ($r^2$) between adjacent SNPs was 0.24 across all chromosomes. Additionally, genomic annotation was used in the ~|GA models (ssGBLUPgene, ssGBLUPexon, ssGHBLUPgene, and ssGHBLUPexon). The genomic annotation of chicken (GRCg6a) was obtained from Ensembl (http://www.ensembl.org/). Using the position information in the genomic annotation, SNPs were mapped to genic or exonic regions. The numbers of SNPs assigned to each genomic region (genes and exons) and the numbers of haploblocks constructed from the corresponding SNPs are shown in Table 1. Of the imputed WGS SNPs, 61.5 and 1.8% were mapped to genic and exonic regions, respectively.
Table 1 Numbers of single nucleotide polymorphisms (SNPs)or constructed haploblocks in each genomic region
In this study, conventional BLUP, GBLUP, original ssGBLUP, and extended ssGBLUP models were applied to a yellow-feathered chicken population. Heatmaps of the genetic relationship matrices constructed by the original ssGBLUP and extended ssGBLUP models are shown in Fig. 1, where the individuals are in the same order in all matrices. High covariances (red blocks) along the diagonal occupied similar positions in all matrices. The main differences among the matrices are in the off-diagonal elements. The elapsed time for building the genomic relationship matrix using genic and exonic regions was reduced by more than 30 and 90%, respectively, compared with using all WGS data (Appendix B). The predictive abilities and estimated heritabilities ($h^2$) of all models for different traits are shown in Table 2 and Fig. 2, respectively. Overall, original ssGBLUP and the extended ssGBLUP models outperformed conventional BLUP and GBLUP in 22 out of 23 traits with respect to predictive ability (Table 2). The predictive ability of GBLUP was slightly lower than that of BLUP because of the different sizes of the reference populations used in these two models. Using exonic SNPs, ssGBLUPexon and ssGHBLUPexon outperformed other models with respect to predictive ability in 15 traits, with advantages ranging from 2.5 to 6.1% compared with original ssGBLUP. Also, ssGBLUPexon and ssGHBLUPexon gave higher estimated heritabilities than original ssGBLUP (Fig. 2). Extended ssGBLUP using genic SNPs (ssGBLUPgene and ssGHBLUPgene) yielded higher predictive ability than original ssGBLUP in 9 out of 23 traits, with advantages ranging from 2.5 to 2.9%. Compared with the SNP-based ssGBLUP models, the corresponding haplotype-based ssGBLUP models yielded similar predictive ability. The unbiasedness of all models is shown in Appendix C.
Table 2 Predictive ability of all models using whole-genome sequence data in a yellow-feathered chicken population1)
Fig. 1 Heatmaps of genetic relationship matrices of 1 338 individuals in the yellow-feathered chicken population. A, original single-step genomic best linear unbiased prediction (ssGBLUP). B, C, E, and F, four ssGBLUP incorporating genomic annotation (~|GA) models, ssGBLUPgene (B), ssGBLUPexon (C), ssGHBLUPgene (E), and ssGHBLUPexon (F), respectively. D, haplotype-based ssGBLUP (ssGHBLUP). Individuals are in the same order in all six panels.
Fig.2 Estimated heritability (h2) of the all traits using different models and whole-genome sequence data in a yellow-feathered chicken population.ADG=average daily gain;BW*=body weight,e.g.,body weight at 45 d (BW45);AFP=abdominal fat percentage;AFW=abdominal fat weight;BMW=breast muscle weight;CW=carcass weight;DW=drumstick weight;EW=eviscerated weight;EWG=eviscerated weight with giblets;GW=gizzard weight;IL=intestine length;ADFI=average daily feed intake;FCR=feed conversion ratio;MBW=mid-term body weight;MMBW=mid-term metabolic body weight;RFI=residual feed intake.ssGBLUP,original singlestep genomic best linear unbiased prediction;ssGHBLUP,haplotype-based ssGBLUP.
Genetic relationship between reference and candidate population
To investigate the effect of the genetic relationship on predictive ability, we considered three scenarios that varied the grouping of reference and candidate populations. The genetic relationship between the reference and candidate populations increased from scenario I (weak) to scenario III (strong). The average kinship coefficient (standard deviation) for scenarios I, II, and III across the 50 repetitions was 0.0129 (0.0005), 0.0173 (0.0002), and 0.0180 (0.0000), respectively. The predictive abilities of BLUP and the WGS-based ssGBLUP for BW70 and AFP are shown in Fig. 3. The predictive ability of GP increased from scenario I to scenario III for both traits. For BW70, the average predictive ability increased from 0.266 to 0.309 in BLUP, from 0.268 to 0.312 in ssGBLUPSBF, and from 0.266 to 0.327 in ssGBLUPSBR. For AFP, the average predictive ability increased from 0.256 to 0.317 in BLUP, from 0.266 to 0.328 in ssGBLUPSBF, and from 0.266 to 0.327 in ssGBLUPSBR. Moreover, a similar trend for other traits is shown in Appendix D.
Fig. 3 Predictive ability for different genetic relationships between the reference and candidate population for body weight at 70 d (BW70) and abdominal fat percentage (AFP). BLUP indicates the conventional best linear unbiased prediction. ssGBLUPSBF indicates the original single-step genomic best linear unbiased prediction (ssGBLUP) using the genotyping strategy of even selection by family. ssGBLUPSBR indicates the original ssGBLUP model using the genotyping strategy of selection by random.
1) BLUP=conventional best linear unbiased prediction; GBLUP=genomic best linear unbiased prediction; ssGBLUP=single-step genomic best linear unbiased prediction. Subscript gene/exon indicates that the model used genic/exonic SNPs for constructing the genomic relationship matrix.
2) ADG=average daily gain; BW*=body weight, e.g., body weight at 45 d (BW45); AFP=abdominal fat percentage; AFW=abdominal fat weight; BMW=breast muscle weight; CW=carcass weight; DW=drumstick weight; EW=eviscerated weight; EWG=eviscerated weight with giblets; GW=gizzard weight; IL=intestine length; ADFI=average daily feed intake; FCR=feed conversion ratio; MBW=mid-term body weight; MMBW=mid-term metabolic body weight; RFI=residual feed intake.
Data are mean±standard error from 10×10 cross-validation. Bold values indicate a predictive ability higher than that of the original ssGBLUP for the trait (row).
Proportion and strategy of individual selection for genotyping in the reference population
The relationship (regression line) between the predictive ability of the WGS-based ssGBLUP and the genotyped proportion of reference individuals was obtained via linear fitting (Fig. 4; Appendix E). As the genotyped proportion increased, predictive ability tended to increase for BW70 and AFP in most scenarios. For BW70, the slope of the regression line ranged from 0.005 to 0.019, except for ssGBLUPSBR in scenario I. Six other growth traits (BW49, BW56, BW63, BW77, BW84, and BW91) showed a tendency similar to that of BW70. For AFP, the slope of the fitted line ranged from 0.014 to 0.027. The same upward tendency as for AFP was observed in most scenarios for the other eight carcass traits. Moreover, the tendency for feeding traits was similar to that for growth traits.
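The slopes and intercepts reported in this part of the results come from a first-degree least-squares fit of predictive ability on the genotyped proportion. A minimal sketch of such a fit is given below, assuming NumPy. The grid of proportions is an assumption (only the range 0.05 to 0.40 is stated), and the ability values are generated from the slope and intercept reported for SBF on AFP in scenario III (0.027 and 0.318) rather than taken from the study's data.

```python
import numpy as np

# Assumed grid of genotyped proportions within the stated range 0.05-0.40
proportions = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40])

# Illustrative abilities lying exactly on the reported SBF line for AFP,
# scenario III (slope 0.027, intercept 0.318); not observed data
ability = 0.318 + 0.027 * proportions

# First-degree polynomial fit: returns slope first, then intercept
slope, intercept = np.polyfit(proportions, ability, deg=1)
```

With real cross-validation output, `ability` would hold the mean predictive ability at each proportion, and comparing the fitted slopes of SBF and SBR gives the kind of contrast reported for Fig. 4.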
Two strategies for selection of reference individuals for genotyping were compared.These strategies included SBF and SBR.The predictive ability of ssGBLUP with SBF outperformed SBR for BW70 in scenarios I and III(Fig.4).The results showed that the slope of SBF (0.005)was greater than the slope of SBR (-0.001) in scenario I,and the slope of SBF (0.006) was greater than the slope of SBR (0.004) in scenario III.The advantage of SBF over SBR was also observed for the other eight growth traits (Appendix E).In all nine growth traits,the predictive ability of ssGBLUP with SBF was better than SBR for seven traits in scenario I and nine traits in scenario III,respectively.Moreover,it was also observed that the predictive ability of ssGBLUP with SBF outperformed SBR for AFP in scenario III (Fig.4),in which the slope and intercept of SBF (0.027 and 0.318) were greater than SBR (0.025 and 0.317).Compared to using SBR,the advantage of using SBF was shown in seven out of nine carcass traits in scenario III (Appendix E).For feeding traits,ssGBLUP with SBF was slightly better than SBR for MBW,MMBW and RFI in several scenarios,and others showed little advantage (Appendix E).
Fig.4 Predictive ability of single-step genomic best linear unbiased prediction (ssGBLUP) using whole-genome sequence data under different genotyped individual selection strategies in body weight at 70 d (BW70) and abdominal fat percentage (AFP).SBF indicates the original ssGBLUP model using the genotyping strategy of evenly selection by family.SBR indicates the original ssGBLUP model using the genotyping strategy of selection by random.Scenario I-III represents the genetic relationship between the reference and candidate population increased from weak to strong.
This study extended ssGBLUP by incorporating genomic annotation into the model, and evaluated the extended models in a yellow-feathered chicken population. Results showed that incorporating genomic annotation improved predictive ability for a number of traits and greatly reduced the runtime of building the genomic relationship matrix. Hence, our hypothesis that using genomic annotation information would improve the performance of ssGBLUP with WGS data is supported by the results of the current study. Moreover, our study investigated two strategies for selecting individuals to genotype in the reference population, and showed that maximizing the expected genetic relationship between reference and candidate populations would further improve the predictive ability of ssGBLUP using imputed WGS data in the yellow-feathered chicken population. This study provides useful strategies to improve the performance of ssGBLUP and benefits the use of WGS data in breeding programs.
Our analyses showed that incorporating genomic annotation information improved certain predictive ability of ssGBLUP for most traits (Table 2).Meanwhile,the models incorporating genomic annotation explained more genetic variance than other models for most traits (Fig.2).This indicated that incorporating gene annotation into ssGBLUP may be a useful approach of improving genetic gain,especially for these traits lacking relevant research.For traits having major genes or QTNs,a previous study suggested that differential treatment of genetic variants should be considered in ssGBLUP (Fragomeniet al.2017).Several strategies were proposed to weight important variants in relationship matrix for improving the performance of ssGBLUP (Legarra and Vitezica 2015;Teissieret al.2018).Overall,our results and previous studies demonstrated that the predictive ability of ssGBLUP can be improved by incorporating various prior knowledge including major genes,QTNs,or gene annotation information.In practice,several crucial factors affecting the performance of extended ssGBLUP should be considered comprehensively,such as the reliability of gene annotation,marker density,or linkage disequilibrium.
From the comparison of predictive ability between~|GA models and the corresponding models using all SNPs(Table 2),~|GA models using genic or exonic markers were more accurate.In~|GA models,the hypothesis was that the trait of interest would be primarily affected by protein-coding functional element (such as genes and exons) rather than another non-protein-coding functional element.The previous study found that adding these markers located in the non-coding protein region into the prediction model would not improve the predictive ability(Gaoet al.2017).Filtering genetic markers based on genomic annotation information is expected to improve the predictive ability of GP (Pérez-Encisoet al.2015;Gaoet al.2018;Xianget al.2019),because valid information is extracted and noise is potentially reduced.Therefore,highly reliable extra information from the genomic annotation database is a critical factor to ensure the optimal performance of extended ssGBLUP incorporating gene annotation information.Specifically,the availability and usefulness of prior biological information should be paid more attention to while using~|GA models.
Mapping SNPs to genic or exonic regions is required in the ~|GA models. In the current study, using imputed WGS data, 61.5 and 1.8% of the SNPs were located in genic and exonic regions, respectively, and were used in the ~|GA models, while the other SNPs were discarded. This dramatic reduction in the number of SNPs did not reduce the predictive ability, but greatly reduced the runtime of building genomic relationship matrices (Appendix B). It may also help to increase the computational efficiency of time-consuming methods (e.g., the "Bayesian alphabet" and machine learning). Hence, incorporating genomic annotation information into the genomic prediction model is a promising approach and is recommended when using WGS data. With low-density marker data, it can be inferred that fewer genic and exonic regions would be captured, so many real and effective functional regions related to the traits considered would be neglected. Also, many studies have confirmed that the predictive ability of GP is affected by marker density (e.g., Solberg et al. 2008). From a biological point of view, the gene or exon is the protein-coding functional region of organisms; thus, we speculate that using relatively few genes or exons in the ~|GA models would be detrimental to prediction. Thus, this study suggests that high-density or WGS data should be used for the ~|GA models.
In the haplotype-based ssGBLUP models,genetic markers were treated by building haploblocks.The haplotype-based model hypothesis states that,tracing the individuals’ genomic relationship based on the identity by state (IBS) of single SNPs may put the base population too far back in time (Meuwissenet al.2014).Compared with mutations of single SNPs,the recombination of haplotypes occurs more frequently in a population.The genetic relationship between individuals may be better reflected based on haplotypes.But the haplotype-based model yielded a similar predictive ability to the corresponding SNP-based model (Table 2),which is different from a previous study in pig population(Meuwissenet al.2014).This may be caused by the LD pattern of the genome affecting the average length of haploblock.Rapid LD decay in genome would lead to few SNPs included in one haploblock and reduce the advantage of using haplotype.Caluset al.(2008) also concluded that average LD between adjacent markers affects the performance of haplotype-based model.
When implementing ssGBLUP in practice, optimizing the genotyping strategy for the reference population is important for reducing cost and improving predictive ability. To evaluate different genotyping strategies, we used a special design to mimic three levels of genetic relationship, from distant to close. Results showed that the accuracy of ssGBLUP increased with the proportion of individuals with WGS data in the reference population for most traits (Fig. 4; Appendix E). Li et al. (2019) reported similar results using SNP chip data in swine. They also found that genotyping 40-50% of the pigs in each litter yielded a predictive ability equivalent to genotyping all pigs (Li et al. 2019). In contrast, our results showed the opposite trend for a few traits, such as ADG and BW45 (Appendix E). This might be because genomic information contributed less to the prediction of these traits in our chicken population (i.e., ADG and BW45 in Fig. 2). Taken together, a cost-optimal genotyping proportion should be considered in breeding practice to optimize the accuracy of ssGBLUP with WGS data.
The individual selection strategy is essential when only a small proportion of individuals in the reference population is genotyped. This has rarely been studied in chickens, although it has been widely investigated in other species (Horton et al. 2015; Granleese et al. 2019). Our results showed that the predictive ability of ssGBLUP was improved by the SBF strategy in the scenario with a close genetic relationship between reference and candidate populations (Fig. 4; Appendix E). This benefit may arise because the reference individuals selected by the SBF strategy represented the half- or full-sib families as fully as possible. When the genetic relationship between reference and candidate populations was relatively distant, the predictive ability of ssGBLUP was only slightly affected by the strategy for selecting reference individuals in quite a number of traits. Overall, constructing a reference population and selecting individuals for genotyping so as to maximize the expected genetic relationship between reference and candidate populations would potentially improve the performance of ssGBLUP or extended ssGBLUP with WGS data.
Finally,it should be noted that the finite population size may introduce random errors in the present study.To address this issue,10 and 50 replicates were performed respectively for the assessment of models and the comparison of genotyping strategies.The individuals in test set or validation population were the same for all tested models and strategies within each replication,which made the comparison fair.
Incorporating genomic annotation information into ssGBLUP based on imputed whole-genome sequence data can slightly improve the accuracy of genomic prediction.Moreover,maximizing the expected genetic relationship between reference and candidate populations is also a potentially useful strategy for improving the performance of ssGBLUP.In summary,suitable prediction model and optimal genotyping strategy of reference population should be comprehensively considered while implementing ssGBLUP with imputed whole-genome sequence data in animal and plant breeding.
This work was supported by the National Natural Science Foundation of China (32022078) and the Local Innovative and Research Teams Project of Guangdong Province,China (2019BT02N630).Also,the authors are grateful for the support from the National Supercomputer Center in Guangzhou,China.
The authors declare that they have no conflict of interest.
This study was carried out in accordance with the recommendations of Animal Care Committee of South China Agricultural University (Guangzhou,China).The protocol was approved by the Animal Care Committee of South China Agricultural University.Animals involved in this study were humanely sacrificed as necessary to ameliorate their suffering.
Appendices associated with this paper are available on http://www.ChinaAgriSci.com/V2/En/appendix.htm
Journal of Integrative Agriculture, 2022, Issue 4