WANG Jie, LEI Qiu-xia, CAO Ding-guo, ZHOU Yan, HAN Hai-xia, LIU Wei, LI Da-peng, LI Fu-wei, LIU Jie#
1 Poultry Institute, Shandong Academy of Agricultural Sciences, Jinan 250100, P.R. China
2 Poultry Breeding Engineering Technology Center of Shandong Province, Jinan 250100, P.R. China
Abstract Many chicken breeds are found around the world, and their distinctive features make them valuable genetic resources. Currently, little is known about the genetic determinants responsible for the phenotypic and biochemical properties of these breeds. Understanding the genetic mechanisms that underlie across-breed variation can help breeders develop improved chicken breeds. The whole genomes of 140 chickens from 7 Shandong native breeds and 20 introduced Recessive White chickens from China were re-sequenced. Comparative population genomics based on autosomal single nucleotide polymorphisms (SNPs) revealed geographically based clusters among the chickens. Through genome-wide scans for selective sweeps, we identified thyroid stimulating hormone receptor (TSHR; reproductive traits, circadian rhythm), erythrocyte membrane protein band 4.1 like 1 (EPB41L1; body size), and alkylglycerol monooxygenase (AGMO; aggressive behavior) as major candidate breed-specific determining genes in chickens. In addition, we used a machine learning classification model to predict chicken breeds based on the SNPs significantly associated with resource characteristics; the prediction accuracy was 92%, which is sufficient for effective breed identification of Laiwu Black chickens. We provide the first comprehensive genomic data of the Shandong indigenous chickens. Our analyses revealed phylogeographic patterns among the Shandong indigenous chickens and candidate genes that potentially contribute to breed-specific traits of the chickens. In addition, we developed a machine learning-based prediction model using SNP data to identify chicken breeds. The genetic basis of indigenous chicken breeds revealed in this study helps to better understand the mechanisms underlying the resource characteristics of chicken.
Keywords: chicken, genome, genetic diversity, support vector machine
The chicken, Gallus gallus, is valuable not only as a food source but also as a model organism for scientific research. Over the past thousands of years, hundreds of chicken breeds have diverged under natural and artificial selection in a wide variety of conditions. For most of their history, domestic chicken populations have been bred for 2 purposes, egg laying and meat production (Aggrey 2010), and a wide range of breeds were developed to meet various human demands. Both domestication and artificial selection have led to rapid phenotypic changes and resulted in distinct features among breeds. As a result, chickens have undergone significant phenotypic differentiation in body size, plumage, egg color, and flying ability (West and Zhou 1988). Analysis of genetic data and patterns can be useful to trace the origin of specific breeds or detect specific traits within breeds (Jeong et al. 2016). A screen for selective sweeps in populations of commercial broilers revealed many loci that are plausibly related to selection for muscle growth (Wang Q et al. 2020). Genetic variants present in several chicken breeds have been used to support the characterization of specific traits, including dermal pigmentation, polydactyly, silkie feathering or hookless, feathered legs, vulture hock, rose comb, and duplex comb (Dorshorst et al. 2010); feather colour (Chang et al. 2012); meat quality traits (Sun et al. 2013); rumpless (Freese et al. 2014); and polydactyly (Sun et al. 2014). Using this approach, a wide range of genomic studies on domestic animals, and on chickens specifically, have been conducted to investigate the genetic architecture of these breeds.
China, as a possible place of origin of the domestic chicken and a vast country with abundant diversity in geography and culture (Miao et al. 2013; Wang M S et al. 2020), has accumulated abundant genetic resources of indigenous chickens through extensive natural and artificial selection, with considerable genetic variation and phenotypic diversity in terms of morphology and physiology (Nie et al. 2019). With different domestication and breeding goals, a large number of chicken breeds became distributed throughout China, and their characteristic features vary from one breed to another. These features fall broadly into 2 categories: production features, such as growth, reproduction, carcass and meat quality; and biological characteristics, such as body shape, appearance, and coat color. A comprehensive and detailed understanding of the genome diversity of Chinese indigenous breeds could reveal the population dynamics of these breeds, thus providing a theoretical basis to facilitate conservation and breeding programs. During a long period of domestication and selection, each breed developed its own production features, biological characteristics, and breed-specific genomic variants. Understanding the genetic mechanisms underlying the characteristic features of each breed could help breeders develop the ideal chicken breed.
In this study, we sampled and whole-genome resequenced 160 chickens of 9 varieties, including one representative commercial chicken breed that is commonly farmed in China and 8 nationwide indigenous chicken breeds from the Shandong Province of China. Long-term breeding and artificial selection have shaped the specific genetic variation of these chicken breeds. As a result, they have different phenotypic traits, including morphological, growth, and reproduction traits. We hypothesized that the 9 chicken breeds investigated in this study would have different genomic features that reflect their specific phenotypic characteristics; this outcome would be expected to result from the strong natural or artificial selection that these breeds experienced during long-term breeding. Thus, the objective of this study was to detect differential genomic footprints between the 9 chicken breeds obtained from the Shandong Province by conducting a genome-wide scan of selection signatures using the fixation index (Fst) and di methods.
A total of 160 blood samples were collected from 9 chicken populations, representing 7 Chinese nationwide indigenous chicken breeds and one introduced representative breed: 20 Laiwu Black (LB) chickens, 20 Langya (LY) chickens, 20 Shouguang (SG) chickens, 20 Luxi Mini (LM) chickens, 20 Jining Bairi (JB) chickens, 20 Wenshang Barred chickens (10 WB1, indigenous; 10 WB2, selected for commercial traits), 20 Luxi Gamecock (LG) chickens, and 20 Recessive White (RW) chickens. The JB chicken is a Chinese indigenous breed mainly used for egg production. SG and LB chickens are large-bodied breeds used for both meat and egg production, whereas WB and LY chickens are medium-bodied breeds used for both meat and egg production. The LM chicken is a small ornamental breed. The LG chicken has accumulated some unique morphological and behavioral signatures, such as small comb size, large body size, muscularity and aggressive behavior. The detailed phenotypes are shown in Table 1 and Appendix A. For all birds, blood was collected from wing veins, rapidly frozen, and stored at -20°C. Total genomic DNA was extracted using a conventional phenol-chloroform method, and the quality and quantity of DNA were determined using a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and by agarose gel electrophoresis.
Each DNA sample of adequate quality was used to generate paired-end libraries using standard procedures. The average insert size was 500 bp, and the average read length was 150 bp. All libraries were sequenced on an Illumina HiSeq X Ten Sequencing System (Illumina, San Diego, CA, USA) to an average raw read sequence coverage of 10×. This depth ensured the accuracy of variant calling and genotyping and met the requirements for population genetic analysis. The genome sequence data reported in this article have been deposited in the Genome Sequence Archive of the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, and are publicly available from http://bigd.big.ac.cn (CRA006685).
Raw data were processed using Perl scripts to ensure the quality of the data used in further analysis. The screening criteria were as follows: (1) reads containing adapter sequence (more than 5 bases) were removed; (2) low-quality reads (those in which more than 50% of bases have a Phred quality value less than 19) were removed; (3) reads in which more than 5% of bases are N were removed. In paired-end sequencing data, both reads of a pair were screened out if either was adaptor-polluted. The screened data were evaluated in terms of quantity and quality, including Q30, data quantity, and base content statistics. The Burrows-Wheeler aligner (Li and Durbin 2009) was used to map clean reads to the chicken reference genome (http://ftp.ensembl.org/pub/release-96/fasta/gallus_gallus/dna/). SAMtools v1.2 (Li et al. 2009) was used to sort reads, and MarkDuplicates in Picard tools v1.13 (http://broadinstitute.github.io/picard/) was used to remove duplicate reads resulting from PCR amplification. Reads mapped to 2 or more positions were likewise screened out. Statistics were tabulated with our in-house Perl script. The HaplotypeCaller of the Genome Analysis Toolkit (GATK; McKenna et al. 2010) was used for single nucleotide polymorphism (SNP) calling via local reassembly of haplotypes for the population. SNPs were then screened before further analysis using the GATK VariantFiltration tool with the following settings: QD<2.0, ReadPosRankSum<-8.0, FS>60.0, QUAL<30.0, DP<4.0.
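The three read-screening criteria above can be sketched as follows. This is a minimal Python illustration, not the in-house Perl scripts used in the study: the adapter sequence, the function names, and the rule that a pair is dropped when either read fails any criterion are assumptions made for the example.

```python
# Hypothetical adapter sequence, for illustration only.
ADAPTER = "AGATCGGAAGAGC"

def has_adapter(seq, adapter=ADAPTER, min_len=6):
    """True if the read contains more than 5 consecutive adapter bases.
    Simplification: only the first `min_len` bases of the adapter are searched."""
    return adapter[:min_len] in seq

def low_quality(quals, min_q=19, max_frac=0.5):
    """True if more than 50% of bases have a Phred quality below `min_q`."""
    low = sum(1 for q in quals if q < min_q)
    return low > max_frac * len(quals)

def too_many_n(seq, max_frac=0.05):
    """True if more than 5% of the bases are N."""
    return seq.upper().count("N") > max_frac * len(seq)

def read_passes(seq, quals):
    """Apply criteria (1) to (3) to a single read."""
    return not (has_adapter(seq) or low_quality(quals) or too_many_n(seq))

def pair_passes(read1, read2):
    """Assumption: a pair is kept only if both reads pass every criterion."""
    return read_passes(*read1) and read_passes(*read2)
```

Each read is passed as a `(sequence, qualities)` tuple, so `pair_passes(("ACGT" * 5, [30] * 20), ("ACGT" * 5, [30] * 20))` keeps a clean pair.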
SNPs with an inheritance or genotyping error, a minor allele frequency <5%, or a call rate <95%, and SNPs not assigned to chromosomes 1-28, were screened out. Ultimately, after quality control, 11 209 417 SNPs distributed on 28 chromosomes remained. Accordingly, a total of 11 209 417 SNPs and 160 chickens were used for the analysis.
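The frequency, call-rate, and chromosome filters can be illustrated with a short Python sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the function names are hypothetical, and the inheritance or genotyping error check is not modelled here.

```python
def call_rate(genos):
    """Fraction of individuals with a non-missing genotype."""
    return sum(g is not None for g in genos) / len(genos)

def minor_allele_freq(genos):
    """Minor allele frequency from 0/1/2 alternate-allele counts."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1 - alt)

def passes_qc(chrom, genos, min_maf=0.05, min_call=0.95,
              autosomes=range(1, 29)):
    """Keep a SNP only if it lies on chromosomes 1-28, has a call rate
    of at least 95%, and a minor allele frequency of at least 5%."""
    return (chrom in autosomes
            and call_rate(genos) >= min_call
            and minor_allele_freq(genos) >= min_maf)
```

For example, `passes_qc(1, [0, 1, 2, 1] * 5)` keeps a common, fully genotyped SNP, whereas a monomorphic SNP or one on an unplaced scaffold is dropped.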
Principal component analysis (PCA) was performed on all SNPs (chr1-chr28) using PLINK (version 1.9) to analyze population structure. The 9 chicken populations were separated by the first 3 principal components (PCs), which were plotted with R packages.
The genome-wide unlinked SNP dataset and the model-based assignment software program ADMIXTURE 1.3.0 were used to quantify genome-wide admixture between the 9 chicken populations (RW, JB, WB1, WB2, LG, LB, LM, LY, and SG), in order to estimate the ancestry of each individual. To estimate the parameter standard errors used to determine the optimal group number (K), ADMIXTURE was run with 200 bootstrap replicates for each possible group number (K=1 to 8).
We used the TreeMix Software (Das et al. 2020) to infer the historical relationships among the 9 chicken populations included in this study. We ran TreeMix with 1 to 5 migration events and generated the corresponding residual matrices, using the options “-noss” and “-k 1000”.
We used the genetic differentiation (fixation index, Fst) between an individual chicken breed (e.g., LB chickens) and the other chicken breeds (all breeds except LB chickens), calculated in sliding windows, to identify the genomic regions under selection in every indigenous population. Considering that chicken populations harbor relatively high genome nucleotide diversity and heterozygosity, we reduced the window size and step size to 40 and 10 kb, respectively, when calculating the Weir Fst value of each window. The top 1% outliers of the bins were regarded as the putative genomic regions under selection, and were further annotated using the Ensembl BioMart tool.
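The windowing and outlier step can be sketched in Python as below. This is an illustration under simplifying assumptions: per-SNP Fst values are taken as already computed, and each window is summarized by a plain mean, whereas the windowed Weir and Cockerham estimate is usually a ratio of summed variance components. The function names are hypothetical.

```python
def window_means(snps, chrom_len, window=40_000, step=10_000):
    """snps: list of (position, fst) pairs for one chromosome.
    Returns (start, end, mean_fst, n_snps) for every non-empty window."""
    out = []
    start = 0
    while start < chrom_len:
        end = start + window
        vals = [f for pos, f in snps if start <= pos < end]
        if vals:
            out.append((start, end, sum(vals) / len(vals), len(vals)))
        start += step
    return out

def top_fraction(windows, frac=0.01):
    """Windows whose mean Fst falls in the top `frac` of the
    empirical distribution (at least one window is returned)."""
    ranked = sorted(windows, key=lambda w: w[2], reverse=True)
    n_keep = max(1, int(len(ranked) * frac))
    return ranked[:n_keep]
```

With 93 458 windows, a 1% cut-off of this kind would retain roughly 934 windows before overlapping windows are merged into regions.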
For the SNPs in the genomic regions under selection, we used a simple summary statistic (di) to measure the locus-specific divergence in allele frequencies for each breed based on unbiased estimates of pairwise Fst (Akey et al. 2010). In particular, for each SNP in the significant windows we calculated the statistic $d_i = \sum_{j \neq i} \frac{F_{ST}^{ij} - E[F_{ST}^{ij}]}{\mathrm{sd}[F_{ST}^{ij}]}$, where $E[F_{ST}^{ij}]$ and $\mathrm{sd}[F_{ST}^{ij}]$ denote the expected value and standard deviation of the Fst between breeds i and j calculated from all 22 083 SNPs from the 40 regions. The top 1% outliers were regarded as the putative SNPs under selection. Linkage disequilibrium (LD) pruning was then performed with a window size of 25 SNPs, a step of 5 SNPs, and an r² threshold of 0.2 for every indigenous chicken breed, yielding 115 independent SNP markers.
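As a rough illustration of the di statistic of Akey et al. (2010), the Python sketch below standardizes each pairwise Fst against its mean and standard deviation across SNPs and sums over all partner breeds. The data layout (a dictionary keyed by breed pairs) and the use of the sample standard deviation are assumptions for the example, and a pair with zero variance would need separate handling.

```python
from statistics import mean, stdev

def di_statistic(fst, breed):
    """fst: dict mapping frozenset({breed_a, breed_b}) to a list of
    per-SNP Fst values, with the same SNP order for every pair.
    Returns one d_i value per SNP for `breed`."""
    pairs = [p for p in fst if breed in p]
    n_snps = len(fst[pairs[0]])
    # Expected value and standard deviation of Fst for each pair (i, j)
    stats = {p: (mean(fst[p]), stdev(fst[p])) for p in pairs}
    return [
        sum((fst[p][k] - stats[p][0]) / stats[p][1] for p in pairs)
        for k in range(n_snps)
    ]
```

Large positive values flag SNPs at which the focal breed is unusually divergent from all other breeds at once, which is why the statistic suits a search for breed-specific markers.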
The phylogenetic tree of the sequenced breeds was constructed using the TASSEL 5.0 Software, based on the detected autosomal SNPs and corresponding genotypes. Afterwards, the tree was imported into iTOL (Letunic and Bork 2021) (https://itol.embl.de/) to generate the tree views.
The support vector machine (SVM) algorithm (Smola and Schölkopf 2004) was used as the learner, with the aim of identifying, for a set of prediction variables (SNPs), a function that can distinguish a specific breed from other breeds. The power of the SVM algorithm resides in a particular mathematical element known as the kernel. One of the most widely used kernels is the Gaussian Radial Basis Function (RBF), as it can be used to obtain almost every surface (Cristianini and Shawe-Taylor 2000). One of the main parameters in an SVM classifier is the “cost parameter” (cost), which is a tradeoff between the prediction error and the simplicity of the model. Gamma is the other hyperparameter of the SVM and controls the Gaussian function inside the RBF kernel. The performance of the SVM classifier is very sensitive to changes in this parameter. The tested values for the cost hyperparameter were 0.001, 0.1, 1, 10, and 100, and those for the Gamma parameter were 0.005, 0.05, 0.5, and 5. The “e1071” R package was used for the analyses (Meyer et al. 2015). The receiver operating characteristic (ROC) curve is a measure of the performance of a machine learning classifier model. The 18 SNPs data of 969 chickens from 18 chicken populations used for the SVM model are shown in Appendix B, including 398 LB chickens, 341 Jingxing Yellow chickens, 9 Bearded chickens, 28 Daweishan Mini chickens, 21 Piao chickens, 21 Wuding chickens, 10 Tibetan chickens, 7 Dagu chickens, 30 Red jungle fowls and 134 individuals measured in this study. The supplementary data of the SNPs of background chicken breeds were provided by the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences.
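To make the role of the two hyperparameters concrete, the Python sketch below evaluates a Gaussian RBF kernel of the form exp(-gamma * ||x - y||^2) and enumerates the tested cost and Gamma values. It only illustrates the kernel and the search grid; it does not train an SVM and is not the e1071 code used in the study, and the variable names are assumptions.

```python
import math
from itertools import product

# Hyperparameter values tested in the text
COSTS = [0.001, 0.1, 1, 10, 100]
GAMMAS = [0.005, 0.05, 0.5, 5]

def rbf_kernel(x, y, gamma):
    """Gaussian RBF similarity between two genotype vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Every (cost, gamma) combination a grid search would evaluate
GRID = list(product(COSTS, GAMMAS))
```

A larger Gamma makes the similarity decay faster with genotype distance, so the decision surface becomes more local; this is one reason classifier performance is sensitive to it.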
In this study, we whole-genome re-sequenced a total of 160 chicken samples, from 8 nationwide indigenous chicken breeds found throughout the Shandong Province. A total of 778 Gb of high-quality sequencing data was aligned against the reference chicken genome (GRCg6a) using BWA (v0.7.17) (Li and Durbin 2010). The average mapping rate for all individuals was approximately 97.97%; the average coverage (≥1×) for all individuals was approximately 96.98%, and the average depth was approximately 10.06× (Appendix C). A total of 22 082 777 SNPs shared by both SAMtools and GATK pipelines were identified. Following the screening criteria listed in the “Materials and methods” section, a total of 11 209 417 genome-wide population SNPs were obtained.
The PCA revealed that the top 3 PCs separated 4 populations (WB1, LM, SG, and RW) from all the other populations (Fig. 1-B). In particular, the JB, LG, WB2, LB, and LY chickens clustered very near each other. WB1, LM, SG, and RW showed the highest levels of genetic differentiation from the other populations (Table 2), and their unique patterns of genetic variation were also evidenced by the top 3 PCs, in which they formed clusters independent of the others. Additionally, the phylogenetic relationships reproduced the PCA results. According to the neighbor-joining tree, all 160 chickens from the 9 populations could be separated into 2 clusters (Appendix D). Cluster 1 included the LM chickens, while Cluster 2 consisted of the remaining 8 populations, that is, the other Shandong indigenous populations and the introduced representative population. Except for the LY chickens, all populations were separated into their own clades. This phylogenetic analysis indicated that the LY chickens did not group well into one cluster, suggesting a complex genetic structure. In addition, it was hard to separate the LY and JB populations from each other, suggesting that they share a common ancestor.
Table 2 Genome-wide genetic differentiation between each population1)
To infer the degree of admixture among the 160 chickens, we further performed an unsupervised ADMIXTURE analysis, with K run from 1 to 8. Consistent with the above PCA result (Fig. 1-B), at K=4 genetic divergence first occurred between the introduced population and the indigenous ones. Except for LG, SG, LM and WB1, a potential widespread genetic exchange from the introduced population to the other Chinese indigenous populations was apparent from K=4 to K=5, inclusive (Fig. 1-C). At K=4, the RW, WB1, LG, LM, and SG chicken populations still remained distinct (Fig. 1-C), while the other group was mainly represented by JB, LB, and LY chickens, which appeared to be genetic admixtures of indigenous chickens. These results point to potentially widespread genetic exchange among Shandong indigenous chickens during population evolution.
Given that the above ADMIXTURE analysis suggested a potential widespread introgression from commercial chickens into most Shandong indigenous breeds, and to better understand the historical relationships among the 9 populations, we further used the TreeMix Software to reconstruct a maximum likelihood (ML) tree, which allows for both split and migration events between populations. In this ML tree (Fig. 1-D), gene flow from RW chickens into the LY breed was observed, which is consistent with the ADMIXTURE results indicating a potential widespread exchange from introduced chickens into Chinese indigenous chickens. Notably, among the 3 gene flows observed between Chinese indigenous breeds, one ran from SG chickens into LB chickens.
Fig. 1 Geographical distribution, population structure analysis and TreeMix analysis of 9 chicken populations included in this study. A, origin of samples in this study. B, principal component analysis (PCA). C, admixture analysis at K=4, 5, 6. D, TreeMix analysis of Shandong indigenous and introduced chickens. JB, Jining Bairi chickens; LG, Luxi Gamecock chickens; WB, Wenshang Barred chickens; LB, Laiwu Black chickens; LM, Luxi Mini chickens; LY, Langya chickens; SG, Shouguang chickens; RW, Recessive White chickens.
Fig. 2 Genome-wide putative selective signatures in the 9 populations. Manhattan plot for each window calculated by Fst. JB, Jining Bairi chickens; LG, Luxi Gamecock chickens; WB, Wenshang Barred chickens; LB, Laiwu Black chickens; LM, Luxi Mini chickens; LY, Langya chickens; SG, Shouguang chickens; RW, Recessive White chickens.
Fig. 3 Haplotype diversity of TSHR and EPB41L1 among 160 chickens. The reference and alternative alleles of each site are colored in red and blue, respectively, while yellow denotes the heterozygous genotype. JB, Jining Bairi chickens; LG, Luxi Gamecock chickens; WB, Wenshang Barred chickens; LB, Laiwu Black chickens; LM, Luxi Mini chickens; LY, Langya chickens; SG, Shouguang chickens; RW, Recessive White chickens.
Fig. 4 Genomic distribution of population structure in 9 chicken populations. The distribution of di for each single nucleotide polymorphism (SNP) interval across 40 significant windows is shown for each population. The dashed red line denotes the 99th percentile for each breed. JB, Jining Bairi chickens; LG, Luxi Gamecock chickens; WB, Wenshang Barred chickens; LB, Laiwu Black chickens; LM, Luxi Mini chickens; LY, Langya chickens; SG, Shouguang chickens; RW, Recessive White chickens.
Fig. 5 The classification results derived from the significant single nucleotide polymorphism (SNP) data. A, the phylogenetic tree derived from 115 SNPs. B, the ROC curve and AUC show the ability of the SVM model to distinguish among classes. JB, Jining Bairi chickens; LG, Luxi Gamecock chickens; WB, Wenshang Barred chickens; LB, Laiwu Black chickens; LM, Luxi Mini chickens; LY, Langya chickens; SG, Shouguang chickens; RW, Recessive White chickens.
After 10 873 360 SNPs were removed, 11 209 417 SNPs and 93 458 windows were retained for the subsequent analysis of Fst values. Using a threshold of the top 1% outliers of windows as the putative selective genomic regions, we identified 540 genomic regions in the Fst analyses (Fig. 2). We further annotated these candidate genomic regions, which enabled us to identify 397 candidate genes from the respective Fst analyses (Appendix E). Additionally, we merged and annotated the selective genomic regions according to the breed characteristics and obtained a total of 58 genes related to the various characteristics (Appendix F). For instance, the thyroid stimulating hormone receptor (TSHR) gene related to circadian rhythm, the ADGRL3 gene related to aggression, the RAC1 and EPB41L1 genes related to body size and skeletal development, and the GRM5 gene related to feather color, among others, were all identified as breed-specific genes.
The JB chicken is a small local breed mainly used for egg production. It has a rare early maturing trait (laying starts at about 100 days of age). In this breed, the reproduction-related genes TSHR (Fig. 3), ANK2, SPIRE2 and PSMD7 were identified. The LM chicken is a small-sized Chinese indigenous breed with an adult body weight (BW) at 5 months of age of 0.86 kg for hens or 1.2 kg for cocks. In this breed, the body-size-related genes MGAT4C, PPP1CB, RAC1, and EPB41L1 (Fig. 3) were identified. Gamecock chickens are one of the earliest recorded birds in China and have accumulated some unique morphological and behavioral signatures, such as large body size, muscularity and aggressive behavior, making them excellent breeding material and a good model for studying bird muscular development and behavior. In this study, the genes AGMO, ADGRL3, and VSTM2A were identified in this breed.
In this study, we combined the results of the Fst analysis of the 9 chicken populations, grouping the significant windows into 40 genomic regions. To further identify the SNPs with the main effect in each region, we introduced the di statistic to analyze the SNPs in these regions (Fig. 4). Using a threshold of the top 1% outliers of all SNPs as the putative selective SNPs, we identified 1 142 SNPs by di analysis (Appendix G). LD pruning of the 1 142 SNPs was then performed, separately for each population, with a window size of 25 SNPs, a step of 5 SNPs, and an r² threshold of 0.2, yielding 115 independent SNP markers (JB, 16 SNPs; LG, 21 SNPs; WB1, 7 SNPs; WB2, 7 SNPs; LB, 21 SNPs; LM, 14 SNPs; LY, 8 SNPs; SG, 16 SNPs; RW, 4 SNPs). Additionally, we used the 115 breed-specific SNPs (Appendix H) to construct a neighbor-joining phylogenetic tree (Fig. 5), which separated all chickens from the 9 populations into 9 clusters.
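The LD pruning step (window of 25 SNPs, step of 5 SNPs, r² threshold of 0.2) can be approximated by the greedy Python sketch below. It is a simplified stand-in, not the tool used in the study: r² is computed as the squared Pearson correlation of 0/1/2 genotype counts, and within a correlated pair the later SNP is always the one removed, which is an assumption made for brevity.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    return cov * cov / (va * vb)

def ld_prune(genotypes, window=25, step=5, r2_max=0.2):
    """genotypes: per-SNP genotype vectors in genomic order.
    Returns the indices of SNPs kept after pruning."""
    keep = [True] * len(genotypes)
    start = 0
    while start < len(genotypes):
        stop = min(start + window, len(genotypes))
        idx = [i for i in range(start, stop) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False  # drop the later SNP of a linked pair
        start += step
    return [i for i, kept in enumerate(keep) if kept]
```

Pruning separately for each population, as described above, would mean calling `ld_prune` once per breed on that breed's candidate SNPs.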
We used 18 SNP markers (3 were removed by call rate) identified in LB chickens to verify the efficacy of the SNP markers in breed identification. In this section, we used 18 SNPs from 969 chickens (Appendix B), including 410 LB chickens and 559 background chickens. Using the 18 SNPs as prediction variables and LB membership (LB or not) as the predicted value, 20 random samplings were conducted, each with 70% of the individuals as the training group to train the SVM model. The best value for the cost hyperparameter was 1, and that for the Gamma parameter was 0.05. The results of the 20 samplings showed that the prediction accuracy on the remaining 30% of individuals (the test group) was 0.92±0.02 (Fig. 5), which is sufficient for effective breed identification of LB chickens.
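The repeated 70%/30% evaluation protocol can be sketched in Python as follows. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in learner; it is not an SVM and not the model trained in the study, and the function names and the fixed random seed are assumptions.

```python
import random
from statistics import mean, stdev

def nearest_centroid_fit(X, y):
    """Stand-in learner (not an SVM): predict the class whose mean
    genotype vector is closest to the sample."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
    return predict

def repeated_holdout(X, y, fit, n_rep=20, train_frac=0.7, seed=0):
    """Mean and standard deviation of test accuracy over `n_rep`
    random splits with `train_frac` of the samples used for training."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    accs = []
    for _ in range(n_rep):
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        accs.append(correct / len(test))
    return mean(accs), stdev(accs)
```

Any learner exposing the same `fit(X, y) -> predict` interface could be passed in place of `nearest_centroid_fit`, including an SVM wrapper with the selected cost and Gamma values.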
This study is the first to provide comprehensive whole-genome sequencing data from the Shandong indigenous chicken and the first to identify its genomic variants. This study also describes the genetic structure and molecular background of the characterized breed-specific phenotypes of these chickens. Whole-genome sequencing technology has greatly advanced scientific research on the molecular foundations of various complex phenotypic poultry traits, such as body size in chickens (Wang et al. 2016), body size and plumage color in ducks (Zhou et al. 2018), as well as maturation and plumage color in domestic quails (Wu et al. 2018), among others. In particular, machine learning methods are increasingly attracting research interest as they can be used to perform functional identification of new genes, classification of chicken resources (Dong et al. 2021) and genome-wide predictions (Alves et al. 2020). In this study, we implemented a genome-wide scan of selection signatures within the genomes of Shandong indigenous chickens using the Fst and di methods. From the results, we identified 40 differential outliers between 9 breeds according to the empirical distribution for Fst, and found 45 genes based on these outliers, including ANK2, SPIRE2, TSHR, RAC1, MGAT4C, EPB41L1, AGMO, ADGRL3, and VSTM2A. These results support our hypothesis that Shandong indigenous chickens carry distinctly different selection footprints on their genomes, owing to the strong natural or artificial selection experienced during long-term breeding, and different genomic features that reflect their specific phenotypic characteristics. These differential genomic regions and candidate genes deserve more attention in future genetic analyses of complex traits in Shandong indigenous chickens. We identified 12 735 differential outliers between these breeds using the di method, and based on these outliers, found 115 breed-specific SNPs. Using machine learning methods, we then classified the chickens as either LB chickens or other chicken breeds (2-class classification) based on the breed-specific SNP profiles of their genomic samples, and on this basis, identified 18 SNPs that enabled the correct classification of the chicken breeds in this study.
This comparative population genomic analysis was anchored on genome-wide SNPs of the RW chickens and Shandong indigenous chickens. Population structure analysis revealed an overall distinctive genomic architecture of the Shandong indigenous chickens (PCA and neighbor-joining phylogenetic tree). Notably, the clustering pattern is consistent between the PCA and ADMIXTURE analyses and closely mirrors the geographical distribution of the Shandong indigenous chickens. The LB, JB, LG, and LY populations fell into central clusters. This sub-structuring may reflect some degree of differential exchange of genetic material between neighboring locations, breeding histories, or natural and artificial selection drivers, as previously described in several chicken populations (Yi et al. 2014; Wang et al. 2017). This may explain the genomic grouping of populations with similar phenotypic appearances (SG and LB, JB and LY chickens). A crucial point to note is the admixture signals at K=4 in the ADMIXTURE analysis. The results of the PCA and ADMIXTURE analysis (K=4) suggest that the LB chickens have a close relationship with the JB, LG and WB cluster, consistent with their geographical proximity. However, it remains unexplained why LY chickens share a dominant ancestry component at K=4; one possibility is that the LY chicken had a wider geographical distribution in earlier years.
In this study, the phenotypes of biological characteristics included feather color, body size, age at first egg, and body weight. The results clearly revealed that these traits varied from breed to breed and showed distinct differences. The ANK2, SPIRE2, and TSHR genes are well known for their key roles in reproductive performance. Asymmetric oocyte division is essential for fertility (Leader et al. 2002), SPIRE2 plays a key role in asymmetric division of mouse oocytes, and the mRNA levels of SPIRE2 are significantly higher in oocytes than in other tissues (Pfender et al. 2011; Zhang et al. 2020). Recent studies have shown that prevention of ANK2 translation leads to abnormalities in oocyte cytokinesis (Tetkova et al. 2019). Also, TSHR has been identified as a putative domestication gene in chicken (Grottesi et al. 2020). A strong selective sweep on TSHR in domestic breeds, together with the significant effects of a mutation in this gene on several domestication-related traits, indicates that the gene has been important for chicken domestication. TSHR signaling between the pars tuberalis of the pituitary gland and ependymal cells in the hypothalamus regulates photoperiod control of reproduction in birds and mammals (Yoshimura et al. 2003; Hanon et al. 2008; Nakao et al. 2008). In addition, TSHR plays a key role in the signal transduction of seasonal reproduction (Grottesi et al. 2020). Variations in egg production traits, such as age at first egg (AFE), exist among Chinese indigenous chicken breeds and commercial egg-laying lines. For example, hens of the JB chicken (a Chinese indigenous breed) usually lay their first egg at ~110 days, much earlier than commercial laying lines (Kang et al. 2012). ANK2, SPIRE2 and TSHR were found to generate independent selection signals in JB chickens, suggesting that these genes may play a key role in regulating egg-laying performance.
Body height, body length, and body weight are 3 component phenotypes of body size, and many candidate genes were identified more than once. RAC1, MGAT4C, and EPB41L1 were identified as related genes, as mentioned in the Results section. Rac1, a member of the small Rho GTPase family, plays multiple cellular roles. Studies in mice conditionally lacking Rac1 have revealed crucial roles for Rac1 in various tissues, including cartilage and limb mesenchyme, where Rac1 loss produces dwarfism and long bone shortening (including body weight, bone length, and growth plate disorganization) (Suzuki et al. 2017; Ikehata et al. 2018). MGAT4C was identified as a candidate for growth traits (de Araujo Neto et al. 2020; D’Alessandro et al. 2020). Related studies revealed that the body weight of EPB41L1 knockout mice was reduced 3- to 4-fold compared to their wild-type littermates for both males and females at 3-5 weeks of age and reduced 2- to 3-fold at 6-9 weeks of age. Notably, while there were no significant differences in organ/body weight ratio for most of the organs, the testis/body and ovary/body ratios were dramatically decreased in EPB41L1 knockout mice (Wang H et al. 2020). LM chickens had low body and muscle growth rates, leading to a lower body weight compared to the other Shandong native populations and the commercial line. Identifying the genetic changes underlying these developmental differences would provide new insights into the biological mechanisms by which genetic variation shapes phenotypic diversity (Rubin et al. 2010). Our results provide evidence for major differences in body size between the mini chickens and other chickens.
Gamecock chickens are one of the earliest recorded birds in China and have accumulated some unique morphological and behavioral signatures, such as large body size, muscularity and aggressive behavior, making them excellent breeding material and a good model for studying bird muscular development and behavior (Luo et al. 2020). One breed of gamecock chicken, namely the LG chicken, might have played a central role in recent breeding and conservation of other Chinese gamecock chickens (Luo et al. 2020). In this study, the genes AGMO, ADGRL3, and VSTM2A were identified in the LG gamecock chicken breed. AGMO might be crucial for determining the behavioral pattern in gamecock chickens. ADGRL3 knockout mice showed increased locomotive activity across all tests and subtle gait abnormalities. These mice also show impairments across spatial memory and learning domains, alongside increased levels of impulsivity and sociability with decreased aggression (Mortimer et al. 2019).
Machine learning has been applied successfully in many areas of biology and medicine, often to produce effective predictors. Machine learning is poised as a transformational approach uniquely positioned to identify biological interactions for better prediction of complex traits. Various applications of machine learning in biology have been reported, including gene finding (Mathé et al. 2002), eukaryote promoter recognition (Won et al. 2004), protein structure prediction (Yi and Lander 1993), pattern recognition in microarrays (Pirooznia et al. 2008), gene regulatory response prediction (Middendorf et al. 2004), protein/gene identification in databases (Zhou et al. 2005), and gene expression microarray-based cancer diagnosis and prognosis (Hajiloo et al. 2013; Kourou et al. 2015). In this study, we considered a way to learn a predictor (to confirm whether the given samples belong to the Laiwu Black chicken breed) for a dataset that specifies all available SNPs for each chicken.
The widely recognized “curse of dimensionality” phenomenon is inherent in SNP data, i.e., the number of features is much larger than the number of samples (Xu et al. 2011). One way to circumvent the curse of dimensionality is to integrate various machine learning algorithms. The general approach is to apply a feature selection method to the SNP data before classifying samples using machine learning-based classifiers. For example, an SVM algorithm was used to select relevant SNPs associated with breast cancer (Mieth et al. 2016). An SVM algorithm, along with a Naive Bayes algorithm and decision trees, has also been used to identify breast cancer cases using SNPs selected via information gain (Listgarten et al. 2004). Furthermore, mean difference calculation and k-nearest neighbor (KNN) have been used to quantify SNP relevance and to perform classification tasks, respectively, on a breast cancer dataset (Hajiloo et al. 2013). In this study, we used an integrated approach consisting of a feature selection step, in which selective sweep analysis identified the SNPs highly associated with resource characteristics, and a classification step, in which an SVM model predicted chicken breeds. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to measure the capability of the predictor. The ROC curve is constructed by plotting the true positive rate vs. the false positive rate across classification thresholds, and the AUC represents the capability of the model to distinguish between classes. The results with 20 replicates showed that the prediction accuracy (AUC) was 0.92±0.02, which can effectively achieve the breed identification of LB chickens.
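As an illustration of the evaluation metric described above, the following minimal Python sketch computes an ROC curve and its AUC for a binary task such as "LB vs. other breeds", and summarizes AUC values across replicates as mean ± standard deviation. It is not the code used in this study: the function names (`roc_auc`, `roc_points`, `trapezoid_area`, `summarize`) are hypothetical, and all labels, scores, and replicate values are made-up toy numbers.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs obtained by lowering
    the decision threshold over every distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def summarize(values):
    """Mean and sample standard deviation across replicates."""
    return mean(values), stdev(values)


# Toy data (assumption): 1 = target breed, 0 = any other breed;
# scores stand in for classifier decision values.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc_by_rank = roc_auc(toy_labels, toy_scores)
auc_by_area = trapezoid_area(roc_points(toy_labels, toy_scores))
print(f"AUC (rank-based) = {auc_by_rank:.3f}, AUC (trapezoid) = {auc_by_area:.3f}")

# Made-up AUC values from three hypothetical replicates.
m, sd = summarize([0.80, 0.90, 1.00])
print(f"AUC across replicates = {m:.2f} ± {sd:.2f}")
```

The rank-based and trapezoidal calculations agree, which reflects the interpretation of the AUC as the probability that the classifier ranks a sample of the target breed above a sample of another breed.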
We have characterized the genome diversity, genetic differentiation, population structure, and migration events across the 160 chickens from 9 populations, and identified the selective signatures in Shandong indigenous chickens. Importantly, we found that AGMO might be crucial for determining the behavioral pattern of gamecock chickens, TSHR might be essential for the age at first egg-laying in Jining Bairi chickens, and EPB41L1 might be associated with body size in Luxi Mini chickens. Moreover, we developed machine learning-based predictive models that use SNP data to identify chicken breeds. To a large extent, we have shown how a machine learning approach can improve the efficiency of using complex biological data obtained from next-generation sequencing platforms. Together, these results can facilitate conservation of the canonical Shandong indigenous breeds, and the genetic basis of indigenous chickens revealed in this study is valuable for better understanding the mechanisms underlying the resource characteristics of chickens.
Acknowledgements
This research was funded by the China Agriculture Research System of MOF and MARA (CARS-41), the Agricultural Breed Project of Shandong Province, China (2019LZGC019 and 2020LZGC013), the Shandong Provincial Natural Science Foundation, China (ZR2020MC169), and the Agricultural Scientific and Technological Innovation Project of Shandong Academy of Agricultural Sciences, China (CXGC2022C04 and CXGC2022E11).
Declaration of competing interest
The authors declare that they have no conflict of interest.
Ethical approval
The animal experiments were approved by the Science Research Department of the Shandong Academy of Agricultural Sciences, China (approval number: SAAS-2019-029).
Appendices associated with this paper are available at https://doi.org/10.1016/j.jia.2022.11.007
Journal of Integrative Agriculture, 2023, Issue 7