• 
    

    
    

      99热精品在线国产_美女午夜性视频免费_国产精品国产高清国产av_av欧美777_自拍偷自拍亚洲精品老妇_亚洲熟女精品中文字幕_www日本黄色视频网_国产精品野战在线观看 ?

      Whole genome variants across 57 pig breeds enable comprehensive identification of genetic signatures that underlie breed features

      2021-06-22 04:33:54JingyaXuYuhuaFuYanHuLilinYinZhenshuangTangDongYinMengjinZhuMeiYuXinyunLiYangZhouShuhongZhaoandXiaoleiLiu

      Jingya Xu,Yuhua Fu,Yan Hu,Lilin Yin,Zhenshuang Tang,Dong Yin,Mengjin Zhu,Mei Yu,Xinyun Li,Yang Zhou,Shuhong Zhaoand Xiaolei Liu

      Abstract Background: A large number of pig breeds are distributed around the world, their features and characteristics vary among breeds,and they are valuable resources. Understanding the underlying genetic mechanisms that explain across-breed variation can help breeders develop improved pig breeds.Results: In this study,we performed GWAS using a standard mixed linear model with three types of genome variants(SNP,InDel,and CNV)that were identified from public,whole-genome,sequencing data sets.We used 469 pigs of 57 breeds,and we identified and analyzed approximately 19 million SNPs,1.8 million InDels,and 18,016 CNVs.We defined six biological phenotypes by the characteristics of breed features to identify the associated genome variants and candidate genes,which included coat color,ear shape,gradient zone,body weight,body length,and body height.A total of 37 candidate genes was identified,which included 27 that were reported previously(e.g.,PLAG1 for body weight),but the other 10 were newly detected candidate genes(e.g.,ADAMTS9 for coat color).Conclusion:Our study indicated that using GWAS across a modest number of breeds with high density genome variants provided efficient mapping of complex traits.

      Keywords:Breed feature,CNV,GWAS,InDel,Pig,SNP

      Background

      With more than one billion individuals alive at any time,the domestic pig (Sus scrofa domestica) is one of the most important livestock animals in the world. Domestication was accomplished well before 9,000 years ago [1,2], and a wide range of breeds were developed to meet various demands of human beings. Both domestication and artificial selection led to rapid phenotypic changes and resulted in distinct features and characteristics among breeds.

      With different domestication and breeding goals, a large number of pig breeds became distributed throughout the world, and some of their characteristic features varied from one breed to another. These characteristics fall broadly into two categories, one is production features, which include growth, reproduction, carcass and meat quality, and the other consists of biological characteristics,which include body shape, appearance, and coat color. The Pietrain pig breed, which is native to Belgium,was bred as a lean pig breed and has the outstanding characteristic of high carcass lean meat percentage; the Iberian pig breed, which is native to Spain, was bred to provide high quality ham meat, but its carcass lean meat percentage is relatively low. The commercial pig breeds that include Duroc, Landrace, and Yorkshire, which are native to Europe and North America, were bred to have a large body size and exhibit excellent meat production;the Gottingen Minipig breed, which is native to Germany, was bred for biomedical research and has small body size. Coat color was always recognized as a breeding criterion, and it has been selected to satisfy breeders and customers (e.g., the coat colors of European wild boar, Duroc, Pietrain, and Yorkshire are black, brownish red, spotted,and white, respectively).

      During a long period of domestication and selection,each breed formed its own production features, biological characteristics, and breed-specific genome variants features. Understanding the underlying genetic mechanism of the breed characteristic features could help breeders to develop the ideal pig breed. A genomewide association study (GWAS) was used to detect the associations between genomic variations and phenotypic records over the past 15 years [3]. To date, many genome variants, such as single nucleotide polymorphism (SNP), insertion-deletion (InDel), and copy number variation (CNV), were associated with the traits of characteristic features. The duplication of theKITgene led to the dominant white coat color [4, 5], QTLs on chromosomes 1, 5, 7, 9, and 12 were significantly associated with ear erectness and ear size [6]. InDels also showed their function by regulating gene translation or by changing the protein structure, and it is related to many traits, such as obesity [7]. Single nucleotide polymorphism (SNP) is the most commonly used type of genome variant in GWAS. However, most trait-associated SNPs reported in previous GWAS studies were found in the non-coding area and explained a small proportion of phenotypic variance [3, 8]. In contrast to SNP, insertions and deletions (InDels) are the genome variants that vary from 1 bp to 10,000 bp in length. A number of InDels are located in the coding exons and promoters of genes,and they can easily change the coding sequence of proteins that change phenotypes, especially those that cause frameshifts [9]. Copy number variation (CNV), which range from 1,000 bp to as much as a mega base, is defined as the varied copy numbers compared with the reference genome and significantly influences gene expression due to its regulation of the dosage [10, 11].CNVs usually cause significant phenotype variation due to their influence on the levels of gene expression, and the level of gene expression increases with an increase in gene copy number [12]. CNVs play an important role in some human diseases [13], growth traits of cattle [14],and fatness in pigs [15]. Therefore, combining these three types of genome variants can improve the efficiency in identifying candidate genes for target phenotypes.

      Using the publicly available, whole-genome sequencing datasets, a total of 469 pigs that composed of 57 breeds were used in this study. Approximately 19 million SNPs,1.8 million InDels, and 18 thousand CNVs were identified. We defined six biological phenotypes by breed characteristic features, which included coat color, ear shape, gradient zone, body weight, body length, and body height. In addition, we also tried to define several production traits, which included sexual maturity, dressing percentage, and lean meat percentage. A mixed linear model (MLM) was fitted to identify the associations between genome variants and phenotypes. By using the GWAS method with public sequencing data, our results revealed that a number of genome variants and candidate genes associated with breed characteristic features have been selected during domestication and breeding.

      Methods

      Data collection

      A total of 469 WGS-seq samples were downloaded from NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/sra/) and The European Nucleotide Archive(ENA, https://www.ebi.ac.uk/ena), 465 of them with clearSus scrofabreed information (Table S1).

      Identification of short variants from WGS data

      At first, the raw data was converted to fastq files using SRAToolkit (V2.8.2) [16], and the fastq files were trimmed by removing adapters and low-quality bases using Trimmomatic (v0.36) [17]. After conversion and trimming, the remaining high-quality reads were aligned against the Sscrofa11.1 reference sequence using Burrows-Wheeler Aligner 0.7.17 (BWA) [18]. The uniquely aligned reads with ≤5 mismatches were used for detection of short variants.

      To obtain highly confident short variants, we employed GATK (V4.0.3.0) [19] variant calling pipelines,according to the GATK best practice online documentation. The SNPs were filtered using ‘QUAL <30.0 || QD<2.0 || FS >60.0 || MQ <40.0 || SOR >4.0 || ReadPos-RankSum <-8.0’ options, and Indels were filtered using‘QUAL <30.0 || QD <2.0 || FS >200.0 || SOR >10.0 ||ReadPosRankSum <-20.0 || MQ <40.0 || MQRankSum<-12.5’ options. These parameters controlled not only the confidence of variation (QUAL), but also the quality of the variation in terms of read depth (QD), strand bias(FS, SOR), read bias (ReadPosRankSum), and mapping quality (MQ, MQRankSum). Finally, both SNPs and Indel VCF files were then filtered further using the vcftools with the use of the “--min-alleles 2 --max-alleles 2 --min-meanDP 5 --max-missing 0.75” options, which meant that only bi-allelic variations with depth ≥5 and a missing rate ≤25% were used for subsequent analysis.The InDels were coded as 0, 1, 2, in which 0 represented for “no insertion/deletion”, 1 represented for “insertion/deletion on one side”, 2 represented for “insertion/deletion on both sides”.

      Before calling CNVs, individuals with a call rate <95%were filtered, and a total of 432 individuals were retained for CNV calling. The CNVcaller was applied to detect the CNVRs (copy number variation regions) for all individuals with WGS data over 5X coverage according to its pipelines (https://github.com/JiangYuLab/CNVcaller)[20]. Briefly, the reference genome was divided into sliding windows that were 800 bp in length and with a step size of 400 bp. The read depth signal was calculated for each window and a GC correction was performed similar to the method used in the CNVnator [21]. To avoid the effects of individuals with different sequencing depths, the read depths were divided by the global median read depth. The CNVRs were integrated by scanning the population with aberrantly read depth, an allele frequency >0.1, and at least three homozygous gain/loss individuals for candidate CNV windows. Finally, the CNVRs were genotyped using Gaussian mixture models for individuals with a variation percentage <30% of the entire genome, which included 321 individuals in total.Finally, a total of 18,016 CNVs were left to be used in further analysis. According to the dosage effect theory,CNVs were coded as their copy numbers. It should be noted that if the copy number was >6, then they were categorized into 6. The first two principal components that derived from CNV data were plotted in Fig. S1 and indicated that the samples from the same groups clustered together which meant that they represented comparatively high reliability of the CNV calling.

      Phenotype definition

      Nine phenotypes were defined by the breed characteristic features in our study, which included coat color, ear shape, body height, body length, body weight, gradient zone, sexual maturity, lean meat percentage, and dressing percentage. The characteristic features of breeds were mainly from Animal Genetic Resource in China(Pigs), Wikipedia, published studies of breed genetic resources, and official websites for pig breeds. For each phenotype, breeds with no records of characteristic features were marked as missing. The nine phenotypes were defined as below (Table S2 and Supplementary Data 1):

      Coat colorwas defined into five categories, which included white, spotted, multi-colored (black and white),reddish brown, and dark brown/black and coded as 1, 2,3, 4, or 5, respectively. Breeds with various coat colors were defined as missing and coded as NA (Not Available).

      Ear shapewas defined into two categories, erect and non-erect, which were coded as 1 and 0, respectively.

      Body heightwas measured from the highest point of withers to the ground, and the unit was centimeter (cm).The body height of females and males for each breed was averaged.

      Body lengthwas measured from the middle point of ears to the tail, and the unit was centimeter (cm). The body length of females and males for each breed was averaged.

      Body weightwas defined into three categories. The breeds with mature body weight <100 kg were coded as 1, 100-200 kg were coded as 2, and >200 kg were coded as 3. The body weight of females and males for each breed was averaged.

      Gradient zonewas defined as a grey belt. In many two-end-black or belted breeds, there is a grey belt composed of black skin and white hair at the junction of a white body part and a black body part. We defined this belt as the gradient zone and coded it as 1 and 0, which represented that the gradient zone existed or did not exist, respectively.

      Sexual maturitywas defined as three categories,which included premature, medium, and late, and coded as 1, 2, or 3, respectively. Breeds with sexual maturity of 1-3 months were defined as premature, 4-6 months were defined as medium, and >6 months were defined as late maturing.

      Lean meat percentagereferred to the percentage of lean meat of a pig.

      Dressing percentagewas defined as the percentage of total weight after dressing at slaughter.

      Data quality control and imputation of SNPs and InDels

      Before doing GWAS, SNPs and InDels were filtered using PLINK1.90 [22]. SNPs and InDels with call rates<90% and minor allele frequencies (MAF) of <0.05 were excluded. After filtering, SNPs and InDels were imputed using Beagle5.0 [23] and then SNPs and InDels were filtered with MAF <0.05 again using PLINK1.90[22].

      Finally, a total of 469 individuals with 19,683,875 SNPs and 1,856,359 InDels were retained for subsequent data quality control based on the available individuals of each phenotype. As each phenotype had different breeds with missing values, the SNPs and InDels were then filtered with MAF <0.05 according to the available individuals of each phenotype using PLINK 1.90 (Table S4). It should be noted that two individuals were found that may have had incorrect breed information according to the PCA results and, therefore, the phenotypes of these two individuals were recorded as missing in this study.

      Genome-wide association study

      The association tests were performed by fitting a Mixed Linear Model using the following equation in the rMVP package:

      whereyrepresents the phenotype val ue, β represents the fixed effects, which includes the first three columns of principal components,arepresents one dosage genome variant vector to be tested, andurepresents the vector of random effects in the model that follows the distributionu~N(0,Gσ2α), in whichGwas a genomic relationship matrix derived from all the genome-wide SNPs only [24].X,V, andZrepresent the incidence matrices for β,a, andu, respectively, andeis the vector of residual errors. The Bonferroni multiple test was used to correct thePvalues, and the significant threshold was defined as 0.05/N, whereNrepresents the number of genome variants for each independent GWAS.

      Identification of candidate genes and enrichment analysis

      In this study, significant genome variants were annotated to the nearby genes within a distance of 1 Mb either downstream or upstream, and the gene annotation information was obtained from theSus scrofagenome version 11.1 (www.ensembl.org/biomart/). KEGG/GO enrichment analyses were performed by TB tools toolkit [25].CalculatedP-values of all GO/KEGG pathways were corrected by Bonferroni correction method, considering our data was collected from public resources, we employed a loose threshold thatP-values ≤0.05 to detect more significantly enriched GO terms/KEGG pathways for candidate genes.

      Fig.1 The geographical distribution,PCA plot,and phylogenetic tree of 57 pig breeds. a The geographical distribution of each major category of pig breeds.b The first two principal components derived from whole-genome SNP data. c The phylogenetic tree derived from the wholegenome SNP data. The pigs were clustered into five categories, which included Asian domestic(purple),Asian wild (orange), European domestic(green),European wild(black),and the breeds that do not belong to the above categories(blue)

      Results

      Summary of pig breeds,genome variants, and phenotypes

      We used 469 high-quality, whole genome sequencing datasets for an association study, which included 57 pig breeds and 9 phenotypes. The 57 pig breeds were distributed throughout the world, but they were mainly located in Asia and Europe (Fig. 1a). Both the first two principal components (PCs) and the phylogenetic tree derived from whole-genome SNP data indicated that the Asian domestic breeds (AD), Asian wild breeds (AW), European domestic breeds (ED),and European wild breeds (EW) were clustered separately [26] (Fig. 1b, c, Table 1). The phenotypes were defined by the breed features and characteristics, and the information came from Wikipedia, published genetic articles, the pigs recorded of China, which is an official record, etc. All details are described in the Methods section and Table S2.

      A total of 17.93 Tb of high-quality sequencing data was aligned against the reference pig genome(Sscrofa11.1) using BWA (v0.7.17) [18]. To obtain high confidence genome variants, potential duplications were filtered for each individual; the average coverage for all individuals was approximately 93.58%, and the average depth was approximately 12.68×. The marker density plots showed that SNPs, InDels, and CNVs covered the genome well. The variations in chromosomes 1-18 and X were retained for analyses in this research, which included 469 individuals with 19,683,875 SNPs, 1,856,359 InDels, and 18,016 CNVs (Table S3). Rigorous quality control of data was conducted on the genome variants data for each phenotype (see Methods section for detail). The retained sample size and number of genome variants for further statistical analysis of each phenotype are shown in Table S4. The phenotypic distribution plots of six biological phenotypes, together with the schematic diagram of phenotype definition, are shown in Fig. 2.The rest of three production phenotypes are shown in Fig. S2.

      Table 1 Categories of pig breeds based on geographic location and domestication

      Summary of GWAS results

      In this study, association tests were conducted using an MLM in the rMVP package (https://github.com/xiaoleilab/rMVP). The MLM incorporated the first three columns of PCs as fixed-effect terms and the genomic relationship matrix (GRM) that represents the relationship among individuals as a random-effect term. For all three types of genome variants, both PCs and GRM were derived from the whole-genome SNP data.P-values were corrected by the Bonferroni multiple test method, which was defined by 0.05/N, whereNrepresents the number of genome variants for each independent GWAS.

      The correctedP-values of three genome variants types were plotted simultaneously in a single Manhattan plot for each phenotype (Fig. 3). Manhattan plots for all the phenotypes are shown in Fig.S3-S11,and all the lambda values of GWASs are shown in Table S5. The estimated variance components are shown in Table S6 [27].A total of 724 SNPs, 111 InDels, and 12 CNVs were identified to be significantly associated with six biological phenotypes (Supplementary Data 2). The gene enrichment analysis was carried out to further identify the candidate genes that were located in the candidate regions, which were defined as the genomic region that was located within 1 Mb upstream or downstream of significant genome variants (Table 2,Supplementary Data 3-11).

      To track the excellent genetic resources of candidate regions, individuals of modern commercial breeds of Yorkshire, Landrace, Duroc, Pietrain, and Hampshire were separated from European domestic breeds and recognized as the fifth category, which we named commercial lines.

      Coat color

      In this study, coat color was divided into five categories by the shades of black, which included white, spotted,multi-colored (black and white), reddish brown, and dark brown/black (for details, see Methods section). A total of 53 SNPs, 10 InDels, and four CNVs were associated with coat color after Bonferroni multiple corrections (Fig.3, Table 2, Supplementary Data 2, Fig. S3).

      Fig.2 Marker density plots of three types of genome variants,phenotypic distribution plots,and schematic diagrams for six phenotypes of pig biological characteristics. The top row displays the marker density plots of SNP,InDel,and CNV,and the window size for counting genome variants was set to 1 Mb.Legends on the right side display the number of markers for each window with different colors.The histograms show the phenotypic distribution where the X axis represents the phenotype values and the Y axis represents individual counts,and schematic diagrams that depict the six phenotypes

      All three types of genome variants near theKITgene on chromosome 8 were significantly associated with coat color, in which a significant CNV ranges from 41,310,801 bp to 41,774,400 bp on chromosome 8 coversKITgene. Three breeds of European domestic pigs, which included Leicoma, Middle White, and Calabrese, shared the same SNP/InDel genotype of the most significant signal with Yorkshire and Landrace in the commercial line. Interestingly, only Leicoma and Middle White breeds shared the same CNV genotype with Yorkshire and Landrace, and the coat color of Calabrese was black.There are normally two copies of this CNV in Calabrese,but the number of copies in Leicoma, Middle White,Yorkshire, and Landrace breeds were duplicated (>3 copies) (Fig. S12). This suggests that the CNV may be causal for coat color instead of SNP and InDel, and this has also been proved in previous studies [4].

      Ear shape

      Ear shape was classified as erect or non-erect. Two candidate regions on chromosome 5 and 7, which included 35 SNPs, five InDels, and one CNV, were significantly associated with ear shape(Fig. 3,Table 2,Supplementary Data 2,Fig.S4).

      For the significant CNV that was located in the candidate region on chromosome 5, all Asian wild breeds, European wild breeds, and commercial lines of Yorkshire, Duroc, Pietrain, and Hampshire were normal (two copies), but most Asian domestic breeds and European domestic breeds were duplicated (>3 copies). In particular, all individuals of Bamei and Min breeds had more than five copies,and their ear sizes were large, which indicated that more copies may result in larger, non-erect ears(Fig. S13).

      Fig.3 Manhattan plots and Quantile-Quantile(QQ)plots for six phenotypes of pig biological characteristics. There were three types of genome variants plotted in Manhattan plots,in which blue triangles represent CNVs,green diamonds represent InDels,and purple circles represent SNPs.Only the genome variants with genome-wide significant P-values were plotted in Manhattan plots,and the P-values were corrected by the Bonferroni cutoff,which was defined by 0.05/N,where N represents the number of genome variants for each independent GWAS.For QQ plots,the X-axis represents the expected-log10(P),and the y-axis represents the observed-log10(P)

      Table 2 Summary of the identified candidate regions and important genes of six biological phenotypes. For each phenotype,the candidate genes shown in bold were newly identified in this study while the un-bolded were reported previously

      Body height/body length/body weight

      Body size is composed of body height, body length, and body weight. Body height is defined as the mean value of female records and male records for each breed, and the same is true for body length. Mature body weights were divided into three categories, which included small (<100 kg), medium (100-200 kg), and large (>200 kg). A total of 11 SNPs and one InDel were associated with body height (Fig. 3, Supplementary Data 2, Fig. S5), 12 SNPs, seven InDels, and four CNVs were associated with body length (Fig. 3, Supplementary Data 2, Fig. S6), and a total of 30 candidate regions, which contained 452 significant SNPs, 53 InDels, and one CNV were associated with body weight (Fig.3, Supplementary Data 2, Fig. S7).

      The haplotypes of body size-related candidate regions in commercial lines were mainly the same, and the genotypes came from both European breeds and Asian breeds (Fig. S14-S16). The candidate region that was associated with both body height and body length on chromosome 3, the body length-related candidate region on chromosome X, and the body weight-related candidate region on chromosome 4 originated from European wild breeds. Also, the candidate regions for body weight on chromosomes 1 and X of commercial lines shared the same haplotype with Asian wild breeds. The haplotype plots for candidate regions of body size suggested that domestication in Europe and Asia was relatively independent. However, the large body size of commercial lines was formed by the genetic resources from both European breeds and Asian breeds.

      Gradient zone

      Gradient zone is the grey “belt” between the black and white blocks in pigs. This phenotype was divided into two categories designated one or zero, which indicated that the gradient zone existed or did not exist, respectively. Several candidate regions that contained 161 SNPs,35 InDels, and two CNVs were detected (Fig. 3, Supplementary Data 2, Fig. S8,Fig.S17).

      Results for the production traits

      In addition to the six biological traits, we also tried to define three production traits that included sexual maturity, lean meat percentage, and dressing percentage.Details of the definition can be found in the Methods section. We found a total of seven previously reported candidate genes and no new candidate genes were identified in this study (Table S7, Supplementary Data 2, Fig.S9-S11).

      Functional enrichment

      The Kyoto Encyclopedia of Genes and Genomes(KEGG)pathway and Gene Ontology (GO) enrichment analyses were performed on genes located in the candidate regions. The bubble plots of the top 10 KEGG pathways and GO terms enriched for each phenotype were shown in Fig. S18-S26.

      Coat color

      A total of 68 genes located in eight candidate regions were identified as candidate genes for coat color (Supplementary Data 3). In the enrichment analysis, all candidate genes were involved in 150 KEGG pathways, and 23 of them were significant, in which some were directly related to coat color [e.g., EGFR tyrosine kinase inhibitor resistance (P=8.06×10-3) and Melanoma (P=3.34×10-2)]. In total, 416 significant GO terms were identified and some were pigment-related and tryosine-related,such as peptidyl-tyrosine phosphorylation (P= 4.28×10-5), regulation of developmental pigmentation (P=4.41×10-4), pigment cell differentiation (P=1.38×10-3), and melanocyte migration (P=5.28×10-3)(Supplementary Data 3, Fig. S18). Within these related terms, the KIT proto-oncogene receptor tyrosine kinase(KIT), platelet derived growth factor receptor alpha(PDGFRA), kinase insert domain receptor (KDR), and platelet derived growth factor C (PDGFC)genes were reported previously as candidate genes for coat color in pigs. In addition, the Rap guanine nucleotide exchange factor 2 (RAPGEF2), ADAM metallopeptidase with thrombospondin type 1 motif 9 (ADAMTS9), Ataxin 7(ATXN7), chondroitin sulfate proteoglycan 4 (CSPG4),and pseudopodium enriched atypical kinase 1 (PEAK1)were new candidate genes [28](Fig.4, Table 2).

      Ear shape

      A total of 65 genes was detected in the candidate regions for ear shape (Supplementary Data 4). Enrichment analysis showed that some GO terms were related to ear shape, such as post-embryonic animal organ morphogenesis (P=2.92×10-2) and post-embryonic animal morphogenesis (P=4.35×10-2) (Supplementary Data 4,Fig. S19). However, genes within these terms, which included TEA domain transcription factor 3 (TEAD3),tubby like protein 1 (TULP1), glutamate receptor interacting protein 1 (GRIP1), and BCL2 antagonist/killer 1(BAK1) have not been associated with ear shape before,althoughGRIP1andTULP1were associated with ear size in pigs. In addition, several previously reported earrelated genes were observed in these candidate regions,such as methionine sulfoxide reductase B3 (MSRB3),high mobility group AT-hook 2 (HMGA2), WNT inhibitory factor 1 (WIF1), LEM domain containing 3(LEMD3),and peroxisome proliferator activated receptor delta (PPARD) (Table 2).

      Body height

      In total, 99 candidate genes in candidate regions were involved in the enrichment analysis. Twelve KEGG pathways were significant from 199 KEGG pathways in total(Supplementary Data 5). There were 475 significant GO terms, of which some were skeletal-related, such as positive regulation of skeletal muscle tissue regeneration(P=2.21×10-2), skeletal muscle cell differentiation (P=4.42×10-2), and skeletal muscle satellite cell proliferation (P=4.38×10-2) (Supplementary Data 5, Fig. S20).A KEGG pathway named Hippo signaling pathway -multiple species may be related to body height, in which the TEA domain transcription factor 3 (TEAD3) and tubby like protein 1 (TULP1) were enriched. The peroxisome proliferator activated receptor delta (PPARD) gene was involved in all three GO terms and was recognized as a candidate gene. In addition, the high mobility group AT-hook 1 (HMGA1), signal peptide, CUB domain and EGF like domain containing 3 (SCUBE3), glutamate metabotropic receptor 4 (GRM4), nudix hydrolase 3(NUDT3), ribosomal protein S10 (RPS10), protein kinase C and casein kinase substrate in neurons 1 (PACSIN1),and SAM pointed domain-containing Ets transcription factor (SPDEF) genes were also identified as candidate genes, as reported previously (Table 2).

      Fig.4 Coat color-related significant GO terms and KEGG pathways with candidate genes.a Significant GO terms for coat color.One overlapping gene is described by a line between each pair of two terms, where the more overlapping genes there are,the wider the lines are. The P-value of each term was represented by the color of each circle,and the legend showed the levels of P-values.b Heatmap of candidate genes in the coat color-related significant GO terms and KEGG pathways.The candidate genes that were identified in the coat color-related GO terms and KEGG pathways are shown at the bottom of the heatmap.If the gene was included in any GO term or pathway,it was plotted in yellow; otherwise, it was plotted in blue

      Body length

      A total of 223 genes in candidate regions were used to perform the enrichment analysis(Supplementary Data 6).The results revealed that 17 of 245 KEGG pathways were significant,and 458 GO terms were also significant,in which some were skeletal-related [e.g., positive regulation of skeletal muscle tissue regeneration (P=4.02×10-2)] (Supplementary Data 6, Fig. S21). There were 61 overlapping genes between body length results and body height results. In these candidate regions, protein kinase C epsilon (PRKCE), high mobility group AT-hook 1(HMGA1), and signal peptide, CUB domain and EGF like domain containing 3 (SCUBE3), glutamate metabotropic receptor 4 (GRM4), nudix hydrolase 3 (NUDT3),ribosomal protein S10 (RPS10), protein kinase C and casein kinase substrate in neurons 1 (PACSIN1), and SAM pointed domain-containing Ets transcription factor(SPDEF) genes were identified that may affect body length (Table 2).

      Body weight

      We detected 366 genes in the candidate regions for body weight. Enrichment analysis showed that 31 of 269 KEGG pathways were significant, and 446 GO terms were also identified as significant(Supplementary Data 7,Fig. S22). Among the significant pathways, some were fat-related and may affect body weight, such as the PPAR signaling pathway (P=1.61×10-8), Fat digestion and absorption (P=3.38×10-5), regulation of lipid storage (P=1.54×10-2), and positive regulation of lipid storage (P=3.84×10-2) (Supplementary Data 7). According to the enrichment analysis results, the nuclear receptor subfamily 6 group A member 1 (NR6A1), alkB homolog 7 (ALKBH7), PLAG1 zinc finger (PLAG1), ligand dependent nuclear receptor corepressor like(LCORL), phospholipase C beta 4 (PLCB4), acyl-CoA synthetase long chain family member 4 (ACSL4), and par-3 family cell polarity regulator beta (PARD3B) were candidate genes for body weight (Table 2).

      Gradient zone

      A total of 1,036 genes were detected in candidate regions for the gradient zone from enrichment analysis.Seventy one of 341 KEGG pathways were significant,and 686 GO terms were significant (Supplementary Data 8,Fig.S23).Some significant pigment-related terms may relate to the gradient zone, such as melanoma (P=1.21×10-4), melanogenesis (P=1.87×10-3), melanocyte differentiation (P=1.03×10-2), and pigment cell differentiation (P=2.19×10-2) (Supplementary Data 8).Among the genes enriched in these terms, the KIT proto-oncogene receptor tyrosine kinase (KIT), platelet derived growth factor receptor alpha (PDGFRA), kinase insert domain receptor (KDR), endothelin receptor type B (EDNRB), dopachrome tautomerase (DCT),microphthalmia-associated transcription factor-like(MITF), and kit ligand precursor (KITLG) were candidate genes, and all of them were associated with coat color or pigmentation.

      Enrichment results for production traits

      Enrichment analyses were performed on the results of production traits(Supplementary Data 9-11). Seven previously reported candidate genes were found, such as protein kinase D1 (PRKD1)for sexual maturity [29].

      Discussion

      In our research, we used the GWAS approach with a wide range of pig breeds to reveal the candidate genes that underlie breed characteristic features. A total of 469 pigs from 57 pig breeds with abundant genetic diversity and phenotypic diversity, which are distributed mainly in Europe and Asia, were used in this study. Nine phenotypes were defined for each breed based either on biological characteristics or production performance obtained by the information from published articles, Wikipedia, the pig breeds recorded for China, etc. (Table S2).To our knowledge, this is the first genotype-phenotype association study that used high quality whole genome variants, which included SNPs, InDels, and CNVs, in a wide range of pig breeds.

      There are mainly two limitations in this study. First,with the breeding process, the phenotypic values of some breeds may be different from the latest records,which could create bias, especially for the production traits. Second, in the association test results of some traits, e.g. coat color and ear shape, more than two types of genome variants hit the same genome region and high Pearson correlation were detected among them. The regions that included more than two types of genome variants may lead to false positives because the SNPs, InDels, or the duplicated sequences may affect mapping accuracy; however, this is an inevitable general problem that lacks an appropriate solution.Furthermore, while using breed means as phenotype makes no variation in a single breed, however, this can make it possible for detecting important genes that affect phenotypes among different breeds. In addition,there are also some other methods that can be used to identify the genetic differences among multiple breeds,such as the fixation index (Fst) [30] and EigenGWAS[31]. Besides, the genetic mechanisms of some phenotypes, e.g., coat color and ear shape, might be affected by non-additive genetic systems, such as recessive alleles and/or genetic interaction. Therefore, fitting both additive and non-additive genetic effects in the model might be more suitable for those traits.

      Involving many breeds in a single GWAS research could help to identify the genome variants that were domesticated or selected. However, data on multiple breeds also presented a population stratification problem. To balance the number of false positives and false negatives, and especially for controlling the false positives, we incorporated the first three PCs as fixed effects and GRM as a random effect term in a mixed linear model to correct for the population stratification, following previous studies [32-34]. Although the confounding problem between the PCs and for GRM and the tested marker decreased the power to detect candidate genes that were associated with population structure, strong signals were still identified (e.g., the KIT gene for coat color). It should be noted that because the number of CNVs and InDels were far less than SNPs, the GRM was constructed by SNPs for the association tests of all three types of genome variants. Furthermore, highdensity genome variants covered the entire genome very well and reduced the dependency of linkage disequilibrium (LD) between genome variants and causal mutations to some extent, which increased the probability of success for GWAS.

      The candidate regions were defined as the genomic regions that were located within 1 Mb upstream or downstream of the significant genome variants.KEGG and GO analyses were conducted to identify candidate genes. Genes that were located in the candidate regions and fell into the phenotype-related KEGG pathways, GO terms, or previously reported to be associated with the objective phenotype were considered to be candidate genes.

      Candidate genes of biological characteristics

      In this study, the phenotypes of biological characteristics included coat color, ear shape, gradient zone, body length,body height,and body weight.It was obvious that these traits varied from breed to breed and showed distinct differences.KIT,PDGFRA,andKDRgenes are well known for their key roles in the underlying mechanisms of coat color in pigs. Among the candidate genes,KITon chromosome 8 was the key gene that led to the dominant white color in pigs because of the combined effect of a gene duplication and a splice mutation [4, 35].KITalso played a major role in coat color of other species,such as horses and cattle [36-38]. In our study, theKITgene was found in significant InDels and CNVs, which was consistent with previous studies. TheKITgene was reported to affect pigs’ dominant white with approximately 450-kb duplication encompassing the whole gene[5], our results hit the same genome region and identified a 463-kb duplication that contains the entireKITgene. ThePDGFRAgene on chromosome 8 was also associated with the dominant white color in pigs [39]. TheKDRgene is close toKITandPDGFRA, and it was a putative candidate gene for influencing coat color in cattle[40].KIT,PDGFRA,andKDRgenes comprise a classical cluster of tyrosine kinase receptor genes that are related to cattle’s reddish coat color [41].PDGFCandRAPGEF2genes were also involved in the related pathways withKIT,PDGFRA,orKDR. In these genes,PDGFCwas related with the two-end, black coat color in Bama Xiang pigs [42]. Related GO terms includedCSPG4,PEAK1,ADAMTS9, andATXN7(Supplementary Data 3).Among these genes,CSPG4andPEAK1were located in the candidate region near the significant CNV, and theRAPGEF2,PDGFC,ADAMTS9, andATXN7genes were found in the candidate regions near significant InDels.

      Twenty-seven genes were involved in gradient zonerelated KEGG pathways, such as melanoma and melanogenesis, which included many genes (Supplementary Data 8). Among these genes,MITFandEDNRBwere responsible for the two-end black trait in Chinese pig breeds [43, 44]. These two genes were identified in the InDel results, and an InDel marker withinEDNRBwas detected. Insertion of long interspersed element-1 (L1) in intron 3 ofMITFled to a lack of melanocytes [45]. Insertion of a retroposonlike element in intron 1 ofEDNRBcaused the piebald phenotype in mice [46]. In the related GO terms,such as melanocyte differentiation and pigment cell differentiation,KIT,KITLG,MITF,EDNRB, andDCTwere detected, in whichKITLGwas associated with the six white points in Berkshire pigs [47], andDCTaffected coat color in mice, which coded for diluted coat color [48]. In addition,DCTwas associated with human pigmentation in Asian populations [49].

      According to the enrichment analysis results for ear shape, several pathways and GO terms were related to ear development, which included theTEAD3,TULP1,GRIP1andBAK1genes within them (Supplementary Data 4). In addition, theHMGA2,PPARD,MSRB3,WIF1, andLEMD3genes were found in candidate regions and affected ear size and ear morphology in pigs;GRIP1was also associated with ear size in pigs [50-53].It is known that Wnt signaling regulated embryonic cartilage development and chondrocyte maturation [54],and thePPARDgene, which plays a key role in the Wnt/β-catenin pathway, affected ear size in pigs [55]. In a recent study, thePPARDgene inhibited the development of cartilage in external ears [56]. Interestingly, previous research on the same phenotypes in dogs also suggested thatHMGA2,WIF1, andMSRB3were candidate genes[57, 58]. CNV in theMSRB3gene played an important role in enlarging ear size in pigs, and it has an overlap region with the significant CNV in this study [59]. In a recent canine GWAS study,MSRB3was also associated with drop ears, but no CNVs were reported [34].

      Body height, body length, and body weight are three component phenotypes of body size, and many candidate genes were identified more than one time. In the candidate regions, 99 genes for body height and 223 genes for body length were found, in which 61 genes overlapped.TULP1andTEAD3were enriched in related pathway as said in result part. In a related GO term,positive regulation of skeletal muscle tissue regeneration,PPARDwas identified, which was associated with limb bone length and considered as a candidate gene for body height in pigs [60] and humans [61]. Among the 61 overlapping genes,GRM4,NUDT3,RPS10,PACSIN1,HMGA1,SPDEF, andSCUBE3were associated with body height and body length in pigs [60, 62].PRKCE,which was identified in both SNP and InDel, was associated with conformation traits in Danish breeds of pigs[63]. (Supplementary Data 5, Supplementary Data 6).

      For body weight, there were 366 genes detected in the candidate regions, and many of them were enriched in lipid-related pathways that may affect body weight, such as PPAR signaling pathway, Adipocytokine signaling pathway, Fat digestion and absorption, and Regulation of lipolysis in the adipocyte. The genes (e.g.,ACSBG2,ACSL4,C1QL3,TRMT2B,PLIN3,PLIN5,CD36,ACBD7,MAPKAP1,INSR,CYP7A1, andGNAI1) were included in these pathways (Supplementary Data 7). In these genes,ACSL4was associated with fatty acid metabolism,backfat thickness, and fatness [64-66]. Interestingly,ACSL4was also considered as a candidate gene for body size in dogs [67]. In the two related GO terms, regulation of lipid storage and positive regulation of lipid storage,ALKBH7,C3, andPLIN5were detected. A previous study reported that the deletion ofALKBH7caused obesity and led to an increase in body weight inMus musculus[68]. In pigs,ALKBH7affected lipid content because of the regulation of lysine [69], therefore, it was a candidate gene for body weight. Besides these genes that were involved in related GO terms and KEGG pathways,PLAG1,PLCB4,PARD3B,LCORL, andNR6A1were also considered candidate genes.PLAG1was strongly selected and associated with the growth and fatness-related traits of pigs [70, 71].LCORLwas a candidate gene for body size in many other species, which included horses and cattle [72, 73].PLCB4affected porcine growth traits [74, 75]. A comparative genomic study of pigs and humans reported thatPARD3Bmay be a candidate gene for obesity in humans [76].

      Our results indicated that combining the three types of genome variants could help to improve the efficiency of identifying candidate genes for phenotypes of interests. In some cases, candidate genes were identified by all three types of genome variants, such as theMSRB3gene in the results of ear shape, meanwhile, there were also some candidate genes were identified by the results of one or two genome variant types, e.g., theCSPG4gene, which associated with coat color, was identified by the CNV marker only; thePARD3B,which is a candidate gene for body weight, was identified by the InDel marker only.

      The origin of significant genome variants alleles

      To trace the origin of significant genome variants alleles,the important candidate regions, which were simultaneously detected in two or three genome variants types,were selected. For each candidate region, 20 genome variants on either side of the most significant genome variants that was marked with a red star were plotted,and different genome variants alleles were colored differently (Fig. S12-S17).

      Individuals were divided into Asian domestic, Asian wild, European domestic (except the commercial lines),European wild, and Commercial Lines, and the latter was composed of Yorkshire, Landrace, Duroc, Pietrain,and Hampshire. Some candidate genes revealed distinct differences between European pig breeds and Asian pig breeds. For body weight,PLAG1on chromosome 4 was a strong selected gene in European domestic pigs [70,71]; our results were consistent with this conclusion and suggested that the excellent genetic resources of commercial pig breeds due to thePLAG1gene may have come from European wild boars (Fig. S16). In addition,Jeju Black pigs and Songliao Black pigs shared the same genotype ofPLAG1with European domestic breeds.These pigs were also clustered closely to European domestic breeds, which suggested they may have originated from Europe or been crossed with European breeds.

      Conclusions

      In this study, we examined changes in diverse biological characteristics and production performances associated with genome variants across 57 breeds in pig. A total of 37 candidate genes was identified using the GWAS approach, which included 27 that were reported previously.The other 10 were newly detected candidate genes,which constitutes a promising resource for biologically and economically important genes for pig breeding.

      Supplementary Information

      Supplementary informationaccompanies this paper at https://doi.org/10.1186/s40104-020-00520-8.

      Additional file 1: Supplementary Data 1.Breed phenotypes.

      Additional file 2: Supplementary Data 2.Significant GWAS results for each phenotype.

      Additional file 3: Supplementary Data 3.Candidate regions and genes in these regions with KEGG and GO results for coat color.

      Additional file 4: Supplementary Data 4.Candidate regions and genes in these regions with KEGG and GO results for ear shape.

      Additional file 5: Supplementary Data 5.Candidate regions and genes in these regions with KEGG and GO results for body height.

      Additional file 6: Supplementary Data 6.Candidate regions and genes in these regions with KEGG and GO results for body length.

      Additional file 7: Supplementary Data 7.Candidate regions and genes in these regions with KEGG and GO results for body weight.

      Additional file 8: Supplementary Data 8.Candidate regions and genes in these regions with KEGG and GO results for gradient zone.

      Additional file 9: Supplementary Data 9.Candidate regions and genes in these regions with KEGG and GO results for sexual maturity.

      Additional file 10: Supplementary Data 10. Candidate regions and genes in these regions with KEGG and GO results for lean meat percentage.

      Additional file 11: Supplementary Data 11. Candidate regions and genes in these regions with KEGG and GO results for dressing percentage.

      Abbreviations

      SNP: Single nucleotide polymorphism;InDel: Insertion/Deletion; CNV:Copy number variation; GWAS: Genome-wide association study; MAF:Minor allele frequency

      Acknowledgements

      We thank all the researchers worldwide that have whole-genome sequenced pigs and made their data publicly available.

      Authors’ contributions

      XLL and SHZ designed the study. JYX, YHF,YZ, and YH performed the data collection and called the genome variants. JYX, LLY,ZST,and DY analyzed the data under the assistance and guidance of XLL, XYL, YZ,MJZ, MY, JYX,XLL, and YHF wrote the manuscript.All authors read and approved the final manuscript.

      Funding

      This work was supported by the Key Project of National Natural Science Foundation of China (31790414), the National Natural Science Foundation of China (31702087, 31902156, 31730089), Fundamental Research Funds for the Central Universities (2662020DKPY007, 2662017QD016),and the National Swine Industry Technology System (CARS-35).

      Availability of data and materials

      The datasets analysed during the current study are available in the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/)under project PRJEB1683, PRJEB9115, PRJEB9326, PRJEB9922, PRJNA144099, PRJNA176189,PRJNA186497, PRJNA213179, PRJNA221763, PRJNA231897, PRJNA238851,PRJNA239399, PRJNA240950, PRJNA254936, PRJNA255085, PRJNA260763,PRJNA273907, PRJNA281548, PRJNA309108, PRJNA314580, PRJNA320525,PRJNA322309, PRJNA354435, PRJNA393920, PRJNA398176, PRJNA41185,and PRJNA418771.

      Ethics approval and consent to participate

      Not Applicable.

      Consent for publication

      Not Applicable.

      Competing interests

      All authors declare that they have no conflict of interest.

      Received: 11 May 2020 Accepted: 19 October 2020

      玛纳斯县| 抚远县| 临夏市| 北碚区| 徐闻县| 苗栗市| 山丹县| 潼关县| 遂宁市| 大余县| 江门市| 阿荣旗| 八宿县| 仁化县| 彝良县| 枣庄市| 沾化县| 饶河县| 东阿县| 博野县| 胶州市| 时尚| 通道| 阳曲县| 中阳县| 屯留县| 朔州市| 鄄城县| 玉门市| 牟定县| 吉木乃县| 竹山县| 鹤岗市| 长兴县| 北安市| 攀枝花市| 卫辉市| 浏阳市| 资溪县| 香港| 永顺县|