Dongdong Xu, Dan Sun, Yanling Diao, Minxuan Liu, Jia Gao, Bin Wu, Xingmiao Yuan, Ping Lu, Zongwen Zhang, Jing Zhang, Ganggang Guo*
a Key Laboratory of Crop Germplasm Resources and Utilization, Ministry of Agriculture, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
b Food Crops Research Institute, Heilongjiang Academy of Agricultural Sciences, Harbin 150049, China
Keywords: Barley; BSA-seq; Reduced-representation sequencing; Pale-green; Chlorophyllide a oxygenase
ABSTRACT Bulked-segregant analysis coupled with next-generation sequencing (BSA-seq) has emerged as an efficient tool for genetic mapping of single genes or major quantitative trait loci controlling (agronomic) traits of interest. However, such a mapping-by-sequencing approach usually relies on deep sequencing and advanced statistical methods. Application of BSA-seq based on construction of reduced-representation libraries and allele frequency analysis permitted anchoring the barley pale-green (pg) gene on chromosome 3HL. With further marker-assisted validation, pg was mapped to a 3.9 Mb physical-map interval. In the pg mutant, a complete deletion of the chlorophyllide a oxygenase (HvCAO) gene was identified. Because the product of this gene converts Chl a to Chl b, the pg mutant is deficient in Chl b. An independent Chl b-less mutant line, M4437_2, carried a nonsynonymous substitution (F263L) in the C domain of HvCAO. The study demonstrates an optimized pooling strategy for fast mapping of agronomically important genes using a segregating population.
Over the past two decades, forward genetics has been a robust strategy for identifying genes and mutations responsible for phenotypic variation via positional cloning or candidate-gene approaches. Bulked-segregant analysis (BSA) [1], a technology involving the pooling of genotypes from populations segregating for traits of interest, followed by bulk marker screening and analysis, has greatly simplified the process of marker identification for target traits, such as the photoperiod-thermosensitive genic male sterile gene ptgms2-1 in rice [2], the southern corn leaf blight resistance gene rhm in maize [3], and the powdery mildew resistance gene MlRE [4] and grain protein content (GPC) gene GPC-B1 [5] in wheat. With the recent development of next-generation sequencing (NGS) technologies, genome-wide variation analysis is becoming increasingly affordable for researchers [6]. Nonetheless, whole-genome resequencing for powerful analysis of genome-wide variation is still prohibitively costly for many researchers, especially when the target plant species possess large genomes and much repetitive DNA. Complexity-reduction methods employing multiplex sequencing include reduced-representation libraries (RRLs) [7], restriction site-associated DNA sequencing (RAD-seq) [8], genotyping by sequencing (GBS) [9], complexity reduction of polymorphic sequences (CRoPS) [10], and specific-locus amplified fragment sequencing (SLAF-seq) [11]. The costs of genotyping via these methods are minimized by reduced-representation library construction and sample multiplexing.
Single nucleotide polymorphism (SNP) discovery via NGS technologies allows simultaneous mapping and identification of causal regions, and is known as mapping by sequencing (MBS). The combination of BSA with NGS permits the simultaneous discovery of SNP markers and precise assignment of chromosomal regions responsible for phenotypic variation. MBS is widely used in various species, such as Arabidopsis thaliana [12], rice [13,14], maize [15], and barley [16,17]. For example, several genes have been identified by MBS in barley, including the red/far-red light photoreceptor HvPHYTOCHROME C (HvPHYC) [16], the green-revertible albino gene (HvSGRA) [11], increased number of internodes (mnd) [17], and semi-dwarf (arie) [18]. An approach called MutChromSeq [19], based on chromosome flow sorting and sequencing, was developed to identify the barley Eceriferum-q gene and the wheat Pm2 gene. Allele frequency estimation methods developed for BSA-seq data analysis include MutMap [13], QTL-seq [20,21], and SHOREMap [22,23]. However, BSA-seq still presents challenges. The estimation of allele frequencies in each pool can be corrupted by inadequate sequencing depth, reference-genome errors, and background noise introduced by heterozygous individuals during sample pooling. In addition, causal gene mapping requires different inference methods, such as the generation of SNP-index plot regression lines [13] or locally weighted scatterplot smoothing (LOWESS) regression [16].
Leaf color variations in higher plants are common and readily observed. They may include albino, chlorina, stripe, virescent, pale-green, and zebra leaves [24]. These chlorophyll-deficient phenotypes are caused by mutations in genes involved in chloroplast biosynthesis or chlorophyll metabolism [25]. Chlorophyll biosynthesis from glutamyl-tRNA is accomplished by a series of 15 reactions catalyzed by corresponding enzymes [26]. To date, several leaf-color mutant genes have been reported in barley, including Xantha-f [27], NADPH:protochlorophyllide oxidoreductase (POR) [28,29], chlorophyllide a oxygenase (CAO) [30], and thermoinducible chlorophyll-deficient (vvy) [31]. In the present study, we used homozygous phenotypic BSA pools in conjunction with reduced-representation sequencing to rapidly map a chlorophyll b biosynthesis gene.
The barley natural mutant pg shows an irreversible pale-green leaf color throughout its growth. It was observed in one single-seed descent F6 progeny derived from a malting barley cross between Hongri1 and 86-38. A set of 263 F2 progeny derived from its F6 residual heterozygous mutant lines (RHML) was used for gene mapping.
The concentrations of chlorophylls in seedling leaves of pg and its wild-type sister lines were measured [32]. A 0.5 g sample of leaf tissue was ground and chlorophylls were extracted with 25 mL of 80% acetone for 48 h at 4 °C in the dark. The extract was centrifuged at 3000 × g for 30 min to remove residual plant debris. The supernatant was subjected to spectrophotometric absorbance measurement at 665 nm and 645 nm, with 80% acetone as the control. The concentrations of chlorophyll were calculated according to the following equations:
V = final volume of the extract solution in 80% acetone, which in this case is 25 mL; W = weight of fresh leaves extracted, which is 0.5 g.
Pale-green (pg) and normal-green (NG) phenotypic pools were each prepared with equal amounts of DNA from 28 F2 plants. The pg pool was prepared by pooling pale-green plants directly, whereas for the NG pool only homozygous plants were pooled after F2:3 family phenotyping. The modified CTAB method [33] was used for genomic DNA extraction. Complexity-reduced NGS library construction followed our previously described RRL approach [7]. An Illumina HiSeq 2500 sequencer (Illumina, San Diego, CA, USA) was used for 2 × 100 bp paired-end sequencing. Raw reads were trimmed with Cutadapt 1.9.1 [34] and aligned to the barley cv. Morex reference genome [35] with BWA-MEM 0.7.13 [36]. SNP calling was performed with SAMtools and BCFtools v1.3 [37] and a custom Perl script, using a threshold of Q40 and 15-fold read depth.
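The quality and depth thresholds described above can be illustrated with a minimal Python sketch. It is not the authors' Perl script, which is not shown in the text. It assumes that "Q40" refers to the QUAL column of a VCF record and that read depth is taken from the DP key of the INFO column; the function names and the example records are hypothetical.

```python
def passes_filters(vcf_line, min_qual=40.0, min_depth=15):
    """Return True if one tab-separated VCF data line meets both thresholds.

    Assumptions (not stated in the paper): quality is the VCF QUAL column
    and depth is the DP key in the INFO column.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    if qual == ".":  # missing quality value: treat as failing the filter
        return False
    # Parse "KEY=VALUE" pairs of the INFO column into a dictionary.
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    depth = int(info.get("DP", 0))
    return float(qual) >= min_qual and depth >= min_depth


def filter_vcf(lines, min_qual=40.0, min_depth=15):
    """Yield the data lines that pass the filters, skipping header lines."""
    for line in lines:
        if line.startswith("#"):
            continue
        if passes_filters(line, min_qual, min_depth):
            yield line


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "chr3H\t686000000\t.\tA\tG\t55.0\t.\tDP=20;MQ=60",
    "chr3H\t686000500\t.\tC\tT\t30.0\t.\tDP=25;MQ=60",
    "chr3H\t686001000\t.\tG\tA\t80.0\t.\tDP=10;MQ=60",
]
kept = list(filter_vcf(example))
```

Only the first data record is kept in this example; the second fails the quality threshold and the third fails the depth threshold.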
The allele frequency in each phenotypic pool was represented by its SNP-index, calculated as the proportion of reads covering a SNP site that carry the allele differing from the reference sequence. Δ(SNP-index) was defined as the difference between the SNP-indices of the pg pool and the NG pool. Δ(SNP-index) values were plotted along the barley physical map using a 100 kb sliding window with a 10 kb shift.
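The calculation can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' script: Δ(SNP-index) is taken as pg minus NG, each window is summarized by the mean Δ(SNP-index) of the SNPs it contains, and all function names and example values are hypothetical.

```python
def snp_index(alt_reads, total_reads):
    """Proportion of reads at a SNP site carrying the non-reference allele."""
    return alt_reads / total_reads


def delta_snp_index(pg_counts, ng_counts):
    """Delta(SNP-index), assumed here to be pg pool minus NG pool.

    Each argument is a tuple (non-reference reads, total reads).
    """
    return snp_index(*pg_counts) - snp_index(*ng_counts)


def sliding_window_mean(positions, values, chrom_length,
                        window=100_000, step=10_000):
    """Average values over windows of `window` bp shifted by `step` bp.

    Returns a list of (start, end, mean); mean is None for windows that
    contain no SNP. Windows are half-open intervals [start, end).
    """
    results = []
    start = 0
    while start < chrom_length:
        end = start + window
        in_window = [v for p, v in zip(positions, values) if start <= p < end]
        mean = sum(in_window) / len(in_window) if in_window else None
        results.append((start, end, mean))
        start += step
    return results


# Hypothetical SNP positions (bp) and Delta(SNP-index) values.
positions = [5_000, 50_000, 120_000]
deltas = [1.0, 0.5, -1.0]
windows = sliding_window_mean(positions, deltas, chrom_length=200_000)
```

A SNP fixed for the non-reference allele in the pg pool and absent from the NG pool gives a Δ(SNP-index) of 1.0, for example `delta_snp_index((20, 20), (0, 18))`.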
The NG pool consisted only of homozygous F2 individuals, which were expected to be homozygous for the wild-type allele within the causal region of the mutant trait. Likewise, the pg pool, carrying a recessive mutation, was expected to be homozygous for the causal mutation and surrounding markers, whereas SNP loci not associated with the pale-green trait would not show this bias, owing to free recombination of mutant and wild-type alleles. Accordingly, after a heterozygosity scan is performed by calculating Δ(SNP-index) in sliding windows along the chromosomes, the mapping interval can be identified as an extended region with the lowest heterozygosity. Not all variants can be detected, owing to the low genomic coverage of reduced-representation sequencing. To avoid off-target mapping, the two 5-cM regions flanking the highest homozygosity peak were defined as the primary mapping interval.
Total RNA was prepared from leaves of pg and wild-type plants using an RNA Prep Pure Plant kit (TIANGEN Biotech Co., Ltd., Beijing, China). First-strand cDNA was synthesized from 2.0 μg of total RNA per sample by reverse transcription with random hexamer primers.
The pg gene primers were designed with Primer3 [38] (Table S1). PCR amplifications were performed with Pfu DNA polymerase (TIANGEN). The temperature cycling program was 95 °C for 3 min, followed by 35 cycles (95 °C for 25 s, 58 °C for 30 s, and 72 °C for 2 min), and a final extension at 72 °C for 10 min. PCR products were visualized on 1.0% agarose gels by ethidium bromide staining.
The pale-green phenotype can be observed in the pg mutant not only at the seedling stage (Fig. 1-A), but also at other growth stages. No Chl b was detected in pg mutant lines, and the Chl a concentration of pg mutants was only half that of the NG sister lines, resulting in a 2/3 reduction in total chlorophyll in the pg mutant (Fig. 1-B). The pg mutant also showed a marked decrease in plant height and 1000-kernel weight compared with the NG lines (Fig. 1-C, D). Thus, the pg mutation affects the metabolism of both Chl a and Chl b, especially Chl b.
All the F1 plants displayed the normal green leaf phenotype, and the segregation of the F2 population fit a 3:1 ratio (normal green:pale-green = 202:61, χ² = 0.458 < χ²(0.05, 1) = 3.841, P = 0.498 > 0.05), suggesting that the pale-green phenotype is controlled by a single recessive nuclear gene.
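The goodness-of-fit test for the 3:1 ratio can be reproduced with a short Python sketch using only the standard library. The function name is hypothetical; the P-value uses the identity that, for one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(χ²/2)).

```python
import math


def chi_square_3_to_1(n_dominant, n_recessive):
    """Chi-square goodness-of-fit test of observed counts against 3:1.

    Returns (chi-square statistic, P-value) for one degree of freedom.
    """
    total = n_dominant + n_recessive
    expected = (0.75 * total, 0.25 * total)
    observed = (n_dominant, n_recessive)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of chi-square with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Observed F2 segregation reported in the text: 202 normal green, 61 pale-green.
chi2, p_value = chi_square_3_to_1(202, 61)
```

With the reported counts, the statistic rounds to 0.458, well below the critical value of 3.841, and the P-value is close to 0.5, so the 3:1 hypothesis is not rejected.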
Fig. 1 - Difference between the pg mutant and the wild type. (A) Seedling leaves of pg and the wild type (WT). Chl: chlorophyll; Scale bar: 10 cm. (B) Pigment content in the leaves of the pg mutant and the WT. (C) Plant height. (D) 1000-kernel weight.
Table 1-Statistics of sequencing data for both bulked DNA pools.
For RRL sequencing of the pg and NG pools, 28.00 M and 10.56 M paired-end reads, respectively, were generated (Table 1). The proportions of paired-end (89.14%) and single-end (10.84%) mapped reads in the NG pool were slightly higher than those in the mutant pool (87.12% and 9.94%). A total of 454,843 SNPs were identified by SNP calling. After read-depth and quality filtration, only 17,103 high-quality SNPs remained in each bulk for subsequent SNP-index and Δ(SNP-index) calculation. In the visualization of Δ(SNP-index), two sharp peaks symmetrically distributed with respect to the x-axis were observed on the long arm of chromosome 3H (686-687 Mb), and one peak was also found on an unassembled pseudomolecule (chrUn: 42.06-42.15 Mb) (Fig. 2-A). Given that all SNPs with Δ(SNP-index) > 0.85 were concentrated in the 5 Mb regions flanking the peaks, the 682-692 Mb segment of chromosome 3H was assigned as the candidate mapping interval.
Fig. 2 - BSA-seq-based cloning and molecular characterization of the pg gene. (A) Allele frequency analysis and genetic marker validation in a mapping interval. If a marker is associated with the pale-green phenotype, the SNP-index in the pale-green (pg) and normal-green (NG) pools should deviate markedly from 0.5, and the Δ(SNP-index) should deviate from 0 and reach the maximum values of +1.0 or -1.0. Unlinked loci should not show this bias, and their Δ(SNP-index) should show a normal distribution about 0. Eleven polymorphic markers uniformly distributed in the 5-cM flanking regions of the highest-homozygosity peak on chromosome 3H and another three markers on the unassembled pseudomolecule were developed to confirm the mapping interval and candidate gene. Markers with black labels are SNP and InDel markers. Markers with red labels are deletions. (B) Micro-synteny analysis of the candidate region in barley, Brachypodium, and rice. (C) PCR analysis of CAO in mutant alleles and the corresponding NG lines. M: marker D2000; normal-green plants: NG-1 to NG-28; pale-green plants: pg-1 to pg-28. (D) RT-PCR analysis of CAO gene expression in the pg mutant and the NG lines.
Based on the preliminary mapping result for pg, seven SNP and seven InDel markers uniformly distributed in the candidate mapping interval were developed to confirm the pg position (Table S2). Further recombinant analysis narrowed the mapping interval to 3.9 Mb (683.1-687.0 Mb), bounded by markers M-2 and M-7 on 3HL. Because markers M-3, M-12, and M-13, which cosegregate with the pg gene, were missing in the pg mutants, a micro-synteny analysis of the barley genomic region (682.2-684.5 Mb) with Brachypodium distachyon and rice was performed (Fig. 2-B). A chlorophyllide a oxygenase gene (CAO, Bradi2g61500.1) was found in this region. Because CAO mediates the conversion of Chl a to Chl b, the barley ortholog HvCAO (HORVU0Hr1G007360) was chosen as a pg candidate gene.
To investigate the cause of the pale-green phenotype in the mutant, we attempted to amplify the candidate gene HvCAO from the pg mutant and the NG lines. The PCR results showed that HvCAO was absent from all pg F2 progeny lines (Fig. 2-C), and its expression could not be detected in pg lines (Fig. 2-D). These results indicate that a deletion of CAO in the barley pg mutant is responsible for the pale-green phenotype.
To demonstrate the biological function of the HvCAO gene, we analyzed three EMS-induced mutants showing a phenotype similar to that of pg. Only one Chl b-less mutant (M4437_2) was identified by chlorophyll concentration measurement. The content of Chl a in M4437_2 was half that of the wild type, as in the pg mutant. Although Chl b could be detected in M4437_2, the Chl a/b ratio of M4437_2 was 8.7 (Fig. 3-A). Sequencing of the full CAO coding region of M4437_2 revealed a single-nucleotide substitution changing the phenylalanine at amino acid position 263 to leucine (F263L) (Fig. 3-B). This finding confirmed that the CAO gene was responsible for the conversion of Chl a to Chl b.
This study demonstrates an optimized pooling strategy for fast mapping of a chlorophyll b synthesis-deficiency gene in barley (Hordeum vulgare L.) via BSA with reduced-representation sequencing. Leaf color variation is an ideal trait for validating mapping methods because its phenotype is readily observed. Notably, the segregating populations were derived from an RHML and were therefore suited for fine mapping in view of their homozygosity. Conventional MBS requires obtaining SNP-index plot regression lines of the averaged SNP-index, or the average P-value of Fisher's exact test for the SNPs in a sliding window analysis, to map the causal mutation [13,39]. In contrast, this optimized method pooled homozygous dominant individuals as its wild-type pool and the recessive individuals as its mutant pool. The use of homozygous phenotypic pools means that only read counting and Δ(SNP-index) calculation are needed, without complex statistical algorithms, which greatly enhances the efficiency of mapping interval detection. The presence of two sharp symmetrical peaks in this study also provides double confirmation for accurate mapping interval estimation, avoiding false positives. The use of RRLs is cost-efficient for SNP identification because it reduces the complexity of the genome. Over 56% of repetitive elements can be eliminated, and the sequencing cost is also significantly reduced [7]. However, micro-synteny analysis was needed to confirm the candidate gene, given that the CAO gene was not present in the barley pseudochromosomes. Thus, further refinement of the barley reference genome is still needed for future map-based cloning.
Fig. 3 - Variation analysis of the EMS-induced Chl b-less mutant M4437_2. (A) Pigment content in the leaves of the EMS mutant M4437_2 and its wild type. (B) Exon-intron structure and protein domains of the CAO gene, showing a nonsynonymous mutation (F263L) in M4437_2.
In the present study, owing to the complete deletion of the HvCAO gene, the pg mutant failed to accumulate Chl b. CAO contains three domains: A, B, and C. The A domain senses the presence of Chl b and regulates CAO protein levels [40]. The functional C domain catalyzes the conversion of Chl a to Chl b. Most HvCAO mutant alleles previously reported have been InDel and missense mutations in critical sites, especially the C domain [30]. Another EMS-induced mutant with a one-base-pair substitution in the HvCAO gene showed a reduced Chl b concentration, with a missense mutation in the C domain reducing CAO activity.
In summary, we rapidly mapped a chlorophyll b biosynthesis gene by RRL sequencing combined with a modified BSA pooling strategy, without the need for prior parental-allele information.
Using a reduced-representation sequencing approach in conjunction with BSA, we mapped pg with near-isogenic lines derived from RHML. Deletion of HvCAO is responsible for a deficiency in Chl b synthesis, leading to a pale-green phenotype in barley. This study demonstrates a simple and optimized pooling strategy for fast mapping of agronomically important genes using a segregating population.
Acknowledgments
We thank Drs. Nils Stein and Martin Mascher (IPK Gatersleben) for kind help and advice on candidate gene confirmation. We thank Dr. Liangliang Gao (Kansas State University) for brief editing of the English. This work was supported by the Young Elite Scientists Sponsorship Program by China Association for Science and Technology (2015QNRC001), the National Natural Science Foundation of China (31370032), the China Agriculture Research System (CARS-05), and the Agricultural Science and Technology Innovation Program.
Availability of data and materials
Illumina sequencing raw data of the two phenotypic pools have been deposited in GenBank as accession PRJNA387369 (SRP107696).
Appendix A.Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2018.07.002.