Juan Li, Rui Han, Ruonan Li, Qiang Xu, Mingzhu Li, Yue Tang, Jixiang Li, Xi Wang, Zhao Li, Qing Li, Zaiwen Feng, Lin Li
a National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China
b College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China
c Hubei Hongshan Laboratory, Wuhan 430070, Hubei, China
Keywords: Breeding 4.0; Genotyping; Epigenotyping; iBP-seq; CRISPR editing
ABSTRACT Inter- and intra-specific variations in phenotype are common and can be associated with genomic mutations as well as epigenomic variation. Profiling both genomic and epigenomic variants is at the core of dissecting phenotypic variation. However, an efficient targeted genotyping and epigenotyping system is lacking. We describe a new multiplex targeted genotyping and epigenotyping system called improved bulked-PCR sequencing (iBP-seq). We employed iBP-seq for the detection of genotypes and methylation levels of dozens of target regions in mixed DNA samples. iBP-seq can be adapted for the construction of linkage maps, fine mapping of quantitative-trait loci, and detection of genome editing mutations at a cost as low as $0.016 per site per sample. We developed an automated bioinformatics pipeline, including primer design, a series of bioinformatic analyses for genotyping and epigenotyping, and visualization of results. iBP-seq and its bioinformatics pipeline, available at http://zeasystemsbio.hzau.edu.cn/tools/ibp/, can be adapted to a wide variety of species.
Both genomic and epigenomic mutations can cause phenotypic variation, which is exploited for breeding animals and plants with desirable traits [1-4]. Genomic variations such as single-nucleotide polymorphisms (SNPs), insertion/deletions (InDels), presence/absence variation, and copy number variation (CNV) have been shown [5-7] to be associated with phenotypic variation. DNA methylation is one of the earliest discovered epigenetic marks, and its levels and patterns vary among species [8] and influence gene expression, embryonic development, and stress response. With the rapid advance of modern breeding technologies, breeding is gradually evolving into designed molecular breeding to reach breeding 4.0 [9], which will rely heavily on precise genotyping and epigenotyping at target loci.
With the advent of molecular markers, multiple newly devised genotyping methods have increased the efficiency of breeding selection. The genome-wide detection of simple sequence repeats (SSRs, also known as microsatellites) and SNPs, which occupy a dominant position in modern genetic analysis, has greatly improved the productivity and accuracy of molecular breeding [10]. Since the completion of the Human Genome Project in 2003, next-generation sequencing technologies have reduced the cost per megabase sequenced and led to rapid growth in the number of sequenced genomes, supporting genome-wide genotyping for breeding [11]. Multi-sample and multi-site genotyping have emerged from the combination of next-generation sequencing and multiplex PCR [12,13]. Hi-Tom [14] is a platform for high-throughput tracking of mutations induced by clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9) systems. CRISPR/Cas systems usually generate biallelic, heterozygous, and chimeric mutations. Some mutation types occur at low proportions and are often overlooked. In recent years, software and websites have been developed to identify gene editing events by either Sanger or next-generation sequencing [15,16]. However, these genotyping systems are time-consuming and expensive, hindering their utilization in breeding.
In DNA methylation analysis, CpG methylation information is generally enriched in promoters, enhancers, and other regulatory element regions, coordinating epigenetic changes with genetic features including CNVs and SNPs [17]. Since its original report in 1992 [18], bisulfite sequencing has become widely used for the genome-wide detection of DNA methylation sites as well as the identification of CpG methylation abnormalities in CpG islands. However, major limitations of conventional bisulfite sequencing are its low throughput and high cost. As a complementary technique, methylation-specific PCR allows the analysis of only one or two CpG sites at a time [19]. Combining the accuracy of conventional Sanger bisulfite sequencing with high-throughput sequencing, whole-genome bisulfite sequencing (WGBS) can measure methylation levels at single-base resolution across the entire genome [20]. A related approach, single-molecule real-time bisulfite sequencing (SMRT-BS), has also been developed for targeted identification of CpG methylation [21]. SMRT-BS is amenable to a high degree of multiplexing with minimal clonal PCR artifacts. However, its cost is extremely high, and the downstream bioinformatic analysis is complex. On the road to breeding 4.0, an efficient tool to measure methylation at targeted loci is still lacking.
The objective of this study is to develop an efficient and low-cost multiplex targeted genotyping and epigenotyping system. To achieve this, we propose improved bulked-PCR sequencing (iBP-seq), which combines next-generation sequencing with multiplex PCR and can target both variant sites and CpG sites. iBP-seq consists of two steps: target-specific PCR amplification and barcoding PCR amplification. Targeted-site genotyping and methylation detection thus require two rounds of PCR, the first with site-specific primers and the second with barcode primers. To eliminate the influence of PCR duplication, unique molecular identifiers (UMIs) can be included in the reverse primer in the first-round PCR. Each reverse primer consists of a target-specific sequence with a 6-bp UMI sequence and a common bridging sequence (5′-ATAGCGACGCGTTTCAAC-3′) added at its 5′ end. PCR products can be submitted for high-throughput sequencing, allowing accurate genotyping and epigenotyping in target genomic regions.
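The role of the UMI in removing PCR duplicates can be illustrated with a minimal sketch. It is not the published pipeline. It assumes that a read begins with the 8-base sample barcode, followed by the bridging sequence, the 6-base UMI, and then the target insert; this read layout, the function names, and the use of the first 20 insert bases as a target key are illustrative assumptions.

```python
# Bridging sequence quoted in the text (assumed here to be contiguous).
BRIDGE = "ATAGCGACGCGTTTCAAC"
BARCODE_LEN = 8       # barcode length given in the primer description
UMI_LEN = 6           # UMI length given in the text
TARGET_KEY_LEN = 20   # hypothetical: insert prefix used to identify the target


def parse_read(seq):
    """Split a read laid out as barcode + bridge + UMI + insert (assumed layout).

    Returns (barcode, umi, insert), or None if the bridge is not found
    at the expected position or the read is too short.
    """
    bridge_start = BARCODE_LEN
    bridge_end = bridge_start + len(BRIDGE)
    umi_end = bridge_end + UMI_LEN
    if len(seq) < umi_end or seq[bridge_start:bridge_end] != BRIDGE:
        return None
    return seq[:BARCODE_LEN], seq[bridge_end:umi_end], seq[umi_end:]


def deduplicate(reads):
    """Keep the first read seen for each (barcode, UMI, target key) combination."""
    seen = set()
    kept = []
    for seq in reads:
        parsed = parse_read(seq)
        if parsed is None:
            continue
        barcode, umi, insert = parsed
        key = (barcode, umi, insert[:TARGET_KEY_LEN])
        if key in seen:
            continue  # same sample, same molecule tag, same target: PCR duplicate
        seen.add(key)
        kept.append(parsed)
    return kept
```

Reads that share a barcode, UMI, and target are treated as copies of one template molecule, so only one of them contributes to allele or methylation counts.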
A BC1F2 population of maize (Zea mays) segregating for the plant-height QTL qPH7 [22] was used for iBP-seq. A set of 192 F6 plants derived from a cross between two elite maize inbred lines (BY4944 and 9782) and a CRISPR/Cas9 transgenic population of 77 plants were used to test genotyping by iBP-seq. Leaves of 2-week-old seedlings from four maize inbred lines (B73, Mo17, W22, and SK) were collected for DNA methylation detection. Genomic DNA (gDNA) was extracted from fresh seedling leaf tissue using a standard CTAB protocol [23]. For DNA methylation assays, gDNA was bisulfite-converted using the EZ DNA Methylation-Lightning Kit (Cat. No. D5031, Orange, California).
A set of 384 unique barcoded probes was synthesized to differentiate between target amplicons in iBP-seq (Table S1). Each barcode probe contained a bridge sequence and a unique 8-base barcode sequence but did not include the platform-specific adaptor sequence. The target-specific reverse primer contained a bridge sequence, a 6-base UMI sequence, and a target-specific sequence. Target-specific primers for qPH7 were designed for technical evaluation of iBP-seq and fine mapping of qPH7 (Table S2). Twenty SNP markers (Table S3) for the RIL population derived from a cross between maize inbred lines BY4944 and 9782 and an InDel marker (Table S4) for the CRISPR/Cas9-edited individuals were used to evaluate iBP-seq. For detecting DNA methylation, 12 barcoded primers containing a 9-base bridge sequence (Table S5) and two specific primers (Table S6) were developed.
First-round PCR was conducted in a 10-μL reaction system containing 50 ng of genomic DNA, 0.5 μmol L⁻¹ of a mixture of allele-specific primer pairs (prepared previously and combined in equal amounts), and 5 μL of 2× Taq Mix (No. P112-AA, Vazyme, Nanjing, Jiangsu, China). The reaction was run on a standard thermocycler with a touchdown PCR program using the following parameters: 95 °C for 3 min; 10 cycles of 95 °C for 30 s, 65 °C for 20 s (the annealing temperature was decreased by 1 °C/cycle), 72 °C for 60 s; 25 cycles of 95 °C for 30 s, 60 °C for 20 s, 72 °C for 60 s; and a final extension at 72 °C for 5 min. The annealing temperature of the first-round PCR was adjusted according to the Tm value of the target-specific sequences. The first-round PCR products were further barcoded during the second-round PCR using the allele-specific forward primer mixture and barcode primer and the same PCR program. For DNA methylation, approximately 1 μg gDNA treated with bisulfite was used as the template for a single multiplex PCR reaction using a target-specific primer mix and KAPA HiFi HotStart Uracil+ ReadyMix (Cat. No. KK2801, Basel, Switzerland). The final PCR products were pooled equally and sent for next-generation sequencing (NGS) after purification and fragment selection for a target size of 180-400 bp using KAPA Pure Beads. The detailed protocols for genotyping and epigenotyping are available at https://github.com/Hanryi/MarkerTech/raw/master/protocols/.
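The touchdown program can be written out step by step, which makes the annealing schedule explicit. The sketch below assumes that the first touchdown cycle anneals at 65 °C and that each subsequent cycle is 1 °C lower; under this reading the tenth cycle anneals at 56 °C before the 25 fixed cycles at 60 °C. The function name and the tuple layout are illustrative.

```python
def touchdown_program():
    """Expand the first-round touchdown PCR program into a flat list of steps.

    Each step is a tuple (label, temperature in deg C, duration in seconds).
    Ramp times between steps are not modeled.
    """
    steps = [("initial denaturation", 95.0, 180)]

    # Touchdown phase: 10 cycles, annealing starts at 65 C and drops 1 C per cycle
    # (assumed reading of "decreased by 1 C/cycle").
    for cycle in range(10):
        steps.append(("denaturation", 95.0, 30))
        steps.append(("annealing", 65.0 - cycle, 20))
        steps.append(("extension", 72.0, 60))

    # Fixed phase: 25 cycles with annealing at 60 C.
    for _ in range(25):
        steps.append(("denaturation", 95.0, 30))
        steps.append(("annealing", 60.0, 20))
        steps.append(("extension", 72.0, 60))

    steps.append(("final extension", 72.0, 300))
    return steps


if __name__ == "__main__":
    program = touchdown_program()
    hold_seconds = sum(duration for _, _, duration in program)
    print(f"{len(program)} steps, {hold_seconds / 60:.1f} min of hold time")
```

When the annealing temperature is adjusted to the Tm of a different primer set, only the two annealing values in this schedule would need to change.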
Raw paired-end sequencing reads were first preprocessed using fastp (version 0.20.0) software [24] to remove short reads (length <72 for iBP-seq) and low-quality (q<20) paired reads. After trimming, more than 95% of the reads were retained for subsequent genotyping (Table S7). All reads were mapped to the B73 reference genome (AGPv4) using BWA (version 0.7.17) software [25]. For loci with known variant information, a Python script was used to extract genotype information at variant sites. For loci with unknown variants, Sentieon software [26] was used with default parameters to call variants before genotyping. For epigenotyping, Bismark (version 0.23.1) software [27] was used to identify methylation sites. Code is freely available at https://github.com/Hanryi/MarkerTech.
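Genotype extraction at a known biallelic site can be sketched as follows. This is an illustration, not the script in the MarkerTech repository: the function names, the minimum depth of 10 reads, and the allele-frequency cutoff of 0.2 are hypothetical values chosen for the example.

```python
def count_alleles(bases, ref, alt):
    """Count reads supporting the reference and alternative alleles at one site.

    `bases` is the list of base calls observed at the variant position;
    calls matching neither allele are ignored.
    """
    ref_count = sum(1 for b in bases if b == ref)
    alt_count = sum(1 for b in bases if b == alt)
    return ref_count, alt_count


def call_genotype(ref_count, alt_count, min_depth=10, hom_cutoff=0.2):
    """Call a biallelic genotype from allele-supporting read counts.

    min_depth and hom_cutoff are illustrative thresholds, not published values.
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        return "NA"  # too few informative reads to call
    alt_freq = alt_count / depth
    if alt_freq <= hom_cutoff:
        return "REF/REF"
    if alt_freq >= 1 - hom_cutoff:
        return "ALT/ALT"
    return "REF/ALT"
```

For example, `call_genotype(*count_alleles(list("AAAAAGGGGGGG"), "A", "G"))` yields a heterozygous call, because the alternative allele frequency lies between the two cutoffs.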
Multiple barcode sequences and unique molecular identifiers (UMIs) are incorporated in iBP-seq to discriminate between samples and minimize sequence bias caused by PCR duplication (Fig. 1A). To evaluate the efficiency and accuracy of iBP-seq, we performed multiplex genotyping of a BC1F2 population segregating for a cloned plant-height QTL, qPH7, using a published genotype dataset [22]. Six markers were selected and used with iBP-seq to perform multiplex genotyping of plants showing extreme height phenotypes. The final kernel density estimate (KDE) curve (Fig. S1a) and the average allele frequency (Fig. S1b) revealed that all three genotypes for the six markers could be simultaneously detected and clearly discriminated. Based on the genotyping results, we then calculated the P-value of the χ² test of association with variation in plant height for each SNP marker (Table S8) and fine-mapped the qPH7 QTL to a genomic region with the highest association peak at ~135.4 Mb on chromosome 7 (Fig. 1B), in agreement with the previous study [22]. These results demonstrated that iBP-seq can accurately identify genotypes by combining multiplex PCR and NGS. To balance high genotyping accuracy and cost-effectiveness, we used the above dataset to evaluate the optimal sequencing depth needed for QTL mapping with iBP-seq. Genotyping accuracy reached 99.8% at 0.3 million sequencing reads (96 samples with 6-plex markers) (Fig. 1C).
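The per-marker association test can be sketched as a χ² test on a contingency table of phenotype class by genotype class. The layout assumed here, two extreme-height classes (tall and short) by three genotype classes, is an illustrative reading of the analysis. With this 2 × 3 table the test has 2 degrees of freedom, for which the P-value has the closed form exp(−χ²/2).

```python
import math


def chi_square_association(table):
    """Chi-square test of independence for a 2 x 3 table of counts.

    `table` holds one row per phenotype class (e.g. tall, short) and one
    column per genotype class. Returns (chi2, p_value). The closed-form
    p-value exp(-chi2 / 2) is valid only for 2 degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df != 2:
        raise ValueError("this sketch only handles tables with 2 degrees of freedom")

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2, math.exp(-chi2 / 2)
```

Markers closer to the causal locus show stronger skew of genotypes between the tall and short classes, and therefore smaller P-values, which is what produces the association peak used for fine mapping.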
Fig. 1. Overview and application of iBP-seq, and the development of an automatic bioinformatic pipeline. (A) Rationale of iBP-seq. (B) Fine mapping of qPH7 QTL with iBP-seq. (C) Assessment of the optimal sequencing depth for iBP-seq. (D) Identification of mutations at editing target sites after CRISPR/Cas9-mediated genome editing. (E) DNA methylation levels over two target regions, examined using the “two primers for four samples” strategy. (F) The correlation coefficients between the DNA methylation levels from four strategies (a means “one primer for one sample”, b means “one primer for four samples”, c means “two primers for one sample”, d means “two primers for four samples”) over two target regions. (G) Cost comparison of iBP-seq strategies for targeted DNA methylation analysis. (H) A user-friendly bioinformatics pipeline for iBP-seq.
We designed 20 iBP-seq markers evenly distributed along maize chromosome 4 to construct a linkage map for a population of 192 RILs. Accurate genotypes were obtained with iBP-seq for 17 of the markers at 20-plex (Fig. S1c), based on which we constructed a linkage map of chromosome 4 (Fig. S1d). Furthermore, we identified a multiallelic mutation with a 7-bp deletion and a 9-bp deletion at the target site in CRISPR-edited individuals (Fig. 1D).
The epigenome, including DNA methylation level, shapes plant growth and development. We detected DNA methylation levels by extracting genomic DNA for bisulfite treatment from the maize inbred lines B73, Mo17, W22, and SK. We first performed PCR with “one primer for one sample”. For the Chr. 1:83554984-83555202 genomic region, we found that DNA hypermethylation occurred only in B73, whereas hypermethylation of the Chr. 7:46311746-46311527 genomic region occurred in both the Mo17 and SK inbred lines (Fig. S2a). To validate these results, we measured the DNA methylation levels of B73 and Mo17 in these two regions by WGBS [28] (Fig. S2b). The two methods revealed similar methylation levels except for the first genomic region in Mo17, which may have differed owing to the DNA having been obtained from tissues at different stages. To investigate whether the two rounds of PCR would aggravate the bias of the detected methylation level, we obtained the amplified products of the four inbred lines in the two genomic regions using one round of PCR, pooled the eight samples, purified them, and sent them for NGS. The DNA methylation levels observed were similar to those obtained by iBP-seq (Fig. S3a). The methylation levels of the four inbred lines in the two genomic regions were comparable with those from the “one primer for one sample” strategy of iBP-seq, with an average correlation of 95.6% (Fig. S3b), confirming the accuracy of the methylation profiles obtained by iBP-seq.
To further test the utility of iBP-seq for measuring methylation levels, we mixed DNA samples from the B73, Mo17, W22, and SK inbred lines in equal amounts for bisulfite conversion. The strategies “one primer for four samples” (Fig. S4a), “two primers for one sample” (Fig. S4b), and “two primers for four samples” (Fig. 1E) yielded similar results, with correlations of at least 92% between pairs of strategies (Fig. 1F).
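The comparison between strategies rests on two quantities: the methylation level at each cytosine and the correlation of those levels between two strategies. A minimal sketch of both is given below. It assumes that, after bisulfite conversion and PCR, a methylated cytosine is read as C and an unmethylated one as T, and it uses Pearson's coefficient, which is an assumption because the type of correlation is not specified here. The function names are illustrative.

```python
import math


def methylation_level(methylated, unmethylated):
    """Fraction of reads supporting methylation at one cytosine.

    `methylated` counts reads showing C and `unmethylated` counts reads
    showing T at the site. Returns None if the site has no coverage.
    """
    total = methylated + unmethylated
    if total == 0:
        return None
    return methylated / total


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Per-site levels from two strategies, taken over the same cytosines in the same order, can be passed directly to `pearson` to obtain a pairwise coefficient of the kind summarized in Fig. 1F.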
We assessed the cost of iBP-seq for genotyping and epigenotyping, covering primer synthesis, library preparation, and NGS. Based on the optimal sequencing depths and the genotyping accuracy, the average cost per genotype decreased substantially as sample numbers increased, reaching as low as $0.016 per site per sample (Table S9). The costs of the four iBP-seq strategies for targeted DNA methylation profiling were also assessed. The cost for one targeted region from one sample dropped as low as $1.85 using the “two primers for four samples” strategy (Fig. 1G), a cost much lower than the market price. The pooling of genomic DNA and primer pairs was responsible for the cost saving.
To facilitate the use of iBP-seq by users without strong bioinformatics skills, we developed an automated bioinformatics pipeline (http://zeasystemsbio.hzau.edu.cn/tools/ibp/) (Fig. S5a), including primer design (Fig. S5b) and a series of bioinformatic analyses directed at genotyping and epigenotyping (Fig. S5c). For the genotyping service, users must provide a pair of paired-end sequence files, a barcode file for splitting sample sequences, and a mutation information file. The pipeline does not support de novo mutation tracking, but the variant information generated by Sentieon software can be used as the mutation information input file. The genotyping results of samples can be automatically extracted and exported into a Microsoft Excel document (Fig. 1H). For the epigenotyping service, users need only upload paired-end sequence files, a barcode file, and a local reference genome file to receive the final methylation results automatically. The site currently provides reference genomes for four species (Zea mays, Oryza sativa, Glycine max, Arabidopsis thaliana). For other species, users can upload reference genomes for iBP-seq analysis.
We describe a novel method called iBP-seq for targeted genotyping and DNA methylation profiling by combining next-generation sequencing with multiplex PCR. We also established a user-friendly bioinformatics platform that permits the automated analysis of raw reads and the visualization of these targeted genotyping and epigenotyping results. iBP-seq eliminates PCR redundancy by introducing UMI sequences. iBP-seq can discriminate hundreds of samples for tens of target genomic regions with the use of a barcode primer for indexing. iBP-seq can be employed for fine mapping, constructing genetic maps, and genotyping CRISPR-edited plants as well as profiling methylation levels in target genomic regions at a low cost. iBP-seq can be used for a wide range of species.
iBP-seq can overcome the shortcomings of existing methods (Table S10). Compared with the Hi-Tom platform, iBP-seq has several unique features: 1) iBP-seq adds a barcode to only one end of the target fragment, increasing flexibility in the number of samples that can be amplified, and the introduction of UMI sequences into the fragment increases data utilization rates. 2) iBP-seq can not only genotype the target site but also detect the methylation level of the target region. 3) Our website provides a bioinformatics interface for genotyping, making it easy for users to perform iBP-seq. 4) The genotype and methylation-level results are visualized in iBP-seq, allowing users to judge the reliability of the results. iBP-seq complements current genotyping and epigenotyping systems. Furthermore, iBP-seq is beneficial when analyzing large numbers of samples or sites, with 1 Gb of sequencing data sufficient to analyze thousands of amplicons. Considering the cost of an NGS library and sequencing data, we suggest using direct Sanger sequencing when <20 amplicons are analyzed. Given the low cost of iBP-seq, the screening of CRISPR/Cas9-edited materials should be more efficient and cost-effective.
Application of iBP-seq to genotyping and to quantification of DNA methylation has some limitations. Targeted genomic regions may exhibit preferential or uneven amplification when mixed primer pairs are used, with some regions not being amplified at all. This issue can be resolved by appropriate primer design. First, it is essential to keep the melting temperature similar for all primer pairs and to avoid complementary sequences. Second, we recommend using a lower concentration of primers than in single-plex PCR. Third, it is important to test the efficiency of each pair of primers on the template before performing multiplex PCR. Based on our experience investigating the genotyping accuracy rates and costs of multiplex PCR amplifications, we recommend that users amplify 20 target sequences from a single DNA sample per PCR reaction for genotyping, reducing costs while ensuring accuracy. For epigenotyping, it is cost-effective to mix DNAs from multiple samples before bisulfite conversion, which requires sufficient SNPs differentiating the samples. The number of mixed DNA samples and the targets for epigenotyping depend on the number of SNPs that can distinguish among samples in the PCR amplification products. Accordingly, when DNA mixing is performed, it is necessary to ensure that the mixed samples differ in the target region so that the amplification products of different samples can be distinguished. Overall, iBP-seq performs well for genotyping and epigenotyping across various populations because of its high efficiency and low cost, and offers high potential for application in modern crop breeding.
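The requirement that mixed samples be distinguishable within the amplicon can be checked before pooling. The sketch below is a hypothetical helper, not part of the published pipeline: it represents each sample by its alleles at the SNP positions inside one amplicon, tests whether every sample has a unique allele combination, and assigns a read to a sample only on an exact, unambiguous match. The alleles in the example are invented, and the sketch assumes the chosen SNPs remain informative after bisulfite conversion.

```python
def distinguishable(haplotypes):
    """Return True if every sample has a unique allele string in the amplicon.

    `haplotypes` maps sample name -> string of alleles at the SNP positions
    of one amplicon, in a fixed position order.
    """
    return len(set(haplotypes.values())) == len(haplotypes)


def assign_read(read_alleles, haplotypes):
    """Assign a read to the single sample whose alleles match exactly.

    Returns the sample name, or None if no sample or several samples match.
    """
    matches = [name for name, alleles in haplotypes.items() if alleles == read_alleles]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    # Invented alleles at two SNP positions, for illustration only.
    pool = {"B73": "AG", "Mo17": "TG", "W22": "AC", "SK": "TC"}
    print(distinguishable(pool))      # all four allele strings differ
    print(assign_read("TG", pool))    # matches exactly one sample
```

If `distinguishable` returns False for a candidate pool, either fewer samples should be mixed or a different target region with more differentiating SNPs should be chosen.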
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive [29] in National Genomics Data Center [30], China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA009122) and are accessible at https://ngdc.cncb.ac.cn/gsa.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Juan Li: Formal analysis, Validation, Visualization, Writing - original draft. Rui Han: Software, Data curation. Ruonan Li: Validation. Qiang Xu: Validation. Mingzhu Li: Validation. Yue Tang: Software. Jixiang Li: Software. Xi Wang: Resources. Zhao Li: Resources. Qing Li: Conceptualization, Project administration. Zaiwen Feng: Conceptualization, Project administration. Lin Li: Conceptualization, Project administration, Supervision, Writing - review & editing.
We appreciate funding support from the National Natural Science Foundation of China (32272158), the Major Program of Hubei Hongshan Laboratory (2021hszd008), Hainan Yazhou Bay Seed Lab (B21HJ8102), and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation (2021ZKPY001). We thank the high-performance computing platform at the National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University.
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2023.03.012.