Lirui Cheng, Xiaocui Chen, Caihong Jiang, Bing Ma, Min Ren, Yazeng Cheng, Dan Liu, Ruimei Geng, Aiguo Yang*
a Key Laboratory of Tobacco Genetic Improvement and Biotechnology, Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao 266100, Shandong, China
b Key Laboratory of Zunyi Crop Gene Resource and Germplasm Innovation, Zunyi Academy of Agricultural Sciences, Zunyi 563006, Guizhou, China
Keywords: Single nucleotide polymorphism; Genetic linkage map; Tobacco; Cucumber mosaic virus; Quantitative trait loci
ABSTRACT Genetic linkage maps are essential for studies of genetics, genomic structure, and genomic evolution, and for mapping quantitative trait loci (QTL). Identification of molecular markers and construction of genetic linkage maps in tobacco (Nicotiana tabacum L.), a classical model plant and important economic crop, have remained limited. In the present study, we identified a large number of single nucleotide polymorphism (SNP) markers and constructed a high-density SNP genetic map for tobacco using restriction site-associated DNA sequencing. In 1216.30 Gb of clean sequence obtained using the Illumina HiSeq 2000 sequencing platform, 99,647,735 SNPs were identified that differed between 203 sequenced plant genomes and the tobacco reference genome. Finally, 13,273 SNP markers were mapped on 24 high-density tobacco genetic linkage groups. The entire linkage map spanned 3421.80 cM, with a mean distance of 0.26 cM between adjacent markers. Compared with genetic linkage maps published previously, this version represents a considerable improvement in the number and density of markers. Seven QTL for resistance to cucumber mosaic virus (CMV) in tobacco were mapped to groups 5 and 8. This high-density genetic map is a promising tool for elucidation of the genetic bases of QTL and for molecular breeding in tobacco.
Tobacco (Nicotiana tabacum L.) is an allotetraploid (2n = 4x = 48) produced by the hybridization of N. sylvestris × N. tomentosiformis [1,2]. Several factors are stimulating increased interest in the genetics and genomics of tobacco. First, tobacco is a model for the investigation of plant pathology, genetics, and biotechnology, and studies of tobacco genetics and genomics have strengthened the use of N. tabacum as a plant model system. Second, tobacco is a member of the nightshade (Solanaceae) family and a classical allopolyploid species model. Decipherment of the genomic structure of tobacco would provide a more detailed picture of genomic evolution in the Solanaceae family. Third, tobacco is an important agricultural crop grown in >120 countries [3]. In addition, the cultivation of tobacco has been considered for the "molecular farming" of commercially useful proteins [4,5]. Thus, studies of tobacco genetics and genomics are beneficial for applied tobacco research.
Genetic linkage maps are essential for studies of genetics, genomic structure, and genomic evolution and for mapping QTL. In recent decades, several genetic maps of tobacco have been developed using different kinds of molecular markers [6-11]. The first simple sequence repeat (SSR) genetic map was constructed in an N. tabacum Red Russian × N. tabacum Hicks Broad Leaf cross, with 293 SSR loci and one morphological trait, flower color [8]. This map was 1920 cM in length with a mean marker interval of 6.5 cM. Bindler et al. [9], using SSR markers, developed a high-density genetic map consisting of 24 linkage groups and spanning 3270 cM, with 2317 markers.
Single nucleotide polymorphisms (SNPs) provide an ideal marker system for the construction of high-density maps, as they are highly abundant and evenly distributed in the plant genome [12-15]. Recent developments in next-generation sequencing (NGS) technology have allowed rapid discovery of thousands of SNP markers and high-throughput genotyping of large populations [16-20]. In particular, restriction site-associated DNA sequencing (RAD-seq) is a powerful method for the development of a large number of genetic markers and construction of high-density genetic maps owing to its high throughput and cost effectiveness [21-23]. In previous studies, RAD-seq has been used widely to discover SNPs and create genetic maps for many plants including eggplant [24,25], barley [26,27], cotton [28,29], and peanut [30,31]. In tobacco, three maps have recently been constructed by NGS methods [32,33].
Cucumber mosaic virus (CMV) belongs to the Cucumovirus genus in the Bromoviridae family and has a wide host range [34]. CMV causes one of the most severe viral diseases affecting tobacco production and yield in many planting areas [35]. Genetic resistance is an economically attractive prospect when it can be incorporated into varieties without impairing yield and quality.
Resistance to CMV in tobacco is quantitatively inherited, and monogenic dominant resistance is scarce. Classical genetic studies showed that resistance to CMV was controlled by five genes: the N gene from N. glutinosa, rm1 and rm2 from tobacco accession Ambalema, and t1 and t2 from tobacco line T.I.245. The five genes were pyramided into the tobacco line Holmes [36-38], and resistance derived from Holmes has been incorporated into many cultivars. Among them, TT8, developed from a cross between Hicks and Holmes, shows moderate resistance to CMV. In China, the cultivar TT8 has been used as a source of resistance to CMV in flue-cured tobacco breeding.
In this study, RAD-seq was used to generate SNP markers for tobacco. These markers were used to construct a high-density SNP genetic map and to identify QTL associated with resistance to CMV.
The high-quality flue-cured tobacco cultivar NC82 from the USA and the CMV-resistant tobacco cultivar TT8 from China were used as parental lines. TT8 was the female parent. F1 plants were backcrossed to TT8 to develop a BC1F1 population of 200 individuals. The two parents, a single F1 plant, and 200 BC1F1 progeny were used in the subsequent genotyping experiment. The P1, P2, F1, and BC1F1 generations were planted at the Shandong Experimental Station, Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao (35.4°N, 119.3°E, altitude 15 m), Shandong, China.
RAD-seq was performed as described by Baird et al. [21], with minor modifications. Genomic DNA samples of the two parents, a single F1 plant, and 200 BC1F1 individuals were extracted from young leaf tissue using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA). Genomic DNA was digested with EcoRI (5′-GAATTC-3′), and a P1 adapter containing individual-specific nucleotide barcodes (4-8 bp long) was ligated to the overhanging ends to track the samples. The adapter-ligated fragments were pooled and randomly sheared, and 350-500 bp DNA fragments were isolated using a gel extraction kit (Qiagen). The P2 adapter, which was a "Y" adapter with divergent ends, was added to the selected fragments. In total, 203 RAD libraries were prepared: two parent libraries, a single F1 plant library, and 200 BC1F1 offspring libraries. These libraries were enriched by polymerase chain reaction amplification with final selection of 350-500 bp fragments. Each library was sequenced on an Illumina HiSeq 2000 sequencing platform at BGI Technologies Corporation (Shenzhen, China; http://www.bgisequence.com/) using paired-end reads.
Low-quality reads (Q score <20) and reads without a correct barcode were filtered out. Clean reads (each ~86 nucleotides long) were sorted according to nucleotide barcodes. All paired-end reads with clear index information were aligned against the tobacco TN90 draft genome [3] for SNP calling using SOAPaligner software [39]. A minimum stack depth of 5 was applied for SNP calling. A global minimum allele count of 3 and a minimal sequencing depth of 20× were applied to ensure that SNPs were polymorphic. A total of 99,647,735 SNPs were identified in 203 samples and the reference genome. SNPs with a homozygous parental genotype and heterozygous F1 genotype were selected to construct the genetic map. In addition, >75% integrity of the RAD markers was required in the progeny: at least 75% of BC1F1 progeny carried the parental alleles at any given marker locus.
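The marker selection rules described above can be illustrated with a minimal Python sketch. The genotype encoding (two-letter strings such as "AA" or "AB", with None for a missing call), the function name, and the requirement that the two homozygous parents carry different alleles are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def is_informative_marker(p1, p2, f1, progeny, min_call_rate=0.75):
    """Return True if a SNP passes the selection rules sketched in the text.

    Genotypes are hypothetical two-letter strings ("AA", "AB", "BB");
    None marks a missing genotype call.
    """
    def homozygous(g):
        return g is not None and g[0] == g[1]

    def heterozygous(g):
        return g is not None and g[0] != g[1]

    # Both parents must be homozygous (assumed here: for different alleles).
    if not (homozygous(p1) and homozygous(p2) and p1 != p2):
        return False
    # The F1 must be heterozygous.
    if not heterozygous(f1):
        return False
    if not progeny:
        return False
    # At least 75% of progeny must carry only parental alleles at this locus.
    parental_alleles = {p1[0], p2[0]}
    called = sum(1 for g in progeny
                 if g is not None and set(g) <= parental_alleles)
    return called / len(progeny) >= min_call_rate
```

For example, a marker with parents "AA" and "BB", an "AB" F1, and three called genotypes out of four progeny sits exactly at the 75% threshold and is retained.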
Fig. 1 - Transitions and transversions in the set of 13,273 single-nucleotide polymorphic markers mapped to genetic maps.
Fig. 2 - High-density single-nucleotide polymorphic (SNP) genetic map of tobacco. Bars on each linkage group represent SNP markers. Detailed map is described in Table S2.
Markers showing high segregation distortion (χ² test, P < 0.001, df = 1) were excluded from map construction. JoinMap 4.1 software [40] was used to group and order the markers. A recombination frequency ≤0.4 and logarithm of the odds (LOD) value ≥3.0 were used as thresholds for groups. The Kosambi mapping function was used to calculate map distances. Genetic maps were drawn using MapChart 2.0 software [41].
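As a rough illustration of two steps in this paragraph, the sketch below tests a marker for segregation distortion and converts a recombination frequency to map distance with the Kosambi function. It assumes a 1:1 expected ratio of the two genotype classes in the backcross population (the text states only df = 1); the function names are hypothetical and this is not the JoinMap implementation.

```python
import math


def segregation_chi2(n_class_a, n_class_b):
    """Chi-square test (df = 1) against an assumed 1:1 backcross ratio."""
    n = n_class_a + n_class_b
    expected = n / 2
    chi2 = ((n_class_a - expected) ** 2 + (n_class_b - expected) ** 2) / expected
    # With one degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination frequency 0 <= r < 0.5."""
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


chi2, p = segregation_chi2(130, 70)
distorted = p < 0.001  # marker would be excluded under the threshold in the text
```

With 130 versus 70 individuals in the two classes, the statistic is 18 and the marker falls below the P < 0.001 cutoff, so it would be dropped before grouping.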
Three rounds of phenotypic evaluation were conducted for both parents, the F1, and the 200 BC1F1 individuals. For each round of phenotypic evaluation, about 100 plants were regenerated from each BC1F1 individual using tissue culture so that resistance to CMV could be evaluated repeatedly [42]. Approximately 30 tissue-cultured plants per BC1F1 individual were transplanted into individual plastic pots (10 cm height, 12 cm diameter) and maintained in a greenhouse under cycles of 13 h light at 29 °C and 9 h dark at 24 °C. The experiment used a completely randomized design with two replications (200 BC1F1 plants, two parents, and one F1); in each replication, the two parents, the F1, and each BC1F1 individual were represented by 10 plants.
Plants were inoculated with the CMV-Fny strain at the six-leaf stage. Inoculum was prepared by grinding young symptomatic leaf tissues from plants infected with CMV-Fny using a mortar and pestle at a ratio of 1:2 (w/v) in 0.1 mol L⁻¹ sodium phosphate buffer (pH 7.4). Viral inoculum was rubbed onto the top two leaves of each plant. Disease severity was rated 15 days post-inoculation (dpi) using a scale from 0 to 9 based on mosaic symptoms and leaf distortion (0 = no symptoms; 1 = mild mosaic on inoculated leaf, no leaf distortion; 3 = mosaic in 1/3 of all leaves, no distortion; 5 = mosaic from 1/3 to 1/2 of all leaves, mild leaf distortion; 7 = mosaic from 1/2 to 2/3 of all leaves, mild stunting; 9 = mosaic in all leaves, severe stunting). The incidence of disease (ID) and disease index (DI) scores per BC1F1 individual were calculated using the following formulae:

$$\mathrm{ID} = \frac{\text{number of symptomatic plants}}{\text{total number of plants}} \times 100$$

$$\mathrm{DI} = \frac{\sum (F \times V)}{N \times X} \times 100$$

where F is the disease evaluation scale score, V is the number of plants with each scale score, N is the total number of plants observed, and X is the highest disease evaluation scale score.
Fig. 3 - Numbers of unique single-nucleotide polymorphic markers originating from S, T, or O genomes.
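A small worked example may help make the two scores concrete. The sketch below assumes the conventional definitions: ID is the percentage of plants with a nonzero score, and DI is the sum of (scale score × number of plants with that score) divided by (total plants × highest scale score), times 100. The function name and the example scores are hypothetical.

```python
def incidence_and_index(scores, max_score=9):
    """Return (ID, DI) in percent for a list of per-plant severity scores.

    Assumes the 0-9 rating scale from the text, where 0 means no symptoms.
    """
    n = len(scores)
    symptomatic = sum(1 for s in scores if s > 0)
    incidence = 100 * symptomatic / n
    # Summing per-plant scores is equivalent to summing F * V over score classes.
    index = 100 * sum(scores) / (n * max_score)
    return incidence, index


# Ten hypothetical plants from one BC1F1 line in one replication.
example_scores = [0, 0, 1, 3, 5, 9, 0, 1, 3, 0]
example_id, example_di = incidence_and_index(example_scores)
```

Here six of ten plants show symptoms (ID = 60%), and the scores sum to 22 out of a possible 90 (DI of about 24.4%). Under these definitions DI can never exceed ID, because no plant can score above the scale maximum.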
Inclusive composite interval mapping was performed using IciMapping 4.1 software for QTL mapping [43-45]. Empirical LOD thresholds for declaring significant QTL were set at 3.3 for the ID trait and 3.0 for the DI trait, based on 1000 permutation runs and a type I error of 5% [46].
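The idea behind an empirical LOD threshold can be sketched as follows. This is not the IciMapping algorithm: for brevity it swaps in a simple single-marker LOD for a backcross (two genotype classes coded 0 and 1) and takes the 95th percentile of the genome-wide maximum LOD over phenotype permutations. All function names and the simulated data are hypothetical.

```python
import math
import random


def marker_lod(genotypes, phenotypes):
    """Single-marker LOD: (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    mean_all = sum(phenotypes) / n
    rss0 = sum((y - mean_all) ** 2 for y in phenotypes)
    rss1 = 0.0
    for g in (0, 1):
        group = [y for x, y in zip(genotypes, phenotypes) if x == g]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((y - m) ** 2 for y in group)
    if rss0 == 0:
        return 0.0
    if rss1 == 0:
        return math.inf
    return (n / 2) * math.log10(rss0 / rss1)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide LOD threshold from shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-trait association
        maxima.append(max(marker_lod(m, shuffled) for m in markers))
    maxima.sort()
    k = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[k]


# Simulated data: 100 individuals, 20 unlinked markers, a QTL at marker 0.
sim = random.Random(1)
sim_markers = [[sim.randint(0, 1) for _ in range(100)] for _ in range(20)]
sim_pheno = [2.0 * g + sim.gauss(0, 1) for g in sim_markers[0]]
sim_threshold = permutation_threshold(sim_markers, sim_pheno, n_perm=200)
```

A marker is declared significant when its observed LOD exceeds the permutation threshold; thresholds obtained this way depend on the trait, which is in line with the different values reported for ID and DI.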
A total of 203 samples, including the two parents, a single F1, and 200 BC1F1 individuals, were sequenced, yielding 1216.30 Gb of sequence with a mean read length of 90 bp. The Q20 ratio of each sample was >92.68%, indicating high-quality sequence. The mean content ratio of guanine and cytosine for all samples was 39.19%, a value higher than that found in Solanum lycopersicum (tomato, 33%) and Arabidopsis thaliana (thale cress, 36%), but lower than that in Oryza sativa (rice, 42%) and Zea mays (maize, 45%) [47]. After filtering, 1058.47 Gb of high-quality sequences with a mean sequence length of 86.82 bases (standard deviation 5.6 bases, range 82-90 bases) were available for sequence assembly. Among these high-quality data, 97,829,622 reads came from parents (52,062,344 and 45,763,278 reads from TT8 and NC82, respectively), and 5,969,792,515 reads came from the 200 BC1F1 progeny. The number of sequence reads per BC1F1 plant was 18,896,429-52,545,957 with a mean of 29,848,963 reads (Table S1).
Table 1 - Statistics of the single-nucleotide polymorphism-based tobacco genetic map.
Fig. 4 - Symptoms of NC82 and TT8 infected by CMV. (A) Phenotypic comparison of NC82 and TT8 plants at 15 dpi; (B) Symptoms of leaves from NC82 and TT8 at 15 dpi. dpi, days post-inoculation.
All high-quality reads obtained from parents and the progeny were aligned to the TN90 tobacco reference genome [3]. In total, 548,617 and 491,503 SNP loci were identified in TT8 and NC82, respectively. The mean number of SNP loci for progeny was 490,718.42 (range, 382,195-615,069). After the filtering out of low-quality markers, 13,974 high-quality markers remained. Of the 13,974 polymorphic SNPs, 13,273 were classified into different SNP types. More SNP markers were of transition than transversion type, with 2326 A/G and 4510 C/T SNPs accounting for 17.52% and 33.98% of all SNPs, respectively. The two transition types were not balanced, with an A/G to C/T ratio of 0.52, which differs from those for eggplant (49.60% A/G and 50.30% C/T) [24], jujube (49.60% A/G and 50.30% C/T) [48], and peanut (36.00% A/G and 37.00% C/T) [30]. Four transversion types (1258 G/T, 1317 A/C, 2771 C/G, and 1091 A/T; 8.22%-20.88%) were identified, and C/G transversions accounted for less than half of the total transversions (Fig. 1).
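The proportions of SNP types follow directly from the counts reported above, as the short sketch below shows. Only the six counts come from the text; the function name and return layout are illustrative.

```python
def summarize_snp_types(counts, transitions=("A/G", "C/T")):
    """Return total, per-type percentages, and transition/transversion counts."""
    total = sum(counts.values())
    percent = {snp: round(100 * n / total, 2) for snp, n in counts.items()}
    n_transitions = sum(n for snp, n in counts.items() if snp in transitions)
    n_transversions = total - n_transitions
    return total, percent, n_transitions, n_transversions


# Counts of mapped SNPs by substitution type, as reported in the text.
snp_counts = {"A/G": 2326, "C/T": 4510, "G/T": 1258,
              "A/C": 1317, "C/G": 2771, "A/T": 1091}
total, percent, n_ts, n_tv = summarize_snp_types(snp_counts)
ag_to_ct = snp_counts["A/G"] / snp_counts["C/T"]  # ratio of the two transition types
```

The six counts sum to 13,273, with 6836 transitions and 6437 transversions, so transitions are the slight majority and C/G is the most common transversion at roughly 43% of that class.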
Of the 13,974 polymorphic SNP markers, 13,273 (including 5694 SNPs mapped to unique genetic locations) were mapped to 24 high-density linkage groups (LGs) of tobacco (Table S2). The entire linkage map spanned 3421.80 cM, with a mean distance of 0.26 cM between adjacent markers (Table 1, Fig. 2). LG length ranged from 35.30 to 347.10 cM (mean, 142.59 cM). The most saturated LG was LG04, which had a mean inter-marker distance of 0.12 cM, whereas LG03 had the largest mean inter-marker distance of 0.72 cM. The largest LG was LG06, which contained 1143 markers with a mean distance of 0.30 cM. The smallest LG was LG10, which contained 123 markers with a mean distance of 0.63 cM.
Based on alignment with the genome sequences of the ancestral and putative ancestral species N. sylvestris (S genome), N. tomentosiformis (T genome), and N. otophora (O genome), the SNP markers were assigned to tobacco genomes [2] (Tables S3 and S4). In total, 53.83% (n = 7145) were assigned to the S genome, 41.79% (n = 5547) to the T genome, and 4.38% (n = 581) to the O genome. In total, 50 unique SNP markers were assigned to the S ancestral genome, with a mean identity score of 99.01% and no identity to the other ancestral genomes (Fig. 3). Nine LGs (4, 6, 9, 10, 11, 16, 19, 22, and 24) were clearly assigned to the S genome, whereas 10 LGs (2, 3, 5, 8, 12, 13, 14, 17, 18, and 21) were assigned to the T or O genome. In LGs 1, 7, 15, 20, and 23, 2017 markers were assigned to the S, 1730 markers to the T, and 224 markers to the O genome. In sum, the LGs represented the S, T, and O genomes in different parts of the linkage maps.
Table 2 - CMV resistance phenotypes of different generations.
Table 3 - Analyses of variance for ID and DI in a BC1F1 population.
The two parental lines displayed markedly different reactions to CMV across environments (Fig. 4). As shown in Table 2, the female parent TT8 displayed high resistance, with a mean ID of 63.64% and DI of 14.67% across the three environments, whereas the other parent, NC82, displayed the opposite phenotype, with a mean ID of 100.00% and DI of 35.55%. Transgressive segregation for CMV resistance was found in the BC1F1 population, in which the mean values of DI and ID over the three environments were 23.74% and 82.79%, respectively (Table 2).
ANOVA showed significant differences among genotypes for DI and ID. Significant effects were also observed for environments and for genotype × environment interactions (Table 3).
Seven QTL, including two for ID and five for DI, were identified in three rounds using the BC1F1 population (Table 4, Fig. 5). qID5 for ID was mapped to the interval mk6533-mk646 on LG 5 in the second and third tests, explaining 7.70% and 7.33% of the total phenotypic variance. For DI, qDI8 on LG 8 was mapped to similar genomic regions in different environments and explained 7.99%, 10.27%, 7.20%, and 7.20% of total phenotypic variance, respectively, suggesting stable genetic effects in different environments. QTL qDI5 was mapped to the same interval as qID5 for ID observed in environment 2.
A high-density genetic map provides a foundation for QTL mapping and molecular marker-assisted selection. Although >5000 SSR markers were published by Bindler et al. [9] and a high-density SSR genetic map containing 2363 SSR markers covering 24 LGs with a total length of 3270 cM has been constructed for tobacco, the number of markers is insufficient for the needs of genetic research and tobacco breeding. As a member of the Solanaceae family, tobacco is an allopolyploid species (2n = 48) with a large genome (4.5 Gb) [3]. Its evolution has included a genetic bottleneck, which resulted in the very low diversity of tobacco accessions today. In addition, the construction of a genetic map based on an intra-type cross (e.g., flue-cured tobacco) is more suitable for practical breeding. However, very low polymorphism is present among tobacco cultivars of the same type. For example, Tong et al. [10] screened 10,005 SSRs (including 4886 developed by the authors [10] and 5119 reported by Bindler et al. [9]) and found only 590 polymorphic markers between two flue-cured tobacco cultivars. Thus, the construction of high-density genetic and QTL maps between cultivars of the same type using conventional molecular markers, such as random amplified polymorphic DNA, AFLPs, and SSRs, is time-consuming and inefficient [49-53]. RAD-seq is a cost-efficient and high-throughput alternative for construction of high-density genetic maps for tobacco, which can produce numerous molecular markers and be used to genotype hundreds of individuals derived from mapping populations at a relatively low cost and in a short time. In addition, markers identified by RAD methods are sequence-based [23,26], a characteristic that is helpful for comparative genomic studies and candidate gene research for traits of interest [18].
In this study, 99,647,735 SNPs were identified between samples and the reference TN90 tobacco genome by RAD-seq. Although the degree of polymorphism in tobacco is very low, 13,974 integral and accurate SNPs were identified for constructing a genetic map. A high-density map consisting of 13,273 SNP markers distributed across 24 LGs spanning 3421.80 cM was developed. The linkage map developed in this study represents considerable improvement in the number and density of markers compared to prior mapping efforts in tobacco (Table 5).
Table 4 - QTL affecting resistance at the seedling stage to CMV in the BC1F1 population derived from TT8 × NC82.
Fig. 5 - Locations of QTL affecting resistance to CMV identified in the BC1F1 population.
We also examined the origin and evolution of the N. tabacum genome by mapping SNP markers to the sequenced ancestral genomes of N. sylvestris (2n = 24, maternal donor), N. tomentosiformis (2n = 24, paternal donor), and N. otophora (2n = 24, a potential genome donor). The results indicate that the S and T genomes of N. tabacum differ significantly and have undergone rearrangements since they diverged from their common ancestor. Markers derived from the O genome showed a tendency to be located in the same regions as N. tomentosiformis markers. The results are consistent with the hypothesis that the predominant parental donor was N. tomentosiformis rather than N. otophora. Interestingly, 80 markers showed strong identity with the S genome, with a mean identity score of 98.42%, and could not be matched to the other genomes. It is possible that some sequences in regions of the S genomes were missed owing to polyploidization [54,55]. Sierro et al. [2] reported a 4%-8% reduction in the size of the allotetraploid N. tabacum genome compared with its ancestral species. However, it is curious that all SNP markers found in only a single genome were from the S genome.
CMV causes one of the most important diseases affecting tobacco production and infects >1200 other species [56,57]. Several dominant resistance genes for CMV have been reported, but they provided resistance to only a few CMV isolates [58,59]. Resistance to CMV is a quantitative trait controlled by multiple genes in most plants [60-62]. Based on the high-density genetic map, seven QTL were identified in three testing environments. Our most interesting result is the identification of the novel qDI8 QTL and its mapping to similar regions in LG8 in different environments (Table 4) in the BC1F1 population. Previous studies showed that resistance to CMV derived from TT8 was inherited from Holmes, and the genotype of Holmes was thought to be NNrm1rm1rm2rm2t1t1t2t2 [37,38]. Thus, the novel qDI8 QTL may correspond to one or several of the four recessive loci (rm1, rm2, t1, and t2). Our results will serve as a basis for further analysis of the role of qDI8 in CMV resistance.
Table 5 - Comparison of published tobacco linkage maps.
In summary, the genetic map developed in this study provides a tool for detecting QTL and applying molecular breeding to tobacco improvement.
A large number of tobacco SNP markers were identified using restriction site-associated DNA sequencing, and a high-density genetic map containing 13,273 markers was constructed. Compared to previous genetic linkage maps of tobacco, the map represents a considerable improvement in the number and density of markers. Seven QTL for resistance to CMV in tobacco were mapped to linkage groups 5 and 8. These results will facilitate marker-assisted selection for resistance against CMV in tobacco.
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2018.11.010.
The work was supported by the Agricultural Science and Technology Innovation Program (ASTIP-TRIC01).