ZHANG Linsu, HAN Zhongyao, WANG Chuanming, DENG Xiankuo
Abstract: In this study, fresh whole plants of Botrychium ternatum were used for whole-transcriptome sequencing on the Illumina HiSeq 2500 platform. The clean reads were assembled into Unigenes, which were then analyzed bioinformatically against the non-redundant protein database (NR), the nucleotide sequence database (NT), Gene Ontology (GO), Clusters of Eukaryotic Orthologous Groups (COG), the Kyoto Encyclopedia of Genes and Genomes (KEGG), the protein sequence database SwissProt, and InterPro. A total of 6.67 Gb of clean data were obtained, and assembly yielded 58 646 Unigenes with an average length of 1 023 bp; the overall annotation rate of the Unigenes across these databases was 69.25%. In GO, 20 762 genes were annotated to 52 functional groups under the three ontologies of biological process, cellular component and molecular function. COG annotated 20 633 genes and divided them into 25 functional clusters. In KEGG, 29 377 genes were aligned and annotated, falling into pathways of 5 categories and 19 subcategories, from which 41 gene families related to eight classes of plant hormone signal transduction were screened. Alignment yielded 43 102 coding sequences (CDS) with an average length of 749 bp and an N50 of 1 137 bp. A total of 1 502 transcription factor (TF) genes belonging to 60 TF families were identified. In total, 17 195 single-nucleotide polymorphism (SNP) sites were found, comprising 11 122 transitions and 6 073 transversions, together with 8 245 simple sequence repeats (SSRs), of which di- and tri-nucleotide repeats were the most abundant. These results provide functional and structural information on the whole transcriptome of B. ternatum and on potential genes involved in plant hormone signal transduction, and supply basic molecular data for further study of its growth and development, genetics and variety identification.
Key words: Botrychium ternatum, transcriptome, plant hormone, signal transduction, gene screening
CLC number: Q943  Document code: A
Article ID: 1000-3142(2020)04-0536-10
Abstract: Botrychium ternatum is a commonly used folk medicinal plant whose growth and development show characteristics typical of some ferns. However, research on it has focused mainly on its chemical constituents, clinical and pharmacological effects, and taxonomy and distribution, with little attention to its molecular biology. Plant hormones are small signaling molecules with very important functions in plant growth and development, and plant hormone signal transduction plays a key role in maintaining hormonal balance. To obtain related information, transcriptome sequencing was performed on the Illumina HiSeq 2500 platform, followed by bioinformatic analyses. The results showed that 6.67 Gb of clean data were obtained and 58 646 Unigenes were assembled, with an average length of 1 023 bp. Unigenes were annotated against the non-redundant protein database (NR), nucleotide sequence database (NT), Gene Ontology (GO), Clusters of Eukaryotic Orthologous Groups (COG), Kyoto Encyclopedia of Genes and Genomes (KEGG), SwissProt and InterPro databases, with an overall annotation rate of 69.25%. Through GO annotation, 20 762 genes were assigned to three ontologies and 52 functional groups. A total of 20 633 genes were divided into 25 functional clusters by COG annotation. Through KEGG analysis, 29 377 genes were mapped to pathways in five categories and 19 subcategories. In addition, 41 gene families related to eight plant hormone signal transduction pathways were screened. Through BLAST and ESTScan, 43 102 coding sequences (CDS) were found, with an average length of 749 bp and an N50 of 1 137 bp. A total of 1 502 transcription factor genes in 60 families were screened out, including C3H, MYB, MYB-related, bHLH, AP2-EREBP, WRKY and GRAS. In total, 17 195 single-nucleotide polymorphisms (SNPs) were found, including 11 122 transitions and 6 073 transversions. In addition, 8 245 simple sequence repeats (SSRs) were found, among which di-nucleotide and tri-nucleotide repeats were the two most abundant types.
These data sets provide functional and structural information on the global transcriptome and on putative genes involved in plant hormone signal transduction, as well as basic data for further research on the growth, development and variety identification of B. ternatum.
Key words: Botrychium ternatum, transcriptome, plant hormone, signal transduction, gene screening
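Plant hormones are a class of small signaling molecules that play important roles in plant growth and development. They act through plant hormone signal transduction systems: internal or external cues induce the expression of a series of hormone-related genes, whose products act on the corresponding hormone receptors or signaling components and ultimately give rise to different traits (Su et al., 2008). Common plant hormones include auxin, cytokinins, gibberellins, abscisic acid, ethylene, brassinosteroids, jasmonic acid and salicylic acid. In these signal transduction systems, some receptors or key components interact or engage in crosstalk, producing synergistic or antagonistic effects that link the signaling pathways into networks (Ohri et al., 2015). For example, light signaling can regulate root development through crosstalk with the auxin signaling pathway (Kumari & Panigrahi, 2019), and phytochrome-interacting factors (PIFs) respond to gibberellin, brassinosteroid, jasmonic acid, auxin (indole-3-acetic acid, IAA), abscisic acid and ethylene signaling, acting as a hub that connects hormone signaling pathways into a complex network (Ren et al., 2016). Plant hormones can also promote flowering through epigenetic regulation: gibberellins, jasmonic acid, abscisic acid and auxin play important roles in chromatin compaction mediated by DNA methylation and post-translational histone modifications, and thereby affect flowering (Campos-Rivero et al., 2017). In addition, plants have evolved complex hormone signaling networks to protect themselves against soil-borne pathogens (Berens et al., 2017). Plant hormone signal transduction systems are therefore important for plant growth, development, defense and environmental adaptation.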
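Botrychium ternatum, also known as Yiduoyun, Xiaochunhua, Shebujian, Langqi Xixin, Dujiaohao and Dongcao, belongs to the genus Botrychium of the family Botrychiaceae. It is an annual herbaceous medicinal plant that reproduces mainly by spores, and its growth and development are to some extent representative of ferns. It is a Chinese herbal medicine commonly used in folk practice in China, especially in Guizhou and Fujian. Because it is credited with clearing heat and detoxifying, relieving cough and stopping bleeding, it is mainly used for high fever with convulsions in children, cough due to lung heat, hemoptysis, pertussis, venomous snake bites, red and inflamed eyes, and corneal opacity (Qi, 2012; Zhao et al., 2008; Ruan, 2002). At present, research on B. ternatum is limited and concentrated mainly on its chemical constituents, clinical and pharmacological effects, and taxonomy and distribution; the scarcity of molecular biological information has restricted more in-depth study. The transcriptome is the complete set of transcripts in a cell under a given physiological condition, including messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA) and non-coding RNA (ncRNA). With the development and spread of sequencing technology, transcriptome sequencing (RNA-seq) has become an important method for studying genes and their regulation at the molecular level. In this study, the whole transcriptome of B. ternatum was obtained by high-throughput sequencing and analyzed with bioinformatic methods to obtain overall annotation information, to screen potential genes related to plant hormone signal transduction, and to identify single-nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs). These data provide a useful resource for further molecular-level studies of the growth, development and variety identification of B. ternatum.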
1 Materials and methods
1.1 Materials
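Three fresh, mature whole plants of B. ternatum (including roots, stems, leaves and spores) were collected in July 2016 from the Doupeng Mountain area on the outskirts of Duyun City, Qiannan Prefecture, Guizhou Province (107°20′–107°27′ E, 26°12′–26°16′ N; altitude about 1 500 m), and were identified as Botrychium ternatum by Associate Professor WANG Chuanming of Qiannan Medical College. After collection, the samples were immediately rinsed with clean water, blotted dry with absorbent paper, and transported back in a dry-ice box for RNA extraction.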
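1.2 cDNA library construction and sequencing
The whole plants were ground to powder in liquid nitrogen, and total RNA was extracted with an RNA extraction kit (Aidlab, Beijing) with DNA digestion. mRNA was enriched with oligo(dT) magnetic beads and checked by agarose gel electrophoresis and a NanoDrop micro-volume spectrophotometer. Qualified samples were then processed with kits to synthesize cDNA, purify it, repair the ends, add an "A" base to the 3′ ends and ligate adapters; fragments were size-selected, and the cDNA library was finally constructed by PCR amplification. After passing quality control, the library was sequenced on the Illumina HiSeq 2500 platform.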
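1.3 De novo assembly
Raw reads were filtered to remove low-quality reads, adapter-contaminated reads and reads with a high proportion of unknown bases (N), yielding clean reads. The clean reads were assembled de novo with Trinity (v2.0.6) (Grabherr et al., 2011), and the assembled transcripts were clustered to remove redundancy with TGICL (v2.0.6) (Pertea et al., 2003), yielding Unigenes for subsequent analyses.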
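The three read filters named above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the article does not report its cut-offs, so every threshold here (5% N, Phred 15, 20% low-quality bases) and the adapter prefix `AGATCGGAAGAGC` are assumptions, and adapter detection is reduced to a crude exact substring match.

```python
from dataclasses import dataclass

# All thresholds below are assumptions for illustration; the article does
# not report the cut-offs that were actually used.
MAX_N_FRACTION = 0.05         # drop reads with more than 5% unknown bases (N)
LOW_QUAL_PHRED = 15           # a base below Phred 15 counts as low quality
MAX_LOW_QUAL_FRACTION = 0.2   # drop reads with more than 20% low-quality bases
PHRED_OFFSET = 33             # Phred+33 quality encoding
ADAPTER = "AGATCGGAAGAGC"     # assumed Illumina adapter prefix


@dataclass
class Read:
    name: str
    seq: str
    qual: str


def is_clean(read: Read, adapter: str = ADAPTER, min_overlap: int = 10) -> bool:
    """Apply the three filters (N content, base quality, adapter) to one read."""
    if not read.seq:
        return False
    # 1. Too many unknown bases.
    if read.seq.upper().count("N") / len(read.seq) > MAX_N_FRACTION:
        return False
    # 2. Too many low-quality bases.
    low = sum(1 for c in read.qual if ord(c) - PHRED_OFFSET < LOW_QUAL_PHRED)
    if low / len(read.qual) > MAX_LOW_QUAL_FRACTION:
        return False
    # 3. Adapter contamination: exact match of the adapter's first bases.
    if adapter[:min_overlap] in read.seq.upper():
        return False
    return True


def clean_reads(reads):
    """Keep only the reads that pass all filters."""
    return [r for r in reads if is_clean(r)]
```

Production trimmers also allow mismatches and partial adapter overlaps at the read end; the exact-match check only keeps the sketch short.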
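1.4 Functional annotation and analysis of Unigenes
To characterize Unigene function, the Unigenes were annotated against seven functional databases: BLAST (v2.2.23) was used for NT, NR, COG, KEGG and SwissProt annotation; Blast2GO (v2.5.0) (Conesa et al., 2005) was used together with the NR annotation results for GO annotation; and InterProScan5 (v5.11-51.0) (Quevillon et al., 2005) was used for InterPro annotation. The annotated genes were then classified according to KEGG pathway map04075 (plant hormone signal transduction) to obtain genes related to plant hormone signal transduction.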
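The classification against map04075 amounts to binning each annotated Unigene by its KEGG orthology (KO) identifier. The sketch below assumes one KO per Unigene and uses a small illustrative subset of KO-to-family assignments (auxin and abscisic acid only); these identifiers should be checked against the current KEGG map before reuse, and the full map covers eight hormone pathways.

```python
from collections import defaultdict

# Illustrative subset of KO entries assumed to belong to map04075
# (plant hormone signal transduction); verify against KEGG before reuse.
HORMONE_KO = {
    "K14484": ("auxin", "AUX/IAA"),
    "K14485": ("auxin", "TIR1"),
    "K14486": ("auxin", "ARF"),
    "K14496": ("abscisic acid", "PYR/PYL"),
    "K14497": ("abscisic acid", "PP2C"),
    "K14498": ("abscisic acid", "SnRK2"),
}


def group_hormone_genes(unigene_to_ko):
    """Group annotated Unigenes into (hormone pathway, gene family) bins.

    unigene_to_ko maps a Unigene ID to a single KO ID (a simplification;
    real annotations can assign several KOs or none).
    """
    groups = defaultdict(list)
    for unigene, ko in unigene_to_ko.items():
        if ko in HORMONE_KO:
            groups[HORMONE_KO[ko]].append(unigene)
    return dict(groups)
```

Counting the distinct keys of the returned dictionary per pathway gives the number of gene families found, which is the kind of tally reported in Table 3.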
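1.5 Transcriptome structure analysis
1.5.1 Prediction of coding sequences (CDS) Based on the functional annotation results, the best-aligned segment of each Unigene was taken as its CDS, with the databases consulted in the priority order NR, SwissProt, KEGG and COG. For Unigenes without any annotation, the CDSs obtained from the annotated Unigenes were used to build a model, and ESTScan (v3.0.2) (Iseli et al., 1999) was then used to predict their CDSs.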
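The database priority rule above can be sketched as a first-match lookup. In this hypothetical helper, `hits` maps a database name to the coordinates of the best aligned segment on a Unigene; Unigenes with no hit are only flagged for ESTScan, since the ESTScan prediction itself is not reproduced here.

```python
# Priority order stated in the Methods: NR > SwissProt > KEGG > COG.
PRIORITY = ("NR", "SwissProt", "KEGG", "COG")


def choose_cds_source(hits):
    """Return (source, segment) for one Unigene.

    hits: hypothetical mapping of database name -> (start, end) of the best
    aligned segment. Falls back to ("ESTScan", None) when no database hit
    exists; the ESTScan prediction step is not implemented in this sketch.
    """
    for db in PRIORITY:
        if db in hits:
            return db, hits[db]
    return "ESTScan", None
```

Taking the first database in a fixed order, rather than the best-scoring hit across all databases, keeps the choice deterministic when several databases report overlapping segments.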
1.5.2 Prediction of transcription factor (TF) coding ability First, open reading frames (ORFs) of the Unigenes were detected with getorf (EMBOSS 6.5.7.0) (Rice et al., 2000); then, the ORFs were aligned to TF protein domains (from PlantTFDB) with hmmsearch (v3.0) (Mistry et al., 2013); finally, the TF coding ability of each Unigene was determined according to the TF family features described in PlantTFDB (Jin et al., 2017).
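The ORF detection step can be illustrated with a simple six-frame scan. This is a simplified stand-in for getorf, not its actual algorithm: it reports only ATG-to-stop ORFs, uses an assumed minimum length of 300 nt, and gives minus-strand coordinates on the reverse-complemented sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_nt=300):
    """Return (strand, frame, start, end) for ATG...stop ORFs of >= min_nt.

    Coordinates are 0-based, end-exclusive, and refer to the strand that was
    scanned (the reverse complement for '-'). min_nt is an assumed cut-off.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens an ORF
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_nt:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # stop codon closes it
    return orfs
```

The protein translations of these ORFs would then be the input to an HMM search against TF domain profiles; that step is not sketched here.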
1.5.3 SSR and SNP detection First, SSRs in the Unigenes were detected with MISA (v1.0) (Thiel et al., 2003); then, the clean reads were aligned to the Unigenes with HISAT (v0.1.6-beta) (Kim et al., 2015); finally, SNPs were called with GATK (v3.4-0) (McKenna et al., 2010).
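The SSR search can be sketched with regular-expression backreferences. The minimum repeat numbers below are assumed to mirror commonly used MISA settings (mono 10, di 6, tri to hexa 5); the article does not state its parameters. Only perfect SSRs are found, whereas MISA also reports compound SSRs.

```python
import re

# Minimum repeat numbers per motif length; assumed, not taken from the article.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_primitive(motif):
    """False for motifs that are themselves repeats, e.g. 'ATAT' or 'CC'."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return sorted (start, motif, repeats) for perfect SSRs.

    Shorter motifs are scanned first; a longer-motif match overlapping an
    already reported region is skipped.
    """
    seq = seq.upper()
    claimed = [False] * len(seq)
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not _is_primitive(motif) or any(claimed[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                claimed[i] = True
            hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)
```

Tallying the returned motifs by length gives a breakdown of the same form as the SSR counts in Section 2.6.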
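2 Results and analysis
2.1 Sequencing and assembly
Sequencing on the Illumina HiSeq platform produced 55.52 million raw reads in total, and filtering yielded 44.45 million clean reads, a clean-read ratio of 80.06%; with more than 15 million reads, this counts as deep sequencing. The clean reads contained 6.67 Gb of bases in total. Assembly yielded 58 646 Unigenes with an average length of 1 023 bp, and both N50 and N70 exceeded 1 000 bp (Table 1). All Unigenes were longer than 300 bp; the 300–400 bp class was the largest (25.5%), and Unigenes longer than 1 000 bp accounted for 39% cumulatively (Fig. 1), indicating good sequencing continuity and assembly quality.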
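The N50 and N70 statistics in Table 1 follow the usual Nx definition: sort Unigene lengths in descending order and report the length at which the running sum first reaches x% of all assembled bases. The sketch below implements that definition and also computes the clean-read ratio from the reported totals of 55.52 and 44.45 (million reads), which evaluates to about 80.06%. The function names are illustrative only.

```python
def nx(lengths, x):
    """Nx: length L such that sequences of length >= L hold >= x% of bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= x * total:
            return length
    raise ValueError("empty length list")


def clean_ratio(raw_reads, clean_reads):
    """Percentage of raw reads retained after filtering."""
    return clean_reads / raw_reads * 100
```

Usage: `nx(unigene_lengths, 50)` gives N50 and `nx(unigene_lengths, 70)` gives N70 for a list of Unigene lengths.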
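2.2 Functional annotation of Unigenes
The Unigenes were annotated against the seven functional databases (NR, NT, GO, COG, KEGG, SwissProt and InterPro); the results are listed in Table 2. NR (the NCBI non-redundant protein database) gave the most annotations (65.4%), and the overall annotation rate was 69.25%. The species distribution of the NR hits is shown in Fig. 2: 24% of the hits were to the moss Physcomitrella patens and the lycophyte Selaginella moellendorffii, both spore-bearing non-seed plants, which is consistent with the pteridophyte status of B. ternatum. The commonly used reference species Sitka spruce (Picea sitchensis) also received a relatively high share (14.21%), possibly because Sitka spruce itself is well annotated (Ralph et al., 2008). The annotation results for NR, COG, KEGG, SwissProt and InterPro are shown in Fig. 3; 12 522 Unigenes, 21.4% of all Unigenes, were annotated in all five databases.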
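The overall annotation rate and the "annotated in all five databases" figure correspond to the union and the intersection of the per-database sets of annotated Unigenes. A minimal sketch, assuming each database's results are available as a set of Unigene IDs (the toy data are hypothetical):

```python
def annotation_summary(total, annotated_by_db):
    """annotated_by_db maps database name -> set of annotated Unigene IDs."""
    sets = list(annotated_by_db.values())
    union = set().union(*sets)                        # annotated in any database
    shared = set.intersection(*sets) if sets else set()  # annotated in all
    return {
        "overall_rate": 100 * len(union) / total,
        "shared_by_all": len(shared),
        "shared_rate": 100 * len(shared) / total,
    }
```

With the reported counts, 12 522 of 58 646 Unigenes gives the 21.4% intersection share quoted above.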
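2.3 GO annotation
GO annotation assigned 20 762 B. ternatum genes or gene products to the three ontologies of molecular function, cellular component and biological process; the distribution of GO functions is shown in Fig. 4. In biological process, the three categories with the most genes were metabolic process, cellular process and single-organism process. In cellular component, cell was the largest category and nucleoid the smallest. In molecular function, catalytic activity and binding contained the most genes, followed by transporter activity.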
2.4 COG functional annotation
Alignment against the COG database annotated 20 633 B. ternatum Unigenes (Fig. 5). The largest cluster was general function prediction only (4 559 genes). Eight clusters contained 1 000–2 000 genes each, covering important cellular activities such as translation, ribosomal structure and biogenesis, and transcription. Notably, 995 genes were assigned to function unknown.
2.5 KEGG pathway analysis and screening of plant hormone signal transduction genes
A total of 29 377 genes were mapped to pathways in six categories and 21 subcategories (Fig. 6). Metabolism was the largest category, with 17 698 genes (60%); the smallest was human diseases, with 141 genes, as expected for a plant. A further 1 266 genes were related to environmental adaptation under organismal systems. The annotated genes were classified according to KEGG pathway map04075 to obtain candidate genes related to plant hormone signal transduction (Table 3).
2.6 Transcriptome structure
CDS: BLAST yielded 38 212 CDSs and ESTScan yielded 4 890, for a total of 43 102 CDSs with an average length of 749 bp and an N50 of 1 137 bp.
TF: A total of 1 502 TF genes in 60 TF families were identified. Families with more than 100 members were C3H, MYB, MYB-related and bHLH; other relatively large families included AP2-EREBP, WRKY and GRAS.
SNP: A total of 17 195 SNP sites were found, comprising 11 122 transitions (5 452 A-G and 5 670 C-T) and 6 073 transversions (1 444 A-C, 1 729 A-T, 1 418 C-G and 1 482 G-T).
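The transition/transversion split follows directly from the base pair involved: A-G and C-T are transitions, and the other four pairs are transversions. A minimal classifier, together with the transition/transversion ratio implied by the counts above (about 1.83), could look like this; the helper names are illustrative.

```python
from collections import Counter

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def classify_snp(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    pair = frozenset((ref.upper(), alt.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return "transition" if pair in TRANSITIONS else "transversion"


def count_classes(snps):
    """snps: iterable of (ref, alt) pairs; returns a Counter of classes."""
    return Counter(classify_snp(r, a) for r, a in snps)


def ts_tv_ratio(transitions, transversions):
    return transitions / transversions
```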
SSR: Di-nucleotide repeats were the most abundant (3 666), followed by tri-nucleotide repeats (3 439), then mono-nucleotide (563), hexa-nucleotide (260), tetra-nucleotide (169) and penta-nucleotide (148) repeats.
3 Discussion
GRABHERR MG, HAAS BJ, YASSOUR M, et al., 2011. Trinity: Reconstructing a full-length transcriptome without a genome from RNA-Seq Data [J]. Nat Biotechnol, 29(7): 644-652.
ISELI C, JONGENEEL CV, BUCHER P, 1999. ESTScan: A program for detecting, evaluating, and reconstructing potential coding regions in EST sequences [J]. Proc Int Conf Intell Syst Mol Biol, 99: 138-148.
JIN JP, TIAN F, YANG DC, et al., 2017. PlantTFDB 4.0: Toward a central hub for transcription factors and regulatory interactions in plants [J]. Nucl Acids Res, 45(D1): D1040-D1045.
KIM D, LANGMEAD B, SALZBERG SL, 2015. HISAT: A fast spliced aligner with low memory requirements [J]. Nat Methods, 12(4): 357-360.
KUMARI S, PANIGRAHI KCS, 2019. Light and auxin signaling cross-talk programme root development in plants [J]. J Biosci, 44(1): 26.
MCKENNA A, HANNA M, BANKS E, et al., 2010. The genome analysis toolkit: A map reduce framework for analyzing next generation DNA sequencing data [J]. Genome Res, 20(9): 1297-1303.
MEENA KK, SORTY AM, BITLA UM, et al., 2017. Abiotic stress responses and microbe-mediated mitigation in plants: The omics strategies [J]. Front Plant Sci, 8: 172.
MISTRY J, FINN RD, EDDY SR, et al., 2013. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions [J]. Nucl Acids Res, 41(12): e121.
MYBURG AA, HUSSEY SG, WANG JP, 2019. Systems and synthetic biology of forest trees: A bioengineering paradigm for woody biomass feedstocks [J]. Front Plant Sci, 10: 775.
OHRI P, BHARDWAJ R, BALI S, et al., 2015. The common molecular players in plant hormone crosstalk and signaling [J]. Curr Protein Pept Sci, 16(5): 369-388.
PERTEA G, HUANG X, LIANG F, et al., 2003. TIGR gene indices clustering tools (TGICL): A software system for fast clustering of large EST datasets [J]. Bioinformatics, 19(5): 651-652.
QI JH, 2012. A summary of recent studies on Botrychium Sw [J]. J Xian Univ Arts Sci (Nat Sci Ed), 15(2): 48-50.
QUEVILLON E, SILVENTOINEN V, PILLAI S, et al., 2005. InterProScan: Protein domains identifier [J]. Nucl Acids Res, 33: 116-120.
RALPH SG, CHUN HJ, KOLOSOVA N, et al., 2008. A conifer genomics resource of 200 000 spruce (Picea spp.) ESTs and 6 464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis) [J]. BMC Genomics, 9: 484.
REN XY, WU MQ, CHEN JM, et al., 2016. The molecular mechanisms of phytochrome interacting factors (PIFs) in phytohormone signaling transduction [J]. J Plant Physiol, 52(10): 1466-1473.
RICE P, LONGDEN I, BLEASBY A, 2000. EMBOSS: The European molecular biology open software suite [J]. Trends Genet, 16(6): 276-277.
RUAN JS, 2002. Research progress of Sceptridium ternatum and its effective ingredients [J]. J Chin Pharm Univ, 33: 328-329.
SU Q, AN D, WANG K, 2008. Phytohormone receptors and induced genes in plants [J]. Plant Physiol Commun, 44(6): 1202-1208.
THIEL T, MICHALEK W, VARSHNEY RK, et al., 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.) [J]. Theor Appl Genet, 106(3): 411-422.
YANG M, YOU W, WU S, et al., 2017. Global transcriptome analysis of Huperzia serrata and identification of critical genes involved in the biosynthesis of huperzine A [J]. BMC Genomics, 18: 245.
ZHANG KM, SHEN Y, LIU Y, et al., 2016. Research progress on development and physio-ecology of fern gametophytes [J]. Guihaia, 36(4): 419-424.
ZHAO JH, ZHAO NW, WANG PS, et al., 2008. Study on the species and distribution of Adiantum and Botrychium medicinal plants from Tujia medicine of Guizhou Province origin [J]. J Med Pharm Chin Minor, 5: 44-46.
(Editor in charge: HE Yongyan)