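Numerical features of microRNA: a review
Zhu Rongsheng1, Liu Jinglu2, Chen Qingshan3
(1. School of Sciences, Northeast Agricultural University, Harbin 150030, China; 2. School of Life Sciences, Northeast Agricultural University, Harbin 150030, China; 3. School of Agriculture, Northeast Agricultural University, Harbin 150030, China)
Abstract: This paper reviews the development and application of microRNA (miRNA) numerical features in computational identification and summarizes the role of these features in predicting miRNA genes and their target genes. Numerical features of miRNAs are tools for the computational identification of new miRNA genes in a species. Since the first miRNA gene was discovered, research on numerical features has moved from the original sequence and secondary structure features to the thermodynamic, entropy and topological features commonly used today, all of which have been widely applied to identifying new miRNA genes and recognizing target genes. With the widespread use of next-generation sequencing, miRNA numerical features play an important role in studies of miRNA biogenesis, the identification of genes and target genes, and the discovery of biosynthetic pathways. The paper proposes applying numerical features to research areas such as the functional classification of target genes and biological evolution, providing a reference for the use of miRNA numerical features in other research fields.
Keywords: microRNA; numerical features; computational identification; function; evolution
Published online: 2015-1-27 16:00:26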
[URL]http://www.cnki.net/kcms/detail/23.1391.S.20150127.1600.007.html
Zhu Rongsheng, Liu Jinglu, Chen Qingshan. Numerical features of microRNA: a review[J]. Journal of Northeast Agricultural University, 2015, 46(2): 100-108.
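MicroRNAs (miRNAs) are endogenous, non-coding small RNAs that are highly conserved across species. The mature form is about 21-24 nt long, and miRNAs are important post-transcriptional regulators. They bind to target genes through specific complementary pairing and regulate them by repression or cleavage [1]. Bentwich et al. showed that miRNAs play key regulatory roles in biological processes such as embryonic development, plant growth, disease resistance and carcinogenesis [15]. Vazquez et al. reported that 35% of the genes in the human genome are regulated by miRNAs [6-7]. The first miRNA gene, lin-4, was discovered in the nematode in 1993 [8]. In 2000 the Ruvkun laboratory showed that the 21-nucleotide let-7 sequence is perfectly conserved among organisms descended from the bilaterian ancestor of 400 million years ago [9]. This marked the birth of modern miRNA biology: sequencing projects for miRNAs in many animal and plant species were launched, and work appeared on miRNA discovery, experimental validation, functional studies and computational identification [10, 13]. As research on miRNA biogenesis and regulatory mechanisms has deepened, and with the development of high-throughput sequencing, miRNAs from more new species have been sequenced and identified, and the origin and evolution of miRNAs have become research hotspots. Computational biology has given researchers a new perspective, and numerical features of miRNAs are widely used in the computational identification of miRNAs.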
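Early miRNA discovery relied mainly on two approaches: direct cloning and sequencing, and computational prediction. Direct cloning and sequencing of cDNA is straightforward and accurate, but it has limitations: miRNAs expressed only at particular stages or in specific tissues are hard to clone, and the method requires high expression abundance [14-15]. In 2003, computational methods for identifying miRNA genes were first applied to nematode and Drosophila miRNA genes [16-17], and reports of computational miRNA prediction in plants, humans and viruses followed.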
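The advantage of computational prediction of miRNA genes is that it is unaffected by the timing, tissue specificity and level of miRNA expression, which compensates for the shortcomings of cDNA cloning and sequencing [18]. A survey of computational methods for predicting miRNA genes (see Table 1) shows that, whether they are early methods based on homologous fragment searching, scoring methods based on sequence and structural features, methods based on comparative genomics, methods based on target binding, or methods based on machine learning, they all use numerical features of the hairpin precursor (pre-miRNA) to identify new miRNA genes. Only once next-generation sequencing (NGS) had matured were numerical features from miRNA biogenesis and biosynthesis brought into the prediction of new miRNA genes. Along with these technical advances, miRNA numerical features have been continually extended, from simple sequence and structural features to energy and entropy features, topological features and expression profile features. This further illustrates the importance and breadth of miRNA numerical features in computational identification. This paper reviews the application of numerical features in the computational identification of miRNA genes and their target genes.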
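As the most fundamental descriptors of miRNA sequence and structure, numerical features are widely used in the computational identification of miRNA genes and form the basis of feature-based prediction. Fig. 1 shows only some of the secondary structure features of miRNAs [19]. Based on existing studies, the numerical features used in the computational identification of miRNAs can be roughly divided into the following six classes according to their properties:
Class A features are sequence features, such as single-base content (A%, U%, G%, C%), dinucleotide content (e.g. AA%), trinucleotide content (e.g. AAA%), and combined base contents and base ratios such as (G+C)%, G/C and (G+C)/(A+U).
Class B features are secondary structure features. These are directly counted features, including length, helices, stacks (base pairs, including G-U pairs), bulge loops, interior loops, hairpin loops, external loops, and secondary structure triplet features (see Fig. 1: A+++, U+.+, G++.) [20].
Class C features are thermodynamic and entropy features. It has long been shown that the thermodynamic stability of miRNAs differs from that of other RNAs (mRNA, tRNA, rRNA), and this is widely used in computational miRNA identification to assess the stability of secondary structures. Examples are the minimum free energy (MFE), the minimal folding free energy index (MFEI) [21], the adjusted minimum free energy (AMFE) and information entropy.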
Fig. 1 MicroRNA secondary structure and some of its numerical features [19]
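Class D features are topological features, which are new numerical features defined by using statistical methods to express existing numerical features as sets [22].
Class E features describe the matching state between source genes and target genes that arises in biological processes such as biogenesis and target site binding.
Class F features are expression profile features, that is, the negatively correlated expression of miRNAs and mRNAs observed in expression profile data.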
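The class A sequence features listed above are simple counts, so they can be sketched in a few lines. The following is a minimal illustration, not the implementation used by any of the tools reviewed here; the function names `kmer_content` and `ratio_features` are hypothetical, and k-mers are assumed to be counted over overlapping windows.

```python
from collections import Counter
from itertools import product


def kmer_content(seq, k):
    """Percentage of each k-mer in an RNA sequence (overlapping windows).

    k=1 gives A%, C%, G%, U%; k=2 gives dinucleotide content such as AA%;
    k=3 gives trinucleotide content such as AAA%.
    """
    seq = seq.upper().replace("T", "U")
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    kmers = ("".join(p) for p in product("ACGU", repeat=k))
    return {kmer: 100.0 * counts[kmer] / total for kmer in kmers}


def ratio_features(seq):
    """(G+C)%, G/C and (G+C)/(A+U) for an RNA sequence."""
    seq = seq.upper().replace("T", "U")
    n = Counter(seq)
    a, c, g, u = n["A"], n["C"], n["G"], n["U"]
    return {
        "(G+C)%": 100.0 * (g + c) / len(seq),
        "G/C": g / c if c else float("inf"),
        "(G+C)/(A+U)": (g + c) / (a + u) if (a + u) else float("inf"),
    }


# Toy sequence chosen only for illustration.
print(kmer_content("GGCAUU", 2)["GG"])      # prints 20.0
print(ratio_features("GGCAUU")["(G+C)%"])   # prints 50.0
```

Ratios with a zero denominator are returned as infinity here; a real feature pipeline would need to decide how to encode that case.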
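The minimum free energy (MFE) is commonly used to describe the stability of miRNA secondary structure in the cell, while the affinity of the cleavage sites on the mature miRNA for the corresponding enzymes, and the ability to bind RISC (the RNA-induced silencing complex), can be explained directly or indirectly by the base distribution of the miRNA. The number of bulges, the number of loops, the number of stacked base pairs in the stem-loop, conservation and similar measures can all be used to describe it. Numerical features discovered in this way in studies of miRNA biogenesis have been applied to the identification of miRNA genes.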
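2.1 Predicting miRNAs by homology comparison
Early computational methods for predicting miRNA genes generally considered only the secondary structure of small RNAs, identifying candidate miRNA genes by searching for hairpin structures conserved between related species; MiRscan and miRseeker are examples of such tools.
Table 1 Application of numerical features in different microRNA prediction software
MiRscan and miRseeker are two efficient prediction tools. Both search sequences in conserved intergenic regions and use RNAfold [37] (MiRscan) or MFOLD [38] (miRseeker) to select candidate miRNA sequences that can form hairpin structures. MiRscan uses conserved sequence features of known miRNAs, such as conservation of bases 30-50 in the stem region, whereas miRseeker uses conserved patterns of nucleotide divergence, with settings similar to those in [39]. MiRscan was first applied to identify new miRNA genes in the nematode, and miRseeker was first applied to predict miRNA genes in Drosophila; the predicted candidate miRNAs were then validated experimentally.
Homology comparison methods, which rest on the similarity of miRNA sequences and secondary structures, were applied early and can be used to predict conserved miRNAs between closely related species. They are not suitable for distantly related species, however, because genomic homology between such species is low. As with miRNA prediction based on comparative genomics, their weakness is that they cannot predict species-specific miRNAs.
2.2 Predicting miRNAs by machine learning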
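Machine-learning methods for the computational identification of miRNAs were widely used before next-generation sequencing appeared; representative algorithms are the support vector machine (SVM), the hidden Markov model, the random forest and the naive Bayes classifier. These methods combine features such as sequence conservation, topological features, thermodynamic features and information entropy with the above algorithms. Although machine-learning prediction algorithms that do not depend on sequence conservation have been developed, some studies still use conservation as one of their criteria. The methods use a set of known miRNAs (the positive training set) and a set of sequences that do not contain miRNA hairpins, for example from mRNAs, tRNA genes and ribosomal RNAs (the negative training set), and build a miRNA gene predictor that separates true from false candidates according to different attributes [40]. Numerical features of miRNAs can serve as measures for distinguishing miRNAs from other RNAs (rRNA, tRNA); for example, a mature length of 21-24 nt, the minimum free energy (MFE) and the minimal folding free energy index (MFEI) are used to evaluate new miRNA precursors. Studies have shown that the MFEI of plant miRNA precursors is higher than that of other RNAs (tRNA = 0.64, rRNA = 0.59 and mRNA = 0.65) [21]. Secondary structure energy features have become important for the computational identification of miRNA precursors, and with the appearance of software packages and online programs such as the Vienna RNA package [41], Mfold and RNAfold, most prediction tools use them to compute the energy features of secondary structures.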
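The text names AMFE and MFEI without giving formulas. A commonly used definition in the plant miRNA literature, assumed here, is AMFE = |MFE| / length × 100 and MFEI = AMFE / (G+C)%. The sketch below takes the MFE as an input (it would normally come from a folding program such as RNAfold); the function names and the example precursor are hypothetical, and using the magnitude of the MFE is an assumed sign convention.

```python
def gc_percent(seq):
    """Return (G+C)% of a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)


def amfe(mfe, length):
    """Adjusted MFE: magnitude of the MFE (kcal/mol) per 100 nt."""
    return 100.0 * abs(mfe) / length


def mfei(mfe, seq):
    """Minimal folding free energy index: AMFE divided by (G+C)%."""
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    return amfe(mfe, len(seq)) / gc


# Hypothetical 100-nt precursor with 50% GC and an assumed MFE of -45.0 kcal/mol.
precursor = "GC" * 25 + "AU" * 25
print(round(mfei(-45.0, precursor), 2))  # prints 0.9
```

Under this definition the hypothetical precursor scores 0.9, above the values quoted in the text for tRNA, rRNA and mRNA.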
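Pfeffer et al. were the first to introduce the SVM method for identifying miRNAs [25], using simple features: the folding free energy, the length of the longest symmetric stem, the A, C, G and U nucleotide content of the symmetric stem, and the numbers of AU, GC and GU pairs in the predicted minimum free energy secondary structure. After training, they obtained a miRNA classification model with 71% true positives and only 3% false positives.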
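Ng and Mishra proposed dinucleotide frequencies (%) [42], a higher-order feature than single-base frequencies, which give better resolution for identifying miRNAs and are easy to compute. They combined these with a further 16 basic variables, including (G+C)%, six folding measures (dP, dD, dQ, MFE, MFEI1, MFEI2) and other topological features. Using a Gaussian radial basis function (RBF) kernel, an SVM was trained on 200 human pre-miRNAs as positive samples and 400 negative samples, and validated on 1,918 pre-miRNAs from 40 non-human species as positive samples and 3,836 negative samples; overall it achieved 92.08% sensitivity, 97.42% specificity and 95.64% accuracy.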
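The three reported figures are linked: accuracy is the average of sensitivity and specificity weighted by the numbers of positive and negative samples. The short check below assumes that the reported rates refer to the validation set of 1,918 positives and 3,836 negatives; the function names are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy as the sample-weighted mean of sensitivity and specificity."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# Rates and sample sizes quoted in the text (assumed to belong together).
acc = accuracy_from_rates(0.9208, 0.9742, 1918, 3836)
print(f"{100 * acc:.2f}%")  # prints 95.64%
```

The result matches the reported 95.64%, so the three figures are mutually consistent under this assumption.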
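Xue et al. improved on this by proposing local structure-sequence triplet features, which describe the pairing pattern of the miRNA hairpin [20]. Each position is either paired ("(") or unpaired ("."), so three consecutive positions give 2^3 = 8 structural patterns; combined with the four possible nucleotides, this yields 32 cases (for example "U(((", written as in "A+++" in Fig. 1). Combined with an SVM, these features were used successfully on human data, correctly identifying close to 90% of miRNA precursors, and without using comparative genomics information they achieved an accuracy of over 90% on other species, including plants and viruses.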
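The 32 triplet features can be sketched from a sequence and its dot-bracket structure. This is a minimal illustration under stated assumptions: the nucleotide is taken to be the middle base of each three-position window, both "(" and ")" are treated as paired, and counts are normalized to frequencies. The function name `triplet_features` is hypothetical and the original method may differ in which positions it counts.

```python
from itertools import product


def triplet_features(seq, structure):
    """Frequencies of the 32 structure-sequence triplets.

    seq:       RNA sequence
    structure: dot-bracket string of the same length
    Each key is the middle nucleotide followed by the pairing state of the
    three positions centred on it, e.g. "G(((" or "A..(".
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    paired = structure.replace(")", "(")  # only paired vs unpaired matters
    keys = [nt + "".join(s) for nt in "ACGU" for s in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # skip ambiguous bases or unknown symbols
            counts[key] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Toy hairpin chosen only for illustration.
features = triplet_features("GGGAAACCC", "(((...)))")
print(len(features))  # prints 32
```

The resulting 32-element vector is the kind of input that can be passed to an SVM or another classifier.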
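MiRFinder compares sequences between related species and identifies miRNAs from a set of candidate hairpins. It uses 18 different numerical features, such as the minimum free energy, the putative base pairing of the mature miRNA, and the frequencies of possible secondary structure elements. Because a large number of sequences can form hairpins resembling miRNA precursors, the method includes a randomization test to assess the statistical significance of predicted structures, which markedly reduces the number of candidate miRNAs. It may, however, fail to detect species-specific precursor miRNAs.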
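The biggest problem for SVMs is class imbalance, because the number of known miRNAs in a single species is small relative to the total number of miRNAs. An SVM can usually be trained only on the few validated miRNAs of a species as the positive subset, and training a classifier on an imbalanced data set strongly affects classification performance on both the positive and negative training data. The key is to select a suitably representative subset of samples to train the classifier, which helps to improve classification accuracy.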
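One simple way to obtain a balanced training subset is random undersampling of the larger class. This is only one option and is not necessarily the sample selection strategy used by the tools discussed here, which may pick representative samples in other ways. The function name and the toy data below are hypothetical.

```python
import random


def balanced_training_set(positives, negatives, seed=0):
    """Randomly undersample the larger class so both classes are equal in size.

    Returns a shuffled list of (sample, label) pairs with label 1 for
    positives (real pre-miRNAs) and 0 for negatives (pseudo hairpins).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = min(len(positives), len(negatives))
    data = [(x, 1) for x in rng.sample(positives, n)]
    data += [(x, 0) for x in rng.sample(negatives, n)]
    rng.shuffle(data)
    return data


# Toy identifiers standing in for feature vectors.
pos = ["mir_a", "mir_b", "mir_c"]
neg = [f"pseudo_{i}" for i in range(10)]
train = balanced_training_set(pos, neg)
print(len(train))  # prints 6
```

Random undersampling discards negative examples, so repeating it with several seeds and comparing the resulting classifiers is a reasonable sanity check.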
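2.3 Homology and secondary structure alignment of miRNAs
MiRAlign is an alignment-based prediction tool that uses homology at the level of both miRNA structure and sequence. It can apply relatively relaxed sequence conservation requirements and thus reveal distant, structure-based miRNA homologs. Owing to its higher sensitivity, it gives better predictions than the similarity-based tool ERPIN. A potential drawback of miRAlign is that it cannot detect new miRNAs that lack homologs of known structure.
2.4 Predicting miRNAs from next-generation sequencing data
In recent years, advances in bioinformatics and next-generation sequencing have allowed large numbers of miRNAs to be predicted.
MiRanalyzer uses known miRNAs as a training set and applies machine learning to predict new miRNAs. The program first detects miRNAs already annotated in miRBase [43] and removes the reads that match these known sequences without mismatches. It then matches the remaining data against transcriptome data, such as genes and other non-coding RNA sequences. Finally, new miRNAs are predicted from the sequences that remain after filtering; good classification results were obtained in particular on nematode, rat and human data.
PIPmiR was the first attempt to introduce a naive Bayes classifier into plant miRNA prediction from high-throughput sequencing data. In the first stage, a Bayesian classifier is built from 15 numerical features, including GC content, free energy and base pairing, with continuous features discretized. In the second stage, PIPmiR uses read data together with genomic features to specifically identify and distinguish miRNAs from other small RNAs present in the sequencing library. These numerical features are based on known miRNA biosynthetic pathways, processing steps and miRNA-target interactions.
High-throughput miRNA prediction tools such as MiRTRAP, mirTools, miRCat and MIReNA combine expression profile data with bioinformatic methods. By aligning reads to a reference genome they locate mature miRNAs more accurately, and they screen candidate miRNAs using a small number of numerical features from miRNA biosynthesis and processing, which improves prediction accuracy. Because high-throughput sequencing is subject to sequencing errors, questions such as how to filter low-quality reads, and how to avoid erroneous sequences produced during adapter trimming that cause ambiguity when aligning to the reference genome, need further study; these problems affect miRNA prediction results.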
3.1 Prediction of animal target genes
In animals, miRNAs pair with their target genes through partial complementarity, so each miRNA gene may have hundreds of potential targets. Griffiths et al. divided the targeting properties of animal miRNAs into six types: miRNA-mRNA pairing, site features, conservation, site accessibility, multiple binding sites and expression profiles [44]. According to how closely each relates to the classes of miRNA numerical features (the relationship is usually indirect), they can be assigned to those classes as follows: miRNA-mRNA pairing, site features, conservation and multiple binding sites belong to class E; site accessibility belongs to class B; and expression profiles belong to class F, with the classes defined as in Section 1. For commonly used animal miRNA target prediction tools, we labeled each tool with the relevant classes according to whether its prediction method is related to particular numerical features; the results are shown in Table 2.
Table 2 MiRNA target prediction tools
Note: Species: fly (f), worm (w), human (h), mouse (m), rat (r), chicken (c), zebrafish (z), dog (d). Features: "A" sequence features; "B" secondary structure features; "C" thermodynamic and entropy features; "D" topological features; "E" features of biological processes such as biogenesis, genomic location and target binding sites; "F" expression profiles.
Analysis of the numerical features used by the target prediction tools in Table 2 shows that the most frequently used are site accessibility (class B) and features of the biological process of miRNA-mRNA interaction (class E). The tools mirWIP, HOCTAR and TargetMiner use expression profile data (class F) in addition to class B and E features; only MirTif and EIMMo use class E features alone; and GenMiR++ uses class E and F features. This indicates that, in animal miRNA target prediction, features of biological processes such as biogenesis and miRNA-mRNA interaction are the key features.
3.2 Prediction of plant target genes
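Compared with animals, target gene prediction in plants is relatively simple, because plant miRNAs are almost perfectly complementary to their targets, in contrast to the seed-region pairing between animal miRNAs and mRNAs. Mature miRNAs are about 21-24 nt long, and to act on a target mRNA they must bind to the RNA-induced silencing complex (RISC). Argonaute (AGO) proteins are key components of RISC responsible for binding the miRNA; family members bind specifically to small RNAs of particular lengths and first bases, ensuring that miRNAs function correctly. Xue et al. reported that Arabidopsis has 10 AGO proteins with different functions: AGO1 and AGO2 mainly bind 21-nt small RNAs, AGO4 mainly binds 24-nt small RNAs, and AGO5 can bind miRNAs of 21, 22 and 24 nt [20]. The first base of most miRNAs is uracil (U), which AGO1 mainly recognizes, whereas AGO2 and AGO4 mostly bind small RNAs whose first base is adenine (A). This base preference makes AGO proteins more precise in recognizing the miRNA-miRNA* duplex. Base content and the base distribution of the mature miRNA are thus numerical features that play an important role in target binding in plants.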
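During target binding, plant miRNAs pair almost perfectly with the target sequence; most cause degradation of the target mRNA and a minority cause translational repression [6]. In animals, by contrast, miRNAs are generally considered to pair imperfectly with target mRNAs and to act by repression. Complementarity between miRNAs and targets is higher in plants than in animals: apart from miR319a, which has 5 mismatches in the sequence complementary to its TCP target genes [7], most plant miRNAs have fewer than 4 mismatches with their targets. The difference is even more evident in base composition and base distribution, which differ greatly between animal and plant miRNAs; this is directly related to differences in the origin, evolutionary selection and function of animal and plant miRNAs [76].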
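The mismatch counts mentioned above can be computed by pairing the miRNA with its target site in antiparallel orientation. The sketch below is an illustration only: it assumes an ungapped site of the same length as the miRNA and counts G-U wobble pairs as mismatches, which real plant target predictors usually score separately. The function name and the toy sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def count_mismatches(mirna, target_site):
    """Number of non-Watson-Crick positions between a miRNA and a target site.

    Both sequences are written 5'->3' and must have the same length; the
    miRNA is paired with the target site read in reverse (antiparallel).
    """
    mirna = mirna.upper().replace("T", "U")
    target_site = target_site.upper().replace("T", "U")
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have equal length")
    return sum(
        1
        for m, t in zip(mirna, reversed(target_site))
        if COMPLEMENT.get(m) != t
    )


# Toy 4-nt example: "UCCA" is the exact reverse complement of "UGGA".
print(count_mismatches("UGGA", "UCCA"))  # prints 0
print(count_mismatches("UGGA", "UCCG"))  # prints 1
```

A candidate site could then be kept or discarded by comparing this count with a chosen threshold, for example the small mismatch numbers described in the text.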
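Numerical features are applied directly less often in the computational identification of miRNA targets; compared with miRNA gene identification, target prediction places more emphasis on properties of the biological process of miRNA-mRNA binding. Nevertheless, miRNA numerical features such as thermodynamic properties and base content still play an important role in target recognition, since they determine a miRNA's binding capacity for its target and the timing, location and mode of binding. As numerical encoding techniques for miRNAs and mRNAs develop and mature, faster, digitized target recognition may become one of the breakthroughs in target identification.
With technical innovation and the rapid development of next-generation sequencing, genomic data for many species are becoming clearer and more miRNAs will be identified. Numerical features have developed alongside miRNA research; they are not only an important component of computational miRNA gene identification but also have applications in other research areas. Numerical features are digitized descriptors of miRNA sequence, structure and energy, and sequence determines structure, which in turn influences function. One open question is whether numerical features can serve as a criterion for classifying the functions of the target genes regulated by miRNA genes, so that the function of miRNAs with a given type of numerical feature can be assigned. Studies show that some miRNAs are clearly lineage-specific and species-specific while others are conserved across species; a second question is whether numerical features based on this conservation can be used to classify species, and so to study species evolution from the perspective of numerical features. These ideas may offer new directions for applying miRNA numerical features in other fields.
[References]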
[1]Bartel D P. MicroRNAs: genomics, biogenesis, mechanism, and function[J]. Cell, 2004, 116(2): 281-297.
[2]Ambros V. The functions of animal microRNAs[J]. Nature, 2004, 431(7006): 350-355.
[3]Croce C M, Calin G A. miRNAs, cancer, and stem cell division[J]. Cell, 2005, 122(1): 6-7.
[4]Zamore P D, Haley B. Ribo-gnome: the big world of small RNAs [J]. Science, 2005, 309(5740): 1519-1524.
[5]Bartel D P. MicroRNAs: target recognition and regulatory functions [J]. Cell, 2009, 136(2): 215-233.
[6]Qi Y, He X, Wang X J, et al. Distinct catalytic and non-catalytic roles of ARGONAUTE4 in RNA-directed DNA methylation[J]. Nature, 2006, 443(7114): 1008-1012.
[7]Vazquez F, Vaucheret H, Rajagopalan R, et al. Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs[J]. Molecular Cell, 2004, 16(1): 69-79.
[8]Lee R C, Feinbaum R L, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14[J]. Cell, 1993, 75(5): 843-854.
[9]Pasquinelli A E, Reinhart B J, Slack F, et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA[J]. Nature, 2000, 408(6808): 86-89.
[10]Pantaleo V, Szittya G, Moxon S, et al. Identification of grapevine microRNAs and their targets using high-throughput sequencing and degradome analysis[J]. Plant J, 2010, 62(6): 960-976.
[11]Frazier T P, Zhang B. Identification of plant microRNAs using expressed sequence tag analysis[J]. Methods in Molecular Biology, 2011, 678: 13-25.
[12]Thiebaut F, Grativol C, Carnavale-Bottino M, et al. Computational identification and analysis of novel sugarcane microRNAs [J]. BMC Genomics, 2012, 13: 290.
[13]Hu J, Zhang H, Ding Y. Identification of conserved microRNAs and their targets in the model legume Lotus japonicus[J]. J Biotechnol, 2013.
[14]Berezikov E, Cuppen E, Plasterk R H. Approaches to microRNA discovery[J]. Nature Genetics, 2006, 38(Suppl): S2-7.
[15]Bentwich I. Prediction and validation of microRNAs and their targets[J]. FEBS Letters, 2005, 579(26): 5904-5910.
[16]Ambros V, Lee R C, Lavanway A, et al. MicroRNAs and other tiny endogenous RNAs in C. elegans[J]. Current Biology: CB, 2003, 13(10): 807-818.
[17]Grad Y, Aach J, Hayes G D, et al. Computational and experimental identification of C. elegans microRNAs[J]. Molecular Cell, 2003, 11(5): 1253-1263.
[18]Kim V N, Nam J W. Genomics of microRNA[J]. Trends in genetics: TIG, 2006, 22(3): 165-173.
[19]Zhu R, Li X, Chen Q. Discovering numerical laws of plant microRNA by evolution[J]. Biochemical and Biophysical Research Communications, 2011, 415(2): 313-318.
[20]Xue C H, Li F, He T, et al. Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine[J]. BMC Bioinformatics, 2005: 6.
[21]Bonnet E, Wuyts J, Rouze P, et al. Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences[J]. Bioinformatics, 2004, 20 (17): 2911-2917.
[22]Huang T H, Fan B, Rothschild M F, et al. MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans[J]. BMC Bioinformatics, 2007(8): 341.
[23]Xuan P, Guo M, Liu X, et al. PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs[J]. Bioinformatics, 2011, 27(10): 1368-1376.
[24]Xu Y, Zhou X, Zhang W. MicroRNA prediction with a novel ranking algorithm based on random walks[J]. Bioinformatics, 2008, 24(13): 50-58.
[25]Alexander R P, Fang G, Rozowsky J, et al. Annotating non-coding regions of the genome[J]. Nature reviews Genetics, 2010, 11(8): 559-571.
[26]Wang X, Zhang J, Li F, et al. MicroRNA identification based on sequence and structure alignment[J]. Bioinformatics, 2005, 21(18): 3610-3614.
[27]Huang T H, Fan B, Rothschild M F, et al. MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans[J]. BMC Bioinformatics, 2007(8): 341.
[28]Xue C H, Li F, He T, et al. Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine[J]. BMC Bioinformatics, 2005.
[29]Jiang P, Wu H, Wang W, et al. MiPred: Classification of real and pseudo microRNA precursors using random forest prediction model with combined features[J]. Nucleic Acids Research, 2007, 35(Web Server issue): 339-344.
[30]Sheng Y, Engstrom P G, Lenhard B. Mammalian microRNA prediction through a support vector machine model of sequence and structure[J]. PloS One, 2007, 2(9): 946.
[31]Xuan P, Guo M, Liu X, et al. PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs[J]. Bioinformatics, 2011, 27(10): 1368-1376.
[32]Moxon S, Schwach F, Dalmay T, et al. A toolkit for analysing large-scale plant small RNA datasets[J]. Bioinformatics, 2008, 24 (19): 2252-2253.
[33]Hackenberg M, Sturm M, Langenberger D, et al. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments[J]. Nucleic Acids Research, 2009, 37: W68-76.
[34]Wang W C, Lin F M, Chang W C, et al. miRExpress: analyzing high-throughput sequencing data for profiling microRNA expression[J]. BMC Bioinformatics, 2009, 10: 328.
[35]Mathelier A, Carbone A. MIReNA: finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data[J]. Bioinformatics, 2010, 26(18): 2226-2234.
[36]Zhu E, Zhao F, Xu G, et al. MirTools: microRNA profiling and discovery based on high-throughput sequencing[J]. Nucleic Acids Research, 2010, 38: 392-397.
[37]Mathews D H, Sabina J, Zuker M, et al. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure[J]. Journal of Molecular Biology, 1999, 288(5): 911-940.
[38]Zuker M. Mfold web server for nucleic acid folding and hybridization prediction[J]. Nucleic Acids Research, 2003, 31(13): 3406-3415.
[39]Terai G, Komori T, Asai K, et al. miRRim: A novel system to find conserved miRNAs with high sensitivity and specificity[J]. Rna, 2007, 13(12): 2081-2090.
[40]Mendes N D, Freitas A T, Sagot M F. Current tools for the identification of miRNA genes and their targets[J]. Nucleic Acids Research, 2009, 37(8): 2419-2433.
[41]Hofacker I L. Vienna RNA secondary structure server[J]. Nucleic Acids Research, 2003, 31(13): 3429-3431.
[42]Pasquinelli A E, Reinhart B J, Slack F, et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA[J]. Nature, 2000, 408(6808): 86-89.
[43]Griffiths-Jones S, Saini H K, van Dongen S, et al. miRBase: tools for microRNA genomics[J]. Nucleic Acids Research, 2008, 36: 154-158.
[44]Saito T, Saetrom P. MicroRNAs--targeting and target prediction [J]. New Biotechnology, 2010, 27(3): 243-249.
[45]Chen K, Rajewsky N. Natural selection on human microRNA binding sites inferred from SNP data[J]. Nature Genetics, 2006, 38 (12): 1452-1456.
[46]Grun D, Wang Y L, Langenberger D, et al. microRNA target predictions across seven Drosophila species and comparison to mammalian targets[J]. PLoS Computational Biology, 2005, 1(1): e13.
[47]Krek A, Grun D, Poy M N, et al. Combinatorial microRNA target predictions[J]. Nature Genetics, 2005, 37(5): 495-500.
[48]Lall S, Grun D, Krek A, et al. A genome-wide map of conserved microRNA targets in C. elegans[J]. Current Biology: CB, 2006, 16 (5): 460-471.
[49]Enright A J, John B, Gaul U, et al. MicroRNA targets in Drosophila[J]. Genome Biology, 2003, 5(1): 1.
[50]John B, Enright AJ, Aravin A, et al. Human MicroRNA targets[J]. PLoS Biology, 2004, 2(11): 363.
[51]Betel D, Koppal A, Agius P, et al. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites[J]. Genome Biology, 2010, 11(8): 90.
[52]Betel D, Wilson M, Gabow A, et al. The microRNA.org resource: targets and expression[J]. Nucleic Acids Research, 2008, 36: 149-153.
[53]Griffiths-Jones S, Saini H K, van Dongen S, et al. miRBase: tools for microRNA genomics[J]. Nucleic Acids Research, 2008, 36: 154-158.
[54]Griffiths-Jones S, Grocock R J, van Dongen S, et al. miRBase: microRNA sequences, targets and gene nomenclature[J]. Nucleic Acids Research, 2006, 34: 140-144.
[55]Kruger J, Rehmsmeier M. RNAhybrid: microRNA target prediction easy, fast and flexible[J]. Nucleic Acids Research, 2006, 34: 451-454.
[56]Rehmsmeier M, Steffen P, Hochsmann M, et al. Fast and effective prediction of microRNA/target duplexes[J]. Rna, 2004, 10(10): 1507-1517.
[57]Kertesz M, Iovino N, Unnerstall U, et al. The role of site accessibility in microRNA target recognition[J]. Nature Genetics, 2007, 39(10): 1278-1284.
[58]Hammell M, Long D, Zhang L, et al. mirWIP: microRNA target prediction based on microRNA-containing ribonucleoprotein-enriched transcripts[J]. Nature Methods, 2008, 5(9): 813-819.
[59]Thadani R, Tammi M T. MicroTar: predicting microRNA targets from RNA duplexes[J]. BMC Bioinformatics, 2006, 7(Suppl 5): S20.
[60]Wang X. miRDB: a microRNA target prediction and functional annotation database with a wiki interface[J]. Rna, 2008, 14(6): 1012-1017.
[61]Wang X, El Naqa I M. Prediction of both conserved and nonconserved microRNA targets in animals[J]. Bioinformatics, 2008, 24(3): 325-332.
[62]Gaidatzis D, van Nimwegen E, Hausser J, et al. Inference of miRNA targets using evolutionary conservation and pathway analysis[J]. BMC Bioinformatics, 2007(8): 69.
[63]Yousef M, Jung S, Kossenkov A V, et al. Naive Bayes for microRNA target predictions--machine learning for microRNA targets[J]. Bioinformatics, 2007, 23(22): 2987-2992.
[64]Loher P, Rigoutsos I. Interactive exploration of RNA22 microRNA target predictions[J]. Bioinformatics, 2012, 28(24): 3322-3323.
[65]Nielsen C B, Shomron N, Sandberg R, et al. Determinants of targeting by endogenous and exogenous microRNAs and siRNAs [J]. Rna, 2007, 13(11): 1894-1910.
[66]Stark A, Brennecke J, Russell R B, et al. Identification of Drosophila MicroRNA targets[J]. PLoS Biology, 2003, 1(3): 60.
[67]Brennecke J, Stark A, Russell R B, et al. Principles of microRNA-target recognition[J]. PLoS Biology, 2005, 3(3): 85.
[68]Stark A, Brennecke J, Bushati N, et al. Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution[J]. Cell, 2005, 123(6):1133-1146.
[69]Paraskevopoulou M D, Georgakilas G, Kostoulas N, et al. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows[J]. Nucleic Acids Research, 2013, 41: 169-173.
[70]Gennarino V A, Sardiello M, Avellino R, et al. MicroRNA target prediction by expression analysis of host genes[J]. Genome Research, 2009, 19(3): 481-490.
[71]Gennarino V A, Sardiello M, Mutarelli M, et al. HOCTAR database: a unique resource for microRNA target prediction[J]. Gene, 2011, 480(1-2): 51-58.
[72]Huang J C, Babak T, Corson T W, et al. Using expression profiling data to identify human microRNA targets[J]. Nature Methods, 2007, 4(12): 1045-1049.
[73]Yang Y, Wang Y P, Li K B. MiRTif: a support vector machine-based microRNA target interaction filter[J]. BMC Bioinformatics, 2008, 9(Suppl 12): S4.
[74]Bandyopadhyay S, Mitra R. TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples[J]. Bioinformatics, 2009, 25(20): 2625-2631.
[75]Liu C, Mallick B, Long D, et al. CLIP-based prediction of mammalian microRNA binding sites[J]. Nucleic Acids Research, 2013, 41(14): e138.
[76]Axtell M J, Westholm J O, Lai E C. Vive la difference: biogenesis and evolution of microRNAs in plants and animals[J]. Genome Biol, 2011, 12(4): 221.
Zhu Rongsheng, Liu Jinglu, Chen Qingshan. Numerical features of microRNA: a review[J]. Journal of Northeast Agricultural University, 2015, 46(2): 100-108. (in Chinese with English abstract)
Numerical features of microRNA: a review/ZHU Rongsheng1, LIU Jinglu2, CHEN Qingshan3 (1. School of Sciences, Northeast Agricultural University, Harbin 150030, China; 2. School of Life Sciences, Northeast Agricultural University, Harbin 150030, China; 3. School of Agriculture, Northeast Agricultural University, Harbin 150030, China)
Abstract: This paper reviews the development and application of miRNA numerical features in computational identification and summarizes their role in predicting miRNA genes and their targets. Numerical features of microRNAs (miRNAs) are an important tool for the computational identification of new miRNA genes. Since the first miRNA was discovered, the study of numerical features has progressed from sequence and secondary structure features to thermodynamic, entropy and topological features, which are widely used to identify new miRNA genes and their targets. With the application of next-generation sequencing (NGS) to miRNA research, numerical features now play an important role in studies of miRNA biogenesis, the identification of miRNA genes and targets, and the discovery of biosynthetic pathways. Finally, the paper proposes applying numerical features to the functional classification of target genes, biological evolution and related areas, offering a new line of thought for the use of miRNA numerical features in other fields.
Key words:microRNA; numerical features; computational identification; function; evolution
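About the author: Zhu Rongsheng (1975-), male, associate professor, Ph.D., master's supervisor; research interest: bioinformatics. E-mail: rshzhu@126.com
Funding: Heilongjiang Postdoctoral Fund (LBH-Z12045); Key Project of the Natural Science Foundation of Heilongjiang Province (ZD201213)
Received: 2014-03-04
Article ID: 1005-9369(2015)02-0100-09
Document code: A
CLC number: Q52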