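Abstract: Using microarray (gene chip) results for mouse embryonic stem cells (ESC) and neural progenitor cells (NPC) from the NCBI database, together with RFX1 ChIP-Seq data from the NPC stage, we analyzed RFX1 binding. RFX1 binding sites were enriched on chromosomes 1, 2, 4, 5, 7, 9 and 11, were fewest on the Y chromosome, and were relatively evenly distributed across the remaining chromosomes. Within the genome, binding sites fell mainly in gene promoter regions (about 53.2%), followed by intergenic regions (22.5%), gene bodies (13.1%) and enhancer regions (11.2%), indicating that RFX1 regulates its target genes chiefly by binding to their promoters. The biological functional classification of RFX1 target genes was also explored with bioinformatics methods in the DAVID database.
Keywords: neural progenitor cells; RFX1; ChIP-Seq
CLC number: Q341    Document code: A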
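The transcription factor RFX1 (regulatory factor X1) was the first member of the RFX gene family to be discovered (the mouse genome contains seven RFX family members). It was identified as a protein that binds the promoters of major histocompatibility complex (MHC) class II genes [1], and it has a large number of binding sites in the human genome [2]. Earlier work indicates that RFX1 is the most typical member of the RFX family and is expressed in many tissues, with particularly high expression in mammalian brain [3]. Besides controlling enhancer I of hepatitis B virus [4] and being associated with ciliogenesis, it may also play an important role in regulating the activity of its target genes.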
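RFX1 contains a C-terminal repression region that overlaps the dimerization domain, and an N-terminal activation region. It acts mainly at gene promoter regions, where it represses or activates genes [5]. Previous studies indicate that RFX1 is not a conventional transcription factor: it may initially be inactive and becomes activating or repressing when complexes with other factors form at the promoter [6-7]. Apart from the MHC class II genes, several potential target genes may be regulated jointly by RFX1 and the closely related RFX2 and RFX3. RFX1 activates the interleukin-5 receptor α gene [8] and promotes expression of the rat neuron-specific gene encoding neuronal glutamate transporter type 3 [9]. Knocking out the RFX1 homolog in nematodes produces severe sensory organ defects, which suggests that RFX1 has a role in the nervous system and may take part in regulating sensory neuron differentiation [10]. RFX1 interacts with HDAC1 (histone deacetylase 1) to repress expression of COL1A1, COL1A2 and ID2 [11-13]. Through interaction with HDAC1 and DNMT1 (DNA methyltransferase 1), RFX1 represses expression of the differentiation antigens CD11a and CD70, which is important in systemic lupus erythematosus [14]. RFX1 also represses expression of the c-myc gene, proliferating cell nuclear antigen (PCNA), microtubule-associated protein MAP1A and fibroblast growth factor FGF1 [15-18]. In 2008, Xie et al. used bioinformatics methods to predict 15 319 RFX1 binding sites in the human genome [19]. Knockout of RFX1 in mice causes early embryonic lethality [20], and deletion of the human chromosome segment 19p13.12, which contains RFX1, leads to multiple congenital anomalies, including deafness, lacrimal duct stenosis, strabismus, bilateral cervical sinuses, congenital heart malformation, hypoplasia of the corpus callosum and hypoplasia of the cerebellar vermis [21]. These findings indicate that RFX1 has an important regulatory role in biological processes. However, the specificity of RFX1 binding sites in target genes and the associated regulatory mechanisms are still not fully understood.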
This study therefore uses microarray results for mouse ESC and NPC and RFX1 ChIP-Seq data from the NPC stage, all taken from the NCBI database (http://www.ncbi.nlm.nih.gov), and carries out bioinformatic analyses of RFX1 binding sites and of the functional classification of its target genes in DAVID (the Database for Annotation, Visualization and Integrated Discovery; http://david.abcc.ncifcrf.gov/home.jsp).
1 Materials and Methods
1.1 Materials
Microarray results for mouse embryonic stem cells (ESC) and neural progenitor cells (NPC) and RFX1 ChIP-Seq data from the NPC stage were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov) and analyzed in DAVID (the Database for Annotation, Visualization and Integrated Discovery; http://david.abcc.ncifcrf.gov/home.jsp).
1.2 Methods
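Expression-profile signal values were used as already processed in the source publication (normalized signal intensity, log2). Differentially expressed genes were selected with the samr program using the criteria P value < 0.02% (delta = 1.8) and fold_change > 2. RFX1 binding sites were scanned (peak detection) with the MACS program using the criteria P value < 10^-8 and tag > 10.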
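The cutoffs above can be expressed as two small filters. This is only a sketch of the stated thresholds, not the samr or MACS programs themselves: the function names, the argument layout and the choice to report direction as NPC relative to ESC are assumptions made for illustration.

```python
import math

# Thresholds quoted in the text; everything else here is hypothetical.
MAX_P_PERCENT = 0.02    # samr cutoff, printed as "< 0.02%"
MIN_FOLD_CHANGE = 2.0   # fold_change > 2
MAX_PEAK_P = 1e-8       # MACS P value < 10^-8
MIN_PEAK_TAGS = 10      # tag > 10


def is_differential(log2_esc, log2_npc, p_percent):
    """Return 'up', 'down' or None for one gene.

    Signals are log2 intensities, so a fold change above 2 corresponds to
    an absolute log2 difference above 1. Direction is NPC relative to ESC
    (an assumption of this sketch).
    """
    log2_fc = log2_npc - log2_esc
    if p_percent >= MAX_P_PERCENT:
        return None
    if abs(log2_fc) <= math.log2(MIN_FOLD_CHANGE):
        return None
    return "up" if log2_fc > 0 else "down"


def keep_peak(p_value, tags):
    """Apply the peak-level cutoffs quoted for the MACS output."""
    return p_value < MAX_PEAK_P and tags > MIN_PEAK_TAGS
```

Working on the log2 scale keeps the fold-change test symmetric for up- and downregulated genes.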
Gene annotation and functional analysis followed the user guide provided by DAVID (the Database for Annotation, Visualization and Integrated Discovery; http://david.abcc.ncifcrf.gov/home.jsp).
The data were processed on an Inspur Tiansuo 10000 server configured with two compute nodes and one management node, 20 cores, 40 GB of memory, 6 × 146 GB hard disks and Xeon E5620 CPUs, with a computing speed of 200 billion operations per second, running Red Hat Linux AS 5.4.
2 Results
2.1 Analysis of RFX1 ChIP-Seq data in NPC
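RFX1 was analyzed using the microarray results for mouse ESC and NPC and the NPC-stage RFX1 ChIP-Seq data from the NCBI database (Fig. 1), and RFX1 binding sites in the mouse genome were predicted. Preliminary analysis of the microarray results covered 45 018 genes, of which 2 432 were downregulated and 2 713 were upregulated. In the NPC RFX1 ChIP-Seq data, genes carrying an RFX1 peak were then examined: 1 166 genes in total appear to be regulated by RFX1, of which 494 are downregulated and 672 are upregulated. RFX1 may regulate the expression of these genes during the differentiation of ESC into NPC and thereby contribute to precise differentiation.
Fig. 1 (a) Read distribution of the RFX1 ChIP-Seq data over a region of chromosome 8 as displayed in the viewer software, showing an RFX1 enrichment peak, a lower peak and the local background; (b) reads of the control under the same conditions; (c) structure and position of the RFX1 gene on chromosome 8: boxes are exons, lines are introns and the short boxes at either end are UTRs.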
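The step from "genes with an RFX1 peak" to "up- and downregulated RFX1 targets" amounts to intersecting gene lists. A minimal sketch follows, assuming the peak-associated genes and the two differential-expression lists are available as plain collections of gene identifiers; the function names and the toy identifiers are hypothetical.

```python
def split_targets(peak_genes, up_genes, down_genes):
    """Intersect peak-associated genes with the differential-expression lists."""
    peak_genes = set(peak_genes)
    up = peak_genes & set(up_genes)
    down = peak_genes & set(down_genes)
    return {"up": up, "down": down, "total": len(up) + len(down)}


def overlap_fraction(n_targets, n_differential):
    """Share of a differential-expression list that also carries a peak."""
    return n_targets / n_differential


# Toy identifiers, for illustration only.
toy = split_targets(["g1", "g2", "g3", "g4"], ["g1", "g9"], ["g3", "g4", "g8"])
print(sorted(toy["up"]), sorted(toy["down"]), toy["total"])

# Counts reported in the text: 672 of 2 713 upregulated and
# 494 of 2 432 downregulated genes carry an RFX1 peak.
print(overlap_fraction(672, 2713), overlap_fraction(494, 2432))
```

With the reported counts, roughly a quarter of the upregulated genes and a fifth of the downregulated genes carry an RFX1 peak.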
2.2 Analysis of RFX1-regulated genes
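The DAVID bioinformatics resource supports fast, efficient and varied functional analysis of large gene lists. Compared with similar tools such as GoMiner, GOstat, Onto-Express, GoToolBox, FatiGO, GFINDer, GOBar and GSEA, DAVID offers distinctive features and information capacity, an integrated and extended back-end annotation database, advanced modular enrichment algorithms and strong capabilities for integrated data exploration. Gene Functional Classification groups functionally related genes into a unit according to the annotation terms they share rather than their gene names, so that they can be explored and viewed in the context of a larger biological network instead of being searched one gene at a time. Functional Annotation Chart classifies the submitted genes by the abundance of the most relevant annotation terms attached to them, drawing on more than 40 annotation categories, including GO terms, protein-protein interactions, protein functional domains, disease associations, bio-pathways, sequence features, homology, gene functional summaries and gene tissue expression.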
The 1 166 RFX1-regulated genes were submitted to DAVID (the Database for Annotation, Visualization and Integrated Discovery; http://david.abcc.ncifcrf.gov/home.jsp) for GO analysis. The basic annotation covered four areas (Table 1): Functional_Categories (5 sub-categories, of which 3 were detected: COG_ONTOLOGY, SP_PIR_KEYWORDS and UP_SEQ_FEATURE); Gene_Ontology (23 sub-categories, of which 3 were detected: GOTERM_BP_FAT, GOTERM_CC_FAT and GOTERM_MF_FAT); Pathways (6 sub-categories, of which 3 were detected: BBID, BIOCARTA and KEGG_PATHWAY); and Protein_Domains (18 sub-categories, of which 3 were detected: INTERPRO, PIR_SUPERFAMILY and SMART). Each annotation was then analyzed in detail.
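2.2.1 Functional classification of RFX1-regulated genes
In the Functional_Categories analysis of these genes (Fig. 6), options marked "√" are the functional categories that the website recommends first after preliminary screening against its own reference values. The length of the bar after Chart shows, as a percentage, how many genes take part in the biological processes of that category.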
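Entries in the Chart of Fig. 6 are ordered by P value: the smaller the value, the more reliable the entry and the higher it is listed. COG_ONTOLOGY was chosen for a brief analysis. Fig. 7 (obtained by clicking the Chart shown in Fig. 6 on the website) shows that the genes in the COG_ONTOLOGY category mainly take part in four biological processes: cell division and chromosome partitioning, intracellular trafficking and secretion, signal transduction, and cell division and chromosome scaffolding.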
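For readers unfamiliar with how such a P value is obtained, the sketch below computes a one-sided hypergeometric (Fisher exact) enrichment P value and a more conservative variant in the spirit of DAVID's EASE score, which DAVID describes as a modified Fisher exact P value. The assumption made here is the commonly described modification of removing one hit from the gene-list count; the function names and the toy numbers are illustrative and are not taken from the analysis above.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def ease_like_score(list_hits, list_total, pop_hits, pop_total):
    """Conservative variant: one hit is removed from the list count first."""
    return hypergeom_upper_tail(max(list_hits - 1, 0), list_total, pop_hits, pop_total)


# Toy numbers: 12 of 100 submitted genes carry a term that annotates
# 300 of 20 000 background genes.
plain = hypergeom_upper_tail(12, 100, 300, 20000)
conservative = ease_like_score(12, 100, 300, 20000)
print(plain, conservative)
```

Sorting terms by such a score in ascending order reproduces the kind of ranking described for the Chart view, with the most strongly enriched terms at the top.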
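Genes in the SP_PIR_KEYWORDS category take part in nearly 70 biological processes, chiefly phosphorylation, nucleus formation, acetylation and alternative splicing. Genes in the UP_SEQ_FEATURE category take part in 60 biological processes, chiefly genetic mutation and splice variants. Many genes are involved in numerous processes across several categories.
2.2.2 Gene ontology of RFX1-regulated genes
Gene_Ontology is divided into 23 sub-categories, of which only 3 are marked "√" as the more meaningful categories recommended first by the website (Fig. 8). GOTERM_BP_FAT contains 203 terms, dominated by regulation of transcription and negative regulation; GOTERM_CC_FAT contains 80 terms, dominated by the microtubule cytoskeleton and intracellular non-membrane-bounded organelles; GOTERM_MF_FAT contains 59 terms, dominated by nucleic acid binding, DNA binding and transcription factor activity.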
3 Conclusions and Discussion
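Analysis of RFX1 using microarray results for mouse embryonic stem cells (ESC) and neural progenitor cells (NPC) and NPC-stage RFX1 ChIP-Seq data from the NCBI database (http://www.ncbi.nlm.nih.gov) showed that RFX1 binding sites are enriched on chromosomes 1, 2, 4, 5, 7, 9 and 11, are fewest on the Y chromosome, and are relatively evenly distributed across the remaining chromosomes. Taking the transcription start site (TSS) of each gene as the reference, sites between -2 and 1 kb of the TSS were classed as promoter and sites between -50 and 2 kb of the TSS as enhancer. On this basis, RFX1 binding sites in the genome lie mainly in gene promoter regions (about 53.2%), followed by intergenic regions (22.5%), gene bodies (13.1%) and enhancer regions (11.2%). This indicates that RFX1 regulates its target genes chiefly by binding to their promoter regions.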
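The region assignment described above can be sketched as a function of the signed distance between a peak summit and the TSS. Several points are assumptions of this sketch rather than statements from the text: the enhancer window is read as -50 to -2 kb so that it does not overlap the promoter window (the text prints "-50~2 kb"), the promoter test takes priority over the others, "body" means downstream of the promoter window up to the gene end, and all names are hypothetical.

```python
from collections import Counter

# Windows relative to the TSS in bp; negative values are upstream.
PROMOTER = (-2_000, 1_000)    # "-2 to 1 kb" in the text
ENHANCER = (-50_000, -2_000)  # ASSUMPTION: printed as "-50~2 kb", read as -50 to -2 kb


def classify_peak(summit, tss, gene_end, strand="+"):
    """Assign one peak summit to promoter, enhancer, body or intergenic."""
    # Flip the sign on the minus strand so that negative always means upstream.
    offset = summit - tss if strand == "+" else tss - summit
    gene_length = abs(gene_end - tss)
    if PROMOTER[0] <= offset <= PROMOTER[1]:
        return "promoter"
    if ENHANCER[0] <= offset < ENHANCER[1]:
        return "enhancer"
    if PROMOTER[1] < offset <= gene_length:
        return "body"
    return "intergenic"


def region_shares(labels):
    """Percentage of peaks carrying each region label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}
```

Applying `classify_peak` to every summit and passing the resulting labels to `region_shares` would yield a breakdown of the same form as the percentages quoted above.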
Preliminary analysis of the microarray results covered 45 018 genes, of which 2 432 were downregulated and 2 713 were upregulated. Combined with the NPC RFX1 ChIP-Seq data, 1 166 genes carried an RFX1 peak, of which 494 were downregulated and 672 were upregulated. These genes were annotated in the DAVID Bioinformatics Resources database, where the website's automatic screening sorted them into the following categories: Functional_Categories (three sub-categories recommended first by the website and marked "√": COG_ONTOLOGY, SP_PIR_KEYWORDS and UP_SEQ_FEATURE); Gene_Ontology (three recommended sub-categories: GOTERM_BP_FAT, GOTERM_CC_FAT and GOTERM_MF_FAT); Pathways (three recommended sub-categories: BBID, BIOCARTA and KEGG_PATHWAY); and Protein_Domains (three recommended sub-categories: INTERPRO, PIR_SUPERFAMILY and SMART). These sub-categories are in turn divided into a number of TERM descriptions such as Signal transduction mechanisms, membrane, phosphoprotein and transport.
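Among the 672 upregulated genes, 25 genes appear more than 40 times across the functional classifications; among the 494 downregulated genes, 50 genes appear more than 16 times. The number of repeats and the number of genes repeated that often show an exponential relationship. Genes with important roles in brain development and morphogenesis were found among the upregulated genes, whereas no brain-related genes were found among the downregulated genes. Cancer-related TERM descriptions were found for both groups, 7 among the upregulated genes and 3 among the downregulated genes, covering 71 and 30 genes respectively and including well-studied genes such as Ccnd1, Mapk3, Pik3r1, Akt2, Pik3cd, Kras and Sos1.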
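Counting how often a gene recurs across annotation terms is a simple tally. The sketch below assumes the annotation result has been exported as a mapping from term to gene list; that layout, the function names and the term labels are hypothetical, and the gene symbols are borrowed from the text purely as examples.

```python
from collections import Counter


def recurrence(term_to_genes):
    """Number of distinct terms in which each gene appears."""
    counts = Counter()
    for genes in term_to_genes.values():
        counts.update(set(genes))  # count a gene once per term
    return counts


def genes_above(counts, threshold):
    """Genes appearing in more than `threshold` terms, sorted by name."""
    return sorted(gene for gene, n in counts.items() if n > threshold)


def repeat_histogram(counts):
    """Map each repeat count to the number of genes repeated that often."""
    return Counter(counts.values())


# Toy input: term labels are placeholders, not DAVID output.
toy_terms = {
    "term_A": ["Kras", "Sos1"],
    "term_B": ["Kras", "Akt2"],
    "term_C": ["Kras", "Sos1"],
}
counts = recurrence(toy_terms)
print(genes_above(counts, 1), dict(repeat_histogram(counts)))
```

Plotting `repeat_histogram` on a logarithmic axis is one way to inspect whether the relationship between repeat count and gene number looks exponential.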
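In summary, bioinformatic analysis of RFX1 data in mouse NPC reveals the breadth and multifunctionality of RFX1-regulated genes and provides a sound theoretical basis for future studies of the regulatory mechanisms of RFX1.
References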
[1] REITH W, BARRAS E, SATOLA S, et al. Cloning of the major histocompatibility complex class II promoter binding protein affected in a hereditary defect in class II gene regulation[J]. Proc Natl Acad Sci U S A, 1989, 86(11): 4200-4204.
[2] XIE X H, TARJEI S, ANDREAS G. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites[J]. Proc Natl Acad Sci U S A, 2007, 24: 7145-7150.
[3] AFTAB S, SEMENEC L, CHU J S, et al. Identification and characterization of novel human tissue-specific RFX transcription factors[J]. BMC Evol Biol, 2008, 8: 226.
[4] SIEGRIST C A, DURAND B, EMERY P, et al. RFX1 is identical to enhancer factor C and functions as a transactivator of the hepatitis B virus enhancer[J]. Mol Cell Biol, 1993, 13: 6375-6384.
[5] KATAN Y, AGAMI R, SHAUL Y. The transcriptional activation and repression domains of RFX1, a context-dependent regulator, can mutually neutralize their activities[J]. Nucleic Acids Res, 1997, 25: 3621-3628.
[6] JOTHI R, CUDDAPAH S, BARSKI A, et al. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data[J]. Nucleic Acids Res, 2008, 36(16): 5221-5231.
[7] KATAN-KHAYKOVICH Y, SHAUL Y. RFX1, a single DNA-binding protein with a split dimerization domain, generates alternative complexes[J]. J Biol Chem, 1998, 273: 24504-24512.
[8] IWAMA A, PAN J, ZHANG P, et al. Dimeric RFX proteins contribute to the activity and lineage specificity of the interleukin-5 receptor alpha promoter through activation and repression domains[J]. Mol Cell Biol, 1999, 19: 3940-3950.
[9] MA K, ZHENG S, ZUO Z. The transcription factor regulatory factor X1 increases the expression of neuronal glutamate transporter type 3[J]. J Biol Chem, 2006, 281: 21250-21255.
[10] DUBRUILLE R, LAURENCON A, VANDAELE C, et al. Drosophila regulatory factor X is necessary for ciliated sensory neuron differentiation[J]. Development, 2002, 129: 5487-5498.
[11] SENGUPTA P, XU Y, WANG L, et al. Collagen alpha1(I) gene (COL1A1) is repressed by RFX family[J]. J Biol Chem, 2005, 280: 21004-21014.
[12] XU Y, SENGUPTA P K, SETO E, et al. Regulatory factor for X-box family proteins differentially interact with histone deacetylases to repress collagen alpha2(I) gene (COL1A2) expression[J]. J Biol Chem, 2006, 281: 9260-9270.
[13] WANG K R, NEMOTO T, YOKOTA Y. RFX1 mediates the serum-induced immediate early response of Id2 gene expression[J]. J Biol Chem, 2007, 282: 26167-26177.
[14] ZHAO M, SUN Y M. Epigenetics and SLE: RFX1 downregulation causes CD11a and CD70 overexpression by altering epigenetic modifications in lupus CD4+ T cells[J]. Journal of Autoimmunity, 2010, 35(1): 58-69.
[15] ZAJAC-KAYE M, BEN-BARUCH N, KASTANOS E, et al. Induction of myc-intron-binding polypeptides MIBP1 and RFX1 during retinoic acid-mediated differentiation of haemopoietic cells[J]. Biochem J, 2000, 345(Pt 3): 535-541.
[16] LIU M, LEE B H, MATHEWS M B. Involvement of RFX1 protein in the regulation of the human proliferating cell nuclear antigen promoter[J]. J Biol Chem, 1999, 274: 15433-15439.
[17] NAKAYAMA A, MURAKAMI H, MAEYAMA N, et al. Role for RFX transcription factors in non-neuronal cell-specific inactivation of the microtubule-associated protein MAP1A promoter[J]. J Biol Chem, 2003, 278: 233-240.
[18] YICHAO H, WEICHIH L, CHIENYU K, et al. Regulation of FGF1 gene promoter through transcription factor RFX1[J]. J Biol Chem, 2010, 285(18): 13885-13895.
[19] XIE X, RIGOR P. MotifMap: a human genome-wide map of candidate regulatory motif sites[J]. Bioinformatics, 2008, 25(2): 167-174.
[20] FENG Chenzhuo, XU Wenhao, ZUO Zhiyi. Knockout of the regulatory factor X1 gene leads to early embryonic lethality[J]. Biochemical and Biophysical Research Communications, 2009, 386: 715-717.
[21] JENSEN D R, MARTIN D M, GEBARSKI S, et al. A novel chromosome 19p13.12 deletion in a child with multiple congenital anomalies[J]. Am J Med Genet A, 2009, 149A(3): 396-402.