[ABSTRACT] Objective To screen out differentially expressed genes (DEGs) in high-grade ovarian serous cystadenocarcinoma (HGSC) using bioinformatics methods, and to investigate the potential role of these DEGs in HGSC at the gene level. Methods The GSE10971, GSE14001, GSE18521, GSE27651, and GSE12470 datasets were downloaded from the Gene Expression Omnibus (GEO) database, and R software with Bioconductor packages was used to screen out the upregulated and downregulated DEGs in HGSC tissue compared with normal tissue. These genes were analyzed by Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, protein-protein interaction (PPI) network analysis, and prognosis survival analysis, and the network analysis plug-in CytoHubba was used to screen out hub genes. Finally, the Kaplan-Meier plotter database was used to analyze the association of hub gene expression with the survival and prognosis of HGSC patients. Results A total of 134 DEGs were screened out from the GEO database. The 94 upregulated DEGs were cytoplasmic components associated with protein dimerization activity and were involved in the regulation of intracellular metabolism and the cell cycle; the 40 downregulated DEGs were mainly components of the extracellular matrix, most of which had poly(A) polymerization activity and were involved in the regulation of tumor signaling pathways. Six upregulated hub genes, i.e., BUB1B, CENPF, BIRC5, UBE2C, ASPM, and TOP2A, were significantly correlated with the prognosis of patients (r=0.87-1.55, P<0.05). Conclusion The DEGs screened out are involved in the molecular functions underlying the development and progression of HGSC, and the upregulated hub genes BUB1B, CENPF, BIRC5, UBE2C, ASPM, and TOP2A may have potential value in guiding the clinical treatment and prognostic evaluation of HGSC.
[KEY WORDS] ovarian neoplasms; cystadenocarcinoma, serous; computational biology; gene ontology; protein interaction maps; prognosis
[CLC number] R737.31 [Document code] A [Article ID] 2096-5532(2021)01-0019-06
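Ovarian serous cystadenocarcinoma (OV) is a subtype of epithelial ovarian cancer that is more severe than benign serous cystadenoma and serous borderline tumor (SBT)[1]. According to the "dualistic model" of ovarian cancer proposed by Professor KURMAN of the United States, OV can be divided into two types: type I, low-grade serous cystadenocarcinoma (LGSC), and type II, high-grade serous cystadenocarcinoma (HGSC)[2]. HGSC is currently thought to originate in the fallopian tube and to differ markedly from LGSC at the molecular and histological levels[3-4]. Compared with LGSC, HGSC is characterized by a later age of onset (55-65 years), a high incidence, a low survival rate, high sensitivity to chemotherapeutic drugs, and a tendency to recur, so in-depth research on prognostic evaluation and treatment strategies for HGSC is particularly urgent. In this study, bioinformatics methods were used to obtain OV microarray data from the GEO (Gene Expression Omnibus) database, mine the differentially expressed genes (DEGs) of HGSC, perform Gene Ontology (GO) enrichment analysis and KEGG pathway analysis, construct a protein-protein interaction (PPI) network, screen out hub genes, and analyze the association between hub gene expression and HGSC prognosis, so as to provide a theoretical basis for the targeted therapy of HGSC.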
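1 Materials and methods
1.1 Data sources
Five OV-related datasets (GSE10971, GSE14001, GSE18521, GSE27651, and GSE12470) were retrieved and downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/GEO/)[5-6]. The first four datasets were generated on the GPL570 platform and the last one on the GPL887 platform. In each GSE dataset, only the HGSC samples and the matched normal samples were selected. GSE10971 contained 13 tumor samples and 12 normal samples, GSE14001 contained 10 tumor samples and 3 normal samples, GSE18521 contained 53 tumor samples and 10 normal samples, GSE27651 contained 22 tumor samples and 6 normal samples, and GSE12470 contained 35 tumor samples and 10 normal samples[7-11]. The online tool GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r) was used for analysis, and the results were collected in Excel spreadsheets; entries without a gene name or gene probe, and entries in which one gene corresponded to multiple probes, were removed.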
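The clean-up of the GEO2R output described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used (the results were handled in Excel). The function name `clean_probe_table`, the keys `probe_id` and `gene_symbol`, and the sample rows are all hypothetical, and the rule for multi-probe genes is read here as discarding such genes entirely, which is only one possible interpretation.

```python
from collections import Counter


def clean_probe_table(rows):
    """Keep rows that have a probe ID and a gene symbol mapped by exactly one probe.

    `rows` is a list of dicts with the hypothetical keys 'probe_id' and
    'gene_symbol' (assumed names, not actual GEO2R column headers).
    """
    # Drop rows that lack a probe ID or a gene symbol.
    named = [r for r in rows if r.get("probe_id") and r.get("gene_symbol")]
    # Count how many probes map to each gene symbol.
    probes_per_gene = Counter(r["gene_symbol"] for r in named)
    # Drop every gene that is represented by more than one probe.
    return [r for r in named if probes_per_gene[r["gene_symbol"]] == 1]


# Toy rows for illustration only (not real probe annotations).
example_rows = [
    {"probe_id": "p1", "gene_symbol": "GENE_A"},
    {"probe_id": "p2", "gene_symbol": "GENE_B"},
    {"probe_id": "p3", "gene_symbol": "GENE_B"},  # second probe for GENE_B
    {"probe_id": "p4", "gene_symbol": ""},        # no gene name
]
cleaned = clean_probe_table(example_rows)
```

In this toy input only GENE_A survives, because GENE_B has two probes and the last row has no gene name.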
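1.2 Screening of DEGs
The edgeR package in R 3.6.2 (https://www.r-project.org/) was used to normalize the data, after which the data were screened. The screening criteria were P<0.01 and a log fold change (logFC) ≥1 or ≤-1[12]. The DEGs screened out were then visualized with volcano plots.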
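The screening criteria above amount to a simple two-condition filter. The sketch below expresses them in Python for illustration only; the analysis itself was run in R, and the function name `screen_degs`, the keys `gene`, `p_value`, and `logFC`, and the toy values are assumptions rather than the authors' code or data.

```python
def screen_degs(rows, p_cutoff=0.01, lfc_cutoff=1.0):
    """Return rows with P < p_cutoff and logFC >= lfc_cutoff or <= -lfc_cutoff."""
    return [
        r for r in rows
        if r["p_value"] < p_cutoff and abs(r["logFC"]) >= lfc_cutoff
    ]


# Toy values for illustration only.
example_rows = [
    {"gene": "GENE_A", "p_value": 0.001, "logFC": 2.3},   # kept: significant, up
    {"gene": "GENE_B", "p_value": 0.004, "logFC": -1.0},  # kept: boundary logFC
    {"gene": "GENE_C", "p_value": 0.030, "logFC": 3.1},   # dropped: P too large
    {"gene": "GENE_D", "p_value": 0.002, "logFC": 0.4},   # dropped: change too small
]
degs = screen_degs(example_rows)
```

Both thresholds are keyword arguments so that the same function can be reused with other cut-offs.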
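1.3 Screening of upregulated and downregulated genes
The DEGs from the previous step were screened again, with logFC>1 as the criterion for upregulated genes and logFC<-1 as the criterion for downregulated genes. All upregulated or downregulated genes from the five datasets were then imported into the online platform of Bioinformatics & Evolutionary Genomics (http://bioinformatics.psb.ugent.be/webtools/Venn/) to find the intersection of the upregulated genes, or of the downregulated genes, across the five datasets[13].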
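The direction split and the Venn-style intersection can be sketched with plain Python sets. This is an illustrative equivalent of what the online Venn tool computes, not the tool itself; the function names, the dataset labels, and the logFC values below are hypothetical, and only three toy datasets are shown instead of five.

```python
def split_by_direction(log_fc_by_gene):
    """Split a {gene: logFC} mapping into (upregulated, downregulated) sets."""
    up = {g for g, lfc in log_fc_by_gene.items() if lfc > 1}
    down = {g for g, lfc in log_fc_by_gene.items() if lfc < -1}
    return up, down


def common_genes(gene_sets):
    """Return the genes present in every set (the centre of the Venn diagram)."""
    gene_sets = list(gene_sets)
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy logFC values for three hypothetical datasets.
datasets = {
    "dataset_1": {"GENE_A": 2.1, "GENE_B": -1.8, "GENE_C": 1.0},
    "dataset_2": {"GENE_A": 1.4, "GENE_B": -2.2, "GENE_C": 1.9},
    "dataset_3": {"GENE_A": 3.0, "GENE_B": -1.1, "GENE_D": 2.5},
}
splits = [split_by_direction(d) for d in datasets.values()]
common_up = common_genes(up for up, _ in splits)
common_down = common_genes(down for _, down in splits)
```

Note that GENE_C in dataset_1 has a logFC of exactly 1 and is therefore not counted as upregulated under the strict logFC>1 criterion of this step.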
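1.4 GO and KEGG enrichment analyses
DAVID 6.8 (Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/) was used to analyze the biological information of genome-scale datasets and to visualize the functional information of genes and proteins[14]. GO analysis annotates large numbers of genes in terms of biological process, molecular function, and cellular component[15]. KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis is used to understand, at the molecular level, the signaling pathways and biological functions in which genes and proteins are involved. For both GO and KEGG enrichment analyses, P<0.05 was considered statistically significant.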
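To make the meaning of an enrichment P value concrete, the sketch below computes a plain one-sided hypergeometric test for a single GO term or pathway. It is an assumption-laden illustration: DAVID reports its own modified Fisher exact statistic, which this code does not reproduce, and the function name and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, list_size, term_size, background_size):
    """One-sided hypergeometric P value, P(X >= hits).

    hits            -- genes in the list that are annotated with the term
    list_size       -- number of genes in the list (e.g. the upregulated DEGs)
    term_size       -- genes in the background annotated with the term
    background_size -- total number of genes in the background
    """
    upper = min(list_size, term_size)
    # Number of ways to draw at least `hits` annotated genes.
    favourable = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background_size, list_size)


# Hypothetical counts: 8 of 94 listed genes carry a term that annotates
# 200 of 20000 background genes.
p = enrichment_p_value(hits=8, list_size=94, term_size=200, background_size=20000)
is_enriched = p < 0.05
```

With these counts fewer than one hit is expected by chance, so eight hits give a P value far below the 0.05 threshold used in this study.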
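1.5 Construction of the PPI network and screening of hub genes
All DEGs were imported into the STRING database (http://string-db.org)[16] for analysis, with a confidence score ≥0.4 taken as a significant PPI. The results were imported into Cytoscape 3.7.2 for visualization[17]. The cytoHubba plug-in in Cytoscape was used to screen hub genes from the PPI network, and DEGs with a degree ≥12 were selected as hub genes.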
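The degree-based selection can be illustrated with a few lines of Python. This sketch only mimics the degree criterion described above. It is not the cytoHubba implementation, the edge list is a toy example with placeholder gene names, and scores are assumed to be on a 0 to 1 scale.

```python
from collections import defaultdict


def hub_genes(edges, min_score=0.4, min_degree=12):
    """Return genes whose degree in the filtered network is >= min_degree.

    `edges` is an iterable of (gene_a, gene_b, score) tuples; interactions
    scoring below `min_score` and self-loops are ignored.
    """
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    degree = {gene: len(partners) for gene, partners in neighbours.items()}
    # Rank by descending degree, breaking ties alphabetically.
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return [g for g in ranked if degree[g] >= min_degree]


# Toy network with placeholder gene names and scores.
toy_edges = [
    ("G1", "G2", 0.9),
    ("G1", "G3", 0.7),
    ("G1", "G4", 0.5),
    ("G2", "G3", 0.6),
    ("G3", "G4", 0.2),  # below the 0.4 confidence cut-off, ignored
]
# A lower degree threshold is used here only because the toy network is tiny.
hubs = hub_genes(toy_edges, min_degree=2)
```

Using sets of neighbours means that a pair of genes listed twice in the input is still counted as a single interaction.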
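1.6 Survival and prognosis analysis of hub genes
Using the online survival analysis tool Kaplan-Meier plotter (http://kmplot.com/analysis/), survival and prognosis analysis was performed for the hub genes in descending order of rank under the above screening conditions, to evaluate the prognostic significance of each hub gene in OV[18]. Patient samples were divided into two groups (a high expression group and a low expression group) according to the median expression of each gene; the parameters were left at their default settings, and P<0.05 was considered statistically significant.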
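The median split and the survival curve for each group can be sketched as follows. This is a minimal illustration of the Kaplan-Meier product-limit estimator and does not reproduce the online tool or its significance test. The function names, the rule that patients exactly at the median fall into the low group, and all values in the example are assumptions.

```python
from statistics import median


def split_by_median(expression):
    """Split a {patient: expression} mapping into (high, low) patient sets."""
    cut = median(expression.values())
    high = {p for p, value in expression.items() if value > cut}
    low = set(expression) - high  # values equal to the median go to 'low'
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    `events` holds 1 for an observed death and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data: expression values, follow-up in months, and event flags.
expression = {"pt1": 1.0, "pt2": 2.0, "pt3": 3.0, "pt4": 4.0}
high_group, low_group = split_by_median(expression)
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

In practice the estimator would be applied separately to the high and low expression groups and the two curves compared with a log-rank test, which is omitted here.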
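2 Results
2.1 DEGs screened out from the five datasets
In this study, 6 669 DEGs were screened out from GSE18521 (45 118 genes in total), 6 068 from GSE12470 (18 819 genes in total), 6 593 from GSE27651 (45 118 genes in total), 12 408 from GSE14001 (45 118 genes in total), and 5 612 from GSE10971 (45 118 genes in total). The results are shown as volcano plots, in which red represents highly expressed genes, green represents lowly expressed genes, and black represents genes without a significant difference in expression (Fig. 1).
Intersection analysis of the five independent datasets identified the DEGs common to all five, comprising 94 upregulated genes (logFC>1, P<0.05) and 40 downregulated genes (logFC<-1, P<0.05) (Fig. 2); the gene names are listed in Table 1.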
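2.2 GO and KEGG enrichment analyses of DEGs
In terms of biological process, the upregulated DEGs were mostly involved in the regulation of RNA metabolic processes and other metabolic processes, the regulation of RNA transcription and DNA templates, the regulation of molecular function, and the regulation of macromolecule metabolic processes and nitrogen compound metabolic processes, whereas the downregulated DEGs were mostly involved in the regulation of cellular processes, cellular protein metabolic processes, protein modification processes, and protein phosphorylation (Fig. 3A). In terms of cellular component, the upregulated DEGs belonged to intracellular organelle, membrane-bounded organelle, intracellular membrane-bounded organelle, and cytoplasmic components, or to extracellular components, whereas the downregulated DEGs were distributed in the nucleus, extracellular matrix, cytoplasm, and plasma membrane (Fig. 3B). In terms of molecular function, the upregulated DEGs generally had serine-type endopeptidase activity, protein dimerization activity, endopeptidase activity, tubulin binding, and protein homodimerization activity, whereas the downregulated DEGs generally had poly(A) binding and nitric oxide synthase binding functions (Fig. 3C). In terms of KEGG pathways, the upregulated DEGs were mostly involved in the cell cycle and mitosis within the cell cycle, cell cycle checkpoints, DNA repair, and M phase signaling pathways, whereas the downregulated DEGs were mostly involved in the STAT signaling pathway, focal adhesion, Epstein-Barr virus infection, and tumor signaling pathways (Fig. 3D).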
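2.3 Interaction analysis of the five datasets and screening of hub genes
To identify and analyze the interactions among the DEGs from a systems perspective, a PPI network of the 134 DEGs from the five datasets was obtained with the STRING online database (Fig. 4A). In a PPI network, some genes interact strongly with other genes and often occupy key positions in the network; these are called hub genes and are considered potential drivers of disease[19]. To find the hub genes responsible for the development of HGSC, we used the Cytoscape plug-in to filter out 69 DEGs and then selected the top 12 hub genes by rank. Colors range from red to yellow, and a deeper red indicates a greater role of the hub gene in the PPI network (Fig. 4B).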
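2.4 Survival and prognosis analysis of hub genes
The 12 hub genes screened out were subjected to survival and prognosis analysis on the Kaplan-Meier plotter website, and six of them had a significant effect on the prognosis of HGSC: BUB1B (r=1.20, P<0.05), CENPF (r=1.25, P<0.05), BIRC5 (r=0.87, P<0.05), UBE2C (r=1.15, P<0.05), ASPM (r=1.55, P<0.05), and TOP2A (r=1.20, P<0.05) (Fig. 5). High expression of these upregulated genes significantly reduced the survival rate of HGSC patients.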
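3 Discussion
It is currently thought that LGSC develops in a stepwise manner from ovarian epithelial inclusions (OEI) to benign cystadenoma and then to SBT, whereas HGSC arises from the distal fallopian tube. Even though the two have some similarities in origin, it is generally accepted that they have different clinicopathological features, which means that finding tumor markers that can distinguish LGSC from HGSC is extremely important[20].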
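Studies have shown that 50% of HGSC cases are associated with DNA repair defects[21]. According to the GO and KEGG enrichment analyses in this study, the upregulated DEGs were involved in the regulation of DNA templates and in DNA repair, which can serve as a basis for finding target genes in HGSC. Bioinformatics analysis then identified six genes significantly associated with prognosis, all of which were upregulated in HGSC. Previous studies have shown that the GLEBS domain of BUB1B plays an important role in the treatment of glioblastoma and that the upregulated PTTG3P-FOXM1-BUB1B signaling axis is a therapeutic target in lung adenocarcinoma[22-23]; that dysregulation of a CENPF-related signaling cascade promotes the metastasis of prostate cancer[24]; that high expression of BIRC5 is important for cell viability in lymphoma, so that drugs reducing BIRC5 expression in lymphoma have potential as targeted therapy[25]; that among patients with high-risk breast cancer, those with high UBE2C expression have a poor prognosis[26]; that ASPM can serve as a novel marker for vascular invasion, early recurrence, and poor prognosis in hepatocellular carcinoma[27]; and that abnormal TOP2A expression is detected in patients with early-stage breast cancer[28]. Recent research has found that BUB1B is highly expressed in high-grade tumors and is associated with long-term prognosis[29], which is consistent with the results of the bioinformatics analysis in this study. Although these genes have rarely been studied in ovarian cancer, on the basis of findings in other tumors we speculate that they sit at key nodes of tumor signaling pathways and affect normal physiological function, thereby leading to tumorigenesis.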
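In summary, through bioinformatics analysis of five datasets, this study identified a total of 134 DEGs associated with HGSC, six of which were significantly associated with the prognosis of HGSC. These six genes may have potential value in guiding the clinical treatment and prognostic evaluation of HGSC and may provide new ideas for subsequent experimental research. However, whether the genes screened out in this study can effectively distinguish LGSC from HGSC remains to be explored in future studies.
[REFERENCES]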
[1]MCCLUGGAGE W G. Morphological subtypes of ovarian carcinoma: a review with emphasis on new developments and pathogenesis[J]. Pathology, 2011,43(5):420-432.
[2]SHIH I M, KURMAN R J. Ovarian tumorigenesis: a proposed model based on morphological and molecular genetic analysis[J]. Am J Pathol, 2004,164(5):1511-1518.
[3]KURMAN R J. Origin and molecular pathogenesis of ovarian high-grade serous carcinoma[J]. Ann Oncol: Off J Eur Soc Med Oncol, 2013,24 Suppl 10:x16-x21.
[4]MEDEIROS F, MUTO M G, LEE Y, et al. The tubal fimbria is a preferred site for early adenocarcinoma in women with familial ovarian cancer syndrome[J]. Am J Surg Pathol, 2006,30(2):230-236.
24 Journal of Qingdao University (Medical Sciences) Vol. 57
[5]BARRETT T, WILHITE S E, LEDOUX P, et al. NCBI GEO: archive for functional genomics data sets: update[J]. Nucleic Acids Research, 2012,41(D1):D991-D995.
[6]EDGAR R, DOMRACHEV M, LASH A E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository[J]. Nucleic Acids Res, 2002,30(1):207-210.
[7]LI J, YUE H R, YU H L, et al. Development and validation of SIRT3-related nomogram predictive of overall survival in patients with serous ovarian cancer[J]. J Ovarian Res, 2019,12(1):47.
[8]LOU W Y, DING B S, ZHONG G S, et al. Dysregulation of pseudogene/lncRNA-hsa-miR-363-3p-SPOCK2 pathway fuels stage progression of ovarian cancer[J]. Aging, 2019,11(23):11416-11439.
[9]TUNG C S, MOK S C, TSANG Y T M, et al. PAX2 expression in low malignant potential ovarian tumors and low-grade ovarian serous carcinomas[J]. Mod Pathol: Off J U S Can Acad Pathol Inc, 2009,22(9):1243-1250.
[10]MOK S C, BONOME T, VATHIPADIEKAL V, et al. A gene signature predictive for outcome in advanced ovarian cancer identifies a survival factor: microfibril-associated glycoprotein 2[J]. Cancer Cell, 2009,16(6):521-532.
[11]KING E R, TUNG C S, TSANG Y T M, et al. The anterior gradient homolog 3 (AGR3) gene is associated with differentiation and survival in ovarian cancer[J]. Am J Surg Pathol, 2011,35(6):904-912.
[12]OXNARD G R, LO P C, NISHINO M, et al. Natural history and molecular characteristics of lung cancers harboring EGFR exon 20 insertions[J]. Journal of Thoracic Oncology, 2013,8(2):179-184.
[13]MICHOEL T, MAERE S, BONNET E, et al. Validating module network learning algorithms using simulated data[J]. BMC Bioinform, 2007,8 Suppl 2:S5.
[14]DENNIS G, SHERMAN B T, HOSACK D A, et al. DAVID: database for annotation, visualization, and integrated discovery[J]. Genome Biol, 2003,4(5):P3.
[15]GENE ONTOLOGY CONSORTIUM. The Gene Ontology (GO) project in 2006[J]. Nucleic Acids Research, 2006,34(90001):D322-D326.
[16]SZKLARCZYK D, MORRIS J H, COOK H, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible[J]. Nucleic Acids Res, 2017,45(D1):D362-D368.
[17]SHANNON P, MARKIEL A, OZIER O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks[J]. Genome Res, 2003,13(11):2498-2504.
[18]SZÁSZ A M, LÁNCZKY A, NAGY Á, et al. Cross-validation of survival associated biomarkers in gastric cancer using transcriptomic data of 1,065 patients[J]. Oncotarget, 2016,7(31):49322-49333.
[19]XIAO Y B, FENG M, RAN H Y, et al. Identification of key differentially expressed genes associated with non-small cell lung cancer by bioinformatics analyses[J]. Mol Med Rep, 2018,17(5):6379-6386.
[20]LI J, FADARE O, XIANG L, et al. Ovarian serous carcinoma: recent concepts on its origin and carcinogenesis[J]. J Hematol Oncol, 2012,5:8.
[21]HILL S J, DECKER B, ROBERTS E A, et al. Prediction of DNA repair inhibitor response in short-term patient-derived ovarian cancer organoids[J]. Cancer Discov, 2018,8(11):1404-1421.
[22]DING Y, HUBERT C G, HERMAN J, et al. Cancer-specific requirement for BUB1B/BUBR1 in human brain tumor isolates and genetically transformed cells[J]. Cancer Discov, 2013,3(2):198-211.
[23]SHIH J H, CHEN H Y, LIN S C, et al. Integrative analyses of noncoding RNAs reveal the potential mechanisms augmenting tumor malignancy in lung adenocarcinoma[J]. Nucleic Acids Res, 2020,48(3):1175-1191.
[24]LIN S C, KAO C Y, LEE H J, et al. Dysregulation of miRNAs-COUP-TFII-FOXM1-CENPF axis contributes to the metastasis of prostate cancer[J]. Nature Communications, 2016,7:11418.
[25]PISE-MASISON C A, RADONOVICH M F, DOHONEY K M, et al. Gene expression profiling of ATL patients: compilation of disease-related genes and evidence for TCF4 involvement in BIRC5 gene expression and cell viability[J]. Blood, 2009,113(17):4016-4026.
[26]PSYRRI A, KALOGERAS K T, KRONENWETT R, et al. Prognostic significance of UBE2C mRNA expression in high-risk early breast cancer. A Hellenic Cooperative Oncology Group (HeCOG) Study[J]. Annals of Oncology, 2012,23(6):1422-1427.
[27]LIN S Y, PAN H W, LIU S H, et al. ASPM is a novel marker for vascular invasion, early recurrence, and poor prognosis of hepatocellular carcinoma[J]. Clin Cancer Res: Off J Am Assoc Cancer Res, 2008,14(15):4814-4820.
[28]TUBBS R, BARLOW W E, BUDD G T, et al. Outcome of patients with early-stage breast cancer treated with doxorubicin-based adjuvant chemotherapy as a function of HER2 and TOP2A status[J]. J Clin Oncol: Off J Am Soc Clin Oncol, 2009,27(24):3881-3886.
[29]MUKHERJEE A, JOSEPH C, CRAZE M, et al. The role of BUB and CDC proteins in low-grade breast cancers[J]. Lancet Lond Engl, 2015,385 Suppl 1:S72.
(Edited by MA Weiping)