Kangjie SHEN ,Chuanyuan WEI ,Yi XIE ,Lu WANG ,Shuyu WANG,2 ,Ming REN ,Xinyi DENG,Daohe WANG,Zixu GAO,Zihao FENG,3,*,Jianying GU,3,*
ABSTRACT Objective To identify novel biomarkers and therapeutic targets for primary melanoma using network-based microarray data analysis.Methods Eligible microarray datasets from the Gene Expression Omnibus (GEO) database were used to identify differentially expressed genes (DEGs).The protein-protein interaction (PPI) network,Gene Ontology (GO),and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to identify hub genes and pathways that might affect the survival of melanoma patients.Immunohistochemistry results obtained from the Human Protein Atlas (HPA) database confirmed the protein expression levels of hub genes.The Cancer Genome Atlas (TCGA) database was used to further verify the gene expression levels and conduct survival analysis.Results Three microarray datasets (GSE3189,GSE15605,and GSE46517) containing 122 melanoma and 30 normal skin tissue samples were included.A total of 262 common differentially expressed genes (cDEGs) were identified based on three statistical approaches (Fisher’s method,the random effects model (REM),and vote counting) with strict criteria.Of these,two upregulated genes,centromere protein F (CENPF) and pituitary tumortransforming gene 1 (PTTG1),were selected as hub genes.HPA and TCGA database analyses confirmed that CENPF and PTTG1 were overexpressed in melanoma.Survival analysis showed that high expression levels of CENPF were significantly correlated with decreased overall survival (OS) (P=0.028).Conclusion The expression level of CENPF was significantly upregulated in melanoma and correlated with decreased OS.Thus,CENPF may represent a novel biomarker and therapeutic target for melanoma patients.
KEY WORDS Melanoma;Network-based analysis;Microarray;Centromere protein F;Biomarkers1
Malignant melanoma,the primary cause of death from skin cancer,is the most dangerous skin tumor with an increasing global incidence year after year[1-4].At present,several mechanisms of melanoma tumorigenesis and progression have been identified[5].Mutations in the BRAF gene are the most common mutations associated with melanoma,among which BRAF V600E has been detected in approximately 50% of patients[6-7].Although BRAF V600E inhibitors,such as vemurafenib[8],are commonly used in melanoma patients,chemotherapy can be unsuccessful due to drug resistance[9].Recently,scientists have developed immunotherapies and targeted therapies that have shown promise for improving the prognosis of melanoma patients[10].Many potential risk factors for melanoma have been explored,such as exposure to radiation,the number of melanocytic nevi,and genotoxic effects[11].Researchers have found that some genes are closely related to the pathogenesis of melanoma and the prognosis of patients.For example,Lu et al.found that the expression level of ubiquitinlike with PHD and ring finger domains 1 (UHRF1) is closely related to tumorigenesis,indicating that it may be a potential biomarker for melanoma diagnosis and prognosis[12].Xia et al.found that cyclin A2 (CCNA2) was related to the OS of melanoma patients,and the increased expression of tripartite motif-containing protein 32 (TRIM32) led to the increased OS and diseasefree survival risk[13].The identification of oncogenes associated with melanoma requires different therapeutic methods,and small molecule inhibitors can be applied to target specific receptors involved in the pathogenesis of melanoma[14].Therefore,it is particularly important to identify precise and effective biomarkers as well as therapeutic targets for melanoma,which may help to adjust the therapeutic regimen in a timely manner in order to improve the prognosis of melanoma patients.
In recent decades,the development of sequencing technology and high-throughput gene chips has enabled rapid analysis of the gene expression profile of cancer,revealing the underlying mechanisms between genomic background and cancer development[15].With the help of gene chip technology,bioinformatics allows researchers to elucidate valuable information from complex data through sequence alignment,statistical analysis,visualization mapping,biological process annotation,pathway analyses,and biomolecular networks.However,small sample sizes can lead to inflated and unrealistic results relative to the tens of thousands of investigated probes,exacerbating the situation.Combining similar microarray datasets from public databases can increase the generalizability and reliability of statistical results.Network-based meta-analysis provides an opportunity to significantly increase the statistical power and produce more accurate and robust findings,resulting in a more precise determination of DEGs[16].In addition,the development of bioinformatics tools for microarray data helps to integrate and visualize lists of DEGs.NetworkAnalyst (https://www.networkanalyst.ca/),a visual analytics platform for the comprehensive metaanalysis of gene expression data,allows researchers to integrate multiple microarray datasets and has been widely used in microarray-based meta-analysis[17].
In this study,we performed a network-based analysis with 3 eligible microarray datasets,including 122 melanoma and 30 normal skin tissue samples,from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) to identify vital DEGs in melanoma relative to normal skin tissues.PPI network and GO and KEGG pathway enrichment analyses were performed to identify hub genes and pathways that might affect the occurrence of melanoma.The HPA and TCGA databases were used to verify the protein and gene expression levels as well as to conduct survival analysis.These results may provide guidance for identifying new biomarkers and therapeutic targets for patients with melanoma.
Gene expression data for melanoma were downloaded from the GEO database.The search strategy was as follows:“expression profiling by array”(filter),“Homo sapiens”(organism),and“melanoma”(title).Only microarray studies (a) using mRNA samples from melanoma and normal skin and (b) containing three or more samples per group were included in our study.Studies that met any of the following criteria were excluded:1.non-microarray gene chips;2.genes originating from animals rather than humans;3.methylated gene chips;4.chips containing melanoma cell lines.Using these criteria,a total of three datasets (GSE3189,GSE15605,and GSE46517) met the inclusion criteria and were ultimately included for subsequent analysis.The GSE number,information regarding the expression platform,and the number of normal skin and melanoma samples were collected as the basic characteristics.
The network-based analysis was conducted using NetworkAnalyst software.NetworkAnalyst was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for meta-analysis[18].Using the NetworkAnalyst conversion tool,all gene probes were converted to Entrez and Symbol IDs according to the annotation information of the platform.All data from the three datasets were pre-processed using variance stabilizing normalization followed by quantile normalization.Box plots and principal component analysis (PCA) were used to visualize the distribution of samples in each dataset.DEGs from each dataset were explored using the NetworkAnalyst tool with an adjustedP<0.05 and |log fold change| >2.The“VennDiagram”package in R software (The R Foundation,Vienna,Austria) was used to identify the cDEGs among the three datasets.Volcano plots were used to show the distributions of the DEGs in each dataset.The ComBat procedures in NetworkAnalyst were used to stabilize the expression ratios of genes with exceptionally high or low ratios using empirical Bayes methods as well as individual gene variances by shrinking variances across all other genes[19].We also utilized them to reduce potential study-specific batch effects.
Each eligible dataset was uploaded,processed,and annotated to ensure the consistency of class labels and data formats across the three datasets.For microarraybased meta-analysis,three methods (Fisher’s method,REM,and vote counting) were applied with strict significance levels ofP<0.000 1,P<0.000 1,and vote number <2 plusP<0.000 1,respectively.Fisher’s method (-2∑Log(P)) is a widely used statistical approach in meta-analyses for combining p values from different studies,independent of the sample size.The REM was selected for network-based analysis according to the between-study heterogeneity based on Cochran’s Q test[20].In the REM,each study contained a random effect that could incorporate unknown cross-study heterogeneities into the model.Vote counting is a network-based analysis method in which DEGs are first selected based on a threshold (significance level ofP<0.05) to obtain a list of DEGs for each study.The vote for each gene can then be calculated as the total number of times it occurs in all DEG lists.The final DEGs were selected based on the minimum number of votes set by the users.DEGs from each method were identified,and a Venn plot was created using the Venn online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) to identify the cDEGs among the three methods.
We ranked the cDEGs according to the absolute value of the combined effect size (ES);thus,the top 10 upregulated and downregulated genes were selected.The“interactive heatmaps”tool in NetworkAnalyst was applied to visualize the differential expression levels of the genes among the various samples.
To investigate the potential functions of the identified cDEGs,GO and KEGG pathway analyses were performed using the Database for Annotation,Visualization,and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/home.jsp).The GO terms refer to the following items:biological process (BP),cellular component (CC),and molecular function (MF).GO analysis uses hypergeometric tests to perform enrichment analysis of gene sets[21].KEGG pathway analysis aims to identify and visualize significantly enriched pathways of molecular interactions,relationships,and reactions[22].The Cytoscape plug-in BiNGO was used to further perform GO enrichment analyses to better visualize the biological significance of cDEGs[23,24],which could help to reveal the interactions of key proteins and provide clues regarding the upstream and downstream mechanisms of protein action and clinical drug development.
Using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database[25],we constructed a PPI network of the cDEGs found in our research.The modification and visualization of the network were performed by Cytoscape 3.7.2 based on Java software[23].The degree and betweenness centrality of nodes have been widely used to identify hub genes in Cytoscape[26].Genes with a higher degree or betweenness values were considered as potentially important hub genes.Genes were ranked by degree,and hub genes were considered genes with at least 10 node degrees.
The protein expression levels of hub genes were determined using HPA,a website that contains immunohistochemistry-based expression data for the 20 most common types of cancers and the 12 most common individual tumors in each cancer type[27].
Gene Expression Profiling Interactive Analysis (GEPIA) is a newly developed interactive web server for analyzing the RNA sequencing and expression data of 9,736 tumors and 8,587 normal samples from the TCGA database and the Genotype-Tissue Expression project using a standard processing pipeline[28].To verify the mRNA expression levels of the hub genes in melanoma and normal skin samples,box plots were produced using GEPIA.To explore the effects of hub genes on OS,Kaplan-Meier curves were constructed using GEPIA based on the expression profiles in melanoma tissue samples and the corresponding survival times extracted from TCGA.The median expression level of each hub gene was set as the cutoff value,and TCGA cohorts were split into low-and high-expression groups.Log-rank tests and hazard ratios with 95% confidence intervals were used to assess the differences between groups of melanoma patients exhibiting low or high expression levels of hub genes.
The network-based analysis was performed using NetworkAnalyst software.The significance levels of the three methods (Fisher’s method,REM,and vote counting) wereP<0.000 1,P<0.000 1,and vote number <2 plusP<0.000 1,respectively.The Benjamini-Hochberg application of the false discovery rate was used to correct eachP-value.Significantly enriched GO terms and KEGG pathways were identified using hypergeometric tests with aP-value threshold of <0.05.A log-rankP<0.05 was considered to indicate a statistically significant difference in the survival analysis.
Three datasets (GSE3189,GSE15605,and GSE46517) with 122 melanoma and 30 normal skin samples from the GEO database were included in the present study.PCA showed that the datasets were notably disassociated prior to the batch effect adjustment and well intermixed after the ComBat procedures,which ensured the accuracy of the current study (Fig.S1).The characteristics of the three datasets are summarized in Table 1,and the overall workflow is shown in Fig.1.
Table 1 Characteristics of the three microarray datasets included in the network-based analysis
Fig.1 Workflow of the network-based microarray analysis
Using the“Gene Expression Table”of NetworkAnalyst,an integrative analysis across datasets was performed by exploring the DEGs in each dataset.There were 575 617,and 90 DEGs in GSE3189,GSE15605,and GSE46517,respectively.Volcano plots were used to show the distribution of DEGs in each dataset (Fig.S2),and the Venn plot demonstrated that 39 DEGs were common among the three datasets (Fig.2A).
Fig.2 Profile of cDEGs identified in the network-based analysis
Three microarray datasets were analyzed using NetworkAnalyst,a web interface for the integrative analysis of microarray data.Three individual methods (Fisher’s method,REM,and voting count) with stringent significance criteria (combinedP<0.000 1 or vote counts ≥ 2) were applied.A total of 4 723 746,and 808 DEGs were identified,respectively.A Venn plot was used to select 262 cDEGs (Fig.2B).Among these,144 (54.96%) genes were upregulated,and 118 (45.04%) genes were downregulated in melanoma patients.In the combined datasets,SRY-box transcription factor 10 (SOX10),myosin VA (MYO5A),and solute carrier family 45 member 2 (SLC45A2) were the top 3 upregulated genes,while tetraspanin 8 (TSPAN8),PDZ domain-containing 2 (PDZD2),and poly(A) binding protein-interacting protein 2B (PAIP2B) were the top 3 downregulated genes.The top 10 upregulated and downregulated cDEGs identified in the present study are shown in Table 2.A heatmap of these cDEGs is illustrated in Fig.2C,which shows a profound gene expression difference between melanoma and normal skin.
To identify the potential functions of the cDEGs,we performed GO and KEGG pathway analyses using the DAVID database.Significantly enriched KEGG pathways and GO terms were identified using hypergeometric tests with aP-value threshold of <0.05.The top five KEGG and GO (BP,CC,and MF) terms with the most significant differences are shown in Fig.3.Using BiNGO in Cytoscape,the GO terms of BP,CC,and MF with their associations were further visualized,and we were able to obtain a global perspective of the changes in gene expression patterns (Fig.4).
The STRING database was used to set up the PPI network and search for hub genes that could potentially serve as therapeutic targets or biomarkers for melanoma.To provide a clear overview of the PPI network,Cytoscape was employed for network modification and visualization.The color and size of the nodes were set according to their degrees while the edge size was determined usingthe combined scores.The overall PPI network is shown in Fig.5.The cDEGs between melanoma and normal skin were ranked according to their degrees.Targets with at least 10 node degrees were selected as the hub genes.Finally,a total of two hub genes,centromere protein F (CENPF) and pituitary tumor-transforming gene 1 (PTTG1),were identified (Table 3).Both were upregulated in melanoma patients with a degree of 10.The interaction networks of each hub gene are displayed in Fig.S3.
Table 2 Top 10 upregulated and downregulated cDEGs identified in the network-based analysis of melanoma vs.normal skin
Immunohistochemistry using the HPA database showed that both CENPF and PTTG1 were overexpressed at the protein expression level in melanoma samples (Fig.6),which further confirmed the above results.
To validate the differential expression of hub genes in melanoma and normal skin samples,a web-based GEPIA search was implemented.Box plots were generated,which verified that the expression levels of CENPF and PTTG1 were significantly increased in melanoma samples compared with those in normal skin samples (P<0.05) (Fig.7A-B).Furthermore,we explored the effect of the two hub genes on the OS of patients with melanoma.Kaplan-Meier analysis of these genes was performed using GEPIA (Fig.7C-D).The results revealed that the increased expression level of CENPF was significantly correlated with a poor OS (P<0.05).However,there was no significant difference in patient survival between the high and low PTTG1 expression groups (P=0.28).Therefore,high expression of CENPF may represent a poor prognostic factor for patients with melanoma.Fig.8 shows that CENPF was also upregulated in many other cancer types.
Fig.3 Bubble plots of the top five KEGG (A),GO:BP (B),GO:CC (C),and GO:MF (D) pathways/terms with the most significant differences
Fig.4 Overall PPI network of the cDEGs
Fig.5 GO analysis of significant cDEGs with their associations determined using BiNGO in Cytoscape
Table 3 Hub genes identified in the network-based analysis of melanoma vs.normal skin
The incidence of primary cutaneous melanoma is increasing,accounting for 232 100 (1.7%) of all newly diagnosed primary malignant tumors annually[29].However,the pathogenesis of melanoma remains unclear.The development of high-throughput genomic technologies has strengthened our understanding of the complex mechanisms underlying melanoma development and its genomic background.Notably,the results from conventional bioinformatics analysis may lack consistency,reproducibility,and robustness across studies[30].Combining information from multiple microarray datasets would increase the generalizability and reliability of bioinformatics analysis[22].In the present study,three eligible microarrays from the GEO database were selected to identify hub genes in melanoma compared to normal skin tissues,which were further validated using the HPA and TCGA databases.To our knowledge,this is the first attempt to conduct such a multi-dataset study in melanoma research.
Fig.7 Gene expression levels (A-B) and Kaplan-Meier curves (C-D) of CENPF and PTTG1 in GEPIA
Fig.8 Expression levels of CENPF in systemic tissues and various malignant tumors
In our network-based analysis,all of the original datasets underwent strict pre-treatment criteria,which ensured the reality and reliability of our study.By combining three statistical approaches,a total of 262 cDEGs were identified based on stringent standards.Moreover,GO and KEGG pathway analyses were performed to identify the potential functions of the cDEGs.The top five KEGG terms were“Metabolic pathways”,“inositol phosphate metabolism”,“phosphatidylinositol signaling system”,“synaptic vesicle cycle”,and“aldosterone-regulated sodium reabsorption”.Notably,the most enriched GO terms were“response to drug”(BP),“extracellular exosome”(CC),and“protein binding”(MF).These results partly explain why melanoma is prone to metastasis but relatively sensitive to immunotherapy.
Two targets,CENPF and PTTG1,met the criteria and were further selected as hub genes.Both genes were upregulated in melanoma tissues.Subsequently,verification studies were conducted in the TCGA and HPA databases to ensure the robustness of the results.The results of immunohistochemical tissue staining from the HPA,together with the transcript results from TCGA,showed that CENPF and PTTG1 were both upregulated in melanoma,which was in agreement with our analysis.In addition,survival analyses of the two hub genes were performed with the help of GEPIA,which demonstrated that the expression level of CENPF was significantly correlated with unfavorable OS in melanoma patients (P=0.028),indicating that patients with increased expression levels of CENPF may have a poor prognosis.
Previous studies have reported the role of CENPF in other cancer types.For example,Yang et al.[31]found that lymphoid-specific helicase was overexpressed in hepatocellular carcinoma (HCC) and promoted HCC tumor growth through transcriptional regulation of CENPF expression.Chen et al.[32]found that CENPF was upregulated in breast cancer cells and had a positive effect on cellular functions,including proliferation,migration,and invasion.CENPF has also been identified as an effective biomarker in non-muscle-invasive bladder cancer[33].However,to date,there have been no basic or extensive data studies examining the role of CENPF in melanoma.Through GO analysis,we found that CENPF was associated with“response to drug”(BP),“cytosol”(CC),and“protein binding”(MF),which provided clues for exploring effective biomarkers and therapeutics for melanoma.
Nonetheless,several limitations of the current study must be acknowledged.First,although some of the cDEGs have been suggested to be linked with melanoma pathophysiology,novel targets could also be identified from the network-based analysis that may have biological significance in melanoma initiation and development.Second,due to the limited research conditions,we could not performin vitroorin vivoexperiments to validate the expression level of CENPF in melanoma samples.Third,although all of the eligible datasets in GEO were applied,the sample size was still rather small compared to the thousands of genes investigated.Therefore,sample selection bias may have influenced the final results.Finally,although we set strict inclusion criteria,removed ineligible microarrays,and adjusted batch effects,heterogeneity remained,which may have exerted a negative effect on our analysis.
In summary,this study is the first to use network-based analysis for biomarker discovery in melanoma.We combined three microarray datasets and applied three statistical approaches with strict criteria.The precise statistical methods,stringent statistical standards,and cross-testing of multiple databases provided stable and credible results.The results indicated that CENPF was significantly upregulated in melanoma samples and correlated with a poor OS of patients.Thus,it may serve as a potential biomarker and therapeutic target for melanoma.
This study was funded by the National Natural Science Foundation of China (grant no.81972559) and the Shanghai Shenkang Hospital Development Center Project (project no.HDC2020CR2067B).
Ethics Approval and Consent to Participate
Not applicable.
Consent for Publication
All the authors have consented to the publication of this article.
Competing Interests
The authors declare that they have no competing interests.The authors state that the views expressed in the article are their own and not the official position of the institution or funder.
JY G and ZH F contributed to the conception of the study.KJ S,CY W,and Y X performed the bioinformatics analysis and wrote the manuscript.L W and SY W polished the language and revised the manuscript.M R and XY D adjusted the figures and tables.DH W and ZX G helped perform the analysis with constructive discussions.
Supplementary Figures
Fig.S1 Illustration of PCA plots as validation tools for batch effect removal
Fig.S2 Volcano plots of DEG distribution in GSE3189,GSE15605,and GSE46517
Fig.S3 Interaction networks of PTTG1 (A) and CENPF (B)
Chinese Journal of Plastic and Reconstructive Surgery2020年4期