Zhaowei Xu, Yang Li, Qing Lei, Likun Huang, Dan-yun Lai, Shu-juan Guo, He-wei Jiang, Hongyan Hou, Yun-xiao Zheng, Xue-ning Wang, Jiaoxiang Wu, Ming-liang Ma, Bo Zhang, Hong Chen, Caizheng Yu, Jun-biao Xue, Hai-nan Zhang, Huan Qi, Siqi Yu, Mingxi Lin, Yandi Zhang, Xiaosong Lin, Zongjie Yao, Huiming Sheng, Ziyong Sun, Feng Wang*, Xionglin Fan*, Sheng-ce Tao*
1 Shanghai Center for Systems Biomedicine, Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Jiao Tong University, Shanghai 200240, China
2 Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China
3 Department of Pathogen Biology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
4 Fujian Key Laboratory of Crop Breeding by Design, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou 350028, China
5 Department of Clinical Laboratory, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
6 Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
7 Department of Public Health, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract Coronavirus disease 2019 (COVID-19), which is caused by SARS-CoV-2, varies in symptoms and mortality rates among populations. Humoral immunity plays critical roles in SARS-CoV-2 infection and in recovery from COVID-19. However, differences in immune responses and clinical features among COVID-19 patients remain largely unknown. Here, we report a database of COVID-19-specific IgG/IgM immune responses and clinical parameters, named COVID-ONE-hi. COVID-ONE-hi is built on data describing the IgG/IgM responses of 2360 serum samples collected from 783 COVID-19 patients, profiled against 24 full-length/truncated proteins corresponding to 20 of the 28 known SARS-CoV-2 proteins and 199 spike protein peptides. In addition, 96 clinical parameters for the 2360 serum samples and basic information for the 783 patients are integrated into the database. Furthermore, COVID-ONE-hi provides a dashboard for defining samples and a one-click analysis pipeline for a single group or paired groups. A set of samples of interest is easily defined by adjusting the scale bars of a variety of parameters. After the “START” button is clicked, a comprehensive analysis report is generated for further interpretation. COVID-ONE-hi is freely available at www.COVID-ONE.cn.
COVID-19 is an unprecedented global threat caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which had caused 209,308,033 infections and 4,393,014 deaths as of August 19, 2021 (https://coronavirus.jhu.edu/map.html) [1]. However, there is still no effective medicine for COVID-19 [2,3].
Most patients recover through their own immune responses, including SARS-CoV-2-specific IgG responses, especially neutralizing antibodies [4–6]. It is therefore of great interest to decipher SARS-CoV-2-specific IgG/IgM responses at the system level and to correlate antibody responses with clinical parameters.
To understand how the human immune system responds to SARS-CoV-2, we previously constructed a SARS-CoV-2 proteome microarray containing 18 of the 28 predicted proteins and applied it to characterize IgG and IgM antibodies in the sera of 29 convalescent patients [7]. Recently, we upgraded the SARS-CoV-2 protein microarray; the new microarray contains 24 full-length/truncated proteins corresponding to 20 known SARS-CoV-2 proteins and 199 peptides fully covering the spike protein [8]. Using this microarray, we screened 2360 serum samples from 783 COVID-19 patients, covering mild, severe, and critical cases. We thus compiled a dataset with comprehensive information on SARS-CoV-2-specific antibody responses and rich clinical parameters.
To share this dataset efficiently, in addition to the related studies we have already published [9–13], we built a database of COVID-19-specific humoral immune responses and clinical parameters, COVID-ONE-hi (www.covid-one.cn), using Shiny. The database contains a comprehensive dataset of IgG and IgM responses to the 24 full-length/truncated proteins corresponding to 20 known SARS-CoV-2 proteins and 199 spike protein peptides from a cohort of 783 COVID-19 patients. To bolster clinical relevance, 96 clinical parameters and basic patient information are also included. COVID-ONE-hi provides search, data analysis, and visualization functions; in particular, it integrates antibody response landscape analysis, correlation analysis, and machine learning, among other analyses. In the data analysis module, users can define one sample group or a pair of sample groups of interest by adjusting scale bars. In-depth analysis is launched with a single click, and the results can optionally be saved and downloaded as an independent package for further analysis.
To our knowledge, COVID-ONE-hi is the first database of COVID-19-specific humoral immune responses. We believe that COVID-ONE-hi will be of broad interest and will facilitate understanding of immune responses in COVID-19, helping to combat the pandemic.
COVID-ONE-hi is a Shiny (v1.5.0)-based database. shinydashboard (v0.7.1) and shinyBS (v0.61) were used to build the user interface, and the DT package (v0.15) was used to format data tables. For data analysis, dplyr (v1.0.2), tidyverse (v1.3.0), randomForest (v4.6-14), pROC (v1.16.2), and umap (v0.2.6.0) were integrated into Shiny. pheatmap (v1.0.12) and ggplot2 (v3.3.2) were used for plotting. The basic environment consists of the Ubuntu 20.04 LTS operating system and R 3.6.3.
To calculate the antibody response rate for each protein, the mean plus 2 standard deviations (SD) of the control sera was set as the positivity cut-off. R was used for most data analysis and plotting, including Pearson correlation, receiver operating characteristic (ROC) analysis, t-tests, cluster analysis, and machine learning.
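The cut-off rule above can be sketched in a few lines. This is a minimal Python illustration (the database itself is written in R), assuming the sample standard deviation and a strict "greater than the cut-off" rule for positivity; neither detail is specified in the text, and the signal values are toy numbers, not data from COVID-ONE-hi.

```python
from statistics import mean, stdev


def positivity_cutoff(control_signals):
    """Cut-off = mean + 2 SD of the control sera (sample SD assumed)."""
    return mean(control_signals) + 2 * stdev(control_signals)


def positive_rate(patient_signals, control_signals):
    """Fraction of patient sera whose signal exceeds the control cut-off."""
    cutoff = positivity_cutoff(control_signals)
    n_positive = sum(1 for s in patient_signals if s > cutoff)
    return n_positive / len(patient_signals)


# Toy signal intensities for a single protein (arbitrary units, illustrative only).
controls = [1.0, 1.2, 0.8, 1.1, 0.9]
patients = [0.9, 2.5, 3.1, 1.0, 4.2]

print(round(positivity_cutoff(controls), 3))  # 1.316
print(positive_rate(patients, controls))      # 0.6
```

Applying the same function per protein and per sample group yields the kind of positive-rate landscape reported in the case studies.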
In this study, we collected 2360 serum samples from 783 patients (387 males and 396 females) with an average age of 61.4 years and an average onset time (days after symptom onset) of 50 days. Among these 783 patients, there were 369 mild, 309 severe, and 105 critical cases; 723 patients recovered and 60 died (Figure 1A; Table 1, Table S1).
Table 1 Clinical information of the enrolled patients
Figure 1 Overview of data resources and functional modules of COVID-ONE-hi
To systematically analyze immune responses to SARS-CoV-2 infection, we screened the 2360 serum samples using the SARS-CoV-2 protein microarray, which contains 24 full-length/truncated proteins corresponding to 20 known SARS-CoV-2 proteins and 199 peptides fully covering the spike protein. Additionally, we analyzed 89 blood parameters for the 2360 serum samples (including complete blood count, blood chemistry, and blood enzyme tests). We thus obtained a comprehensive dataset that contains both COVID-19-specific humoral immune responses and clinical parameters.
By combining clinical information, IgG/IgM immune responses, and blood parameters, we established a database (COVID-ONE-hi) that provides a one-stop analysis pipeline for COVID-19-specific humoral immune responses and clinical parameters (Figure 1B). To help users access more COVID-19 serum profiling data, we set up a page named “More studies” on the COVID-ONE-hi website to archive other closely related COVID-19 serum profiling data (protein/peptide microarray/phage display) [14–19]. In addition, a healthy control dataset, which contains the IgG and IgM responses of 528 healthy individuals against the 24 full-length/truncated proteins and 199 spike protein peptides, was added to the “HELP” page (Table S2).
The analysis module works in three steps: 1) users select a set of samples in the patient information panel and click “START”; 2) COVID-ONE-hi filters candidate samples according to the given parameters; and 3) COVID-ONE-hi conducts the analysis and displays the results on the webpage.
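The filtering step can be pictured as range and category filters applied to per-sample records. The sketch below is hypothetical: the `Sample` fields and the `select_samples` helper are illustrative names, not the actual schema or code of COVID-ONE-hi. Numeric ranges play the role of the scale bars, and the usage mirrors the paired-group example (Case II, severe/critical males vs. severe/critical females).

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical per-sample record; field names are illustrative only.
    sample_id: str
    sex: str
    age: float
    severity: str
    days_after_onset: int


def select_samples(samples, ranges=None, categories=None):
    """Keep samples whose numeric fields lie within [low, high] (like a
    scale bar) and whose categorical fields take one of the allowed values."""
    ranges = ranges or {}
    categories = categories or {}
    selected = []
    for s in samples:
        in_range = all(low <= getattr(s, field) <= high
                       for field, (low, high) in ranges.items())
        in_category = all(getattr(s, field) in allowed
                          for field, allowed in categories.items())
        if in_range and in_category:
            selected.append(s)
    return selected


cohort = [
    Sample("S001", "male", 70, "severe", 30),
    Sample("S002", "female", 55, "mild", 45),
    Sample("S003", "female", 66, "critical", 20),
    Sample("S004", "male", 48, "mild", 60),
]

# Paired groups: severe/critical males vs. severe/critical females.
severe = {"severity": {"severe", "critical"}}
group1 = select_samples(cohort, categories={**severe, "sex": {"male"}})
group2 = select_samples(cohort, categories={**severe, "sex": {"female"}})
print([s.sample_id for s in group1], [s.sample_id for s in group2])
```

A single-group analysis is the same call with one filter set, for example `select_samples(cohort, ranges={"age": (60, 80)})`.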
To demonstrate how to use COVID-ONE-hi, we present two example cases: a single-group analysis and a paired-group analysis.
To study the features of deceased COVID-19 patients, we selected “death” as the outcome parameter in the single-group analysis module. This cohort contained 392 serum samples from 60 patients (38 males vs. 22 females) with an average age of 69.6 years (Table 2). IgG response landscape analysis of SARS-CoV-2 proteins showed that the positive rates for the S1 subunit of the spike protein (S1 protein), N protein, and ORF3b were 95%, 93%, and 87%, respectively, consistent with previous studies [20,21] (Figure 2A). Interestingly, NSP7 had an IgG-positive rate of 88%, suggesting that NSP7 may play an important role in COVID-19 (Figure 2A). In addition, the spike peptide S1-45 had the highest IgM-positive rate (87%), indicating that the region including S1-45 may play an important role in IgM immunity (Figure S1).
Table 2 Serum sample information of Case I
Correlation analysis of clinical parameters showed that the neutrophil count was negatively correlated with the monocyte count and the lymphocyte ratio (Figure 2B). In addition, correlation analysis of IgG responses revealed high correlations between the S1 IgG response and IgG responses to the full-length/truncated N proteins, with the highest correlation between the S1 and N-Cter IgG responses (Figure 2C and D). To study factors influencing S1 antibody production, we analyzed the correlation between the S1 IgG response and clinical parameters and found that the S1 IgG response correlated with globulin (Figure 2D).
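For readers who want to reproduce this kind of correlation screen outside the web interface, the sketch below computes Pearson's r from its definition and ranks candidate features by their correlation with a target response. The feature names echo the text, but the values are made-up toy numbers, and `rank_correlates` is a hypothetical helper rather than a routine of COVID-ONE-hi.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_correlates(target, features):
    """Return (name, r) pairs sorted from strongest positive to most negative."""
    scores = [(name, pearson_r(target, values)) for name, values in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy IgG signals for five sera (illustrative values, not real data).
s1_igg = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "N-Cter": [1.1, 2.1, 2.9, 4.2, 4.9],
    "N-full": [2.0, 1.5, 3.5, 3.0, 5.5],
    "NSP7": [3.0, 1.0, 2.0, 2.5, 1.5],
}
for name, r in rank_correlates(s1_igg, features):
    print(f"{name}: r = {r:.3f}")
```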
Previous studies have shown that gender has a considerable effect on the severity and outcome of COVID-19 [22,23] and is associated with underlying differences in immune responses to infection [24]. To study differences in IgG/IgM immune responses and clinical parameters between genders, we defined severe and critical male patients as Group 1 and severe and critical female patients as Group 2, comprising 231 males with an average age of 64.3 years and 183 females with an average age of 68.1 years. Consistent with previous studies [25], males had a higher risk of severe/critical COVID-19 than females (231/387 vs. 183/396, P < 0.001) (Tables 3 and 4).
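As a quick cross-check of the reported gender difference, the counts given above (231 of 387 males and 183 of 396 females with severe/critical disease) can be arranged as a 2 x 2 table. The sketch below computes the crude odds ratio and an uncorrected Pearson chi-square test with 1 degree of freedom. This is a simpler, unadjusted test swapped in for illustration; it is not the binary logistic regression summarized in Table 4.

```python
from math import erfc, sqrt


def two_by_two_test(a, b, c, d):
    """Crude odds ratio and Pearson chi-square (1 df, no continuity correction).

    a, b: severe/critical and mild counts in group 1 (males)
    c, d: severe/critical and mild counts in group 2 (females)
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


male_severe, male_total = 231, 387
female_severe, female_total = 183, 396
or_, chi2, p = two_by_two_test(male_severe, male_total - male_severe,
                               female_severe, female_total - female_severe)
print(f"OR = {or_:.2f}, chi2 = {chi2:.2f}, P = {p:.1e}")
```

With these counts the crude odds ratio is about 1.72 and the chi-square P value falls well below 0.001, in line with the reported result.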
Table 3 Serum sample information of Case II
Table 4 Binary logistic regression parameters for severity in association with gender among COVID-19 patients
UMAP analysis showed no overall difference in IgG immunity between the 387 males and 396 females (Figure 3A). To explore gender-related differences in disease mechanisms, we performed in-depth analyses of antibody responses and blood parameters using COVID-ONE-hi. The antibody response landscape showed that male patients had higher IgG-positive rates than females for ORF9b, RdRp, and NSP1 (Figure 3B). Moreover, longitudinal analysis of antibody dynamics showed that, compared with females, males had a stronger ORF9b IgG response throughout the period after symptom onset and a stronger NSP1 IgG response during the early stage after symptom onset, but no significant difference in the RdRp IgG response (Figure 3C). ORF9b has been considered a drug target for the treatment of COVID-19 because it suppresses type I interferon responses [26–28]. To explore the relationship between ORF9b antibody responses and COVID-19 severity, we compared ORF9b IgG responses between mild and severe/critical cases within each gender. Among males, severe/critical cases showed a higher ORF9b IgG response than mild cases, whereas among females no significant difference was observed between mild and severe/critical cases (Figure 3D).
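Longitudinal comparisons of this kind require binning each serum by time since symptom onset. The sketch below groups hypothetical (gender, days after onset, signal) records into weekly bins and reports the median signal per bin along with the peak week. Weekly bins and medians are assumptions made for illustration, since the text does not state how COVID-ONE-hi aggregates the dynamics, and the records are toy values.

```python
from collections import defaultdict
from statistics import median


def weekly_profile(records):
    """Map each group to {week after onset: median signal}.

    records: iterable of (group, days_after_onset, signal) tuples.
    Week 1 covers days 0-6, week 2 covers days 7-13, and so on.
    """
    bins = defaultdict(lambda: defaultdict(list))
    for group, days, signal in records:
        bins[group][days // 7 + 1].append(signal)
    return {group: {week: median(values) for week, values in sorted(weeks.items())}
            for group, weeks in bins.items()}


def peak_week(profile):
    """Week with the highest median signal in one group's profile."""
    return max(profile, key=profile.get)


# Toy ORF9b IgG signals (illustrative values only).
records = [
    ("male", 3, 1.2), ("male", 10, 2.8), ("male", 12, 3.4), ("male", 40, 2.0),
    ("female", 2, 0.9), ("female", 9, 1.5), ("female", 11, 1.9), ("female", 41, 1.1),
]
profiles = weekly_profile(records)
for group, profile in profiles.items():
    print(group, profile, "peak week:", peak_week(profile))
```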
To further decipher differences between female and male COVID-19 patients, we applied random forest machine learning. Creatinine, a marker of acute kidney injury, emerged as the most important factor distinguishing males from females (Figure 4A). To explore the relationship between creatinine and gender in COVID-19, we compared median and dynamic creatinine levels between males and females and found that both were significantly higher in males than in females (Figure 4B and C). To explore the relationship between creatinine and COVID-19 severity, we compared the dynamic creatinine levels between mild and severe/critical cases in males and females, respectively. Similar to ORF9b IgG responses, male patients with severe/critical COVID-19 had higher creatinine levels (Figure 4D). Hence, ORF9b antibodies and creatinine are associated with severe/critical disease in male COVID-19 patients, suggesting differences in pathogenesis and complications between male and female COVID-19 patients.
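The random forest step ranks clinical parameters by how well they separate the two groups. As a heavily simplified stand-in for the R randomForest package listed among the database's tools, the sketch below bags depth-1 decision trees (stumps), each fitted to a bootstrap sample using a randomly chosen subset of features, and accumulates each feature's Gini impurity decrease as its importance. Real random forests grow much deeper trees, the database's exact importance measure is not stated in the text, and the feature names and values here are synthetic, so this only illustrates the idea of impurity-based importance.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_stump(X, y, feature_ids):
    """Best single-threshold split among feature_ids: (feature, threshold, gain)."""
    parent, n = gini(y), len(y)
    best = (None, None, 0.0)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - child > best[2]:
                best = (f, t, parent - child)
    return best


def stump_forest_importance(X, y, n_trees=100, max_features=1, seed=0):
    """Normalized impurity-decrease importance from bagged decision stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(p), max_features)    # random feature subset
        f, _, gain = best_stump(Xb, yb, feats)
        if f is not None:
            importance[f] += gain
    total = sum(importance) or 1.0
    return [v / total for v in importance]


# Synthetic data: "creatinine" separates the groups, "noise" does not.
rng = random.Random(1)
X, y = [], []
for label, (low, high) in (("male", (80, 110)), ("female", (50, 75))):
    for _ in range(20):
        X.append([rng.uniform(low, high), rng.uniform(0, 1)])
        y.append(label)
names = ["creatinine", "noise"]
for name, score in zip(names, stump_forest_importance(X, y)):
    print(f"{name}: {score:.2f}")
```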
In this study, we built COVID-ONE-hi, a COVID-19-specific database, using R Shiny. COVID-ONE-hi is based on a comprehensive dataset generated by analyzing 2360 COVID-19 sera with the SARS-CoV-2 protein microarray, which contains 24 full-length/truncated proteins corresponding to 20 of the 28 known SARS-CoV-2 proteins and 199 peptides covering the entire spike protein sequence.
Several published studies have identified the clinical characteristics, biomarkers, and specific antibody responses of diverse COVID-19 patients (Table S3). To validate our dataset, we compared SARS-CoV-2-specific antibody responses with those reported in other studies at different levels. At the protein level, we analyzed the dynamic responses to the S1 and N proteins. Responses to the S1 and N proteins peaked at 6 weeks after symptom onset for IgG and at 4 weeks for IgM, consistent with previous studies [18,20] (Figure S2). At the peptide level, we compared IgG recognition of immunodominant regions in the SARS-CoV-2 spike protein and found that several high-response regions we identified [12] are consistent with those identified by Shrock et al. [14]: aa 25–36, aa 553–588, aa 770–829, aa 1148–1159, and aa 1256–1273. Another hot spot (aa 451–474) was detected only in our study. Regarding antibody-based diagnosis, Assia et al. [19] achieved area under the curve (AUC) values of 0.986 for IgG and 0.988 for IgM for detecting prior SARS-CoV-2 infection by combining N and spike proteins. In our study, the AUC values of the N protein for IgG and IgM are 0.995 and 0.988, respectively, and those of the S1 protein are 0.992 for both IgG and IgM. We also found that S2-78 (aa 1148–1159) IgG performs comparably to S1 IgG in identifying COVID-19 patients, with AUC values of 0.99 for IgG and 0.953 for IgM [11].
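The AUC values quoted above summarize how well antibody signals separate patients from controls. Independently of pROC, the empirical AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties counting one half), which is the Mann-Whitney U statistic divided by the number of positive-negative pairs. The minimal sketch below computes it directly; the scores are toy values, not COVID-ONE-hi data.

```python
def empirical_auc(positive_scores, negative_scores):
    """ROC AUC as P(score_pos > score_neg), with ties counted as 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    This pairwise version is O(n_pos * n_neg), which is fine for small sets.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy S1 IgG signals for patient and healthy-control sera (illustrative only).
patients = [3.0, 4.0, 5.0, 2.0]
controls = [1.0, 2.0, 1.5]
print(round(empirical_auc(patients, controls), 4))
```

For large cohorts, a rank-based computation of U avoids the quadratic pairwise loop while giving the same value.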
To our knowledge, COVID-ONE-hi is the first database of COVID-19-specific immune responses enriched in clinical parameters, and it has the following features. 1) Universality: COVID-ONE-hi contains data from 783 COVID-19 patients classified by their medical history (Table S4), and thus will be of broad interest to researchers and clinicians from diverse backgrounds. 2) Accessibility: COVID-ONE-hi provides a one-stop analysis pipeline through which users can easily obtain meaningful information. 3) Scalability: COVID-ONE-hi is built on the freely accessible R platform, for which many modular tools are readily available; thus, new analyses can easily be incorporated whenever necessary without changing the overall structure of the database. Nonetheless, COVID-ONE-hi has some limitations. For example, it lacks data for convalescent patients, peptide-level humoral responses to proteins other than the S protein, and multicentre samples. In the future, we will analyze the dynamic responses of SARS-CoV-2-specific antibodies using ~500 serum samples from ~100 convalescent COVID-19 patients. We will also integrate published peptide microarray/phage display data [14–16,29] and attempt to extend the database to cover the whole SARS-CoV-2 proteome at the peptide or amino acid level. In addition, the SARS-CoV-2 protein microarray has already been promoted by CDI Labs (www.cdi.bio) and ArrayJet (www.arrayjet.co.uk), and we anticipate more diverse data on SARS-CoV-2-specific antibody responses from multicentre samples. We strongly believe that by sharing a large dataset and facilitating data analysis, COVID-ONE-hi will be a valuable resource for COVID-19 research.
The study was approved by the Ethical Committee of Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China (ITJ-C20200128). Written informed consent was obtained from all participants enrolled in this study.
COVID-ONE-hi is freely accessible at www.covid-one.cn. Users who need the raw antibody response or clinical parameter data may contact the corresponding author (taosc@sjtu.edu.cn).
Zhaowei Xu: Software, Formal analysis, Writing - original draft. Yang Li: Methodology, Formal analysis, Investigation. Qing Lei: Methodology, Formal analysis, Investigation. Likun Huang: Software. Dan-yun Lai: Methodology, Formal analysis, Validation, Investigation. Shu-juan Guo: Methodology, Formal analysis. He-wei Jiang: Methodology, Investigation. Hongyan Hou: Resources. Yun-xiao Zheng: Formal analysis. Xue-ning Wang: Formal analysis. Jiaoxiang Wu: Resources. Ming-liang Ma: Formal analysis. Bo Zhang: Resources. Hong Chen: Formal analysis. Caizheng Yu: Resources. Jun-biao Xue: Formal analysis. Hai-nan Zhang: Methodology, Investigation. Huan Qi: Formal analysis. Siqi Yu: Formal analysis. Mingxi Lin: Formal analysis. Yandi Zhang: Investigation. Xiaosong Lin: Investigation. Zongjie Yao: Investigation. Huiming Sheng: Resources. Ziyong Sun: Investigation. Feng Wang: Resources. Xionglin Fan: Conceptualization, Investigation. Sheng-ce Tao: Conceptualization, Methodology, Writing - review & editing, Supervision. All authors have read and approved the final manuscript.
Competing interests
The authors declare no competing interests.
Acknowledgments
This work was partially supported by the National Key R&D Program of China Grant (Grant No. 2016YFA0500600) and the National Natural Science Foundation of China (Grant Nos. 31970130, 31600672, 31900112, 21907065, and 32000027).
Supplementary material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.gpb.2021.09.006.
ORCID
ORCID 0000-0002-9134-1892 (Zhaowei Xu)
ORCID 0000-0002-3182-6169 (Yang Li)
ORCID 0000-0002-6679-1752 (Qing Lei)
ORCID 0000-0002-3377-8340 (Likun Huang)
ORCID 0000-0001-8719-6947 (Dan-yun Lai)
ORCID 0000-0002-4252-0608 (Shu-juan Guo)
ORCID 0000-0001-8700-7042 (He-wei Jiang)
ORCID 0000-0003-2337-0509 (Hongyan Hou)
ORCID 0000-0001-8461-8884 (Yun-xiao Zheng)
ORCID 0000-0001-8550-7166 (Xue-ning Wang)
ORCID 0000-0002-2309-4164 (Jiaoxiang Wu)
ORCID 0000-0002-0045-4876 (Ming-liang Ma)
ORCID 0000-0002-1441-3100 (Bo Zhang)
ORCID 0000-0002-5412-7467 (Hong Chen)
ORCID 0000-0003-4583-9339 (Caizheng Yu)
ORCID 0000-0003-2029-7430 (Jun-biao Xue)
ORCID 0000-0002-1058-0462 (Hai-nan Zhang)
ORCID 0000-0002-3800-0140 (Huan Qi)
ORCID 0000-0001-9373-4596 (Siqi Yu)
ORCID 0000-0002-9745-9488 (Mingxi Lin)
ORCID 0000-0001-6268-5461 (Yandi Zhang)
ORCID 0000-0002-1792-9639 (Xiaosong Lin)
ORCID 0000-0002-4383-1862 (Zongjie Yao)
ORCID 0000-0001-9382-3687 (Huiming Sheng)
ORCID 0000-0002-6443-9755 (Ziyong Sun)
ORCID 0000-0001-6324-9135 (Feng Wang)
ORCID 0000-0001-9754-372X (Xionglin Fan)
ORCID 0000-0002-9210-1823 (Sheng-ce Tao)
Genomics, Proteomics & Bioinformatics, 2021, Issue 5