Nayereh Esmaeilzadeh, Mohammadtaghi Shakeri, Mostafa Esmaeilzadeh, Vahid Rahmanian
1Department of Epidemiology, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
2Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
3Department of Mechanical Engineering, Mashhad Branch, Azad University, Mashhad, Iran
4Zoonoses Research Center, Jahrom University of Medical Sciences, Jahrom, Iran
The COVID-19 epidemic has spread to more than 210 countries, with 3 272 202 confirmed cases and 230 104 deaths globally as of 3rd May 2020. Iran is a hotspot of COVID-19 in the Eastern Mediterranean region, with more than 93 000 confirmed cases and 5 957 deaths as of 30th April[1]. Suppression strategies, especially case isolation and elective home quarantine, and other mitigation approaches, such as the closure of educational centers and transmission control through the lockdown of social activities, are applied to reduce the basic reproduction number to less than 1[2]. These strategies have achieved varying degrees of success in different countries[3]. Autoregressive integrated moving average (ARIMA) models were developed to determine temporal patterns and to make short-term predictions[4].
This approach is useful for forecasting and for evaluating control measures, as a number of studies have confirmed[5-7]. The results of this study can help the government make informed decisions and set proper policy for adopting interventions against this infectious disease.
The daily numbers of new laboratory-confirmed cases, recoveries, and deaths due to COVID-19 between 20th February 2020 and 30th April 2020 were extracted from the World Health Organization website[1].
Firstly, we developed an ARIMA model for each series. This model includes single regression, multiple regression, and moving averages, and it can remove the confounding effect of time. The time series model ARIMA (p, d, q) consists of several components: the orders p, d, and q refer to the autoregressive part, the integrated part, and the moving average part of the model, respectively[8].
This linear combination is formulated as:
$$y_t = \lambda_1 y_{t-1} + \cdots + \lambda_p y_{t-p} + \gamma_1 \varepsilon_{t-1} + \cdots + \gamma_q \varepsilon_{t-q} + e_t \qquad (1)$$
Where $y_t$ is the dependent variable (daily cases of COVID-19), $\lambda_p$ is an autoregressive operator coefficient, $\gamma_q$ is a moving average operator coefficient, $y_{t-p}$ is the number of COVID-19 cases p time points earlier, $\varepsilon_{t-q}$ is the deviation in the number of COVID-19 cases q time points earlier, and $e_t$ is a random error term with a white-noise distribution. The model assumes stationary data, so we performed the Bartlett and unit-root tests to assess the stationarity of the variance and the mean of the data, and transformed the data where needed. To estimate the number of autoregressive and moving average parameters, we used correlograms of the autocorrelation and partial autocorrelation functions, after which possible models were identified[9].
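The linear combination described above can be illustrated with a minimal Python sketch. It is not the Stata routine used in this study: the function name `arma_one_step` is hypothetical, the series is assumed to be zero-mean and already stationary, moving average terms are assumed to enter with a plus sign, and pre-sample values are simply set to zero.

```python
def arma_one_step(y, ar, ma):
    """One-step-ahead predictions and residuals for a zero-mean ARMA(p, q).

    y  : observed series (assumed already transformed and stationary)
    ar : autoregressive coefficients [lambda_1, ..., lambda_p]
    ma : moving average coefficients [gamma_1, ..., gamma_q]
    Pre-sample observations and errors are treated as 0 (a simplification).
    """
    preds, resid = [], []
    for t in range(len(y)):
        # Weighted sum of the p most recent observations.
        ar_part = sum(c * y[t - i]
                      for i, c in enumerate(ar, start=1) if t - i >= 0)
        # Weighted sum of the q most recent one-step prediction errors.
        ma_part = sum(c * resid[t - j]
                      for j, c in enumerate(ma, start=1) if t - j >= 0)
        pred = ar_part + ma_part
        preds.append(pred)
        resid.append(y[t] - pred)
    return preds, resid


# Illustrative values only, not data from the study.
preds, resid = arma_one_step([1.0, 2.0, 3.0], ar=[0.5], ma=[0.5])
```

In practice the coefficients are estimated by maximum likelihood rather than supplied by hand; the sketch only shows how a fitted model turns past values and past errors into a prediction.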
In the next step, we evaluated the goodness-of-fit of the final model by checking for white-noise residuals with the Ljung-Box (Q) test, and the best-fitting model was selected based on the lowest value of the Akaike Information Criterion (AIC)[5].
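For readers who wish to reproduce these two diagnostics, the standard textbook definitions can be sketched as follows. The helper names are hypothetical and the code is an illustration, not the Stata implementation used here. The Q statistic is Q = n(n+2) Σ r_k²/(n−k) over lags k = 1..h, and AIC = 2k − 2 ln L.

```python
def autocorr(x, k):
    """Sample autocorrelation of the series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(resid, max_lag):
    """Ljung-Box Q statistic computed on model residuals up to max_lag."""
    n = len(resid)
    return n * (n + 2) * sum(
        autocorr(resid, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood
```

For residuals of an ARMA(p, q) model, Q is usually compared against a chi-square distribution with max_lag − p − q degrees of freedom; a non-significant result is consistent with white-noise residuals.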
Then, the best ARIMA models were applied to predict the COVID-19 events, and the forecasting precision was estimated by the root-mean-square error (RMSE), computed using the following formula:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(Y_t - \hat{Y}_t\right)^2} \qquad (2)$$
Where $Y_t$ is the observed number of events, $\hat{Y}_t$ is the forecast value at time t, and N is the number of observations[10]. The statistical significance level was set at 0.05. Stata (ver. 14) was used for the statistical analysis.
Figure 1. Autocorrelation and partial autocorrelation function plots, daily observed numbers for each COVID-19 series, fitted values (20th February to 30th April), and 1-step-ahead predicted values (14 days ahead).
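The root-mean-square error described above can be computed as in the following minimal sketch. The function name `rmse` and the sample numbers are illustrative assumptions, not values from this study.

```python
import math


def rmse(observed, forecast):
    """Root-mean-square error between observed and forecast values."""
    if not observed or len(observed) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - f) ** 2 for o, f in zip(observed, forecast)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Illustrative values only.
error = rmse([100, 120, 90], [110, 115, 95])
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.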
The trends of the actual and predicted numbers for each COVID-19 series, including new cases, recoveries, and deaths, for the 71 days from 20th February 2020 to 30th April 2020 are presented in Figure 1. These graphs also show the forecast numbers for the 14 days ahead, which are listed in Table 1.
After the stationarity tests, the square root transformation was applied to the new cases, death cases, and recovery cases, and none of the series needed regular differencing. All statistical methods were performed on the transformed data. Autocorrelation function and partial autocorrelation function plots were drawn for each series of COVID-19 cases. In these charts, the grey zone displays the 95% confidence interval, and lines that fall outside this zone are considered significantly different from zero (Figure 1).
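The way a correlogram flags significant lags can be sketched as follows. This is a simplified illustration: it assumes the constant white-noise band ±1.96/√n, which may differ from the confidence band drawn by the statistical software used for Figure 1, and the function names are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    band = 1.96 / math.sqrt(len(x))  # white-noise approximation (assumption)
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > band]


# Square-root transform of hypothetical daily counts before inspection.
counts = [4, 9, 16, 25, 36, 49, 64, 81]
transformed = [math.sqrt(c) for c in counts]
lags = significant_lags(transformed, max_lag=3)
```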
The candidate ARIMA models for the new cases of COVID-19 were ARIMA (1, 0, 0) and ARIMA (1, 0, 1), and for the recovery cases were ARIMA (1, 0, 1) and ARIMA (2, 0, 1). Finally, ARIMA (1, 0, 1) and ARIMA (1, 0, 0) were considered for the death cases.
The goodness-of-fit of the models was evaluated using the Ljung-Box (Q) test and the AIC. ARIMA (1, 0, 0), ARIMA (1, 0, 1), and ARIMA (1, 0, 1) were selected as the best models for the new confirmed cases, the death cases, and the recovery cases, respectively.
Table 1. The forecast values (95% CI) according to fitted models of COVID-19 for the period from 1st May to 14th May 2020.
Table 2. Characteristics of the best ARIMA fitted models series of COVID-19 from 20th February to 30th April.
The goodness-of-fit results are presented in Table 2 for the best-fitted models only. The residuals of the selected models show that the data were adequately modelled.
We fitted the best model for each series of COVID-19 between 20th February 2020 and 30th April 2020 and then forecasted each series for 14 days ahead (Figure 1A and Table 1). Next, we compared the actual data on COVID-19 events with the predicted values. The predictions are approximately in line with the actual deaths and new confirmed cases, but the predictions for recovery cases are less precise than the others, as shown in Figure 1. The models are as follows:
The equation of daily new laboratory-confirmed cases is:
Eq. (3) indicates that an increase in the square root of the number of new cases on a given day leads to an increase of 98% in the square root of the number of new cases one day ahead (P<0.001), and the Wald test is significant. After an exponential increase in the middle of the epidemic period, the trend reversed, as shown in Figure 1A. In the short term, we predict a declining trend in the occurrence of new cases.
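To illustrate how a first-order autoregressive model with a coefficient of 0.98 produces a slowly declining forecast, consider the sketch below. Only the coefficient 0.98 comes from the text. The long-run mean, the last observed value, and the back-transformation by simple squaring are hypothetical assumptions made for illustration, and the function name `ar1_forecast` is not from the study.

```python
def ar1_forecast(last_value, phi, mean, steps):
    """h-step-ahead forecasts of an AR(1) process with a constant mean.

    Uses y_hat(t + h) = mean + phi**h * (last_value - mean), with all
    quantities on the square-root scale.
    """
    return [mean + phi ** h * (last_value - mean)
            for h in range(1, steps + 1)]


# phi = 0.98 is the coefficient quoted in the text; last_value and mean
# are hypothetical placeholders, not estimates from the study.
sqrt_forecasts = ar1_forecast(last_value=30.0, phi=0.98, mean=20.0, steps=14)

# Naive back-transformation to the count scale by squaring (an assumption).
count_forecasts = [v ** 2 for v in sqrt_forecasts]
```

With a coefficient just below 1 and a last observation above the long-run mean, each forecast moves only 2% of the remaining distance toward the mean, which gives a gentle downward path over the 14-day horizon.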
The equation of the recovery cases is:
Eq. (4) shows that an increase in the square root of the number of recovery cases on a given day results in a significant increase of 98% in the square root of the number of recovery cases one day ahead, together with a negative association with the deviation one day earlier (P<0.001). The Wald test is significant. This pattern is shown in Figure 1B, where we expect to see a somewhat decreasing trend over time.
The equation of death cases is as follows:
Similarly, Eq. (5) shows that an increase in the square root of the number of death cases on a given day leads to an increase in the square root of the number of death cases one day ahead, together with a negative association with the deviation one day earlier (P<0.001). The Wald test is significant. Figure 1C shows that after an exponential increase in the early stage of the epidemic period, the trend reversed. In the short term, we predict a smoothly declining trend in the occurrence of death cases.
The forecasting in this study was based on basic time series methods. This means that it is affected by outliers and does not account for unknown noise. Therefore, the models perform better in the short term, and the findings should be interpreted with caution[8,9]. However, these models are simple to apply and interpret, and they are readily available tools for monitoring systems[6,7].
The government of the Islamic Republic of Iran advised the closure of educational centers, locked down activities, and adopted other control approaches from the earliest days of the outbreak, on 24th February.
It should be noted that Iranian people celebrate their own new year, which started on 21st March 2020. They follow a solar calendar, which differs from the Gregorian calendar. During the new year holidays, people traditionally visit family and friends, which increases the number of contacts between people and can eventually increase the number of new cases and deaths as COVID-19 spreads. It is anticipated that these patterns may be repeated during or after Ramadan (the holy month in Islam) because people crowd into mosques and holy shrines to pray. Therefore, the government should consider preventive measures to control the spread of the virus under these conditions.
The predicted numbers of new confirmed, death, and recovery cases indicated a somewhat decreasing trend. The goodness-of-fit criteria were suitable for these events. However, the number of confirmed cases can rise remarkably unless the necessary preventive measures are kept in place. In conclusion, the models proposed in this work can act as a predictive tool for public health planning and for a better understanding of the dynamics of COVID-19 in a resource-constrained context with minimal data entry. Updating these data can be highly useful for accurate predictions.
Conflict of interest statement
The authors declare that there is no conflict of interest.
Acknowledgements
This study was conducted using existing COVID-19 data from the official website of the World Health Organization and did not impose additional costs. The authors would like to thank Mashhad University of Medical Sciences for its support (Identified No: IR.MUMS.REC.1399.140).
Authors’ contributions
All authors contributed equally to conceptualizing the article, retrieving the related literature, and drafting the final manuscript.
Asian Pacific Journal of Tropical Medicine, 2020, Issue 11