Name in English         ID           Email
Mohammad AL Dahneem     2180005862   2180005862@iau.edu.sa
Amjad Mahdi Motiloq     2180002090   2180002090@iau.edu.sa
Mohammed Hashim         2180001833   2180001833@iau.edu.sa
Ayman Al-jumaia         2180004090   2180004090@iau.edu.sa

Coordinator's Name: Dr. Mahbubunnabi Tamal
Date: 28 / 04 / 2021

Prediction of COVID-19 Cases

Abstract

COVID-19 has become a global burden on societies and a threat to the whole world. Confirmed cases and deaths have increased significantly in most countries, which warns of economic disasters and human losses in these countries. This has prompted researchers to look for mathematical models and statistical calculations that predict how the curves of this epidemic may develop in the coming days, and from these to assess the economic effects. The Kingdom of Saudi Arabia is one of the countries affected by this epidemic, with economic impacts and human losses. Using data from 15 March 2020 to 21 April 2020 provided by the Ministry of Health, we fitted the data in software to forecast the deaths and recoveries of the epidemic over the following two weeks. Our prediction for the COVID-19 epidemic in the Kingdom is that cumulative deaths will grow continuously and reach 40 deaths at the beginning of May 2020. We then compared our prediction model with models created by other researchers, such as the ARIMA and SIR models. We found that our model gives approximately the same results as the models it was compared with, and that its forecast resembles an exponential function that grows continuously. Therefore, the Saudi Ministry of Health must take measures to close some commercial activities and curb gatherings to preserve the health of citizens and prevent the spread of this epidemic.

I. Introduction

Coronaviruses are a large family of viruses that cause diseases ranging from the common cold to more serious diseases, such as Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS). The COVID-19 outbreak began in Wuhan, China, in December 2019 [1] [14]. The virus is a non-living, protein-coated particle containing RNA of animal origin, and it is likely to have spread through bats. The word corona is Latin for crown [2]. Coronavirus is known to have an animal source, which means that it first evolved in animals and was then transmitted to humans. It has not been definitively linked to a particular animal, but researchers believe that the transmission of the virus occurred in the open food market in Wuhan, China [2].

Most people affected by COVID-19 initially experience symptoms similar to those of a cold or flu. The most common symptoms are fever, dry cough, and general fatigue. Less common symptoms include muscle pain, nausea, diarrhea, loss of smell, headache, and sore throat. Most people recover without the need for special treatment [15]. Older persons and those with underlying medical problems such as cardiovascular disease, diabetes, chronic respiratory disease, and cancer are the most likely to develop serious illness [3] [4].

Hence, the purpose of this research is to compare two prediction models, ARIMA and SIR, and to find which is the better prediction model for the data that we have.

Literature review

a) The SIR prediction model

The aim of the SIR model is to predict the number of individuals who are actively infected, vulnerable to infection, recovered from infection, or dead due to the infection at any given time. The SIR model, which stands for susceptible, infected, and recovered, was first introduced in 1927, about a decade after the 1918 influenza pandemic [10].
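To make the compartment structure concrete, the following is a minimal sketch of an SIR simulation. It is an illustration only, not the method of any cited study: it assumes the standard SIR equations with an effective contact rate beta and a recovery rate gamma, integrates them with simple forward Euler steps, and uses made-up parameter values. The function name `simulate_sir` and all numbers are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the standard SIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only; they are NOT estimated from the Saudi data.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda day: history[day][1])
print(f"Infections peak on day {peak_day}")
```

Forward Euler is chosen here only because it is short and dependency-free; a real study would normally use a proper ODE solver and estimate beta and gamma from data.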
The simplicity of the SIR model makes it easy to compute, which may be one of the reasons for its popularity. It allows modelers to estimate disease behavior by approximating a small number of parameters. Only two parameters define the SIR model: the effective contact rate (β), which governs the transition from the susceptible to the infected compartment, and the rate of recovery (γ), which governs the transition from the infected to the recovered compartment [11].

b) The ARIMA prediction model

ARIMA (autoregressive integrated moving average) is a well-known statistical model that uses time series data to understand a data set or to predict its trends. ARIMA models are widely used for time series forecasting and approach the problem in a way that complements other methods. In other words, the main objective of ARIMA models is to describe the autocorrelations in the data [12]. Many well-known forecasting models, such as random-walk, random-trend, autoregressive, and exponential smoothing models, are in fact special cases of the ARIMA model.

ARIMA(p, d, q) is the standard notation, in which integer values are substituted for the parameters to indicate the specific ARIMA model being used. The ARIMA model parameters are defined as follows:

p: the number of lag observations included in the model.
d: the degree of differencing.
q: the size of the moving average window.

Fitting an ARIMA model requires the time series to be stationary, which can be achieved by differencing. Making a time series stationary also helps the researcher find clues about the proper forecasting model. A time series is statistically stationary when its statistical properties, such as mean, variance, and autocorrelation, are constant over time [13]. The forecasting equation for the ARIMA model is constructed as follows.
First, let y denote the d-th difference of Y, which means:

$$
y_t =
\begin{cases}
Y_t, & d = 0 \\
Y_t - Y_{t-1}, & d = 1 \\
(Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}, & d = 2
\end{cases}
$$

The general forecasting equation in terms of y is:

$$
\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}
$$

where $\mu$ is a constant, $\phi_1, \dots, \phi_p$ are the autoregressive coefficients, $\theta_1, \dots, \theta_q$ are the moving average coefficients, and $e_{t-k}$ is the forecast error at lag k.

The SIR model is one of the simplest models used to predict infectious diseases that are transmitted from person to person, and many models are derived from this basic form. The name SIR is short for its three compartments: S(t) represents the number of susceptible individuals, I(t) the number of infectious individuals, and R(t) the number of recovered and immune individuals. Transmission of the disease depends on contact within the population: if an infected person meets a susceptible person who is not immune to the disease, the infection can be transmitted [5].

ARIMA stands for Auto-Regressive Integrated Moving Average. It is a class of models that describes a given time series based on its own previous values. An ARIMA model is characterized by three terms (p, d, q): "p" is the number of lags of Y to be used as predictors (the autoregressive part), "q" is the number of lagged forecast errors that should go into the model (the moving average part), and "d" is the minimum number of differences needed to make the series stationary [6].

c) Predicting the spread of the COVID-19 pandemic in Saudi Arabia using the ARIMA model

Saudi Arabia has two holy cities, Mecca and Medina, where the Umrah and Hajj pilgrimages are held. About 2 million people plan to visit Saudi Arabia to perform these pilgrimages. Hence, it is necessary to predict the COVID-19 cases in Saudi Arabia.

In [7], the authors forecast COVID-19 cases for four weeks beyond the date of the actual data, using different models: ARIMA, ARMA, AR, and MA. They compared the models by dividing the data into two groups to evaluate the performance of each model. One group was used to train the model, and the other was used to test it.
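The train/test comparison described here can be sketched in a few lines. The example below is an illustration under stated assumptions and does not reproduce the models or data of [7]: it holds out the last seven days of the cumulative deaths listed in Table 1 and scores two simple stand-in forecasters by mean absolute error. The stand-ins are a random walk and a random walk with drift, both simple special cases of ARIMA; the function names and the seven-day holdout are hypothetical choices.

```python
# Cumulative deaths from Table 1 (15 March to 21 April 2020).
cumulative_deaths = [0, 0, 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 6, 7, 7, 10, 10, 11,
                     11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13,
                     14, 14, 15, 15, 15, 16]


def split_series(series, test_size):
    """Hold out the last `test_size` observations for testing."""
    return series[:-test_size], series[-test_size:]


def naive_forecast(train, horizon):
    """Random walk: repeat the last observed value."""
    return [train[-1]] * horizon


def drift_forecast(train, horizon):
    """Random walk with drift: extend the average historical step."""
    step = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * (h + 1) for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


train, test = split_series(cumulative_deaths, test_size=7)
errors = {
    name: mean_absolute_error(test, model(train, len(test)))
    for name, model in [("naive", naive_forecast), ("drift", drift_forecast)]
}
for name, error in errors.items():
    print(f"{name}: MAE = {error:.2f}")
```

The model with the lower error on the held-out days would be preferred; the same procedure applies when the candidates are full ARIMA, ARMA, AR, and MA fits.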
Using this method and analyzing the data, they found that the ARIMA model performed best. It also showed acceptable agreement with the actual values, so it can predict the cases in Saudi Arabia accurately [7]. According to the ARIMA prediction model in [7], the number of coronavirus cases would keep increasing and would not decrease. However, Saudi Arabia completely closed its borders, preventing travel to and from the country for citizens, residents, and tourists, and the Saudi government suspended the Umrah and Hajj pilgrimages during that period.

d) Predicting the outbreak of COVID-19 in Saudi Arabia using the SIR model

After COVID-19 cases were discovered in the Kingdom of Saudi Arabia and the virus spread around the world, researchers began statistical analyses to predict how COVID-19 would develop in the Kingdom: whether the virus would spread or regress, and how many people would recover. They analyzed approximately two months of real-time data on recoveries and deaths, and they used two models for forecasting: the Logistic Growth model and the SIR model. The Logistic Growth model has been used in earlier cases, such as forecasting the Ebola epidemic in 2015 [8]. It describes the dynamics of the epidemic through the cumulative total of cases infected with COVID-19. This model can be used to analyze how many days a patient should quarantine and how quarantine helps to reduce infection rates.

In the Susceptible-Infected-Recovered (SIR) model, several assumptions were made, including a constant mixing rate among people, a constant population, and a number of recoveries equal to the number of infections. Using this model, they found that the outbreak would rise dramatically until mid-May 2020. After this rise, cases would remain at a steady rate until June 2020, and then the curve would begin to decline until the end of the outbreak at the beginning of July 2020.
They found that this prediction was not accurate because the data set they analyzed was small [9]. To complete the prediction, they would need more information and data, such as the number of infections over a longer period, how many people are infected without knowing it, and the rate of spread among children. However, this model predicts that COVID-19 would spread widely in Saudi Arabia for at least two months. This means that many people would be infected by this virus if the Ministry of Health did not take measures such as closing some activities and imposing a curfew during that period [9].

In conclusion, the literature review discussed two models: the autoregressive integrated moving average (ARIMA) model and the susceptible-infected-recovered (SIR) model. It compared the two models to determine which is acceptable for our data, and it presented two examples of predicting COVID-19 cases in Saudi Arabia, one with the ARIMA model and one with the SIR model.

II. Materials and Methods

A) Database

The data were collected and organized in Table 1 and its continuation. Each row represents a date, and the columns show the number of recoveries, the number of deaths, the cumulative number of recoveries, and the cumulative number of deaths.
Table 1: COVID-19 recoveries and deaths in Saudi Arabia from 15 March to 4 April 2020

Date             Number of    Number of   Cumulative   Cumulative
                 recoveries   deaths      recoveries   deaths
March 15, 2020   1            0           1            0
March 16, 2020   0            0           1            0
March 17, 2020   0            1           1            1
March 18, 2020   1            0           2            1
March 19, 2020   0            0           2            1
March 20, 2020   3            2           5            3
March 21, 2020   0            0           5            3
March 22, 2020   1            0           6            3
March 23, 2020   1            0           7            3
March 24, 2020   0            2           7            5
March 25, 2020   0            0           7            5
March 26, 2020   0            0           7            5
March 27, 2020   0            1           7            6
March 28, 2020   0            1           7            7
March 29, 2020   1            0           8            7
March 30, 2020   0            3           8            10
March 31, 2020   0            0           8            10
April 1, 2020    0            1           8            11
April 2, 2020    0            0           8            11
April 3, 2020    0            0           8            11
April 4, 2020    0            0           8            11

Table 1 (continued): COVID-19 recoveries and deaths in Saudi Arabia from 5 April to 21 April 2020

Date             Number of    Number of   Cumulative   Cumulative
                 recoveries   deaths      recoveries   deaths
April 5, 2020    0            0           8            11
April 6, 2020    0            0           8            11
April 7, 2020    1            0           9            11
April 8, 2020    1            0           10           11
April 9, 2020    0            1           10           12
April 10, 2020   0            0           10           12
April 11, 2020   1            0           11           12
April 12, 2020   0            1           11           13
April 13, 2020   0            0           11           13
April 14, 2020   0            0           11           13
April 15, 2020   0            0           11           13
April 16, 2020   0            1           11           14
April 17, 2020   0            0           11           14
April 18, 2020   1            1           12           15
April 19, 2020   0            0           12           15
April 20, 2020   0            0           12           15
April 21, 2020   1            1           13           16

B) MS Excel

The collected numbers were plotted and analyzed statistically using Microsoft Excel, to see whether this information was sufficient for a comparison with other models. Excel was used to fit the data with a polynomial fitting equation of the form:

$$F(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x + a_0$$

Using this equation, the cumulative recovery and cumulative death curves were fitted, and the equations of the two curves were determined. These equations can then be used to predict the coming period of the COVID-19 epidemic.
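The polynomial fitting step can be sketched outside Excel as follows. This is a minimal illustration, not the Excel trendline routine itself: it fits a least-squares polynomial by solving the normal equations in plain Python, and it assumes a day index (0 for 15 March 2020) scaled to the range 0 to 1 as the x variable. The function names `fit_polynomial` and `evaluate`, the scaling, and the choice of 1 May as the forecast day are assumptions made for illustration.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns [a0, a1, ..., a_degree]."""
    n = degree + 1
    # Normal equations (V^T V) a = V^T y, where V[i][k] = xs[i] ** k.
    matrix = [[sum(x ** (row + col) for x in xs) for col in range(n)]
              for row in range(n)]
    rhs = [sum(y * x ** row for x, y in zip(xs, ys)) for row in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = matrix[row][col] / matrix[col][col]
            for k in range(col, n):
                matrix[row][k] -= factor * matrix[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(matrix[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (rhs[row] - tail) / matrix[row][row]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n with Horner's scheme."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


# Cumulative deaths from Table 1; day 0 is 15 March, day 37 is 21 April 2020.
cumulative_deaths = [0, 0, 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 6, 7, 7, 10, 10, 11,
                     11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13,
                     14, 14, 15, 15, 15, 16]
last_day = len(cumulative_deaths) - 1
scaled_days = [day / last_day for day in range(len(cumulative_deaths))]

coeffs = fit_polynomial(scaled_days, cumulative_deaths, degree=4)
forecast_day = 47  # 1 May 2020 under the day-index assumption above
print("Fitted coefficients:", [round(a, 3) for a in coeffs])
print("Extrapolated value on 1 May:", round(evaluate(coeffs, forecast_day / last_day), 1))
```

Fitting against a small scaled index rather than a calendar date number keeps the coefficients at a moderate size. Coefficients as large as 1E+14, rounded to one significant digit as a chart label, cannot be used to reproduce the curve reliably, so the full-precision coefficients should be used when extrapolating. High-degree polynomials can also swing sharply outside the fitted range, so any extrapolated value should be read with caution.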
C) Comparison between models

Once the curves and prediction equations for the coming period are determined, the results are compared with models that researchers have previously developed to predict the COVID-19 epidemic in Saudi Arabia. The comparison covers the size of the data, the number of forecast days, and the shape of the forecast function (ascending, descending, or constant). Then the percentage error between the models is calculated, and the most accurate prediction of the COVID-19 epidemic is identified.

III. Results

The objective of this research paper is to find a relation between the cumulative and non-cumulative cases of recovery and death from COVID-19 for the data shown above in Table 1, and then to find suitable prediction models that estimate the future movement of the cases. The graphs obtained after analyzing and rearranging the table are as follows:

[Graph 1 plot. Title: Deaths & cumulative deaths; x-axis: date, 15 March to 20 April 2020; y-axis: count, 0 to 18; series: Number of Death, cumulative deaths]

Graph (1): The number of deaths and the cumulative deaths due to COVID-19 from 15 March to 21 April 2020.

As shown in Graph (1), there were no deaths during a few short periods and during one long period from 2 to 8 April, while on 30 March there was a large surge in deaths due to the pandemic.

Graph (2): The fitted cumulative deaths from COVID-19 from 15 March to 21 April 2020.

Graph (2) represents the cumulative deaths over the same period. It was plotted using Microsoft Excel and fitted with a fourth-degree polynomial that has the following equation.
$$y = 5\times10^{-5}x^{4} - 9.0204x^{3} + 594307x^{2} - 2\times10^{10}x + 2\times10^{14}$$

The cumulative deaths are the running total of the daily deaths, so the two series move together. That is why there is a notable jump in cumulative deaths on 30 March and a flat cumulative curve in the periods when no deaths were recorded.

[Graph 3 plot. Title: Recovery & cumulative recovery cases; x-axis: date, 15 March to 20 April 2020; y-axis: count, 0 to 14; series: Number of Recovery, cumulative recovery]

Graph (3): The number of recoveries and the cumulative recoveries from COVID-19 from 15 March to 21 April 2020.

As shown in Graph (3), there were no recoveries during three long periods: from 24 to 28 March, from 30 March to 6 April, and from 12 to 17 April. The largest single-day number of recoveries in Table 1, three cases, was recorded on 20 March.

Graph (4): The fitted cumulative recoveries from COVID-19 from 15 March to 21 April 2020.

Graph (4) represents the cumulative recoveries over the same period. It was plotted using Microsoft Excel and fitted with a sixth-degree polynomial that has the following equation.

$$y = 5\times10^{-7}x^{6} - 0.1321x^{5} + 14501x^{4} - 8\times10^{8}x^{3} + 3\times10^{13}x^{2} - 5\times10^{17}x + 4\times10^{21}$$

As shown in Graphs (3) and (4), the cumulative recoveries are likewise the running total of the daily recoveries. That is why there is a notable jump in cumulative recoveries on 20 March and a flat cumulative curve in the periods when there were no recoveries.
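The link between the daily and cumulative series can be checked directly against Table 1. The short sketch below is an illustration only: it rebuilds the cumulative recoveries as a running sum of the daily recoveries and lists the periods of at least five consecutive days without a recovery. The helper name `zero_runs` and the five-day threshold are assumptions introduced here.

```python
from datetime import date, timedelta
from itertools import accumulate, groupby

START = date(2020, 3, 15)  # first date in Table 1

# Daily recoveries from Table 1 (15 March to 21 April 2020).
daily_recoveries = [1, 0, 0, 1, 0, 3, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
                    0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

# The cumulative column is the running sum of the daily column.
cumulative_recoveries = list(accumulate(daily_recoveries))


def zero_runs(daily, min_length):
    """Return (first_date, last_date) for each run of zero counts
    lasting at least `min_length` days."""
    runs, index = [], 0
    for is_zero, group in groupby(daily, key=lambda count: count == 0):
        length = len(list(group))
        if is_zero and length >= min_length:
            first = START + timedelta(days=index)
            last = START + timedelta(days=index + length - 1)
            runs.append((first, last))
        index += length
    return runs


quiet_periods = zero_runs(daily_recoveries, min_length=5)
peak_index = max(range(len(daily_recoveries)), key=daily_recoveries.__getitem__)
peak_date = START + timedelta(days=peak_index)

for first, last in quiet_periods:
    print(f"No recoveries from {first:%d %B} to {last:%d %B}")
print(f"Largest daily count on {peak_date:%d %B}: {daily_recoveries[peak_index]}")
```

The same check can be run on the daily deaths column to locate the flat stretches of the cumulative deaths curve.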
Comparing the prediction models used in the literature (SIR and ARIMA) with the simple fitting model used in Graphs (2) and (4), it is clear that the ARIMA model offers more accurate predictions than the simple fitting. Therefore, it should predict the number of confirmed COVID-19 cases in Saudi Arabia in the coming weeks better than the SIR models. The forecasting results also show that COVID-19 cases are likely to rise in the coming weeks, since both the daily deaths and recoveries and the cumulative counts continue to increase, as shown in Graphs (2) and (4).

IV. Discussion

In order to understand the risk of death from COVID-19, that is, the probability of a person dying from the disease, we need to know how many people have died from the disease so far. We want to know the final number of deaths within a given group of infected people. However, the final outcome (death or recovery) of all cases is not yet known because the outbreak continues. The time from the onset of symptoms to death from COVID-19 ranges from two to eight weeks. This means that some individuals in the early or advanced stages of infection will die later, which is why we cannot give a final figure for the risk of death. What we know is the total number of confirmed deaths so far. WHO publishes confirmed deaths in its situation reports and weekly operational updates on the pandemic response. This means that we can follow the change in the number of deaths over time. But this does not tell us the probability that an infected person will die; to find that out, we need to know the final outcome of all cases. Some individuals currently infected with COVID-19 could die later.
Statistics are an essential tool in helping the government and health care organizations fight the pandemic, increase the rate of recoveries, and decrease the rate of deaths, or in other words, 'flatten the curve'. Graph (1) and Graph (3) show the daily and cumulative numbers of confirmed deaths and recoveries that we were able to find for the period from 15 March to 21 April 2020. Each chart shows something different. Graph (1) shows the number of deaths and the cumulative number of deaths during the specified period. It shows that the number of deaths ranged from zero to three per day, with no deaths on most days and a single death on most of the others. Graph (3) shows the number of recoveries and the cumulative number of recoveries. Recoveries also ranged from zero to three per day, with zero on most days. This gives an impression of the severity of the virus, because the number of recoveries is lower than the number of deaths.

Graph (2) and Graph (4) show the fitted cumulative deaths and recoveries, respectively, for the same period. Each blue dot in the charts is a data point that we obtained for cumulative deaths or recoveries, and the red line shows the forecast up to the beginning of May based on the previous cases. The fitted curves are given by the following equations, respectively.

Cumulative deaths:

$$y = 5\times10^{-5}x^{4} - 9.0204x^{3} + 594307x^{2} - 2\times10^{10}x + 2\times10^{14}$$

Cumulative recovery:

$$y = 5\times10^{-7}x^{6} - 0.1321x^{5} + 14501x^{4} - 8\times10^{8}x^{3} + 3\times10^{13}x^{2} - 5\times10^{17}x + 4\times10^{21}$$

As discussed in the literature review, it is also possible to use established infectious disease models such as SIR and ARIMA to predict the spread of the epidemic at a local level within a specified period of time.
Therefore, according to the examples provided in the literature review, and after comparing the two models, it is clear that ARIMA is an acceptable model for the data that we have and the graphs that we fitted.

V. Conclusion

In this research, we studied the data set (Table 1 and its continuation), which gives the number of deaths and recoveries. From it, the cumulative recoveries and cumulative deaths were calculated, and the resulting graphs were fitted with polynomial functions. We tested many fitting options and found that the polynomial function gave the best fit. Using this fit, we were able to predict the cumulative recoveries and cumulative deaths. Based on the literature review, two different models, ARIMA and SIR, were compared, and we found that the best forecasting model is ARIMA, which gives an accurate prediction with less error. Furthermore, to confirm that ARIMA gives the more accurate prediction, we compared our forecast, as discussed, with the two models in the literature review, and we confirmed that ARIMA is in good agreement with the actual values that we have.

References

[1] "WHO Statement Regarding Cluster of Pneumonia Cases in Wuhan, China". www.who.int. Retrieved 23 April 2021.
[2] Eschner, Kat (2020-01-28). "We're still not sure where the Wuhan coronavirus really came from". Retrieved 23 April 2021.
[3] "Clinical characteristics of COVID-19". European Centre for Disease Prevention and Control. Retrieved 23 April 2021.
[4] Grant MC, Geoghegan L, Arbyn M, Mohammed Z, McGuinness L, Clarke EL, Wade RG (23 June 2020). "The prevalence of symptoms in 24,410 adults infected by the novel coronavirus (SARS-CoV-2; COVID-19): A systematic review and meta-analysis of 148 studies from 9 countries". Retrieved 23 April 2021.
[5] Harko, Tiberiu; Lobo, Francisco S. N.; Mak, M. K. (2014). "Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates". Retrieved 23 April 2021.
[6] Hyndman, Rob J; Athanasopoulos, George. 8.9 Seasonal ARIMA models. Forecasting: principles and practice. OTexts. Retrieved 23 April 2021.
[7] Alzahrani, S., Aljamaan, I. and Al-Fakih, E., 2020. Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions. Journal of Infection and Public Health, 13(7), pp. 914-919.
[8] Pell, B., Kuang, Y., Viboud, C., Chowell, G. Using phenomenological models for forecasting the 2015 Ebola challenge. Epidemics 2018, 22, 62–70. Retrieved 24 April 2021.
[9] Alboaneen, D., Pranggono, B., Alshammari, D., Alqahtani, N. and Alyaffer, R. Predicting the Epidemiological Outbreak of the Coronavirus Disease 2019 (COVID-19) in Saudi Arabia. International Journal of Environmental Research and Public Health, 2020, 17, 4568.
[10] Y.-C. a. L. P.-E. a. C. Chen, "A time-dependent SIR model for COVID-19 with undetectable infected persons," IEEE Transactions on Network Science and Engineering, vol. 7, no. 4, pp. 14-16, 2020.
[11] J. a. L. T. Tolles, "Modeling epidemics with compartmental models," JAMA, vol. 323, no. 24, pp. 2515-2516, 2020.
[12] D. a. G. Benvenuto, "Application of the ARIMA model on the COVID-2019 epidemic dataset," Data in Brief, vol. 29, no. 1, pp. 3-4, 2020.
[13] A. a. F. Hernandez-Matamoros, "Forecasting of COVID19 per regions using ARIMA models and polynomial functions," Applied Soft Computing, vol. 96, no. 2, 2020.
[14] Chen, Y.; Liu, Q.; Guo, D. Emerging coronaviruses: Genome structure, replication, and pathogenesis. J. Med. Virol. 2020, 92, 418–423.
[15] Ge, X.Y.; Li, J.L.; Yang, X.L.; Chmura, A.A.; Zhu, G.; Epstein, J.H.; Mazet, J.K.; Hu, B.; Zhang, W.; Peng, C.; et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 2013, 503, 535–538.

Acknowledgments

We would like to thank Dr. Mohammed Albilousii, Dr. Saleh Alzahrani, and Dr. Mahbubunnabi Tamal for helping us to understand the topic and complete the research paper. We would also like to express our deep and sincere gratitude and utmost respect to Dr. Gameel Saleh from the Biomedical Engineering Department at IAU for providing the research topic and raw data.