
final project V 7

Name in English        ID            Email
Mohammad AL Dahneem    2180005862    2180005862@iau.edu.sa
Amjad Mahdi Motiloq    2180002090    2180002090@iau.edu.sa
Mohammed Hashim        2180001833    2180001833@iau.edu.sa
Ayman Al-jumaia        2180004090    2180004090@iau.edu.sa

Coordinator's Name: Dr. Mahbubunnabi Tamal
The Date: 28 / 04 / 2021
Prediction of COVID-19 Cases
Abstract
COVID-19 has become a global burden and a threat to societies worldwide. Confirmed cases and
deaths have risen significantly in most countries, which warns of economic disaster and loss of
life. This has prompted researchers to look for mathematical models and statistical methods that
predict how the epidemic curves will develop in the coming days, and from that to assess the
economic effects. The Kingdom of Saudi Arabia is one of the countries affected by this epidemic,
with both economic impacts and human losses. Using real-time data from 15 March 2020 to
21 April 2020 provided by the Ministry of Health, we fitted the data in software to forecast the
deaths and recoveries over the following two weeks. Our prediction for the COVID-19 epidemic
in the Kingdom is that cumulative deaths will keep growing and reach 40 at the beginning of
May 2020. We then compared our prediction model with models created by other researchers,
such as the ARIMA and SIR models. We found that our model behaves approximately the same
as the models it was compared with, and that the forecast resembles a continuously growing
exponential function. As a result, the Saudi Ministry of Health must take measures to close some
commercial activities and curb gatherings to preserve the health of citizens and prevent the
spread of this epidemic.
I. Introduction
Coronaviruses are a large family of viruses that cause diseases ranging from the common cold to
more serious illnesses such as Middle East respiratory syndrome (MERS) and severe acute
respiratory syndrome (SARS). The COVID-19 outbreak began in Wuhan, China, in December 2019
[1] [14]. A virus is a non-living, protein-coated particle; the coronavirus consists of RNA of
animal origin and is likely to have spread through bats. The word corona is Latin for crown [2].
The coronavirus is known to be of animal origin, which means that it first evolved in animals and
was then transmitted to humans. It has not been definitively linked to a particular animal, but
researchers believe that the transmission occurred at an open food market in Wuhan, China [2].
Most people infected with COVID-19 initially experience symptoms similar to those of a cold or
flu. The most common symptoms are high temperature, a dry cough, and general fatigue; less
common ones include muscle pain, nausea, diarrhea, loss of smell, headache, and sore throat.
Most people recover without the need for special treatment [15]. Older persons and those with
underlying medical problems such as cardiovascular disease, diabetes, chronic respiratory
disease, and cancer are the most likely to develop serious illness [3] [4]. Hence, the purpose of
this research is to compare two prediction models, ARIMA and SIR, and to find which of them is
the better prediction model for the data that we have.
Literature review
a) The SIR prediction model
The aim of the SIR model is to predict the number of individuals who are susceptible to
infection, are actively infected, or have recovered from infection (or died from it) at any given
time. The SIR model, which stands for susceptible, infected, and recovered, was first introduced
in 1927, about a decade after the 1918 influenza pandemic [10]. Its simplicity makes it easy to
compute, which might be one of the reasons for its popularity. It allows modelers to estimate
disease behavior by approximating a small number of parameters. Only two parameters define
the SIR model: the effective contact rate (β), which governs the transition from the susceptible
to the infected compartment, and the rate of recovery (γ), which governs the transition from the
infected to the recovered compartment [11].
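The role of the two parameters can be illustrated with a minimal sketch of the standard SIR equations, dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, integrated with simple forward Euler steps. The function name `simulate_sir`, the step size, and all numeric values below are illustrative assumptions; they are not fitted to the Saudi data and are not taken from the cited studies.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate the SIR equations with forward Euler steps.

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    n = s0 + i0 + r0                      # total population, assumed constant
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # S -> I, controlled by beta
            new_recoveries = gamma * i * dt          # I -> R, controlled by gamma
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameter values chosen only to show the shape of the curves.
history = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("day with the most active infections:", peak_day)
```

Raising β or lowering γ makes the infected curve rise faster and peak higher, which is the behavior the two-parameter description above refers to.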
b) The ARIMA prediction model
ARIMA, the autoregressive integrated moving average, is a well-known statistical model that
uses time series data to understand a data set or to predict its trends. ARIMA models are known
for their strong approach to time series forecasting and for approaching the problem in a way
that complements other forecasting methods. In other words, the main objective of ARIMA
models is to describe the autocorrelations in the data [12]. Many well-known forecasting models,
such as random-walk, random-trend, autoregressive, and exponential smoothing models, are in
fact special cases of the ARIMA model.
ARIMA(p, d, q) is the standard notation, in which integer values are substituted for the
parameters to indicate the specific ARIMA model in use. The parameters are defined as follows:
p: the number of lag observations included in the model.
d: the degree of differencing.
q: the size of the moving average window.
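Fitting an ARIMA model requires the time series to be stationarized, which can be done by
differencing. Stationarizing a time series helps the researcher find clues about the proper
forecasting model. A series is statistically stationary when its statistical properties, such as
mean, variance, and autocorrelation, are constant over time [13]. The forecasting equation for the
ARIMA model is constructed as follows. First, let y denote the d-th difference of Y, which, in the
standard formulation, means:
If d = 0: $y_t = Y_t$
If d = 1: $y_t = Y_t - Y_{t-1}$
If d = 2: $y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}$
The general forecasting equation in terms of y:
$\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$
where $\mu$ is a constant, $\phi_1, \dots, \phi_p$ are the autoregressive coefficients, $\theta_1, \dots, \theta_q$ are the moving-average coefficients, and $e_{t-1}, \dots, e_{t-q}$ are past forecast errors.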
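The differencing step and the one-step forecast described above can be sketched in a few lines. This is an illustration under stated assumptions: the function names `difference` and `arma_one_step` are hypothetical, the moving-average terms use the negative-sign convention (one of two common conventions), and the coefficients passed in the usage lines are made-up values, not estimates from any fitted model.

```python
def difference(series, d=1):
    """Apply first differencing d times: y_t = Y_t - Y_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


def arma_one_step(y, errors, mu, phi, theta):
    """One-step forecast mu + sum(phi_k * y_{t-k}) - sum(theta_k * e_{t-k})."""
    ar_part = sum(p * y[-k] for k, p in enumerate(phi, start=1))
    ma_part = sum(q * errors[-k] for k, q in enumerate(theta, start=1))
    return mu + ar_part - ma_part


# Cumulative deaths for 8-13 April 2020, taken from Table 1.
cumulative = [11, 12, 12, 12, 13, 13]
daily = difference(cumulative, d=1)       # recovers the daily deaths for 9-13 April
second = difference(cumulative, d=2)      # second difference (d = 2)

# Hypothetical coefficients, for illustration only.
forecast = arma_one_step(y=[1.0, 2.0], errors=[0.5, -0.5], mu=1.0, phi=[0.5], theta=[0.2])
```

Differencing the cumulative series once returns the daily counts, which shows why d = 1 is a natural first choice for cumulative epidemic data.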
The SIR model is one of the simplest models used to predict infectious diseases that are
transmitted from one human to another, and many models are derived from this basic form.
The name is an abbreviation of its three variables: S(t) represents the number of susceptible
individuals, I(t) the number of infectious individuals, and R(t) the number of recovered and
immune individuals. Transmission of the disease depends on contact within the population: if an
infected person meets a susceptible person who is not immune to the disease, the infection is
transmitted to that person [5].
ARIMA stands for Auto-Regressive Integrated Moving Average. It is a class of models that
describes a given time series based on its own previous values. An ARIMA model is characterized
by three terms, (p, d, q): "p" refers to the number of lags of Y to be used as predictors (the
autoregressive part), "q" refers to the number of lagged forecast errors that go into the model
(the moving-average part), and "d" refers to the minimum number of differences needed to make
the series stationary [6].
c) Predicting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA model
Saudi Arabia has two holy cities, Mecca and Medina, where the Umrah and Hajj pilgrimages are
held. About 2 million people plan to visit Saudi Arabia to perform these pilgrimages. Hence, it
is necessary to predict the COVID-19 cases in Saudi Arabia.
The authors of [7] predicted the COVID-19 cases four weeks beyond the last date of the actual
data using several models: ARIMA, ARMA, AR, and MA. To evaluate the performance of each
model, they divided the data into two groups, one used to train the model and the other to test
it. Using this method, they found that the ARIMA model performed best and that it agreed
acceptably with the actual values, so it can predict the cases in Saudi Arabia accurately [7].
According to the ARIMA prediction model in [7], the number of coronavirus cases would keep
increasing and would not decrease. However, Saudi Arabia completely closed its borders,
preventing travel to and from the country for citizens, residents, and tourists, and the Saudi
government suspended the Umrah and Hajj pilgrimages during that period.
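The train/test evaluation described above can be sketched as follows. This is not the procedure of [7]: two simple baseline forecasters (a persistence forecast and a drift forecast) stand in for the ARIMA-family models, and root-mean-square error is assumed as the score. The function names are illustrative; the data are the cumulative deaths for 7 to 21 April 2020 from Table 1.

```python
import math


def train_test_split(series, test_size):
    """Keep time order: the last test_size points form the test set."""
    return series[:-test_size], series[-test_size:]


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def naive_forecast(train, horizon):
    """Persistence baseline: repeat the last observed value."""
    return [train[-1]] * horizon


def drift_forecast(train, horizon):
    """Extend the average slope of the training data."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * h for h in range(1, horizon + 1)]


cumulative_deaths = [11, 11, 12, 12, 12, 13, 13, 13, 13, 14, 14, 15, 15, 15, 16]
train, test = train_test_split(cumulative_deaths, test_size=5)
scores = {
    "naive": rmse(test, naive_forecast(train, len(test))),
    "drift": rmse(test, drift_forecast(train, len(test))),
}
best = min(scores, key=scores.get)   # model with the lowest test error
```

The same loop applies unchanged when the baselines are replaced by fitted AR, MA, ARMA, or ARIMA models: each model is trained on the first group and scored on the second.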
d) Predicting the Outbreak of COVID-19 in Saudi Arabia using SIR model
After cases of COVID-19 were discovered in the Kingdom of Saudi Arabia and the virus spread
around the world, researchers began statistical analyses to predict the course of COVID-19 in
the Kingdom: how far the virus might spread, when it might recede, and how patients would
recover. They analyzed approximately two months of real-time data on recoveries and deaths.
They used two models for forecasting, the Logistic Growth model and the SIR model. The
Logistic Growth model has been used in many earlier cases, such as predicting the Ebola
epidemic in 2015 [8].
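For orientation, a common closed form of the logistic growth curve is C(t) = K / (1 + ((K - C0) / C0) e^(-rt)), where K is the final epidemic size, r the growth rate, and C0 the initial number of cases. The sketch below assumes this parametrization, which may differ from the one used in [8] or [9]; the function name and the parameter values are illustrative only.

```python
import math


def logistic_growth(t, k, r, c0):
    """Cumulative cases at time t for final size k, growth rate r, initial cases c0."""
    return k / (1.0 + (k - c0) / c0 * math.exp(-r * t))


# Hypothetical parameters, sampled every 10 days from day 0 to day 100.
curve = [logistic_growth(t, k=1000.0, r=0.2, c0=10.0) for t in range(0, 101, 10)]
```

The curve starts at C0, grows almost exponentially at first, and then levels off below K, which is why this model is used for cumulative case counts.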
The dynamics of the epidemic are analyzed through the cumulative total of cases infected with
COVID-19. This model can be used to analyze how many days a patient should quarantine and
how quarantine helps reduce the rate of infection. In the Susceptible-Infected-Recovered (SIR)
model, assumptions were made, including a constant rate of mixing among people, a fixed
population, and a number of recoveries equal to the number of infected. Using this model, it was
found that the outbreak would rise dramatically until mid-May 2020. After this rise, cases would
remain at a steady rate until June 2020, and then the curve would begin to decline until the end
of the outbreak at the beginning of July 2020. The authors found that this prediction is not
accurate because the data set they analyzed was small [9].
To complete this prediction, more information and data are needed, such as the number of
infections over a long period, how many people are infected without knowing it, and the rate of
spread among children. Nevertheless, this model predicts that COVID-19 will spread widely in
Saudi Arabia for at least two months. This means that many people will be infected by the virus
if the Ministry of Health does not take measures such as closing some activities and imposing a
curfew during that period [9].
In conclusion, the literature discusses two models, the autoregressive integrated moving
average (ARIMA) model and the Susceptible-Infected-Recovered (SIR) model. The two models
were compared to determine which is acceptable for our data, and two examples of predicting
COVID-19 cases in Saudi Arabia, one with ARIMA and one with SIR, were presented.
II. Materials and Methods
A) Database
The data were collected and organized in Table 1 and its continuation. The rows represent the
dates and the columns show the number of recoveries, the number of deaths, the cumulative
recoveries, and the cumulative deaths.
Table 1
Covid-19 cases in Saudi Arabia from 15 March to 4 April

Date              Number of Recovery   Number of Death   Cumulative recovery   Cumulative deaths
March 15, 2020    1                    0                 1                     0
March 16, 2020    0                    0                 1                     0
March 17, 2020    0                    1                 1                     1
March 18, 2020    1                    0                 2                     1
March 19, 2020    0                    0                 2                     1
March 20, 2020    3                    2                 5                     3
March 21, 2020    0                    0                 5                     3
March 22, 2020    1                    0                 6                     3
March 23, 2020    1                    0                 7                     3
March 24, 2020    0                    2                 7                     5
March 25, 2020    0                    0                 7                     5
March 26, 2020    0                    0                 7                     5
March 27, 2020    0                    1                 7                     6
March 28, 2020    0                    1                 7                     7
March 29, 2020    1                    0                 8                     7
March 30, 2020    0                    3                 8                     10
March 31, 2020    0                    0                 8                     10
April 1, 2020     0                    1                 8                     11
April 2, 2020     0                    0                 8                     11
April 3, 2020     0                    0                 8                     11
April 4, 2020     0                    0                 8                     11
Table 1 (continued)
Covid-19 cases in Saudi Arabia from 5 April to 21 April

Date              Number of Recovery   Number of Death   Cumulative recovery   Cumulative deaths
April 5, 2020     0                    0                 8                     11
April 6, 2020     0                    0                 8                     11
April 7, 2020     1                    0                 9                     11
April 8, 2020     1                    0                 10                    11
April 9, 2020     0                    1                 10                    12
April 10, 2020    0                    0                 10                    12
April 11, 2020    1                    0                 11                    12
April 12, 2020    0                    1                 11                    13
April 13, 2020    0                    0                 11                    13
April 14, 2020    0                    0                 11                    13
April 15, 2020    0                    0                 11                    13
April 16, 2020    0                    1                 11                    14
April 17, 2020    0                    0                 11                    14
April 18, 2020    1                    1                 12                    15
April 19, 2020    0                    0                 12                    15
April 20, 2020    0                    0                 12                    15
April 21, 2020    1                    1                 13                    16
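The cumulative columns of Table 1 are running totals of the daily columns. The short sketch below reproduces them from the daily counts; the variable names are illustrative, and the daily values are copied from Table 1 in date order (15 March to 21 April 2020).

```python
from itertools import accumulate

# Daily deaths, 15 March - 4 April, then 5 April - 21 April (from Table 1).
daily_deaths = [0, 0, 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 1, 1, 0, 3, 0, 1, 0, 0, 0,
                0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1]
# Daily recoveries over the same dates (from Table 1).
daily_recoveries = [1, 0, 0, 1, 0, 3, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
                    0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

# Running totals give the cumulative columns.
cumulative_deaths = list(accumulate(daily_deaths))
cumulative_recoveries = list(accumulate(daily_recoveries))

print(cumulative_deaths[-1], cumulative_recoveries[-1])   # totals on 21 April
```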
B) MS Excel
The numbers were collected, and graphs and statistical analyses were made in Microsoft Excel
to see whether this information was sufficient for a comparison with other models. MS Excel
was used to fit the data with a polynomial fitting equation.
Polynomial formula: $F(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x + a_0$
Using this equation, the cumulative recovery and cumulative deaths curves were fitted and the
equations of the two curves were determined. These equations can then be used to predict the
coming period of the COVID-19 epidemic.
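A polynomial trendline of this kind is an ordinary least-squares fit, which can be reproduced outside Excel. The sketch below is one possible implementation, not the Excel routine itself: it solves the normal equations with Gaussian elimination, uses a day index scaled to [0, 1] instead of calendar dates (a choice made here to keep the coefficients small), and the names `solve_linear`, `polyfit`, and `polyval` are illustrative. The data are the cumulative deaths from Table 1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                          # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ..., an] from the normal equations."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear(a, b)


def polyval(coeffs, x):
    """Evaluate a0 + a1 x + ... + an x^n with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Cumulative deaths, 15 March - 21 April 2020 (Table 1).
cumulative_deaths = [0, 0, 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 6, 7, 7, 10, 10, 11, 11, 11, 11,
                     11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13, 14, 14, 15, 15, 15, 16]
last_day = len(cumulative_deaths) - 1
scaled = [day / last_day for day in range(len(cumulative_deaths))]   # day index in [0, 1]
coeffs = polyfit(scaled, cumulative_deaths, degree=4)
fitted = [polyval(coeffs, t) for t in scaled]

# Extrapolating 14 days past the last observation (illustrative only).
two_weeks_ahead = polyval(coeffs, (last_day + 14) / last_day)
```

High-degree polynomials can swing sharply outside the fitted range, so an extrapolated value such as `two_weeks_ahead` should be read with caution.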
C) Comparison between models
Once the curves and prediction equations are determined for the coming period, the results
are compared with models that researchers have previously developed for predicting the
COVID-19 epidemic in Saudi Arabia. The comparison covers the size of the data, the number of
forecast days, and whether the forecast function is ascending, descending, or constant. Then,
the percentage error between the models is calculated to judge which prediction of the
COVID-19 epidemic is correct.
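The percentage-error comparison can be sketched as below. The mean absolute percentage error is assumed here as the summary measure, since the text does not specify one; the function names are illustrative, and the two forecast series are hypothetical values used only to show the calculation. The actual values are the cumulative deaths for 17 to 21 April 2020 from Table 1.

```python
def percentage_error(actual, predicted):
    """Absolute error of one prediction as a percentage of the actual value."""
    return abs(actual - predicted) / abs(actual) * 100.0


def mean_absolute_percentage_error(actual, predicted):
    """Average percentage error, skipping points where the actual value is zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(percentage_error(a, p) for a, p in pairs) / len(pairs)


actual = [14, 15, 15, 15, 16]
model_a = [14.0, 14.5, 15.0, 15.5, 16.0]   # hypothetical forecast
model_b = [14.0, 14.0, 14.0, 14.0, 14.0]   # hypothetical forecast

mape_a = mean_absolute_percentage_error(actual, model_a)
mape_b = mean_absolute_percentage_error(actual, model_b)
better = "model_a" if mape_a < mape_b else "model_b"
```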
III. Results
The objective of this research paper is to relate the daily and cumulative cases of recovery and
death from COVID-19 in the data shown above in Table 1, and then to find suitable prediction
models that estimate the future movement of the cases. The graphs obtained after analyzing and
rearranging the table are as follows:
[Chart "Deaths & cumulative deaths": series "Number of Death" and "cumulative deaths" plotted by date, vertical axis from 0 to 18]
Graph (1): Shows the number of death and the cumulative death due to Covid-19 from
15 March to 21 April of 2020.
In terms of death cases, as shown in Graph (1), no deaths were recorded during a few short
periods and during one long period from 2 to 8 April, while on 30 March there was a large
surge in deaths due to the pandemic.
Graph (2): Represents the fitted cumulative death cases of Covid-19 from 15 March to 21 April of 2020.
Graph (2) above represents the cumulative death cases over the same period. It was plotted using
Microsoft Excel© and fitted with a 4th-degree polynomial equation of the following form:
$y = 5\times10^{-5}x^{4} - 9.0204x^{3} + 594307x^{2} - 2\times10^{10}x + 2\times10^{14}$
In terms of the cumulative death cases, there is a clear direct relationship between the daily
death cases and the cumulative ones, since the cumulative count is the running total of the daily
count. That is why there is a notable jump in cumulative deaths on 30 March and a flat
cumulative curve in the periods when no deaths were recorded.
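A short arithmetic check shows how sensitive the printed trendline equation is to its coefficients. The sketch assumes, because the source does not state it, that x in the equation is Excel's date serial number (1900 date system); `excel_serial` and `printed_trendline` are illustrative names. Under that assumption the individual terms are of order 10^14 to 10^15, so coefficients rounded to one significant figure, as printed, cannot reproduce values between 0 and 16.

```python
from datetime import date


def excel_serial(d):
    """Excel (1900 date system) serial number for a date after 1 March 1900."""
    return (d - date(1899, 12, 30)).days


def printed_trendline(x):
    """Cumulative-deaths trendline evaluated with the coefficients as printed."""
    return 5e-05 * x**4 - 9.0204 * x**3 + 594307 * x**2 - 2e10 * x + 2e14


x = excel_serial(date(2020, 3, 15))   # first date in Table 1
y = printed_trendline(x)
print(x, y)                           # y is far outside the range 0 to 16
```

If the forecast is to be recomputed outside Excel, it is therefore safer to refit against a day index (0, 1, 2, ...) or to export the coefficients at full precision.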
[Chart "Recovery & cumulative recovery cases": series "Number of Recovery" and "cumulative recovery" plotted by date, vertical axis from 0 to 14]
Graph (3): Shows the number of recovery and the cumulative recovery from Covid-19
from 15 March to 21 April of 2020.
In terms of recoveries, as shown in Graph (3), there were no recoveries during three long
periods: from 24 to 28 March, from 30 March to 6 April, and from 12 to 17 April. On 20 March,
in contrast, there was a surge of three recoveries in a single day, as recorded in Table 1.
Graph (4): Represents the fitted cumulative recovery cases from Covid-19 from 15 March to 21 April of 2020.
Graph (4) above represents the cumulative recoveries over the same period. It was plotted using
Microsoft Excel© and fitted with a 6th-degree polynomial equation of the following form:
$y = 5\times10^{-7}x^{6} - 0.1321x^{5} + 14501x^{4} - 8\times10^{8}x^{3} + 3\times10^{13}x^{2} - 5\times10^{17}x + 4\times10^{21}$
In terms of the cumulative cases, as shown in Graphs (3) and (4), there is a clear direct
relationship between the daily recoveries and the cumulative ones. That is why there is a
notable jump in cumulative recoveries on 20 March and a flat cumulative curve in the periods
when no recoveries were recorded.
When the prediction models used in the literature (SIR and ARIMA) are compared with the
simple polynomial fitting used in Graphs (2) and (4), it is clear that the ARIMA model offers
more accurate predictions than the simple fitting. Therefore, it should predict the number of
confirmed cases of COVID-19 in Saudi Arabia in the coming weeks better than the SIR
models. The forecasting results also show that COVID-19 cases in the coming weeks are likely
to go up, since both the daily and the cumulative counts of deaths and recoveries continue to
increase, as shown in Graphs (2) and (4).
IV. Discussion
To understand the risk of death from COVID-19, that is, the probability that a person dies from
the disease, we need to know how many people have died from it so far. We want to know the
final number of deaths resulting from a given group of infected people. However, the final
outcome (death or recovery) of all cases is not yet known because the outbreak continues. The
time from the onset of symptoms to death from COVID-19 ranges from two to eight weeks. This
means that some individuals who are currently in the early or advanced stages of infection will
die later. That is why we cannot give a final figure for the risk of death. What we know is the
total number of confirmed deaths so far. The WHO publishes confirmed deaths in its weekly
operational updates and situation reports on the pandemic. This means that we can follow the
change in the number of deaths over time. But this does not tell us the probability that a person
with the disease will die from it; to find out, we need to know the final outcome of all cases.
Some individuals currently infected with COVID-19 could die later. Statistics are an essential
component in helping the government and health care organizations fight the pandemic, increase
the rate of recoveries, and decrease the rate of deaths, or in other words, 'flatten the curve'.
Graphs (1) and (3) show the daily and cumulative numbers of confirmed deaths and recoveries
that we were able to find for the period from March 15th to April 21st, 2020. Each chart shows
something different. Graph (1) shows the daily and cumulative numbers of deaths during the
specified period; the daily number of deaths ranges from zero to three, with zero on most days
and one on most of the days with a recorded death. Graph (3) shows the daily and cumulative
numbers of recoveries, which range from zero to three per day, with zero on most days. This
gives an impression of the severity of the virus, because the number of recoveries is lower than
the number of deaths.
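The ranges quoted above can be checked directly against Table 1 by counting how many days fall on each daily value. The sketch uses only the daily columns of Table 1; the variable names are illustrative.

```python
from collections import Counter

# Daily deaths and recoveries, 15 March - 21 April 2020 (from Table 1).
daily_deaths = [0, 0, 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 1, 1, 0, 3, 0, 1, 0, 0, 0,
                0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1]
daily_recoveries = [1, 0, 0, 1, 0, 3, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
                    0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

# Map each daily value to the number of days on which it occurred.
death_days = Counter(daily_deaths)
recovery_days = Counter(daily_recoveries)

print("deaths per day -> number of days:", dict(sorted(death_days.items())))
print("recoveries per day -> number of days:", dict(sorted(recovery_days.items())))
```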
Graphs (2) and (4) show the fitted cumulative deaths and recoveries, respectively, for the
period from March 15th to April 21st, 2020. The blue dots in each chart show the data we
obtained for cumulative deaths and recoveries, and the red line shows the forecast up to the
beginning of May, based on the previous cases, which can be represented by the following
equations, respectively.
Cumulative deaths: $y = 5\times10^{-5}x^{4} - 9.0204x^{3} + 594307x^{2} - 2\times10^{10}x + 2\times10^{14}$
Cumulative recovery: $y = 5\times10^{-7}x^{6} - 0.1321x^{5} + 14501x^{4} - 8\times10^{8}x^{3} + 3\times10^{13}x^{2} - 5\times10^{17}x + 4\times10^{21}$
As discussed in the literature review, it is also possible to use known infectious disease
models such as SIR and ARIMA to predict the spread of the epidemic at a local level within a
specified period. Therefore, according to the examples provided in the literature review, and
after comparing the two models, it is clear that ARIMA is an acceptable model for the data that
we have and the graphs that we fitted.
V. Conclusion
In this research, we studied the database (Table 1 and its continuation), which gives the
numbers of deaths and recoveries. The cumulative recoveries and cumulative deaths were then
calculated. The graphs that we created were fitted using polynomial functions. We ran many
tests to find the best fit and found that it is obtained with a polynomial function. Using this fit,
we were able to predict the cumulative recoveries and cumulative deaths. Based on the
literature review, two different models, ARIMA and SIR, were compared, and we found that the
better forecast is given by ARIMA, which has the more accurate prediction and the smaller
error. Furthermore, to confirm that ARIMA gives the accurate prediction, we compared our
forecast, as discussed, with the two models in the literature review, and we confirmed that
ARIMA is in good agreement with the actual values that we have.
References
[1] "WHO Statement Regarding Cluster of Pneumonia Cases in Wuhan, China". www.who.int
Retrieved 23 April 2021.
[2] Eschner, Kat (2020-01-28). "We're still not sure where the Wuhan coronavirus really came
from". Retrieved 23 April 2021.
[3] "Clinical characteristics of COVID-19". European Centre for Disease Prevention and Control.
Retrieved 23 April 2021.
[4] Grant MC, Geoghegan L, Arbyn M, Mohammed Z, McGuinness L, Clarke EL, Wade RG (23
June 2020). "The prevalence of symptoms in 24,410 adults infected by the novel coronavirus
(SARS-CoV-2; COVID-19): A systematic review and meta-analysis of 148 studies from 9
countries". Retrieved 23 April 2021.
[5] Harko, Tiberiu; Lobo, Francisco S. N.; Mak, M. K. (2014). "Exact analytical solutions of the
Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and
birth rates". Retrieved 23 April 2021.
[6] Hyndman, Rob J; Athanasopoulos, George. 8.9 Seasonal ARIMA models. Forecasting:
principles and practice. oTexts. Retrieved 23 April 2021.
[7] Alzahrani, S., Aljamaan, I. and Al-Fakih, E., 2020. Forecasting the spread of the COVID-19
pandemic in Saudi Arabia using ARIMA prediction model under current public health
interventions. Journal of Infection and Public Health, 13(7), pp.914-919.
[8] Pell, B., Kuang, Y., Viboud, C., Chowell, G. Using phenomenological models for forecasting
the 2015 Ebola challenge. Epidemics 2018, 22, 62–70. Retrieved 24 April 2021.
[9] Alboaneen, D., Pranggono, B., Alshammari, D., Alqahtani, N. and Alyaffer, R. Predicting
the Epidemiological Outbreak of the Coronavirus Disease 2019 (COVID-19) in Saudi Arabia.
International Journal of Environmental Research and Public Health, 2020, 17, 4568.
[10] Y.-C. a. L. P.-E. a. C. Chen, "A time-dependent SIR model for COVID-19 with undetectable
infected persons," IEEE Transactions on Network Science and Engineering, vol. 7, no. 4, pp. 14-16, 2020.
[11] J. a. L. T. Tolles, "Modeling epidemics with compartmental models," Jama, vol. 323, no. 24,
pp. 2515--2516, 2020.
[12] D. a. G. Benvenuto, "Application of the ARIMA model on the COVID-2019 epidemic
dataset," Data in brief, vol. 29, no. 1, pp. 3-4, 2020.
[13] A. a. F. Hernandez-Matamoros, "Forecasting of COVID19 per regions using ARIMA models
and polynomial functions," Applied Soft Computing, vol. 96, no. 2, 2020.
[14] Chen, Y.; Liu, Q.; Guo, D. Emerging coronaviruses: Genome structure, replication, and
pathogenesis. J. Med. Virol. 2020, 92, 418–423.
[15] Ge, X.Y.; Li, J.L.; Yang, X.L.; Chmura, A.A.; Zhu, G.; Epstein, J.H.; Mazet, J.K.; Hu, B.;
Zhang, W.; Peng, C.; et al. Isolation and characterization of a bat SARS-like coronavirus that uses
the ACE2 receptor. Nature 2013, 503, 535–538
Acknowledgments
We would like to thank Dr. Mohammed Albilousii, Dr. Saleh Alzahrani, and Dr. Mahbubunnabi
Tamal for helping us understand the topic and complete the research paper. We would also like
to express our deep and sincere gratitude and utmost respect to Dr. Gameel Saleh from the
Biomedical Engineering Department at IAU for providing the research topic and raw data.