Abstract:
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is the virus that causes
the pandemic coronavirus disease (COVID-19). The first case was reported in Wuhan, China,
on 30th December, 2019, and the first case in Kenya was reported on 13th March, 2020. This
contagious disease has become a global issue because it has resulted in millions of deaths
and in economic disruption leading to loss of employment and economic instability. This
study therefore aimed to model daily COVID-19 cases in Kenya using an Autoregressive
Integrated Moving
Average (ARIMA) model and a Seasonal Autoregressive Integrated Moving Average
(SARIMA) model. The specific objectives were to fit an ARIMA model, to fit a SARIMA
model, to validate the model and to forecast COVID-19 cases. Secondary data from the
World Health Organization, covering 13th March, 2020 to 30th April, 2023, were analyzed
using R software. The training data were found to
be non-stationary by the Augmented Dickey-Fuller test, and seasonal differencing was
applied to achieve stationarity. The models were fitted following the Box-Jenkins
methodology, with the lowest AIC and BIC as the selection criteria. The data revealed
weekly seasonality, which made the non-seasonal ARIMA model inadequate. SARIMA models
were therefore fitted and validated on test data. The model with the smallest forecast errors
was selected. This was the SARIMA(1,0,1)(2,1,2)[7] model, which had the lowest AIC
(2082.5), with MAE = 2.9867 and RMSE = 4.5815. Using this model, a ninety-day forecast
was generated from the daily COVID-19 data. These forecasts raise awareness of the trend
and seasonality of the disease and can therefore be useful to health care providers and
the government for planning, policy
formulation, evaluation and resource allocation. This study recommends that a comparative
study of Bayesian SARIMA and SARIMA models be performed, that possible changes in the
probabilistic structure of the data be considered, and that BATS and TBATS models be
fitted to the data.
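
The seasonal differencing step and the forecast error measures named above can be illustrated with a minimal sketch. The study itself used R; Python is used here only for illustration. The function names and the toy daily counts are assumptions made for this example and are not the study's code or data.

```python
import math


def seasonal_difference(series, period=7):
    """Lag-`period` differencing: y[t] - y[t - period] for t >= period.

    With period=7 this corresponds to one seasonal difference (D = 1)
    on daily data with weekly seasonality.
    """
    return [series[t] - series[t - period] for t in range(period, len(series))]


def mae(actual, predicted):
    """Mean absolute error between observed and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between observed and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical daily counts (NOT the study's data): two weeks sharing the
# same weekly pattern, with the second week shifted up by 2 cases per day.
toy_cases = [10, 12, 15, 14, 13, 6, 5, 12, 14, 17, 16, 15, 8, 7]

# Differencing at lag 7 removes the repeating weekly pattern.
diffed = seasonal_difference(toy_cases, period=7)

# Hypothetical hold-out comparison: forecast week 2 by repeating week 1
# (a seasonal naive forecast, used here only to exercise the metrics).
test_actual = toy_cases[7:]
test_forecast = toy_cases[:7]
toy_mae = mae(test_actual, test_forecast)
toy_rmse = rmse(test_actual, test_forecast)
```

In this toy series every seasonally differenced value equals the weekly shift, so the differenced series is constant. On real data the differenced series would then be checked for stationarity before fitting candidate models and comparing their errors on held-out data.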