2020 Volume 11 Issue 4

Information Generation Rate and its Relationship with the Entropy of Non-Linear Models: Covid-19 Case, Peru 2020

 

Danny Villegas Rivas*, Manuel Milla Pino, Salli Villegas Rivas, Erick Delgado Bazan, Yary Pérez Pérez, Zadith Garrido Campaña, Martín Grados Vasquez, Cesar Osorio Carrera, Luis Ramírez Calderón, José Paredes Carranza, Ricardo Shimabuku Ysa


Abstract

In this paper, entropy was studied in non-linear models including exponential, Gompertz, and logistic, to estimate epidemiological parameters of interest in data from confirmed cases of infection by COVID-19 in Peru. The data related to the spread of COVID-19 in Peru comes from the information available on the INS-Peru institutional portal (2020). The Akaike information criterion (AIC) and the residual standard error (ERR) were considered to evaluate the entropy of the models. The estimation of the parameters of the models was carried out using maximum likelihood and by the Bootstrap method. The results showed that the entropy of the models is related to the information generation rate, associated with the differential in the number of tests applied. Entropy severely affected maximum likelihood estimators. The Bootstrap estimators showed better performance against EMV with the estimated peak of confirmed cases. Bootstrap estimators were significantly affected by sample size, especially when n ≤ 10. The results of this research suggest considering the entropy and the information generation rate (differential in the application of tests for the diagnosis of COVID-19 in Peru), as well as the use of Bootstrap estimators as an alternative to estimate parameters of epidemiological models.

Keywords: Nonlinear models, Information generation rate, Bootstrap


Introduction

After the most recent H5N1 avian influenza epidemics and during the 2009 H1N1 influenza pandemic, the international scientific community in the public health area has made efforts within the framework of the imperative need to develop standardized research and collect data that will serve as support to face eventual pandemics (Sundus, et al., 2018; Shakeri, et al., 2018; Alzahrani, et al., 2019; Ren-Zhang, et al., 2020). On December 31, 2019, 27 cases of pneumonia of unknown etiology were identified in Wuhan City, Hubei Province in China. Wuhan is the most populous city in central China with a population of over 11 million. These patients presented most notably with clinical symptoms of dry cough, dyspnea, fever, and bilateral pulmonary infiltrates on imaging. All of the cases were related to the Wuhan Huanan Seafood Wholesale Market, which marketed fish and a variety of live animal species, such as poultry, bats, marmots, and snakes (Lu et al. 2020). The causative agent was identified from throat swab samples conducted by the Chinese Center for Disease Control and Prevention (CCDC) on January 7, 2020, and was subsequently named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS- CoV-2). The disease was named COVID-19 by the World Health Organization, known by its Spanish acronym WHO (World Health Organization, 2020).

To apply epidemiological models it is essential to understand the phenomena of complexity and chaos since chaos theory has been considered as a possible underlying explanatory model. The parameters associated with chaos are dimension measurements and information generation rates (entropy), understanding entropy as a measure of disorder. Since these analyses require large series of data that frequently make their calculation very difficult if not impossible in practical terms, theories and methods were devised to make the statistical study of regularity feasible, relating the information generation index with entropy, applied to small series of clinical data originated from complex “noisy” systems to demonstrate the existence or non-existence of chaos and non-linearity (Cuestas, 2013). Beyond the criteria that involve the evaluation of models, from the adjustment coefficient (R2), the entropy (Information criteria), the number of parameters (Mallows' Cp), the residual standard error (ERR), which although they are criteria that allow calibrating the predictive capacity of the model, they are not a sufficient condition for these models to be used as instruments for decision-making. It is the circumstances surrounding the environment of the phenomenon that determine the quality of the information, that is, of the sample, and consequently the levels of entropy of the models.

In the case of COVID-19 in Peru, where on March 6, 2020, the first case of contagion was registered, marking the beginning of a public health problem that has led the Peruvian Government to take measures ranging from social distancing, mandatory social isolation, until mandatory social immobilization, and that to date (April 15, 2020) despite these mediations, the figure stands at more than 11,000 confirmed cases of contagion by COVID-19 in much of the national territory. In this sense, in Peru, the limitations in the acquisition of tests as a consequence of a global phenomenon, about which there is little knowledge given its recent appearance, has woven a series of situations that have impacted not only the daily life of people but also the possibility of having models that allow defining the behavior of COVID-19, in terms of fairly precise estimates concerning the peak of contagion, with which action lines can be established and the class and duration of measures to decrease the contagion rate. In this sense, the increase in the number of scientific investigations, and the proliferation of long and complex data sets, in recent years have expanded the scope in the applications of statistical methods (González-Díaz, 2016). That is why, in the face of the problems that this virus has generated, especially about models frequently used in epidemiology, which despite exhibiting a good fit, allow only a partial description of the behavior of this pandemic, but make estimation impossible of parameters that allow designing public policy strategies, the central object of this research is to study entropy in non-linear models, especially, exponential, Gompertz and logistic, to more accurately estimate epidemiological parameters of interest, namely the number and peak of COVID-19 infections.

Materials and Methods

The data related to the spread of COVID-19 in Peru comes from information available on the INS-Peru institutional portal (2020), for the period from March 6 to April 15, 2020. For modeling the estimation of the number of infected for COVID-19 in Peru growth the models that were considered include exponential, Gompertz, and logistic, that unlike the models frequently used in epidemiology, such as the SIR and SEIR, based on differential equations and that tend to make unrealistic estimates in the case of these epidemics, the growth models allow, in addition to modeling the behavior of the epidemic up to the phase where it would reach the peak of contagion, they would also be able to make estimates of the mentioned peak, which, as far as possible, would be consistent and asymptotically unbiased.

The growth models considered in this research are briefly detailed below:

According to Seber and Wild (1989), the Gompertz model is defined as follows:

It is a response/growth curve across the true axis, that is not limited to non-negative values even though this is the range for most response and growth data.

If b < 0 the mean function increases, while it decreases for b > 0.

In practice, several reparametrizations of the model have been carried out.

According to Bruce and Versteeg (1992), the logistic model is defined as follows:

Selection Criteria based on Information Measures

In this research, in addition to the widely known criteria to evaluate the goodness of fit of the models, such as the coefficient of determination (R2) and the residual standard error (ERR), there are the information criteria or entropy indices.

Akaike Information Criteria (AIC)

This criterion is detailed in González and Landro (2018), who points out that if the problem consists of selecting the coefficients β that are as close as possible to the vector β* , the distance between the distributions fYβ*  and fYβ  can be characterized by an entropy measure of the form (see Akaike, H. (1978b)):

(where the first addend of the second member represents the ability to fit of fYβ  for fYβ*  and the second addend, for a given function fYβ* , is a constant). The minimization of the entropy measure implies the minimization of the information criterion (see Kullback, 1959):

 

Assuming that β=β*+∆β  (where β=β1  β ∆βk T  is an arbitrary norm vector small), then the criterion KLβ*,β  admits a Taylor series expansion of the form:

If fyβ*  is a regular function, the first term of the second member of this expression vanishes and, consequently, it follows that  (where ∆βI2=∆βTIβ*∆β  , where I2   is the Euclidean norm and I(⦁) is the information matrix of Fisher). Suppose that β is included in an s-dimensional space Θs1,2,…,k-1 , while the vector of the true values of the coefficients, β* , is included in a k-dimensional space ( k > s). Denoting by βs*   the projection of β*  onΘs  in the sense of the Euclidean norm; it is shown that 2KLβ*,βsβs*-β*I2+βs-βs*I2  (where βsΘs  and it is verified that βsβs* ).

Replacing βs  by the vector of random variables βs  formed by the restricted maximum-likelihood estimators of β*  in Θs  and, taking into account that, for values of n that are sufficiently large, , it is verified that 2EKLβ*,βsβs*-β*I2+sn . This expression constitutes a measure of the deviations of βs  to the vector β*  and allows us to conclude that the expected value of this deviation includes a component that represents the error related to the selection of a coefficient space approximated by βs*   and another which represents the error due to the estimation of the vector of the coefficients. Akaike showed that, under certain conditions of regularity, the likelihood ratio is:

And therefore, that  is an unbiased estimator of the measure EKLβ*-βs . The Akaike information criterion (AIC) consists of minimizing the logarithm of the likelihood function -2LnY,βs+2s   s=1,2…,k-1  in which the first term represents the measure of the error due to the lack of capacity to adapt to the approximation and the second term defines the penalty factor. Under the assumption of normality of the assumed true model, its density function assumes the form:

and the likelihood function of the candidate model Ytp  will be of the form. Therefore, the Kullback-Leibler distance will assume the form:

1-…-pYt-p2.

Hence, the Kullback-Leibler distance will assume the form:

Thus, substituting in this expression the coefficients j , σε*2  ^ 2 and σ2   by their maximum-likelihood estimators, we obtain:

From this definition the following selection criteria results:

which allows obtaining an asymptotically efficient estimator .

Bootstrapping Estimation

In addition to the maximum likelihood estimators of the parameters of the nonlinear models considered in this investigation, the estimation was performed using the Bootstrap method proposed by Efron (1979), which is one of the simplest methods used to obtain an estimator of a parameter β=β(P)  where P is the postulated statistical model. Alonso (2001) presents the Bootstrap method in a general situation:

Let be Z = (Z1, Z2, ..., Zn)  a data set generated by the statistical model P, and let be T(Z) the statistic whose distribution L(T ; P) we wish to estimate. The Bootstrap method proposes as an estimator of L(T ; P) the distribution L*(T*; n Pˆ ) of the statistic T* =T (Z*), where Z* is a data set generated by the estimated model Pn . Note that if Pn=P , then the distributions L(T; P) and L*(T*; Pn)  coincide. Then if we have a good estimator of P, it is logical to suppose that L*(T*; Pn)  it will approach L(T ; P).

The models described above, their estimators (EMV & Bootstrap), and the model selection criteria (AIC & ERR) were determined in the R environment, using the “drc” package and the “boot” package (R Core Team 2020). For details see Appendixes 1 and 2.

Results

Table 1 shows the results of the evaluation of entropy in three non-linear models (exponential, logistic & Gompertz) adjusted to data from confirmed cases of contagion by COVID-19 in Peru in the period March 6 - April 15, 2020, and related statistics. There it is observed that, for each of the models considered, the entropy index (AIC) and the residual standard error (ERR) increase as the sample size increases (days considered in the study). In the same way, the variances σβi2  of the maximum likelihood estimators (EMV) of the peak of the curve (c in the case of the exponential model and d in the Gompertz & logistic models) increase when n grows, which results in unstable estimators, even when the data shows a good fit, especially in the case of the logistic model (see Figure 1). On the other hand, Figure 2 shows a comparison of the estimates by maximum likelihood and Bootstrap of the peak of confirmed cases of contagion by COVID-19. There it is observed that the Bootstrap estimators show a better performance than the EMV, as well as a considerable increase in their value on day 35. However, for values ​​of n ≤ 10 the EMV show a better performance than the Bootstrap estimators.

Discussion

About the results of the entropy measurement of the models, Cuestas, (2013) points out that entropy is related to the information generation rate, hence the increase in the AIC and EER values ​​associated with the three models as n grows, it may be related to the rate of generation of official information expressed in the differential of the number of rapid and molecular tests applied. Regarding the Bootstrap estimators, Quintana (2003) points out that the error of the Bootstrap approximation to the distribution of the pivotal Tn  is of order n-1  in probability, so the Bootstrap can not only allow approximating the probabilistic distribution statistics of interest when obtaining it is complex, but also allows to improve the normal approximation of the classical estimators, among them the EMV. In this sense, this may explain the performance of the Bootstrap estimators against the EMV when n grows, and in turn, the behavior of the EMV against the Bootstrap estimators when n ≤ 10.

 

Table 1. Evaluation of the Entropy of Non-linear Models Adjusted to the Data of Confirmed Cases of Contagion by COVID-19 in Peru between March 6 - April 15, 2020.

Model

Day

(n)

AIC

EER

Standard Error of the Estimator σβi

b

c

d

e

Exponential

5

16,759

0,918

-

3,916

3,663

1,688

10

81,086

11,175

-

1979,617

7,758

300,304

15

159,928

42,816

-

4414,170

23,613

261,318

20

226,444

61,771

-

5523,214

29,102

280,223

25

360,903

299,779

-

7324,569

59,657

240,469

30

413,530

219,771

-

9990,748

82,813

197,884

35

567,516

749,163

-

19743,778

259,320

440,470

 

41

740,754

1910,435

-

24706,220

610,530

600,690

Gompertz

5

19,061

1,338

0,456

117,441

5,046

8,714

10

61,233

4,047

0,029

2,520

1424,007

5,051

15

115,949

9,658

0,014

4,509

1935,455

3,745

20

159,989

11,500

0,023

4,868

67,288

0,580

25

241,163

26,883

0,004

13,015

1781,100

3,380

30

305,814

35,976

0,001

13,889

2887,740

1,577

35

476,137

200,505

0,001

63,644

200460,000

1,761

 

41

571,927

241,088

0,001

52,256

26707,000

1,458

Logistic

5

19,235

1,362

0,510

116,640

4,793

7,736

10

60,086

3,822

0,084

3,179

7338,922

12,575

15

114,991

9,354

0,804

6,384

371,240

2,560

20

157,362

10,769

0,031

5,368

24,473

0,289

25

242,296

27,499

0,025

28,825

3767,651

9,667

30

298,883

32,051

0,011

21,401

8464,700

7,301

35

459,827

158,831

0,010

43,261

589060,000

6,873

 

41

561,218

211,572

0,015

54,610

3842,900

1,349

Figure 1. Adjusted Logistic Model on the Data from Confirmed Cases of COVID-19 Infection in Peru (March 6- April 15, 2020).

Figure 2. Maximum Likelihood Estimate vs. Bootstrap of the Peak of Contagion (d) using a Logistic Model on Data from Confirmed Cases of Contagion by COVID-19 in Peru (March 6-April 15, 2020).

 

Conclusions

The findings of this research fundamentally gravitate around the following aspects: first, it was evidenced that the entropy of the non-linear models considered in this work (exponential, logistic & Gompertz) is related to the information generation rate, which is associated with the differential in the number of tests applied. Likewise, entropy severely affected the maximum likelihood estimators. On the other hand, despite the effects of entropy, the Bootstrap estimators showed a better performance compared to the EMV with the estimated peak of confirmed cases, which showed greater consistency and stability of these estimators, in addition to being less sensitive. The entropy associated with the rate of generation of information related to confirmed cases of contagion by COVID-19 in Peru. However, Bootstrap estimators were significantly affected by sample size, especially when n ≤ 10. It is suggested to consider the entropy and the information generation rate (differential in the application of tests for the diagnosis of COVID-19 in Peru), as well as the use of Bootstrap estimators as an alternative, to estimate parameters of epidemiological models. Finally, the results of this research indicate that there is solid evidence to affirm that in the epidemiological area, especially in the case of COVID-19 in Peru, the criterion based on the entropy of the model allows the description of this phenomenon from a model of mathematical ability to make predictions that contribute to the implementation of control-oriented strategies as a fundamental action within the public policy; however, based on the increase in the rate of information generation and the fact that the Bootstrap and EMV estimators show a trend towards stability, it is presumed that with a slight increase in the number of sampling days (n ≈ 50), more precise estimates could be made of the peak of confirmed cases of COVID-19 infection in Peru.

Appendix 1. R Code to Adjust Non-linear Models on Data Related to the Number of Confirmed Cases of COVID-19 Infection in Peru until April 2020

library(drc)

library(stats)

library(kableExtra)

plot(dia, infectados, xlab = "Tiempo", ylab = "N")

tablamod <- data.frame(

Modelo = c("Exponencial", "Gompertz", "Logístico", "Log-logístico", "Weibull"),

Parametro.fct = c("EXD.3()", "G.4()", "L.5()", "LL.5()", "W1.4()")

)

kable(tablamod, caption = "Tabla 1.  Códigos para modelos en fct") %>%

kable_styling(full_width = F) %>%

column_spec(1, bold = T, border_right = T) %>%

column_spec(2, width = "10em")

dataset1<-data.frame(dia,infectados)

dataset1

result.G <- drm(infectados~dia, data = dataset1, fct = G.4())

summary(result.G)

plot(result.G, xlab = "Día", ylab = "N° de infectados")

AIC(result.G)

result.Exp <- drm(infectados~dia, data = dataset1, fct = EXD.3())

summary(result.Exp)

plot(result.Exp, xlab = "Día", ylab = "N° de infectados")

AIC(result.Exp)

result.Log <- drm(infectados~dia, data = dataset1, fct = L.5())

summary(result.Log)

plot(result.Log, xlab = "Día", ylab = "N° de infectados")

AIC(result.Log)

Appendix 2. R Code to Estimate Non-linear Model Parameters through Bootstrap on Data related to the Number of Confirmed Cases of COVID-19 Infection in Peru until April 2020

library(Bootstrap)

library(ISLR)

library(drc)

library(stats)

library(kableExtra)

plot(dia, infectados, xlab = "Tiempo", ylab = "N")

tablamod <- data.frame(

Modelo = c("Exponencial", "Gompertz", "Logístico", "Log-logístico", "Weibull"),

Parametro.fct = c("EXD.3()", "G.4()", "L.4()", "LL.4()", "W1.4()")

)

kable(tablamod, caption = "Tabla 1.  Códigos para modelos en fct") %>%

kable_styling(full_width = F) %>%

column_spec(1, bold = T, border_right = T) %>%

column_spec(2, width = "10em")

dataset1<-data.frame(dia,infectados)

dataset1

result.Log <- drm(infectados~dia, data = dataset1, fct = L.4())

summary(result.Log)

AIC(result.Log)

set.seed(1)

library(ISLR)

indices.train <- sample(x = nrow(dataset1), size = 0.5*(nrow(dataset1)), replace = FALSE)

datos.entrenamiento <- dataset1[indices.train,]

datos.test <-dataset1[-indices.train,]

n=nrow(dataset1)

head(result.Log)

result.Log$fit$par[3]

meanstar1=mean(dataset1$infectados)

sdstar1=sd(dataset1$infectados)

R = 1000

Fstar = numeric(R)

for (i in 1:R) {

siminfectados= rnorm(n, mean=meanstar1, sd=sdstar1)

simtiempo=dataset1$dia

simdata = data.frame(siminfectados,simtiempo)

result.Log <- drm(siminfectados~simtiempo, data = simdata, fct = L.4())

Fstar[i]=result.Log$fit$par[3]

}

Fstar

mean(Fstar)

hist(Fstar,main="",xlab="Pico de contagio",ylab="Densidad",breaks=90,freq=FALSE).

References

Akaike H (1978b) On the likelihood of a time series model. The Statistician. 27: 237-242.

Alonso A (2001) Bootstrap confidence intervals for indicators in regression. Marrow, 10 (1), 11-20.

Alzahrani, M. I., Alzahrani, M. I., Alsolami, E. A., Al Sulami, A. H., Abdulrahman, A., Alhejaili, M. A. A.

Bruce RD, & Versteeg DJ (1992) A statistical procedure for modeling continuous toxicity data. Environ. Toxicol. Chem, 11(1), 1485–1494.

Cuesta E. (2013). Predictability of the spatial and temporal spread of the AH1N1 influenza epidemic in Argentina. Available at: http://lildbi.fcm.unc.edu.ar/lildbi/tesis/cuestas_eduardo_jose.pdf

Efron B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistic. 7(1): 1- 26.

González C. (2016). Statistical significance and p-values. Cuban Journal of Hygiene and Epidemiology, 54(3), 60-62.

González M, & Landro A. (2018). Information criteria and stochastic complexity. Journal of Research in Financial Models, 7(1), 21-40.

Kullback S. (1959). Information theory and statistics. New York: Wiley.

 Lu H, Stratton CW, & Tang Y. (2020) Outbreak of pneumonia of unknown etiology in Wuhan China: the mystery and the miracle. J. Med. Virol, 92(4), 401-402.

National Institute of Health [Internet]. Lima: INS; 2020 [cited 2020 Apr 9]. Available at: https://web.ins.gob.pe/

Organization for Economic Cooperation and Development (2006) Current approaches in the statistical analysis of ecotoxicity data: A guidance to application – annexes. Paris: p. 80.

Quintana M. (2003) Bootstrap estimation with sequential data. [Doctoral Thesis]. Spain: Department of Mathematics, University of Las Palmas de Gran Canaria; 2003.Available at: https://accedacris.ulpgc.es/bitstream/10553/19572/4/0287981_00000_0000.pdf

R Core Team (2020) R Development Core Team. R: A language and environment for statistical computing. Disponible en: http://www.r-project.org, http://www.bioassay.dk

 

Seber GA, & Wild CJ. (1989) Nonlinear Regression, New York: Wiley & Sons.

 

Sundus, A., Ismail, N. E., & Gnanasan, S. (2018). Exploration of healthcare practitioner’s perception regarding pharmacist’s role in cancer palliative care, Malaysia. Pharmacophore, 9(4), 1-7.

World Health Organization (2019) Technical Focus: Early Clinical and Epidemiological Investigations on COVID-19 for a Public Health Response; 2019. [Internet] [cited 2020 Apr 9]; Available at: https://www.who.int/emergencies/diseases/novel‐coronavirus‐2019/technical‐guidance/early‐investigations.

World Health Organization, WHO Director-General’s Remarks at the Media Briefing on 2019-nCoV. 2020.[Internet]. [citado 08 de abril de 2020]; Disponible en: https://www.who.int/dg/speeches/ detail/who-director-general-s-remarks-at-the-media-briefing-on-2019-ncov-on-11- february-2020.

 

 

INDEXING
ESCI, SCIRUS, BiologyBrowser, Chemical Abstracts, CABI, Intute catalogue, Science Central, EBSCOhost databases, Google Scholars, Genamics JournalSeek, Open J gate, Ulrich's, ResearchGate, Academic Journals Database, CASSI, ACS (American Chemical Society), CiteFactor, and many other international scientific databases.

JOURNAL OF BIOCHEMICAL TECHNOLOGY
JOURNAL OF BIOCHEMICAL TECHNOLOGY
Journal of Biochemical Technology is a peer reviewed International Journal published by the Sevas Publishing on behalf of the Biochemical Technology Society, a Registered Charity Organization from India

AREA OF INTEREST
AREA OF INTEREST
Areas of high interest, cover new advances in enzymatic and protein mechanims; applied molecular genetics and biotechnology; genomics and proteomics; metabolic; medical, environmental, food and agro biotechnology.

FOCUS AND SCOPE
FOCUS AND SCOPE
Journal Of Biochemical Technology” Provides A Medium For The Rapid Publication Of Full-Length Articles, Mini-Reviews Of New And Emerging Products And Short Communications On All Aspects Of ...

Publish with us

Call for papers

Issue 1 Volume 12 - 2021

Publishing steps

1.Prepare
your paper
2.Submit
and revise
3.Track
your research
4.Share
and promote
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge. Keywords include, Biochemical Research: Endo/exocytosis, Trafficking, Membrane Biology, Cell Migration, Cell-Matrix Organelle Biogenesis, Cytoskeleton Proteolysis, Cell Death, Cell Cycle, Cancer, Cell Growth/Death, Differentiation, Drug Targets, Gene Therapy, Models of Disease, Proteomics, Stem Cells, Bioenergetics, Mitochondria, Free Radicals, Redox Signaling, Ion Transport/Channels, Oxidative