---
title: "Mathematical Models for Infection"
author: "A gentle introduction"
date: "Chiara Sabatti, during the Covid-19 shelter-in-place, Spring 2020"
output:
beamer_presentation:
fig_caption: false
fig_height: 5
---
```{r setup, include=FALSE}
#source('http://datascience101.stanford.edu/profile.R')
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
library(readr)
library(tidyr)
library(gridExtra)
```
## Goals
- I am not an epidemiologist
- The following is only an attempt to gain understanding of terms and data we are presented with daily
- We start with a simple, in many way unrealistic model
- There are more sophisticated models that one can use and some might be in use in "professional" projections of the COVID-19 epidemic
- It might be an interesting research project to try to understand more complex models
## A simple model for infection
- W. O. Kermack and A. G. McKendrick (1927)
- The goal is to have a sense of how many people are going to be infected over time when a contagious disease enter a population and when the epidemic will end
- It is a *deterministic* model: it gives a fixed number of infected for each unit of time. Of course, reality is not deterministic, but random, a lot of different outcomes are possible on any given day. But this is a starting point.
- It was proposed to explain the rapid rise and fall in the number of infected patients observed in epidemics such as the plague (London 1665-1666, Bombay 1906) and cholera (London 1865).
## A SIR model
We assume that the population has a fixed size $N$ (i.e., no births, deaths due to disease, or deaths by natural causes)
At any point $t$ in time, the population can be partitioned in three groups:
- $S_t$ the people that are *susceptible* to be infected, as they have not been infected yet
- $I_t$ the people that are currently *infected* and capable of transmitting the disease
- $R_t$ the people that have *recovered* from infection (or have died) and therefore cannot be infected or infect any longer.
$$ N=S_t+I_t+R_t$$
Note that we assume that the incubation period of the infectious agent is instantaneous, and duration of infectivity is same as length of the disease.
## Model for infections
- We assume that time is discrete, so that infections happen between one day and the next.
- We assume that, within a day, everyone has the same chance to interact with anyone else in the population in a way that has the same probability of communicating the disease (this requires a completely homogeneous population with no age, spatial, or social structure).
For each affected individual the probability of interacting with another member of the population and infecting him/her is $\beta$. So, any day, one contagious individual is expected to infect $\beta N$ people.
- Note, however, that only the fraction $S_t/N$ of the population can actually be newly infected.
If today we have $I_t$ infected individuals, tomorrow ($t+1$)
$$\text{new infections at time } t+1 =\beta N\times \frac{S_t}{N} \times I_t$$
## The interpretation of $\beta$
- Note that $\beta$ should depend on the population size: if the population is large, the chance of one affected individual to interact with any other specific individual should be smaller.
- So, often, rather than using $\beta$ for these models, one uses an alternative parameter $B$ such that $\beta=B/N$ -- so that the role of population size is explicitly called out.
We will see more about this later.
## Recovery
- We assume that a portion $\gamma$ of the infected recovers at each period
$$\text{new recoveries at time } t+1=\gamma I_{t}$$
- Without going into the mathematics of it, we note that $1/\gamma$ gives you an indication of how long a person is infected: the easiest it is to recover---the larger $\gamma$---the shorter the period one is ill.
## What happens to the three quantities over time
\begin{eqnarray*}
R_t& = & R_{t-1} + \gamma I_{t-1}\\
S_t& = & S_{t-1} - \beta \times S_{t-1} \times I_{t-1}\\
I_t & = & I_{t-1}(1-\gamma)+ \beta \times S_{t-1} \times I_{t-1}
\end{eqnarray*}
**Note**: the number of cases etc. should be integers, but here they can be fractional.
## A simulation for $N=1000$
```{r}
gamma<-0.04
beta<-0.0005
S<-997
I<-3
R=0
Sc<-S
Ic<-I
Rc<-R
for(i in 1:1000)
{
Rc<-Rc+gamma*Ic
NI<-beta*Sc*Ic
Sc<-Sc-NI
Ic<-(1-gamma)*Ic+NI
R<-c(R,Rc)
S<-c(S,Sc)
I<-c(I,Ic)
}
```
## Simulation results
```{r,echo=FALSE}
longtitle<-paste(paste(paste("Epidemic dynamics with beta=",as.character(beta)),", gamma="),as.character(gamma))
plot(S,type="l",xlab="days",ylab="Counts",col="blue",main=longtitle)
lines(1:length(I),I,col="red")
lines(1:length(R),R,col="green")
legend(x=600, y=500, c("Susceptible", "Infected", "Recovered"), col=c("blue","red","green"),lty=c(1,1,1),box.lty=0)
```
## What influence the parameters?
- Suppose that one disease can be communicated only by sharing saliva and another as soon as two people are within 6 feet of each other. Which parameter value should be different in the model for these two diseases?
- Suppose we find a cure that shortens the period while a person is sick and infective. What parameters would this change?
- Suppose at some point a vaccine is discovered and administered to a fraction of the population: what would this change?
- Suppose people practice "social distancing": what would this change in the model?
To test out some of our ideas, we are going to change the simulation
## Let's change the simulation
```{r}
gamma<-0.04
beta<-0.0005/5
S<-997
I<-3
R=0
Sc<-S
Ic<-I
Rc<-R
for(i in 1:1000)
{
Rc<-Rc+gamma*Ic
NI<-beta*Sc*Ic
Sc<-Sc-NI
Ic<-(1-gamma)*Ic+NI
R<-c(R,Rc)
S<-c(S,Sc)
I<-c(I,Ic)
}
```
## Simulation results
```{r,echo=FALSE}
longtitle<-paste(paste(paste("Epidemic dynamics with beta=",as.character(beta)),", gamma="),as.character(gamma))
plot(S,type="l",xlab="days",ylab="Counts",col="blue",main=longtitle,ylim=c(0,1000))
lines(1:length(I),I,col="red")
lines(1:length(R),R,col="green")
legend(x=600, y=500, c("Susceptible", "Infected", "Recovered"), col=c("blue","red","green"),lty=c(1,1,1),box.lty=0)
```
## Let's change the simulation again
```{r}
gamma<-0.04*2
beta<-0.0005
S<-997
I<-3
R=0
Sc<-S
Ic<-I
Rc<-R
for(i in 1:1000)
{
Rc<-Rc+gamma*Ic
NI<-beta*Sc*Ic
Sc<-Sc-NI
Ic<-(1-gamma)*Ic+NI
R<-c(R,Rc)
S<-c(S,Sc)
I<-c(I,Ic)
}
```
## Simulation results
```{r,echo=FALSE}
longtitle<-paste(paste(paste("Epidemic dynamics with beta=",as.character(beta)),", gamma="),as.character(gamma))
plot(S,type="l",xlab="days",ylab="Counts",col="blue",main=longtitle,ylim=c(0,1000))
lines(1:length(I),I,col="red")
lines(1:length(R),R,col="green")
legend(x=600, y=500, c("Susceptible", "Infected", "Recovered"), col=c("blue","red","green"),lty=c(1,1,1),box.lty=0)
```
## Getting a bit closer to the meening of parameters
Let's say we are at the beginning of the epidemic, so that the number of susceptible individuals is $\approx N$: there are no recovered, and the infected are a transcurable number
(this is particularly true if $N$ is large).
Then, one infectious individual will infect $\displaystyle\frac{\beta}{\gamma}N$ people. Why?
- At each time periods, the individual infects $\beta N$ people
- An individual is contagious for $1/\gamma$ time periods
This quantity is important enough that it gets a name:
$$R_0=\frac{\beta}{\gamma}S_0=\frac{\beta}{\gamma}N$$
## $R_0$ and epidemic
$R_0$ is the average number of people one case infects.
When **$R_0<1$**, each person who contracts the disease will infect fewer than one person before dying or recovering, so the outbreak will peter out.
When **$R_0>1$** each person who gets the disease will infect more than one person, so the epidemic will spread.
To verify this we can look at our simulation and tweek the parameters so that we get one $R_0<1$:
$R_0=\frac{0.00005\times 1000}{0.06}=0.83$
**Note**: the results actually hold for a continuous model, so with discrete numbers things might be a bit funny around the edges.
```{r, echo=FALSE,include=FALSE}
gamma<-0.06
beta<-0.00005
S<-997
I<-3
R=0
Sc<-S
Ic<-I
Rc<-R
for(i in 1:200)
{
Rc<-Rc+gamma*Ic
NI<-beta*Sc*Ic
Sc<-Sc-NI
Ic<-(1-gamma)*Ic+NI
R<-c(R,Rc)
S<-c(S,Sc)
I<-c(I,Ic)
}
```
## Simulation results: no epidemic
```{r,echo=FALSE}
longtitle<-paste(paste(paste("Epidemic dynamics with beta=",as.character(beta)),", gamma="),as.character(gamma))
plot(S,type="l",xlab="days",ylab="Counts",col="blue",main=longtitle,ylim=c(0,1000))
lines(1:length(I),I,col="red")
lines(1:length(R),R,col="green")
legend(x=150, y=500, c("Susceptible", "Infected", "Recovered"), col=c("blue","red","green"),lty=c(1,1,1),box.lty=0)
```
## The early phase of the epidemic
Now, let's look at the beginning of the epidemic, when $R_0>1$, and $N$ is large, so that for a while $S_t\approx N$.
Then
$$I_t=I_{t-1}+ \beta N I_{t-1}-\gamma I_{t-1}= (1+\beta N-\gamma)I_{t-1}$$
Which means that
$$I_{t}=(1+\beta N-\gamma)^tI_0,$$
that is, there is an **exponential** growth of the number of cases. Note also that
$$I_{t}-I_{t-1}=(\beta N-\gamma)I_{t-1},$$
that is the number of new cases in each day is **proportional** to the total number of cases.
## Early stage epidemic
Let $\rho=\beta N-\gamma$; note that if $R_0>1 \Longleftrightarrow\rho>0$: the role of $R_0$ in determining if there is an epidemic or not is reflected in $\rho$.
\begin{eqnarray*}
I_{t}& = &(1+\rho)^{t}I_0=1.06^{t}I_0\\
I_{t}-I_{t-1}& = &(\rho)I_{t-1}\\
log(I_{t}-I_{t-1})& = &\log(\rho)+ \log(I_{t-1})
\end{eqnarray*}
*Note*: we obviously cannot take the $\log()$ when $\rho<0$.
Let's choose some parameter values that give a "slow" growth, so that we can have a "long" early stage in our small population $N=1000$: $\beta=0.0001$, $\alpha=0.04$.
*Note*: In our case $\rho=0.0001 \times\displaystyle 1000- 0.04= 0.06$ and the number of cases here will be very small and it will look unrealistic, but this is what we need with a small population to have a longish first fase.
```{r,echo=FALSE,include=FALSE}
gamma<-0.04
beta<-0.0001
S<-999
I<-1
R=0
Sc<-S
Ic<-I
Rc<-R
for(i in 1:1000)
{
Rc<-Rc+gamma*Ic
NI<-beta*Sc*Ic
Sc<-Sc-NI
Ic<-(1-gamma)*Ic+NI
R<-c(R,Rc)
S<-c(S,Sc)
I<-c(I,Ic)
}
```
## Exponential growth of cases in early stages: simulation
$$I_{t}=(1+\rho)^{t}I_0=1.06^{t}I_0$$
```{r,echo=FALSE}
par(mfrow=c(1,2))
plot(I[1:100],xlab="Day",ylab="Counts",main="Total cases")
lines(1:100,(1+0.06)^(0:99),col="red")
plot(diff(I)[1:100],xlab="Day",ylab="Counts",main="New cases")
```
## New cases and existing cases in early stages
$$I_{t}-I_{t-1}=\rho I_{t-1}=0.06I_{t-1}$$
```{r,echo=FALSE}
plot((I[1:100]),(diff(I)[1:100]),xlab="Total cases",ylab="New Cases")
abline(a=0,b=0.06,col="red")
```
## New cases and existing cases in early stages
$$log(I_{t}-I_{t-1})=\log(\rho)+ \log(I_{t-1})= \log(0.06)+ \log(I_{t-1})$$
```{r,echo=FALSE}
plot(log(I[1:100]),log(diff(I)[1:100]),xlab="Total cases",ylab="New Cases")
abline(a=log(0.06),b=1,col="red")
```
## New cases and existing cases in early stages - $\log$ plot
- The role of $\beta$, $N$ and $\gamma$ here is simply in defining the intercept $\log(\rho)$ of the line with slope 1.
- It is easier to resolve on the same graph the very first stages when the number of cases is small and the later stages when the number of cases is high
- It is easier to detect when we depart from the slope of line 1
## Data
Kermack and McKendrick (1927) [A contribution to the mathematical theory of epidemics](https://royalsocietypublishing.org/doi/10.1098/rspa.1927.0118), *Proceedings of the Royal Society, A*
![](bombay.pdf)
## Data from: https://github.com/nytimes/covid-19-data
Each US state corresponds to a series of connected points
```{r,echo=FALSE,include=FALSE}
us<-read_csv("USAcovid/us-states.csv")
colorsState<-read_csv("states_info.csv")
StateColours<-(colorsState[,2][[1]])
```
```{r,echo=FALSE,message = FALSE,warning=FALSE,fig.height=4,fig.width=6,out.width='.89\\linewidth'}
p<-ggplot(us)+geom_line(aes(x=date,y=(cases),col=state),se=FALSE,stat = "smooth",alpha=0.7) +geom_point(aes(x=date,y=(cases),col=state),alpha=0.7)+ labs(x="Date",y="Cases", title="Total COVID-19 confirmed cases over the course of the epidemic") + theme_bw() +scale_color_manual(values=StateColours,name="State")+
theme(legend.position="none", legend.box = "horizontal")
p
```
Red is NY, black NJ, golden CA, green WA, blue MI
## USA data
```{r,echo=FALSE,message = FALSE,warning=FALSE,fig.height=4,fig.width=6,out.width='.89\\linewidth'}
p<-ggplot(us)+geom_line(aes(x=date,y=(deaths),col=state),se=FALSE,stat = "smooth",alpha=0.7) +geom_point(aes(x=date,y=(deaths),col=state),alpha=0.7)+ labs(x="Date",y="Cases", title="Total COVID-19 deaths over the course of the epidemic") + theme_bw() +scale_color_manual(values=StateColours,name="State")+
theme(legend.position="none", legend.box = "horizontal")
p
```
Red is NY, black NJ, golden CA, green WA, blue MI
## Few USA states
```{r,echo=FALSE,message = FALSE,fig.height=6,fig.width=6,out.width='.99\\linewidth'}
now<-"New York"
thisstate<-filter(us,state==now)
mydiff<-function(x){return(c(x[1],diff(x)))}
thisstate<-mutate(thisstate,new_cases=mydiff(cases))
plot(log(thisstate$cases),log(thisstate$new_cases),col=colorsState[colorsState$state==now,2][[1]],pch=19,xlab="Total cases (log scale)",ylab="New case (log scale)")
now<-"California"
thisstate<-filter(us,state==now)
thisstate<-mutate(thisstate,new_cases=mydiff(cases))
points(log(thisstate$cases),log(thisstate$new_cases),col=colorsState[colorsState$state==now,2][[1]],pch=19)
now<-"Washington"
thisstate<-filter(us,state==now)
thisstate<-mutate(thisstate,new_cases=mydiff(cases))
points(log(thisstate$cases),log(thisstate$new_cases),col=colorsState[colorsState$state==now,2][[1]],pch=19)
now<-"New Jersey"
thisstate<-filter(us,state==now)
thisstate<-mutate(thisstate,new_cases=mydiff(cases))
points(log(thisstate$cases),log(thisstate$new_cases),col=colorsState[colorsState$state==now,2][[1]],pch=19)
abline(a=0,b=1,col="gray70")
legend(x=0, y=8, c("New York","California","Washington","New Jersey"), fill=c("red3","goldenrod1","green2","black"),box.lty=0)
```
## Looking at data
- In the early stages, the role of $\gamma$ and $\beta$ is indistingushable: we only get to see the effect of $\rho=\beta N-\gamma$.
- Is COVID-19 spreading less in California than in New York ($\beta_{CA}<\beta_{NY}$?)
- It is difficul to interpret the data, as we do not have a complete count of the number of cases; this strongly depends on the number of individuals who have been tested.
## Few USA states --looking at deaths
```{r,echo=FALSE,message = FALSE,fig.height=6,fig.width=6,out.width='.99\\linewidth'}
now<-"New York"
thisstate<-filter(us,state==now)
mydiff<-function(x){return(c(x[1],diff(x)))}
thisstate<-mutate(thisstate,new_deaths=mydiff(deaths))
plot(log(thisstate$cases),log(thisstate$new_deaths),col=colorsState[colorsState$state==now,2][[1]],pch=19,xlab="Total deaths (log scale)",ylab="New deaths (log scale)",ylim=c(0,12),xlim=c(0,12))
now<-"California"
thisstate<-filter(us,state==now)
thisstate<-mutate(thisstate,new_deaths=mydiff(deaths))
points(log(thisstate$cases),log(thisstate$new_deaths),col=colorsState[colorsState$state==now,2][[1]],pch=19)
now<-"Washington"
thisstate<-filter(us,state==now)
mydiff<-function(x){return(c(x[1],diff(x)))}
thisstate<-mutate(thisstate,new_deaths=mydiff(deaths))
points(log(thisstate$cases),log(thisstate$new_deaths),col=colorsState[colorsState$state==now,2][[1]],pch=19)
now<-"New Jersey"
thisstate<-filter(us,state==now)
mydiff<-function(x){return(c(x[1],diff(x)))}
thisstate<-mutate(thisstate,new_deaths=mydiff(deaths))
points(log(thisstate$cases),log(thisstate$new_deaths),col=colorsState[colorsState$state==now,2][[1]],pch=19)
abline(a=0,b=1,col="gray70")
legend(x=0, y=8, c("New York","California","Washington","New Jersey"), fill=c("red3","goldenrod1","green2","black"),box.lty=0)
```