# Fitting the SIR model of disease to data in Python

## Introduction and the problem

The SIR model for spread of disease was first proposed in 1927 in a collection of three articles in the Proceedings of the Royal Society by Anderson Gray McKendrick and William Ogilvy Kermack; the resulting theory is known as KermackâMcKendrick theory; now considered a subclass of a more general theory known as compartmental models in epidemiology. The three original articles were republished in 1991, in a special issue of the Bulletin of Mathematical Biology.

The SIR model is so named because it assumes a static population, so no births or deaths, divided into three mutually exclusive classes: those susceptible to the disease; those infected with the disease, and those recovered with immunity from the disease. This model is clearly not applicable to all possible epidemics: there may be births and deaths, people may be re-infected, and so on. More complex models take these and other factors into account.

The SIR model consists of three non-linear ordinary differential equations, parameterized by two growth factors $$\beta$$ and $$\gamma$$:

\begin{eqnarray*} \frac{dS}{dt}&=&-\frac{\beta IS}{N}\\
\frac{dI}{dt}&=&\frac{\beta IS}{N}-\gamma I\\
\frac{dR}{dt}&=&\gamma I \end{eqnarray*}

Here $$N$$ is the population, and since each of $$S$$, $$I$$ and $$R$$ represent the number of people in mutually exclusive sets, we should have $$S+I+R=N$$. Note that the right hand sides of the equations sum to zero, hence $\frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}=\frac{dN}{dt}=0$ which indicates that the population is constant.

The problem here is to see if values of $$\beta$$ and $$\gamma$$ can be found which will provide a close fit to a well-known epidemiological case study: that of influenza in a British boarding school. This was described in a 1978 issue of the British Medical Journal.

This should provide a good test of the SIR model, as it satisfies all of the criteria. In this case there was a total population of 763, and an outbreak of 14 days, with infected numbers as:

Day: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Infections: 1 3 6 25 73 222 294 258 237 191 125 69 27 11 4

Some years ago, on my old (and now taken off line) blog, I explored using Julia for this task, which was managed easily (once I got the hang of Julia syntax). However, that was five years ago, and when I tried to recreate it I found that Julia has changed over the last five years, and my original comments and code no longer worked. So I decided to experiment with Python instead, in which I have more expertise, or at least, experience.

## Using Python

### Setup and some initial computation

We start by importing some of the modules and functions we need, and define the data from the table above:

import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import solve_ivp

data = [1, 3, 6, 25, 73, 222, 294, 258, 237, 191, 125, 69, 27, 11, 4]

Now we enter the SIR system with some (randomly chosen) values of $$\beta$$ and $$\gamma$$, using syntax conformable with the solver solve_ivp:

beta,gamma = [0.01,0.1]

def SIR(t,y):
S = y[0]
I = y[1]
R = y[2]
return([-beta*S*I, beta*S*I-gamma*I, gamma*I])

We can now solve this system using solve\_ivp and plot the results:

sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))

fig = plt.figure(figsize=(12,4))
plt.plot(sol.t,sol.y[0])
plt.plot(sol.t,sol.y[1])
plt.plot(sol.t,sol.y[2])
plt.plot(np.arange(0,15),data,"k*:")
plt.grid("True")
plt.legend(["Susceptible","Infected","Removed","Original Data"])

Note that we have used the t_eval argument in our call to solve_ivp which allows us to exactly specify the points at which the solution will be given. This will allow us to align points in the computed values of $$I$$ with the original data.

The output is this plot:

There is a nicer plot, with more attention paid to setup and colours, here. This use of scipy to solve the SIR equations uses the odeint tool. This works perfectly well, but I believe it is being deprecated in favour of solve_ivp.

### Fitting the data

To find values $$\beta$$ and $$\gamma$$ with a better fit to the data, we start by defining a function which gives the sum of squared differences between the data points, and the corresponding values of $$I$$. The problem will then be minimize that function.

def sumsq(p):
beta, gamma = p
def SIR(t,y):
S = y[0]
I = y[1]
R = y[2]
return([-beta*S*I, beta*S*I-gamma*I, gamma*I])
sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))
return(sum((sol.y[1][::5]-data)**2))

As you see, we have just wrapped the SIR definition and its solution inside a calling function whose variables are the parameters of the SIR equations.

To minimize this function we need to import another python function first:

from scipy.optimize import minimize

msol.x

with output:

array([0.00218035, 0.44553886])

To see if this does provide a better fit, we can simply run the solver with these values, and plot them, as we did at the beginning:

beta,gamma = msol.x
def SIR(t,y):
S = y[0]
I = y[1]
R = y[2]
return([-beta*S*I, beta*S*I-gamma*I, gamma*I])
sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))
fig = plt.figure(figsize=(10,4))
plt.plot(sol.t,sol.y[0],"b-")
plt.plot(sol.t,sol.y[1],"r-")
plt.plot(sol.t,sol.y[2],"g-")
plt.plot(np.arange(0,15),data,"k*:")
plt.legend(["Susceptible","Infected","Removed","Original Data"])

with output:

and as you see, a remarkably close fit!