# Fitting the SIR model of disease to data in Python

Share on:

## Introduction and the problem

The SIR model for spread of disease was first proposed in 1927 in a collection of three articles in the Proceedings of the Royal Society by Anderson Gray McKendrick and William Ogilvy Kermack; the resulting theory is known as Kermack–McKendrick theory; now considered a subclass of a more general theory known as compartmental models in epidemiology. The three original articles were republished in 1991, in a special issue of the Bulletin of Mathematical Biology.

The SIR model is so named because it assumes a static population, so no births or deaths, divided into three mutually exclusive classes: those susceptible to the disease; those infected with the disease, and those recovered with immunity from the disease. This model is clearly not applicable to all possible epidemics: there may be births and deaths, people may be re-infected, and so on. More complex models take these and other factors into account.

The SIR model consists of three non-linear ordinary differential equations, parameterized by two growth factors $$\beta$$ and $$\gamma$$:

\begin{eqnarray*} \frac{dS}{dt}&=&-\frac{\beta IS}{N}\\ \frac{dI}{dt}&=&\frac{\beta IS}{N}-\gamma I\\ \frac{dR}{dt}&=&\gamma I \end{eqnarray*}

Here $$N$$ is the population, and since each of $$S$$, $$I$$ and $$R$$ represent the number of people in mutually exclusive sets, we should have $$S+I+R=N$$. Note that the right hand sides of the equations sum to zero, hence $\frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}=\frac{dN}{dt}=0$ which indicates that the population is constant.

The problem here is to see if values of $$\beta$$ and $$\gamma$$ can be found which will provide a close fit to a well-known epidemiological case study: that of influenza in a British boarding school. This was described in a 1978 issue of the British Medical Journal.

This should provide a good test of the SIR model, as it satisfies all of the criteria. In this case there was a total population of 763, and an outbreak of 14 days, with infected numbers as:

Day: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Infections: 1 3 6 25 73 222 294 258 237 191 125 69 27 11 4

Some years ago, on my old (and now taken off line) blog, I explored using Julia for this task, which was managed easily (once I got the hang of Julia syntax). However, that was five years ago, and when I tried to recreate it I found that Julia has changed over the last five years, and my original comments and code no longer worked. So I decided to experiment with Python instead, in which I have more expertise, or at least, experience.

## Using Python

### Setup and some initial computation

We start by importing some of the modules and functions we need, and define the data from the table above:

1import matplotlib.pyplot as plt
2import numpy as np
3from scipy.integrate import solve_ivp
4
5data = [1, 3, 6, 25, 73, 222, 294, 258, 237, 191, 125, 69, 27, 11, 4]


Now we enter the SIR system with some (randomly chosen) values of $$\beta$$ and $$\gamma$$, using syntax conformable with the solver solve_ivp:

1beta,gamma = [0.01,0.1]
2
3def SIR(t,y):
4    S = y
5    I = y
6    R = y
7    return([-beta*S*I, beta*S*I-gamma*I, gamma*I])


We can now solve this system using solve\_ivp and plot the results:

1sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))
2
3fig = plt.figure(figsize=(12,4))
4plt.plot(sol.t,sol.y)
5plt.plot(sol.t,sol.y)
6plt.plot(sol.t,sol.y)
7plt.plot(np.arange(0,15),data,"k*:")
8plt.grid("True")
9plt.legend(["Susceptible","Infected","Removed","Original Data"])


Note that we have used the t_eval argument in our call to solve_ivp which allows us to exactly specify the points at which the solution will be given. This will allow us to align points in the computed values of $$I$$ with the original data.

The output is this plot:

There is a nicer plot, with more attention paid to setup and colours, here. This use of scipy to solve the SIR equations uses the odeint tool. This works perfectly well, but I believe it is being deprecated in favour of solve_ivp.

### Fitting the data

To find values $$\beta$$ and $$\gamma$$ with a better fit to the data, we start by defining a function which gives the sum of squared differences between the data points, and the corresponding values of $$I$$. The problem will then be minimize that function.

1def sumsq(p):
2    beta, gamma = p
3    def SIR(t,y):
4	S = y
5	I = y
6	R = y
7	return([-beta*S*I, beta*S*I-gamma*I, gamma*I])
8    sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))
9    return(sum((sol.y[::5]-data)**2))


As you see, we have just wrapped the SIR definition and its solution inside a calling function whose variables are the parameters of the SIR equations.

To minimize this function we need to import another python function first:

1from scipy.optimize import minimize
2
4msol.x


with output:

1array([0.00218035, 0.44553886])


To see if this does provide a better fit, we can simply run the solver with these values, and plot them, as we did at the beginning:

1beta,gamma = msol.x
2def SIR(t,y):
3    S = y
4    I = y
5    R = y
6    return([-beta*S*I, beta*S*I-gamma*I, gamma*I])

1sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))

1fig = plt.figure(figsize=(10,4))
2plt.plot(sol.t,sol.y,"b-")
3plt.plot(sol.t,sol.y,"r-")
4plt.plot(sol.t,sol.y,"g-")
5plt.plot(np.arange(0,15),data,"k*:")
6plt.legend(["Susceptible","Infected","Removed","Original Data"])


with output:

and as you see, a remarkably close fit!