Share on:

The circumference of an ellipse

Note: This blog post is mainly computational, with a hint of proof-oriented mathematics here and there. For a more in-depth analysis, read the excellent article "Gauss, Landen, Ramanujan, the Arithmetic-Geometric Mean, Ellipses, pi, and the Ladies Diary" by Gert Akmkvist and Bruce Berndt, in The American Mathematical Monthly, vol 95 no. 7 (August-September 1988), pages 585-608, and happily made available online for free at

This article was the 1989 winner of the MAA Halmos-Ford award "For outstanding expository papers in The American Mathematical Monthly" and you should certainly read it.


It's one of the delights or annoyances of mathematics, which ever way you look at it, that there's no simple formula for the cicrcumference of an ellipse comparable to \(2\pi r\) for a circle.

Indeed, for an ellipse with semi axes \(a\) and \(b\), the circumference can be expressed as the integral

\[ 4\int_0^{\pi/2}\sqrt{a^2\cos^2\theta + b^2\sin^2\theta},d\theta \]

which is one of a class of integrals called elliptic integrals, and which cannot be expressed using algebraic or standard transcendental functions.

However, it turns out that there are ways of very quickly computing the circumference on an ellipse to any desired accuracy, using methods which originate with Carl Friedrich Gauss.

The arithmetic-geometric mean

Given \(a>b>0\), we can define two sequences by:

\[ a_0 = a,\qquad a_{k+1}=(a_k+b_k)/2 \]


\[ b_0=b,\qquad b_{k+1}=\sqrt{a_kb_k}. \]

Thus the \(a\) values are the arithmetic means of the previous pair; the \(b\) values the geometric mean of the pair.


\[ b<\sqrt{ab}<\frac{a+b}{2}<a \]

and since \(a_{k+1}<a_k\) and \(b_{k+1}>b_k\), it follows that the \(a\) values are decreasing and bounded below, and the \(b\) values are increasing and bounded above, so they both converge. Also, if \(c_k=\sqrt{a_k^2-b_k^2}\) then (see Almqvist \& Berndt pp 587-588):

\begin{align*} c_{k+1}&=\sqrt{a_{k+1}^2-b_{k+1}^2}\\\
&<\frac{c_k^2}{4M(a,b)}. \end{align*}

This shows that not only do the sequences converge to the same limit, but that the sequences converge quadratically; each iteration being double the precision of the previous.

The common limit is called the arithmetic-geometric mean of \(a\) and \(b\) and will be denoted here as \(M(a,b)\).

To give an indication of this speed, in Python:

  from mpmath import mp
  mp.dps = 50
  a,b = mp.mpf('3'), mp.mpf('2')
  for i in range(10):
      a,b = (a+b)/2, mp.sqrt(a*b)
      print('{0:52s} {1:52s}'.format(str(a),str(b)))

2.5                                                  2.4494897427831780981972840747058913919659474806567
2.4747448713915890490986420373529456959829737403283  2.4746160019198827700554874647235766528956885806854
2.4746804366557359095770647510382611744393311605069  2.474680435816873015671798992747485357556612143505
2.4746804362363044626244318718928732659979716520059  2.4746804362363044625888873352899297125054403531594
2.4746804362363044626066596035914014892517060025827  2.4746804362363044626066596035914014892516421855508
2.4746804362363044626066596035914014892516740940667  2.4746804362363044626066596035914014892516740940667
2.4746804362363044626066596035914014892516740940667  2.4746804362363044626066596035914014892516740940667
2.4746804362363044626066596035914014892516740940667  2.4746804362363044626066596035914014892516740940667
2.4746804362363044626066596035914014892516740940667  2.4746804362363044626066596035914014892516740940667
2.4746804362363044626066596035914014892516740940667  2.4746804362363044626066596035914014892516740940667

You see that we have reached the limit of precision in six steps. To better demonstrate the speed of convergence, use greater precision and display the difference \(a_k-b_k\):

  mp.dps = 2000
  a,b = mp.mpf('3'), mp.mpf('2')
  for i in range(10):
      a,b = (a+b)/2, mp.sqrt(a*b)


You can see that the precision does indeed roughly double at each step.

Elliptic integrals and the AGM

This integral:

\[ K(k) = \int_0^{\pi/2}\frac{1}{\sqrt{1-k^2\sin^2\theta}}d\theta \]

is called a complete elliptic integral of the first kind. If

\[ k^2=1-\frac{b^2}{a^2} \]


\[ \int_0^{\pi/2}\frac{1}{\sqrt{1-k^2\sin^2\theta}} d\theta = a\int_0^{\pi/2} \frac{1}{\sqrt{a^2\cos^2\theta +b^2\sin^2\theta}} d\theta \]

and this value is denoted \(I(a,b)\). Gauss observed (and proved) that

\[ I(a,b)=\frac{\pi}{2M(a,b)}. \]

This is in fact equivalent to the assertion that

\[ K(x) = \frac{\pi}{2M(1-x,1+x)}. \]

Gauss started with assuming that \(M(1-x,1+x)\) was an even function, so could be expressed as a power series in even powers of \(x\). He then compared the power series representing \(K(x)\) with the integral computed by expanding the integrand as a power series using the binomial theorem and integrating term by term, to obtain

\[ \frac{1}{M(1-x,1+x)}=\frac{2}{\pi}K(x)=1+\left(\frac{1}{2}\right)^{!2}x^2+ \frac{1\cdot 3}{2\cdot 4}^{!2}x^4+ \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}^{!2}x^6+\cdots \]

and this power series can be written as

\[ \sum_{n=0}^{\infty}\frac{(2n)!^2}{2^{4n}(n!)^4}x^{2n}. \]

Another proof, apparently originally due to Newman, uses first the substition \(t=a\tan\theta\) to rewrite the integral:

\[ I(a,b)=\int_0^{\pi/2}\frac{1}{\sqrt{a^2\cos^2\theta+b^2\sin^2\theta}} d\theta = \int^{\infty}_{-\infty}\frac{1}{\sqrt{(a^2+t^2)(b^2+t^2)}} dt \]

and make the substitution

\[ u = \frac{1}{2}\left(t-\frac{ab}{t}\right) \]

which produces (after "some care", according to the Borwein brothers in their book Pi and the AGM):

\[ I(a,b) = \int^{\infty}_{-\infty}\frac{1}{\sqrt{(\left(\frac{a+b}{2}\right)^2+t^2)(ab+t^2)}} dt =I(\frac{a+b}{2},\sqrt{ab}) = I(a_1,b_1). \]

Continuing this process we see that \(I(a,b) = I(a_k,b_k)\) for any \(k\), and taking the limit, that \(I(a,b)=I(M,M)\). Finally

\[ I(M,M) = \int_0^{\pi/2}\frac{1}{\sqrt{M^2\cos^2\theta+M^2\sin^2\theta}} d\theta = \int_0^{\pi/2}\frac{1}{M},d\theta = \frac{\pi}{2M}. \]

So we now know that the complete elliptic integral of the first kind is computable by means of the AGM. But what about the complete elliptic integral of the second kind?

Complete elliptic integrals of the second kind

These are defined as

\begin{align*} E(k)&=\int_0^{\pi/2}\sqrt{1-k^2\sin^2\theta} d\theta\\\
&=\int_0^1\sqrt{\frac{1-k^2t^2}{1-t^2}},dt. \end{align*}

Note: An alternative convention is to write:

\begin{align*} E(m)&=\int_0^{\pi/2}\sqrt{1-m\sin^2\theta} d\theta\\\
&=\int_0^1\sqrt{\frac{1-mt^2}{1-t^2}},dt. \end{align*}

This elliptic integral is the one we want, since the perimeter of an ellipse given in cartesion form as

\[ \frac{x^2}{a^2}+\frac{y^2}{b^2}=1 \]

is equal to

\[ 4aE\left(\sqrt{1-\frac{b^2}{a^2}}\right) \]

(using the first definition). This can be written as

\[ 4aE(e) \]

where \(e\) is the eccentricity of the ellipse.

The computation of elliptic integrals goes back as far as Euler, taking in Gauss and then Legendre along the way. Adrien-Marie Legendre (1752-1833) is better known now than he was in his lifetime; his contributions to elliptic integrals are now seen as fundamental, and in paving the way for the greater work of Abel and Jacobi.

In particular, Legendre showed that

\[ \frac{K(k)-E(k)}{K(k)}=\frac{1}{2}(c_0^2+2c_1^2+4c_2^2+\cdots + 2^nc_n^2+\cdots ) \]


\[ c_n^2=a_n^2-b_n^2. \]

The above equation can be rewritten to provide a fast-converging series for \(E(k)\):

\begin{align*} E(k)&=K(k)(1-\frac{1}{2}(c_0^2+2c_1^2+4c_2^2+\cdots + 2^nc_n^2+\cdots))\\\
&=\frac{\pi}{2M(a,b)}(1-\frac{1}{2}(c_0^2+2c_1^2+4c_2^2+\cdots + 2^nc_n^2+\cdots)). \end{align*}

In the sequences for the AGM, let \(M(a,b)\) be approximated by \(a_n\). This produces a sequence that convereges very quickly to \(E(k)\):

\begin{align*} s_0&=\frac{\pi}{2a}(1-\frac{1}{2}c_0^2)\\\
s_2&=\frac{\pi}{2a_2}(1-\frac{1}{2}(c_0^2+2c_1^2+4c_2^2)) \end{align*}

This can easily be managed recursively, given \(a\) and \(b\), by setting

\[ a_0=a,\qquad,b_0=b,\qquad p_0=1,\qquad s_0=a^2-b^2 \]

and then iterating by

\begin{align*} a_{k+1}&=(a_k+b_k)/2\\\
s_{k+1}&=s_k+p_k(a_{k+1}^2-b_{k+1}^2) \end{align*}

Then the values

\[ e_k = \frac{2\pi}{a_k}(a_0^2-\frac{1}{2}s_k) \]

approach the perimeter of the ellipse.

We can demonstrate this in Python for \(a,b=3,2\), again using the multiprecision library mpmath:

  mp.dps = 100
  a = mp.mpf(3)
  b = mp.mpf(2)
  p = 1
  s = a**2-b**2
  a0 = a
  e = 2*mp.pi/a*(a0*a0-1/2*s)
  for i in range(11):
      a, b, p = (a+b)/2, np.sqrt(a*b), p*2
      s += p*(a**2-b**2)


We see that we have reached the limits of this precision very quickly. And as above, we can look instead at the differences between succesive approximations:

a = mp.mpf(3)
b = mp.mpf(2)
p = 1
s = a**2-b**2
a0 = a
e = 2*mp.pi/a*(a0*a0-1/2*s)
for i in range(11):
    a, b, p = (a+b)/2, np.sqrt(a*b), p*2
    s += p*(a**2-b**2)
    e1 = 2*mp.pi/a*(a0*a0-1/2*s)
    e = e1


The convergence is seen to be quadratic.

Using Excel

It might seem utterly regressive to use Excel - or indeed, any spreadsheet - for computations such as this, but in fact Excel can be used very easily for recursive computations, as long as you're happy to use only IEEE double precision.

In Cells A1-E2, enter:

1 3 2 1 =A1^2-B1^2
2 =(A1+B1)/2 =SQRT(A1*B1) =2*C1 =D1+C2*(A2^2-B2^2) =2*PI()/A2*($A$1^2-1/2*D2)

and then copy A2-E2 down a few rows. Because of overflow errors, if you show 14 decimal places in column E, they'll never quite settle down. But in fact you reach the limits of double precision by row 5.

Voting power (7): Quarreling voters

In all the previous discussions of voting power, we have assumed that all winning coalitions are equally likely. But in practice that is not necessarily the case. Two or more voters may be opposed on so many issues that they would never vote the same way on any issues: such a pair of voters may be said to be quarrelling.

To see how this may make a difference, consider the voting game \[ [51; 49,48,3] \] The winning coalitions for which a party is critical are \[ [49,48],\; [49,3],\; [48,3] \] and since each party is critical to the same number of winning coalitions, the Banzhaf indices are equal.

But suppose that the two largest parties are quarrelling, and so the only winning coalitions are then \[ [49,3],\; [48,3]. \] This means that the smaller party has twice the power of each of the larger ones!

We can set this up in Sage, using polynomial rings, with an ideal that represents the quarrel. In this case:

sage: q = 51
sage: w = [49,48,3]
sage: R.<x,y,z> = PolynomialRing(QQ)
sage: qu = R.ideal(x*y)
sage: p = ((1+x^49)*(1+y^48)*(1+z^3)).reduce(qu)
sage: p

\[ x^{49}z^3+y^{48}z^3+x^{49}+y^{48}+z^3+1 \] From this polynomial, we choose the monomials whose degree sums are not less than the quota:

sage: wcs = [m.degrees() for m in p.monomials() if sum(m.degrees()) >= q]
sage: wcs

\[ [(49,0,3),(0,48,3)] \] To determine the initial (not normalized) Banzhaf values, we traverse this list of tuples, checking in each tuple if the value is critical to its coalition:

sage: beta = [0,0,0]
sage: for c in wcs:
	 sc = sum(c)
	 for i in range(3):
	     if sc-c[i] < q:
		beta[i] += 1

\[ [1,1,2] \] or \[ [0.24, 0.25, 0.5] \] for normalized values.


Of course the above can be generalized to any number of voters, and any number of quarrelling pairs. For example, consider the Australian Senate, of which the current party representation is:

Alignment Party Seats
Government Liberal 31
National 5
Opposition Labor 26
Crossbench Greens 9
One Nation 2
Centre Alliance 1
Lambie Alliance 1
Patrick Team 1

Although the current government consists of two separate parties, they act in effect as one party with a weight of 36. (This is known formally as the "Liberal-National Coalition".) With no quarrels then, we have \[ [39;36,26,9,2,1,1,1] \] of which the Banzhaf values have been found to be 52, 12, 12, 10, 4, 4, 4 and the Banzhaf power indices \[ 0.5306, 0.1224, 0.1224, 0.102, 0.0408, 0.0408, 0.0408. \] Suppose that the Government and Opposition are quarrelling, as are also the Greens and One Nation. (This is a reasonable assumption, given current politics and the platforms of the respective parties.)

sage: q = 39
sage: w = [36,26,9,2,1,1,1]
sage: n = len(w)
sage: R = PolynomialRing(QQ,'x',n)
sage: xs = R.gens()
sage: qu = R.ideal([xs[0]*xs[1],xs[2]*xs[3]])
sage: pr = prod(1+xs[j]^w[j] for j in range(n)).reduce(qu)

As before, we extract the degrees of those monomials whose sum is at least \(q\), and determine which party in each is critical:

sage: wcs = [m.degrees() for m in pr.monomials() if sum(m.degrees()) >= q]
sage: beta = [0]*n
sage: for c in wcs:
	  sc = sum(c)
	  for i in range(n):
	      if sc-c[i] < q:
		  beta[i] += 1
sage: beta


which can be normalized to

\([0.4571, 0.0, 0.2, 0.1714, 0.0571, 0.0571, 0.0571]\).

The remarkable result is that Labor - the Opposition party, with the second largest number of seats in the Senate - loses all its power! This means that Labor cannot afford to be in perpetual quarrel with the government parties.

A simple program

And of course, all of the above can be written into a simple Sage program:

def qbanzhaf(q,w,r):
    n = len(w)
    R = PolynomialRing(QQ,'x',n)
    xs = R.gens()
    qu = R.ideal([xs[y[0]]*xs[y[1]] for y in r])
    pr = prod(1+xs[j]^w[j] for j in range(n)).reduce(qu)
    wcs = [m.degrees() for m in pr.monomials() if sum(m.degrees()) >= q]
    beta = [0]*n
    for c in wcs:
	sc = sum(c)
	for i in range(n):
	    if sc-c[i] < q:
		beta[i] += 1

All the quarrelling pairs are given as a list, so that the Australian Senate computation could be entered as

sage: qbanzahf(q,w,[[0,1],[2,3]])

Voting power (6): Polynomial rings

As we have seen previously, it's possible to compute power indices by means of polynomial generating functions. We shall extend previous examples to include the Deegan-Packel index, in a way somewhat different to that of Alonso-Meijide et al (see previous post for reference).

Again, suppose we consider the voting game

\[ [30;28,16,5,4,3,3] \]

What we'll do here though, rather than using just one variable, we'll have a variable for each voter. We'll use Sage for this, as it's open source, and provides a very rich environment for computing with polynomial rings.

We first create the polynomial

\[ p = \prod_{k=0}^5(1+x_k^{w_k}) \]

where \(w_i\) are the weights given above. We are using the Python (and hence Sage) convention that indices are numbered startng at zero.

sage: q = 30
sage: w = [28,16,5,4,3,3]
sage: n = len(w)
sage: R = PolynomialRing(QQ,'x',n)
sage: xs = R.gens()
sage: pr = prod(1+xs[i]^w[1] for i in range(n)

Now we can extract all the monomials, and consider only those for which the degree sum is not less than the quota \(q\):

sage: pm = pr.monomials()
sage: pw = [x for x in pr if sum(x.degrees()) >= q]
sage: pw = pw[::-1]; pw

\begin{aligned} &\left[x_{1}^{16} x_{2}^{5} x_{3}^{4} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{5}^{3},\; x_{0}^{28} x_{4}^{3},\; x_{0}^{28} x_{3}^{4},\; x_{0}^{28} x_{2}^{5},\; x_{0}^{28} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{3}^{4} x_{5}^{3},\; x_{0}^{28} x_{3}^{4} x_{4}^{3},\right.\cr &\qquad x_{0}^{28} x_{2}^{5} x_{5}^{3},\; x_{0}^{28} x_{2}^{5} x_{4}^{3},\; x_{0}^{28} x_{2}^{5} x_{3}^{4},\; x_{0}^{28} x_{3}^{4} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{2}^{5} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{2}^{5} x_{3}^{4} x_{5}^{3},\cr &\qquad x_{0}^{28} x_{2}^{5} x_{3}^{4} x_{4}^{3},\; x_{0}^{28} x_{2}^{5} x_{3}^{4} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{1}^{16},\; x_{0}^{28} x_{1}^{16} x_{5}^{3},\; x_{0}^{28} x_{1}^{16} x_{4}^{3},\; x_{0}^{28} x_{1}^{16} x_{3}^{4},\cr &\qquad x_{0}^{28} x_{1}^{16} x_{2}^{5},\; x_{0}^{28} x_{1}^{16} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{1}^{16} x_{3}^{4} x_{5}^{3},\; x_{0}^{28} x_{1}^{16} x_{3}^{4} x_{4}^{3},\; x_{0}^{28} x_{1}^{16} x_{2}^{5} x_{5}^{3},\cr &\qquad x_{0}^{28} x_{1}^{16} x_{2}^{5} x_{4}^{3},\; x_{0}^{28} x_{1}^{16} x_{2}^{5} x_{3}^{4},\; x_{0}^{28} x_{1}^{16} x_{3}^{4} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{1}^{16} x_{2}^{5} x_{4}^{3} x_{5}^{3},\cr &\qquad \left. x_{0}^{28} x_{1}^{16} x_{2}^{5} x_{3}^{4} x_{5}^{3},\; x_{0}^{28} x_{1}^{16} x_{2}^{5} x_{3}^{4} x_{4}^{3},\; x_{0}^{28} x_{1}^{16} x_{2}^{5} x_{3}^{4} x_{4}^{3} x_{5}^{3}\right] \end{aligned}

As we did with subsets, we can winnow out the monomials which are multiples of others, by writing a recursive function:

def mwc(q,t,p):
    if len(p)==0:
	for x in p[1:]:
	    if p[0].divides(x):

We can apply it:

sage: pw2 = pw.copy()
sage: mwc1 = mwc(q,[],pw2)
sage: mwc1

\[ \left[x_{1}^{16} x_{2}^{5} x_{3}^{4} x_{4}^{3} x_{5}^{3},\; x_{0}^{28} x_{5}^{3},\; x_{0}^{28} x_{4}^{3},\; x_{0}^{28} x_{3}^{4},\; x_{0}^{28} x_{2}^{5},\; x_{0}^{28} x_{1}^{16}\right] \]

Now it's just a matter of working out the indices from the variables, and this is most easily done in Python with a dictionary, with the variables as keys and their indices as values.

sage: dp = {xs[i]:0 for i in range(n)}
sage: for m in mcw1:
	  mv = m.variables()
	  nm = len(mv)
	  for x in mv:
	      dps[x] += 1/nm
sage: dp

\[ \left\{x_{0} : \frac{5}{2}, x_{1} : \frac{7}{10}, x_{2} : \frac{7}{10}, x_{3} : \frac{7}{10}, x_{4} : \frac{7}{10}, x_{5} : \frac{7}{10}\right\} \]

And this of course can be normalized:

sage: s = sum(dps.values())
      for x in dps.keys():
	  dps[x] = dps[x]/s

\[ \left\{x_{0} : \frac{5}{12}, x_{1} : \frac{7}{60}, x_{2} : \frac{7}{60}, x_{3} : \frac{7}{60}, x_{4} : \frac{7}{60}, x_{5} : \frac{7}{60}\right\} \]

And these are the values we found in the previous post, using Julia and working with subsets.

Back to Banzhaf

Recall that the Banzhaf power indices could be computed from polynomials in two steps. For voter \(i\), define

\[ f_i(x) = \prod_{j\ne i}(1+x^{w_j}) \]

and suppose that the coefficient of \(x^k\) is \(a_k\). Then define

\[ \beta_i=\sum_{j=q-w_i}^{q-1}a_j \]

and these values are normalized for the Banzhaf power indices.

Suppose we define, as for the Deegan-Packel indices,

\[ p = \prod_{k=1}^n(1+x_i^{w_i}). \]

Then reduce \(p\) modulo \(x_i^{w_i}\). This has the effect of setting \(x_i^{w_i}=0\) in \(p\) so simply removes all monomials containing \(x_i^{w_i}\). Then \(\beta_i\) is computed by adding the coefficients of all monomials whose degree sum lies between \(q-w_i\) and \(q-1\).

First define the polynomial ring and the polynomial \(p\):

sage: w = [28,16,5,4,3,3]
sage: q = 30
sage: n = len(w)
sage: R = PolynomialRing(QQbar,'x',n)
sage: xs = R.gens()
sage: p = prod(1+xs[i]^w[i] for i in range(n))

Now for the reeducation and computation:

sage: ids = [R.ideal(xs[i]^w[i]) for i in range(n)]
sage: for i in range(n):
	  pri = pr.reduce(ids[i])
	  ms = pri.monomials()
	  cs = pri.coefficients()
	  cds = [(x,sum(y.degrees())) for x,y in zip(cs,ms)]
	  beta += [sum(x for (x,y) in cds if y>=(q-w[i]) and (y<q))]
sage: print(beta)

\[ [30,2,2,2,2,2] \]

from which the Banzhaf power indices can be computed as

\[ \left[\frac{3}{4},\;\frac{1}{20},\;\frac{1}{20},\;\frac{1}{20},\;\frac{1}{20},\;\frac{1}{20}\right]. \]

Note that this is of course unnecessarily clumsy, as the Banzhaf indices can be easily and readily computed using univariate polynomials. We may thus consider this approach as a proof of concept, rather than a genuine alternative.

In Sage, Banzhaf indices can be computed with polynomials very neatly, given weights and a quota:

sage: def banzhaf(q,w):
	  n = len(w)
	  beta = []
	  for i in range(n):
	      pp = prod(1+x^w[j] for j in range(n) if j != i)
	      cc = pp.list()
	      beta += [sum(cc[j] for j in range(len(cc)) if j>=(q-w[i]) and (j<q))]

These values can then be easily normalized to produce the Banzhaf power indices.

Voting power (5): The Deegan-Packel and Holler power indices

We have explored the Banzhaf and Shapley-Shubik power indices, which both consider the ways in which any voter can be pivotal, or critical, or necessary, to a winning coalition.

A more recent power index, which takes a different approach, was defined by Deegan and Packel in 1976, and considers only minimal winning coalitions. A winning coalition \(S\) is minimal if every member of \(S\) is critical to it, or alternatively, that \(S\) does not contain any winning coalition as a proper subset. It is easy to see that these are equivalent, for if \(i\in S\) was not critical, then \(S-\{i\}\) would be a winning coalition which is a proper subset.

Given \(N=\{1,2,\ldots,n\}\), let \(W\subset 2^N\) be the set of all minimal winning coalitions, and let \(W_i\subset W\) be those that contain the voter \(i\). Then we define \[ DP_i=\sum_{S\in W_i}\frac{1}{|S|} \] where \(|S|\) is the cardinality of \(S\).

For example, consider the voting game \[ [16;10,9,6,5] \] Using the indicies 1 to 4 for the voters, the minimal winning coalitions are \[ \{1,2\},\{1,3\},\{2,3,4\}. \] and hence

\begin{align*} DP_1 &= \frac{1}{2}+\frac{1}{2} = 1 \cr DP_2 &= \frac{1}{2}+\frac{1}{3} = \frac{5}{6} \cr DP_3 &= \frac{1}{2}+\frac{1}{3} = \frac{5}{6} \cr DP_4 &= \frac{1}{3} \end{align*}

and these values can be normalized so that their sum is unity: \[ [1/3,5/18,5/18,1/9]\approx [0.3333, 0.2778,0.2778, 0.1111]. \] In comparison, both the Banzhaf and Shapley-Shubik indices return \[ [0.4167, 0.25, 0.25, 0.0833]. \]

Allied to the Deegan-Packel index is Holler's public-good index, also called the Holler-Packel index, defined as \[ H_i=\frac{|W_i|}{\sum_{j\in N}|W_j|}. \] In other words, this index first counts the number of minimal wining coalitions that contain \(i\), and then normalises those values for the sum to be unity.

In the example above, we have voters 1, 2, 3 and 4 being members of 2, 2, 2, 1 minimal winning coalitions respectively, and so the power indices are

\[ [2/7, 2/7, 2/7, 1/7] \approx [0.2857,0.2857,0.2857,0.1429]. \]

Implementation (1): Python

We can implement the Deegan-Packel in Python, either by using itertools, or simply rolling our own little functions:

def all_subsets(X):
    T = [[]]
    for x in X:
	T += [t+[x] for t in T]

def is_subset(A,B):
    out = True
    for a in A:
	if B.count(a) == 0:
	    out = False

def mwc(q,S,T):
    if len(T) == 0:
	if sum(T[0]) >= q:
	    S += [T[0]]
	    temp = T.copy()
	    for t in temp:
		if is_subset(T[0],t):
def prod(A):
    m = len(A)
    n = len(A[0])
    p = 1
    for i in range(m):
	for j in range(n):
	    p *= A[i][j]

Of the three functions above, the first simply returns all subsets (as lists); the second tests whether one list is a subset of another (treating both as sets), and the final routine returns all minimal winning coalitions using an elementary recursion. The function starts off considering all subsets of the set of weights, and goes through the list until it finds one whose sum is at least equal to the quota. Then it removes all other subsets which are supersets of the found one. The calls the routine on this smaller list.

For example:

>>> wts = [10,9,6,5]
>>> T = all_subsets(wts)[1:]
>>> q = 16
>>> mwc(q,[],T)

[[10, 9], [10, 6], [9, 6, 5]]

It is an easy matter now to obtain the Deegan-Packel indices:

def dpi(q,wts):
    m = mwc(q,[],all_subsets(wts)[1:])
    dp = []
    for w in wts:
	d = 0
	for x in m:
	    d += x.count(w)/len(x)
	dp += [d]

And as an example:

>>> wts = [10,9,6,5]
>>> q = 16
>>> dpi(q,wts)

[1.0, 0.8333333333333333, 0.8333333333333333, 0.3333333333333333]

and of course these can be normalized so that their sum is unity.

Implementation (2): Julia

Now we'll use Julia, and its Combinatorics library. Because Julia implements JIT compiling, its speed is generally faster than that of Python.

Just to be different, we'll develop two functions, one which first produces all winning coalitions, and the second which winnows that set to just the minimal winning coalitions:

using Combinatorics

function wcs(q,w)
    S = powerset(w)
    out = []
    for s in S
	if sum(s) >= q

function mwc(q,out,wc)
    if isempty(wc)
	f = wc[1]
	temp = []   # temp finds all supersets of f = wc[1]
	for w in wc
	    if issubset(f,w)

Now we can try it out:

julia> q = 16;
julia> w = [10,9,6,5];
julia> cs = wcs(q,w)

7-element Array{Any,1}:
 [10, 9]
 [10, 6]
 [10, 9, 6]
 [10, 9, 5]
 [10, 6, 5]
 [9, 6, 5]
 [10, 9, 6, 5]

julia> mwc(q,[],cs)

3-element Array{Any,1}:
 [10, 9]
 [10, 6]
 [9, 6, 5]

Repeated elements

Although both Julia and Python work with multisets, this becomes tricky in terms of the power indices. A simple expedient is to change repeated indices by small amounts so that they are all different, but that the sum will not affect any quota. If we have for example four indices which are the same, we can add 0.1, 0.2, 0.3 to three of them.

So we consider the example

\[ [30;28,16,5,4,3,3] \]

given as an example of a polynomial method in the article "Computation of several power indices by generating functions" by J. M. Alonso-Meijide et al; you can find the article on Science Direct.


julia> q = 30;
julia> w = [28,16,5,4,3.1,3];
julia> cs = wcs(q,w);
julia> ms = mwc(q,[],cs)

6-element Array{Any,1}:
 [28.0, 16.0]
 [28.0, 5.0]
 [28.0, 4.0]
 [28.0, 3.1]
 [28.0, 3.0]
 [16.0, 5.0, 4.0, 3.1, 3.0]

From here it's an easy matter to compute the Deegan-Packel power indices:

julia> dp = []
       for i = 1:6
	   x = 0//1
	   for m in mw
	       x = x + count(j -> j == w[i],m)//length(m)
	   append!(dp, [x])

julia> print(dp)

Any[5//2, 7//10, 7//10, 7//10, 7//10, 7//10]

julia> print([x/sum(dp) for x in dp])

Rational{Int64}[5//12, 7//60, 7//60, 7//60, 7//60, 7//60]

and these are the values obtained by the authors (but with a lot less work).

Three-dimensional impossible CAD

Recently I friend and I wrote a semi-serious paper called "The geometry of impossible objects" to be delivered at a mathematics technology conference. The reviewer was not hugely complimentary, saying that there was nothing new in the paper. Well, maybe not, but we had fun pulling together some information about impossible shapes and how to draw them.

You can see some of our programs hosted at

But my interest here is to look at the three-dimensional versions of such objects; how to use a programmable CAD to create 3d objects which, from a particular perspective, look impossible.

Here's an example in Perth, Western Australia:

(I think the sides are a bit too thin to give a proper feeling for the shape, though.)

Another image, neatly showing how such a Penrose triangle can be created, is this of a "Unique vase":

With this in mind, we can easily create such a shape in any 3D CAD program we like. I've spoken before about programmable CAD but then using OpenJSCAD. However, I decided to switch to Python's CADquery, because it offered the option of orthographic viewing instead of just perspective viewing. This meant that all objects retained their size even if further away from the viewpoint, which made cutting out from the foreground objects much easier.

For making shapes available online, the best solution I found was the online and free viewer and widget provided by LAI4D and its online Laboratory. This provides not only a nice environment to explore 3D shapes, but the facility to export the shape as an html widget for viewing in an iframe.

Penrose triangle

In fact, this first shape is created with OpenJSCAD which I was using before discovering Cadquery. The only problem with OpenJSCAD - of which there's a new version called simply JSCAD - is that it doesn't seem to allow orthographic viewing. This is preferable to perspective viewing for impossible figures, as it's much easier to work out how to line everything up as necessary.

You can see the version of my OpenJSCAD Penrose triangle by opening up up this link:

This should open up OpenJSCAD with the shape in it. You should be able to move it around until you get the impossible effect. You'll see the JavaScript file that produced it on the right.

This shows the starting shape and what you should aim to produce:

And here is the same construction in CadQuery, but working with an orthographic projection. As before, we create the cut-away beam as a two-dimensional shape given by the coordinates of its vertices, then "extrude" it perpendicular to its plane. The remaining beam is a rectangular box.

pts = [

L_shape = cq.Workplane("front").polyline(pts).close().extrude(10)
upright = cq.Workplane("front").transformed(offset=(5,45,-15)).box(10,10,50)

pt = Assembly(
    Part(L_shape, "L shape"),
    Part(upright, "Upright"),
"Penrose triangle")

exportSTL(pt,\"penrose_triangle.stl\", linear_deflection=0.01, angular_deflection=0.1)


The saved STL file can then be imported into the LAI4D online widget, and saved as an iframe for viewing.

The Reutersvärd triangle, which is named for the Swedish graphics designer Oscar Reutersvärd seems to predate the Penrose tirangle, and is a thing of great beauty.

Anyway, to see it in LAI4d, go to:

STL Reutersvard triangle here

There are many hundreds of clever and witty impossible figures at the Impossible Figure Library, which I strongly recommend you visit!

Penrose Staircase

There's a nice video on youtube here which I used as the basis for my model, but since then I've disovered other CAD models, for example on Anyway, the model is made up of square prisms of different heights and bases, with a final shape jutting out at the top:

The Python code is :

transforms = [(0,0,0),(0,-1,0),(0,-2,5),(0,-3,5),(1,-3,5),(2,-3,5),(3,-3,5),(4,-3,5),
vscale = 0.2
vtransforms = [(t[0],t[1],t[2]\*vscale) for t in transforms]

heights = [11,12,8,9,10,11,12,13,14,15,16,2]

boxes = [cq.Workplane("front).transformed(\\
    offset = vtransforms[i]).box(1,1,heights[i]\*vscale,centered=(True,True,False)) for i in range(11)]\n",

pts1 = [(0,0),(0,1),(-1,1),(-1,0.2),(-0.2,0.2)]
box12 = cq.Workplane("front").polyline(pts1).close().extrude(0.4)

pts2 = [(0.2,0),(0.6,0),(0.2,0.4)]
prism = cq.Workplane("YZ").transformed(offset=(0,0,-1)).polyline(pts2).close().extrude(0.8)
shape = box12.cut(prism).translate((2.5,-1.5,4))

boxes += [shape]

ps = Assembly(
    [Part(boxes[i],"Box "+str(i),"#daa520") for i in range(12)],
    "Penrose staircase")

exportSTL(ps,\\"penrose\_staircase2.stl\\", linear\_deflection=0.01, angular\_deflection=0.1)

Below is an iframe containing the LAI4D widget, it may take a few seconds to load, and you may need to refresh the page:

If you get the shape into a state from which you can't get the stair affect working, click on the little circular arrows in the bottom left, which will reset the object to its initial orientation.

Impossible box

This was just a matter of creating the edges, finding a nice view, and using some trial and error to slice through the front most beams so that it seemed that hhe read beams were at the front.

Here is another widget, again you may have to wait for it to load.

Voting power (4): Speeding up the computation

Introduction and recapitulation

Recall from previous posts that we have considered two power indices for computing the power of a voter in a weighted system; that is, the ability of a voter to influence the outcome of a vote. Such systems occur when the voting body is made up of a number of "blocs": these may be political parties, countries, states, or any other groupings of people, and it is assumed that within every bloc all members will vote the same way. Indeed, in some legislatures, voting according to "the party line" is a requirement of membership.

Examples include the American Electoral College, in which the "voters" are the states; the Australian Senate, the European Union, the International Monetary Fund, and many others.

Given a set \(N=\{1,2,\ldots,n\}\) of voters and their weights \(w_i\), and a quota \(q\) required to pass any motion, we have represented this as

\[ [q;w_1,w_2,\ldots,w_n] \]

and we define a winning coalition as any subset \(S\subset N\) of voters for which

\[ \sum_{i\in S}w_i\ge q. \]

It is convenient to define a characteristic function \(v\) on all subsets of \(N\) so that \(v(S)=1\) if \(S\) is a winning coalition, and 0 otherwise. Given a winning coalition \(S\), a voter \(i\in S\) is necessary if \(v(S)-v(S-\{i\})=1\).

For any voter \(i\), the number of winning coalitions for which that voter is necessary is

\[ \beta_i = \sum_S\left(v(S)-v(S-\{i\})\right) \]

where the sum is taken over all winning coalitions. Then the Banzhaf power index is this value normalized so that the sum of all indices is unity:

\[ B_i=\frac{\beta_i}{\sum_{i=1}^n \beta_i}. \]

The Shapley-Shubik power index is defined by considering all permutations \(p\) of \(N\). Taken cumulative sums from the left, a voter \(p_k\) is pivotal if this is the first voter for which te cumulative sum is at least \(q\). For each voter \(i\), let \(\sigma_i\) be the number of permutations for which \(i\) is pivotal. Then

\[ S_i=\frac{1}{n!}\sigma_i \]

which ensures that the sum of all indices is unity.

Although these two indices seem very different, there is in fact a deep connection. Consider any permutation \(p\) and suppose that \(i=p_k\) is the pivotal voter, This voter will also be pivotal in all permutations for which \(i=p_k\) and the values to the right and left of \(i\) stay there: there will be \((k-1)!(n-k)!\) such permutations. However, we can consider the values up to and including \(i=p_k\) as a winning coalition for which \(i\) is necessary, which means we can write

\[ S_i=\sum_S\frac{(n-k)!(k-1)!}{n!}\left(v(S)-v(S-\{i\})\right) \]

which can be compared to the Banzhaf index above as being similar and with a different weighting function. Note that the above expression can be written as

\[ S_i=\sum_S\left(k{\dbinom n k}\right)^{-1}\left(v(S)-v(S-\{i\})\right) \]

which uses smaller numbers. For example, if \(n=50\) then \(n!\approx 3.0149\times 10^{64}\) but the largest binomial value is only approximately \(1.264\times 10^{14}\).

Computing with polynomials

We have also seen that if we define

\[ f_i(x) = \prod_{m\ne i}(1+x^{w_m}) \]


\[ \beta_i = \sum_{j=q-w_i}^{q-1}a_j \]

where \(a_j\) is the coefficient of \(x_j\) in \(f_i(x)\).

Similarly, if

\[ f_i(x,y) = \prod_{m\ne i}(1+yx^{w_m}) \]


\[ S_i=\sum_{k=0}^{n-1}\frac{k!(n-1-k)!}{n!}\sum_{j=q-w_i}^{q-1}c_{jk} \]

where \(c_{jk}\) is the coefficient of \(x^jy^k\) in the expansion of \(f_i(x,y)\).

In a previous post we have shown how to implement these in Python, using the Sympy library. However, Python can be slow, and using Cython is not trivial. We thus here show how to use Julia and its Polynomials package.

Using Julia for speed

The Banzhaf power indices can be computed almost directly from the definition:

using Polynomials

function px(n)

function banzhaf(q,w)
    n = length(w)
    inds = vec(zeros(Int64,1,n))
    for i in 1:n
	p = Polynomial([1])
	for j in 1:i-1
	    p = p * px(v[j])
	for j in i+1:n
	    p = p*px(v[j])
	inds[i] = sum(coeffs(p)[k] for k in q-w[i]+1:q)

This will actually return the vector of \(\beta_i\) values which can then be easily normalized.

The function px is a "helper function" that simply returns the polynomial \(x^n\).

For the Shapley-Shubik indices, the situation is a bit trickier. There are indeed some Julia libraries for multivariate polynomials, but they seem (at the time of writing) to be not fully functional. However, consider the polynomials

\[ f_i(x,y)=\prod_{m\ne i}(1+yx^{w_m}) \]

from above. We can consider this as a polynomial of degree \(n-1\) in \(y\), whose coefficients are polynomials in \(x\). So if

\[ f_i(x,y) = 1 + p_1(x)y + p_2(x)y^2 +\cdots + p_{n-1}(x)y^{n-1} \]

then \(f_i(x,y)\) can be represented as a vector of polynomials

\[ [1,p_1(x),p_2(x),\ldots,p_{n-1}(x)]. \]

With this representation, we need to perform a multiplication by \(1+yx^p\) and also determine coefficients.

Multiplication is easy, noting at once that \(1+yx^p\) is linear in \(y\), and so we use the expansion of the product

\[ (1+ay)(1 + b_1y + b_2y^2 + \cdots + b_{n-1}y^{n-1}) \]


\[ 1 + (a+b_1)y + (ab_1+b_2)y^2 + \cdots + (ab_{n-2}+b_{n-1})y^{n-1} + ab_{n-1}y^n. \]

This can be readily programmed as:

function mulp1(n,p)
    p0 = Polynonial(0)
    px = Polynomial([0,1])
    c1 = cat(p,p0,dims=1)
    c2 = cat([p0,p,dims=1)
    return(c1 + px^n .* c2)

The first two lines aren't really necessary, but they do make the procedure easier to read.

And we need a little program for extracting coefficients, with a result of zero if the power is greater than the degree of the polynomial (Julia's coeff function simply produces a list of all the coefficients in the polynomial.)

function one_coeff(p,n)
    d = degree(p)
    pc = coeffs(p)
    if n <= d

Now we can put all of this together in a function very similar to the Python function for computing the Shapley-Shubik indices with polynomials:

function shapley(q,w)
    n = length(w)
    inds = vec(zeros(Float64,1,n))
    for i in 1:n
	p = Polynomial(1)
	for j in 1:i-1
	    p = mulp1(w[j],p)
	for j in i+1:n
	    p = mulp1(w[j],p)
	B = vec(zeros(Float64,1,n))
	for j in 1:n
	    B[j] = sum(one_coeff(p[j],k) for k in q-w[i]:q-1)
	inds[i] = sum(B[j+1]/binomial(n,j)/(n-j) for j in 0:n-1)

And a quick test (with timing) of the powers of the states in the Electoral College; here ecv is the number of electors of all the states, in alphabetical order (the first states are Alabama, Alaska, Arizona, and the last states are West Virginia, Wisconsin, Washington):

ecv = [9, 3, 11, 6, 55, 9, 7, 3, 3, 29, 16, 4, 4, 20, 11, 6, 6, 8, 8, 4, 10, 11, 16, 10, 6,
10, 3, 5, 6, 4, 14, 5, 29, 15, 3, 18, 7, 7, 20, 4, 9, 3, 11, 38, 6, 3, 13, 12, 5, 10, 3]
@time(s = shapley(270,ecv));
0.722626 seconds (605.50 k allocations: 713.619 MiB, 7.95% gc time)

This is running on a Lenovo X1 Carbon, 3rd generation, using Julia 1.5.3. The operating system is a very recently upgraded version of Arch Linux, and currently using kernel 5.10.3.

Voting power (3): The American swing states

As we all know, American Presidential elections are done with a two-stage process: first the public votes, and then the Electoral College votes. It is the Electoral College that actually votes for the President; but they vote (in their respective states) in accordance with the plurality determined by the public vote. This unusual system was devised by the Founding Fathers as a compromise between mob rule and autocracy, of which both they were determined to guard against. The Electoral College is not now an independent body: in all states but two all electoral college votes are given to the winner in that state. This means that the Electoral College may "amplify" the public vote; or it may return a vote which differs from the public vote, in that a candidate may receive a majority of public votes, and yet still lose the Electoral College vote. This means that there are periodic calls for the Electoral College to be disbanded, but in reality that seems unlikely. And in fact as far back as 1834 the then President, Andrew Jackson, was demanding its disbanding: a President, according to Jackson, should be a "man of the people" and hence elected by the people, rather than by an elite "College". This is one of the few instances where Jackson didn't get his way.

The initial idea of the Electoral College was that voters in their respective states would vote for Electors who would best represent their interests in a Presidential vote: these Electors were supposed to be wise and understanding men who could be relied on to vote in a principled manner. Article ii, Section 1 of the USA Constitution describes how this was to be done. When it became clear that electors were not in fact acting impartially, but only at the behest of the voters, some of the Founding Fathers were horrified. And like so many political institutions the world over, the Electoral College does not now live up to its original expectations, but is also too entrenched in the political process to be removed.

The purpose of this post is to determine the voting power of the "swing states", in which most of a Presidential campaign is conducted. It has been estimated that something like 75% of Americans are ignored in a campaign; this might be true, but that's just plain politics. For example California (with 55 Electoral College cotes) is so likely to return a Democrat candidate that it may be considered a "safe state" (at least, for the Democrats); it would be a waste of time for a candidate to spend too much time there. Instead, a candidate should stump in Florida, for example, which is considered a swing state, and may go either way: we have seen how close votes in Florida can be.

For discussion about measuring voting power using power indices check out the previous two blog posts.

The American Electoral College

According to the excellent site 270 to win and their very useful election histories, we can determine which states have voted "the same" for any election post 1964. Taking 2000 as a reasonable starting point, we have the following results. Some states have voted the same in every election from 2000 onwards; others have not.

Safe Democrat Safe Republican Swing
California 55 Alabama 9 Colorado 9
Connecticut 7 Alaska 3 Florida 29
Delaware 3 Arizona 11 Idaho 4
DC 3 Arkansas 6 Indiana 11
Hawaii 4 Georgia 16 Iowa 6
Illinois 20 Kansas 6 Michigan 16
Maine 3 Kentucky 8 Nevada 6
Maryland 10 Louisiana 8 New Hampshire 4
Massachusetts 11 Mississippi 6 New Mexico 5
Minnesota 10 Missouri 10 North Carolina 15
New Jersey 14 Montana 3 Ohio 18
New York 29 Nebraska 4 Pennsylvania 20
Oregon 7 North Dakota 3 Virginia 13
Rhode Island 4 Oklahoma 7 Wisconsin 10
Vermont 3 South Carolina 9 Maine CD 2 1
Washington 12 South Dakota 3 Nebraska CD 2 1
Tennessee 11
Texas 38
Utah 6
West Virginia 5
Wyoming 3
195 175 168

From the table, we see that since 2000, we can count on 195 "safe" Electoral College votes for the Democrats, and 175 "safe" Electoral College votes for the Republicans. Thus of the 168 undecided votes, for a Democrat win the party must obtain at least 75 votes, and for a Republican win, the party needs to amass 95 votes.

Note that according to the site, of the votes in Maine and Nebraska, all but one are considered safe - remember that these are the only two states to apportion votes by Congressional district. Of Maine's 4 Electoral College votes, 3 are safe Democrat and one is a swing vote; for Nebraska, 4 of its votes are safe Republican, and 1 is a swing vote.

All this means is that a Democrat candidate should be campaigning considering the power given by

\[ [75; 9,29,4,11,6,16,6,4,5,15,18,20,13,10,1,1] \]

and a Republican candidate will be working with

\[ [95; 9,29,4,11,6,16,6,4,5,15,18,20,13,10,1,1] \]

A Democrat campaign

So let's imagine a Democrat candidate who wishes to maximize the efforts of the campaign by concentrating more on states with the greatest power to influence the election.

In [1]: q = 75; w = [9,29,4,11,6,16,6,4,5,15,18,20,13,10,1,1]

In [2]: b = banzhaf(q,w); bn = [sy.Float(x/sum(b)) for x in b]; [sy.N(x,4) for x
in bn]
Out[2]: [0.05192, 0.1867, 0.02271, 0.06426, 0.03478, 0.09467, 0.03478, 0.02271,
0.02870, 0.08801, 0.1060, 0.1196, 0.07515, 0.05800, 0.005994, 0.005994]

In [3]: s = shapley(q,w); [sy.N(x/sum(s),4) for x in s]
out[3]: [0.05102, 0.1902, 0.02188, 0.06375, 0.03367, 0.09531, 0.03367, 0.02188,
0.02770, 0.08833, 0.1073, 0.1217, 0.07506, 0.05723, 0.005662, 0.005662]

The values are not the same, but they are in fact quite close, and in this case they are comparable to the numbers of Electoral votes in each state. To compare values, it will be most efficient to set up a Dataframe using Python's data analysis library pandas. We shall also convert the Banzhaf and Shapley-Shubik values from sympy floats int ordinary python floats.

In [4]: import pandas as pd
In [5]: bf = [float(x) for x in bn]
In [6]: sf = [float(x) for x in s]

In [5]: d = {"States":states, "EC Votes":ec_votes, "Banzhaf indices":bf, "Shapley-Shubik indices:sf}

In [6]: swings = pd.DataFrame(d)

In [7]: swings.sort_values(by = "EC Votes", ascending = False)
In [8]: ssings.sort_values(by = "Banzhaf indices", ascending = False)
In [9]: swings.sort_values(by = "Shapley-Shubik indices", ascending = False)

We won't show the results of the last three expressions, but they all give rise to the same ordering.

We can still get some information by not looking so much at the values of the power indices, but their relative values to the number of Electoral votes. To do this we need a new column which normalizes the Electoral votes so that their sum is unity:

In [10]: swings["Normalized EC Votes"] = swings["EC Votes"]/168.0
In [11]: swings["Ratio B to N"] = swings["Banzhaf indices"]/swings["Normalized EC Votes"]
In [12]: swings["Ratio S to N"] = swings_d["Shapley-Shubik indices"]/swings["Normalized EC Votes"]
In [13]: swings.sort_values(by = "EC Votes", ascending = False)

The following table shows the result.

States EC Votes Banzhaf indices Shapley-Shubik indices EC Votes Normalized Ratio B to N Ratio S to N
Florida 29 0.186702 0.190164 0.172619 1.081585 1.101640
Pennsylvania 20 0.119575 0.121732 0.119048 1.004430 1.022552
Ohio 18 0.106034 0.107289 0.107143 0.989655 1.001360
Michigan 16 0.094671 0.095309 0.095238 0.994049 1.000743
North Carolina 15 0.088014 0.088330 0.089286 0.985754 0.989293
Virginia 13 0.075149 0.075057 0.077381 0.971155 0.969966
Indiana 11 0.064261 0.063752 0.065476 0.981447 0.973660
Wisconsin 10 0.058004 0.057227 0.059524 0.974471 0.961422
Colorado 9 0.051922 0.051017 0.053571 0.969215 0.952318
Iowa 6 0.034777 0.033670 0.035714 0.973770 0.942774
Nevada 6 0.034777 0.033670 0.035714 0.973770 0.942774
New Mexico 5 0.028695 0.027704 0.029762 0.964169 0.930862
Idaho 4 0.022714 0.021877 0.023810 0.953972 0.918823
New Hampshire 4 0.022714 0.021877 0.023810 0.953972 0.918823
Maine CD 2 1 0.005994 0.005662 0.005952 1.007058 0.951282
Nebraska CD 2 1 0.005994 0.005662 0.005952 1.007058 0.951282

We can thus infer that a Democrat candidate should indeed campaign most vigorously in the states with the largest number of Electoral votes. This might seem to be obvious, but as we have shown in previous posts, there is not always a correlation between voting weight and voting power, and that a voter with a low weight might end up having considerable power.

A Republican candidate

Going through all of the above, but with a quota of 95, produces in the end the following:

States EC Votes Banzhaf indices Shapley-Shubik indices EC Votes Normalized Ratio B to N Ratio S to N
Florida 29 0.186024 0.190086 0.172619 1.077658 1.101190
Pennsylvania 20 0.119789 0.121871 0.119048 1.006230 1.023718
Ohio 18 0.106258 0.107258 0.107143 0.991741 1.001075
Michigan 16 0.094453 0.095156 0.095238 0.991756 0.999140
North Carolina 15 0.088106 0.088410 0.089286 0.986789 0.990194
Virginia 13 0.075362 0.074940 0.077381 0.973906 0.968460
Indiana 11 0.064064 0.063568 0.065476 0.978439 0.970862
Wisconsin 10 0.058073 0.057394 0.059524 0.975628 0.964219
Colorado 9 0.052133 0.051209 0.053571 0.973140 0.955892
Iowa 6 0.034692 0.033612 0.035714 0.971363 0.941142
Nevada 6 0.034692 0.033612 0.035714 0.971363 0.941142
New Mexico 5 0.028776 0.027715 0.029762 0.966885 0.931235
Idaho 4 0.022912 0.021963 0.023810 0.962300 0.922436
New Hampshire 4 0.022912 0.021963 0.023810 0.962300 0.922436
Maine CD 2 1 0.005877 0.005621 0.005952 0.987357 0.944289
Nebraska CD 1 0.005877 0.005621 0.005952 0.987357 0.944289

and we see a similar result as for the Democrat version, an obvious difference though being that Michigan has decreased its relative power, at least as measured using the Shapley-Shubik index. \

Voting power (2): computation

Naive implementation of Banzhaf power indices

As we saw in the previous post, computation of the power indices can become unwieldy as the number of voters increases. However, we can very simply write a program to compute the Banzhaf power indices simply by looping over all subsets of the weights:

def banzhaf1(q,w):
    n = len(w)
    inds = [0]*n
    P = [[]]             # these next three lines creates the powerset of 0,1,...,(n-1)
    for i in range(n):
	P += [p+[i] for p in P]
    for S in P[1:]:
	T = [w[s] for s in S]
	if sum(T) >= q:
	    for s in S:
		T = [t for t in S if t != s]
		if sum(w[j] for j in T)<q:

And we can test it:

In [1]: q = 51; w = [49,49,2]

In [2]: banzhaf(q,w)
Out[2]: [2, 2, 2]

In [3]: banzhaf(12,[4,4,4,2,2,1])
Out[3]: [10, 10, 10, 6, 6, 0

The origin of the Banzhaf power indices was when John Banzhaf explored the fairness of a local voting system where six bodies had votes 9, 9, 7, 3, 1, 1 and a majority of 16 was required to pass any motion:

In [4]: banzhaf(16,[9,9,7,3,1,1])
Out[4]: [16, 16, 16, 0, 0, 0]

This result led Banzhaf to campaign against this system as being manifestly unfair.

Implementation using polynomials

In 1976, the eminent mathematical political theorist Steven Brams, along with Paul Affuso, in an article "Power and Size: A New Paradox" (in the Journal of Theory and Decision) showed how generating functions could be used effectively to compute the Banzhaf power indices.

For example, suppose we have

\[ [6; 4,3,2,2] \]

and we wish to determine the power of the first voter. We consider the formal polynomial

\[ q_1(x) = (1+x^3)(1+x^2)(1+x^2) = 1 + 2x^2 + x^3 + x^4 + 2x^5 + x^7. \]

The coefficient of \(x^j\) is the number of ways all the other voters can combine to form a weight sum equal to \(j\). For example, there are two ways voters can join to create a sum of 5: voters 2 and 3, or voters 2 and 4. But there is only one way to create a sum of 4: with voters 3 and 4.

Then the number of ways in which voter 1 will be necessary can be found by adding all the coefficients of \(x^{6-4}\) to \(x^5\). This gives a value p(1) = 6. In general, define

\[ q_i(x) = \prod_{j\ne i}(1-x^{w_j}) = a_0 + a_1x + a_2x^2 +\cdots + a_kx^k. \]

Then it is easily shown that

\[ p(i) = \sum_{j=q-w_i}^{q-1}a_j. \]

As another example, suppose we use this method to compute Luxembourg's power in the EEC:

\[ q_6(x) = (1+x^4)^3(1+x^2)^2 = 1 + 2x^2 + 4x^4 + 6x^6 + 6x^8 + 6x^{10} + 4x^{12} + 2x^{14} + x^{16} \]

and we find \(b(6)\) by adding the coefficients of \(x^{12-w_6}\) to \(x^{12-1}\), which produces zero.

This can be readily implemented in Python, using the sympy library for symbolic computation.

import sympy as sy

def banzhaf(q,w):
    n = len(w)
    inds = []
    for i in range(n):
	p = 1
	for j in range(i):
	    p *= (1+x**w[j])
	for j in range(i+1,n):
	    p *= (1+x**w[j])
	p = p.expand()
	inds += [sum(p.coeff(x,k) for k in range(q-w[i],q))]

Computation of Shapley-Shubik index

The use of permutations will clearly be too unwieldy. Even for say 15 voters, there are \(2^{15}=32768\) subsets, but \(1,307,674,368,000\) permutations, which is already too big for enumeration (except possibly on a very fast machine, or in parallel, or using a clever algorithm).

The use of polynomials for computations in fact precedes the work of Brams and Affuso; it was published by Irwin Mann and Lloyd Shapley in 1962, in a "memorandum" called Values of Large Games IV: Evaluating the Electoral College Exactly which happily you can find as a PDF file here.

Building on some previous work, they showed that the Shapley-Shubik index corresponding to voter \(i\), could be defined as

\[ \Phi_i=\sum_{k=0}^{n-1}\frac{k!(n-1-k)!}{n!}\sum_{j=q-w_i}^{q-1}c_{jk} \]

where \(c_{jk}\) is the coefficient of \(x^jy^k\) in the expansion of

\[ f_i(x,y)=\prod_{m\ne i}(1+x^{w_m}y). \]

This of course has many similarities to the polynomial definition of the Banzhaf power index, and can be computed similarly:

def shapley(q,w):
    n = len(w)
    inds = []
    for i in range(n):
	p = 1
	for j in range(i):
	    p *= (1+y*x**w[j])
	for j in range(i+1,n):
	    p *= (1+y*x**w[j])
	p = p.expand()
	B = []
	for j in range(n):
	    pj = p.coeff(y,j)
	    B += [sum(pj.coeff(x,k) for k in range(q-w[i],q))]
	inds += [sum(sy.Float(B[j]/sy.binomial(n,j)/(n-j)) for j in range(n))]

A few simple examples

The Australian (federal) Senate consists of 76 members, of which a simple majority is required to pass a bill. It is unusual for the current elected government (which will have a majority in the lower house: the House of Representatives) also to have a majority in the Senate. Thus it is quite possible for a party with small numbers to wield significant power.

A case in point is that of the "Australian Democrats" party, founded in 1977 by a disaffected ex-Liberal politician called Don Chipp, with the uniquely Australian slogan "Keep the Bastards Honest". For nearly two decades they were a vital force in Australian politics; they have pretty much lost all power they once had, although the party still exists.

Here's a little table showing the Senate composition in various years:

Party 1985 2000 2020
Government 34 35 36
Opposition 33 29 26
Democrats 7 9
Independent 1 1 1
Nuclear Disarmament 1
Greens 1 9
One Nation 1 2
Centre Alliance 1
Lambie Party 1

This composition in 1985 can be described as

\[ [39; 34,33,7,1,1]. \]

And now:

In [1]: b = banzhaf(39,[34,33,7,1,1])
	[sy.N(x/sum(b),4) for x in b]

Out[1]: [0.3333, 0.3333, 0.3333, 0, 0]

In [2]: s = shapley(39,[34,33,7,1,1])
	[sy.N(x,4) for x in s]

Out[2]: [0.3333, 0.3333, 0.3333, 0, 0]

Here we see that both power indices give the same result: that the Democrats had equal power in the Senate to the two major parties, and the other two senate members had no power at all.

In 2000, we have \([39;35,29,9,1,1,1]\) and:

In [1]: b = banzhaf(39,[31,29,9,1,1,1])
	[sy.N(x/sum(b),4) for x in b]

Out[1]: [0.34, 0.3, 0.3, 0.02, 0.02, 0.02]

In [2]: s = shapley(39,[31,29,9,1,1,1])
	[sy.N(x,4) for x in s]

Out[2]: [0.35, 0.3, 0.3, 0.01667, 0.01667, 0.01667]

We see here that the two power indices give two slightly different results, but in each case the power of the Democrats was equal to that of the opposition, and this time the parties with single members had real (if small) power.

By 2020 the Democrats have disappeared as a political force, their place being more-or-less taken (at least numerically) by the Greens:

In [1]: b = banzhaf(39,[36,26,9,2,1,1,1]
	[sy.N(x/sum(b),4) for x in b]

Out[1]: [0.5306, 0.1224, 0.1224, 0.102, 0.04082, 0.04082, 0.04082]

In [2]: s = shapley(39,[36,26,9,2,1,1,1]
	[sy.N(x,4) for x in s]

Out[2]: [0.5191, 0.1357, 0.1357, 0.1024, 0.03571, 0.03571, 0.03571]

This shows a very different sort of power balance to previously: the Government has much more power in the Senate, partly to having close to a majority and partly because of the fracturing of other Senate members through a host of smaller parties. Note that the Greens, while having more members that the Democrats did in 1985, have far less power. Note also that One Nation, while only having twice as many members as the singleton parties, has far more power: 2.5 times by Banzhaf, 2.8667 times by Shapley-Shubik.

Voting power

After the 2020 American Presidential election, with the usual post-election analyses and (in this case) vast numbers of lawsuits, I started looking at the Electoral College, and trying to work out how it worked in terms of power. Although power is often conflated simply with the number of votes, that's not necessarily the case. We consider power as the ability of any state to affect the outcome of an election. Clearly a state with more votes: such as California with 55, will be more powerful than a state with fewer, for example Wyoming with 3. But often power is not directly correlated with size.

For example, imagine a version of America with just 3 states, Alpha, Beta, and Gamma, with electoral votes 49, 49, 2 respectively, and 51 votes needed to win.

The following table shows the ways that the states can join to reach (or exceed) that majority, and in each case which state is "necessary" for the win:

Winning Coalitions Votes won Necessary States
Alpha, Beta 98 Alpha, Beta
Alpha, Gamma 51 Alpha, Gamma
Beta, Gamma 51 Beta, Gamma
Alpha, Beta, Gamma 100 No single state

By "necessary states" we mean a state whose votes are necessary for the win. And in looking at that table, we see that in terms of influencing the vote, Gamma, with only 2 electors, is equally as powerful as the other two states.

To give another example, the Treaty Of Rome in the 1950's established the first version of the European Common Market, with six member states, each allocated a number of votes for decision making:

Member Votes
1 France 4
2 West Germany 4
3 Italy 4
4 The Netherlands 2
5 Belgium 2
6 Luxembourg 1

The treaty determined that a quota of 12 votes was needed to pass any resolution. At first this table might seem manifestly unfair: West Germany with a population of over 55 million compared with Luxembourg's roughly 1/3 of a million, thus with something like 160 times the population, West Germany got only 4 times the number of votes of Luxembourg.

But in fact it's even worse: since 12 votes are required to win, and all the other numbers of votes are even, there is no way that Luxembourg can influence any vote at all: its voting power was zero. If another state joined, also with a vote of 1, then it and Luxembourg together can influence a vote, and so Luxembourg's voting power would increase.

A power index is some numerical value attached to a weighted vote which describes its power in this sense. Although there are many such indices, there are two which are most widely used. The first was developed by Lloyd Shapley (who would win the Nobel Prize for Economics in 2012) and Martin Shubik in 1954; the second by John Banzhaf in 1965.

Basic definitions

First, some notation. In general we will have \(n\) voters each with a weight \(w_i\), and a quota \(q\) to be reached. For the American Electoral College, the voters are the states, the weights are the numbers of Electoral votes, and \(q\) is the number of votes required: 238. This is denoted as

\[ [q; w_1, w_2,\ldots,w_n]. \]

The three state example above is thus denoted

\[ [51; 49, 49, 2] \]

and the EEC votes as

\[ [12; 4,4,4,2,2,1]. \]

The Shapley-Shubik index

Suppose we have \(n\) votes with weights \(w_1\), \(w_2\) up to \(w_n\), and a quote \(q\) required. Consider all permutations of \(1,2, \ldots,n\). For each permutation, add up the weights starting at the left, and designate as the pivot voter the first voter who causes the cumulative sum to equal or exceed the quota. For each voter \(i\), let \(s_i\) be the number of times that voter has been chosen as a pivot. Then its power index is \(s_i/n!\). This means that the sum of all power indices is unity.

Consider the three state example above, where \(w_1=w_2=49\) and \(w_3=2\), and where we compute cumulative sums only up to reaching or exceeding the quota:

Permutation Cumulative sum of weights Pivot Voter
1 2 3 49, 98 2
1 3 2 49, 51 3
2 1 3 49, 98 1
2 3 1 49, 51 3
3 1 2 2, 51 1
3 2 1 2, 51 2

We see that \(s_1=s_2=s_3=2\) and so the Shapley-Shubik power indices are all \(1/3\).

The Banzhaf index

For the Banzhaf index, we consider the winning coalitions: these are any subset \(S\) of voters for which the sum of weights is not less than \(q\). It's convenient to define a function for this:

\[ v(S) = \begin{cases} 1 & \text{if } \sum_{i\in S}w_i\ge q \cr 0 & \text{otherwise} \end{cases} \]

A voter \(i\) is necessary for a winning coalition \(S\) if \(S-\{i\}\) is not a winning coalition; that is, if \(v(S)-v(S-\{i\})=1\). If we define

\[ p(i) =\sum_S v(S)-v(s-\{i\}) \]

then \(b(i)\) is a measure of power, and the (normalized) Banzhaf power indices are defined as

\[ b(i) = \frac{p(i)}{\sum_i p(i)} \]

so that the sum of all indices (as for the Shapley-Shubik index) is again unity.

Considering the first table above, we see that \(p(1)=p(2)=p(3)=2\) and the Banzhf power indices are all \(1/3\). For this example the Banzhaf and Shapley-Shubik values agree. This is not always the case.

For the EEC example, the winning coalitions are, with necessary voters:

Winning Coalition Votes Necessary voters
1,2,3 12 1,2,3
1,2,4,5 12 1,2,4,5
1,3,4,5 12 1,3,4,5
2,3,4,5 12 2,3,4,5
1,2,3,6 13 1,2,3
1,2,4,5,6 13 1,2,4,5
1,3,4,5,6 13 1,3,4,5
2,3,4,5,6 13 2,3,4,5
1,2,3,4 14 1,2,3
1,2,3,5 14 1,2,3
1,2,3,4,6 15 1,2,3
1,2,3,5,6 15 1,2,3
1,2,3,4,5 16 No single voter
1,2,3,4,5,6 17 No single voter

Counting up the number of times each voter appears in the rightmost column, we see that

\[ p(1) = p(2) = p(3) = 10,\quad p(4) = p(5) = 6,\quad p(6) = 0 \]

and so

\[ b(1) = b(2) = b(3) = \frac{5}{21},\quad b(4) = b(5) = \frac{1}{7}. \]

Note that the power of the three biggest states is in fact only 5/3 times that of the smaller states, in spite of having twice as many votes. This is a striking example of how power is not proportional to voting weight.

Note that computing the Shapley-Shubik index could be unwieldy; there are

\[ \frac{6!}{3!2!} = 60 \]

different permutations of the weights, and clearly as the number of weights increases, possibly with very few repetitions, the number of permutations will be excessive. For the Electoral College, with 51 members, and a few states with the same numbers of voters, the total number of permutations will be

\[ \frac{51!}{(2!)^4(3!)^3(4!)(5!)(6!)(8!)} = 5368164393879631593058456306349344975896576000000000 \]

which is clearly far too large for enumeration. But as we shall see, there are other methods.

Electing a president

Every four years (barring death or some other catastrophe), the USA goes through the periodic madness of a presidential election. Wild behaviour, inaccuracies, mud-slinging from both sides have been central since George Washington's second term. And the entire business of voting is muddied by the Electoral College, the 538 members of which do the actual voting: the public, in their own voting, merely instruct the College what to do. Although it has been said that the EC "magnifies" the popular vote, this is not always the case, and quite often a president will be elected with a majority (270 or more) of Electoral College votes, in spite of losing the popular vote. This dichotomy encourages periodic calls for the College to be disbanded.

As you probably know, each of the 50 states and the District of Columbia has Electors allocated to it, roughly proportional to population. Thus California, the most populous state, has 55 electors, and several smaller states (and DC) only 3.

In all states except Maine and Nebraska, the votes are allocated on a "winner takes all" principle: that is, all the Electoral votes will be allocated to whichever candidate has obtained a plurality in that state. For only two candidates then, if a states' voters produce a simple majority of votes for one of them, that candidate gets all the EC votes.

Maine and Nebraska however, allocate their EC votes by congressional district. In each state, 2 EC votes are allocated to the winner of the popular vote in the state, and for each congressional district (2 in Maine, 3 in Nebraska), the other votes are allocated to the winner in that district.

It's been a bit of a mathematical game to determine the theoretical lowest bound on a popular vote for a president to be elected. To show how this works, imagine a miniature system with four states and 14 electoral college votes:

State Population Electors
Abell 100 3
Bisly 100 3
Champ 120 4
Dairy 120 4

Operating on the winner takes all principle in each state, 8 EC votes are required for a win. Suppose that in each state, the votes are cast as follows, for the candidates Mr Man and Mr Guy:

State Mr Man Mr Guy EC Votes to Man EC Votes to Guy
Abell 0 100 0 3
Bisly 0 100 0 3
Champ 61 59 4 0
Dairy 61 59 4 0
Total 122 310 8 6

and Mr Man wins with 8 EC votes but only about 27.3% of the popular vote. Now you might reasonably argue that this situation would never occur in practice, and probably you're right. But extreme examples such as this are used to show up inadequacies in voting systems. And sometimes very strange things do happen.

So: what is the smallest percentage of the popular vote under which a president could be elected? To experiment, we need to know the number of registered voters in each state (and it appears that the percentage of eligible citizens enrolled to vote differs markedly between the states), and the numbers of electors. The first I ran to ground here and the few states not accounted for I found information on their Attorney Generals' sites. The one state for which I couldn't find statistics was Illinois, so I used the number 7.8 million, which has been bandied about on a few news sites. The numbers of electors per state is easy to find, for example on the wikipedia page.

I make the following simplifying assumptions: all registered voters will vote; and all states operate on a winner takes all principle. Thus, for simplicity, I am not using the apportionment scheme of Maine and Nebraska. (I suspect that taking this into account wouldn't effect the result much anyway.)

Suppose that the registered voting population of each state (including DC) is \(v_i\) and the number of EC votes is \(c_i\). For any state, either the winner will be chosen by a bare majority, or all the votes will go to the loser. This becomes then a simple integer programming problem; in fact a knapsack problem. For each state, define

\[ m_i = \lfloor v_i/2\rfloor +1 \]

for the majority votes needed.

We want to minimize

\[ V = \sum_{i=1}^{51}x_im_i \]

subject to the constraint

\[ \sum_{k=1}^{51}c_ix_i \ge 270 \]

and each \(x_i\) is zero or one.

Now all we need to is set up this problem in a suitable system and solve it! I chose Julia and its JuMP modelling language, and for actually doing the dirty work, GLPK. JuMP in fact can be used with pretty much any optimisation software available, including commercial systems.

using JuMP, GLPK

states = ["Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","DC","Florida","Georgia","Hawaii",
    "Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire","New Jersey","New Mexico","New York","North Carolina",
    "North Dakota","Ohio","Oklahoma","Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota","Tennessee","Texas","Utah",
    "Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"]
reg_voters = [3560686,597319,4281152,1755775,22047448,4238513,2375537,738563,504043,14065627,7233584,795248,1010984,7800000,4585024,
majorities = [Int(floor(x/2+1)) for x in reg_voters]
ec_votes = [9,3,11,6,55,9,7,3,3,29,16,4,4,20,11,6,6,8,8,4,10,11,16,10,6,10,3,5,6,4,14,5,29,15,3,18,7,7,20,4,9,3,11,38,6,3,13,12,5,10,3]

potus = Model(GLPK.Optimizer)
@variable(potus, x[i=1:51], Bin)
@constraint(potus, sum(ec_votes .* x) >= 270)
@objective(potus, Min, sum(majorities .* x));

Solving the problem is now easy:


Now let's see what we've got:

vx = value.(x)
sum(ec_votes .* x)


votes = Int(objective_value(potus))




and we see we have elected a president with slightly less than 21.6% of the popular vote.

Digging a little further, we first find the states in which a bare majority voted for the winner:

f = findall(x -> x == 1.0, vx)
for i in f
    print(states[i],", ")

Alabama, Alaska, Arizona, Arkansas, California, Connecticut, Delaware, DC, Hawaii, Idaho,
llinois, Indiana, Iowa, Kansas, Louisiana, Maine, Minnesota, Mississippi, Montana, Nebraska,
Nevada, New Hampshire, New Mexico, North Dakota, Oklahoma, Oregon, Rhode Island, South Carolina,
South Dakota, Tennessee, Utah, Vermont, West Virginia, Wisconsin, Wyoming,

and the other states, in which every voter voted for the loser:

nf = findall(x -> x == 0.0, vx)
for i in nf
    print(states[i],", ")

Colorado, Florida, Georgia, Kentucky, Maryland, Massachusetts, Michigan,
Missouri, New Jersey, New York, North Carolina, Ohio, Pennsylvania, Texas,
Virginia, Washington,

In point of history, the election in which the president-elect did worst was in 1824, when John Quincy Adams was elected over Andrew Jackson; this was in fact a four-way contest, and the decision was in the end made by the House of Representatives, who elected Adams by one vote. And Jackson, never one to neglect an opportunity for vindictiveness, vowed that he would destroy Adams's presidency, which he did.

More recently, since the Electoral College has sat at 538 members, in 2000 George W. Bush won in spite of losing the popular vote by 0.51%, and in 2016 Donald Trump won in spite of losing the popular vote by 2.09%.

Plenty of numbers can be found on wikipedia and elsewhere.

Enumerating the rationals

The rational numbers are well known to be countable, and one standard method of counting them is to put the positive rationals into an infinite matrix \(M=m_{ij}\), where \(m_{ij}=i/j\) so that you end up with something that looks like this:

\[ \left[\begin{array}{ccccc} \frac{1}{1}&\frac{1}{2}&\frac{1}{3}&\frac{1}{4}&\dots\\[1ex] \frac{2}{1}&\frac{2}{2}&\frac{2}{3}&\frac{2}{4}&\dots\\[1ex] \frac{3}{1}&\frac{3}{2}&\frac{3}{3}&\frac{3}{4}&\dots\\[1ex] \frac{4}{1}&\frac{4}{2}&\frac{4}{3}&\frac{4}{4}&\dots\\[1ex] \vdots&\vdots&\vdots&\vdots&\ddots \end{array}\right] \]

It is clear that not only will each positive rational appear somewhere in this matrix, but its value will appear an infinite number of times. For example \(2 / 3\) will appear also as \(4 / 6\), as \(6 / 9\) and so on.

Then we can enumerate all the elements of this matrix by traversing all the SW--NE diagonals:

This provides an enumeration of all the positive rationals: \[ \frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{3}{1}, \frac{2}{2}, \frac{1}{3}, \frac{1}{4}, \frac{2}{3},\ldots \] To enumerate all rationals (positive and negative), we simply place the negative of each value immediately after it: \[ \frac{1}{1}, -\frac{1}{1}, \frac{1}{2}, -\frac{1}{2}, \frac{2}{1}, -\frac{2}{1}, \frac{3}{1}, -\frac{3}{1}, \frac{2}{2}, -\frac{2}{2}, \frac{1}{3}, -\frac{1}{3}, \frac{1}{4}, \frac{1}{4}, \frac{2}{3}, -\frac{2}{3}\ldots \] This is all standard, well-known stuff, and as far as countability goes, pretty trivial.

One might reasonably ask: is there a way of enumerating all rationals in such a way that no rational is repeated, and that every rational appears naturally in its lowest form?

Indeed there is; in fact there are several, of which one of the newest, most elegant, and simplest, is using the Calkin-Wilf tree. This is named for its discoverers (or creators, depending on which philosophy of mathematics you espouse), who described it in an article happily available on the archived web site of the second author. Herbert Wilf died in 2012, but the Mathematics Department at the University of Pennsylvania have maintained the page as he left it, as an online memorial to him.

The Calkin-Wilf tree is a binary tree with root \(a / b = 1 / 1\). From each node \(a / b\) the left child is \(a / (a+b)\) and the right child is \((a+b) / b\). From each node \(a / b\), the path back to the root contains the fractions which encode, as it were, the Euclidean algorithm for determining the greatest common divisor of \(a\) and \(b\). It is not hard to show that every fraction in the tree is in its lowest terms, and appears only once; also that every rational appears in the tree.

The enumeration of the rationals can thus be made by a breadth-first transversal of the tree; in other words listing each level of the tree one after the other:

\[ \underbrace{\frac{1}{1}}_{\text{The root}},; \underbrace{\frac{1}{2},; \frac{2}{1}}_{\text{first level}},; \underbrace{\frac{1}{3},; \frac{3}{2},; \frac{2}{3},; \frac{3}{1}}_{\text{second level}},; \underbrace{\frac{1}{4},; \frac{4}{3},; \frac{3}{5},; \frac{5}{2},; \frac{2}{5},; \frac{5}{3},; \frac{3}{4},; \frac{4}{1}}_{\text{third level}};\ldots \]

Note that the denominator of each fraction is the numerator of its successor (again, this is not hard to prove in general); thus given the sequence

\[ b_i=0,1,1,2,1,3,2,3,1,4,3,5,2,5,3,4,\ldots \]

(indexed from zero), the rationals are enumerated by \(b_i/b_{i+1}\). This sequence pre-dates Calkin and Wilf; is goes back to an older enumeration now called the Stern-Brocot tree named for the mathematician Moritz Stern and the clock-maker Achille Brocot (who was investigating gear ratios), who discovered this tree independently in the early 1860's.

The sequence \(b_i\) is called Stern's diatomic sequence and can be generated recursively:

\[ b_i=\left\{\begin{array}{ll} i,&\text{if $i\le 1$}\\\
b_{i/2},&\text{if $i$ is even}\\\
b_{(i-1)/2}+b_{(i+1)/2},&\text{if $i$ is odd} \end{array} \right. \]

Alternatively: \[ b_0=0,;b_1=1,;b_{2i}=b_i,b_{2i+1}=b_i+b_{i+1}\text{ for }i\ge 1. \] This is the form in which it appears as sequence 2487 in the OEIS.

So we can generate Stern's diatomic sequence \(b_i\), and then the successive fractions \(b_i/b_{i+1}\) will generate each rational exactly once.

If that isn't remarkable enough, sometime prior to 2003, Moshe Newman showed that the Calkin-Wilf enumeration of the rationals can in fact be done directly: \[ x_0 = 1,\quad x_{i+1}=\frac{1}{2\lfloor x_i\rfloor -x_i +1};\text{for};i\ge 1 \] will generate all the rationals. I can't find anything at all about Moshe Newman; he is always just mentioned as having "shown" this result. Never where, or to whom. There is a proof for this in an article "New Looks at Old Number Theory" by Aimeric Malter, Dierk Schleicher and Don Zagier, published in The American Mathematical Monthly , Vol. 120, No. 3 (March 2013), pp. 243-264. The part of the article relating to enumeration of rationals is based on a prize-winning mathematical essay by the first author (who at the time was a high school student in Bremen, Germany), when he was only 13. Here is the skeleton of Malter's proof:

If \(x\) is any node, then its left and right hand children are \(L = x / (x+1)\) and \(R = 1+x = 1 / (1-L)\) respectively. And clearly \(R = 1/(2\lfloor L\rfloor -L +1)\). Suppose now that \(A\) is a right child, and \(B\) is its successor rational. Then \(A\) and \(B\) will have a common ancestor \(z=p/q\), say \(k\) generations ago. To get from \(z\) to \(A\) will require one left step and \(k-1\) right steps. It is easy to show (by induction if you like), that

\[ A = k-1+\frac{p}{p+q} \] and for its successor \(B\), obtained by one right step from \(z\) and \(k-1\) left steps: \[ B = \frac{1}{\frac{q}{p+q}+k-1}. \] Since \(k-1=\lfloor A\rfloor\), and since \[ \frac{p}{p+q} = A-\lfloor A\rfloor \] it follows that \[ B=\frac{1}{1-(A-\lfloor A\rfloor))+\lfloor A\rfloor}=\frac{1}{2\lfloor A\rfloor-A+1}. \] The remaining case is moving from the end of one row to the beginning of the next, that is, from \(n\) to \(1 / (n+1)\). And this is trivial.

What's more, we can write down the isomorphisms between this sequence of positive rationals and in positive integers. Define \(N:\Bbb{Q}\to\Bbb{Z}\) as follows:

\[ N(p/q)=\left\{\begin{array}{ll} 1,&\text{if $p=q$}\\\
2 N(p/(q-p)),&\text{if $p\lt q$}\\\
2 N((p-q)/q)+1,&\text{if $p\gt q$} \end{array} \right. \]

Without going through a formal proof, what this does is simply count the number of steps taken to perform the Euclidean algorithm on \(p\) and \(q\). The extra factors of 2 ensure that rationals in level \(k\) have values between \(2^k\) and \(2^{k+1}\), and the final "\(+1\)" differentiates left and right children. This function assumes that \(p\) and \(q\) are relatively prime; that is, that the fraction \(p/q\) is in its lowest terms.

(The isomorphism in the other direction is given by \(k\mapsto b_k/b_{k+1}\) where \(b_k\) are the elements of Stern's diatomic sequence discussed above.)

This is just the sort of mathematics I like: simple, but surprising, and with depth. What's not to like?

Fitting the SIR model of disease to data in Julia

A few posts ago I showed how to do this in Python. Now it's Julia's turn. The data is the same: spread of influenza in a British boarding school with a population of 762. This was reported in the British Medical Journal on March 4, 1978, and you can read the original short article here.

As before we use the SIR model, with equations

\begin{align*} \frac{dS}{dt}&=-\frac{\beta IS}{N}\\\
\frac{dI}{dt}&=\frac{\beta IS}{N}-\gamma I\\\
\frac{dR}{dt}&=\gamma I \end{align*}

where \(S\), \(I\), and \(R\) are the numbers of susceptible, infected, and recovered people. This model assumes a constant population - so no births or deaths - and that once recovered, a person is immune. There are more complex models which include a changing population, as well as other disease dynamics.

The above equations can be written without the population; since it is constant, we can just write \(\beta\) instead of \(\beta/N\). The values \(\beta\) and \(\gamma\) are the parameters which affect the working of this model, their values and that of their ratio \(\beta/\gamma\) provide information of the speed of the disease spread.

As with Python, our interest will be to see if we can find values of \(\beta\) and \(\gamma\) which model the school outbreak.

We will do this in three functions. The first sets up the differential equations:

using DifferentialEquations

function SIR!(du,u,p,t)
    S,I,R = u
    β,γ = p
    du[1] = dS = -β*I*S
    du[2] = dI = β*I*S - γ*I
    du[3] = dR = γ*I

The next function determines the sum of squares between the data and the results of the SIR computations for given values of the parameters. Since we will put all our functions into one file, we can create the constant values outside any functions which might need them:

data = [1, 3, 6, 25, 73, 222, 294, 258, 237, 191, 125, 69, 27, 11, 4]
tspan = (0.0,14.0)
u0 = [762.0,1.0,0.0]

function ss(x)
    prob = ODEProblem(SIR!,u0,tspan,(x[1],x[2]))
    sol = solve(prob)
    sol_data = sol(0:14)[2,:]
    return(sum((sol_data - data) .^2))

Note that we don't have to carefully set up the problem to produce values at each of the data points, which in our case are the integers from 0 to 14. Julia will use a standard numerical technique with a dynamic step size, and values corresponding to the data points can then be found by interpolation. All of this functionality is provided by the DifferentialEquations package. For example, R[10] will return the 10th value of the list of computed R values, but R(10) will produce the interpolated value of R at \(t=10\).

Finally we use the Optim package to minimize the sum of squares, and the Plots package to plot the result:

using Optim
using Plots

function run_optim()
    opt = optimize(ss,[0.001,0.01],NelderMead())
    beta,gamma = opt.minimizer
    prob = ODEProblem(SIR!,u0,tspan,(beta,gamma))
    sol = solve(prob)
	    xaxis="Time in days",
	    label=["Susceptible" "Infected" "Recovered"])

Running the last function will produce values

\begin{align*} \beta&=0.0021806887934782853\\\
\gamma&=0.4452595474326912 \end{align*}

and the final plot looks like this:

!Julia SIR plot

This was at least as easy as in Python, and with a few extra bells and whistles, such as interpolation of data points. Nice!

The Butera-Pernici algorithm (2)

The purpose of this post will be to see if we can implement the algorithm in Julia, and thus leverage Julia's very fast execution time.

We are working with polynomials defined on nilpotent variables, which means that the degree of any generator in a polynomial term will be 0 or 1. Assume that our generators are indexed from zero: \(x_0,x_1,\ldots,x_{n-1}\), then any term in a polynomial will have the form \[ cx_{i_1}x_{i_2}\cdots x_{i_k} \] where \(\{x_{i_1}, x_{i_2},\ldots, x_{i_k}\}\subseteq\{0,1,2,\ldots,n-1\}\). We can then express this term as an element of a dictionary {k => v} where \[ k = 2^{i_1}+2^{i_2}+\cdots+2^{i_k}. \] So, for example, the polynomial term \(7x_2x_3x_5\) would correspond to the dictionary term

44 => 7

since \(44 = 2^2+2^3+2^5\). Two polynomial terms {k1 => v1} and {k2 => v2} with no variables in common can then be multiplied simply by adding the k terms, and multiplying the v values, to obtain {k1+k2 => v1*v2} . And we can check if k1 and k2 have a common variable easily by evaluating k1 & k2; a non-zero value indicates a common variable. This leads to the following Julia function for multiplying two such dictionaries:

function poly_dict_mul(p1, p2)
  p3 = Dict{BigInt,BigInt}()
  for (k1, v1) in p1
    for (k2, v2) in p2
	  if k1 & k2 > 0
	    if k1 + k2 in keys(p3)
		  p3[k1+k2] += v1 * v2
		  p3[k1+k2] = v1 * v2
  return (p3)

As you see, this is a simple double loop over the terms in each polynomial dictionary. If two terms have a non-zero conjunction, we simply move on. If two terms when added already exist in the new dictionary, we add to that term. If the sum of terms is new, we create a new dictionary element. The use of BigInt is to ensure that no matter how big the terms and coefficients become, we don't suffer from arithmetic overflow.

For example, suppose we consider the product \[ (x_0+x_1+x_2)(x_1+x_2+x_3). \] A straightforward expansion produces \[ x_0x_3 + x_1x_3 + x_1^2 + x_0x_1 + x_0x_2 + 2x_1x_2 + x_2^2 + x_2x_3. \] which by nilpotency becomes \[ x_0x_3 + x_1x_3 + x_0x_1 + x_0x_2 + 2x_1x_2 + x_2x_3. \] The dictionaries corresponding to the two polynomials are

{1 => 1, 2 => 1, 4 => 1}


{2 => 1, 4 => 1, 8 => 1}


julia> poly_dict_mul(Dict(1=>1,2=>1,4=>1),Dict(2=>1,4=>1,8=>1))
Dict{BigInt,BigInt} with 6 entries:
  9  => 1
  10 => 1
  3  => 1
  5  => 1
  6  => 2
  12 => 1

If we were to rewrite the keys as binary numbers, we would have

{1001 => 1, 1010 => 1, 11 => 1, 101 => 1, 110 => 2, 1100 => 1}

in which you can see that each term corresponds with the term of the product above.

Having conquered multiplication, finding the permanent should then require two steps:

  1. Turning each row of the matrix into a polynomial dictionary.
  2. Starting with \(p=1\), multiply all rows together, one at a time.

For step 1, suppose we have a row \(i\) of a matrix \(M=m_{ij}\). Then starting with an empty dictionary p, we move along the row, and for each non-zero element \(m_{ij}\) we add the term p[BigInt(1)<<j] = M[i,j]. For speed we use bit operations instead of arithmetic operations. This means we can create a list of all polynomial dictionaries:

function mat_polys(M)
  (n,ncols) = size(M)
  ps = []
  for i in 1:n
    p = Dict{BigInt,BigInt}()
    for j in 1:n
      if M[i,j] == 0
	    p[BigInt(1)<<(j-1)] = M[i,j]

Step 2 is a simple loop; the permanent will be given as the value in the final step:

function poly_perm(M)
  (n,ncols) = size(M)
  mp = mat_polys(M)
  p = Dict{BigInt,BigInt}(0=>1)
  for i in 1:n
    p = poly_dict_mul(p,mp[i])

We don't in fact need two separate functions here; since the polynomial dictionary for each row is only used once, we could simply create each one as we needed. However, given that none of our matrices will be too large, the saving of time and space would be minimal.

Now for a few tests:

julia> n = 10; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> poly_perm(M)

julia> n = 20; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> @time poly_perm(M)
  0.003214 seconds (30.65 k allocations: 690.875 KiB)

julia> n = 40; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> @time poly_perm(M)
  0.014794 seconds (234.01 k allocations: 5.046 MiB)

julia> n = 100; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> @time poly_perm(M)
  0.454841 seconds (3.84 M allocations: 83.730 MiB, 27.98% gc time)

julia> lucasnum(n)+2

This is extraordinarily fast, especially compared with our previous attempts: naive attempts using all permutations, and using Ryser's algorithm.

  • A few comparisons
Over the previous blog posts, we have explored various different methods of
computing the permanent:

-   `permanent`, which is the most naive method, using the formal definition, and
    summing over all the permutations \\(S\_n\\).

-   `perm1`, Ryser's algorithm, using the Combinatorics package and iterating over
    all non-empty subsets of \\(\\{1,2,\ldots,n\\}\\).

-   `perm2`, Same as `perm1` but instead of using subsets, we use all non-zero
    binary vectors of length n.

-   `perm3`, Ryser's algorithm using Gray codes to speed the transition between
    subsets, and using a lookup table.

All these are completely general, and aside from the first function, which is
the most inefficient, can be used for any matrix up to size about \\(25\times 25\\).

So consider the \\(n\times n\\) circulant matrix with three ones in each row, whose
permanent is \\(L(n)+2\\).  The following table shows times in seconds (except where
minutes is used) for each calculation:

<style>.table-nocaption table { width: 75%;  text-align: left;  }</style>

<div class="ox-hugo-table table-nocaption">

|             | 10     | 12    | 15    | 20    | 30    | 40   | 60   | 100  |
| `permanent` | 9.3    | -     | -     | -     | -     | -    | -    | -    |
| `perm1`     | 0.014  | 0.18  | 0.72  | 47    | -     | -    | -    | -    |
| `perm2`     | 0.03   | 0.105 | 2.63  | 166   | -     | -    | -    | -    |
| `perm3`     | 0.004  | 0.016 | 0.15  | 12.4  | -     | -    | -    | -    |
| `poly_perm` | 0.0008 | 0.004 | 0.001 | 0.009 | 0.008 | 0.02 | 0.05 | 0.18 |


Assuming that the time taken for `permanent` is roughly proportional to \\(n!n\\),
then we would expect that the time for matrices of sizes 23 and 24 would be
about \\(1.5\times 10^{17}\\) and \\(3.8\times 10^{18}\\) seconds respectively.  Note
that the age of the universe is approximately \\(4.32\times 10^{17}\\) seconds, so
my laptop would need to run for about the third of the universe's age to compute
the permanent of a \\(23\times 23\\) matrix.  That's about the time since the solar
system and the Earth were formed.

Note also that `poly_perm` will slow down if the number of non-zero values in
each row increases.  For example, with four consecutive ones in each row, it
takes over 10 seconds for a \\(100\times 100\\) matrix.  With five ones in each row,
it takes about 2.7 and 21.6 seconds respectively for matrices of size 40 and 60.
Extrapolating indicates that it would take about 250 seconds for the \\(100\times
100\\) matrix.  In general, an \\(n\times n\\) matrix with \\(k\\) non-zero elements in
each row will have a time complexity approximately of order \\(n^k\\).  However,
including the extra optimization (which we haven't done) that allows for
elements to be set to one before the multiplication, produces an algorithm whose
complexity is \\(O(2^{\text{min}(2w,n)}(w+1)n^2)\\) where \\(n\\) is the size of the
matrix, and \\(w\\) its band-width.  See the [original
paper](<>) for details.

The Butera-Pernici algorithm (1)


We know that there is no general sub-exponential algorithm for computing the permanent of a square matrix. But we may very reasonably ask -- might there be a faster, possibly even polynomial-time algorithm, for some specific classes of matrices? For example, a sparse matrix will have most terms of the permanent zero -- can this be somehow leveraged for a better algorithm?

The answer seems to be a qualified "yes". In particular, if a matrix is banded, so that most diagonals are zero, then a very fast algorithm can be applied. This algorithm is described in an online article by Paolo Butera and Mario Pernici called Sums of permanental minors using Grassmann algebra. Accompanying software (Python programs) is available at github. This software has been rewritten for the SageMath system, and you can read about it in the documentation. The algorithm as described by Butera and Pernici, and as implemented in Sage, actually produces a generating function.

Our intention here is to investigate a simpler version, which computes the permanent only.

Basic outline

Let \(M\) be an \(n\times n\) square matrix, and consider the polynomial ring on \(n\) variables \(x_1,x_2,\ldots,x_n\). Each row of the matrix will correspond to an element of this ring; in particular row \(i\) will correspond to

\[ \sum_{j=1}^nm_{ij}a_j=m_{i1}a_1+m_{i2}a_2+\cdots+m_{in}a_n. \]

Suppose further that all the generating elements \(x_i\) are nilpotent of order two, so that \(x_i^2=0\).

Now if we take all the row polynomials and multiply them, each term of the product will have order \(n\). But by nilpotency, all terms which contain a repeated element will vanish. The result will be only those terms which contain each generator exactly once, of which there will be \(n!\). To obtain the permanent all that is required is to set \(x_i=1\) for each generator.

Here's an example in Sage.

sage: R.<a,b,c,d,e,f,g,h,i,x1,x2,x3> = PolynomialRing(QQbar)
sage: M = matrix([[a,b,c],[d,e,f],[g,h,i]])
sage: X = matrix([[x1],[x2],[x3]])
sage: MX = M*X

[a*x1 + b*x2 + c*x3]
[d*x1 + e*x2 + f*x3]
[g*x1 + h*x2 + i*x3]

To implement nilpotency, it's easiest to reduce modulo the ideal defined by \(x_i^2=0\) for all \(i\). So we take the product of those row elements, and reduce:

sage: I = R.ideal([x1^2, x2^2, x3^2])
sage: pr = MX[0,0]*MX[1,0]*MX[2,0]
sage: pr.reduce(I)

c*e*g*x1*x2*x3 + b*f*g*x1*x2*x3 + c*d*h*x1*x2*x3 + a*f*h*x1*x2*x3 + b*d*i*x1*x2*x3 + a*e*i*x1*x2*x3

Finally, set each generator equal to 1:

sage: pr.reduce(I).subs({x1:1, x2:1, x3:1})

c*e*g + b*f*g + c*d*h + a*f*h + b*d*i + a*e*i

and this is indeed the permanent for a general \(3\times 3\) matrix.

Some experiments

Let's experiment now with the matrices we've seen in a previous post, which contain three consecutive super-diagonals of ones, and the rest zero.

Such a matrix is easy to set up in Sage:

sage: n = 10
sage: v = n*[0]; v[1:4] = [1,1,1]
sage: M = matrix.circulant(v)
sage: M

[0 1 1 1 0 0 0 0 0 0]
[0 0 1 1 1 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 0 1 1 1 0 0]
[0 0 0 0 0 0 1 1 1 0]
[0 0 0 0 0 0 0 1 1 1]
[1 0 0 0 0 0 0 0 1 1]
[1 1 0 0 0 0 0 0 0 1]
[1 1 1 0 0 0 0 0 0 0]

Similarly we can define the polynomial ring:

sage: R = PolynomialRing(QQbar,x,n)
sage: R.inject_variables()

Defining x0, x1, x2, x3, x4, x5, x6, x7, x8, x9

And now the polynomials corresponding to the rows:

sage: MX = M*matrix(R.gens()).transpose()
sage: MX

[x1 + x2 + x3]
[x2 + x3 + x4]
[x3 + x4 + x5]
[x4 + x5 + x6]
[x5 + x6 + x7]
[x6 + x7 + x8]
[x7 + x8 + x9]
[x0 + x8 + x9]
[x0 + x1 + x9]
[x0 + x1 + x2]

If we multiply them, we will end up with a huge expression, far too long to display:

sage: pr = prod(MX[i,0] for i in range(n))
sage: len(pr.monomials)


We could reduce this by the ideal, but that would be slow. Far better to reduce after each separate multiplication:

sage: I = R.ideal([v^2 for v in R.gens()])
sage: p =
sage: for i in range(n):
	      p = p*MX[i,0]
	      p = p.reduce(I)

sage: p.subs({v:1 for v in R.gens())

The answer is almost instantaneous. We can repeat the above list of commands starting with different values of n; for example with n=20 the result is 15129, as we expect.

This is not yet optimal; for n=20 on my machine the final loop takes about 7.8 seconds. Butera and Pernici show that the multiplication and setting the variables to one can sometimes be done in the opposite order; that is, some variables can be identified to be set to one before the multiplication. This can speed the entire loop dramatically, and this optimization has been included in the Sage implementation. For details, see their paper.

The size of the universe

As a first blog post for 2020, I'm dusting off one from my previous blog, which I've edited only slightly.

I've been looking up at the sky at night recently, and thinking about the sizes of things.  Now it's all very well to say something is for example a million kilometres away; that's just a number, and as far as the real numbers go, a pretty small one (all finite numbers are "small").  The difficulty comes in trying to marry very large distances and times with our own human scale.  I suppose if you're a cosmologist or astrophysicist this is trivial, but for the rest of us it's pretty daunting.

It's all a problem of scale.  You can say the sun has an average distance of 149.6 million kilometres from earth (roughly 93 million miles), but how big, really, is that?  I don't have any sense of how big such a distance is: my own sense of scale goes down to about 1mm in one direction, and up to about 1000km in the other.  This is hopelessly inadequate for cosmological measurements.

So let's start with some numbers:

  • Diameter of Earth:  12,742 km

  • Diameter of the moon: 3,475km

  • Diameter of Sun: 1,391,684 km

  • Diameter of Jupiter: 139,822 km

  • Average distance of Earth to the sun: 149,597,870km

  • Average distance of Jupiter to the sun: 778.5 million km

  • Average distance of Earth to the moon: 384,400 km

Of course since all orbits are elliptical, distances will both exceed and be less than the average at different times. However, for our purposes of scale, an average is quite sufficient.

By doing a bit of division, we find that the moon is about 0.27 the width of the earth, Jupiter is about 11 times bigger (in linear measurements) and the Sun about 109.2 times bigger than the Earth.

Now for some scaling.  We will scale the earth down to the size of a mustard seed, which is about 1mm in diameter.  On this scale, the Sun is about the size of a large grapefruit (which happily is large, round, and yellow), and the moon is about the size of a dust mite:

On this new scale, with 12742 km equals 1 millimetre, the above distances become:

  • Diameter of Earth:  1mm

  • Diameter of the moon: 0.27mm

  • Diameter of Sun: 109.2m

  • Diameter of Jupiter: 10.97mm

  • Average distance of Earth to the sun: 11740mm = 11.74m

  • Average distance of Jupiter to the sun: 61097.2mm = 61.1m

  • Average distance of Earth to the moon: 30.2mm = 3cm

So how long is the distance from the sun to the Earth?  Well, a cricket pitch is 22 yards long, so 11 yards from centre to end, which is about 10.1 metres.  So imagine our grapefruit placed at the centre of a cricket pitch.  Go to an end of the pitch, and about 1.5 metres (about 5 feet) beyond.  Place the mustard seed there.  What you now have is a scale model of the sun and earth.

Here's a cricket pitch to give you an idea of its size:

Note that in this picture, the yellow circle is not drawn to size.  If you look just left of centre, you'll see the cricket ball, which has a diameter of about 73mm.  Our "Sun" grapefruit should be about half again as wide as that.

If you don't have a sense of a cricket pitch (even with this picture), consider instead a tennis court: the distance from the net to the baseline is 39 feet, or 11.9m. At our scale, this is nearly exact:

(Note that on this scale the sun is somewhat bigger than a tennis ball, and the Earth would in fact be too small to see on this picture.)

So we now have a scale model of the Sun and Earth.  If we wanted to include the Moon, start with its average distance from Earth (384,400 km), then we'd have a dust mite circling our mustard seed at a distance of 3cm.

How about Jupiter?  Well, we noted before that it is about 61m away. Continuing with our cricket pitch analogy, imagine three pitches laid end to end, which is 66 yards, or 60.35 metres.  Not too far off, really!  So place the grapefruit at the end of the first pitch, the mustard seed a little away from centre, and at the end of the third pitch place an 11mm ball for Jupiter: a glass marble will do nicely for this.

And the size of the solar system?  Assuming the edge is given by the heliopause (where the Sun's solar wind is slowed down by interstellar particles); this is at a distance of about 18,100,000,000 km from the Sun, which in our scale is about 1.42 km, or a bit less than a mile (0.88 miles).  Get that?  With Earth the size of a mustard seed, the edge of the solar system is nearly a mile away!

Onwards and outwards

So with this scaling we have got the solar system down to a reasonably manageable size.  If 149,600,000 km seems too vast a distance to make much sense of, scaling it down to 11.7 metres is a lot easier.  But let's get cosmological here, and start with a light year, which is 9,460,730,472,580,800 m, or more simply (and inexactly) 9.46× 1015m.   In our scale, that becomes 742,483,948.562 mm, or about 742 km, which is about 461 miles. That's about the distance from New York city to Greensboro, NC, or from Melbourne to Sydney.  The nearest star is Proxima Centauri, which is 4.3 light years away: at our Earth=mustard seed scale, that's about 3192.6 km, or 1983.8 miles.  This is the flight distance from Los Angeles to Detroit.  Look at that distance on an atlas, imagine our planet home mustard seed at one place and consider getting to the other.

The furthest any human has been from the mustard seed Earth is to the dust-mite Moon: 3cm, or 1.2 inches away.  To get to the nearest star is, well, a lot further!

The nearest galaxy to the Milky Way is about 0.025 mly away.  ("mly" = "millions of light years").  Now we're getting into the big stuff.  At our scale, this distance will be 18,500,000 kilometres, which means that at our mustard seed scale, the nearest galaxy is about 18.5 million kilometres away.   And there are lots of other galaxies, and much further away than this.  For example, the Andromeda Galaxy is 2,538,000 light years away, which at our scale is 1,884,465,000 km -- nearly two billion kilometres!

What's remarkable is that even scaling the Earth down to a tiny mustard seed speck, we are still up against distances too vast for human scale.  We could try scaling the Earth down to a ball whose diameter is the thickness of the finest human hair -- about 0.01 mm -- which is the smallest distance within reach of our own scale.  But even at this scale distances are only reduced by a factor of 100, so the nearest galaxy is still 18,844,650 km away.

One last try: suppose we scale the entire Solar System, out to the heliopause, down to a mustard seed.  This means that the diameter of the heliopause: 36,200,000,000 km, is scaled down to 1mm.  Note that the heliopause is about three times further away from the sun than the mean distance of Pluto.  At this scale, one light year is a happily manageable 261mm, or about ten and a quarter inches.  So the nearest star is 1.12m away, or about 44 inches.  And the nearest galaxy?  Well, it's 25000 light years away, which puts it at about 6.5 km.  The Andromeda Galaxy is somewhat over 663 km away.  The furthest galaxy, with the enticing name of GN-z11 is said to be about 34 billion light years away.  On our heliopause=mustard seed scale, that's about 9.1 million kilometres.

There's no escaping it, the Universe is big, and the scales need to describe it, no matter how you approach them, quickly leap out of of our own human scale.

Permanents and Ryser's algorithm

As I discussed in my last blog post, the permanent of an \(n\times n\) matrix \(M=m_{ij}\) is defined as \[ \text{per}(M)=\sum_{\sigma\in S_n}\prod_{i=1}^nm_{i,\sigma(i)} \] where the sum is taken over all permutations of the \(n\) numbers \(1,2,\ldots,n\). It differs from the better known determinant in having no sign changes. For example: \[ \text{per} \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix} =aei+afh+bfg+bdi+cdi+ceg. \] By comparison, here is the determinant: \[ \text{det} \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix} =aei - afh + bfg - bdi + cdi - ceg. \] The apparent simplicity of the permanent definition hides the fact that there is no known sub-exponential algorithm to compute it, nor does it satisfy most of the nice properties of determinants. For example, we have

\[ \text{det}(AB)=\text{det}(A)\text{det}(B) \]

but in general \(\text{per}(AB)\ne\text{per}(A)\text{per}(B)\). Nor is the permanent zero if two rows are equal, or if any subset of rows is linearly dependent.

Applying the definition and summing over all the permutations is prohibitively slow; of \(O(n!n)\) complexity, and unusable except for very small matrices.

In the small but excellent textbook "Combinatorial Mathematics" by Herbert J. Ryser and published in 1963, one chapter is devoted to the inclusion-exclusion principle, of which the computation of permanents is given as an example. The permanent may be considered as a sum of products, where in each product we choose one value from each row and one value from each column.

Suppose we start by adding the rows together and multiplying them: \[ P = (a+d+g)(b+e+h)(c+f+i). \] This will certainly contain all elements of the permanent, but it also includes products we don't want, such as for example \(aef\) where the elements are chosen from only two rows, and \(ghi\) where the elements are all in one row.

To eliminate all possible products from only two rows we subtract them:

\begin{align*} P & -(a+d)(b+e)(c+f)\text{ rows $1$ and $2$}\\\
& - (a+g)(b+h)(c+i)\text{ rows $1$ and $3$}\\\
& - (d+g)(e+h)(f+i)\text{ rows $2$ and $3$} \end{align*}

The trouble with that subtraction is that we are subtracting products of each individual row twice; for example \(abc\) is in the first and second products. But we only want to subtract those products once from \(P\). So we have to add them again:

\begin{align*} P & -(a+d)(b+e)(c+f)\qquad\text{ rows $1$ and $2$}\\\
& - (a+g)(b+h)(c+i)\qquad\text{ rows $1$ and $3$}\\\
& - (d+g)(e+h)(f+i)\qquad\text{ rows $2$ and $3$}\\\
& + abc\\\
& + def\\\
& + ghi. \end{align*}

Computing the permanent of a \(4\times 4\) matrix would start by adding all the rows and multiplying the sums. Then we would subtract all products of rows taken three at a time. But this would subtract all products of rows taken two at a time twice for each pair, so we add those products back in again. Finally we find that the number of times we've subtracted products of a single row have cancelled out, so we need to subtract them again:

\begin{array}{ccccc} &1&2&3&4\\\
-&&&&4 \end{array}

After all of this we are left with only those products which include all fours rows and columns. And as you see, this is a standard inclusion-exclusion approach.

For an \(n\times n\) matrix, let \(S=\{1,2,\ldots,n\}\), and let \(X\subseteq S\). Then define \(R(X)\) to be the product of the sums of elements of rows indexed by \(X\). For example, with \(S=\{1,2,3\}\) and \(X=\{1,3\}\), then

\[ R(X)=(a+g)(b+h)(c+i). \] We can thus write the above method for obtaining the permanent as:

\begin{align*} \text{per}(M)&=\sum_{\emptyset\ne X\subseteq S}(-1)^{n-|X|}R(X)\\\
&= (-1)^n\sum_{\emptyset\ne X\subseteq S}(-1)^{|X|}R(X) \end{align*}

This is Ryser's algorithm.

Naive Implementation

A naive implementation would be to simply iterate through all the non-empty subsets \(X\) of \(S=\{1,2,\ldots,n\}\), and for each subset add those rows, and multiply the resulting sums.

Here's one such Julia function:

using Combinatorics

function perm1(M)
    n, nc = size(M)
    S = 1:n
    P = 0
    for X in setdiff(powerset(S),powerset([]))
	    P += (-1)^length(X)*prod(sum(M[i,:] for i in X))
    return((-1)^n * P)

Alternatively, we can manage without any extra packages, and use the fact that the subsets of \(S\) correspond to the binary digits of integers between 1 and \(2^n-1\):

function perm2(M)
    n,nc = size(m)
    P = 0
    for i in (1:2^n-1)
	    indx = digits(i,base=2,pad=n)
	    P += (-1)^sum(indx)*prod(sum(M .* indx,dims=1))
    return((-1)^n * P)

Now for some tests. There are very few matrices for which the permanent has a known value; however there are some circulant matrices of zeros and ones whose permanent is known. One such is the \(n\times n\) matrix \(M=m_{ij}\) whose first, second, and third circulant superdiagonals are ones; that is, for which

\[ m_{ij}=1 \Leftrightarrow\bmod(j-1,n)\in\{1,2,3\}. \]

(Note that since the permanent is trivially unchanged by any permutation of the rows, \(M\) can be also defined as being a circulant matrix each row of which has three consecutive ones.)


\[ \text{per}(M)=F_{n-1}+F_{n+1}+2 \]

where \(F_k\) is the \(k\) -th Fibonacci number indexed from 1, so that \(F_1=F_2=1\). Note that the sums of Fibonacci numbers whose indices differ by two form the Lucas numbers \(L_n\).


\[ \text{per}(M)=\text{trace}(C_2^n)+2 \]


\[ C_2=\begin{bmatrix}0&1\\1&1\end{bmatrix} \]

This result, and some others, can be found in the article "Permanents" by Marvin Marcus and Henryk Minc, in The American Mathematical Monthly, Vol. 72, No. 6 (Jun. - Jul., 1965), pp. 577-591.

julia> n = 10;

julia> M = [1*(mod(j-i,n) in [1,2,3]) for i=1:n, j=1:n]
10×10 Array{Int64,2}:
 0  1  1  1  0  0  0  0  0  0
 0  0  1  1  1  0  0  0  0  0
 0  0  0  1  1  1  0  0  0  0
 0  0  0  0  1  1  1  0  0  0
 0  0  0  0  0  1  1  1  0  0
 0  0  0  0  0  0  1  1  1  0
 0  0  0  0  0  0  0  1  1  1
 1  0  0  0  0  0  0  0  1  1
 1  1  0  0  0  0  0  0  0  1
 1  1  1  0  0  0  0  0  0  0

julia> perm1(M)

julia> perm2(M)

julia> fibonaccinum(n-1)+fibonaccinum(n+1)+2

julia> lucasnum(n)+2

julia> C2=[0 1; 1 1];

julia> tr(C2^n)+2

However, it seems that using the machinery of subsets adds to the time, which becomes noticeable if \(n\) is large:

julia> using Benchmarktools

julia> n = 20; M = [1*(mod(j-i,n) in [1,2,3]) for i=1:n, j=1:n];

julia> @time perm1(M)
  6.703187 seconds (31.46 M allocations: 5.097 GiB, 17.93% gc time)

julia> @time perm2(M)
  1.677242 seconds (3.21 M allocations: 3.721 GiB, 8.69% gc time)

That is, the perm2 implementation is about four times faster than perm1.

Implementation with Gray codes

Here is where we can show some cleverness. Recall that a Gray code is a listing of all numbers 0 through \(2^n-1\) whose binary expansion changes in only one bit between consecutive values. It is also known that for \(1\le k\le 2^n-1\) then the Gray code corresponding to \(k\) is given by \(k\oplus (k\gg 1)\). For example, for \(n=4\):

julia> for i in 1:15
       j = xor(i,i>>1)
 1:  [1, 0, 0, 0]        1:  [1, 0, 0, 0]
 2:  [0, 1, 0, 0]        3:  [1, 1, 0, 0]
 3:  [1, 1, 0, 0]        2:  [0, 1, 0, 0]
 4:  [0, 0, 1, 0]        6:  [0, 1, 1, 0]
 5:  [1, 0, 1, 0]        7:  [1, 1, 1, 0]
 6:  [0, 1, 1, 0]        5:  [1, 0, 1, 0]
 7:  [1, 1, 1, 0]        4:  [0, 0, 1, 0]
 8:  [0, 0, 0, 1]       12:  [0, 0, 1, 1]
 9:  [1, 0, 0, 1]       13:  [1, 0, 1, 1]
10:  [0, 1, 0, 1]       15:  [1, 1, 1, 1]
11:  [1, 1, 0, 1]       14:  [0, 1, 1, 1]
12:  [0, 0, 1, 1]       10:  [0, 1, 0, 1]
13:  [1, 0, 1, 1]       11:  [1, 1, 0, 1]
14:  [0, 1, 1, 1]        9:  [1, 0, 0, 1]
15:  [1, 1, 1, 1]        8:  [0, 0, 0, 1]

Note that in the rightmost column of binary expansions, there is only one bit shift between consecutive values: either a single 1 is added, or removed.

For Ryser's algorithm, we can consider this in terms of our sum of rows: if the subsets are given in Gray code order, then moving from one subset to the next is a matter of just adding or subtracting one row from the current sum.

We are not so much interested in the codes themselves, as in their successive differences. For example here are the differences for \(n=4\):

julia> [xor(k,k>>1)-xor(k-1,(k-1)>>1) for k in 1:15]'
1×14 Adjoint{Int64,Array{Int64,1}}:
 1  2  -1  4  1  -2  -1  8  1  2  -1  -4  1  -2  -1

This sequence is A055975 in the Online Encyclopaedia of Integer Sequences, and can be computed as

\[ a[n] = \begin{cases} n,&\text{if $n\le 2$}\\\
2a[n/2],&\text{if $n$ is even}\\\
(-1)^{(n-1)/2},&\text{if $n$ is odd} \end{cases} \]

We can interpret each term in this sequence as the row that should be added or subtracted from the current sum. A value of \(2^m\) means adding row \(m+1\); a value of \(-2^m\) means subtracting row \(m+1\). From any difference, we can obtain the bit position by taking the logarithm to base 2, and whether to add or subtract by its sign. But in fact the number of differences is very small: only \(2n\), so it will be much easier to create a lookup table, using a dictionary:

v = vcat([(2^i,i+1,1) for i in 0:n-1],[(-2^i,i+1,-1) for i in 0:n-1])
lut = Dict(x[1] => (x[2],x[3]) for x in v)

For example, for \(n=3\) the lookup dictionary is

4  => (3, 1)
-4 => (3, -1)
2  => (2, 1)
-2 => (2, -1)
-1 => (1, -1)
1  => (1, 1)

Now we can implement the algorithm by first pre-computing the sequences of differences, and for each element of the sequence, use the lookup dictionary to determine what row is to be added or subtracted. We start with the first row.

Putting all this together gives another implementation of the permanent:

function perm3(M) # Gray code version with lookup tables
    n,nc = size(M)
    if n != nc
	    error("Matrix must be square")
    gd = zeros(Int64,1,2^n-1)
    gd[1] = 1

    v = vcat([(2^i,i+1,1) for i in 0:n-1],[(-2^i,i+1,-1) for i in 0:n-1])
    lut = Dict(x[1] => (x[2],x[3]) for x in v)

    r = M[1,:] # r will contain the sum of rows
    s = -1
    pm = s*prod(r)
    for i in (2:2^n-1)
	    if iseven(i)
		    gd[i] = 2*gd[div(i,2)]
		    gd[i] = (-1)^((i-1)/2)
	    r += M[lut[gd[i]][1],:]*lut[gd[i]][2]
	    s *= -1
	    pm += s*prod(r)
    return(pm * (-1)^n)

This can be timed as before:

julia> @time perm3(M)
  0.943328 seconds (3.15 M allocations: 728.004 MiB, 15.91% gc time)

This is our best time yet.

Speeds of Julia and Python


Python is of course one of the world's currently most popular languages, and there are plenty of statistics to show it. Of all languages in current use, Python is one of the oldest (in the very quick time-scale of programming languages) dating from 1990 - only C and its variants are older. However, it seems to keep its eternal youth by being re-invented, and by its constantly increasing libraries. Indeed, one of Python's greatest strength is its libraries, and pretty much every Python user will have worked with numpy, scipy, matplotlib, pandas, to name but four. In fact, aside from some specialized applications (mainly involving security, speed, or memory) Python can be happily used for almost everything.

Julia on the other hand is newer, dating from 2012. (Only Swift is newer.) It was designed to have the speed of C, the power of Matlab, and the ease of use of Python. Note the comparison with Matlab - Julia was designed as a language for technical computing, although it is very much a general purpose language. It can even be used for low-level systems programming.

Like Python, Julia can be extended through packages, of which there are many: according to Julia's package repository there are 2554 at the time of writing. Some of the packages are big, mature, and robust, others are smaller or represent a niche interest. You can go to Julia Observer to get a sense of which packages are the most popular, largest, have the most commits on github, and so on. Because Julia is still relatively new, packages are still being actively developed. However, some such as Plots, JuMP for optimization, Differential Equations, to name but three, are very much ready for the Big Time.

The purpose of this post is to do a single comparison of Julia and Python for speed.

Matrix permanents

Given a square matrix, its determinant is a well-known and useful construct (in spite of Sheldon Axler).

The determinant of an \(n\times n\) matrix \(M=m_{ij}\) can be formally defined as

\[ \det(M)=\sum_{\sigma\in S_n}\left(\text{sgn}(\sigma)\prod_{i=1}^nm_{i,\sigma(i)}\right) \]

where the sum is taken over all permutations \(\sigma\) of \(1,2,\ldots,n\), and where \(\text{sgn}(\sigma)\) is the sign of the permutation; which is defined in terms of the number of digit swaps to get to it: an even number of swaps has a sign of 1, and an odd number a sign of \(-1\). The determinant can be effectively computed by Gaussian elimination of a matrix into triangular form, which takes in the order of \(n^3\) operations; the determinant is then the product of the diagonal elements.

The permanent is defined similarly, except for the sign:

\[ \text{per}(M)=\sum_{\sigma\in S_n}\prod_{i=1}^nm_{i,\sigma(i)}. \]

Remarkably enough, this simple change renders the permanent impossible to be computed effectively; all known algorithms have exponential orders. Computing by expanding each permutation takes \(O(n!n)\) operations, some better algorithms (such as Ryser's algorithm) have order \(O(2^{n-1}n)\).

The permanent has some applications, although not as many as the determinant. An easy and immediate result is that if \(M\) is a matrix consisting entirely of ones, except for the main diagonal of zeros (so that it is the "ones complement" of the identity matrix), its permanent is the number of derangements of \(n\) objects; that is, the number of permutations in which there are no fixed points.

First Python. Here is a simple program, saved as to compute the permanent from its definition:

import itertools as it
import numpy as np

def permanent(m):
    nr,nc = np.shape(m)
    if nr != nc:
	raise ValueError("Matrix must be square")
    pm = 0
    for p in it.permutations(range(nr)):
	pm += np.product([m[i,p[i]] for i in range(nr)])
    return pm

I am not interested in optimizing speed; simply to implement the same algorithm in Python and Julia to see what happens. Now lets run this in a Python REPL (I'm using IPython here):

In [1]: import permanent as pt

In [2]: import numpy as np

In [3]: M = (1 - np.identity(4)).astype(np.intc)

In [4]: pt.permanent(M)
Out[4]: 9

and this is correct. This result was practically instantaneous, but it slows down appreciably, as you'd expect, for larger matrices:

In [5]: from timeit import default_timer as timer

In [6]: M = (1 - np.identity(8)).astype(np.intc)

In [7]: t = timer();print(pt.permanent(M));timer()-t
Out[7]: 0.7398275199811906

In [8]: M = (1 - np.identity(9)).astype(np.intc)

In [9]: t = timer();print(pt.permanent(M));timer()-t
Out[9]: 10.244881154998438

In [10]: M = (1 - np.identity(10)).astype(np.intc)

In [11]: t = timer();print(pt.permanent(M));timer()-t
Out[11]: 86.57762016600464

Now no doubt this could be speeded up in numerous ways, but that is not my point: I am simply implementing the same algorithm in each language. At any rate, my elementary program becomes effectively unusable for matrices bigger than about \(8\times 8\).

Now for Julia. Again, we start with a simple program:

using Combinatorics

function permanent(m)
    nr,nc = size(m)
    if nr != nc
	error("Matrix must be square")
    pm = 0
    for p in permutations(1:nr)
	pm += prod(m[i,p[i]] for i in 1:nr)

You can see this program and the Python one above are, to all intents and purposes, identical. There are no clever optimizing tricks, it is a raw implementation of the basic definition.

First, a quick test:

julia> using LinearAlgebra

julia> M = 1 .- Matrix(1I,4,4);

julia> include("permanent.jl")

julia> permanent(M)

So far, so good. Now for some time trials:

julia> using BenchmarkTools

julia> M = 1 .- Matrix(1I,8,8);

julia> @time permanent(M)
0.020514 seconds (201.61 k allocations: 14.766 MiB)

julia> M = 1 .- Matrix(1I,9,9);

julia> @time permanent(M)
  0.245049 seconds (1.81 M allocations: 143.965 MiB, 33.73% gc time)

julia> M = 1 .- Matrix(1I,10,10);

julia> @time permanent(M)
  1.336724 seconds (18.14 M allocations: 1.406 GiB, 3.20% gc time)

You'll see that Julia, thanks to its JIT compiler, is much much faster than Python. The point is that I didn't have to do anything here to access that speed, it's just a splendid part of the language.

Winner: Julia, by a country mile.

A few words at the end

The timings given above are not absolute - running on a different system or with different versions of Python, Julia, and their libraries, will give different results. But the point is not the exact times taken, but the comparison of time between Julia and Python.

For what it's worth, I'm running a fairly recently upgraded version of Arch Linux on a Lenovo Thinkpad X1 Carbon, generation 3. I'm running Julia 1.3.0 and Python 3.7.4. The machine has 8Gb of memory, of which about 2Gb were free.

Poles of inaccessibility

Just recently there was a news item about a solo explorer being the first Australian to reach the Antarctic "Pole of Inaccessibility". Such a Pole is usually defined as that place on a continent that is furthest from the sea. The South Pole is about 1300km from the nearest open sea, and can be reached by specially fitted aircraft, or by tractors and sleds along the 1600km "South Pole Highway" from McMurdo Base. However, it is only about 500km from the nearest coast line on the Ross Ice Shelf. McMurdo Base is situated on the outside of the Ross Ice Shelf, so that it is accessible from the sea.

The Southern Pole of Inaccessibility is about 870km further inland from the South Pole, and is very hard to reach---indeed the first people there were a Russian party in 1958, whose enduring legacy is a bust of Lenin at that Pole. Unlike at the South Pole, there is no base or habitation there; just a frigid wilderness. The Southern Pole of Inaccessibility is 1300km from the nearest coast.

A pole of inaccessibility on any landmass can be defined as the centre of the largest circle that can be drawn entirely within it. You can see all of these for the world's continents at an ArcGIS site. If you don't want to wait for the images to load, here is Antarctica:

You'll notice that this map of Antarctica is missing the ice shelves, which fill up most of the bays. If the ice shelves are included, then we can draw a larger circle.

As an image processing exercise, I decided to experiment, using the distance transform to measure distances from the coasts, and Julia as the language. Although Julia has now been in development for a decade, it's still a "new kid on the block". But some of its libraries (known as "packages") are remarkably mature and robust. One such is its imaging package Images.

In fact, we need to install and use several packages as well as Images: Colors, FileIO, ImageView, Plots. These can all be added with Pkg.add("packagename") and brought into the namespace with using packagename. We also need an image to use, and for Antarctica I chose this very nice map of the ice surface:

taken from a BBC report. The nice thing about this map is that it shows the ice shelves, so that we can experiment with and without them. We start by reading the image, making it gray-scale, and thresholding it so as to remove the ice-shelves:

julia> ant = load("antarctica.jpg");
julia> G = Gray.(ant);
julia> B = G .> 0.8;
julia> imshow(B)

which produces this:

which as you see has removed the ice shelves. Now we apply the distance transform and find its maximum:

julia> D = distance_transform(feature_transform(B));
julia> findmax(D)
(116.48175822848829, CartesianIndex(214, 350))

This indicates that the largest distance from all edges is about 116, at pixel location (214,350). To show this circle on the image it's easiest to simply plot the image, and then plot the circle on top of it. We'll also plot the centre, as a smaller circle:

julia> plot(ant,aspect_ratio = 1)
julia> x(t) = 214 + 116*cos(t)
julia> x(t) = 350 + 116*sin(t)
julia> xc(t) = 214 + 5*cos(t)
julia> xc(t) = 350 + 5*sin(t)
julia> plot!(y,x,0,2*pi,lw=3,lc="red",leg=false)
julia> plot!(yc,xc,0,2*pi,lw=2,lc="white",leg=false)

This shows the Pole of Inaccessibility in terms of the actual Antarctic continent. However, in practical terms the ice shelves, even though not actually part of the landmass, need to be traversed just the same. To include the ice shelves, we just threshold at a higher value:

julia> B1 = opening(G .> 0.95);
julia> D1 = distance_transform(feature_transform(B1));
julia> findmax(D1)
(160.22796260328596, CartesianIndex(225, 351))

The use of opening in the first line is to fill in any holes in the image: the distance transform is very sensitive to holes. Then we plot the continent again with the two circles as above, but using this new centre and radius. This produces:

The position of the Pole of Inaccessibility has not in fact changed all that much from the first one.

An interesting sum

I am not an analyst, so I find the sums of infinite series quite mysterious. For example, here are three. The first one is the value of \(\zeta(2)\), very well known, sometimes called the "Basel Problem" and first determined by (of course) Euler: \[ \sum_{n=1}^\infty\frac{1}{n^2}=\frac{\pi^2}{6}. \] Second, subtracting one from the denominator: \[ \sum_{n=2}^\infty\frac{1}{n^2-1}=\frac{3}{4} \] This sum is easily demonstrated by partial fractions: \[ \frac{1}{n^2-1}=\frac{1}{2}\left(\frac{1}{n-1}-\frac{1}{n+1}\right) \] and so the series can be expanded as: \[ \frac{1}{2}\left(\frac{1}{1}-\frac{1}{3}+\frac{1}{2}-\frac{1}{4}+\frac{1}{3}-\frac{1}{5}\cdots\right) \] This is a telescoping series in which every term in the brackets is cancelled except for \(1+1/ 2\), which produces the sum immediately.

Finally, add one to the denominator: \[ \sum_{n=2}^\infty\frac{1}{n^2+1}=\frac{1}{2}(\pi\coth(\pi)-1). \] And this sum is obtained from one of the series representations for \(\coth(z)\):

\[ \coth(z)=\frac{1}{z}+2z\sum_{n=1}^\infty\frac{1}{\pi^2n^2+z^2} \]

(for all \(z\) except for when \(\pi^2n^2+z^2=0\)).

I was looking around for infinite series to give my numerical methods students to test their powers of approximation, and I came across [this beauty](^2%2Bn-1)%2C+n%3D1+to+infinity): \[ \sum_{n=2}^\infty\frac{1}{n^2+n-1}=1+\frac{\pi}{\sqrt{5}}\tan\left(\frac{\sqrt{5}\pi}{2}\right). \] This led me on a mathematical treasure hunt through books and all over the internet, until I had worked it out.

My starting place, after googling "sum quadratic reciprocal" was a very nice and detailed post on stackexchange. This post then referred to a previous one started with the infinite product expression for \(\sin(x)\) and turned it (by taking logarithms and differentiating) into a series for \(\cot(x)\).

However, I want an expression for \(\tan(x)\), which means starting with the infinite product form for \(\sec(x)\), which is:

\[ \sec(x)=\prod_{n=1}^\infty\frac{\pi^2(2n-1)^2}{\pi^2(2n-1)^2-4x^2}. \] Making a substitution simplifies the expression in the product: \[ \sec\left(\frac{\pi x}{2}\right)=\prod_{n=1}^\infty\frac{(2n-1)^2}{(2n-1)^2-x^2}. \] Now take logs of both sides:

\[ \log\left(\sec\left(\frac{\pi x}{2}\right)\right)= \sum_{n=1}^\infty\log\left(\frac{(2n-1)^2}{(2n-1)^2-x^2}\right) \]

and differentiate: \[ \frac{\pi}{2}\tan\left(\frac{\pi x}{2}\right)= \sum_{n=1}^\infty\frac{2x}{(2n-1)^2-x^2}. \] Now we have to somehow equate this new sum on the right with our original sum. So let's go back to it.

First of all, a bit of completing the square produces \[ \frac{1}{n^2+n-1}=\frac{1}{\left(n+\frac{1}{2}\right)^2-\frac{5}{4}}=\frac{4}{(2n+1)^2-5}. \] This means that \[ \sum_{n=1}^\infty\frac{1}{n^2+n-1}=\sum_{n=2}^\infty\frac{4}{(2n-1)^2-5}= \frac{2}{\sqrt{5}}\sum_{n=2}^\infty\frac{2\sqrt{5}}{(2n-1)^2-5}. \] We have changed the index from \(n=1\) to \(n=2\) which allows the rewriting of \(2n+1\) as \(2n-1\). This means we are missing a first term. Comparing the final sum with that for \(\tan(x)\) above, we have \[ \sum_{n=1}^\infty\frac{1}{n^2+n-1}=\frac{2}{\sqrt{5}}\left(\frac{\pi}{2}\tan\left(\frac{\pi \sqrt{5}}{2}\right)-\frac{-\sqrt{5}}{2}\right) \] where the last term is the missing first term: the summand for \(n=1\). Simplifying the right hand side produces \[ \sum_{n=1}^\infty\frac{1}{n^2+n-1}=1+\frac{\pi}{\sqrt{5}}\tan\left(\frac{\sqrt{5}\pi}{2}\right). \] Note that the above series for \(\tan(x)\) can be obtained directly, using a general technique discussed (for example) in that fine old text: "A Course in Modern Analysis", by E. T. Whittaker and G. N. Watson. If \(f(x)\) has only simple poles \(a_n\) with residues \(b_n\), then \[ f(x) = f(0)+\sum_{n=1}^\infty\left(\frac{1}{x-a_n}+\frac{1}{a_n}\right). \] Expressing a function as a series of such reciprocals is known as Mittag-Leffler's theorem and in fact the series for \(\tan(x)\) is given there as one of the examples.

Runge's phenomenon in Geogebra

Runge's phenomenon says roughly that a polynomial through equally spaced points over an interval will wobble a lot near the ends. Runge demonstrated this by fitting polynomials through equally spaced point in the interval \([-1,1]\) on the function \[ \frac{1}{1+25x^2} \] and this function is now known as "Runge's function".

It turns out that Geogebra can illustrate this extremely well.

Equally spaced vertices

Either open up your local version of Geogebra, or go to In the boxes on the left, enter the following expressions in turn:

  1. Start by entering Runge's function \[ f(x)=\frac{1}{1+25x^2} \] You should now either zoom in, or use the graph settings tool to display \(x\) between \(-1.5\) and \(1.5\).
  2. Create a list of \(x\) values: \[ x1 = \frac{\{-5..5\}}{5} \]
  3. Use those values to create a set of points on the curve: \[ p1 = (x1,f(x1)) \]
  4. Now create an interpolating polynomial through them: \[ \mathsf{Polynomial}[p1(1)] \]

The resulting graph looks like this:

Chebyshev vertices

For the purpose of this post, we'll take the Chebyshev vertices to be those points in the graph whose \(x\) coordinates are given by

\[ x_k = \cos\left(\frac{k\pi}{10}\right) \] for \(k = 0,1,2,\ldots 10\). These values are more clustered at the ends of the interval.

In Geogebra:

  1. As before, enter the \(x\) values first: \[ x2 = \cos(\frac{\{0..10\}\cdot\pi}{10}) \]
  2. Then turn them into a sequence of points on the curve \[ p2 = (x2,f(x2)) \]
  3. Finally create the polynomial through them: \(\mathsf{Polynomial}(p2(1))\).

And this graph looks like this:

You'll notice how better the second polynomial hugs the curve. The issue is even more pronounced with 21 points, either separated by \(0.1\), or with \(x\) values given by the cosine function again. All we need do is to change the definitions of the \(x\) value sequences \(x1\) and \(x2\) to:

\[ \eqalign{ x1 &= \frac{\{-10..10\}}{10}\\\
x2 &= \cos(\frac{\{0..20\}\pi}{20}) } \]

In fact, you can create a slider \(1\le N \le 20\), say, and then define

\[ \eqalign{ x1 &= \frac{\{-N..N\}}{N}\\\
x2 &= \cos(\frac{\{0..2N\}\pi}{2N}) } \]

and then see how as \(N\) increases, the "Chebyshev" interpolant fits the curve better than the equally spaced interpolant. For \(N=20\), the turning points of the equally spaced polynomial have \(y\) values as high as \(59.78\).


Using equally spaced values to create an interpolating polynomial and then integrating that polynomial is Newton-Cotes integration. Runge's phenomenon shows why it is better to partition the interval into small sub-intervals and apply a low-order rule to each one. For example, with 20 points on the curve, we would be better applying Simpson's rule to each pair of two sub-intervals, and adding the result. Using a 21-point polynomial is equivalent to a Newton-Cotes rule of order 20, which is far too inaccurate to use.

With our curve \(f(x)\), and our equal-spaced polynomial \(g(x)\), the integrals are

\[ \eqalign{ \int^1_{-1}\frac{1}{1+25x^2},dx&=\frac{2}{5}\arctan(5)\approx 0.5493603067780064\\\
\int^1_{-1}g(x),dx&\approx -5.369910417304622 } \]

However, using the polynomial through the Chebyshev nodes:

\[ \int^1_{-1}h(x)\approx 0.5498082303389538. \]

The absolute errors between the integral values and the exact values are thus (approximately) \(5.92\) and \(0.00045\) respectively.

Integrating an interpolating polynomial through Chebyshev nodes is one way of implementing Clenshaw-Curtis quadrature.

Note that using Simpson's rule on our 21 points produces a value of 0.5485816035037206, which has absolute error of about \(0.0012\).

Fitting the SIR model of disease to data in Python

Introduction and the problem

The SIR model for spread of disease was first proposed in 1927 in a collection of three articles in the Proceedings of the Royal Society by Anderson Gray McKendrick and William Ogilvy Kermack; the resulting theory is known as Kermack–McKendrick theory; now considered a subclass of a more general theory known as compartmental models in epidemiology. The three original articles were republished in 1991, in a special issue of the Bulletin of Mathematical Biology.

The SIR model is so named because it assumes a static population, so no births or deaths, divided into three mutually exclusive classes: those susceptible to the disease; those infected with the disease, and those recovered with immunity from the disease. This model is clearly not applicable to all possible epidemics: there may be births and deaths, people may be re-infected, and so on. More complex models take these and other factors into account.

The SIR model consists of three non-linear ordinary differential equations, parameterized by two growth factors \(\beta\) and \(\gamma\):

\begin{eqnarray*} \frac{dS}{dt}&=&-\frac{\beta IS}{N}\\\
\frac{dI}{dt}&=&\frac{\beta IS}{N}-\gamma I\\\
\frac{dR}{dt}&=&\gamma I \end{eqnarray*}

Here \(N\) is the population, and since each of \(S\), \(I\) and \(R\) represent the number of people in mutually exclusive sets, we should have \(S+I+R=N\). Note that the right hand sides of the equations sum to zero, hence \[ \frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}=\frac{dN}{dt}=0 \] which indicates that the population is constant.

The problem here is to see if values of \(\beta\) and \(\gamma\) can be found which will provide a close fit to a well-known epidemiological case study: that of influenza in a British boarding school. This was described in a 1978 issue of the British Medical Journal.

This should provide a good test of the SIR model, as it satisfies all of the criteria. In this case there was a total population of 763, and an outbreak of 14 days, with infected numbers as:

Day: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Infections: 1 3 6 25 73 222 294 258 237 191 125 69 27 11 4

Some years ago, on my old (and now taken off line) blog, I explored using Julia for this task, which was managed easily (once I got the hang of Julia syntax). However, that was five years ago, and when I tried to recreate it I found that Julia has changed over the last five years, and my original comments and code no longer worked. So I decided to experiment with Python instead, in which I have more expertise, or at least, experience.

Using Python

  • Setup and some initial computation
We start by importing some of the modules and functions we need, and define the data
from the table above:

import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import solve_ivp

data = [1, 3, 6, 25, 73, 222, 294, 258, 237, 191, 125, 69, 27, 11, 4]

Now we enter the SIR system with some (randomly chosen) values of \\(\beta\\) and
\\(\gamma\\), using syntax conformable with the solver `solve_ivp`:

beta,gamma = [0.01,0.1]

def SIR(t,y):
    S = y[0]
    I = y[1]
    R = y[2]
    return([-beta*S*I, beta*S*I-gamma*I, gamma*I])

We can now solve this system using `solve\_ivp` and plot the results:

sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))

fig = plt.figure(figsize=(12,4))
plt.legend(["Susceptible","Infected","Removed","Original Data"])

Note that we have used the `t_eval` argument in our call to `solve_ivp` which
allows us to exactly specify the points at which the solution will be given.
This will allow us to align points in the computed values of \\(I\\) with the
original data.

The output is this plot:

\![First SIR plot](/first\_sir\_plot.png)

There is a nicer plot, with more attention paid to setup and colours,
This use of scipy to solve the SIR equations uses the `odeint` tool.  This
works perfectly well, but I believe it is being deprecated in favour of
  • Fitting the data
To find values \\(\beta\\) and \\(\gamma\\) with a better fit to the data, we start by
defining a function which gives the sum of squared differences between the data
points, and the corresponding values of \\(I\\).  The problem will then be minimize
that function.

def sumsq(p):
    beta, gamma = p
    def SIR(t,y):
    S = y[0]
    I = y[1]
    R = y[2]
    return([-beta*S*I, beta*S*I-gamma*I, gamma*I])
    sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))

As you see, we have just wrapped the SIR definition and its solution inside a
calling function whose variables are the parameters of the SIR equations.

To minimize this function we need to import another python function first:

from scipy.optimize import minimize

msol = minimize(sumsq,[0.001,1],method='Nelder-Mead')

with output:

array([0.00218035, 0.44553886])

To see if this does provide a better fit, we can simply run the solver with
these values, and plot them, as we did at the beginning:

beta,gamma = msol.x
def SIR(t,y):
    S = y[0]
    I = y[1]
    R = y[2]
    return([-beta*S*I, beta*S*I-gamma*I, gamma*I])

sol = solve_ivp(SIR,[0,14],[762,1,0],t_eval=np.arange(0,14.2,0.2))

fig = plt.figure(figsize=(10,4))
plt.legend(["Susceptible","Infected","Removed","Original Data"])

with output:

\![Second SIR plot](/second\_sir\_plot.png)

and as you see, a remarkably close fit!

Mapping voting gains between elections

So this goes back quite some time to the recent Australian Federal election on May 18. In my own electorate (known formally as a "Division") of Cooper, the Greens, who until recently had been showing signs of winning the seat, were pretty well trounced by Labor.

  • Some background asides
First, "Labor" as in "Australian Labor Party" is spelled the
American way; that is, without a "u", even though "labour" meaning work, is so
spelled in Australian English.  This is because much of Australia's pre-federal
political history has a large American influence; indeed one of the loudest
political voices in the 19th century was [King
O'Malley](<>) who was born in 1858 in
Kansas, and didn't come to Australia until he was about 30. He was responsible,
amongst other things, for selecting the site for Canberra (the nation's capital)
and in selecting Walter Burley Griffin as its architect.  As well, the
Australian Constitution shows a large American influence; the constitutions of
both countries bear a remarkably close resemblance, and Australia's parliament
is modelled on that of the United States Congress.

Second, my electorate was formerly known as "Batman" after [John
Batman](<>), the supposed founder of
Melbourne.  However, Batman was known in his lifetime as a contemptible figure,
and the historical record well bears out the description of him as "the vilest
man I have ever known".  He was responsible for the slaughter of indigenous
peoples, and his so called "treaty" with the people of the Kulin Nation in
exchange for land on what would would become Melbourne is considered invalid.
In respect for the local peoples ("who have never ceded sovereignty"), the
electorate was renamed last year in honour of
[William Cooper](<>), a
Yorta Yorta elder, a tireless activist for Aboriginal rights, and the only
individual in the world to lodge a formal protest to a German embassy on the
occasion of Kristallnacht.
  • Back to mapping
All I wanted to do was to map the size of Labor's gains (over the Greens)
between the last election in 2016 and this one, at
each polling booth in the electorate.  For this I used the following Python
packages: [matplotlib](<>),
[pandas](<>), [geopandas](<>),
[cartopy](<>). The nice thing about
Python, for me at least, is the ease of prototyping, and the availability of
packages for just about everything.  Indeed, for an amateur programmer like
myself, one of the biggest difficulties is finding the right package for the
job.  There's a score or more for GIS alone.

All information about the election can be downloaded from the [Australian
Electorial Commission
tallyroom](<>).  And the GIS
information can be obtained also from the
[AES](<>).  The
Victorian [shapefile](<>) needs to be
unzipped before using.

Then the map set-up looks like this:

shp = gpd.read_file('VicMaps/E_AUGFN3_region.shp')
cooper = shp.loc[shp['Elect_div']=='Cooper']
bounds = cooper.geometry.bounds
bg = bounds.values[0]  # latitude, longitude of extent of region
pad = 0.01             # padding for display
extent = [bg[0]-pad,bg[2]+pad,bg[1]-pad,bg[3]+pad]

The idea of the padding is simply to ensure that the map, once displayed,
extends beyond the area of the electorate.  The units are degrees of latitude
and longitude.  In Melbourne, at about 37.8 degrees south, 0.01 degrees latitude
corresponds to about 1.11km, and 0.01 degrees longitude corresponds to about 0.88km.

Now we need to determine the percentage of votes to Labor (on a two-party
preferred computation) which again involves reading material from the AEC site.
I downloaded it first, but it could also be read directly from the site into

# Get all votes by polling booths in Cooper for 2019
b19 = pd.read_csv('Elections/TCPByCandidateByPollingPlaceVIC-2019.csv')  # b19 = booths, 2019
v19 = b19.loc[b19.DivisionNm=='Cooper'] # v19 = Cooper Booths, 2019
v19 = v19[['PollingPlace','PartyAb','OrdinaryVotes']]
v19r = v19.loc[v19.PartyAb=='GRN']
v19k = v19.loc[v19.PartyAb=='ALP']
v19c = v19r.merge(v19k,left_on='PollingPlace',right_on='PollingPlace')  # Complete votes
v19c['Percent_x'] = (v19c['OrdinaryVotes_x']*100/(v19c['OrdinaryVotes_x']+v19c['OrdinaryVotes_y'])).round(2)
v19c['Percent_y'] = (v19c['OrdinaryVotes_y']*100/(v19c['OrdinaryVotes_x']+v19c['OrdinaryVotes_y'])).round(2)
v19c = v19c.dropna()
# note: suffix -x is GRN; suffix _y is ALP

The next step is to determine the positions of all polling places.  For
simplification, I'm only interested in places used in both this most recent
election, and the previous federal election in 2016:

v16 = pd.read_csv('Elections/Batman2016_TCP.csv')
v_both = v16.merge(v19c,right_on='PollingPlace',left_on='Booth')
c19 = pd.read_csv('Elections/Cooper_PollingPlaces.csv')
c19 = c19.drop(44).reset_index()  # Drop Special Hospital, which has index 44
lats = np.array(v_all['Latitude'])
longs = np.array(v_all['Longitude'])
booths = np.array(v_all['PollingPlaceNm'])
diffs = np.array(v_all['Percent_y']-v_all['ALP percent']) # change in ALP percentage

Having now got the boundary of the electorate, the positions of each polling
booth, the percentage change in votes for Labor, the map can now be created and

fig = plt.figure(figsize = (16,16))
tiler = GoogleTiles()
ax = plt.axes(
ax.add_image(tiler, 13, interpolation='hanning')
for i in range(34):
    if diffs[i]>0:

I found by trial and error that Hanning interpolation seemed to give the best

\![Swings 2019](/swings.jpg)

So this image shows not the size of Labor's _vote_, but the size of Labor's
_gain_ since the previous election.  The larger gains are in the southern part of
the electorate: the northern part has always been more Labor friendly, and so the
gains were smaller there.

Educational disciplines: size against market growth

Here is an interactive version of this diagram:

[!APF Figure 4](/APF_figure_4.png)

(click on the image to show a larger version.)

Tschirnhausen transformations and the quartic

Here we show how a Tschirnhausen transformation can be used to solve a quartic equation. The steps are:

  1. Ensure the quartic is missing the cubic term, and its initial coefficient is 1. We can do this by first dividing by the initial coefficient to obtain an equation \[ x^4+b_3x^3+b_2x^2+b_1x+b_0=0 \] and then replace the variable \(x\) with \(y=x-b_3/ 4\). This will produce a monic quartic equation missing the cubic term.

    We can thus take \[ x^4+bx^2+cx+d=0 \] as a completely general formulation of the quartic equation.

  2. Now apply the Tschirnhausen transformation \[ y = x^2+rx+s \] using the resultant, which will produce a quartic equation in \(y\).

  3. Chose \(r\) and \(s\) so that the coefficients of the linear and cubic terms are zero. This will require solving a cubic equation.

  4. Substitute those \(r,s\) values into the resultant, which will produce a biquadratic equation in \(y\): \[ y^4+Ay^2+B=0. \] This can be readily solved as it's a quadratic in \(y^2\). Finally, for each value of \(y\), and using the \(r,s\) values, solve \[ x^2+rx+s-y=0. \] This will in fact produce eight values, of which four are the solution to the original quartic.

An example

Consider the equation \[ x^4-94x^2-480x-671=0 \] which has solutions

\begin{align*} x &= -2 , \sqrt{5} \pm \sqrt{3} \sqrt{9 -4, \sqrt{5}},\\\
x &= 2 , \sqrt{5} \pm \sqrt{3} \sqrt{9 + 4 , \sqrt{5}} \end{align*}

Note that these relatively nice solutions arise from the polynomial being factorizable in the number field \(\mathbb{Q}[\sqrt{5}]\). We can show this using Sagemath:

N.<a> = NumberField(x^2-5)
K.<x> = N[]
factor(x^4 - 94*x^2 - 480*x - 671)

\[ (x^{2} - 4 a x - 12 a - 7) \cdot (x^{2} + 4 a x + 12 a - 7) \]

We shall continue to use Sagemath to perform all the dirty work; here's how this solution works:

qx = x^4 - 94*x^2 - 480*x - 671
res = qx.resultant(y-x^2-r*x-s,x).poly(y)

We now want to find the values of \(r\) and \(s\) to eliminate the linear and cubic terms. The cubic term is easy:


\[ -4s-188 \] and so

s_sol = solve(res.coefficient(y,3),s,solution_dict=True)

and we can substitute this into the linear coefficient:


\[ -480(r+10)(r+8)(r+6). \] In general the coefficient would not be as neatly factorizable as this, but we can still find the values of \(r\):

r_sol = solve(res.coefficient(y,1).subs(s_sol[0]),r,solution_dict=True)

We can choose any value we like; here let's choose the first value and substitute it into the resultant from above, first creating a dictionary to hold the \(r\) and \(s\) values:

rs = s_sol[0].copy()

\[ \lbrace s:-47,r:-8\rbrace \]


\[ y^4-256y^2+1024 \]

y_sol = solve(res.subs(rs),y,solution_dict=True)

This will produce four values of \(y\), and for each one we solve the equation \[ x^2+rx+s-y=0 \] for \(x\):

for ys in ysol:

\begin{align*} x &= -\sqrt{-4 , \sqrt{2 , \sqrt{15} + 8} + 63} + 4,& x &= \sqrt{-4 , \sqrt{2 , \sqrt{15} + 8} + 63} + 4\\\
x &= -\sqrt{4 , \sqrt{2 , \sqrt{15} + 8} + 63} + 4,& x& = \sqrt{4 ,\sqrt{2 , \sqrt{15} + 8} + 63} + 4\\\
x& = -\sqrt{-4 , \sqrt{-2 , \sqrt{15} + 8} + 63} + 4,& x& = \sqrt{-4 ,\sqrt{-2 , \sqrt{15} + 8} + 63} + 4\\\
x& = -\sqrt{4 , \sqrt{-2 , \sqrt{15} + 8} + 63} + 4,& x& = \sqrt{4 , \sqrt{-2 , \sqrt{15} + 8} + 63} + 4 \end{align*}

We can check these values to see which ones are actually correct. But to experiment, we can determine the minimal polynomial of each value given:

for ys in ysol:
    s1 = solve((x^2+r*x+s-y).subs(rs).subs(ys),x,solution_dict=True)
    ql = [QQbar(z[x]).minpoly() for z in s1]

\begin{align*} &x^{4} - 94 x^{2} - 480 x - 671,&& x^{4} - 32 x^{3} + 290 x^{2} - 64 x - 6431&\\\
&x^{4} - 94 x^{2} - 480 x - 671,&& x^{4} - 32 x^{3} + 290 x^{2} - 64 x - 6431&\\\
&x^{4} - 32 x^{3} + 290 x^{2} - 64 x - 6431,&& x^{4} - 94 x^{2} - 480 x - 671&\\\
&x^{4} - 94 x^{2} - 480 x - 671,&& x^{4} - 32 x^{3} + 290 x^{2} - 64 x - 6431& \end{align*}

Half of these are the original equation we tried to solve. And the others?


\[ x^{4} - 32 x^{3} + 290 x^{2} - 64 x - 6431 \] This is in fact what we should expect, from solving the equation \[ x^2+rx+s-y=0 \] If the roots are \(x_1\) and \(x_2\), then by Vieta's formulas \(x_1+x_2=-(-r)=r\).

Further comments

The trouble with this method is that it only works nicely on some equations. In general, the snarls of square, cube, and fourth roots become unwieldy very quickly. For example, consider the equation \[ x^4+6x^2-60x+36=0 \] which according to Cardan in Ars Magna (Chapter XXXIX, Problem V) was first solved by Ferrari.

Taking the resultant with \(y-x^2-rx-s\) as a polynomial on \(y\), we find that the coefficient of \(y^3\) is \(-4s+12\), and so \(s=3\). Substituting this in the linear coefficient, we obtain this cubic in \(r\): \[ 5r^3-9r^2-60r+300=0. \] The simplest (real) solution is: \[ r = \frac{1}{5} , {\left(50 , \sqrt{3767} - 3273\right)}^{\frac{1}{3}} + \frac{109}{5 , {\left(50 , \sqrt{3767} - 3273\right)}^{\frac{1}{3}}} + \frac{3}{5} \] Substituting these values of \(r\) and \(s\) into the resultant, we obtain the equation \[ y^4+c_2y^2+c_0=0 \] with \[ c_2=\frac{6 , {\left({\left(50 , \sqrt{3767} - 3273\right)}^{\frac{2}{3}} {\left(50 , \sqrt{3767} - 18969\right)} - {\left(7200 , \sqrt{3767} - 483193\right)} {\left(50 , \sqrt{3767} - 3273\right)}^{\frac{1}{3}} + 100 , \sqrt{3767} - 6546\right)}}{25 , {\left(50 , \sqrt{3767} - 3273\right)}} \] and \[ c_0=\frac{27\left(% \begin{array}{l} 2,(14353657451700 , \sqrt{3767} - 880869586608887), (50 , \sqrt{3767} - 3273)^{\frac{2}{3}}\\\
\quad+109541 , (2077754350 , \sqrt{3767} - 127532539917) (50 , \sqrt{3767} - 3273)^{\frac{1}{3}}\quad{}\\\
\hspace{18ex} - 5262543802068000 , \sqrt{3767} + 322980491997672634 \end{array}% \right)} {625 , {\left(2077754350 , \sqrt{3767} - 127532539917\right)} {\left(50 , \sqrt{3767} - 3273\right)}^{\frac{1}{3}}} \] Impressed?

Right, so we solve the equation for \(y\), to obtain \[ y=\pm\sqrt{-\frac{1}{2}c_2\pm\frac{1}{2}\sqrt{c_2^2-4c_0}}. \] For each of those values of \(y\), we solve the equation \[ x^2+rx+s-y=0 \] to obtain (for example) \[ x= = -\frac{1}{2} , r \pm \frac{1}{2} , \sqrt{r^{2} - 4 , s + 4 , \sqrt{-\frac{1}{2} , c_{2} + \frac{1}{2} , \sqrt{c_{2}^{2} - 4 , c_{0}}}} \] With \(r\) being the solution of a cubic equation, and \(c_0\), \(c_2\) being the appalling expressions above, you can see that this solution, while "true" in a logical sense, is hardly useful or enlightening.

Cardan again: "So progresses arithmetic subtlety, the end of which, it is said, is as refined as it is useless."

Tschirnhausen's solution of the cubic

A general cubic polynomial has the form \[ ax^3+bx^2+cx+d \] but a general cubic equation can have the form \[ x^3+ax^2+bx+c=0. \] We can always divide through by the coefficient of \(x^3\) (assuming it to be non-zero) to obtain a monic equation; that is, with leading coefficient of 1. We can now remove the \(x^2\) term by replacing \(x\) with \(y-a/3\): \[ \left(y-\frac{a}{3}\right)^{\negmedspace 3}+a\left(y-\frac{a}{3}\right)^{\negmedspace 2} +b\left(y-\frac{a}{3}\right)+c=0. \] Expanding and simplifying produces \[ y^3+\left(b-\frac{a^2}{3}\right)y+\frac{2}{27}a^3-\frac{1}{3}ab+c=0. \] In fact this can be simplified by writing the initial equation as \[ x^3+3ax^2+bx+c=0 \] and then substituting \(x=y-a\) to obtain \[ y^3+(b-3a^2)y+(2a^3-ab+c)=0. \] This means that in fact an equation of the form \[ y^3+Ay+B=0 \] is a completely general form of the cubic equation. Such a form of a cubic equation, missing the quadratic term, is known as a depressed cubic.

We could go even further by substituting \[ y=z\sqrt{A} \] to obtain

\[ A^{3/ 2}z^3+A\sqrt{A}z+B=0 \]

and dividing through by \(A^{3/ 2}\) to produce

\[ z^3+z+BA^{-3/ 2}=0. \]

This means that

\[ z^3+z+W=0 \]

is also a perfectly general form for the cubic equation.

Cardan's method

Although this is named for Gerolamo Cardano (1501-1576), the method was in fact discovered by Niccolò Fontana (1500-1557), known as Tartaglia ("the stammerer") on account of a injury obtained when a soldier slashed his face when he was a boy. In the days before peer review and formal dissemination of ideas, any new mathematics was closely guarded: mathematicians would have public tests of skill, and a new solution method was invaluable. After assuring Tartaglia that his new method was safe with him, Cardan then proceeded to publish it as his own in his magisterial Ars Magna in 1545. A fascinating account of the mix of Cardan, Tartaglia, and several other egotistic mathematicians of the time, can be read here.

Cardan's method solves the equation \[ x^3-3ax-2b=0 \] noting from above that this is a perfectly general form for the cubic, and where we have introduced factors of \(-3\) and \(-2\) to eliminate fractions later on. We start by assuming that the solution will have the form \[ x=p^{1/ 3}+q^{1/ 3} \] and so \[ x^3=(p^{1/ 3}+q^{1/ 3})^3=p+3p^{2/ 3}q^{1/ 3}+3p^{1/ 3}q^{2/ 3}+q. \] This last can be written as \[ p+q+3p^{1/ 3}q^{1/ 3}(p^{1/ 3}+q^{1/ 3}). \] We can thus write \[ x^3=3p^{1/ 3}q^{1/ 3}x+p+q \] and comparing with the initial cubic equation we have \[ 3p^{1/ 3}q^{1/ 3}=3a,\quad p+q=2b. \] These can be written as \[ pq=a^3,\quad p+q=2b \] for which the solutions are \[ p,q=b\pm\sqrt{b^2-a^3} \] and so \[ x = (b+\sqrt{b^2-a^3})^{1/ 3}+(b-\sqrt{b^2-a^3})^{1/ 3}. \] This can be written in various different ways.

For example, \[ x^3-6x-6=0 \] for which \(a=2\) and \(b=3\). Here \(b^2-a^3=1\) and so one solution is \[ x=4^{1/ 3}+2^{1/ 3}. \] Given that a cubic must have three solutions, the other two are \[ \omega p^{1/ 3}+\omega^2 q^{1/ 3},\quad \omega^2 p^{1/ 3}+\omega q^{1/ 3} \] where \(\omega\) is a cube root of 1, for example \[ \omega=\frac{1}{2}+i\frac{\sqrt{3}}{2}. \]

And so to Tschirnhausen

At the beginning we eliminated the \(x^2\) terms from a cubic equation by a linear substitution \(x=y-a/3\) or \(y=x+a/3\). Writing in the year 1680, the German mathematician Ehrenfried Walther von Tschirnhausen (1651-1708) began experimenting with more general polynomial substitutions, believing that it would be possible to eliminate other terms at the same time. Such substitutions are now known as Tschirnhausen transformations and of course the modern general approach places them squarely within field theory.

Tschirnhausen was only partially correct: it is indeed possible to remove some terms from a polynomial equation, and in 1858 the English mathematician George Jerrard (1804-1863) showed that it was possible to remove the terms of degree \(n-1\), \(n-2\) and \(n-3\) from a polynomial of degree \(n\). In particular, the general quintic equation can be reduced to \[ x^5+px+q=0 \] which is known as the Bring-Jerrard form; also honouring Jerrard's predecessor, the Swedish mathematician Erland Bring (1736-1798). Note that Jerrard was quite well aware of the work of Ruffini, Abel and Galois in proving the general unsolvability by radicals of the quintic equation.

Neither Bring nor Tschirnhausen had the advantage of this knowledge, and both were working towards a general solution of the quintic.

Happily, Tschirnhausen's work is available in an English translation, published in the ACM SIGSAM Bulletin by R. F. Green in 2003. For further delight, Jerrard's text, with the splendidly formal English title "An Essay on the Resolution of Equations", is also available online.

After that history lesson, let's explore how to remove both the quadratic and linear terms from a cubic equation using Tschirnhausen's method, and also using SageMath to do the heavy algebraic work. There is in fact nothing particularly conceptually difficult, but the algebra is quite messy and fiddly.

We start with a depressed cubic equation \[ x^3+3ax+2b=0 \] and we will use the Tschirnhausen transformation \[ y=x^2+rx+s. \]

This can be done by hand of course, using a somewhat fiddly argument, but for us the best approach is to compute the resultant of the two polynomials, which is a polynomial expression equal to zero if the two polynomials have a common root. The resultant can be computed as the determinant of the Sylvester matrix (named for its discoverer); but we can simply use SageMath:

cb = x^3 + 3*a*x + 2*b
res = cb.resultant(y-x^2-r*x-s,x).poly(y)

\[ \displaylines{ y^3+3(2a-s)y^{2}+3(ar^{2}+3a^{2}+2r-4as+s^{2})y\\\
{\ }\mspace4em -4b^{2}+2br^{3}-3ar^{2}s+6abr-9a^{2}s-6brs+6as^{2}-s^{3} } \]

Now we find values of \(r\) and \(s\) for which the coefficients of \(y^2\) and \(y\) will be zero:

sol = solve([res.coefficient(y,1),res.coefficient(y,2)],[r,s],solution_dict=True)

\[ \left[\left\lbrace s : 2 , a, r : -\frac{b + \sqrt{a^{3} + b^{2}}}{a}\right\rbrace, \left\lbrace s : 2 , a, r : -\frac{b - \sqrt{a^{3} + b^{2}}}{a}\right\rbrace\right] \]

We can now substitute say the second solution into the resultant from above, which should produce an expression of the form \(y^3+A\):

cby = res.subs(sol[1]).canonicalize_radical().poly(y)

\[ y^3-8 , a^{3} - 16 , b^{2} + 8 , \sqrt{a^{3} + b^{2}} b - \frac{8 , b^{4}}{a^{3}} + \frac{8 , \sqrt{a^{3} + b^{2}} b^{3}}{a^{3}} \] We can simply take the cube root of the constant term as our solution:

sol_y = solve(cby,y,solution_dict=True)

\[ \left\lbrace y : \frac{2 , {\left(a^{6} + 2 , a^{3} b^{2} - \sqrt{a^{3} + b^{2}} a^{3} b + b^{4} - \sqrt{a^{3} + b^{2}} b^{3}\right)}^{\frac{1}{3}}}{a}\right\rbrace \]

Now we solve the equation \(y=x^2+rx+s\) using the values \(r\) and \(s\) from above, and the value of \(y\) just obtained:

eq = x^2+r*x+s-y
eqrs = eq.subs(sol[1])
eqx = eqrs.subs(sol_y[2])
solx =  solve(eqx,x,solution_dict=True)

\[ \left\lbrace x : \frac{b - \sqrt{a^{3} + b^{2}} - \sqrt{-7 , a^{3} + 2 , b^{2} - 2 , \sqrt{a^{3} + b^{2}} b + 8 , {\left(a^{6} + 2 , a^{3} b^{2} + b^{4} - {\left(a^{3} b + b^{3}\right)} \sqrt{a^{3} + b^{2}}\right)}^{\frac{1}{3}} a}}{2 , a}\right\rbrace \]

A equation, of err... rare beauty, or if not beauty, then something else. It certainly lacks the elegant simplicity of Cardan's solution. On the other hand, the method can be applied to quartic (and quintic) equations, which Cardan's solution can't.

Finally, let's test this formula, again on the equation \(x^3-6x-6=0\), for which \(a=-2\) and \(b=-3\):

xs = solx[0][x].subs({a:-2, b:-3})

\[ \frac{1}{4} , \sqrt{-16 \cdot 4^{\frac{1}{3}} + 80} + 1 \]

This can clearly be simplified to

\[ 1+\sqrt{5-4^{1/ 3}} \] It certainly looks different from Cardan's result, but watch this:

xt = QQbar(xs)

\[ \frac{1}{2}4^{2/ 3}+4^{1/ 3} \]

which is Cardan's result, only very slightly rewritten. And finally:


\[ x^3-6x-6 \]

Colonial massacres, 1794 to 1928

The date January 26 is one of immense current debate in Australia. Officially it's the date of Australia Day, which supposedly celebrates the founding of Australia. To Aboriginal peoples it is a day of deep mourning and sadness, as the date commemorates over two centuries of oppression, bloodshed, and dispossession. To them and their many supporters, January 26 is Invasion Day.

The date commemorates the landing in 1788 of Arthur Phillip, in charge of the First Fleet and the first Governor of the colony of New South Wales.

The trouble is that "Australia" means two things: the island continent, and the country. The country didn't exist until Federation on January 1, 1901; before which time the land since 1788 was subdivided into independent colonies. Many people believe that Australia Day would be better moved to January 1; the trouble with that is that it's already a public holiday, and apparently you can't have a national day that doesn't have its own public holiday. And many other dates have been proposed.

My own preferred date is June 3; this is the date of the High Court "Mabo" decision in 1992 which formally recognized native title and rejected the doctrine of terra nullius under which the British invaded.

That the continent was invaded rather than settled is well established: all serious historians take this view, and it can be backed up with legal arguments. The Aboriginal peoples, numbering maybe close to a million in 1788, had mastered the difficult continent and all of its many ecosystems, and had done so for around 80,000 years. Aboriginal culture is the oldest continually maintained culture on the planet, and by an enormous margin.

Arthur Phillip did in fact arrive with formal instructions to create "amity" with the "natives" and indeed to live in "kindness" with them, but this soon went downhill. Although Phillip himself seems to have been a man of rare understanding for his time (when speared in the shoulder, for example, he refused to let his soldiers retaliate), he was no match for the many convicts and soldiers under his rule. When he retired back to England in 1792 the colony was ruled by a series of weak and ineffective governors, and in particular by the military, culminating in the governorship of Lachlan Macquarie who is seen as a mass murderer of Aboriginal peoples, although the evidence is not clear-cut. What is clear is that mass murders of Aboriginal peoples were common and indiscriminate, and often with appalling cruelty. On many occasions large groups were poisoned with strychnine: this works by affecting the nerves which control muscle movement, so that the body goes into agonizing spasms resulting in death by asphyxiation. Strychnine is considered by toxicologists to be one of the most painful acting of all poisons. Even though Macquarie himself ordered retribution only after "resistance"; groups considered harmless, or consisting only of old men, women and children, were brutally murdered.

People were routinely killed by gunfire, or by being hacked to death; there is at least one report of a crying baby - the only survivor of a massacre - being thrown onto the fire made to burn the victims.

Many more died of disease: smallpox and tuberculosis were responsible for deaths of over 50% of Aboriginal peoples. Their numbers today are thus tiny, and as in the past they are still marginalized.

Only recently has this harrowing part of Australia's past been formally researched; the casual nature of the massacres meant that many were not recorded, and it has taken a great deal of time and work to uncover their details. This work has been headed by Professor Lyndall Ryan at the University of Newcastle. The painstaking and careful work by her team has unearthed much detail, and their results are available at their site Colonial Frontier Massacres in Central and Eastern Australia 1788-1930

As a January 26 exercise I decided to rework one of their maps, producing a single map which would show the sites of massacres by markers whose size is proportional to the number of people killed. This turned out to be quite easy using Python and its folium library, but naturally it took me a long time to get it right.

I started by downloading the timeline from the Newcastle site as a csv file, and going through each massacre adding its location. The project historians point out that the locations are deliberately vague. Sometimes this is because the vagueness of the historical record; but also (from the Introduction):

In order to protect the sites from desecration, and respect for the wishes of Aboriginal communities to observe the site as a place of mourning, the points have been made purposefully imprecise by rounding coordinates to 3 digits, meaning the point is precise only to around 250m.

Given the database, the Python commands were:

import folium
import pandas as pd

mass = pd.read_csv('massacres.csv')

a = folium.Map(location=[-27,134],width=1000, height=1000,tiles='OpenStreetMap',zoom_start=4.5)

for i in range(0,len(mass)):
   number_killed = mass.iloc[i]['Estimated Aboriginal People Killed']
      location=[float(mass.iloc[i]['Lat']), float(mass.iloc[i]['Long'])],
      tooltip=mass.iloc[i]['Location']+': '+str(number_killed),

The result is shown below. You can zoom in and out, and hovering over a massacre site will produce the location and number of people murdered.

The research is ongoing and this data is incomplete

The data was extracted from: Ryan, Lyndall; Richards, Jonathan; Pascoe, William; Debenham, Jennifer; Anders, Robert J; Brown, Mark; Smith, Robyn; Price, Daniel; Newley, Jack Colonial Frontier Massacres in Eastern Australia 1788 – 1872, v2.0 Newcastle: University of Newcastle, 2017, (accessed 08/02/2019). This project has been funded by the Australian Research Council (ARC).

Note finally that Professor Ryan and team have defined a massacre to be a killing of at least six people. Thus we can assume there are many other killings of five or less people which are not yet properly documented, or more likely shall never been known. A shameful history indeed.

Vote counting in the Australian Senate

Recently we have seen senators behaving in ways that seem stupid, or contrary to accepted public opinion. And then people will start jumping up and down and complaining that such a senator only got a tiny number of first preference votes. One commentator said that one senator, with 19 first preference votes, "couldn’t muster more than 19 members of his extended family to vote for him". This displays an ignorance of how senate counting works. In fact first preference votes are almost completely irrelevant; or at least, far less relevant than they are in the lower house.

Senate counting works on a proportional system, where multiple candidates are elected from the same group of ballots. This is different from the lower house (the House of Representatives federally) where only one person is elected. For the lower house, first preference votes are indeed much more important. As for the lower house, senate voting is preferential: voters number their preferred candidates starting with 1 for their most preferred, and so on (but see below).

A full explanation is given by the Australian Electoral Commission on their Senate Counting page; this blog post will run through a very simple example to demonstrate how a senator can be elected with a tiny number of first preference votes.

An aside on micro parties and voting

One problem in Australia is the proliferation of micro parties, many of which hold racist, anti-immigration, or hard-line religious views, or who in some other ways represent only a tiny minority of the electorate. The problem is just as bad at State level; in my own state of Victoria we have the Shooters, Fishers and Farmers Party, the Aussie Battlers Party, and the Transport Matters Party (who represent taxi drivers) to name but three. This has the affect that the number of candidates standing for senate election has become huge, and the senate ballot papers absurdly unwieldy:

Initially the law required voters to number every box starting from 1: on a large paper this would mean numbering carefully from 1 up to at least 96 in one recent election. To save this trouble (and most Australian voters are nothing if not lazy), "above the line voting" was introduced. This gave voters the option to put just a single "1" in the box representing the party of choice: you will see from the image above that the ballot paper is divided: the columns represent all the parties; the boxes below the line represent all the candidates from that party, and the single box above just the party name. Here is a close up of a NSW senate ballot:

Almost all voters willingly took advantage of that and voted above the line. The trouble is then that voters have no control over where their preferences go: that is handled by the parties themselves. By law, all parties must make their preferences available before the election, and they are published on the site of the relevant Electoral Commission. But the only people who carefully check this site and the party's preferences are the sort of people who would carefully number each box below the line anyway. Most people don't care enough to be bothered.

This enables all the micro-parties to make "preference deals"; in effect they act as one large bloc, ensuring that at least some of them get a senate seat. This has been handled by a so-called "preference whisperer".

The current system in the state of Victoria has been to encourage voting below the line by allowing, instead of all boxes to be numbered, at least six. And there are strong calls for voting above the line to be abolished.

A simple example

To show how senate counting works, we suppose an electorate of 100 people, and three senators to be elected from five candidates. We also suppose that every ballot paper has been numbered from 1 to 5 indicating each voter's preferences.

Before the counting begins we need to determine the number of votes each candidate must amass to be elected: this is chosen as the smallest number of votes for which no more candidates can be elected. If there are \(n\) formal votes cast, and \(k\) senators to be elected, that number is clearly

\[\left\lfloor\frac{n}{k+1}\right\rfloor + 1.\]

This value is known as the Droop quota. In our example, this quota is

\[ \left\lfloor\frac{100}{3+1}\right\rfloor +1 = 26. \]

You can see that it is not possible for four candidates to obtain this value.

Suppose that the ballots are distributed as follows, where the numbers under the candidates indicate the preferences cast:

Number of votes A B C D E
20 1 2 3 4 5
20 1 5 4 3 2
40 2 1 5 4 3
5 2 3 5 1 4
4 4 3 1 2 5
1 2 3 4 5 1

Counting first preferences produces:

Candidate First Prefs
A 40
B 40
C 4
D 5
E 1

The first step in the counting is to determine if any candidate has amassed first preference votes equal to or greater than the Droop quota. In the example just given, both A and B have 40 first preferences each, so they are both elected.

Since only 26 votes are needed for election, for each of A and B there are 14 votes remaining which can be passed on to other candidates according to the voting preferences. Which votes are passed on? For B it doesn't matter, but which votes do we deem surplus for A? The Australian choice is to pass on all votes, but at a reduced value known as the transfer value. This value is simply the fraction of surplus votes over total votes; in our case it is


for each of A and B.

Looking at the first line of votes: the next highest preference from A to a non-elected candidate is C, so C gets 0.35 of those 20 votes. From the second line, E gets 0.35 of those 20 votes. From the third line, E gets 0.35 of all 40 votes.

The votes now allocated to the remaining candidates are as follows:

C: \(4 + 0.35\times 20 = 11\)

D: 5

E: \(1 + 0.35\times 20 + 0.35\times 40 = 22\)

At this stage no candidate has amassed a quota, so the lowest ranked candidate in the counting is eliminated - in this case D - and all of those votes are passed on to the highest candidate (of those that are left, which is now only C and E) in those preferences, which is E. This produces:

C: 11

E: \(22 + 5 = 27\)

which means E has achieved the quota and thus is elected.

This is of course a very artificial example, but it shows two things:

  1. How a candidate with a very small number of first preference votes can still be elected: in this case E had the lowest number of first preference votes.
  2. The importance of preferences.

So let's have no more complaining about the low number of first preference votes in a senate count. In a lower house count, sure, the candidate with the least number of first preference votes is eliminated, but in a senate count such a candidate might amass votes (or reduced values of votes) in such a way as to achieve the quota.

Concert review: Lixsania and the Labyrinth

This evening I saw the Australia Brandenburg Orchestra with guest soloist Lixsania Fernandez, a virtuoso player of the viola da gamba, from Cuba. (Although she studied, and now lives, in Spain.) Lixsania is quite amazing: tall, statuesque, quite absurdly beautiful, and plays with a technique that encompasses the wildest of baroque extravagances as well as the most delicate and refined tenderness.

The trouble with the viol, being a fairly soft instrument, is that it's not well suited to a large concert hall. This means that it's almost impossible to get any sort of balance between it and the other instruments. Violins, for example, even if played softly, can overpower it.

Thomas Mace, in his "Musick's Monument", published in 1676, complained vigorously about violins:

Mace has been described as a "conservative old git" which he certainly was, but I do love the idea of this last hold-out against the "High-Priz'd Noise" of the violin. And I can see his point!

But back to Lixsania. The concert started with a "pastiche" of La Folia, taking in parts of Corelli's well known set for solo violin, Vivaldi's for two, Scarlatti for harpsichord, and of course Marin Marais "32 couplets de Folies" from his second book of viol pieces. The Australian Brandenburgs have a nice line in stagecraft, and this started with a dark stage with only Lixsania lit, playing some wonderful arpeggiated figurations over all the strings, with a bowing of utter perfection. I was sitting side on to her position here, and I could see with what ease she moved over the fingerboard - the mark of a true master of their instrument - being totally at one with it. Little by little other instrumentalists crept in: a violinist here and there, Paul Dyer (leader of the orchestra) to the harpsichord, cellists and a bassist, until there was a sizable group on stage all playing madly. I thought it was just wonderful.

For this first piece Lixsania was wearing a black outfit with long and full skirts and sort of halter top which left her arms, sides and back bare. This meant I had an excellent view of her rib-cage, which was a first for me in a concert.

The second piece was the 12th concerto, the so called "Harmonic Labyrinth" from Locatelli's opus 3. These concertos contain, in their first and last movements, a wild "capriccio" for solo violin. This twelfth concerto contains capricci of such superhuman difficulty that even now, nearly 300 years after they were composed, they still stand at the peak of virtuosity. The Orchestra's concertmaster, Shaun Lee-Chen, was however well up to the challenge, and powered his way through both capricci with the audience hardly daring to breathe. Even though conventional concert behaviour does not include applause after individual movements, so excited was the audience that there was an outburst of clapping after the first movement. And quite right too.

The final piece of the first half was a Vivaldi concerto for two violins and cello, the cello part being taken by Lixsania on viol. I felt this didn't come across so well; the viol really couldn't be heard much, and you really do need the strength of the cello to make much sense of the music. However, it did give Lixsania some more stage-time.

After interval we were treated to a concerto for viol by Johann Gottlieb Graun, a court composer to Frederick the Great of Prussia. Graun wrote five concertos for the instrument - all monumentally difficult to play - which have been recorded several times. However, a sixth one has recently been unearthed in manuscript - and apparently we were hearing it for the first time in this concert series. The softness of the viol in the largeness of the hall meant that it was not always easy to hear: I solved that by closing my eyes, so I could focus on the sound alone. Lixsania played, as you would imagine, as though she owned it, and its formidable technical difficulties simply melted away under the total assurance of her fingers. She'd changed into a yellow outfit for this second half, and all the male players were wearing yellow ties.

Then came a short Vivaldi sinfonia - a quite remarkable piece; very stately and with shifting harmonies that gave it a surprisingly modern feel. Just when you think Vivaldi is mainly about pot-boilers, he gives you something like this. Short, but superb.

Finally, the fourth movement of a concerto written in 2001 for two viols by "Renato Duchiffre" (the pen name of René Schiffer, cellist and violist with Apollo's Fire): a Tango. Now my exposure to tangos has mainly been through that arch-bore Astor Piazolla. But this tango was magnificent. The other violist was Anthea Cottee, of whom I'd never heard, but she's no mean player. She and Lixsania made a fine pair, playing like demons, complementing each other and happily grinning at some of the finer passages. One of the many likeable characteristics of Lixsania is that she seems to really enjoy playing, and smiles a lot - I hate that convention of players who adopt a poker-face. And she has a great smile.

In fact the whole orchestra has a wonderful enjoyment about them, led by Paul Dyer who displays a lovely dynamism at the harpsichord. Not for him the expressionless sitting still; he will leap up if given half an opportunity and conduct a passage with whichever hand is free; sometimes he would play standing and sort of conduct with his body; between him and Lixsania there was a chemistry of heart and mind, both leaning towards each other, as if inspiring each other to reach higher musical heights. This was one of the most delightful displays of communicative musicianship I've ever seen.

Naturally there had to be an encore: and it was Lixsania singing a Cuban lullaby, accompanying herself by plucking the viol - which was stood on a chair for easier access - with Anthea Cottee providing a bowed accompaniment. Lixsania told us (of course she speaks English fluently, with a charming Cuban accent) that it was a lullaby of special significance, as it was the first song she'd ever sang to her son. There's no reason why instrumentalists should be able to sing well, but in fact Lixsania has a lovely, rich, warm, enveloping sort of voice, and the effect was breathtakingly lovely. Lucky son!

This was a great concert.

Linear programming in Python (2)

Here's an example of a transportation problem, with information given as a table:

300 360 280 340 220
750 100 150 200 140 35
Supplies  400 50 70 80 65 80
350 40 90 100 150 130

This is an example of a balanced, non-degenerate transportation problem. It is balanced since the sum of supplies equals the sum of demands, and it is non-degenerate as there is no proper subset of supplies whose sum is equal to that of a proper subset of demands. That is, there are no balanced "sub-problems".

In such a problem, the array values may be considered to be the costs of transporting one object from a supplier to a demand. (In the version of the problem I pose to my students it's cars between distributors and car-yards; in another version it's tubs of ice-cream between dairies and supermarkets.) The idea of course is to move all objects from supplies to demands while minimizing the total cost.

This is a standard linear optimization problem, and it can be solved by any method used to solve such problems, although generally specialized methods are used.

But the intention here is to show how easily this problem can be managed using myMathProg (and with numpy, for the simple use of printing an array):

import pymprog as py
import numpy as np
M = range(3)  # number of rows and columns
N = range(5)
A = py.iprod(M,N) # Cartesian product
x = py.var('x', A, kind=int) # all the decision variables are integers
costs = [[100,150,200,140,35],[50,70,80,65,80],[40,90,100,150,130]]
supplies = [750,400,350]
demands = [300,360,280,340,220]
py.minimize(sum(costs[i][j]*x[i,j] for i,j in A))
# the total sum in each row must equal the supplies
for k in M:
    sum(x[k,j] for j in N)==supplies[k]
# the total sum in each column must equal the demands
for k in N:
    sum(x[i,k] for i in M)==demands[k]
print('\nMinimum cost: ',py.vobj())
A = np.array([[x[i,j].primal for j in N] for i in M])

with solution:

GLPK Simplex Optimizer, v4.65
n8 rows, 15 columns, 30 non-zeros
      0: obj =   0.000000000e+00 inf =   3.000e+03 (8)
      7: obj =   1.789500000e+05 inf =   0.000e+00 (0)
*    12: obj =   1.311000000e+05 inf =   0.000e+00 (0)
GLPK Integer Optimizer, v4.65
8 rows, 15 columns, 30 non-zeros
15 integer variables, none of which are binary
Integer optimization begins...
Long-step dual simplex will be used
+    12: mip =     not found yet >=              -inf        (1; 0)
+    12: >>>>>   1.311000000e+05 >=   1.311000000e+05   0.0% (1; 0)
+    12: mip =   1.311000000e+05 >=     tree is empty   0.0% (0; 1)

Minimum cost:  131100.0

[[  0. 190.   0. 340. 220.]
 [  0. 120. 280.   0.   0.]
 [300.  50.   0.   0.   0.]]

As you see, the definition of the problem in Python is very straightforward.

Linear programming in Python

For my elementary linear programming subject, the students (who are all pre-service teachers) use Excel and its Solver as the computational tool of choice. We do this for several reasons: Excel is software with which they're likely to have had some experience, also it's used in schools; it also means we don't have to spend time and mental energy getting to grips with new and unfamiliar software. And indeed the mandated curriculum includes computer exploration, using either Excel Solver, or the Wolfram Alpha Linear Programming widget.

This is all very well, but I balk at the reliance on commercial software, no matter how widely used it may be. And for my own exploration I've been looking for an open-source equivalent.

In fact there are plenty of linear programming tools and libraries; two of the most popular open-source ones are:

  • The GNU Linear Programming Kit, GLPK
  • Coin-or Linear Programming, Clp

There's a huge list on wikipedia which includes open-source and proprietary software.

For pretty much any language you care to name, somebody has taken either GLPK or Clp (or both) and produced a language API for it. For Python there's PuLP; for Julia there's JuMP; for Octave there's the `glpk` command, and so on. Most of the API's include methods of calling other solvers, if you have them available.

However not all of these are well documented, and in particular some of them don't allow sensitivity analysis: computing shadow prices, or ranges of the objective coefficients. I discovered that JuMP doesn't yet support this - although to be fair sensitivity analysis does depend on the problem being solved, and the solver being used.

Being a Python aficionado, I thought I'd check out some Python packages, of which a list is given at an operations research page.

However, I then discovered the Python package PyMathProg which for my purposes is perfect - it just calls GLPK, but in a nicely "pythonic" manner, and the design of the package suits me very well.

A simple example

Here's a tiny two-dimensional problem I gave to my students:

A furniture workshop produces chairs and tables. Each day 30m2 of wood board is delivered to the workshop, of which chairs require 0.5m2 and tables 1.5m2. (We assume, of course, that all wood is used with no wastage.) All furniture needs to be laminated; there is only one machine available for 10 hours per day, and chairs take 15 minutes each, tables 20 minutes. If chairs are sold for $30 and tables for $60, then maximize the daily profit (assuming that all are sold).

Letting \(x\) be the number of chairs, and \(y\) be the number of tables, the problem is to maximize \[ 30x+60y \] given

\begin{align*} 0.5x+1.5y&\le 30\\\
15x+20y&\le 600\\\
x,y&\ge 0 \end{align*}

Problems don't get much simpler than this. In pyMathProg:

import pymathprog as pm
# pm.verbose(True)
x, y = pm.var('x, y') # variables
pm.maximize(30 * x + 60 * y, 'profit')
0.5*x + 1.5*y <= 30 # wood
15*x + 20*y <= 600 # laminate
print('\nMax profit:',pm.vobj())

with output:

GLPK Simplex Optimizer, v4.65
2 rows, 2 columns, 4 non-zeros
*     0: obj =  -0.000000000e+00 inf =   0.000e+00 (2)
*     2: obj =   1.440000000e+03 inf =   0.000e+00 (0)

Max profit: 1440.0

PyMathProg 1.0 Sensitivity Report Created: 2018/10/28 Sun 21:42PM
Variable            Activity   Dual.Value     Obj.Coef   Range.From   Range.Till
*x                        24            0           30           20           45
*y                        12            0           60           40           90
Constraint       Activity Dual.Value  Lower.Bnd  Upper.Bnd RangeLower RangeUpper
 R1                    30         24       -inf         30         20         45
 R2                   600        1.2       -inf        600        400        900

From that output, we see that the required maximum is $1440, obtained by making 24 chairs and 12 tables. We also see that the shadow prices for the constraints are 24 and 1.2. Furthermore, the ranges of objective coefficients which will not affect the results are \([20,45]\) for prices for chairs, and \([40,90]\) for table prices.

This is the simplest API I've found so far which provides that sensitivity analysis.

Note that if we just want a solution, we can use the linprog command from scipy:

from scipy.optimize import linprog

linprog automatically minimizes a function, so to maximize we use a negative function. The output is

    fun: -1440.0
message: 'Optimization terminated successfully.'
    nit: 2
  slack: array([0., 0.])
 status: 0
success: True
      x: array([24., 12.])

The negative value given as fun above simply reflects that we are entering a negative function. In respect of our problem, we simply negate that value to obtain the required maximum of 1440.

A test of OpenJSCAD

Here's an example of a coloured tetrahedron:

<div oncontextmenu="return false;"
     style = "width:640px;height:470px;"
<div id="tail" style="display: none;">
  <div id="statusdiv"></div>

The power of two irrational numbers being rational

There's a celebrated elementary result which claims that:

There are irrational numbers \(x\) and \(y\) for which \(x^y\) is rational.

The standard proof goes like this. Now, we know that \(\sqrt{2}\) is irrational, so let's consider \(r=\sqrt{2}^\sqrt{2}\). Either \(r\) is rational, or it is not. If it is rational, then we set \(x=\sqrt{2}\), \(y=\sqrt{2}\) and we are done. If \(r\) is irrational, then set \(x=r\) and \(y=\sqrt{2}\). This means that \[ x^y=\left(\sqrt{2}^\sqrt{2}\right)^{\sqrt{2}}=\sqrt{2}^2=2 \] which is rational.

This is a perfectly acceptable proof, but highly non-constructive, And for some people, the fact that the proof gives no information about the irrationality of \(\sqrt{2}^\sqrt{2}\) is a fault.

So here's a lovely constructive proof I found on reddit . Set \(x=\sqrt{2}\) and \(y=2\log_2{3}\). The fact that \(y\) is irrational follows from the fact that if \(y=p/q\) with \(p\) and \(q\) integers, then \(2\log_2{3}=p/q\) so that \(2^{p/2q}=3\), or \(2^p=3^{2q}\) which contradicts the fundamental theorem of arithmetic. Then:

\begin{eqnarray*} x^y&=&\sqrt{2}^{2\log_2{3}}\\\
&=&3. \end{eqnarray*}

Wrestling with Docker

For years I have been running a blog and other web apps on a VPS running Ubuntu 14.04 and Apache - a standard LAMP system. However, after experimenting with some apps - temporarily installing them and testing them, only to discard them, the system was becoming a total mess. Worst of all, various MySQL files were ballooning out in size: the ibdata1 file in /var/lib/mysql was coming in at a whopping 37Gb (39568015360 bytes to be more accurate).

Now, there are ways of dealing with this, but I don't want to have to become an expert in MySQL; all I wanted to do was to recover my system and make it more manageable.

I decided to use Docker. This is a "container system" where each app runs in its own container - a sort of mini system which contains all the files required to serve it up to the web. This clearly requires a certain amount of repetition between containers, but that's the price to be paid for independence. The idea is that you can start or stop any container without affecting any of the others. For web apps many containers are based on Alpine Linux which is a system designed to be as tiny as possible, along with the nginx web server.

There seems to be a sizable ecosystem of tools to help manage and deploy docker containers. Given my starting position of knowing nothing, I wanted to keep my extra tools to a minimum; I went with just two over and above docker itself: docker-compose, which helps design, configure, and run docker containers, and traefik, a reverse proxy, which handles all requests from the outside world to docker containers - thus managing things like ports - as well as interfacing with the certificate authority Lets Encrypt.

My hope was that I should be able to get these all set up so they would work as happily together as they were supposed to do. And so indeed it has turned out, although it took many days of fiddling, and innumerable questions to forums and web sites (such as reddit) to make it work.

So here's my traefik configuration:

defaultEntryPoints = ["http", "https"]

address = ":8080"
  users = ["admin:$apr1$v7kJtvT7$h0F7kxt.lAzFH4sZ8Z9ik."]

  address = ":80"
      entryPoint = "https"
  address = ":443"

  format = "json"

# Below here comes from
# with values adjusted for local use, of course

# Let's encrypt configuration
onHostRule = true
entryPoint = "https"
  # Use a HTTP-01 acme challenge rather than TLS-SNI-01 challenge
  entryPoint = "http"

  main = ""
  sans = ["", "", "", "", "",

# Connection to docker host system (docker.sock)
endpoint = "unix:///var/run/docker.sock"
domain = ""
watch = true
# This will hide all docker containers that don't have explicitly set label to "enable"
exposedbydefault = false

and (part of) my docker-compose configuration, the file docker-compose.yml:

version: "3"

    external: true
    external: false


    image: traefik:1.6.0-alpine
    container_name: traefik
    restart: always
    command: --web --docker --logLevel=DEBUG
      - /var/run/docker.sock:/var/run/docker.sock
      - $PWD/traefik.toml:/traefik.toml
      - $PWD/acme.json:/acme.json
      - proxy
      - "80:80"
      - "443:443"
      - traefik.enable=true
      - traefik.backend=traefik
      - traefik.port=8080

    image: blog
      - /home/amca/docker/whats_this/public:/usr/share/nginx/html
      - internal
      - proxy
      - traefik.enable=true
      - traefik.backend=blog
      - traefik.port=80

The way this works, at least in respect of this blog, is that files copied into the directory /home/amca/docker/whats_this/public on my VPS will be automatically served by nginx. So all I now need is a command on my local system (on which I do all my blog writing), which serves up these files. I've called it docker-deploy:

hugo -b "" -t "blackburn" && rsync -avz -e "ssh" --delete public/

Remarkably enough, it all works!

One issue I had at the beginning was that my original blog was served up at the URL and for some reason these links were still appearing in my new blog. It turned out (after a lot of anguished messages) that it was my mis-handling of rsync. I just ended up deleting everything except for the blog source files, and re-created everything from scratch.

Householder's methods

These are a class of root-finding methods; that is, for the numerical solution of a single nonlinear equation, developed by Alston Scott Householder in 1970. They may be considered a generalisation of the well known Newton-Raphson method (also known more simply as Newton's method) defined by

\[ x\leftarrow x-\frac{f(x)}{f'(x)}. \]

where the equation to be solved is \(f(x)=0\).

From a starting value \(x_0\) a sequence of iterates can be generated by

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}. \]

As is well known, Newton's method exhibits quadratic convergence; that is, if the sequence of iterates converges to a root value \(r\), then the limit

\[ \lim_{n\to\infty}\frac{x_{n+1}-r}{(x_n-r)^2} \]

is finite. This means, in effect, that the number of correct decimal places doubles at each step. Householder's method for a rate of convergence \(d+1\) is defined by

\[ x\leftarrow x-d\frac{(1/f)^{(d-1)}(x)}{(1/f)^{(d)}(x)}. \]

We show how this definition can be rewritten in terms of ratios of derivatives, by using Python and its symbolic toolbox SymPy.

We start by defining some variables and functions.

from sympy import *
x = Symbol('x')
f = Function('f')(x)

Now we can define the first Householder formula, with \(d=1\):

d = 1
H1 = x + d*diff(1/f,x,d-1)/diff(1/f,x,d)

\[ x-\frac{f(x)}{\frac{d}{dx}f(x)} \]

which is Newton's formula. Now for \(d=2\):

d = 2
H2 = x + d*diff(1/f,x,d-1)/diff(1/f,x,d)

\[ x - \frac{2 \frac{d}{d x} f{\left (x \right )}}{- \frac{d^{2}}{d x^{2}} f{\left (x \right )} + \frac{2 \left(\frac{d}{d x} f{\left (x \right )}\right)^{2}}{f{\left (x \right )}}} \]

This is a mighty messy formula, but it can be greatly simplified by using ratios of derivatives defined by

\[ r_k=\frac{f^{(d-1}(x)}{f^{(d)}(x)} \] This means that \[ r_1=\frac{f}{f'},\quad r_2=\frac{f'}{f^{\prime\prime}} \] To make the substitution into the current expression above, we can use the substitutions \[ f^{\prime\prime}=f'/r_2,\quad f'=f/r_1 \] to be done sequentially (first defining the new symbols)

r_1,r_2,r_3 = symbols('r_1,r_2,r_3')
H2r = H2s.subs([(Derivative(f,x,2), Derivative(f,x)/r_2), (Derivative(f,x), f/r_1)]).simplify()

\[ -\frac{2r_1r_1}{r_1-2r_2} \] Dividing the top and bottom by \(2r_2\) produces the formulation \[ \frac{r_1}{1-\displaystyle{\frac{r_1}{2r_2}}} \] and so Householder's method for \(d=2\) is defined by the recurrence \[ x\leftarrow x-\frac{r_1}{1-\displaystyle{\frac{r_1}{2r_2}}}. \] This is known as Halley's method, after Edmond Halley, also known for his comet. This method has been called the most often rediscovered iteration formula in the literature.

It would exhibit cubic convergence, which means that the number of correct figures roughly triples at each step.

Apply the same sequence of steps for \(d=3\), and including the substitution \[ f^{\prime\prime\prime} = f^{\prime\prime}/r_3 \] produces the fourth order formula \[ x\leftarrow x-\frac{3 r_{1} r_{3} \left(2r_{2} - r_{1}\right)}{r_{1}^{2} - 6 r_{1} r_{3} + 6 r_{2} r_{3}} \]

A test

We'll use the equation \[ x^5+x-1=0 \] which has a root close to \(0.7\). First Newton's method, which is the Householder method of order \(d=1\), and we start by defining the symbol \(x\) and the function \(f\):

x = Symbol('x')
f = x**5+x-1

Next define the iteration of Newton's method, which can be turned into a function with the handy tool lambdify:

nr = lambdify(x, x - f/diff(f,x))

Now, a few iterations, and print them as strings:

y = 0.7
ys = [y]
for i in range(10):
    y = N(nr(y),100)
    ys += [y]

for i in ys:


We can easily compute the number of correct decimal places each time by simply finding the first place in each string where it differs from the previous one:

for i in range(1,7):
    d = [ys[i][j] == ys[i+1][j] for j in range(102)]

\begin{array}{r} 2\cr 3\cr 8\cr 16\cr 32\cr 66 \end{array}

and we see a remarkable closeness with doubling of the number of correct values each iteration.

Now, the fourth order method, with \(d=3\):

r1 = lambdify(x,g(x)/diff(g(x),x))
r2 = lambdify(x,diff(g(x),x)/diff(g(x),x,2))
r3 = lambdify(x,diff(g(x),x,2)/diff(g(x),x,3))
h3 = lambdify(x,x-3*r1(x)*r3(x)*(2*r2(x)-r1(x))/(r1(x)**2-6*r1(x)*r3(x)+6*r2(x)*r3(x)))

Now we basically copy down the above commands, except that we'll use 1500 decimal places instead of 100:

y = 0.7
ys = [str(x)]
for i in range(10):
    y = N(h3(x),1500)
    ys += [str(y)]

for i in range(1,6):
    d = [xs[i][j] == xs[i+1][j] for j in range(1502)]

\begin{array}{r} 4\\\
1233 \end{array}

and we that the number of correct decimal places at each step is indeed increased by a factor very close to 4.

The Joukowsky Transform

The Joukowksy Transform is an elegant and simple way to create an airfoil shape.

Let \(C\) be a circle in the complex plane that passes through the point \(z=1\) and encompasses the point \(z=-1\). The transform is defined as

\[ \zeta=z+\frac{1}{z}. \]

We can explore the transform by looking at the circles centred at \((-r,0)\) with \(r<0\) and with radius \(1+r\):

\[ \|z-r\|=1+r \]

or in cartesian coordinates with parameter \(t\):


\begin{aligned} x &= -r+(1+r)\cos(t)\\\
y &= (1+r)\sin(t) \end{aligned}


so that \[ (x,y)\rightarrow \left(x+\frac{x}{x^2+y^2},y-\frac{y}{x^2+y^2}\right). \]

To see this in action, move the point \(P\) in this diagram about, ensuring that the point \((-1,0)\) always remains within the circle:

Double Damask

This was a comedy sketch initially performed in the revue "Clowns in Clover" which had its first performance at the Adelphi Theatre in London on December 1, 1927. This particular sketch was written by Dion Titheradge and starred the inimitable Cicely Courtneidge as the annoyed customer Mrs Spooner. It has been recorded and is available on many different collections; you can also hear it on youtube.

I have loved this sketch since I first heard it as a teenager on a three record collection called something like "Masters of Comedy", being a collection of classic sketches. Double Damask has also been performed by Beatrice Lillie, and you can search for this also on youtube. For example, here. I hope admirers of the excellent Ms Lillie will not be upset by my saying I far prefer Cicely Courtneidge, whose superb diction and impeccable comic timing are beyond reproach.

No doubt the original script is available somewhere, but in the annoying way of the internet, I couldn't find it. So here is my transcription of the Courtneidge version of "Double Damask".

Double Damask

written by

Dion Titheradge

A customer, Mrs Spooner\
A shop assistant (unnamed)\
A manager, Mr Peters

Scene: The linen department of a large store.

MRS SPOONER: I wonder if you could tell me if my order has gone off yet?

ASSISTANT: Not knowing your order, madam, I really couldn't say.

MRS SPOONER: But I was in here an hour ago and gave it to you.

ASSISTANT: What name, madam?

MRS SPOONER: Spooner, Mrs Spooner,

ASSISTANT: Have you an address?

MRS SPOONER: Do I look as if I live in the open air? I gave a large order for sheets and tablecloths, to be sent to Bacon Villa, Egham. (pronounced "Eg'm")


MRS SPOONER: I hope I speak plainly: Egg Ham!

ASSISTANT: Oh yes, yes I remember perfectly now, Madam. Let me see now... no, your order won't go through until tomorrow morning. Is there anything further?

MRS SPOONER: Yes, (very quickly) I want two dozen double damask dinner napkins.

ASSISTANT: I beg your pardon?

MRS SPOONER (as quicky as before): I said two dozen double damask dinner napkins.

ASSISTANT: I'm sorry madam, I don't quite catch -

MRS SPOONER: Dinner napkins, man! Dinner napkins!

ASSISTANT: Of course madam. Plain?

MRS SPOONER: Not plain, double damask.

ASSISTANT: Yes... would you mind repeating your order Madam? I'm not quite sure.

MRS SPOONER: I want two dozen dammle dubbuck; I want two dammle dubb... oh dear, stupid of me! I want two dozen dammle dizzick danner nipkins.

ASSISTANT: Danner nipkins Madam?


ASSISTANT: You mean dinner napkins.

MRS SPOONER: That's what I said.

ASSISTANT: No, pardon me, Madam, you said danner nipkins!

MRS SPOONER: Don't be ridiculous! I said dinner napkins, and I meant danner nipkins. Nipper know you're getting me muddled now.

ASSISTANT: I'm sorry Madam. You want danner nipkins, exactly. How many?

MRS SPOONER: Two duzzle.


MRS SPOONER: Oh, gracious, young man - can't you get it right? I want two dubbin duzzle damask dinner napkins.

ASSISTANT: Oh no, Madam, not two dubbin - you mean two dozen!

MRS SPOONER: I said two dozen! Only they must be dammle duzzick!

ASSISTANT: No, we haven't any of that in stock, Madam.

MRS SPOONER (in a tone of complete exasperation): Oh dear, of all the fools! Can't I find anybody, just anybody with a modicum of intelligence in this store?

ASSISTANT: Well, here is our Mr Peters, Madam. Now perhaps if you ask him he might-

MR PETERS (In an authoritative "we can fix anything" kind of voice): Can I be of any assistance to you, Madam?

MRS SPOONER: I'm sorry to say that your assistant doesn't appear to speak English. I'm giving an order, but it might just as well be in Esperanto for all he understands.

MR PETERS: Allow me to help you Madam. You require?

MRS SPOONER: I require (as quickly as before) two dozen double damask dinner napkins.

MR PETERS: I beg pardon, Madam?

MRS SPOONER: Oh heavens - can't you understand?

MR PETERS: Would you mind repeating your order, Madam.

MRS SPOONER: I want two dazzen -

MR PETERS: Two dozen!

MRS SPOONER: I said two dozen!

MR PETERS: Oh no no Madam - no, you said two dazzen. But I understand perfectly what you mean. You mean two dozen; in other words - a double dozen.

MRS SPOONER: That's it! A duzzle dubbin double damask dinner napkins.

MR PETERS: Oh no, pardon me, Madam, pardon me: you mean a double dozen double dummick dinner napkins.

ASSISTANT: Double damask, sir.

MR PETERS: I said double damask! It's... dapper ninkins you require, sir.

MRS SPOONER: Please get it right, I want dinner napkins, dinner napkins.

MR PETERS: I beg pardon, Madam. So stupid of gets so confused... (Laughs)

MRS SPOONER: It is not a laughing matter.

MR PETERS: Of course. Dipper nankins, madam.

ASSISTANT: Dapper ninkins, sir.

MRS SPOONER: Danner nipkins.

MR PETERS: I understand exactly what Madam wants. It is two d-d-d-d-..two d- Would you mind repeating your order please, Madam?

MRS SPOONER: Ohhh, dear.. I want two duzzle dizzen damask dinner dumplings!

MR PETERS: Allow me, Madam, allow me. The lady requires (quickly) two dubbin double damask dunner napkins.

ASSISTANT: Dunner napkins sir?

MR PETERS: Certainly! Two dizzen.

MRS SPOONER: Not two dizzen - I want two dowzen!

MR PETERS: Quite so, Madam, quite so. If I may say so we're getting a little bit confused, splitting it up, as it were. Now, the full order, the full order, is two dazzen dibble dummisk n'dipper dumkins.

ASSISTANT: Excuse me, sir, you mean two dummen dammle dimmick dizzy napkins.

(The next four four lines are spoken almost on top of each other)

MRS SPOONER: I do not want dizzy napkins, I want two dizzle dammen damask -

MR PETERS: No - two dizzle dammle dizzick!

ASSISTANT: Two duzzle dummuck dummy!

MRS SPOONER: Two damn dizzy diddle dimmer dipkins!

MR PETERS (Shocked): Madam, Madam! Please, please - your language!

MRS SPOONER: Oh, blast. Give me twenty four serviettes.

Graphs of Eggs

I recently came across some nice material on John Cook's blog about equations that described eggs.

It turns out there are vast number of equations whose graphs are egg-shaped: that is, basically ellipse shape, but with one end "rounder" than the other.

You can see lots at Jürgen Köller's Mathematische Basteleien page. (Although this blog is mostly in German, there are enough English language pages for monoglots such as me). And plenty of egg equations can be found in the 2dcurves pages.

Another excellent source of eggy equations is TDCC Laboratory from Japan (the link here is to their English language page). For the purposes of experimenting we will use equations from this TDCC, adjusted as necessary. Many of their equations are given in parametric form, which means they can be easily graphed and explored using JSXGraph.

The first set of parametric equations, whose author is given to be Nobuo Yamamoto, is:

\begin{align*} x&=(a+b+b\cos\theta)\cos\theta\\\
y&=(a+b\cos\theta)\sin\theta \end{align*}

If we divide these equations by \(a\), and use the parameter \(c\) for \(b/a\) we obtain slightly simpler equations:

\begin{align*} x&=(1+c+c\cos\theta)\cos\theta\\\
y&=(1+c\cos\theta)\sin\theta \end{align*}

Here you can explore values of \(c\) between 0 and 1:

Another set of equations is said to be due to Tadao Ito (whose surname is sometimes transliterated as Itou):

\begin{align*} x&=\cos\theta\\\
y&=c\cos\frac{\theta}{4}\sin\theta \end{align*}

Many more equations: parametric, implicit, can be found at the sites linked above.

Exploring JSXGraph

JSXGraph is a graphics package deveoped in Javascript, and which seems to be tailor-made for a static blog such as this. It consists of only two files: the javascript file itself, and an accompanying css file, which you can download. Alternaively you can simply link to the online files at the Javascript content delivery site cdnjs managed by cloudflare. There are cloudflare servers all over the world - even in my home town of Melbourne, Australia.

So I modified the head.html file of my theme to include a link to the necessary files:

So I downloaded the javascript and css files as described here and also, for good measure, added the script line (from that page) to the layouts/partials/head.html file of the theme. Then copied the following snippet from the JSXGraph site:

<div id="box" class="jxgbox" style="width:500px; height:500px;"></div>
<script type="text/javascript">
 var board = JXG.JSXGraph.initBoard('box', {boundingbox: [-10, 10, 10, -10], axis:true});

However, to make this work the entire script needs to be inside a <div>, </div> pair, like this:

<div id="box" class="jxgbox" style="width:500px; height:500px;">
<script type="text/javascript">
 var board = JXG.JSXGraph.initBoard('box', {boundingbox: [-10, 10, 10, -10], axis:true});

Just to see how well this works, here's Archimedes' neusis construction of an angle trisection: given an angle \(\theta\) in a unit semicircle, its trisection is obtained by laying against the circle a straight line with points spaced 1 apart (drag point A about the circle to see this in action):

For what it's worth, here is the splendid javascript code to produce the above figure:

<div id="box" class="jxgbox" style="width:500px; height:333.33px;">
<script type="text/javascript">
 JXG.Options.axis.ticks.insertTicks = false;
 JXG.Options.axis.ticks.drawLabels = false;
 var board = JXG.JSXGraph.initBoard('box', {boundingbox: [-1.5, 1.5, 3, -1.5],axis:true});
 var p = board.create('point',[0,0],{visible:false,fixed:true});
 var neg = board.create('point',[-0.67,0],{visible:false,fixed:true});
 var c = board.create('circle',[[0,0],1.0]);
 var a = board.create('glider',[-Math.sqrt(0.5),Math.sqrt(0.5),c],{name:'A'});
 var l1 = board.create('segment',[a,p]);
 var ang = board.create('angle',[a,p,neg],{radius:0.67,name:'θ'});
 var theta = JXG.Math.Geometry.rad(a,p,neg);
 var bb = board.create('point',[function(){return Math.cos(Math.atan2(a.Y(),-a.X())/3);},function(){return Math.sin(Math.atan2(a.Y(),-a.X())/3);}],{name:'B'});
 var w = board.create('point',[function(){return Math.cos(Math.atan2(a.Y(),-a.X())/3)/0.5;},0]);
 var l2 = board.create('line',[a,w]);
 var l3 = board.create('segment',[p,bb]);
 var l4 = board.create('segment',[bb,w],{strokeWidth:6,strokeColor:'#FF0000'});
 var ang2 = board.create('angle',[bb,w,neg],{radius:0.67,name:'θ/3'});

Quite wonderful, it is.

The trinomial theorem

When I was teaching the binomial theorem (or, to be more accurate, the binomial expansion) to my long-suffering students, one of them asked me if there was a trinomial theorem. Well, of course there is, although in fact expanding sums of greater than two terms is generally not classed as a theorem described by the number of terms. The general result is

\[ (x_1+x_2+\cdots+x_k)^n=\sum_{a_1+a_2+\cdots+a_k=n} {n\choose a_1,a_2,\ldots,a_k}x_1^{a_1}x_2^{a_2}\cdots x_k^{a_k} \]

so in particular a "trinomial theorem" would be

\[ (x+y+z)^n=\sum_{a+b+c=n}{n\choose a,b,c}x^ay^bz^c. \]

Here we define

\[ {n\choose a,b,c}=\frac{n!}{a!b!c!} \]

and this is known as a trinomial coefficient; more generally, for an arbitrary number of variables, it is a multinomial coefficient. It is guaranteed to be an integer if the lower values sum to the upper value.

So to compute \((x+y+z)^5\) we could list all integers \(a,b,c\) with \(0\le a,b,c\le 5\) for which \(a+b+c=5\), and put them all into the above sum.

But of course there's a better way, and it comes from expanding \((x+y+z)^5\) as a binomial \((x+(y+z))^5\) so that

\begin{array}{rcl} (x+(y+x))^5&=&x^5\\\
&&+(y+z)^5 \end{array}

Now we can expand each of those binomial powers:

\begin{array}{rcl} (x+(y+x))^5&=&x^5\\\
&&+(y^5+5y^4z+10y^3z^2+10y^2z^3+5yz^4+z^5) \end{array}

Expanding this produces

\begin{split} x^5&+5x^4y+5x^4z+10x^3y^2+20x^3yz+10x^3z^2+10x^2y^3+30x^2y^2z+30x^2yz^3\\\
&+10y^2z^3+5yz^4+z^5 \end{split}

which is an equation of rare beauty.

But there's a nice way of setting this up, which involves writing down Pascal's triangle to the fifth row, and putting a fifth row, as a column, on the side. Then multiply across:

\begin{array}{lcccccccccc} 1&&&&&&1&&&&&\\\
1&1&&5&&10&&10&&5&&1 \end{array}

to produce the final array of coefficients (with index numbers at the left):

\begin{array}{l*{10}{c}} 0\qquad{}&&&&&&1&&&&&\\\
5&1&&5&&10&&10&&5&&1 \end{array}

Row \(i\) of this array corresponds to \(x^{5-i}\) and all combinations of powers \(y^bz^c\) for \(0\le b,c\le i\). Thus for example the fourth row down, corresponding to \( i=3 \), may be considered as the coefficients of the terms

\[ x^2y^3,\quad x^2y^2z,\quad x^2yz^2,\quad xz^3. \]

Note that the triangle of coefficients is symmetrical along all three centre lines, as well as rotationally symmetric by 120°.

Playing with Hugo

I've been using wordpress as my blogging platform since I first started, about 10 years ago. (In fact the first post I can find is dated March 30, 2008.) I chose back then because it was (a) free, and (b) supported mathematics through a version (or subset) of LaTeX. As I have used LaTeX extensively for all my writing since the early 1990's, it's a standard requirement for me.

Some time later I decided to start hosting my own server (well, a VPS), on which I could use, which is the self-hosted version of wordpress. The advantages of a self hosted blog are many, but I particularly like the greater freedom, the ability to include a far greater variety of plugins, and the larger choice of themes. And one of the plugins I liked particularly was WP QuickLaTeX which provided a LaTeX engine far superior to the in-built one of Math bloggin heaven!

However, hosting my own wordpress site was not without difficulty. First I had to install it and get it up and running (even this was non-trivial), and then I had to manage all the users and passwords: myself as a standard user, wp-admin for accessing the Wordpress site itself, a few others. I have quite a long list containing all the commands I used, and all the users and passwords I created.

This served me well, but it was also slow to use. My VPS is perfectly satisfactory, but it is not fast (I'm too cheap to pay for much more than a low-powered one), and the edit-save-preview cycle of online blogging with my wordpress installation was getting tiresome.

Plus the issue of security. I've been hacked once, and I've since managed to secure my site with a free certificate from Let's Encrypt. In fact, in many ways Let's Encrypt is one of the best things to have happened for security. An open Certificate Authority is manna from heaven, as far as I'm concerned.

Wordpress is of course more than just blogging software. It now grandly styles itself as Site Building software and Content Management System, and the site claims that "30% of the web uses Wordpress". It is in fact hugely powerful and deservedly popular, and can be used for pretty much whatever sort of site you want to build. Add to that a seemingly infinite set of plugins, and you have an entire ecosystem of web-building.

However, all of that popularity and power comes at a cost: it is big, confusing, takes work to maintain, keep secure, and keep up-to-date, and is a target for hackers. Also for me, it has become colossal overkill. I don't need all those bells and whistles; all I want to do is host my blog and share my posts with the world (the \(1.5\times 10^{-7}%\) of the world who reads it).

The kicker for me was checking out a mathematics education blog by an author I admire greatly, to discover it was built with the static blog engine jekyll. So being the inventive bloke I am, I thought I'd do the same.

But a bit of hunting led me to Hugo, which apparently is very similar to jekyll, but much faster, and written in Go instead of Ruby. Since I know nothing about either Go or Ruby I don't know if it's the language which makes the difference, or something else. But it sure looks nice, and supports mathjax for LaTeX.

So my current plan is to migrate from wordpress to Hugo, and see how it goes!

Python GIS, and election results

Election mapping

A few weeks ago there was a by-election in my local electorate (known as an electoral division) of Batman here in Australia. I was interested in comparing the results of this election with the previous election two years ago. In this division it's become a two-horse race: the Greens against the Australian Labor Party. Although Batman had been a solid Labor seat for almost its entire existence - it used to be considered one of the safest Labor seats in the country - over the past decade or so the Greens have been making inroads into this Labor heartland, to the extent that is no longer considered a safe seat. And in fact for this particular election the Greens were the popular choice to win. In the end Labor won, but my interest is not so much tracing the votes, but trying to map them.

Python has a vast suite of mapping tools, so much so that it may be that Python has become the GIS tool of choice. And there are lots of web pages devoted to discussing these tools and their uses, such as this one.

My interest was producing maps such as are produced by pollbludger This is the image from that page:


As you can see there are basically three elements:

  • the underlying streetmap
  • the border of the division
  • the numbers showing the percentage wins of each party at the various polling booths.

I wanted to do something similar, but replace the numbers with circles whose sizes showed the strength of the percentage win at each place.

Getting the information

Because this election was in a federal division, the management of the polls and of the results (including counting the votes) was managed by the Australian Electoral Commission, whose pages about this by-election contain pretty much all publicly available information. You can copy and paste the results from their pages, or download them as CSV files.

Then I needed to find the coordinates (Longitude and Latitude) of all the polling places, of which there were 42 at fixed locations. There didn't seem to be a downloadable file for this, so for each booth address (given on the AEC site), I entered it into Google Maps and copied down the coordinates as given.

The boundaries of all the divisions can again be downloaded from the AEC GIS page. These are given in various standard GIS files.

Putting it all together

The tools I felt brave enough to use were:

  • Pandas: Python's data analysis library. I really only needed to read information from CSV files that I could then use later.
  • Geopandas: This is a GIS library with Pandas-like syntax, and is designed in part to be a GIS extension to Pandas. I would use it to extract and manage the boundary data of the electoral division.
  • Cartopy: which is a library of "cartographic tools".

And of course the standard matplotlib for plotting, numpy for array handling.

My guides were the London tube stations example from Cartopy and a local (Australian) data analysis blog which discussed the use of Cartopy including adding graphics to an map image.

There are lots of other GIS tools for Python, some of which seem to be very good indeed, and all of which I downloaded:

  • Fiona: which is a "nimble" API for handling maps
  • Descartes: which provides a means by which matplotlib can be used to manage geographic objects
  • geoplotlib: for "visualizing geographical data and making maps"
  • Folium: for visualizing maps using the leaflet.js library. It may be that the mapping I wanted to do with Python could have been done just as well in Javascript alone. And probably other languages. I stuck with Python simply because I knew it best.
  • QGIS: which is designed to be a complete free and open source GIS, and with APIs both for Python and C++
  • GDAL: the "Geospatial Data Abstraction Library" which has a Python package also called GDAL, for manipulating geospatial raster and vector data.

I suspect that if I was professionally working in the GIS area some or all of these packages would be at least as - and maybe even more - suitable than the ones I ended up using. But then, I was starting from a position of absolute zero with regards to GIS, and also I wanted to be able to make use of the tools I already knew, such as Pandas, matplotlib, and numpy.

Here's the start, importing the libraries, or the bits of them I needed:

import matplotlib.pyplot as plt
import numpy as np
import as ccrs
from import GoogleTiles
import geopandas as gpd
import pandas as pd

I then had to read in the election data, which was a CSV files from the AEC containing the Booth, and the final distributed percentage weighting to the ALP and Greens candidates, and heir percentage scores. As well, I read in the boundary data:

bb = pd.read_csv('Elections/batman_booths_coords.csv')  # contains all election info plus lat, long of booths
longs = np.array(bb['Long'])
lats = np.array(bb['Lat'])
v = gpd.read_file('VicMaps/VIC_ELB.MIF')  # all electoral divisions in MapInfo form
bg = v.loc[2].geometry                    # This is the Polygon representing Batman
b_longs = bg.exterior.xy[0]               # These next two lines are the longitudes and latitudes
b_lats = bg.exterior.xy[1]                #

Notice that bb uses Pandas to read in the CSV files which contains all the AEC information, as well as the latitude and longitude of each Booth, which I'd added myself. Here longs and lats are the coordinates of the polling booths, and b_longs and b-lats are all the vertices which form the boundary of the division.

Now it's all pretty straigtforward, especially with the examples mentioned above:

fig = plt.figure(figsize=(16,16))

tiler = GoogleTiles()
ax = plt.axes(

ax.set_extent((bg.bounds[0]-margin, bg.bounds[2]+margin,bg.bounds[1]-margin, bg.bounds[3]+margin))

for i in range(44):

plt.title('Booth results in the 2018 Batman by-election')

Here GoogleTiles provide the street map to be used as the "base" of our map. Open Streep Map (as OSM) is available too, but I thin in this instance, Google Maps is better. Because the map is rendered as an image (with some unavoidable blurring), I find that Google gave a better result than OSM.

Also, ga2 is a little array which simply produces plotting of the style ro (red circle) or go (green circle). Again, I make the program do most of the work.

And here is the result, saved as an image:

!Batman 2018

I'm quite pleased with this output.

Presentations and the delight of js-reveal

Presentations are a modern bugbear. Anybody in academia or business, or any professional field really, will have sat through untold hours of presentations. And almost all of them are terrible. Wordy, uninteresting, too many "transition effects", low information content, you know as well as I do.

Pretty much every speaker reads the words on their slides, as though the audience were illiterate. I went to a talk once which consisted of 60 -- yes, sixty -- slides of very dense text, and the presenter read through each one. I think people were gnawing their own limbs off out of sheer boredom by the end. Andy Warhol's "Empire" would have been a welcome relief.

Since most of my talks are technical and full of mathematics, I have naturally gravitated to the LaTeX presentation tool Beamer. Now Beamer is a lovely thing for LaTeX: as part of the LaTeX ecosystem you get all of LaTeX loveliness along with elegant slide layouts, transitions, etc. My only issue with Beamer (and this is not a new observation by any means), is that all Beamer presentations have a certain sameness to them. I suspect that this is because most Beamer users are mathematicians, who are rightly more interested in co[[][]]ntent than appearance. It is quite possible of course to make Beamer look like something new and different, but hardly anybody does.

However, I am not a mathematician, I am a mathematics educator, and I do like my presentations to look good, and if possible to stand out a little. I also have a minor issue in that I use Linux on my laptop, which sometimes means my computer won't talk to an external projector system. Or my USB thumb drive won't be recognized by the computer I'll be using, and so on. One way round all this is to use an online system; maybe one which can be displayed in a browser, and which can be placed on a web server somewhere. There are of course plenty of such tools, and I have had a brief dalliance with prezi, but for me prezi was not the answer: yes it was fun and provided a new paradigm for organizing slides, but really, when you took the whizz-bang aspect out, what was left? The few prezis I've seen in the wild showed that you can be as dull with prezi as with any other software. Also, at the time it didn't support mathematics.

In fact I have an abiding distrust of the whole concept of "presentations". Most are a colossal waste of time -- people can read so there's no need for wordiness, and most of the graphs and charts that make up the rest of most slides are dreary and lacklustre. Hardly anybody knows how to present information graphically in a way that really grabs people's attention. It's lazy and insulting to your audience to simply copy a chart from your spreadsheet and assume they'll be delighted by it. Then you have the large class of people who fill their blank spaces with cute cartoons and clip art. This sort of thing annoys me probably more than it should -- when I'm in an audience I don't want to be entertained with cute irrelevant additions, I want to learn. This comes to the heart of presenting. A presenter is acting as a teacher; the audience the learners. So presenting should be about engaging the audience. What's in your slides comes a distant second. I don't want new technology with clever animations and transitions, bookmarks, non-linear slide shows; I want presenters to be themselves interesting. (As an aside, some of the very worst presentations have been at education conferences.)

For a superb example of attention-grabbing graphics, check out the TED talk by the late Hans Rosling. Or you can admire the work of David McCandless.

I seem to have digressed, from talking about presentation software to banging on about the awfulness of presentations generally. So, back to the topic.

For a recent conference I determined to do just that: use an online presentation tool, and I chose reveal.js. I reckon reveal.js is presentations done right: elegant, customizable, making the best use of html for content and css for design; and with nicely chosen defaults so that even if you just put a few words on your slides the result will still look good. Even better, you can take your final slides and put them up on github pages so that you can access them from anywhere in the world with a web browser. And if you're going somewhere which is not networked, you can always take your slides on some sort of portable media. And it has access to almost all of LaTeX via MathJax.

One minor problem with reveal.js is that the slides are built up with raw html code, and so can be somewhat verbose and hard to read (at least for me). However, there is a companion software for emacs org mode called org-reveal, which enables you to structure your reveal.js presentation as an org file. This is presentation heaven. The org file gives you structure, and reveal.js gives you a lovely presentation.

To make it available, you upload all your presentations to github.pages, and you can present from anywhere in the world with an internet connection! You can see an example of one of my short presentations at

Of course the presentation (the software and what you do with it), is in fact the least part of your talk. By far the most important part is the presenter. The best software in the world won't overcome a boring speaker who can't engage an audience.

I like my presentations to be simple and effect-free; I don't want the audience to be distracted from my leaping and capering about. Just to see how it works

The Vigenere cipher in haskell

Programming the Vigenère cipher is my go-to problem when learning a new language. It's only ever a few lines of code, but it's a pleasant way of getting to grips with some of the basics of syntax. For the past few weeks I've been wrestling with Haskell, and I've now got to the stage where a Vigenère program is in fact pretty easy.

As you know, the Vigenère cipher works using a plaintext and a keyword, which is repeated as often as need be:


The corresponding letters are added modulo 26 (using the values A=0, B=1, C=2, and on up to Z=25), then converted back to letters again. So for the example above, we have these corresponding values:

19   7   8  18   8  18  19   7   4  15  11   0   8  13  19   4  23  19
10   4  24  10   4  24  10   4  24  10   4  24  10   4  24  10   4  24

Adding modulo 26 and converting back to letters:

3  11   6   2  12  16   3  11   2  25  15  24  18  17  17
D   L   G   C   M   Q   D   L   C   Z   P   Y   S   R   R

gives us the ciphertext.

The Vigenère cipher is historically important as it is one of the first cryptosystems where a single letter may be encrypted to different characters in the ciphertext. For example, the two "S"s are encrypted to "C" and "Q"; the first and last "T"s are encrypted to "D" and "R". For this reason the cipher was considered unbreakable - as indeed it was for a long time - and was known to the French as le chiffre indéchiffrable - the unbreakable cipher. It was broken in 1863. See the Wikipedia page for more history.

Suppose the length of the keyword is . Then the -th character of the plaintext will correspond to the character of the keyword (assuming a zero-based indexing). Thus the encryption can be defined as

\[ c_i = p_i+k_{i\pmod{n}}\pmod{26} \]

However, encryption can also be done without knowing the length of the keyword, but by shifting the keyword each time - first letter to the end - and simply taking the left-most letter. Like this:


so "T"+"K" (modulo 26) is the first encryption. Then we shift the keyword:

  E Y K

and "H"+"E" (modulo 26) is the second encrypted letter. Shift again:

    Y K E

for "I"+"Y"; shift again:

      K E Y

for "S"+"K". And so on.

This is almost trivial in Haskell. We need two extra functions from the module Data.Char: chr which gives the character corresponding to the ascii value, and ord which gives the ascii value of a character:

λ> ord 'G'
λ> chr 88

So here's what might go into a little file called vigenere.hs:

import Data.Char (ord,chr)

vige :: [Char] -> [Char] -> [Char]
vige [] k = []
vige p [] = []
vige (p:ps) (k:ks) = (encode p k):(vige ps (ks++[k]))
    encode a b = chr $ 65 + mod (ord a + ord b) 26

vigd :: [Char] -> [Char] -> [Char]
vigd [] k = []
vigd p [] = []
vigd (p:ps) (k:ks) = (decode p k):(vigd ps (ks++[k]))
    decode a b = chr $ 65 + mod (ord a - ord b) 26

And a couple of tests: the example from above, and the one on the Wikipedia page:


Analysis of a recent election

On November 18, 2017, a by-election was held in my suburb of Northcote, on account of the death by cancer of the sitting member. It turned into a two-way contest between Labor (who had held the seat since its inception in 1927), and the Greens, who are making big inroads into the inner city. The Greens candidate won, much to Labor's surprise. As I played a small part in this election, I had some interest in its result. And so I thought I'd experiment with the results and see how close the result was, and what other voting systems might have produced.

In Australia, the voting method used for almost all lower house elections (state and federal), is Instant Runoff Voting, also known as the Alternative Vote, and known locally as the "preferential method". Each voter must number the candidates sequentially starting from 1. All boxes must be filled in (except the last); no numbers can be repeated or missed. In Northcote there were 12 candidates, and so each voter had to number the boxes from 1 to 12 (or 1 to 11); any vote without those numbers is invalid and can't be counted. Such votes are known as "informal". Ballots are distributed according to first preferences. If no candidate has obtained an absolute majority, then the candidate with the lowest count is eliminated, and all those ballots distributed according to their second preferences. This continues through as many redistributions as necessary until one candidate ends up with an absolute majority of ballots. So at any stage the candidate with the lowest number of ballots is eliminated, and those ballots redistributed to the remaining candidates on the basis of the highest preferences. As voting systems go it's not the worst, although it has many faults. However, it is too entrenched in Australian political life for change to be likely.

Each candidate had prepared a How to Vote card, listing the order of candidates they saw as being most likely to ensure a good result for themselves. In fact there is no requirement for any voter to follow a How to Vote card, but most voters do. For this reason the ordering of candidates on these cards is taken very seriously, and one of the less savoury aspects of Australian politics is backroom "preference deals", where parties will wheel and deal to ensure best possible preference positions on other How to Vote cards.

Here are the 12 candidates and their political parties, in the order as listed on the ballots:

Attention: The internal data of table "4" is corrupted!

For this election the How to Vote cards can be seen at the ABC news site. The only candidate not to provide a full ordered list was Joseph Toscano, who simple advised people to number his square 1, and the other squares in any order they liked, along with a recommendation for people to number Lidia Thorpe 2.

As I don't have a complete list of all possible ballots with their orderings and numbers, I'm going to make the following assumptions:

  1. Every voter followed the How to Vote card of their preferred candidate exactly.
  2. Joseph Toscano's preference ordering is: 3,4,2,5,6,7,8,9,1,10,11,12 (This gives Toscano 1; Thorpe 2; and puts the numbers 3 -- 12 in order in the remaining spaces).

These assumptions are necessarily crude, and don't reflect the nuances of the election. But as we'll see they end up providing a remarkably close fit with the final results.

For the exploration of the voting data I'll use Python, and so here is all the How to Vote information as a dictionary:

In [ ]: htv = dict()

	In [ ]: cands = list(htv.keys())

voting took place at different voting centres (also known as "booths"), and the first preferences for each candidate at each booth can be found at the Victorian Electoral Commission. I copied this information into a spreadsheet and saved it as a CSV file. I then used the data analysis library pandas to read it in as a DataFrame:

In [ ]: import pandas as pd
	firstprefs = pd.read_csv('northcote_results.csv')
	firsts = firstprefs.loc[:,'Hayward':'Fontana'].sum(axis=0)

Out[ ]:
Hayward        354
Sanaghan       208
Thorpe       16254
Lenk           770
Chipp         1149
Cooper         433
Rossiter      1493
Burns        12721
Toscano        329
Edwards        154
Spirovska      214
Fontana       1857
dtype: int64

As Thorpe has more votes than any other candidate, then by the voting system of simple plurality (or First Past The Post) she would win. This system is used in the USA, and is possibly the worst of all systems for more than two candidates.

Checking IRV

So let's first check how IRV works, with a little program that starts with a dictionary and first preferences of each candidate. Recall our simplifying assumption that all voters vote according to the How to Vote cards, which means that when a candidate is eliminated, all those votes will go to just one other remaining candidate. In practice, of course, those ballots would be redistributed across a number of candidates.

Here's a simple program to manage this version of IRV:

def IRV(votes):
    # performs an IRV simulation on a list of first preferences: at each stage
    # deleting the candidate with the lowest current score, and distributing
    # that candidates votes to the highest remaining candidate
    vote_counts = votes.copy()
    for i in range(10):
	m = min(vote_counts.items(), key = lambda x: x[1])
	ind = next(j for j in range(2,11) if cands[htv[m[0]].index(j)] in vote_counts)
	c = cands[htv[m[0]].index(ind)]
	vote_counts += m[1]

We could make this code a little more efficient by stopping when any candidate has amassed over 50% pf the votes. But for simplicity we'll eliminate 10 of the 12 candidates, so it will be perfectly clear who has won. Let's try it out:

In [ ]: IRV(firsts)
  Out[ ]:
  Thorpe    18648
  Burns     17288
  dtype: int64

Note that this is very close to the results listed on the VEC site:

Thorpe:    18380
Burns:     14410
Fontana:   3298

At this stage it doesn't matter where Fontana's votes go (in fact they would go to Burns), as Thorpe already has a majority. But the result we obtained above with our simplifying assumptions gives very similar values.

Now lets see what happens if we work through each booth independently:

In [ ]: finals = {'Thorpe':0,'Burns':0}

In [ ]: for i in firstprefs.index:
   ...:     booth = dict(firstprefs.loc[i,'Hayward':'Fontana'])
   ...:     f = IRV(booth)
   ...:     finals['Thorpe'] += f['Thorpe']
   ...:     finals['Burns'] += f['Burns']
   ...:     print(firstprefs.loc[i,'Booth'],': ',f)
Alphington :  {'Thorpe': 524, 'Burns': 545}
Alphington North :  {'Thorpe': 408, 'Burns': 485}
Bell :  {'Thorpe': 1263, 'Burns': 893}
Croxton :  {'Thorpe': 950, 'Burns': 668}
Darebin Parklands :  {'Thorpe': 180, 'Burns': 204}
Fairfield :  {'Thorpe': 925, 'Burns': 742}
Northcote :  {'Thorpe': 1043, 'Burns': 875}
Northcote North :  {'Thorpe': 1044, 'Burns': 1012}
Northcote South :  {'Thorpe': 1392, 'Burns': 1137}
Preston South :  {'Thorpe': 677, 'Burns': 639}
Thornbury :  {'Thorpe': 1158, 'Burns': 864}
Thornbury East :  {'Thorpe': 1052, 'Burns': 804}
Thornbury South :  {'Thorpe': 1310, 'Burns': 1052}
Westgarth :  {'Thorpe': 969, 'Burns': 536}
Postal Votes :  {'Thorpe': 1509, 'Burns': 2262}
Early Votes :  {'Thorpe': 5282, 'Burns': 3532}

In [ ]: finals
Out[ ]: {'Burns': 16250, 'Thorpe': 19686}

Note again that the results are surprisingly close to the "two-party preferred" results as reported again on the VEC site. This adds weight to the notion that our assumptions, although crude, do in fact provide a reasonable way of experimenting with the election results.

Borda counts

These are named for Jean Charles de Borda (1733 -- 1799) an early voting theorist. The idea is to weight all the preferences, so that a preference of 1 has a higher weighting that a preference of 2, and so on. All the weights are added, and the candidate with the greatest total is deemed to be the winner. With candidates, there are different methods of determining weighting; probably the most popular is a simple linear weighting, so that a preference of is weighted as . This gives weightings from down to zero. Alternatively a weighting of can be used, which gives weights of down to

  1. Both are equivalent in determining a winner. Another possible

weighting is .

Here's a program to compute Borda counts, again with our simplification:

def borda(x): # x is 0 or 1
    borda_count = dict()
    for c in cands:
    for c in cands:
	v = firsts  #  number of 1st pref votes for candidate c
	for i in range(1,13):
	    appr = cands[htv.index(i)]  # the candidate against position i on c htv card
	    if x==0:
		borda_count[appr] += v/i
		borda_count[appr] += v*(11-i)
    if x==0:
	for k, val in borda_count.items():
	    borda_count[k] = float("{:.2f}".format(val))
	for k, val in borda_count.items():
	    borda_count[k] = int(val)

Now we can run this, and to make our lives easier we'll sort the results:

In [ ]: sorted(borda(1).items(), key = lambda x: x[1], reverse = True)
Out[ ]:
[('Burns', 308240),
 ('Thorpe', 279392),
 ('Lenk', 266781),
 ('Chipp', 179179),
 ('Cooper', 167148),
 ('Spirovska', 165424),
 ('Edwards', 154750),
 ('Hayward', 136144),
 ('Fontana', 88988),
 ('Toscano', 80360),
 ('Rossiter', 75583),
 ('Sanaghan', 38555)]

In [ ]: sorted(borda(0).items(), key = lambda x: x[1], reverse = True)
Out[ ]:
[('Burns', 22409.53),
 ('Thorpe', 20455.29),
 ('Lenk', 11485.73),
 ('Chipp', 10767.9),
 ('Spirovska', 6611.22),
 ('Cooper', 6592.5),
 ('Edwards', 6569.93),
 ('Hayward', 6186.93),
 ('Fontana', 6006.25),
 ('Rossiter', 5635.08),
 ('Toscano', 4600.15),
 ('Sanaghan', 4196.47)]

Note that in both cases Burns has the highest output. This is in general to be expected of Borda counts: that the highest value does not necessarily correspond to the candidate which is seen as better overall. For this reason Borda counts are rarely used in modern systems, although they can be used to give a general picture of an electorate.

Condorcet criteria

There are a vast number of voting systems which treat the vote as simultaneous pairwise contests. For example in a three way contest, between Alice, Bob, and Charlie the system considers the contest between Alice and Bob, between Alice and Charlie, and between Bob and Charlie. Each of these contests will produce a winner, and the outcome of all the pairwise contests is used to determine the overall winner. If there is a single person who is preferred, by a majority of voters, in each of their pairwise contests, then that person is called a Condorcet winner. This is named for the Marquis de Condorcet (1743 -- 1794) another early voting theorist. The Condorcet criterion is one of many criteria considered appropriate for a voting system; it says that if the ballots return a Condorcet winner, then that winner should be chosen by the system. This is one of the faults of IRV: that it does not necessarily return a Condorcet winner.

Let's look again at the How to Vote preferences, and the numbers of voters of each:

In [ ]: htvd = pd.DataFrame(list(htv.values()),index=htv.keys(),columns=htv.keys()).transpose()
In [ ]: htvd.loc['Firsts']=list(firsts.values)
In [ ]: htvd

Out[ ]:
	   Hayward  Sanaghan  Thorpe  Lenk  Chipp  Cooper  Rossiter  Burns  Toscano  Edwards  Spirovska  Fontana
Hayward          1         3       6     7     10       5         6     10        3        2          2        2
Sanaghan        10         1       9     8     12      12        12     12        4       10         12        3
Thorpe           7         2       1     3      4       8         9      5        2        4          3        4
Lenk             6         5       3     1      5       6        11      3        5        3          7        5
Chipp            8         6      10     5      1       2         2      2        6        8          4        6
Cooper           5         7       8    11      6       1         7      4        7        9          5        7
Rossiter        12         8      12    12      7       7         1      6        8       12          6        8
Burns           11         9       2     2      3       3         5      1        9        6          8        9
Toscano          3        10       7     9     11      11         8     11        1        5         10       10
Edwards          2        11       4     4      9       9        10      9       10        1          9       11
Spirovska        4        12       5     6      2      10         3      8       11        7          1       12
Fontana          9         4      11    10      8       4         4      7       12       11         11        1
Firsts         354       208   16254   770   1149     433      1493  12721      329      154        214     1857

Here the how to vote information is in the columns. If we look at just the first two candidates, we see that Hayward is preferred to Sanaghan by all voters except for those who voted for Sanaghan. Thus a majority (in fact, nearly all) voters preferred Hayward to Sanaghan.

For each pair of candidates, the number of voters preferring one to the other can be computed by this program:

def condorcet():
    condorcet_table = pd.DataFrame(columns=cands,index=cands).fillna(0)
    for c in cands:
	hc = htv
	for i in range(12):
	    for j in range(12):
		if hc[i] &lt; hc[j]:
		    condorcet_table.loc[cands[i],cands[j]] += firsts

We can see the results of this program:

In [ ]: ct = condorcet(); ct
Out[ ]:
	   Hayward  Sanaghan  Thorpe   Lenk  Chipp  Cooper  Rossiter  Burns  Toscano  Edwards  Spirovska  Fontana
Hayward          0     35728    4505   5042  19370   21633     20573   3116    35607     4888       3335    18283
Sanaghan       208         0    2065   2394  18648    3164     19926   2748     2835     2394       2394    17715
Thorpe       31431     33871       0  21504  20140   20935     34010  19370    33760    35428      32726    32153
Lenk         30894     33542   14432      0  19926   33442     34229   3886    33760    33935      32726    31945
Chipp        16566     17288   15796  16010      0   18895     34443   6037    18845    18404      18960    33871
Cooper       14303     32772   15001   2494  17041       0     34443   3395    18075    18404      15548    31608
Rossiter     15363     16010    1926   1707   1493    1493         0   4101    18075    18404      17041    15906
Burns        32820     33188   16566  32050  29899   32541     31835      0    35099    35428      32726    32024
Toscano        329     33101    2176   2176  17091   17861     17861    837        0     3887       2902    18075
Edwards      31048     33542     508   2001  17532   17532     17532    508    32049        0      20359    18075
Spirovska    32601     33542    3210   3210  16976   20388     18895   3210    33034    15577          0    20717
Fontana      17653     18221    3783   3991   2065    4328     20030   3912    17861    17861      15219        0

What we want to see, of course, if anybody has obtained a majority of preferences against everybody else. To do this we can find all the values greater than the majority, and add up their number. A value of 11 indicates a Condorcet winner:

In [ ]: maj = firsts.sum()//2 + 1; maj
Out[ ]: 17969

In [ ]: ((ct &gt;= maj)*1).sum(axis = 1)
Out[ ]:
Hayward       6.0
Sanaghan      2.0
Thorpe       11.0
Lenk          9.0
Chipp         6.0
Cooper        5.0
Rossiter      2.0
Burns        10.0
Toscano       2.0
Edwards       5.0
Spirovska     6.0
Fontana       2.0
dtype: float64

So in this case we do indeed have a Condorcet winner in Thorpe, and this election (at least with our simplifying assumptions) is also one in which IRV returned the Condorcet winner.

Range and approval voting

If you go to you'll find a nspirited defense of a system called range voting. To vote in such a system, each voter gives an "approval weight" for each candidate. For example, the voter may mark off a value between 0 and 10 against each candidate, indicating their level of approval. There is no requirement for a voter to mark candidates differently: a voter might give all candidates a value of 10, or of zero, or give one candidate 10 and all the others zero. One simplified version of range voting is approval voting, where the voter simply indicates as many or as few candidates as she or he approves of. A voter may approve of just one candidate, or all of them. As with range voting, the winner is the one with the maximum number of approvals. A system where each voter approves of just one candidate is the First Past the Post system, and as we have seen previously, this is equivalent to simply counting only the first preferences of our ballots.

We can't possibly know how voters may have approved of the candidates, but we can run a simple simulation: given a number between 1 and 12, suppose that each voter approves of their first preferences. Given the preferences and numbers, we can easily tally the approvals for each voter:

def approvals(n):
      # Determines the approvals result if voters took their
      # first n preferences as approvals
      approvals_result = dict()
      for c in cands:
	  approvals_result = 0
      firsts = firstprefs.loc[:,'Hayward':'Fontana'].sum(axis=0)
      for c in cands:
	  v = firsts  #  number of 1st pref votes for candidate c
	  for i in range(1,n+1):
	      appr = cands[htv.index(i)]  # the candidate against position i on c htv card
	      approvals_result[appr] += v

Now we can see what happens with approvals for :

In [1 ]: for i in range(1,7):
    ...:     si = sorted(approvals(i).items(),key = lambda x: x[1],reverse=True)
    ...:     print([i]+[s[0] for s in si])
[1, 'Thorpe', 'Burns', 'Fontana', 'Rossiter', 'Chipp', 'Lenk', 'Cooper', 'Hayward', 'Toscano', 'Spirovska', 'Sanaghan', 'Edwards']
[2, 'Burns', 'Thorpe', 'Chipp', 'Hayward', 'Fontana', 'Rossiter', 'Spirovska', 'Lenk', 'Edwards', 'Cooper', 'Toscano', 'Sanaghan']
[3, 'Burns', 'Lenk', 'Thorpe', 'Chipp', 'Hayward', 'Spirovska', 'Sanaghan', 'Fontana', 'Rossiter', 'Toscano', 'Edwards', 'Cooper']
[4, 'Burns', 'Lenk', 'Thorpe', 'Edwards', 'Chipp', 'Cooper', 'Fontana', 'Spirovska', 'Hayward', 'Sanaghan', 'Rossiter', 'Toscano']
[5, 'Thorpe', 'Lenk', 'Burns', 'Spirovska', 'Edwards', 'Chipp', 'Cooper', 'Fontana', 'Hayward', 'Sanaghan', 'Rossiter', 'Toscano']
[6, 'Lenk', 'Thorpe', 'Burns', 'Hayward', 'Spirovska', 'Chipp', 'Edwards', 'Cooper', 'Rossiter', 'Fontana', 'Sanaghan', 'Toscano']

It's remarkable, that after , the first number of approvals required for Thorpe again to win is .

Other election methods

There are of course many many other methods of selecting a winning candidate from ordered ballots. And each of them has advantages and disadvantages. Some of the disadvantages are subtle (although important); others have glaring inadequacies, such as first past the post for more than two candidates. One such comparison table lists voting methods against standard criteria. Note that IRV -- the Australian preferential system -- is one of the very few methods to fail monotonicity. This is seen as one of the system's worst failings. You can see an example of this in an old blog post.

Rather than write our own programs, we shall simply dump our information into the Ranked-ballot voting calculator page and see what happens. First the data needs to be massaged into an appropriate form:

In [ ]: for c in cands:
   ...:     st = str(firsts)+":"+c
   ...:     for i in range(2,13):
   ...:         st += "&gt;"+cands[htv.index(i)]
   ...:     print(st)

The above can be copied and pasted into the given text box. Then the page returns:

winner method(s)
Thorpe Baldwin
Burns Borda

You can see that Thorpe would be the winner under almost every other voting system. This indicates that Thorpe being returned by IRV seems not just an artifact of the system, but represents the genuine wishes of the electorate.

Programmable CAD

Every few years I decide to have a go at using a CAD package for the creation of 3D diagrams and shapes, and every time I give it up. There's simply too much to learn in terms of creating shapes, moving them about, and so on, and every system seems to have its own ways of doing things. My son (who is an expert in Blender) recommended that I experiment with Tinkercad, and indeed this is probably a pretty easy way of getting started with 3D CAD. But it didn't suit me: I wanted to place things precisely in relation to each other, and fiddling with dragging and dropping with the mouse was harder and more inconvenient than it should have been. No doubt there are ways of getting exact line ups, but it isn't obvious to the raw beginner.

I then discovered that there are lots of different CAD "programming languages"; or more properly scripting languages, where the user describes how the figure is to be built in the system's language. Then the system builds it from the script. In this sense these systems are descendants of the venerable VRML, of which you can see some examples here, and its modern version X3D.

Some of the systems that I looked at were:

No doubt there are others. All of these systems have primitive shapes (spheres, cubes, cylinders etc), operations on shapes (shifting, stretching, rotating, extruding etc) so a vast array of different forms can be generated. Some systems allow for a great deal of flexibility, so that a cylinder with a radius of zero at one end will be a cone, or of different radii at each end a frustum.

I ended up choosing OpenJSCAD, which is being actively developed, is based on a well known and robust language, and is also great fun to use. Here is a simple example, to construct a tetrahedron whose vertices are chosen from the vertices of a cube with vertices . The vertices whose product is 1 will be the vertices of a tetrahedron. We can make a nice tetrahedral shape by putting a small sphere at each vertex, and joining each sphere by a cylinder of the same radius:

The code should be fairly self-explanatory. And here is the tetrahedron:

I won't put these models in this post, as one of them is slow to render: but look at a coloured tetrahedron, and an icosahedron.

Note that CAD design of this sort is not so much for animated media so much as precise designs for 3D printing. But I like it for exploring 3D geometry.