Suppose we have a list of incomes, and the number of people earning that income (where we may assume that each number here refers to thousands), for example:

Population   Income
----------   ------
       741        0
       381      200
       692      400
       778      600
        20      800
       662     1000
       228     1200
       796     1400
       221     1600
        51     1800
       361     2000

The only restriction is that the incomes must be listed in increasing order. Start by forming the cumulative sums of both incomes and populations, and scale each sum to lie between 0 and 1:

Scaled cumulative population   Scaled cumulative income
----------------------------   ------------------------
0.15027                        0.00000
0.22754                        0.01818
0.36788                        0.05455
0.52565                        0.10909
0.52971                        0.18182
0.66396                        0.27273
0.71020                        0.38182
0.87163                        0.50909
0.91645                        0.65455
0.92679                        0.81818
1.00000                        1.00000

The right column is now a fraction of the total income earned, and the left column is the fraction of the population who earn up to that income.

What we have now is the fraction of population which earns a fraction of the total income. Plot income against population; the result will be a convex curve known as a *Lorenz curve*:

If income is perfectly equal, then for any fraction between 0 and 1, that fraction of the population will earn that fraction of total income, and the Lorenz curve will be the straight line $y = x$. If $A$ is the area between the line $y = x$ and the Lorenz curve, and $B$ is the area under the Lorenz curve, the Gini coefficient is defined to be the fraction:

$$G = \frac{A}{A+B}$$

or more simply (since $A + B = 1/2$)

$$G = 1 - 2B.$$

Thus the larger the Gini coefficient, the more unequal the income. A population which enjoys perfectly equal incomes will have a Gini coefficient of zero.

Given a discrete list of scaled cumulative sums of incomes and population, the integral of the Lorenz curve can be approximated by trapezoidal sums. If the cumulative income values are

$$y_0, y_1, \ldots, y_n$$

and the cumulative population values are

$$x_0, x_1, \ldots, x_n$$

then the area of the trapezoid between the values of $x_i$ and $x_{i+1}$ is

$$\frac{1}{2}(x_{i+1}-x_i)(y_i+y_{i+1}).$$

Thus the Gini coefficient can be computed as

$$G = 1 - \sum_{i=0}^{n-1}(x_{i+1}-x_i)(y_i+y_{i+1})$$

where the sum is twice the area of the trapezoids which form the area under the Lorenz curve.

This can be computed easily in Matlab or any other matrix-oriented language such as GNU Octave or Scilab. Suppose `cp` and `ci` are the lists from above corresponding to population and income. Then the Gini coefficient can be computed as:

> 1-sum(diff(cp).*(ci(1:end-1)+ci(2:end)))
ans = 0.52579
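If you'd rather check this without Octave, here is the same computation as a quick sketch in plain Python, using nothing beyond the numbers in the table above:

```python
pop = [741, 381, 692, 778, 20, 662, 228, 796, 221, 51, 361]
inc = [0, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]

def scaled_cumsum(xs):
    # cumulative sums of xs, scaled to lie between 0 and 1
    total, run, out = sum(xs), 0.0, []
    for x in xs:
        run += x
        out.append(run / total)
    return out

cp = scaled_cumsum(pop)   # scaled cumulative population
ci = scaled_cumsum(inc)   # scaled cumulative income

# G = 1 - (twice the trapezoidal area under the Lorenz curve)
gini = 1 - sum((cp[i+1] - cp[i]) * (ci[i] + ci[i+1]) for i in range(len(cp) - 1))
print(round(gini, 5))     # 0.52579
```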

We shall look at Australian incomes in ten-year intervals (1995-1996, 2005-2006, and 2015-2016), as obtained from the Australian Bureau of Statistics at http://www.abs.gov.au/ausstats/abs@.nsf/mf/6302.0

The raw data is given in the following table, where the values represent thousands of people earning that particular income:

Income    1995-1996   2005-2006   2015-2016
-------   ---------   ---------   ---------
   0.00       28.70       19.80        0.00
  99.00       57.00       67.10       92.40
 199.00       58.20       36.30       48.30
 299.00      502.10      195.00       87.80
 399.00      393.10      601.90      159.20
 499.00      511.70      272.60      598.90
 599.00      363.30      456.10      359.40
 699.00      340.60      402.20      401.60
 799.00      315.00      328.70      398.70
 899.00      291.00      312.80      341.40
 999.00      307.50      274.00      317.80
1099.00      264.60      266.50      315.30
1199.00      244.30      282.40      255.60
1299.00      204.60      295.40      271.40
1399.00      239.70      262.50      282.50
1499.00      220.80      236.20      238.30
1599.00      211.70      250.70      236.60
1699.00      200.30      216.80      251.80
1799.00      190.10      202.50      228.60
1899.00      174.20      213.80      230.90
1999.00      185.50      204.60      221.00
2199.00      263.00      369.40      426.30
2399.00      214.90      367.60      369.00
2599.00      165.70      271.60      334.80
2799.00      144.40      251.50      293.50
2999.00      120.10      219.50      248.70
3499.00      161.40      338.70      547.80
3999.00       90.00      243.20      371.40
4999.00       79.40      217.90      487.60
5000.00       77.10      227.60      509.90

Suppose the data is read into GNU Octave as a matrix `H`, then we can determine the scaled cumulative sums, and hence the Gini coefficients:

> tmp = H(:,1);
> inc = cumsum(tmp)/sum(tmp);
> tmp = H(:,2);
> pop1 = cumsum(tmp)/sum(tmp);
> tmp = H(:,3);
> pop2 = cumsum(tmp)/sum(tmp);
> tmp = H(:,4);
> pop3 = cumsum(tmp)/sum(tmp);
> gini1 = 1-sum(diff(pop1).*(inc(1:end-1)+inc(2:end)))
ans = 0.57696
> gini2 = 1-sum(diff(pop2).*(inc(1:end-1)+inc(2:end)))
ans = 0.42627
> gini3 = 1-sum(diff(pop3).*(inc(1:end-1)+inc(2:end)))
ans = 0.29479

This means we have:

Year | Gini Coefficient |
---|---|
1995 – 1996 | 0.57696 |
2005 – 2006 | 0.42627 |
2015 – 2016 | 0.29479 |

which seems to indicate that inequality is in fact *decreasing*. We can also plot the Lorenz curves for each population set in turn, and for comparison also show the line of maximum income equality:

The Lorenz curves become closer to the straight line, which indicates a decrease in inequality over this sequence of measurements, at least as measured by the Gini coefficient.
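As a cross-check of the whole computation, here is a sketch in Python which recomputes all three coefficients from the raw table (the numbers are transcribed by hand from the table above, so treat this as illustrative):

```python
# Rows are (income, 1995-96, 2005-06, 2015-16); populations in thousands.
H = [
    (0.00, 28.70, 19.80, 0.00), (99.00, 57.00, 67.10, 92.40),
    (199.00, 58.20, 36.30, 48.30), (299.00, 502.10, 195.00, 87.80),
    (399.00, 393.10, 601.90, 159.20), (499.00, 511.70, 272.60, 598.90),
    (599.00, 363.30, 456.10, 359.40), (699.00, 340.60, 402.20, 401.60),
    (799.00, 315.00, 328.70, 398.70), (899.00, 291.00, 312.80, 341.40),
    (999.00, 307.50, 274.00, 317.80), (1099.00, 264.60, 266.50, 315.30),
    (1199.00, 244.30, 282.40, 255.60), (1299.00, 204.60, 295.40, 271.40),
    (1399.00, 239.70, 262.50, 282.50), (1499.00, 220.80, 236.20, 238.30),
    (1599.00, 211.70, 250.70, 236.60), (1699.00, 200.30, 216.80, 251.80),
    (1799.00, 190.10, 202.50, 228.60), (1899.00, 174.20, 213.80, 230.90),
    (1999.00, 185.50, 204.60, 221.00), (2199.00, 263.00, 369.40, 426.30),
    (2399.00, 214.90, 367.60, 369.00), (2599.00, 165.70, 271.60, 334.80),
    (2799.00, 144.40, 251.50, 293.50), (2999.00, 120.10, 219.50, 248.70),
    (3499.00, 161.40, 338.70, 547.80), (3999.00, 90.00, 243.20, 371.40),
    (4999.00, 79.40, 217.90, 487.60), (5000.00, 77.10, 227.60, 509.90),
]

def scaled_cumsum(xs):
    total, run, out = sum(xs), 0.0, []
    for x in xs:
        run += x
        out.append(run / total)
    return out

def gini(ci, cp):
    # 1 minus twice the trapezoidal area under the Lorenz curve
    return 1 - sum((cp[i+1] - cp[i]) * (ci[i] + ci[i+1]) for i in range(len(cp) - 1))

inc = scaled_cumsum([r[0] for r in H])
g = [gini(inc, scaled_cumsum([r[c] for r in H])) for c in (1, 2, 3)]
print([round(x, 5) for x in g])   # Gini coefficients for 1995-96, 2005-06, 2015-16
```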

One of the outcomes of this theorem is that any attempt to map the earth’s surface onto a plane – to create a map projection – will require some compromise. You have to give up at least one of shape, size, or angles. This hasn’t stopped cartographers trying for several thousand years to find the best compromise, and there are now very many different projections.

Note, for simplicity I will speak of the earth as a sphere, even though it’s not: it’s flattened slightly north-south, and bulges slightly at the equator, making it an oblate spheroid. But it’s pretty *close* to a sphere: its flattening, the value of $f = (a-b)/a$ with $b$ and $a$ being the minor and major axes of an ellipse from a cross-section through the poles, is only about $1/300$.
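For the record, here is that flattening worked out from the WGS84 reference ellipsoid; the choice of datum and the axis values below are my own assumption, not anything from the original post:

```python
# WGS84 reference ellipsoid semi-axes, in metres (my assumption of datum)
a = 6378137.0      # equatorial (major) semi-axis
b = 6356752.3142   # polar (minor) semi-axis

f = (a - b) / a    # flattening f = (a - b)/a
print(1 / f)       # roughly 298.26, i.e. f is indeed about 1/300
```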

Standard projections, seen in atlases, on classroom walls, and on the web, include the ancient equirectangular projection, where the earth’s surface is flattened onto a cylinder, thus vastly distorting the regions away from the equator, and the Mercator projection, where as well as flattening onto a cylinder, it is also stretched upwards – so preserving angles. This makes the Mercator projection excellent for navigation, which is what it was designed for. Mercator himself realized that there were unavoidable distortions of shape and size, and that his map was not useful for depicting landmasses. By a curious quirk of fate, this is what it is now mostly used for!

Efforts to decrease the distortions near the poles include the Robinson and Winkel tripel projections (which look superficially similar) – the latter is now the projection of choice by the National Geographic Society.

All of those projections, and many others, project the earth’s surface onto a single unbroken map. Other projections split the map to reduce distortions. One of these is the Goode Homolosine projection, which has been described as “abominable”, and a “travesty”: not only are land masses distorted, but lines of longitude bend all over the place.

Here’s a picture taken from geoawesomeness.com showing some of these standard projections:

Although you can’t map the entire sphere onto a plane, you can map small bits of it with manageable distortions. So one approach to mapping the earth was to project the sphere onto a polyhedron, and then flatten the polyhedron. Buckminster Fuller had a go at this with his Dymaxion world map, using an icosahedron. The result is certainly excellent for reducing distortion, but has an ugly, jagged look to it:

Also, many of the landmasses are (unavoidably) in curious places and at odd angles: Australia and South America are at opposite ends of the map, as the oceans have been chopped into bits to preserve the landmasses. I don’t believe this map has ever got much love.

Although the icosahedron would seem to be the best choice of polyhedron because of its large number of faces (and no doubt this was Buckminster Fuller’s reasoning), the best results seem to have been obtained using an octahedron. Here is Waterman’s “butterfly projection”, first developed in 1996, which is in many ways a magnificent example of good cartography:

The green circles here are Tissot indicatrices: they show the local distortions by means of small circles. As you can see, the distortions are very small indeed. And of course you can break the world up into an octahedron in such a way as to minimize distortions over particular regions. You can see more projections at the map’s own page.

Most Waterman maps show Antarctica as a separate entity, and another approach was provided many years earlier in 1909 by Bernard Cahill; Cahill’s map has been redeveloped, starting in 1975, by Gene Keyes, and Keyes’s own website is a treasure house of cartographic information, as well as stern critiques of many standard projections. (This is where you’ll find Goode’s homolosine projection soundly trashed.) Keyes’s version of Cahill’s map, the Cahill-Keyes projection, is as of now the best projection available:

Keyes has discussed Waterman’s map against his own, and provides various reasons why he believes the Cahill-Keyes projection is the better of the two. Remarkably, the landmasses are placed in positions and angles not vastly different from the standard (Mercator, Robinson) projections we are all used to, so it doesn’t appear too strange. As with all maps, there are compromises: I’d love it if Australia and New Zealand were next to each other rather than at opposite edges. But that’s just the way the octahedron has been placed.

I believe that properly constructed polyhedral projections are the way to go, and of the several in existence, the Cahill-Keyes is the best. Even if you don’t care about the mathematics and the cartography, it simply looks terrific.

Gene Keyes very kindly responded to this post, and recommended – in view of my comments about the placements of Australia and New Zealand – that I check out another projection against a “starry night” background, and available at http://www.genekeyes.com/DW-STARRY/C-K-DW-starry.html. However, the map’s designer – Duncan Webb – was dissatisfied with this map and asked that it not be published. So I won’t include the picture here, but invite you to view it on Gene Keyes’s site. But notice Australia and New Zealand in chummy proximity!

One of the more powerful uses of turtle graphics is for investigating Lindenmayer Systems, named for Aristid Lindenmayer, who developed them for modelling plant growth. (As an aside, note his first name: *Aristid*. At least once I’ve seen it misspelled “Astrid”.)

To start, we can draw a simple Y-shaped tree, with a trunk, and two branches at 45 degrees from the vertical, with these commands:

fd 100 rt 45 fd 100 bk 100 lt 90 fd 100 bk 100 rt 45 bk 100

which produces something like this:

Of course we would want to be able to change the size of the tree, so we could write the procedure

to tree :size
  fd :size
  rt 45
  fd :size
  bk :size
  lt 90
  fd :size
  bk :size
  rt 45
  bk :size
end

And now with the command `tree` we can draw Y-shaped trees of any size. We have carefully designed our tree so that the turtle ends up back where we started. This means we can replace branches of the tree with smaller versions of itself:

to tree2 :size
  fd :size
  rt 45
  tree :size/2
  lt 90
  tree :size/2
  rt 45
  bk :size
end

Then “`cs tree2 100`” produces this:

To replace all branches with smaller copies of the tree we can perform a recursion, stopping only when the branch size reaches a certain lower limit. Like this:

to tree3 :size
  if :size < 2 [stop]
  fd :size
  rt 45
  tree3 :size/2
  lt 90
  tree3 :size/2
  rt 45
  bk :size
end

If we now enter

cs bk 150 tree3 200

(with the "`bk 150`" simply to give the tree room to grow in our graphics window), we obtain:

Now this looks like no tree found in nature. But we can easily manipulate it. All we require is for our basic, fundamental tree to be designed so that the turtle ends up back at the start. Then we can recursively replace the branches with smaller copies of the tree.
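This closure property is easy to check without drawing anything, by simulating the turtle's state. Here is a minimal sketch in Python; the coordinate and heading conventions (90 degrees meaning straight up) are my assumption:

```python
import math

def run(cmds):
    # a minimal turtle: position (x, y) and heading h in degrees, 90 = straight up
    x, y, h = 0.0, 0.0, 90.0
    for op, arg in cmds:
        if op in ("fd", "bk"):
            sign = 1 if op == "fd" else -1
            x += sign * arg * math.cos(math.radians(h))
            y += sign * arg * math.sin(math.radians(h))
        elif op == "rt":
            h -= arg
        elif op == "lt":
            h += arg
    return x, y, h

# the Y-shaped tree: fd 100 rt 45 fd 100 bk 100 lt 90 fd 100 bk 100 rt 45 bk 100
tree = [("fd", 100), ("rt", 45), ("fd", 100), ("bk", 100), ("lt", 90),
        ("fd", 100), ("bk", 100), ("rt", 45), ("bk", 100)]
x, y, h = run(tree)
print(round(x, 6), round(y, 6), h)   # back at the origin, heading restored to 90
```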

For example, here is a new basic tree:

to tree :size
  fd :size
  rt 30
  fd :size/1.5
  bk :size/1.5
  lt 45
  fd :size
  bk :size
  rt 15
  bk :size
end

so that "`tree 100`" produces this:

and a recursive version (which I'll now call "ltree"):

to ltree :size
  if :size < 2 [stop]
  fd :size
  rt 30
  ltree :size/2
  lt 45
  ltree :size/1.5
  rt 15
  bk :size
end

so that "`cs bk 150 ltree 150`" produces:

Already we are getting some vague semblance of a "natural" plant. Here's an example I pinched from the Ruby pages linked to above, rewritten in Logo, and with colors and widths taken out:

to ltree2 :size
  if :size < 5 [stop]
  fd :size/3
  lt 30
  ltree2 :size*2/3
  rt 30
  fd :size/6
  rt 25
  ltree2 :size/2
  lt 25
  fd :size/3
  rt 25
  ltree2 :size/2
  lt 25
  bk :size*5/6
end

And here's the result of "`cs bk 150 ltree2 250`":

(I had to do this in the online Logo interpreter at http://www.calormen.com/jslogo/, as for some reason my ucblogo kept crashing with core dumps.) But it's remarkable that with a very simple procedure we can construct a picture which already looks more like a real plant than ever.

In fact, our plants aren't Lindenmayer systems themselves, but the output of such systems. Formally, a Lindenmayer System is a grammar, or set of rules, for creating recursively defined structures. Our initial plant, the highly unlife-like one, can be expressed in the L-systems language as

angle = 45
1 -> 11
0 -> 1[0]0

where 1 and 0 may be interpreted as drawing a line segment, and a line segment ending in a leaf, respectively. The brackets [ and ] may be interpreted as turning left and right (by 45 degrees) respectively.

This might seem a little opaque, and an equivalent method is given by Chris Jennings, who defines this tree as

angle = 45
axiom = FX
X -> S[-FX]+FX

Here F and X mean drawing the trunk and branches respectively, and S means "draw everything smaller from now on". The term "axiom" simply means the starting value. Here it's the + and - which refer to the turns, and the brackets indicate pushing and popping the turtle's state on and off a stack, which ensures that we return to the current position afterwards. This means that the last line here, which is this system's "rule", can be read as:

- Make everything shorter, and
- draw a tree to the left
- draw a tree to the right

Using these symbols, another plant can be described (this is example 7 from https://en.wikipedia.org/wiki/L-system) as:

angle = 25
axiom = X
X -> F[-X][X]F[-X]+FX
F -> FF
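The rewriting itself, independent of any drawing, is just repeated string substitution. Here is a sketch in Python, using an ASCII minus sign and nothing beyond the rules above:

```python
def expand(s, rules):
    # one rewriting pass: every symbol with a rule is replaced, in parallel
    return "".join(rules.get(ch, ch) for ch in s)

rules = {"X": "F[-X][X]F[-X]+FX", "F": "FF"}

s = "X"                  # the axiom
s = expand(s, rules)
print(s)                 # F[-X][X]F[-X]+FX
s = expand(s, rules)
print(len(s))            # the strings grow quickly with each pass
```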

In this example, the shortening is assumed. This can be turned into a Logo procedure as follows:

to ltree7 :size
  if :size < 5 [stop]
  fd :size
  lt 25
  ltree7 :size/2
  rt 25
  ltree7 :size/2
  fd :size/2
  lt 25
  ltree7 :size/2
  rt 25
  rt 25
  fd :size/2
  ltree7 :size/2
  bk :size/2
  lt 25
  bk :size/2
  bk :size
end

and when invoked with

cs pu bk 250 ht pd ltree7 200 st

produces this noble plant:

Remarkable that such simple instruction sets can yield results of such beauty! It's quite possible to add colors and thicknesses to these plants to make them even more lifelike. But you get the idea.

- If P contains all vertices of G without repeats, and the end of P is adjacent to the start, then return TRUE.
- For all vertices v adjacent to the end of P and not contained in P, output the result of **isHamilton**(P + v, G).
- Return FALSE.

For a nice introduction to backtracking, see https://www.cis.upenn.edu/~matuszek/cit594-2012/Pages/backtracking.html.

We will enter our graphs as *adjacency lists*: this is a list of lists, where each member list consists of a vertex followed by all its adjacent vertices. For example, the adjacency list of the wheel graph on 6 vertices, with 1 as the central vertex and vertices 2 – 6 in clockwise order:

would look like this:

`[[1 2 3 4 5 6] [2 1 3 6] [3 1 2 4] [4 1 3 5] [5 1 4 6] [6 1 2 5]]`

We start with two helper programs, one which tests the adjacency of two vertices, and one which tests whether a vertex can be added to an existing path:

to adjacentp :vert1 :vert2 :graph
  output memberp :vert1 bf item :vert2 :graph
end

to allowable :vert :path :graph
  make "bool1 memberp :vert :path
  make "bool2 adjacentp :vert (last :path) :graph
  output (and (not :bool1) :bool2)
end

The Hamiltonian code now simply copies the above algorithm description:

to hamilton :path :graph
  print :path
  make "count :count + 1
  if equalp (count :path) (count :graph) ~
    [ifelse adjacentp (first :path) (last :path) :graph [output "TRUE] [output "FALSE]]
  foreach firsts :graph ~
    [if allowable # :path :graph ~
      [if hamilton lput # :path :graph [output "TRUE]]]
  output "FALSE
end

The `print :path` statement does exactly that; the idea is that when the program finishes its run with a positive result, the last path printed before output TRUE will be the cycle we want. Here’s an example:

? make "wheel6 [[1 2 3 4 5 6] [2 1 3 6] [3 1 2 4] [4 1 3 5] [5 1 4 6] [6 1 2 5]]
? show hamilton [1] :wheel6
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
TRUE

As an example of a non-Hamiltonian graph, consider this simple graph on 5 vertices:

? show hamilton [1] [[1 2 4 5] [2 1 3] [3 2 4 5] [4 1 3] [5 1 3]]
1
1 2
1 2 3
1 2 3 4
1 2 3 5
1 4
1 4 3
1 4 3 2
1 4 3 5
1 5
1 5 3
1 5 3 2
1 5 3 4
FALSE

Note how in this last example, the routine explored all possible paths before giving up and returning FALSE. This is a very inefficient program; there are ways to speed up such a program, some of which are listed in the references here. When I ran this program on the Barnette-Bosák-Lederberg graph, it took several hours, and 4,173,760 calls of the program before it returned FALSE.
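The same backtracking idea is easy to transplant into Python if you want to play with it outside Logo; here is a sketch using the two graphs from above (the dictionary representation of the adjacency lists is my own choice):

```python
def hamilton(path, adj):
    # backtracking search for a Hamiltonian cycle extending `path`
    if len(path) == len(adj):
        return path[0] in adj[path[-1]]      # can we close the cycle?
    for v in adj[path[-1]]:
        if v not in path and hamilton(path + [v], adj):
            return True
    return False

# the wheel graph on 6 vertices, as above
wheel6 = {1: [2, 3, 4, 5, 6], 2: [1, 3, 6], 3: [1, 2, 4],
          4: [1, 3, 5], 5: [1, 4, 6], 6: [1, 2, 5]}
print(hamilton([1], wheel6))       # True

# the non-Hamiltonian graph on 5 vertices from the example above
no_cycle = {1: [2, 4, 5], 2: [1, 3], 3: [2, 4, 5], 4: [1, 3], 5: [1, 3]}
print(hamilton([1], no_cycle))     # False
```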

Clearly there are many possible improvements which could be made: for example there should be a “calling” program for which you simply enter a graph, and then the above program is used as a driver. Also, printing out all paths as they are formed is unnecessary (although it provides insight into the backtracking approach). The `allowable` program returns FALSE immediately if the path is empty: this is why in its current form the `hamilton` program needs a non-empty path at the start. We could also more conventionally enter a graph as a list of edges (for an undirected graph, each edge would be an unordered list of two elements, being the vertices at its ends), and construct an adjacency list from that. But I was only interested in a proof of concept, and in how relatively easy it was to use Logo for it.

Logo was developed at MIT in 1967 (which makes it 50 this year) by the late Seymour Papert and associates. It was designed as a “language for learning” and it’s one of the unfair twists of fate that most people’s conception of Logo (if they have one at all), is that it’s programming “for kids”; not a serious language; not worth considering by adults. Such a view is very wrong. Logo was designed according to the principle of “low threshold, high ceiling” which means that even very young children can drive a turtle around with simple commands; more adept programmers can write high-level compilers.

In many ways Logo has been superseded by more colourful, whizz-bang environments such as Scratch (also from MIT), and certainly Scratch is more immediately exciting and interesting. But Scratch is more of a kids' environment: I don’t know of anybody who would seriously suggest teaching tertiary computing with Scratch. Logo as a beginner language, on the other hand, has had a very persuasive advocate in Brian Harvey, whose three volumes of “Computer Science, Logo Style” (still in print but also available as downloads from his website) show how Logo can be used to teach computing at a deep level.

Anyway, the apparent simplicity of Logo is one of its great strengths.

Logo is a dialect of Lisp, more by way of Scheme than Common Lisp, which should immediately give it some street cred. Although most people’s concept of Logo start and finish with Turtle Graphics, it has powerful text and list handling abilities, lots of control structures, and of course being a lisp-like language means you can write your own control structures for it.

There are many versions of Logo, but probably the closest to a “standard” would be the version developed by Brian Harvey and his students at UC Berkeley, known as Berkeley Logo, or just as `ucblogo`.

Here’s a little example of Logo in action: an implementation of merge sort. As you may recall, this is a general purpose sorting algorithm that works by dividing the list into two roughly equal halves, recursively merge-sorting each half, and then merging the two sorted halves into one list. One of the many nice attributes of Logo is the ease of using recursion – so much so, in fact, that recursion is a more natural way of looping or repeating than standard for or while loops. So merge sort (or really, any other recursive procedure) is a doddle in Logo.

;; halve breaks up a list into two halves: [1 2 3 4 5] -> [[1 2] [3 4 5]]
to halve :list
  localmake "n count :list
  localmake "h int :n/2
  output list (cascade :h [lput item # :list ?] []) ~
              (cascade (:n-:h) [lput item #+:h :list ?] [])
end

;; merge joins two sorted lists
to merge :list1 :list2
  if emptyp :list1 [output :list2]
  if emptyp :list2 [output :list1]
  ifelse (first :list1) < (first :list2) ~
    [output se (first :list1) (merge (bf :list1) :list2)] ~
    [output se (first :list2) (merge :list1 (bf :list2))]
end

;; This is the mergesort procedure
to mergesort :list
  if lessp count :list 2 [output :list]  ; empty and singleton lists are already sorted
  localmake "halves halve :list
  output merge (mergesort first :halves) (mergesort last :halves)
end

This is a pretty brain-dead implementation; all I’ve done really is copy down the definition. No doubt a better Logo programmer could use more of its clever commands to write a smaller, and no doubt faster, program. You can see another version at the Rosetta Code site; also check out their Logo version of Quicksort. And there are also many other algorithms implemented in Logo.
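For comparison, here is the same halve/merge/mergesort structure sketched in Python; a direct, deliberately naive transliteration rather than an efficient implementation:

```python
def halve(lst):
    # [1, 2, 3, 4, 5] -> ([1, 2], [3, 4, 5])
    h = len(lst) // 2
    return lst[:h], lst[h:]

def merge(a, b):
    # join two sorted lists into one sorted list
    if not a:
        return b
    if not b:
        return a
    if a[0] < b[0]:
        return [a[0]] + merge(a[1:], b)
    return [b[0]] + merge(a, b[1:])

def mergesort(lst):
    # empty and singleton lists are already sorted
    if len(lst) < 2:
        return lst
    left, right = halve(lst)
    return merge(mergesort(left), mergesort(right))

print(mergesort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]
```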


As before, we invoke our numerical software:

>> N = 13; n = 1:N; a = 1./(n.^2); s = cumsum(a);
>> s.'
ans =
   1.000000000000000
   1.250000000000000
   1.361111111111111
   1.423611111111111
   1.463611111111111
   1.491388888888889
   1.511797052154195
   1.527422052154195
   1.539767731166541
   1.549767731166541
   1.558032193976458
   1.564976638420903
   1.570893798184216
>> atk = aitken(s);
>> atk(end,:).'
ans =
   1.570893798184216
   1.604976638420905
   1.623249742431601
   1.633102742772414
   1.638466686494426
   1.641588484246542
   1.641583981554060

We are a little better off here; the final cumulative sum is in error by about 0.074, and using Aitken’s process gives us a final value which is in error by about 0.003. But this is nowhere near as close as we achieved earlier. And this is because Aitken’s process doesn’t work for a sequence whose convergence is *logarithmic*, that is, one for which

$$\lim_{n\to\infty}\frac{s_{n+1}-L}{s_n-L} = 1.$$

A remedy is Lubkin’s W-transformation, named for Samuel Lubkin, who wrote extensively about series acceleration in the 1950s. In fact, his 1952 paper is available online. In the 1980s J. E. Drummond at the Australian National University took up Lubkin’s cudgels and further extended and developed his methods. Drummond noticed that Aitken’s and Lubkin’s processes were very closely linked.

First, Aitken’s process can be written as

$$A_n = \frac{\Delta\left(s_n/\Delta s_n\right)}{\Delta\left(1/\Delta s_n\right)}$$

We can check this using any symbolic package, or working it out by hand:

$$A_n = \frac{\dfrac{s_{n+1}}{s_{n+2}-s_{n+1}} - \dfrac{s_n}{s_{n+1}-s_n}}{\dfrac{1}{s_{n+2}-s_{n+1}} - \dfrac{1}{s_{n+1}-s_n}}$$

This last expression can be expanded to:

$$A_n = \frac{s_{n+1}(s_{n+1}-s_n) - s_n(s_{n+2}-s_{n+1})}{(s_{n+1}-s_n) - (s_{n+2}-s_{n+1})} = \frac{s_{n+1}^2 - s_ns_{n+2}}{2s_{n+1}-s_n-s_{n+2}}$$

After some algebraic fiddlin’, we end up with

$$A_n = \frac{s_ns_{n+2} - s_{n+1}^2}{s_{n+2}-2s_{n+1}+s_n} = s_n - \frac{(s_{n+1}-s_n)^2}{s_{n+2}-2s_{n+1}+s_n}$$

which is the initial formula for Aitken’s process.

Lubkin’s process can be written as

Given that for a sequence we have , then can be expanded as

It turns out to be more numerically stable to write in the form

of which the numerator of the fraction is equal to . A little bit of algebra shows that the denominator is equal to We can thus write the W-transformation as

So let’s experiment with the series $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$ again:

>> N = 13; n = 1:N; a = 1./n.^2;
>> s = cumsum(a); s(end)
ans = 1.570893798184216
>> abs(s(end)-pi^2/6)
ans = 0.074040268664010

So far, not a particularly good approximation, as we’d expect. So we’ll try the W-transformation, in its original form as the quotient of two second differences:

>> s0 = s(1:N-3); s1 = s(2:N-2); s2 = s(3:N-1); s3 = s(4:N);
>> w = (s2./(s3-s2)-2*s1./(s2-s1)+s0./(s1-s0))./(1./(s3-s2)-2./(s2-s1)+1./(s1-s0));
>> w(end)
ans = 1.644837749532052
>> abs(w(end)-pi^2/6)
ans = 9.631731617454342e-05

which is a great improvement. We can do this again, simply by

>> s = w; N = N-3;

and repeating the above commands. The new values of the final result, and error, are:

>> w(end)
ans = 1.644933894309522
>> abs(w(end)-pi^2/6)
ans = 1.725387044348992e-07

Just as with Aitken’s process, we can whip up a little program to perform the W-transformation iteratively:

function out = lubkin(c)
% Applies Lubkin's W-process to a vector c, which we suppose to
% represent a sequence converging to some limit L
N = length(c);
M = N;                 % length of current vector
s = reshape(c,N,1);    % ensures we are working with column vectors
out = s;
for i = 1:floor(N/3)
  s0 = s(1:M-3); s1 = s(2:M-2); s2 = s(3:M-1); s3 = s(4:M);
  t = s2./(s3-s2)-2*s1./(s2-s1)+s0./(s1-s0);
  t = t./(1./(s3-s2)-2./(s2-s1)+1./(s1-s0));
  tcol = zeros(N,1);
  tcol(3*i+1:N) = t;
  out = [out tcol];
  M = M-3;
  s = t;
end
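If you want to try the W-transformation without Matlab, here is a sketch of the pass-by-pass computation in Python, applied to the same partial sums of $\sum 1/n^2$ (the function name is mine):

```python
def lubkin_pass(s):
    # one application of the W-transformation, as a quotient of second differences
    out = []
    for i in range(len(s) - 3):
        s0, s1, s2, s3 = s[i], s[i+1], s[i+2], s[i+3]
        num = s2/(s3-s2) - 2*s1/(s2-s1) + s0/(s1-s0)
        den = 1/(s3-s2) - 2/(s2-s1) + 1/(s1-s0)
        out.append(num / den)
    return out

# partial sums of sum 1/n^2, which converges to pi^2/6
N = 13
s, run = [], 0.0
for n in range(1, N + 1):
    run += 1.0 / n**2
    s.append(run)

w = lubkin_pass(s)      # first pass: error drops to around 1e-4
w2 = lubkin_pass(w)     # second pass: error drops again, to around 1e-7
print(w[-1], w2[-1])
```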

Note that the generalization of these processes, using

$$\frac{\Delta^k\left(s_n/\Delta s_n\right)}{\Delta^k\left(1/\Delta s_n\right)}$$

for integers $k \ge 1$, has been explored by J. E. Drummond, and you can read his 1972 paper online here.


and $0 < \mu < 1$. If $\mu = 0$ then the convergence is *super-linear*. Convergence is *quadratic* if

$$\lim_{n\to\infty}\frac{|s_{n+1}-L|}{|s_n-L|^2} = \mu$$

again with $\mu > 0$. See the wikipedia page for more details. It is not hard to show that if a sequence converges linearly then the increase of significant figures is a linear function of $n$, and if the convergence is quadratic then the number of significant figures roughly doubles at each step.

This is named for the esteemed mathematician Alexander Craig Aitken, who published it in 1926. To derive it, assume that

$$\frac{s_{n+1}-L}{s_n-L} = \frac{s_{n+2}-L}{s_{n+1}-L}$$

and solve for $L$. This will produce

$$L = \frac{s_ns_{n+2}-s_{n+1}^2}{s_{n+2}-2s_{n+1}+s_n}$$

or equivalently

$$L = s_n - \frac{(s_{n+1}-s_n)^2}{s_{n+2}-2s_{n+1}+s_n}$$

We can now invoke the terminology of differences, of which the forward difference can be written

$$\Delta s_n = s_{n+1}-s_n$$

Then the second forward difference is

$$\Delta^2 s_n = \Delta s_{n+1} - \Delta s_n = s_{n+2}-2s_{n+1}+s_n$$

Comparing this with the expression for $L$ above, we can write

$$L = s_n - \frac{(\Delta s_n)^2}{\Delta^2 s_n}$$

We can use this to produce a new sequence $t_n$ defined by

$$t_n = s_n - \frac{(\Delta s_n)^2}{\Delta^2 s_n}$$

If the initial sequence converges linearly, then this new sequence will converge faster. If the sequence is obtained from a fixed point process (so that $s_{n+1} = f(s_n)$) then this is called Steffensen’s method, which can be shown to have quadratic convergence. However, our interest is in series approximation.

As an example, we’ll take the first few cumulative sums of our standard series

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4}$$

And for simplicity, we’ll use a numerical package (Matlab, GNU Octave, Scilab, etc).

>> N = 13;
>> n = 0:N-1;
>> a = (-1).^n./(2*n+1);
>> s = cumsum(a);
>> 4*s'
ans =
   4.000000000000000
   2.666666666666667
   3.466666666666667
   2.895238095238096
   3.339682539682540
   2.976046176046176
   3.283738483738484
   3.017071817071818
   3.252365934718877
   3.041839618929403
   3.232315809405594
   3.058402765927333
   3.218402765927333

As we would expect, the final result is very inaccurate. So we’ll give Aitken’s method a go, first taking out subsequences of length $N-2$ so as to be able to compute all the differences we need:

>> s0 = s(1:N-2);
>> s1 = s(2:N-1);
>> s2 = s(3:N);
>> t = s0 - (s1-s0).^2./(s2-2*s1+s0);
>> 4*t'
ans =
   3.166666666666667
   3.133333333333334
   3.145238095238096
   3.139682539682540
   3.142712842712843
   3.140881340881342
   3.142071817071818
   3.141254823607766
   3.141839618929403
   3.141406718496503
   3.141736099260667

and already the last value is surprisingly accurate: in error by only about 0.00014, compared to an initial error (in the first sequence of cumulative sums) of about 0.077. And we can apply the delta-squared process to this new sequence:

>> N = N-2;
>> t0 = t(1:N-2);
>> t1 = t(2:N-1);
>> t2 = t(3:N);
>> u = t0 - (t1-t0).^2./(t2-2*t1+t0);
>> 4*u'
ans =
   3.142105263157895
   3.141450216450217
   3.141643323996266
   3.141571290201428
   3.141602841602842
   3.141587320947787
   3.141595655236941
   3.141590862710498
   3.141593774239114

and we have increased the accuracy again. If we applied this process to successively decreasing sequences, the final values would be:

3.218402765927333
3.141736099260667
3.141593774239114
3.141592673909636
3.141592654277287
3.141592653625053
3.141592653591176

and this last value is in error only by about $1.4\times10^{-12}$.

Starting with a sequence of partial sums, we can write a simple Matlab program to compute Aitken’s process $\lfloor N/2 \rfloor$ times, where $N$ is the length of the sequence:

function out = aitken(s)
% Applies Aitken's delta-squared process to a vector s, which we suppose to
% represent a sequence converging to some limit L
N = length(s);
M = N;                % length of current vector
a = reshape(s,N,1);   % ensures we are working with column vectors
out = a;
for i = 1:floor(N/2)
  a0 = a(1:M-2);
  a1 = a(2:M-1);
  a2 = a(3:M);
  b = a0 - (a1-a0).^2./(a2-2*a1+a0);
  bcol = zeros(N,1);
  bcol(2*i+1:N) = b;
  out = [out bcol];
  M = M-2;
  a = b;
end
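The same iterated process can be sketched in Python; applied to the partial sums of the Leibniz series it reproduces the chain of last values listed above (the helper name is mine):

```python
def aitken_pass(s):
    # one delta-squared pass: t_n = s_n - (Δs_n)^2 / (Δ²s_n)
    return [s[i] - (s[i+1] - s[i])**2 / (s[i+2] - 2*s[i+1] + s[i])
            for i in range(len(s) - 2)]

# partial sums of the Leibniz series 1 - 1/3 + 1/5 - ... = pi/4, times 4
N = 13
s, run = [], 0.0
for n in range(N):
    run += (-1)**n / (2*n + 1)
    s.append(4 * run)

# each pass shortens the sequence by 2; repeat until one value remains
while len(s) >= 3:
    s = aitken_pass(s)
print(s[-1])    # very close to pi, from only 13 terms of the series
```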

For example, let’s look at the series

$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2+2n} = \frac{1}{4}$$

>> N = 13; n = 1:N; a = (-1).^(n+1)./(n.^2+2*n); s = cumsum(a);
>> s.'
ans =
   0.333333333333333
   0.208333333333333
   0.275000000000000
   0.233333333333333
   0.261904761904762
   0.241071428571429
   0.256944444444444
   0.244444444444444
   0.254545454545455
   0.246212121212121
   0.253205128205128
   0.247252747252747
   0.252380952380952
>> atk = aitken(s);
>> atk(end,:).'
ans =
   0.252380952380952
   0.250007568189386
   0.250000065527560
   0.250000001161480
   0.250000000036343
   0.250000000001713
   0.250000000000078

then the series

$$\sum_{n=0}^{\infty}(-1)^n a_n$$

converges. However, simply adding terms one by one is, in general, very slow and inefficient. For example, take the well known series

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$

which converges to $\pi/4$. Here’s a table of the sum of the first $n$ terms, multiplied by 4, with correct digits in bold:

As you see, obtaining even a handful of correct digits requires a huge number of terms to be summed.

There are many different methods for obtaining an approximate sum of an infinite series; most of these are known as techniques for *accelerating convergence* or series acceleration.

One of the first documented methods to accelerate the convergence of a series was given by Euler (but of course); he noticed that an alternating series

$$a_0 - a_1 + a_2 - a_3 + \cdots$$

can be written as

$$\frac{a_0}{2} - \frac{a_1-a_0}{2} + \frac{a_2-a_1}{2} - \frac{a_3-a_2}{2} + \cdots$$

This replaces the original series with a new alternating series which should converge faster, because the terms are smaller.
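Before turning to Pari/GP, here is a quick numerical check of Euler's observation in Python, using the first 11 terms of the Leibniz series for $\pi/4$ (the variable names are mine):

```python
n_terms = 11
a = [1 / (2*n + 1) for n in range(n_terms)]     # 1, 1/3, 1/5, ...

# plain alternating sum: 1 - 1/3 + 1/5 - ...
plain = sum((-1)**i * a[i] for i in range(n_terms))

# Euler's trick: replace a_0, a_1, a_2, ... by a_0/2, (a_1-a_0)/2, (a_2-a_1)/2, ...
b = [a[0] / 2] + [(a[i] - a[i-1]) / 2 for i in range(1, n_terms)]
accel = sum((-1)**i * b[i] for i in range(n_terms))

# times 4, both should approximate pi; the transformed sum is much closer
print(4 * plain, 4 * accel)
```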

Now for some experiments. For this blog post I’m going to use the software Pari/GP which is designed primarily for computational number theory, but contains a whole lot of other goodies, in particular some routines for approximating series sums.

To obtain the examples above, for example with $10^4$ terms:

? s = sum(i = 0, 10^4, (-1)^i/(2*i+1));
? \p 12
? 4.0*s
%1 = 3.14169264359

Here the first line computes the sum. Since GP works symbolically, with exact representations, by default, the sum is computed as a massive fraction, which we don’t want displayed; hence the semicolon at the end of the line. The second line sets the precision, and the last line displays our approximation to $\pi$.

We are going to work with the first 11 terms, and apply Euler’s trick. First, create all the terms, noting that we are dealing with the positive values; we don’t need to include the alternating signs:

? a = vector(11, n, 1/(2*n-1))
%2 = [1, 1/3, 1/5, 1/7, 1/9, 1/11, 1/13, 1/15, 1/17, 1/19, 1/21]

Note that GP indexing starts at one, rather than zero. The alternating sum of these terms from above is 3.232315809405594, which does not even have one decimal place accuracy. We can create the new sequence by taking differences:

? b = a/2;
? for(i = 2, length(a), b[i] = (a[i]-a[i-1])/2)

and we can compare the values of terms in the original and new series:

? 1.0*matconcat([a;b])~
%3 =
[  1.00000000000    0.500000000000]
[ 0.333333333333   -0.333333333333]
[ 0.200000000000  -0.0666666666667]
[ 0.142857142857  -0.0285714285714]
[ 0.111111111111  -0.0158730158730]
[0.0909090909091  -0.0101010101010]
[0.0769230769231 -0.00699300699301]
[0.0666666666667 -0.00512820512821]
[0.0588235294118 -0.00392156862745]
[0.0526315789474 -0.00309597523220]
[0.0476190476190 -0.00250626566416]

The terms in the new series are an order of magnitude smaller than in the original series. Now add them as an alternating series:

? sgns = vector(11, i, (-1)^(i-1));
? 4.0 * sum(i = 1, 11, b[i]*sgns[i])
%4 = 3.13707771417

This value is a great deal closer to $\pi$ than the first result!

This process can be repeated any number of times, depending on the number of terms of the original series we have at our disposal:

? c = b/2;
? for(i = 2, length(b), c[i] = (b[i]-b[i-1])/2)
? 4.0 * sum(i = 1, 11, c[i]*sgns[i])
%5 = 3.14209024550
? d = c/2;
? for(i = 2, length(c), d[i] = (c[i]-c[i-1])/2)
? 4.0 * sum(i = 1, 11, d[i]*sgns[i])
%6 = 3.14150053593

and already we have far greater accuracy than at the start, without computing any further terms of the series.

Another way of implementing this acceleration technique is to write the first term as

$$a_1 = \frac{a_1}{2} + \frac{a_1}{2}$$

and consider partial sums instead of individual terms. The $n$-th partial sum of the original series is

$$s_n = a_1 - a_2 + a_3 - \cdots + (-1)^{n-1}a_n.$$

Now consider the $n$-th partial sum of the series of differences:

$$t_n = \frac{a_1}{2} - \frac{a_2 - a_1}{2} + \frac{a_3 - a_2}{2} - \cdots + (-1)^{n-1}\frac{a_n - a_{n-1}}{2}.$$

Breaking this up according to the first and second terms in each of the numerators, we obtain

$$t_n = \frac{1}{2}\left(s_{n-1} + s_n\right).$$
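This relationship between the two sets of partial sums can be checked numerically; here is a small Python sketch (variable names are mine):

```python
from itertools import accumulate

a = [1 / (2 * n - 1) for n in range(1, 12)]                   # unsigned terms
s = list(accumulate((-1) ** i * a[i] for i in range(len(a)))) # original partial sums

# Difference series: first term halved, then half the consecutive differences.
b = [a[0] / 2] + [(a[i] - a[i - 1]) / 2 for i in range(1, len(a))]
t = list(accumulate((-1) ** i * b[i] for i in range(len(b)))) # its partial sums

# Each t is the average of two consecutive s (0-based indices here):
gap = max(abs(t[n] - (s[n - 1] + s[n]) / 2) for n in range(1, len(a)))
assert gap < 1e-15
```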

For the series above, we start with the terms including the sign changes:

? N = 11;
? a = vector(N,i,(-1)^(i-1)/(2*i-1));

Now it will be convenient to create a function called `cumsum`, which produces the cumulative sum of the elements of a vector:

cumsum(x) =
{
  s = x;
  for(i = 2, length(x), s[i] = s[i-1] + x[i]);
  return(s);
}
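For readers following along in another language, the same helper exists in Python's standard library as `itertools.accumulate`:

```python
from itertools import accumulate

def cumsum(x):
    """Cumulative sums of a list, like the GP helper above."""
    return list(accumulate(x))

print(cumsum([1, 2, 3, 4]))   # [1, 3, 6, 10]
```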

Now we can use this to experiment with Euler’s method again:

? s = cumsum(a);
? t = concat(0, vector(N-1, i, (s[i]+s[i+1])/2));
? 4.0*matconcat([s;t])~
%7 =
[4.00000000000 0]
[2.66666666667 3.33333333333]
[3.46666666667 3.06666666667]
[2.89523809524 3.18095238095]
[3.33968253968 3.11746031746]
[2.97604617605 3.15786435786]
[3.28373848374 3.12989232989]
[3.01707181707 3.15040515041]
[3.25236593472 3.13471887590]
[3.04183961893 3.14710277682]
[3.23231580941 3.13707771417]

And again we can take the pairwise averages of all the elements of $t$, and again:

? u = concat(0,vector(N-1, i, (t[i]+t[i+1])/2));
? v = concat(0,vector(N-1, i, (u[i]+u[i+1])/2));
? \p 6
? 4.0 * matconcat([s;t;u;v])~
%8 =
[4.00000 0       0       0]
[2.66667 3.33333 1.66667 0.833333]
[3.46667 3.06667 3.20000 2.43333]
[2.89524 3.18095 3.12381 3.16190]
[3.33968 3.11746 3.14921 3.13651]
[2.97605 3.15786 3.13766 3.14343]
[3.28374 3.12989 3.14388 3.14077]
[3.01707 3.15041 3.14015 3.14201]
[3.25237 3.13472 3.14256 3.14136]
[3.04184 3.14710 3.14091 3.14174]
[3.23232 3.13708 3.14209 3.14150]

We can write a small program to do this for the initial values:

euler_trans(x) =
{
  local(s, N, p, t);
  s = x;
  N = length(s);
  p = Vec(s[N]);
  for(k = 1, N,
    t = concat(0, vector(N-1, i, (s[i]+s[i+1])/2));
    p = concat(p, t[N]);
    s = t;
  );
  return(p);
}

And now use this program:

? \r euler_trans.gp
? \p 10
? s = cumsum(a*1.0);
? Mat(4*euler_trans(s))~
%9 =
[3.232315809]
[3.137077714]
[3.142090245]
[3.141500536]
[3.141618478]
[3.141582188]
[3.141598683]
[3.141587686]
[3.141598683]
[3.141581088]
[3.141633874]
[3.139152897]

Note that better accuracy is in fact obtained not at the end, but about two-thirds of the way down. This observation (and the proof that it does indeed have higher accuracy) is the contribution of Adriaan van Wijngaarden, who published it in 1965.

So one last check, for 120 terms, stopping our loop at 80:

? \p 20
? N = 120;
? a = vector(N,i,(-1)^(i-1)/(2*i-1));
? s = cumsum(a*1.0);
? p = euler_trans(s);
? forstep(i = 1, floor(N*2/3), 10, print(4*p[i]))
3.1332594649198298159
3.1415926535897932384
3.1415926535897932385
3.1415926535897932385
3.1415926535897932385
3.1415926535897932385
3.1415926535897932385
3.1415926535897932385

Our accuracy is at the limit of our set precision:

? print(abs(4*p[80]-Pi))
1.5281426560689737604 E-37

Notice that all of this was produced with just the first 120 terms of the series; at no time in the computation of the loop did we compute any further terms.

For comparison, the direct sum of these terms is a very poor approximation indeed:

? print(abs(4*s[N]-Pi))
0.0083331886699634225838
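The whole experiment can also be mirrored in double precision. This is not the GP program above but a simplified Python rendering of the same idea (names are mine): each pass replaces the partial sums by pairwise averages, with no zero-padding, so the list simply shrinks by one each time, and we stop about two-thirds of the way through, per van Wijngaarden's observation.

```python
import math
from itertools import accumulate

def euler_accelerate(terms, passes):
    """Repeatedly average consecutive partial sums; record the last
    element after each pass (a sketch, not the GP code above)."""
    s = list(accumulate(terms))
    out = [s[-1]]
    for _ in range(passes):
        s = [(s[i] + s[i + 1]) / 2 for i in range(len(s) - 1)]
        out.append(s[-1])
    return out

N = 120
a = [(-1) ** i / (2 * i + 1) for i in range(N)]   # Leibniz terms for pi/4
p = euler_accelerate(a, passes=2 * N // 3)

print(abs(4 * sum(a) - math.pi))   # raw 120-term sum: error ~ 8.3e-3
print(abs(4 * p[-1] - math.pi))    # accelerated: error near float precision
```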

One last example, to compute

$$\sum_{i=2}^{\infty} \frac{(-1)^i}{\sqrt{i}}.$$

Before we attempt it ourselves, we can check its value with the `sumalt` command of Pari/GP:

? altsqrt = sumalt(i=2,(-1)^i/sqrt(i))
%10 = 0.39510135657836962975273408576404450022

As above:

? N = 120;
? a = vector(N,i,(-1)^(i+1)/sqrt(i+1));
? s = cumsum(a);
? s[N]
%11 = 0.34974072346962855572717264600532619101
? abs(s[N]-altsqrt)
%12 = 0.045360633108741074025561439758718309212

The initial sum of 120 elements is not very good – as we'd expect. So let's see how Euler–van Wijngaarden manages:

? p = euler_trans(s);
? forstep(i = 1, floor(N*2/3), 10, print(p[i]))
0.34974072346962855572717264600532619101
0.39510135657836962968976358865551977318
0.39510135657836962975273408576388436451
0.39510135657836962975273408576404450023
0.39510135657836962975273408576404450022
0.39510135657836962975273408576404450022
0.39510135657836962975273408576404450022
0.39510135657836962975273408576404450022

And in fact we have reached accuracy to our set precision even earlier:

? forstep(i = 1, floor(N*2/3), 10, print([i,abs(p[i]-altsqrt)]))
[1, 0.045360633108741074025561439758718309212]
[11, 6.2970497108524727047861082221352448351 E-20]
[21, 1.6013571021345509597 E-31]
[31, 4.408103815583578155 E-39]
[41, 0.E-38]
[51, 1.4693679385278593850 E-39]
[61, 2.938735877055718770 E-39]
[71, 4.408103815583578155 E-39]

Note again that the power of this method is that *without computing any extra terms of the series*, we have increased the precision of the sum from one decimal place to nearly 40.
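This last example, too, can be reproduced in double precision; here is a Python sketch with my own names, using the `sumalt` value from above truncated to a float as the target:

```python
import math
from itertools import accumulate

N = 120
a = [(-1) ** i / math.sqrt(i) for i in range(2, N + 2)]  # +1/sqrt 2 - 1/sqrt 3 + ...
s = list(accumulate(a))

target = 0.39510135657836963       # sumalt value from above, as a float
raw_err = abs(s[-1] - target)      # about 0.045, as in the text

for _ in range(2 * N // 3):        # repeated pairwise averaging
    s = [(s[i] + s[i + 1]) / 2 for i in range(len(s) - 1)]
accel_err = abs(s[-1] - target)    # down to float precision

print(raw_err, accel_err)
```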

With Linux, and sticking to open-source or free typesetting software, there are three that I’ve used:

- GNU Lilypond, very mature software in active development since at least 2004, and with a large user base. It is designed to meet pretty much all notational needs, and as you can see by hunting around on its web page, can be used for scores both simple and complex. Its aim is to be software suitable for use by professionals.
- ABC notation goes back even further than Lilypond, to the early 1980s, when its initial creator, Chris Walshaw, was looking for ways of notating folk tunes easily. Since that time ABC has been extended to manage polyphonic music, and with the abcm2ps program developed by Jef Moine it can handle very complex music. A good page which pulls all the strands of ABC together, with examples, is provided by Guido Gonzato. The ABC standard is currently at version 2.2.
- MusixTeX uses the power of TeX to typeset music. The results are impressive, but the raw input is fiddly in the extreme. For this reason there are several pre-processors, of which PMX is the most full-featured, and M-Tx for lyrics.

All of these programs require the user to write out the music in ASCII notation; thus the input is an ASCII document which defines the music to be typeset. This provides nice backwards compatibility: you can grab an ASCII file created 10 or more years ago for any of these programs, and maybe doctor it a little to take advantage of newer functionality.

As an experiment, I’m going to typeset the first eight bars of the third movement of the “Variation Sonata on Bonny Jean of Aberdeen” by the Scottish composer Charles McLean (fl. 1732). A “variation sonata” was a local invention: a sonata in multiple movements where each movement was a variation on a popular folk tune. The variation sonata was supposedly invented by another shadowy figure in 18th century Scottish music: Alexander Munro.

First, Lilypond:

\version "2.19.53"
\header{
  title = "Variation Sonata on Bonny Jean of Aberdeen"
  composer = \markup {\column {" " \large "Charles McLean"} " "}
}
\layout{ indent = 25\mm }
global = { \language english }
allegroVln = \new Voice \relative c'' {
  \set Staff.instrumentName = #"Violin "
  \override Staff.InstrumentName.self-alignment-X = #RIGHT
  \time 4/4
  \key g \major
  \partial 4 b8 a |
  g4 b8 c d4 d,8 e | g4 d'8 c b4 a8 g |
  c8 g'4 c,8 b g'4 b,8 | a4 b8 g e4 b'8 a |
  g4 b8 c d4 d,8 e | g4 b8 d g4 fs8 e |
  d g4 c,8 b g'4 b,8 | a g'4 fs8 g4
  \bar ":.|.:"
}
allegroBass = \new Voice \relative c' {
  \set Staff.instrumentName = #"Bass "
  \override Staff.InstrumentName.self-alignment-X = #RIGHT
  \clef bass
  \key g \major
  \time 4/4
  \partial 4 r4 |
  g4 d g, b8 c | b4. a8 g4 g' |
  e c g' g, | d' b c a |
  b4. a8 g4 b8 c | b4 g'8 fs e4 d8 c |
  b4 a g g' | d d, g
  \bar ":.|.:"
}
\score {
  \new StaffGroup <<
    \new Staff << \global \allegroVln >>
    \new Staff << \global \allegroBass >>
  >>
  \header{ piece = \markup{\fontsize #2 "III. Allegro"} }
}

This is the result:

Next, ABC, using the abcm2ps software to turn the file into a page of music:

X: 1
T: Variation Sonata on Bonny Jean of Aberdeen
C: Charles McLean
L: 1/4
K: G
M: 4/4
Q: "III Allegro"
%%score [vln | Bc]
V:vln clef=treble name=Violin bracket=2 space=+5pt
V:Bc clef=bass name=Bass
[V:vln] B/G/ | AB/c/dD/E/| Gd/c/B A/G/ | c/gc/ B/gB/ | AB/G/EB/A/ | GB/c/dD/E/ | GB/d/gf/e/ | d/gc/ B/gB/ | A/gf/g :|
[V:Bc] z | G,D,G,,B,,/C,/ | B,,>A,,G,,G, | E,C,G,G,, | D,B,,C,A,, | B,,>A,,C,A,, | B,,G,/F,/E,D,/C,/ | B,,A,,G,,G, | D,D,,G,, :|

with result:

Finally, MusixTeX with the PMX pre-processor:

% Charles McLean, Variation Sonata on Bonny Jean of Aberdeen
% PREAMBLE:
% nstaves ninstr mtrnuml mtrdenl mtrnump mtrdenp
2 2 4 4 0 6
% npickup nkeys
1 1
% npages nsystems musicsize fracindent
1 2 16 .08
Bass
Violin
bt
./
% BODY:
% HEADER:
Tc Charles McLean (fl. 1732)
Tt Variation Sonata on Bonny Jean of Aberdeen
h III. Allegro
Abec4
% Bars 1 - 4
r4 g43 d g- b8 c | b4.a g4 g+ | e c g+ g- | d+ b c a | /
[ b84 a ] g4 b8 c d4 d8- e | g4 d8+ c b4 a8 g | c8 g4+ c8- b g4+ b8- | a4 b8 g e4 b8+ a /
% Bars 5 - 7
b4.a g b8 c | b4 g8+ f e4 d8 c | b4 a g g+ | /
g4 b8 c d4 d8- e | g4 b8 d g4 f8 e | d8 g4 c8- b g4+ b8- | /
% Last partial bar
m3400
d4 d- g Rr /
a8 g4+ f8 g4 Rr /

and its result:

For me, the choice of a music notation system depends on two aspects: ease of use, and quality of output. I think all of these systems – at least, for my needs – are roughly comparable in their output. They all make sensible decisions about layout and spacing, and as far as I can tell all three outputs look pretty good.

For ease of use, I found ABC easiest, followed by Lilypond, and last by PMX. In fact PMX does so much checking that it won’t compile and typeset your music until the file is set up correctly. As you see from the preamble, there are many factors which affect the result, and which can be set by the user. PMX works by a sort of coding system, where the length of a note is given by one digit: 0, 1, 2, 4, 8, 3, 6, 9 for breve, whole-note (or semibreve), half-note (minim), quarter note (crotchet), eighth, sixteenth, thirty-second and sixty-fourth note. PMX shares with Lilypond a relative pitch naming system, where a note is placed automatically in the same octave as the previous note (that is, within one fourth), and can be moved into a higher or lower octave by extra symbols: in PMX plus and minus (+ and -); in Lilypond apostrophe and comma (‘ and ,).
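As a quick reference for that digit scheme, here is the mapping as a small Python table (my own summary of the coding just described; PMX itself uses the bare digits):

```python
# PMX duration digits, as described above: digit -> note value.
pmx_duration = {
    0: "breve",
    1: "whole note (semibreve)",
    2: "half note (minim)",
    4: "quarter note (crotchet)",
    8: "eighth note",
    3: "sixteenth note",
    6: "thirty-second note",
    9: "sixty-fourth note",
}
print(pmx_duration[4])   # quarter note (crotchet)
```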

Lilypond’s learning curve is steeper than ABC’s, and if you read through the manual you’ll come up against “contexts” and “engravers”, which are initially confusing. Once mastered, however, you have almost infinite control over the output.

PMX was also the only system where the notes in the first partial, or pick-up, bar of the violin part weren’t automatically beamed. Also PMX is very fussy about final partial bars (such as in my example) – it requires you to formally define a partial bar by what is called a “blind meter change”.

ABC, or rather abcm2ps, is the least fussy of all the systems. For example if you make a mistake in your notation, and end up with a bar that has the wrong number of beats in it, ABC will just go ahead and try to print it out, even if it looks like rubbish. This makes debugging simple, as you can immediately see where you’ve gone wrong. On the other hand, ABC doesn’t support relative pitch notation, although this can be alleviated by defining your octave, and placing notes within it.

So which is better? For my purposes, ABC is certainly good enough and is by far and away the simplest, although Lilypond has the edge in terms of the way it is structured, and for professional control over the output. (Although, in fairness, ABC also allows fine control over the output, at least for placement of notes and other elements.) I don’t think MusixTeX/PMX is in the running: it’s too fiddly, and there are easier ways of obtaining equivalent output.


to compute the derivative of $\sin(x)$ “from first principles”. And this limit

$$\lim_{h\to 0}\frac{\sin(x+h)-\sin(x)}{h}$$

can be evaluated analytically in several ways. We could for example use the addition law for sine and write the limit as

$$\lim_{h\to 0}\left(\sin(x)\,\frac{\cos(h)-1}{h}+\cos(x)\,\frac{\sin(h)}{h}\right)$$

and then compute the individual limits. This is where that fine old mathematical chestnut

$$\lim_{h\to 0}\frac{\sin(h)}{h} = 1$$

arises.

Or we could use the sine difference formula

$$\sin(A)-\sin(B) = 2\cos\left(\frac{A+B}{2}\right)\sin\left(\frac{A-B}{2}\right)$$

and so write the limit as

$$\lim_{h\to 0}\frac{2\cos\left(x+\frac{h}{2}\right)\sin\left(\frac{h}{2}\right)}{h} = \lim_{h\to 0}\cos\left(x+\frac{h}{2}\right)\frac{\sin(h/2)}{h/2}.$$

Neither of these approaches is wrong, but as teaching tools they leave a lot to be desired. They are very abstract, and several removes from the geometric notion of the derivative as a rate of change, or of the gradient of a tangent. Beginning students, especially those with weak backgrounds, are likely to be bemused.

So here is a geometric approach. I discovered that it is very well known (well, something so simple was bound to be), but I’d never come across it until today. So here it is.

Start with $\sin(x)$ and $\sin(x+h)$ on a diagram:

Then $\sin(x+h)-\sin(x)$ is the vertical side of the small blue triangle in this next diagram:

Since $h$ is very small, the hypotenuse of this triangle will be very close to the arc length of the circle, which is also $h$, since we are working in radians. Thus the hypotenuse can be considered to be $h$.

Now let’s look at a few angles. First, the outer angles in the triangle created by $h$ will be equal, since the triangle is isosceles (the two long sides are both radii of the circle):

Both these angles will be equal to

$$\frac{\pi - h}{2} = \frac{\pi}{2} - \frac{h}{2}.$$

Since the upper green angle in the first diagram is

$$\frac{\pi}{2} - (x+h),$$

the upper angle in the blue triangle will be

$$\left(\frac{\pi}{2} - \frac{h}{2}\right) - \left(\frac{\pi}{2} - (x+h)\right)$$

which simplifies to

$$x + \frac{h}{2}.$$

Just to make this quite clear, here’s the top point:

Writing $\theta$ for the upper angle of the blue triangle, we have

$$\theta + \left(\frac{\pi}{2} - (x+h)\right) = \frac{\pi}{2} - \frac{h}{2}$$

and so

$$\theta = \left(\frac{\pi}{2} - \frac{h}{2}\right) - \left(\frac{\pi}{2} - (x+h)\right) = x + \frac{h}{2},$$

as we found above.

So, the blue triangle has hypotenuse $h$, vertical side $\sin(x+h)-\sin(x)$ and upper angle $x+\frac{h}{2}$:

This means that, for small $h$:

$$\sin(x+h)-\sin(x) \approx h\cos\left(x+\frac{h}{2}\right)$$

or alternatively that

$$\frac{\sin(x+h)-\sin(x)}{h} \approx \cos\left(x+\frac{h}{2}\right).$$

Taking the limit of both sides as $h$ tends to zero:

$$\lim_{h\to 0}\frac{\sin(x+h)-\sin(x)}{h} = \cos(x).$$

Voilà!

Note that the base of the blue triangle, checking with the first diagram, is

$$\cos(x) - \cos(x+h).$$

This means that

$$\cos(x)-\cos(x+h) \approx h\sin\left(x+\frac{h}{2}\right)$$

or

$$\frac{\cos(x+h)-\cos(x)}{h} \approx -\sin\left(x+\frac{h}{2}\right)$$

from which, by taking limits of both sides as $h$ tends to zero, we obtain

$$\lim_{h\to 0}\frac{\cos(x+h)-\cos(x)}{h} = -\sin(x).$$
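Both approximations are easy to sanity-check numerically; here is a small Python sketch (the step size and test point are arbitrary choices of mine):

```python
import math

x, h = 0.7, 1e-5   # arbitrary test point and small increment

# Per the blue triangle, (sin(x+h) - sin(x))/h is close to cos(x + h/2),
# and hence tends to cos(x) as h -> 0:
slope_sin = (math.sin(x + h) - math.sin(x)) / h
assert abs(slope_sin - math.cos(x + h / 2)) < 1e-9
assert abs(slope_sin - math.cos(x)) < 1e-4

# Likewise (cos(x+h) - cos(x))/h is close to -sin(x + h/2):
slope_cos = (math.cos(x + h) - math.cos(x)) / h
assert abs(slope_cos + math.sin(x + h / 2)) < 1e-9
assert abs(slope_cos + math.sin(x)) < 1e-4
```

Note how much closer the difference quotient is to $\cos(x+h/2)$ than to $\cos(x)$: the midpoint angle really is the natural one for the blue triangle.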
