I didn’t explain the problem well enough in my last post: AGL, the energy supply company, are trying (among other things) to create several “reference profiles” of energy use, to which customers can be compared. This will give them a better idea about supply and demand, and allow them to more accurately forecast energy requirements.
This was a problem more statistical than mathematical, and people were throwing all sorts of clustering analysis techniques (about which I know nothing) at it. However, an averaging plot showed two parallel “clumps” of data, which a single curve of best fit couldn’t describe. Since I know nothing about statistics, but quite a lot about imaging, I had the idea of treating each customer plot as an image:
Notice the two clumps at the bottom left: this is an example of a plot to which a curve of best fit could not usefully be applied.
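To see why a single curve of best fit fails on a plot like this, here is a small Python sketch with made-up data. It has two parallel bands of readings, 2 units apart, fitted with one ordinary least-squares line. The fitted line runs down the empty gap between the bands, so every point misses it by about half the gap. The function name and the numbers are purely illustrative.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data: two parallel "clumps" with the same slope, offset by 2 units.
xs = list(range(10)) * 2
ys = [0.1 * x + 1.0 for x in range(10)] + [0.1 * x + 3.0 for x in range(10)]

slope, intercept = least_squares_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(f"fit: y = {slope:.2f}x + {intercept:.2f}")
print("largest / smallest |residual|:", max(map(abs, residuals)), min(map(abs, residuals)))
```

The fit comes out as roughly y = 0.1x + 2, a line where no reading actually lies, which describes neither clump.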
The image has been divided into unequal quadrants, of which the upper-left and upper-right parts show high energy usage at cold and hot temperatures respectively. The vertical dividing line is at 20 °C (68 °F). The horizontal line is one-quarter of the way up from the bottom; this seemed (without any analysis) a reasonable place to separate “standard” energy usage (below the line) from “high” usage (above the line). Customers can then be classified by choosing thresholds for “high cold” and “high hot” usage, and grouping them according to whether their usage in each of those upper quadrants is above or below the corresponding threshold.
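A minimal sketch of the quadrant classification, in Python. It assumes each customer's data is a pair of equal-length lists of temperatures (°C) and usage readings. It also assumes that the quantity thresholded in each upper quadrant is the share of that customer's readings landing there (the post doesn't pin down exactly what is compared), and that the horizontal line sits at one quarter of a usage-axis maximum shared by all customers. The names `quadrant_shares`, `classify` and the threshold values are hypothetical.

```python
from dataclasses import dataclass

# Dividers from the post: 20 °C vertically, one quarter of the way up the usage axis.
TEMP_SPLIT_C = 20.0
USAGE_FRACTION = 0.25


@dataclass
class QuadrantShares:
    high_cold: float  # share of readings above the usage line and below 20 °C
    high_hot: float   # share of readings above the usage line and at or above 20 °C


def quadrant_shares(temps, usage, usage_max):
    """Share of one customer's readings that fall in each upper quadrant.

    usage_max is the top of the usage axis; it is assumed (hypothetically) to be
    shared by all customers so that every plot is cut at the same height.
    """
    if not temps or len(temps) != len(usage):
        raise ValueError("need non-empty temperature and usage series of equal length")
    cut = USAGE_FRACTION * usage_max
    high = [t for t, u in zip(temps, usage) if u > cut]  # temperatures of "high" readings
    n = len(temps)
    cold = sum(1 for t in high if t < TEMP_SPLIT_C)
    return QuadrantShares(high_cold=cold / n, high_hot=(len(high) - cold) / n)


def classify(shares, cold_threshold, hot_threshold):
    """Label a customer 'high' or 'standard' separately for cold and hot weather."""
    cold = "high" if shares.high_cold > cold_threshold else "standard"
    hot = "high" if shares.high_hot > hot_threshold else "standard"
    return cold, hot


# Made-up readings for one customer (°C, usage in arbitrary units).
temps = [5, 10, 15, 22, 28, 33]
usage = [3.0, 2.5, 0.5, 0.4, 0.6, 2.8]
shares = quadrant_shares(temps, usage, usage_max=4.0)
print(shares, classify(shares, cold_threshold=0.2, hot_threshold=0.2))
```

Here two of the six readings sit in the upper-left quadrant and one in the upper right, so with both thresholds at 0.2 this customer comes out as ("high", "standard"): heavy heating use, ordinary cooling use.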
The nice thing about this approach is that it is very easy, very transparent, and adjustable: for example, by using a finer grid on the image, by subdividing the original data by time of day or into weekdays and weekends (or both), or by choosing the cutoff between standard and high usage with something better than judging “by eye”.
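Some of these refinements can be sketched the same way, again in Python with hypothetical names. `grid_features` uses a finer n_u × n_t grid that turns a customer's scatter plot into a small "image" of reading shares. `split_by_daytype` separates weekday and weekend readings. `usage_cutoff` is one possible data-driven cutoff (a percentile of everyone's pooled usage, the 75th by default) in place of the by-eye line. The grid size, axis ranges and percentile are all assumptions, not values from the workshop.

```python
import statistics
from datetime import date


def grid_features(temps, usage, t_range, u_range, n_t=4, n_u=4):
    """Rasterise one customer's (temperature, usage) scatter into an n_u x n_t grid.

    Each cell holds the share of readings falling in it; row 0 is the lowest usage
    band and column 0 the coldest temperature band. Out-of-range readings are clamped.
    """
    if not temps or len(temps) != len(usage):
        raise ValueError("need non-empty temperature and usage series of equal length")
    (t_lo, t_hi), (u_lo, u_hi) = t_range, u_range
    grid = [[0.0] * n_t for _ in range(n_u)]
    for t, u in zip(temps, usage):
        t = min(max(t, t_lo), t_hi)
        u = min(max(u, u_lo), u_hi)
        col = min(int((t - t_lo) / (t_hi - t_lo) * n_t), n_t - 1)
        row = min(int((u - u_lo) / (u_hi - u_lo) * n_u), n_u - 1)
        grid[row][col] += 1
    n = len(temps)
    return [[count / n for count in row] for row in grid]


def split_by_daytype(days, temps, usage):
    """Separate readings into weekday and weekend series (Saturday and Sunday = weekend)."""
    groups = {"weekday": ([], []), "weekend": ([], [])}
    for d, t, u in zip(days, temps, usage):
        key = "weekend" if d.weekday() >= 5 else "weekday"
        groups[key][0].append(t)
        groups[key][1].append(u)
    return groups


def usage_cutoff(pooled_usage, percentile=75):
    """One possible data-driven 'standard vs high' line: a percentile of all customers' usage."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    return statistics.quantiles(pooled_usage, n=100)[percentile - 1]


# Made-up example: four readings, Friday 5 January to Monday 8 January 2024.
days = [date(2024, 1, 5), date(2024, 1, 6), date(2024, 1, 7), date(2024, 1, 8)]
temps = [0, 10, 25, 39]
usage = [0.5, 3.5, 1.5, 2.5]
print(grid_features(temps, usage, t_range=(0, 40), u_range=(0, 4), n_t=2, n_u=2))
print(split_by_daytype(days, temps, usage))
print(usage_cutoff([1, 2, 3, 4, 5, 6, 7]))
```

Splitting by day type first and then calling `grid_features` on each group gives the "both" option; flattening each grid row by row yields a fixed-length vector per customer that can be thresholded or compared just like the two upper quadrants.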
The workshop ended before I could explore this idea any further… so I might play around with this on my own some more.