This week I’m at the Mathematics and Statistics Industry Study Group, a yearly event where companies from Australia and New Zealand provide problems for the assembled mathematicians and statisticians to solve. I’m in a crowd of people working on a problem provided by the energy company AGL. They have given us smart meter data for 95 customers (there is a larger data set with 900 customers): electricity readings taken every 30 minutes over a 199-day period. That’s 48 readings a day, or 9552 observations for each customer! You can read the problem here.
So far everybody has (in the words of one participant) “thrown every imaginable package at the data and drawn lots of pretty pictures”. Other folk are using Excel, Minitab, R and other statistics packages; I’m using Octave. Today we have decided to break down the data into times of day: night (0000 – 0600), morning (0600 – 0900), daytime (0900 – 1700) and evening (1700 – 2400), and to look at electricity usage versus temperature over those times, and over the course of the 199 days, which go from winter to summer. We can also break down the data into weekdays and weekends.
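Here is a minimal sketch of how that split might be done, written in Python rather than Octave. The function names and the sample readings are made up for illustration, and it assumes each reading is stamped with the start of its half-hour interval, which may not match the real AGL files.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Period boundaries from the post, as (name, start_hour, end_hour).
PERIODS = [
    ("night", 0, 6),
    ("morning", 6, 9),
    ("daytime", 9, 17),
    ("evening", 17, 24),
]


def period_of(ts):
    """Return the time-of-day period for a reading that starts at ts."""
    for name, start, end in PERIODS:
        if start <= ts.hour < end:
            return name
    raise ValueError("hour out of range")


def day_type(ts):
    """Return 'weekend' for Saturday and Sunday, otherwise 'weekday'."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


def group_readings(readings):
    """Group (timestamp, usage) pairs by (day type, period)."""
    groups = defaultdict(list)
    for ts, usage in readings:
        groups[(day_type(ts), period_of(ts))].append(usage)
    return dict(groups)


# Made-up data: two days of half-hourly readings, a Friday then a Saturday.
start = datetime(2024, 6, 7)  # hypothetical start date
readings = [(start + timedelta(minutes=30 * i), 0.5) for i in range(96)]
groups = group_readings(readings)

for key in sorted(groups):
    print(key, len(groups[key]))
```

Each (day type, period) group can then be plotted against the temperatures for the same time stamps.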
Here is an example of one data plot: a customer’s usage by temperature (temperature is on the horizontal axis, in degrees Celsius, so the upper value, 40, is very hot: 104F).
This customer’s usage, as you can see, is skewed up to the right (where the temperature is hotter), which would seem to indicate the use of an air-conditioner. Other such plots are skewed to the left (colder temperatures), which would appear to indicate the use of heating in cold weather, but no air-conditioning in hot. Other plots have humps at both ends. A few people have been trying to fit quadratic functions to these plots, without a huge amount of success, as the correlation is so low.
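For the curious, a quadratic fit of usage against temperature could be sketched like this in plain Python. This is not the code anyone at the study group is using; the function names and the data are invented, and R-squared is used here as one possible way to put a number on how poor a fit is.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2; returns [c0, c1, c2]."""
    # Normal equations: sums of powers of x on the left, moments of y on the right.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve3(a, t)


def r_squared(xs, ys, coeffs):
    """Fraction of the variance in ys explained by the fitted quadratic."""
    c0, c1, c2 = coeffs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (c0 + c1 * x + c2 * x * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up data: a U-shaped customer with heating and cooling, no noise.
temps = [0, 5, 10, 15, 20, 25, 30, 35, 40]
usage = [0.002 * (t - 18) ** 2 + 0.3 for t in temps]

coeffs = fit_quadratic(temps, usage)
print(coeffs, r_squared(temps, usage, coeffs))
```

On clean invented data like this the fit is essentially perfect; on the real, widely scattered readings the R-squared value would be expected to come out much closer to zero.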
If you simply plot the data of a single customer over the entire data period you can see some interesting stuff. Here, for example, is one data set:
Notice the “break” towards the right (this corresponds to early summer). We can infer that this customer went away at that time, and the residual power usage would have been for a refrigerator, maybe security lights, etc.
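A break like this could also be picked out automatically. The sketch below, in Python, looks for the longest run of days whose total usage falls below a threshold. The function name, the threshold and the daily totals are all invented for illustration; a sensible threshold would have to be chosen per customer.

```python
def longest_low_run(daily_totals, threshold):
    """Return (start_index, length) of the longest run of consecutive days
    whose total is below threshold, or (None, 0) if there is no such day."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, total in enumerate(daily_totals):
        if total < threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


# Made-up daily totals: normal usage, a week away, normal again, a long weekend.
daily = [12.0] * 10 + [2.0] * 7 + [11.0] * 5 + [2.5] * 3 + [13.0] * 4
away = longest_low_run(daily, threshold=4.0)
print(away)
```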
I am not myself any closer to solving the problems proposed by AGL, but I’m getting a better handle on the data. I still intend to do some data smoothing, probably with a Gaussian filter, and see if it’s possible to classify customers based on the outputs. Or I might head off to the staffroom in search of tea.
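As a rough idea of what Gaussian smoothing involves, here is a minimal sketch in plain Python (again, not Octave). The function names, the choice of cutting the kernel off at three standard deviations, and the way the ends of the series are handled are all assumptions made for this example.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return normalised Gaussian weights for offsets -radius..radius."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumed cut-off at 3 sigma
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(values, sigma):
    """Smooth a series with a Gaussian-weighted moving average.

    Near the ends, only the samples that exist are used and the weights
    are renormalised, so the output has the same length as the input.
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        wsum = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
                wsum += w
        out.append(acc / wsum)
    return out


# Made-up data: one day of flat half-hourly usage with a single spike.
series = [1.0] * 48
series[20] = 5.0
smoothed = gaussian_smooth(series, sigma=2.0)
print(max(series), max(smoothed))
```

A larger sigma flattens the series more; the smoothed curves, rather than the raw readings, would then be the input to any attempt at classifying customers.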