Thursday, May 31, 2007

Correlation Factors

Provided I can find some way to construct probability curves for the chance of one sensor going off after another within some time window, I should be able to superimpose several curves to get a more accurate estimate. However, some sensors will be better predictors than others. To this end, it would be useful to find some measure of how related two sensors are. I decided to call this the correlation factor, with the idea that to find the probability, we can take the weighted average of all the probability curves, using the correlation factors as weights.
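As a rough sketch of the weighted-average idea, the snippet below combines several probability curves sampled on the same time grid. The function name `combine_curves`, the list-of-lists representation, and the requirement that the weights sum to a positive number are all assumptions made for illustration; none of them come from the actual analysis.

```python
def combine_curves(curves, weights):
    """Weighted average of probability curves.

    curves:  list of curves, each a list of probabilities sampled at the
             same time offsets (assumed representation).
    weights: one weight per curve, e.g. a correlation factor.
    """
    if len(curves) != len(weights):
        raise ValueError("need exactly one weight per curve")
    total = sum(weights)
    if total <= 0:
        # Assumption: weights must sum to a positive value, otherwise the
        # average is undefined or flips sign.
        raise ValueError("weights must sum to a positive value")
    n = len(curves[0])
    if any(len(c) != n for c in curves):
        raise ValueError("all curves must have the same length")
    # At each time offset, average the curve values using the weights.
    return [
        sum(w * c[i] for c, w in zip(curves, weights)) / total
        for i in range(n)
    ]


# Hypothetical example: two curves, the second weighted three times as heavily.
combined = combine_curves([[0.2, 0.4], [0.6, 0.8]], [1, 3])
```

Dividing by the sum of the weights keeps the result between the smallest and largest input value at each time offset, provided no individual weight is negative.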

Since I'm planning to construct the curves using the number of hits within some time window of a sensor going off, the number of hits for a particular sensor can serve as the basis for this metric.
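To make "number of hits" concrete, here is one possible way to count them. It assumes the data is a list of `(time, sensor)` events and counts, for each firing of a chosen trigger sensor, every other sensor that fires within `window` time units afterwards. The event format, the name `count_hits`, and the forward-looking window are all assumptions; a backward count would just reverse the time comparison.

```python
from collections import Counter


def count_hits(events, trigger, window):
    """Count firings of other sensors within `window` after each trigger firing.

    events:  list of (time, sensor_id) tuples (assumed format).
    trigger: the sensor whose firings open a time window.
    window:  window length, in the same units as the event times.
    """
    hits = Counter()
    trigger_times = [t for t, s in events if s == trigger]
    for t0 in trigger_times:
        for t, s in events:
            # A hit is any other sensor firing strictly after the trigger
            # and no later than `window` time units after it.
            if s != trigger and 0 < t - t0 <= window:
                hits[s] += 1
    return hits


# Hypothetical events: sensor "A" fires at t=0 and t=10.
events = [(0, "A"), (1, "B"), (2, "C"), (10, "A"), (11, "B"), (30, "C")]
hits = count_hits(events, "A", 5)
```

This brute-force version compares every trigger firing with every event, which is fine for a sketch but would be slow on a large log.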

Looking back at the backward number of hits, the graph showing the correlation of sensors has some clear outliers. Picking out the sensors that lie more than two standard deviations above the mean (or even just one) seems to be a pretty good way to identify related sensors in our 150-piece sensor network.



The first idea for a correlation factor is the number of standard deviations by which a sensor's hit count, within a certain time window, exceeds the mean hit count; that is, (hits - mean(hits)) / sd(hits). For this dataset, the correlation factors would look like:



Though correlation factors measured this way seem to indicate a connection at values of two or even one, the fact that they can go as high as 6 is a little worrisome. We could alter the measure slightly, without changing the formula, by changing how we define the number of hits. This may also help with constructing the probability curves.
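For reference, the formula (hits - mean(hits)) / sd(hits) can be sketched as below. The dictionary of hit counts is made-up data, the name `correlation_factors` is hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption about what sd means here.

```python
from statistics import mean, stdev


def correlation_factors(hits):
    """Map each sensor to (hits - mean(hits)) / sd(hits).

    hits: dict of sensor_id -> hit count (assumed format).
    Uses the sample standard deviation; this is an assumption.
    """
    values = list(hits.values())
    if len(values) < 2:
        raise ValueError("need at least two sensors to compute a spread")
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # Every sensor has the same count, so none stands out.
        return {sensor: 0.0 for sensor in hits}
    return {sensor: (count - mu) / sigma for sensor, count in hits.items()}


# Made-up hit counts for three sensors.
factors = correlation_factors({"s1": 1, "s2": 2, "s3": 3})
```

Sensors with below-average hit counts get negative factors under this formula, so using the factors directly as weights would need some extra rule, such as dropping or clipping the negative ones.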

I'll end this with two graphs: the first shows the unscaled curves, and the second shows the curves scaled using the correlation factors.
