Thursday, May 31, 2007

Correlation Factors

Provided I can find some way to construct probability curves of the chance of one sensor going off after another within some time window, I should be able to superimpose several curves to get a more accurate measure. However, certain sensors will predict better than others. To this end, it would be useful to find some measure of how related two sensors are. I decided to call this the correlation factor, with the idea that, to find the probability, we can take a weighted average of all the probability curves.

Since I'm planning to construct the curves using the number of hits within some time window of a sensor going off, the number of hits for a particular sensor can serve as the basis for this metric.

Looking back at the backward number of hits, the graph showing the correlation of sensors shows some outliers. Taking sensors above two standard deviations--or even one--seems to be a pretty good measure for our 150-sensor network.



The first idea for a correlation factor is the number of standard deviations above the mean number of hits within a certain time window. Or, for our purposes, (hits - mean(hits)) / sd(hits). Thus, for this dataset, the correlation factors would look like:
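The original analysis was done in R, but the correlation factor is just the z-score of each sensor's hit count. Here's a small Python sketch; the sensor names and hit counts are made up for illustration:

```python
# Correlation factor = z-score of each sensor's hit count within the window.
# Sensor names and hit counts below are hypothetical.
hits = {"s01": 4, "s02": 5, "s03": 3, "s04": 21, "s05": 6}

n = len(hits)
mean = sum(hits.values()) / n
# Sample standard deviation (n - 1 denominator), matching R's sd().
sd = (sum((h - mean) ** 2 for h in hits.values()) / (n - 1)) ** 0.5

factors = {s: (h - mean) / sd for s, h in hits.items()}
# Outliers like "s04" end up well above the rest; z-scores always average to 0.
```

Sensors whose factor exceeds 1 or 2 are the candidates discussed above.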



Though correlation factors around two or even one seem to indicate a connection, the fact that the factor can go up to 6 is a little worrisome. We could alter the measure without changing the formula by changing our definition of the number of hits. This may also help with constructing the probability curves.

I'll end this with two graphs: the first is of the unscaled curves. The second is of the curves scaled using the correlation factors.
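To make the scaling concrete, here is a Python sketch (the real analysis was in R) of combining several probability curves with the correlation factors as weights. The curves, time bins, and weights are all made up:

```python
# Weighted average of (hypothetical) probability curves, one per correlated
# sensor. Each curve gives P(hit) per time bin and sums to 1.
curves = {
    "s04": [0.1, 0.6, 0.2, 0.1],   # peaks early in the window
    "s07": [0.0, 0.2, 0.5, 0.3],   # peaks later
}
weights = {"s04": 1.8, "s07": 1.1}  # made-up correlation factors

total = sum(weights.values())
n_bins = len(next(iter(curves.values())))
combined = [
    sum(weights[s] * curves[s][i] for s in curves) / total
    for i in range(n_bins)
]
# Normalizing by the total weight keeps the combined curve a probability curve.
```

The better-correlated sensor pulls the combined curve toward its own peak, which is the point of weighting by the correlation factor.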

Monday, May 28, 2007

Backward and forward correlation

I did this work a few weeks ago, and I think it's pretty cool.

It started out with the idea of constructing probability curves from just the distances in time between motion events in order to predict if a motion sensor will go off in the future. I'm hoping it has more applications, though.

Oh, yeah, and I did all this analysis and graphing and stuff using R.

Forward Analysis

So if one sensor goes off somewhere, what's the chance that another sensor will go off and when?

This situation is easiest in a corridor. On average, only one answer should appear: however long it takes to walk from one sensor to the other. So I looked at the data from a sensor in the north corridor of the MediaLab, looked at all the data that came after each motion event (within 40 seconds), and here's what I came up with:


These are the numbers of hits within 40 seconds after a certain motion sensor went off (keep in mind, this is not after a single motion event, but aggregated over all the motion events belonging to that sensor). I sorted them so that it's evident that several sensors seem to correlate more strongly with this one, since they have many more hits.
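The counting step can be sketched in Python (the original work was in R); all timestamps below are hypothetical, in milliseconds:

```python
WINDOW_MS = 40_000  # 40-second forward window

# Hypothetical event timestamps (ms) per sensor, sorted in time.
events = {
    "ref": [0, 100_000],
    "a":   [5_000, 105_000, 300_000],  # tends to fire shortly after "ref"
    "b":   [250_000],                  # unrelated to "ref"
}

def forward_hits(ref_events, other_events, window=WINDOW_MS):
    """Count another sensor's events falling within `window` ms
    after any event of the reference sensor."""
    return sum(
        any(0 < t - r <= window for r in ref_events)
        for t in other_events
    )

hits = {s: forward_hits(events["ref"], ts) for s, ts in events.items() if s != "ref"}
# Sorting `hits` by value reproduces the ordered bar graph described above.
```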

Now I chose a couple of sensors, and plotted the density function of the time it took for the sensor to go off after the motion sensor I was looking at. There is an obvious peak on each curve at the time (in milliseconds) that it takes to walk from one sensor to the next.
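The density step used R's kernel density estimation; as a rough stand-in, here is a histogram-based Python sketch over made-up lag times, where the modal bin plays the role of the walking-time peak:

```python
# Hypothetical lags (ms) from a reference event to the following hit on a
# correlated sensor. A histogram stands in for R's density() here.
lags = [4_800, 5_100, 5_300, 4_900, 5_000, 12_000]

BIN_MS = 1_000
counts = {}
for lag in lags:
    b = lag // BIN_MS
    counts[b] = counts.get(b, 0) + 1

# Density per ms: normalized so the bins integrate to 1.
density = {b: c / (len(lags) * BIN_MS) for b, c in counts.items()}
peak_bin = max(counts, key=counts.get)  # walking time shows up as the modal bin
```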


Backward Analysis

There's another way to approach the data, though. Instead of looking 40 seconds after a motion sensor's events, we can look a few (I chose 10) seconds before the motion sensor goes off. This worked better for my initial idea of using the curves to predict if a sensor was about to go off. When I plotted and ordered the number of hits in the 10 seconds before the events of a sensor in the Tangible Media corridor, I got this graph:
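The backward count is the mirror image of the forward one; here's a Python sketch (the actual analysis was in R) with made-up timestamps in milliseconds:

```python
WINDOW_MS = 10_000  # 10-second backward window

# Hypothetical event timestamps (ms) per sensor, sorted in time.
events = {
    "ref": [50_000, 200_000],
    "a":   [43_000, 195_000],  # tends to fire just before "ref"
    "b":   [120_000],          # unrelated to "ref"
}

def backward_hits(ref_events, other_events, window=WINDOW_MS):
    """Count another sensor's events falling within `window` ms
    before any event of the reference sensor."""
    return sum(
        any(0 < r - t <= window for r in ref_events)
        for t in other_events
    )

hits = {s: backward_hits(events["ref"], ts) for s, ts in events.items() if s != "ref"}
```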

This is actually even more drastic than the previous similar graph. A lot of the sensors seem to have no correlation (the leftmost data points). Now we can look at the outlying data points:
This is a map of the sensor locations, with the Tangible Media sensor in question in red, those outside of two standard deviations from the mean number of hits in green, and those between one and two standard deviations in blue. This was one of the better results, but even with this, one can see that there's a sensor in the cafe area that is obviously unrelated to the sensor in question, probably because it goes off a lot.

We can then plot the density function as we did before, and similar graphs arise. Notice that most curves peak at the same time as another curve. This can be explained by the two directions a person can walk in to reach the motion sensor. It should also be noted that these motion sensors were graphed because their hit counts were more than two standard deviations above the average.

This process is general and can easily be repeated. The results above are fairly good. A counterexample is a motion sensor in the cafe area. Once again, we have a few outlying sensor hits (not shown), but when we actually plot the density functions of the sensors more than two standard deviations above the average number of hits, we get this graph:

So, the density of hits seems pretty evenly spread over the 10 second window. This gives us almost no predictive capabilities for this sensor. Still, the graph below shows what we can infer from the data.
This graph, though, shows that the process is not completely wasted on such spaces: the sensors it correlates to are a rather kitchen-like area, and perhaps a natural subsection of the whole cafe area. It should be noted that the red coloring of the sensor in question actually masks the fact that this sensor in fact correlates to itself (it's green underneath).

Limitations make us special

I'm always a little worried about what I'm working on in this project because I'm not sure how useful it is. Now that I'm trying to do some data analysis, that's turned into me being worried that other people have already done this kind of analysis and that I'm going to find that out after investing a lot of time and effort into some problem.

I think I've found a way that OpenSpace is different than other sensor net projects: our limitations. OpenSpace only has the network of motion sensors, put up a little haphazardly by Sarah, Isha, and me. The motion sensors are not perfect; their placement was initially done using pixels on my laptop, and those pixel locations were later converted into feet (which I later had to convert into meters... too many different units). Oh, and we have no access to cameras or microphones, or really any way to confirm that things are working aside from our own observations.

These limitations let me work solely with the motion events, though. With such a limited data set, I can try to thoroughly explore its capabilities and extend its assumed limits. Additionally, I really like the idea that this work could later be extended into a different space, so having an easy and adaptable set-up would be really nice.

OK, that's all well and good. What do I actually have, though? They're all ideas at this point. I think I can get rid of the placement element completely, at least for data analysis. To stay true to the idea of "exploring the social life of a building," it would be best to leave divides like "cafe area," "plw," and "corridor" to the data itself. Maybe there won't actually be names, but we can still construct distinct areas. To do this, we can observe the relations between motion events (perhaps giving them an assigned numerical value). And since we're doing away with the spatial dimension, all we have is their distances in time. Singly, this doesn't say much, but by compounding all of these time distances, I'm quite sure we can extend the apparently evident limitations of our system.

Saturday, May 26, 2007

What's this all about

I've never blogged before, but I want to set this up to record my progress on the OpenSpace project.

Some background on the project can be found here.