Bayesian Statistics: Stochastic Handling of Wireless Coffee Shop Customers

Stochastic Handling of Coffee Shops

The following is meant to complement a response that Theo posted on LiquidBlur to a message from MaVeRiCK. The original two messages are given here:

MaVeRiCK's Message
Theo's Response

The following gives some details about what it really means to use stochastic methods to estimate the states of real objects in an environment full of uncertainty. It is meant to provide some conceptual understanding of these methods.

I recommend further reading on probability, statistics, and estimators.

It was posted on December 20, 2004.

MaVeRiCK, (I have a feeling this is going to echo a lot of what
mathking said) remember that what we're building here is a mapping from
measurements to estimated coordinates. Given the two measurements I've
taken, how do I estimate two coordinates?

What I want to show here is that (like mathking said) this has
nothing to do with certainty. This method is actually all about
uncertainty. We don't "prophesy" ("estimate") one SINGLE "CERTAIN"
coordinate until the VERY END of the method, and when we do that,
we actually carry with it a measure of how "certain" we are about
it. For example: the computer is located at 2 meters by 3 meters with a 2
square meter uncertainty. Or this message is spam, with a 10%
chance that it isn't. Or perhaps a more familiar example:
the country will vote 51% for Bush and 48% for Kerry with a 4%
uncertainty (yes, polling is another example of all of this!!).

So it's our job as engineers to minimize this uncertainty as much as
possible. We can do that by getting better sensors. We can also do that
by reducing the number of possible outcomes, which allows for
significantly poorer sensors. If we only care what table someone is
sitting at, we really don't need inch resolution. And on top of all of
this, if we can't do these things ahead of time, we can do them on
the fly (see mathking's post).

CONCEPTUALLY, how it works:
We Start Out Knowing Nothing, So EVERYTHING IS UNIFORM,
and THUS NO ESTIMATOR is good here.
Now, if you know nothing about your environment, then your two
measurements give you no useful information about the real coordinates
of the object that you're tracking. Thus, any estimate you pick will be
equally valid and thus equally invalid.

Mathematically, you can think of this as having uniform distributions at
P(A|B), P(A), and P(B) (the three terms of Bayes' rule, with A the
measurement and B the coordinate). In the expressions given in my previous
post, you can see that if all three of those are flat, uniform
distributions, then all three of those will be constants, and the
resulting P(B|A) will be a constant (flat and uniform).
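
To make that concrete, here is a minimal sketch of the flat case. I'm
assuming the expressions in my previous post boil down to Bayes' rule on
a discrete grid of coordinates; the function name, the grid size, and the
numbers are all made up for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule on a discrete grid of coordinates (an assumed setup).

    prior[i]      = P(B = b_i), how likely coordinate b_i is beforehand
    likelihood[i] = P(A = a | B = b_i) for the one measurement a we took
    Returns P(B = b_i | A = a) for every grid cell i.
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)  # P(A = a), the normalizing constant
    return [j / evidence for j in joint]


n = 5                        # hypothetical number of grid cells
flat_prior = [1.0 / n] * n   # we know nothing about where people sit
flat_likelihood = [0.2] * n  # the reading is equally likely everywhere
flat_posterior = posterior(flat_prior, flat_likelihood)
print(flat_posterior)        # every cell gets the same probability
```

Flat in, flat out: no cell is preferred over any other, so no estimator
can do better than a guess.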

The Effect of Knowing about Your Sensor
Now, what happens if you actually know something about your
sensors? What happens if you actually run through your entire environment
multiple times with a test object and see what measurements you get
from that test object on each pass? Assuming your sensor works better than
a divining rod, you probably will gain some information about the
object's location from the sensor. Imagine that you had a "perfect"
sensor. Even though you know nothing about the distribution of the
coordinates, you don't need to. Just knowing you have a perfect sensor
gives you absolute certainty that a measurement implies a particular
coordinate.

In reality, all sensors have a certain amount of uncertainty. And that
uncertainty will be reflected in the distribution of coordinates given a
particular measurement. But keep in mind that as long as the sensor does
better than a uniform distribution (as all sensors should), then your
distribution of coordinates given a particular measurement gets BETTER.
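
Here is a rough sketch of that effect. I'm assuming a one-dimensional
grid of coordinates, a flat prior, and a sensor whose error is roughly
Gaussian. The Gaussian model, the sigma values, and the reading of 0.5
are my own illustrative choices, not measurements of any real sensor.

```python
import math


def gaussian_likelihood(measurement, coords, sigma):
    """Unnormalized P(measurement | coordinate) for an assumed Gaussian sensor.

    The missing constant factor is the same for every coordinate, so it
    cancels when the posterior is normalized below.
    """
    return [math.exp(-0.5 * ((measurement - c) / sigma) ** 2) for c in coords]


def posterior(prior, likelihood):
    """Bayes' rule on a discrete grid: multiply, then normalize."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


coords = [i / 10 for i in range(11)]         # grid from 0.0 to 1.0
prior = [1.0 / len(coords)] * len(coords)    # still know nothing about people

# Same reading (0.5), two hypothetical sensors of different quality.
poor = posterior(prior, gaussian_likelihood(0.5, coords, sigma=0.5))
good = posterior(prior, gaussian_likelihood(0.5, coords, sigma=0.05))

print(max(poor), max(good))  # the better sensor gives a much sharper peak
```

Neither sensor is perfect, but both beat the flat case, and the better
sensor piles most of the probability onto a single grid cell.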

Knowing Where People Cluster
If you know something about the distribution of people, it only makes the
distribution of coordinates given a particular measurement get better.

And you can estimate this on the fly. As you run your coffee
shop longer and longer, you will be able to process a lot more data about
the distribution of your customers. You'll find that there are certain
tables that simply aren't used. You'll find that at certain TIMES there
are more people in your coffee shop, and when there are more people,
people start to fill up areas that aren't typically filled up. You get
a lot more information that allows you to pick a pretty good choice for
the distribution of coordinates in your shop.

And keep in mind that as much as this is a model, it can be completely
based off of measurements. We take lots and lots of measurements
and make HISTOGRAMS out of that data, and those histograms are what we
use to generate our probability distributions.

When we don't have measurements to help out, we can use models of the
environment to predict those distributions, but as time goes on, we can
replace all of those models with data taken from ACTUAL MEASUREMENTS.
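
As a sketch of how a histogram becomes a distribution, suppose we log
which table each customer ends up at. The table names and the log below
are invented, and the pseudo-count (plain additive smoothing) is just one
simple stand-in for "a model of the environment": with an empty log it
gives a flat distribution, and it matters less and less as real data
piles up.

```python
from collections import Counter


def histogram_prior(observations, cells, pseudo_count=1.0):
    """Turn logged locations into a probability for each cell.

    pseudo_count is an assumed smoothing term, not part of the original
    method; it keeps unseen cells from getting exactly zero probability.
    """
    counts = Counter(observations)
    total = len(observations) + pseudo_count * len(cells)
    return {c: (counts[c] + pseudo_count) / total for c in cells}


tables = ["window", "corner", "bar", "patio"]   # hypothetical shop layout
log = ["window", "window", "bar", "window", "corner", "bar", "window"]

prior = histogram_prior(log, tables)
no_data_yet = histogram_prior([], tables)       # falls back to uniform

print(prior)
print(no_data_yet)
```

You could keep one such histogram per hour of the day if the crowd
changes over time.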

The Effect of Resolution
And on top of all of this, if we don't care where ON a table a
computer sits but only on WHICH table it sits, it makes our job much
easier. We can say that once we've reduced our ESTIMATION UNCERTAINTY to
under the area of a TABLE (or even half the distance between tables!),
then we can stop.
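
A small sketch of why the coarser question is easier: take a fine-grained
distribution over grid cells and add up the probability that falls on
each table. The six cells, the two tables "A" and "B", and the numbers
are all hypothetical.

```python
def table_probabilities(cell_probs, table_of_cell):
    """Sum the probability of every grid cell that belongs to each table."""
    totals = {}
    for p, table in zip(cell_probs, table_of_cell):
        totals[table] = totals.get(table, 0.0) + p
    return totals


# Made-up distribution over six grid cells, three cells per table.
cell_probs = [0.05, 0.30, 0.35, 0.20, 0.05, 0.05]
table_of_cell = ["A", "A", "A", "B", "B", "B"]

by_table = table_probabilities(cell_probs, table_of_cell)
best_cell = max(cell_probs)
best_table = max(by_table, key=by_table.get)

print(best_cell)                        # confidence in the single best cell
print(best_table, by_table[best_table]) # confidence in the best table
```

With these numbers we are only about one-third sure of the exact cell but
more than two-thirds sure of the table, from the very same sensor data.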

Moving from Distributions to Estimations
So in the end, we do our best to narrow the DISTRIBUTION of possible
coordinates given certain measurements.

This is still a picture of uncertainty. In fact, we can even
measure this. We can take thousands and thousands of measurements
of a known environment and look at every time we took a particular
measurement. Hopefully, that measurement will only correspond with a
certain small range of coordinates. For whatever this range of coordinates
is, we can histogram the actual coordinates that correspond to this
particular measurement. If we do this for every possible measurement value,
we get a picture of how frequently each coordinate occurs given a particular
single measurement.
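
That bookkeeping might look something like the following. I'm assuming
the calibration run gives us (measurement, true coordinate) pairs and
that both have already been rounded into bins; the sample data is
invented.

```python
from collections import Counter, defaultdict


def conditional_histograms(samples):
    """Estimate P(coordinate | measurement) from calibration pairs.

    samples: iterable of (measurement, true_coordinate), both assumed to
    be binned already so that equal values land in the same bucket.
    """
    counts = defaultdict(Counter)
    for measurement, coord in samples:
        counts[measurement][coord] += 1

    result = {}
    for measurement, coord_counts in counts.items():
        total = sum(coord_counts.values())
        result[measurement] = {c: n / total for c, n in coord_counts.items()}
    return result


# Hypothetical calibration data: (sensor reading, known cell index).
calibration = [(0.5, 2), (0.5, 2), (0.5, 3), (0.5, 2), (0.7, 3), (0.7, 4)]

dist = conditional_histograms(calibration)
print(dist[0.5])  # how often each coordinate occurred when we read 0.5
```

Each entry of dist is one of the "pictures" described here: the
normalized histogram of true coordinates for a single measurement value.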

To illustrate, see the following image:



Each graph represents one possible shape of this "picture." The arrows
represent how the picture changes as we move from having no information
(the far left) to having near-perfect information (the far right). This
is a picture of the probability density function representing the
probability of an object B (a coordinate) given a CERTAIN measurement
A=a (i.e., taking a measurement and getting a). In this case, assume
the measurement from the sensor is 0.5. As we learn more about
our sensor or our environment, we gain certainty that our measurement
0.5 actually represents something about the real coordinate of
the object in the environment.

So given this picture, the real tricky thing is figuring out how to
estimate a "most likely" coordinate from this distribution. If the
distribution is a spike at one particular coordinate, then it's easy. Just
pick that coordinate (or a table or whatever).

If it's not a spike, you have to use things like "mean," "median," "mode,"
and a whole slew of other estimators that are best for particular types
of distributions. And that CHOICE of ESTIMATOR is a very complicated one
and carries with it a lot more mathematics than we've gone through so far.

But let's just say that "mean" is the estimator we want to use. In that
case, we just take the mean ("expected value") of the distribution we
found and call that our "coordinate estimate." And, additionally,
we try to publish some sort of data on how "certain" we are about that
estimate.
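
For a discrete distribution, that last step can be sketched in a few
lines. The coordinates and probabilities below are made up, and using
the standard deviation as the published "certainty" number is only one
reasonable choice among several.

```python
import math


def mean_and_spread(coords, probs):
    """Expected value and standard deviation of a discrete distribution."""
    mean = sum(c * p for c, p in zip(coords, probs))
    variance = sum(p * (c - mean) ** 2 for c, p in zip(coords, probs))
    return mean, math.sqrt(variance)


coords = [1.0, 2.0, 3.0, 4.0]   # hypothetical positions in meters
probs = [0.1, 0.6, 0.2, 0.1]    # P(coordinate | measurement), sums to 1

estimate, spread = mean_and_spread(coords, probs)
mode = coords[probs.index(max(probs))]  # the "mode" estimator, for contrast

print(estimate, spread, mode)
```

Notice that the mean and the mode disagree a little here because the
distribution is lopsided, which is exactly why the choice of estimator
matters.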

So NO, you can never be certain,
but that's the point!!
So no, you can never be certain about your estimate. BUT BECAUSE the
method is BUILT on understanding uncertainty, you have a much better
grasp on that uncertainty!!