WEBVTT
00:05.490 --> 00:08.860
So, welcome back to the lecture on automotive vision.
00:09.760 --> 00:15.480
Yeah, last Monday we started to talk about tracking objects in the
00:15.480 --> 00:19.720
three-dimensional world and determining their movement.
00:20.500 --> 00:24.640
That means their velocity, their direction of movement, etc.
00:25.600 --> 00:29.100
And yeah, we started with a traditional regression approach.
00:29.580 --> 00:33.900
And just to remind you of the last slide that we discussed, that was
00:33.900 --> 00:35.340
actually this one.
00:35.420 --> 00:39.780
We assume that we have a camera or another sensor that is measuring
00:39.780 --> 00:43.760
the position of a vehicle that we observe over time.
00:44.260 --> 00:47.140
So, in this case, we have four different points in time.
00:47.760 --> 00:51.880
For each point in time, we get a measurement where the vehicle is.
00:51.980 --> 00:55.700
Of course, all these measurements are suffering from imprecision and
00:55.700 --> 00:56.080
noise.
00:56.300 --> 01:01.000
So, they are not exactly the true position of the vehicle, but just a
01:01.000 --> 01:01.960
measurement of it.
01:01.960 --> 01:05.940
Then we said, okay, if we assume that the vehicle is moving with
01:05.940 --> 01:11.000
constant velocity along this road here, which is given by a certain
01:11.000 --> 01:16.700
coordinate system, then we can say that the position of the vehicle at
01:16.700 --> 01:22.860
a certain point in time t can be described by means of this linear law
01:22.860 --> 01:29.980
of movement, a certain initial position plus the velocity times time.
01:29.980 --> 01:33.280
And, of course, what we do is we measure the position.
01:33.480 --> 01:37.760
So, the measurement itself is affected by a certain amount of noise
01:37.760 --> 01:40.740
and imprecision that we might add here.
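As a small illustration (not part of the lecture itself), this measurement process can be sketched in Python; the values for the initial position, the velocity, and the noise level are made up:

```python
import random

def simulate_measurements(x0, v, times, noise_std, seed=0):
    """Simulate noisy 1-D position measurements of x(t) = x0 + v*t."""
    rng = random.Random(seed)
    return [x0 + v * t + rng.gauss(0.0, noise_std) for t in times]

# Four points in time, as in the example on the slide
times = [0.0, 1.0, 2.0, 3.0]
meas = simulate_measurements(x0=5.0, v=2.0, times=times, noise_std=0.1)
```

Each entry of `meas` is the true position plus Gaussian noise, so it is close to, but not exactly, the true position.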
01:41.300 --> 01:47.800
And then we were saying, okay, we do know the sensed position on the
01:47.800 --> 01:49.840
left-hand side for some points in time.
01:50.380 --> 01:54.840
We know the points in time t at which we made the measurements.
01:55.540 --> 02:01.420
And what we do not know is the initial position x zero and the
02:01.420 --> 02:02.320
velocity v.
02:02.740 --> 02:08.080
So, these are the two unknown variables and x of t, the sensed
02:08.080 --> 02:15.560
position at certain times t, and the points in time t are what we
02:15.560 --> 02:15.840
know.
02:16.280 --> 02:20.520
And now we want to, well, determine the unknown parameters from our
02:20.520 --> 02:20.940
measurement.
02:21.500 --> 02:26.760
And the way we can do so is that we say, okay, for each measurement,
02:27.080 --> 02:33.380
we can enter x of t here on the left-hand side, we can enter t here on
02:33.380 --> 02:38.460
the right-hand side, and we aim for finding x zero and v such that
02:38.460 --> 02:43.680
these equalities that we get for all the measurements are almost
02:43.680 --> 02:44.320
fulfilled.
02:45.040 --> 02:48.340
Now, if we have more than two measurements, then it's very likely that
02:48.340 --> 02:53.180
we don't find any initial position and velocity with which all the
02:53.180 --> 02:56.260
equations are met completely perfectly.
02:57.540 --> 03:03.820
But we aim for a solution that is good for all the
03:03.820 --> 03:04.320
measurements.
03:04.320 --> 03:08.760
And this can be done by minimizing this, actually the difference.
03:09.240 --> 03:11.980
So, this is the difference of the left-hand side and the right-hand
03:11.980 --> 03:17.900
side of this equation, taking the square of it, and then summing up
03:17.900 --> 03:23.960
this error in this equation, so to say, that we get over all the
03:23.960 --> 03:27.060
measurements, the n measurements that we have.
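A hypothetical sketch of this error measure in Python (the function and variable names are my own, not from the lecture):

```python
def sse(x0, v, times, measurements):
    """Sum of squared differences between measured and modeled positions."""
    return sum((x - (x0 + v * t)) ** 2 for t, x in zip(times, measurements))

# Noise-free measurements of x(t) = 1 + 2*t
times = [0.0, 1.0, 2.0]
meas = [1.0, 3.0, 5.0]
print(sse(1.0, 2.0, times, meas))  # true parameters: error 0.0
print(sse(0.0, 2.0, times, meas))  # wrong x0: error grows to 3.0
```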
03:27.500 --> 03:32.680
So, now we have a kind of error measure that describes how good a
03:32.680 --> 03:37.880
certain choice of x zero and v fits to the measurements.
03:38.120 --> 03:46.000
So, the larger the value of this sum is, the, well,
03:46.000 --> 03:53.700
the worse the fit is.
03:53.960 --> 03:58.140
That means x zero and v do not fit to the measurements.
03:58.580 --> 04:03.160
If the sum here is very small, if it is close to zero, then we know
04:03.160 --> 04:08.860
that the certain choice of x zero and v fits very well to the
04:08.860 --> 04:09.500
measurements.
04:09.800 --> 04:15.860
That means that we found good choices of x zero and v.
04:16.660 --> 04:21.120
So, how do we determine the best solution?
04:21.380 --> 04:25.840
Well, we take this term, we calculate the partial derivatives with
04:25.840 --> 04:32.020
respect to the unknown variables x zero and v, we zero those partial
04:32.020 --> 04:36.640
derivatives and resolve the system of equations that we get.
04:36.760 --> 04:37.880
And this is shown here.
04:38.340 --> 04:43.760
So, once you do all the calculations, you will see that we end up with
04:43.760 --> 04:46.180
such a system of linear equations.
04:46.520 --> 04:50.960
And as soon as the matrix on the left-hand side has full rank, which
04:50.960 --> 04:56.660
typically is the case as soon as we have at least two measurements,
04:57.180 --> 05:03.720
then we can resolve the system of equations and get the best values
05:03.720 --> 05:05.620
for x zero and v.
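For the constant-velocity case, the resulting 2x2 system of normal equations can be written down and solved directly; a minimal sketch (my own notation, with the sums taken over the n measurements):

```python
def fit_constant_velocity(times, measurements):
    """Least-squares estimate of (x0, v) from the 2x2 normal equations:
        n * x0      + sum(t) * v    = sum(x)
        sum(t) * x0 + sum(t^2) * v  = sum(t * x)
    """
    n = len(times)
    st = sum(times)
    stt = sum(t * t for t in times)
    sx = sum(measurements)
    stx = sum(t * x for t, x in zip(times, measurements))
    # The determinant is nonzero (full rank) as soon as there are at
    # least two measurements at distinct points in time.
    det = n * stt - st * st
    x0 = (sx * stt - st * stx) / det
    v = (n * stx - st * sx) / det
    return x0, v
```

For noise-free measurements of x(t) = 1 + 2t this recovers exactly x0 = 1 and v = 2.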
05:06.880 --> 05:12.100
If we like, we can also derive in a second step variances, that means
05:12.100 --> 05:17.520
measures of uncertainty that tell us how sure we can be about the
05:17.520 --> 05:20.440
values of x zero and v that we have found.
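One common way to obtain such uncertainty measures is the standard least-squares covariance estimate; this NumPy sketch is an illustration, not necessarily the exact derivation meant in the lecture:

```python
import numpy as np

def fit_with_uncertainty(times, measurements):
    """Fit x(t) = x0 + v*t and return (estimates, variances).

    The parameter covariance is estimated as sigma^2 * (A^T A)^{-1},
    where sigma^2 is the residual variance with n - 2 degrees of freedom.
    """
    t = np.asarray(times, dtype=float)
    x = np.asarray(measurements, dtype=float)
    A = np.column_stack([np.ones_like(t), t])      # design matrix [1, t]
    theta, *_ = np.linalg.lstsq(A, x, rcond=None)  # theta = (x0, v)
    residuals = x - A @ theta
    sigma2 = residuals @ residuals / (len(t) - 2)
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return theta, np.diag(cov)
```

Small variances mean we can be rather sure about the fitted x0 and v; noisy or few measurements give large variances.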
05:21.880 --> 05:29.360
So, we can even extend our approach.
05:30.000 --> 05:33.800
So, up to now we were assuming the vehicle is moving with constant
05:33.800 --> 05:34.340
velocity.
05:35.840 --> 05:39.200
However, in some situations this is obviously not true.
05:39.340 --> 05:44.460
So, we assume the car is waiting in front of traffic lights and now
05:44.460 --> 05:47.600
the traffic light switches to green and the car accelerates.
05:48.020 --> 05:52.200
Of course, in such a situation, definitely this assumption that
05:52.200 --> 05:55.140
the vehicle is moving with constant velocity is not met.
05:55.520 --> 06:01.220
So, in this case, we need another motion model that we apply to
06:01.220 --> 06:03.560
estimate the movement of the vehicle.
06:03.780 --> 06:07.020
And for instance, we could assume in this case that the vehicle is
06:07.020 --> 06:09.360
moving with constant acceleration.
06:09.920 --> 06:13.880
And in such a case, we will get a motion model like the one here on
06:13.880 --> 06:18.380
the slide, that x of t can be described by an initial position x zero,
06:18.980 --> 06:25.020
an initial velocity v zero, and a constant acceleration a.
06:25.790 --> 06:30.540
That means all the measurements that we make, we assume that all those
06:30.540 --> 06:36.640
are determined from such a linear... from such a relationship that we
06:36.640 --> 06:37.240
see here.
06:38.340 --> 06:44.080
Yeah, and that the movement of the vehicle can be explained by these
06:44.080 --> 06:47.640
three parameters x zero, v zero, and a.
06:48.300 --> 06:52.180
Now, we can proceed as in the case before.
06:52.540 --> 06:56.660
We make a certain number of measurements, where we measure the
06:56.660 --> 06:59.240
position of the car at each point in time.
06:59.740 --> 07:06.680
Then we create an optimization problem, where we say, okay, what we
07:06.680 --> 07:12.780
want is for each measurement, we want that the left-hand side and the
07:12.780 --> 07:16.960
right-hand side of this equation fit as well as possible.
07:17.520 --> 07:20.180
That should hold for all measurements in parallel.
07:20.880 --> 07:24.380
That means what we do is, we create this difference of the left-hand
07:24.380 --> 07:28.260
side and the right-hand side of this motion equation here.
07:28.740 --> 07:33.120
We sum up the squares of those differences over all measurements, and
07:33.120 --> 07:38.780
then we state, okay, we want to find those parameters which minimize
07:38.780 --> 07:39.660
this term.
07:40.040 --> 07:46.600
Again, we calculate the partial derivatives of this term, zero those,
07:46.760 --> 07:49.960
and resolve the system of equations with respect to the unknown
07:49.960 --> 07:54.020
variables a, v zero, and x zero.
07:54.140 --> 08:00.420
This also yields a system of linear equations, and yeah, if you like,
08:00.740 --> 08:01.980
just derive it at home.
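If you want to check your derivation at home, the constant-acceleration fit amounts to fitting a quadratic polynomial in t; here is a sketch using NumPy's polynomial fit as an equivalent shortcut, not the explicit normal equations from the lecture:

```python
import numpy as np

def fit_constant_acceleration(times, measurements):
    """Least-squares fit of x(t) = x0 + v0*t + 0.5*a*t**2.

    np.polyfit solves the same linear system one obtains by zeroing the
    partial derivatives with respect to x0, v0 and a.
    """
    c2, c1, c0 = np.polyfit(times, measurements, deg=2)
    return c0, c1, 2.0 * c2   # x0, v0, a

# Noise-free data generated with x0 = 1, v0 = 2, a = 4
t = [0.0, 1.0, 2.0, 3.0]
x = [1.0 + 2.0 * ti + 0.5 * 4.0 * ti ** 2 for ti in t]
x0, v0, a = fit_constant_acceleration(t, x)
```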
08:03.180 --> 08:06.800
What are the advantages of such an approach, of regression approaches
08:06.800 --> 08:07.420
in general?
08:07.800 --> 08:11.460
Well, so first of all, these approaches are simple.
08:11.900 --> 08:15.720
The calculations that you need to do are rather simple.
08:16.260 --> 08:19.400
It's easy to use, with efficient calculations.
08:19.840 --> 08:23.140
The only thing, the most expensive thing that you have to do
08:23.140 --> 08:27.820
computationally is, you have to solve a system of linear equations.
08:28.000 --> 08:33.480
But this is a small system of linear equations, so it is still not really
08:33.480 --> 08:34.560
time-consuming.
08:35.940 --> 08:41.800
So that works easily for all straight movements in a one-dimensional
08:41.800 --> 08:42.300
case.
08:42.360 --> 08:46.440
So if you only consider the position of a vehicle along a road, for
08:46.440 --> 08:49.520
instance. It also works in a two-dimensional case.
08:49.520 --> 08:53.680
If you want to estimate the two-dimensional movement of a vehicle, so
08:53.680 --> 08:57.660
a movement on the plane, so assume a parking lot or something, a large
08:57.660 --> 09:02.180
planar area where a vehicle is moving, and you want to estimate its
09:02.180 --> 09:06.300
velocity, this can also be done with linear regression.
09:06.800 --> 09:08.660
And also in 3D, it would work.
09:08.740 --> 09:12.000
So if you have a spaceship or whatever, and you want to describe the
09:12.000 --> 09:14.620
movement of the spaceship, it would also work.
09:15.120 --> 09:18.520
So my suggestion is, give it a try.
09:18.680 --> 09:23.000
So if you are faced with such a problem, try such a regression
09:23.000 --> 09:27.600
approach and see whether it works or not before you go to more
09:27.600 --> 09:32.060
complicated approaches to solve the task.
09:34.180 --> 09:37.620
A shortcoming of these methods is, of course, that as soon as we are
09:37.620 --> 09:43.980
no longer faced with straight movements, and we want to create motion
09:43.980 --> 09:48.820
models of non-straight movements, so for instance, for a car that can
09:48.820 --> 09:55.240
turn, so it can drive along a circle on a circular trajectory, then
09:55.240 --> 10:00.540
those approaches become non-linear, so we get non-linear optimization
10:00.540 --> 10:05.700
problems, and so we have to use more complicated numerical solvers to
10:05.700 --> 10:06.380
solve those.
10:06.500 --> 10:10.400
It's still possible, but it's computationally more demanding.
10:11.440 --> 10:16.660
So one example where we were using these kind of regression approaches
10:16.660 --> 10:18.960
is robot soccer.
10:19.240 --> 10:25.000
That's a project that I did many years ago by now, so more than 10
10:25.000 --> 10:25.540
years ago.
10:26.120 --> 10:34.060
So we were operating autonomous robots, the ones here with the pink or
10:34.060 --> 10:38.120
violet number signs.
10:39.820 --> 10:44.360
They were fully autonomous, yeah, they had a camera on board with
10:44.360 --> 10:50.260
which they could detect the environment and recognize objects in the
10:50.260 --> 10:50.680
environment.
10:51.380 --> 10:56.460
You see, the environment was color-coded in a certain way, so there
10:56.460 --> 10:59.900
was an orange ball so that it is easy to detect it.
11:00.480 --> 11:05.040
All the obstacles were black, the ground was green, there were white
11:05.040 --> 11:07.200
lines, so all these things.
11:07.540 --> 11:11.700
So a color-coded environment so that the perception is not too
11:11.700 --> 11:16.860
complicated, which was still quite challenging in those
11:11.700 --> 11:16.860
days when computers were not as powerful as today, and the
11:20.960 --> 11:25.500
computer vision techniques as well were very limited in what they were
11:25.500 --> 11:26.280
able to do.
11:26.920 --> 11:33.580
However, so the task was to let those robots play soccer autonomously,
11:33.760 --> 11:39.020
so not tele-operated, no one at the sideline having a
11:39.020 --> 11:43.320
joystick to operate them, but they had to decide on their own what to
11:43.320 --> 11:45.180
do and how to behave.
11:45.880 --> 11:49.640
And of course, for playing soccer, it is very important to know where
11:49.640 --> 11:53.120
the ball is, and it's also very important to know in which way the
11:53.120 --> 11:53.900
ball is moving.
11:54.680 --> 12:02.580
Yeah, obviously, if you want, for instance, to implement a goalkeeper
12:03.740 --> 12:08.620
and the goalkeeper only knows where the ball is at the point in time
12:08.620 --> 12:12.760
when the last measurement was made, then the goalkeeper will not be
12:12.760 --> 12:16.980
very successful, yeah, because it doesn't know in which direction to
12:16.980 --> 12:22.980
drive and where to expect the ball crossing the goal line.
12:24.640 --> 12:29.560
Okay, so we need the motion of the ball, and also we need our ego
12:29.560 --> 12:29.920
motion.
12:30.120 --> 12:34.320
We, of course, need to know how fast we are moving, in which direction
12:34.320 --> 12:34.940
we are moving.
12:35.660 --> 12:39.040
Yeah, so these were the two things that we were trying to estimate
12:39.040 --> 12:40.060
with regression.
12:40.860 --> 12:44.020
And of course, we would like also to estimate the movement of other
12:44.020 --> 12:47.920
robots, but at that point in time, the recognition of the other robots
12:47.920 --> 12:51.220
was not that stable and not that reliable that we could do it.
12:52.500 --> 12:57.120
Okay, so we're estimating the motion of the ball.
12:57.480 --> 12:59.480
Yeah, we did that with the regression approach.
12:59.480 --> 13:03.040
Just a brief sketch of how we did it.
13:03.400 --> 13:08.640
So, the first starting point of our development was at a point in time
13:08.640 --> 13:16.360
where the other soccer teams could not do chip kicks.
13:16.480 --> 13:21.380
So, they could only kick the ball in a flat way, so the ball was not
13:21.380 --> 13:22.260
leaving the ground.
13:22.980 --> 13:27.020
So, we could assume that the motion of the ball is a two-dimensional
13:27.020 --> 13:28.240
motion on the plane.
13:29.240 --> 13:33.680
And we also were assuming that at least for short periods of time, we
13:33.680 --> 13:36.680
can assume that the ball moves with constant velocity.
13:37.400 --> 13:42.900
So, if the ball is dribbled by another robot, or if the ball was
13:42.900 --> 13:47.040
kicked and now is moving afterwards, we were assuming that at least
13:47.040 --> 13:57.280
for, say, several hundred milliseconds, we can
13:57.280 --> 14:01.880
approximate the movement of the ball sufficiently well as a movement
14:01.880 --> 14:03.060
with constant velocity.
14:03.240 --> 14:08.400
Of course, it was not perfectly true, but it was still okay.
14:10.580 --> 14:17.140
For that purpose, what we did was we were estimating the velocity of
14:17.140 --> 14:20.800
the ball based on the last observations.
14:20.800 --> 14:26.120
We took between 3 and 15 observations, the last 3 to 15 observations,
14:26.680 --> 14:32.960
and the cameras delivered a new frame every 33 milliseconds.
14:33.360 --> 14:41.380
That means, if we have just 3 observations, the time between the first
14:41.380 --> 14:46.740
and the last observation that we were using was roughly 66 to 67
14:46.740 --> 14:47.360
milliseconds.
14:48.220 --> 14:55.520
So, this was a minimal time duration, and with 15 observations, we had
14:55.520 --> 15:00.420
a total observation time of roughly half a second.
15:01.060 --> 15:05.460
So, based on that, we were applying this linear regression approach
15:05.460 --> 15:08.940
that you just saw, assuming constant velocity.
15:08.940 --> 15:12.380
We did that in a two-dimensional manner, so we were estimating the
15:12.380 --> 15:17.960
velocity in x direction and independently in y direction, so that we
15:17.960 --> 15:22.340
get a velocity vector that describes in which way the ball is moving.
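A minimal sketch of this two-dimensional estimation, fitting the x and y coordinates independently (the sampling times and velocities here are made-up values):

```python
import numpy as np

def ball_velocity_2d(times, xs, ys):
    """Estimate a 2-D velocity vector by two independent linear fits:
    the slope of each fit is the velocity in that direction."""
    vx, _ = np.polyfit(times, xs, deg=1)
    vy, _ = np.polyfit(times, ys, deg=1)
    return vx, vy

# A ball rolling with velocity (1.5, -0.5) m/s, sampled every 33 ms
t = [0.033 * k for k in range(5)]
xs = [1.5 * ti for ti in t]
ys = [-0.5 * ti for ti in t]
vx, vy = ball_velocity_2d(t, xs, ys)
```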
15:23.700 --> 15:28.940
Of course, this assumption of constant velocity is sometimes violated,
15:29.440 --> 15:33.280
especially if the ball is kicked by a robot in this moment.
15:33.380 --> 15:37.260
When it is kicked, of course, it is immediately changing its velocity
15:37.260 --> 15:39.540
in a very abrupt way.
15:40.000 --> 15:44.440
Or, for instance, if the ball collides with a robot or an obstacle,
15:44.660 --> 15:50.980
then its moving direction also changes a lot, so at these points in
15:50.980 --> 15:51.280
time.
15:51.920 --> 15:55.000
And, of course, at that point in time, the assumption of constant
15:55.000 --> 15:56.300
velocity is violated.
15:56.900 --> 16:04.380
To deal with that, we had this adaptive observation length.
16:04.540 --> 16:09.140
That means we always observe whether a new measurement fits to the
16:09.140 --> 16:10.820
motion model that we have so far.
16:11.280 --> 16:17.480
And if we observe one or two times that the
16:17.480 --> 16:24.280
observed position deviates very much from the position expected from the
16:24.280 --> 16:29.220
estimated motion model, then we said, okay, here it seems that
16:29.220 --> 16:32.340
something happened that violates our basic assumption.
16:32.720 --> 16:37.040
So, in these cases, we shrink again the number of observations to our
16:37.040 --> 16:38.320
minimum number of three.
16:39.120 --> 16:44.400
And otherwise, we increase the observation time up to the maximum
16:44.400 --> 16:45.380
observation time.
16:46.060 --> 16:50.280
The more observations we have, in general, if the assumption is not
16:50.280 --> 16:53.840
violated, of course, the better the estimate of the
16:53.840 --> 16:54.520
movement is.
16:54.860 --> 16:59.660
However, if we observe that our basic assumption is violated, we
16:59.660 --> 17:10.240
shrink the observation interval in order to cope with these changes.
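The described adaptation of the observation length might look roughly like this; the error threshold and the immediate reset after a single large deviation are my simplifications (the lecture speaks of one or two deviating measurements):

```python
def update_window(window, prediction_error, threshold, n_min=3, n_max=15):
    """Adapt the number of observations used for the regression.

    If a new measurement deviates too much from the position predicted
    by the current motion model, the constant-velocity assumption is
    considered violated and the window shrinks to the minimum;
    otherwise it grows, up to the maximum.
    """
    if prediction_error > threshold:
        return n_min
    return min(window + 1, n_max)

w = 10
w = update_window(w, prediction_error=0.02, threshold=0.1)  # fits: grow
w = update_window(w, prediction_error=0.50, threshold=0.1)  # violated: shrink
```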
17:11.660 --> 17:19.260
Okay, then later on, some teams, some soccer teams started to kick the
17:19.260 --> 17:23.740
ball not only in a flat way on the ground, on the soccer ground, but
17:23.740 --> 17:28.380
they started to kick the ball in such a way, doing some chip kicks.
17:29.900 --> 17:34.740
That caused a lot of problems for us because we didn't have a stereo
17:34.740 --> 17:37.580
vision system on board of our robots at that time.
17:37.940 --> 17:41.280
So, we were not able to determine the three-dimensional position.
17:41.280 --> 17:46.600
However, we were able to detect that the ball is not on the ground by
17:46.600 --> 17:49.040
doing some plausibility analysis.
17:52.100 --> 17:56.760
Later on, we installed a stereo vision system, a binocular camera
17:56.760 --> 17:59.980
system on one of the robots, namely on the goalkeeper.
18:00.680 --> 18:03.700
And then the goalkeeper was able to determine the three-dimensional
18:03.700 --> 18:04.960
position of the ball.
18:05.220 --> 18:09.220
And then we were able to model also the vertical movement of the ball.
18:11.820 --> 18:15.880
For that purpose, we were assuming that the movement in vertical
18:15.880 --> 18:18.260
direction is an accelerated movement.
18:18.600 --> 18:24.820
And we already knew the acceleration, namely, that is just the gravity
18:24.820 --> 18:26.620
that accelerates the ball.
18:26.820 --> 18:28.720
That is, we know it.
18:29.500 --> 18:33.620
The only thing is that we also had to consider that once the ball
18:33.620 --> 18:38.100
meets the ground, it's bouncing back and not continuing this motion.
18:38.340 --> 18:44.240
So, again, this kind of model checking had to be done to see whether or not the ball
18:44.240 --> 18:46.420
bounces on the ground.
18:47.960 --> 18:51.940
And if it bounces, we're assuming that it again moves upwards
18:51.940 --> 18:52.520
afterwards.
18:53.920 --> 18:58.940
So, in total, what we did is a regression approach plus some model
18:58.940 --> 19:05.320
checking in order to cover situations in which the assumptions of
19:05.320 --> 19:09.700
constant or accelerated vertical motion are violated.
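The vertical part of such a model, gravity plus a bounce check, might be sketched like this; the time step and the restitution factor of the bounce are made-up values, not numbers from the lecture:

```python
G = 9.81  # known gravitational acceleration in m/s^2

def step_vertical(z, vz, dt, restitution=0.8):
    """Advance the vertical ball state (height z, velocity vz) by dt.

    Free fall under gravity; when the ball meets the ground while moving
    downwards, the vertical velocity is reflected (and damped), so the
    ball moves upwards again instead of continuing the motion.
    """
    z_new = z + vz * dt - 0.5 * G * dt * dt
    vz_new = vz - G * dt
    if z_new <= 0.0 and vz_new < 0.0:
        z_new = 0.0
        vz_new = -restitution * vz_new
    return z_new, vz_new

z, vz = 0.0, 5.0  # a chip kick starting upwards at 5 m/s
for _ in range(100):
    z, vz = step_vertical(z, vz, dt=0.033)
```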
19:10.900 --> 19:16.400
So, some example here.
19:17.100 --> 19:24.040
So, this is a test case that was recorded in our lab and that is now
19:24.040 --> 19:27.020
played in a cycle again and again.
19:27.480 --> 19:29.740
So, we see the soccer ground, obviously.
19:30.280 --> 19:33.200
Then you see this cyan triangle.
19:34.000 --> 19:34.880
Where is it?
19:37.820 --> 19:42.840
The cyan triangle here.
19:43.140 --> 19:45.880
That is actually our robot.
19:47.300 --> 19:49.420
Here you see two circles.
19:49.840 --> 19:53.860
One dark red circle, dark red dashed circle.
19:54.280 --> 19:58.260
That is actually the position of the ball that was sensed with the
19:58.260 --> 19:59.760
camera at that point in time.
20:00.760 --> 20:09.640
And then this solid red circle, that is the estimated position of the
20:09.640 --> 20:12.020
ball after having done the regression.
20:12.540 --> 20:16.940
And then we see this line here that is indicating the movement
20:16.940 --> 20:20.480
direction and the velocity that was estimated.
20:20.640 --> 20:25.000
So, the longer the line, the larger the velocity that we estimated.
20:25.480 --> 20:29.560
So, the motion actually starts here at this point.
20:30.360 --> 20:35.580
In the first two cycles that you observe, there is only this observed ball
20:35.580 --> 20:39.500
because we do not have enough measurements yet to estimate the
20:39.500 --> 20:39.820
velocity.
20:40.640 --> 20:47.900
And then this solid circle occurs and this indicates that we were able
20:47.900 --> 20:49.860
to estimate the velocity of the ball.
20:50.460 --> 20:56.400
And if we look at the direction of this line that indicates the motion
20:56.400 --> 21:00.720
of the ball, we see that it roughly fits to the real motion of the
21:00.720 --> 21:00.940
ball.
21:01.500 --> 21:07.900
Now, here maybe the estimated motion deviates a
21:07.900 --> 21:08.940
little bit too much.
21:09.560 --> 21:15.760
It points a bit to the right, but okay, the precision of this camera system
21:15.760 --> 21:18.960
was not that good and so there were some mistakes.
21:19.500 --> 21:23.460
So, actually the ball was rolling over the field at that point in
21:23.460 --> 21:23.720
time.
21:23.940 --> 21:28.180
So, that also means that the velocity was decreasing slowly over time.
21:29.960 --> 21:34.700
Okay, so that was the ball detection and the ball velocity estimation.
21:36.080 --> 21:40.600
We also used that, as I said, to estimate the motion of the robot.
21:41.260 --> 21:44.360
And for the robot motion, the things were a little bit different.
21:44.520 --> 21:48.420
So, we were not assuming straight movement of the robot, but a
21:48.420 --> 21:51.600
movement with constant yaw rate.
21:51.600 --> 21:54.640
That means a movement on a circular trajectory.
21:55.560 --> 22:00.240
And the measurements that we had, again, were the position of the
22:00.240 --> 22:04.200
robot and also the orientation that the robot had.
22:04.760 --> 22:09.840
And based on that, we were deriving a technique based on regression
22:09.840 --> 22:14.440
with which we could efficiently estimate also the movement of the
22:14.440 --> 22:14.700
robot.
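The constant-yaw-rate motion model itself (the circular trajectory) can be written in closed form; this is the standard constant-turn-rate formulation, given here as an illustration rather than the exact regression scheme used on the robots:

```python
import math

def ctrv_pose(x0, y0, theta0, v, omega, t):
    """Pose (x, y, heading) after time t for motion with constant speed v
    and constant yaw rate omega, i.e. along a circular trajectory."""
    if abs(omega) < 1e-9:  # straight-line limit for vanishing yaw rate
        return (x0 + v * t * math.cos(theta0),
                y0 + v * t * math.sin(theta0),
                theta0)
    theta = theta0 + omega * t
    x = x0 + (v / omega) * (math.sin(theta) - math.sin(theta0))
    y = y0 - (v / omega) * (math.cos(theta) - math.cos(theta0))
    return x, y, theta
```

With v = 1 m/s and omega = 1 rad/s, after t = pi seconds the robot has driven a half circle of radius 1 and sits at (0, 2).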
22:16.480 --> 22:23.780
Okay, so much for this excursion into robot soccer and for the discussion
22:23.780 --> 22:29.380
of regression techniques for estimating velocities of objects in the
22:29.380 --> 22:30.000
world.
22:31.640 --> 22:37.440
And now we want to go to a different technique, which is the technique
22:37.440 --> 22:41.800
of Bayesian filters, including Kalman filters and things like that.
22:41.840 --> 22:44.580
They are very much based on probability theory.
22:45.320 --> 22:49.780
And since I guess that most of you have not had a lecture on probability
22:49.780 --> 22:56.140
theory so far, or maybe it is so long ago that you don't remember the
22:56.140 --> 23:01.240
things, I want to start with a very brief repetition of probability
23:01.240 --> 23:07.260
theory so that we have the basics again and can start based on these
23:07.260 --> 23:09.280
basic ideas of probability theory.
23:10.460 --> 23:15.880
Okay, so this presentation is actually, I must say, not mathematically completely
23:15.880 --> 23:24.180
correct, but it's a compromise in order to have it
23:24.180 --> 23:26.120
short and intuitive.
23:27.760 --> 23:30.060
Okay, but let's start.
23:30.700 --> 23:36.440
So, first of all, when we talk about probability theory, we have to
23:36.440 --> 23:41.600
talk about random events, so things that might occur or might not
23:41.600 --> 23:42.600
occur randomly.
23:43.380 --> 23:48.680
And if we have such a random event, say A, and behind this A there might
23:48.680 --> 23:53.940
be any event that you can imagine, we might assume a probability for
23:53.940 --> 23:54.220
that.
23:54.380 --> 23:58.800
And the probability, intuitively speaking, is something like a
23:58.800 --> 24:02.500
frequency with which we expect that this event occurs.
24:03.520 --> 24:08.220
So, yeah, and what A might be depends on the problem.
24:08.540 --> 24:11.820
In the context of automated driving,
24:12.020 --> 24:17.620
A might be the event that a vehicle ahead of the ego vehicle turns right at
24:17.620 --> 24:18.220
an intersection.
24:19.040 --> 24:20.460
Yeah, could be such an event.
24:20.800 --> 24:26.100
It could be the event it's raining tomorrow, whether or not it's
24:26.100 --> 24:26.760
raining tomorrow.
24:26.920 --> 24:31.220
It's random, we don't know in advance, but we can make
24:31.920 --> 24:36.040
a kind of guess how likely it is that it's raining tomorrow.
24:37.220 --> 24:39.580
Okay, so that's the probability.
24:40.120 --> 24:45.280
Then, if we have several events, those events might occur at the same
24:45.280 --> 24:46.240
time or not.
24:46.840 --> 24:51.520
Yeah, let's assume we have two events, A and B, and now we can define
24:51.520 --> 24:53.840
something like a joint probability.
24:54.260 --> 24:59.400
Joint probability means a relative frequency, so to say, with which we
24:59.400 --> 25:04.220
expect that those two events occur at the same time.
25:07.290 --> 25:11.390
So, for instance, in automated driving, if we are facing that again,
25:11.570 --> 25:14.790
and we ask whether a vehicle is turning right at the next
25:14.790 --> 25:17.470
intersection, that would be an event, A.
25:17.750 --> 25:23.350
And another event would be that the vehicle ahead of us is activating
25:23.350 --> 25:28.010
the indicator lights, so whether it's blinking right or not.
25:28.730 --> 25:37.620
Now, these are two events, and we can somehow make a
25:37.620 --> 25:43.880
guess how likely it is, how often we expect that a vehicle ahead of us
25:43.880 --> 25:50.020
turns right at an intersection and has its indicator lights activated.
25:50.940 --> 25:54.620
Yeah, we see that already in this example that there is a relationship
25:54.620 --> 26:00.560
between these two events, A and B, but, of course, it does not
26:00.560 --> 26:06.400
fully follow from A that B also holds, and vice versa.
26:06.640 --> 26:12.200
So, sometimes vehicles turn right without activating the indicator
26:12.200 --> 26:13.700
lights, sometimes they do.
26:14.140 --> 26:15.780
There is a stochastic relationship.
26:16.120 --> 26:21.580
There is some probability that if A occurs, B also occurs, but it's
26:21.580 --> 26:22.100
not sure.
26:23.880 --> 26:28.740
Then the third kind of probabilities that we need to introduce are
26:28.740 --> 26:33.820
so-called conditional probabilities, also referring to two events, A and
26:33.820 --> 26:34.060
B.
26:34.540 --> 26:40.220
And they, more or less, model how often or how likely it is
26:40.220 --> 26:46.980
that event A occurs in those cases, and only in those cases, in which
26:46.980 --> 26:49.740
event B occurred.
26:50.960 --> 26:55.840
Yeah, so if we consider again this example with a car that is
26:55.840 --> 27:00.160
approaching an intersection, might turn right, might have its
27:00.160 --> 27:06.000
indicator lights activated, this probability would mean how often does
27:06.000 --> 27:12.140
it happen that a car turns right at the intersection if it is having
27:12.140 --> 27:14.580
its indicator lights being activated.
27:17.140 --> 27:21.780
This should not be confused with the joint probability of A and B.
27:22.460 --> 27:28.800
The joint probability of A and B means for all cars that are
27:28.800 --> 27:34.280
approaching an intersection, how probable, how likely is it, how often
27:34.280 --> 27:40.400
do we observe that the car turns right and has its indicator lights
27:40.400 --> 27:41.720
being activated.
27:42.400 --> 27:46.960
While the conditional probability says, only in those cases, for those
27:46.960 --> 27:50.980
cars which have their indicator lights being activated, how probable,
27:51.320 --> 27:56.240
how often do we observe that such a car is turning right.
27:57.200 --> 28:01.620
Yeah, so the two probabilities are related to each other, the joint
28:01.620 --> 28:05.540
probability and the conditional probability, but they are not the same
28:05.540 --> 28:11.860
and they cannot be exchanged that easily.
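The distinction can be made concrete with counts; all numbers here are hypothetical, chosen only to show the difference between the joint and the conditional probability:

```python
# A = "car turns right", B = "right indicator is active"
n_total = 1000   # all cars observed at the intersection
n_a_and_b = 270  # turned right AND had the indicator on
n_b = 300        # had the indicator on (turning right or not)

p_joint = n_a_and_b / n_total  # P(A and B): relative to ALL cars
p_cond = n_a_and_b / n_b       # P(A | B): relative to indicator-on cars only
p_b = n_b / n_total

# The two are related but not the same: P(A | B) = P(A and B) / P(B)
assert abs(p_cond - p_joint / p_b) < 1e-12
```

Here the joint probability is 0.27 while the conditional probability is 0.9, which is exactly why the two must not be confused.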
28:13.220 --> 28:17.700
Okay, so these are the three types of probabilities that we have to
28:17.700 --> 28:18.440
consider.
28:19.340 --> 28:25.300
So, besides random events, there is something else in probability
28:25.300 --> 28:27.860
theory that is called random variables.
28:28.340 --> 28:33.980
So, often in probability theory, we do not deal just with kind of
28:33.980 --> 28:39.300
crisp events, something like turns right or not, where we can say
28:39.300 --> 28:43.680
either A or not, whether the event occurs or the event does not occur.
28:44.180 --> 28:50.120
But we are often faced with numbers, with numbers that are somehow
28:51.260 --> 28:53.600
randomly chosen, yeah.
28:54.500 --> 28:58.480
So, and these numbers which are somehow randomly chosen are called
28:58.480 --> 28:59.720
random variables.
29:00.600 --> 29:04.440
So, there are two different cases that we
29:04.440 --> 29:04.840
distinguish.
29:05.680 --> 29:10.780
The first one are discrete random variables, so random variables which
29:10.780 --> 29:14.140
only might take on integer numbers.
29:15.620 --> 29:21.000
So, for instance, only positive integer numbers, but just say
29:21.000 --> 29:21.620
integers.
29:22.220 --> 29:28.440
And the continuous variables are those which can take any real values,
29:28.700 --> 29:31.000
just real values.
29:31.360 --> 29:35.560
So, for instance, if we say the velocity of a car that we observe,
29:36.040 --> 29:38.760
that's not just an integer.
29:39.180 --> 29:43.300
The car cannot only drive one kilometer per hour, and two, and three,
29:43.460 --> 29:49.280
and four, and five, but it can also drive 4.3 or 4.8 kilometers per
29:49.280 --> 29:49.540
hour.
29:50.260 --> 29:53.440
And therefore, this is a continuous random variable.
29:54.180 --> 30:01.060
While, for instance, things that we can count, yeah, or that can only
30:01.060 --> 30:06.240
take on some integer numbers, like for instance, the question how many
30:06.240 --> 30:09.780
bicycles are parked in front of this lecture hall, that's an integer.
30:09.780 --> 30:12.540
These are discrete random variables.
30:13.000 --> 30:17.680
And the treatment of these discrete and continuous random variables is
30:17.680 --> 30:20.580
a little bit different, and therefore, we have to make this
30:20.580 --> 30:21.040
distinction.
30:21.980 --> 30:26.380
So, now, these random variables are somehow related to random events.
30:26.520 --> 30:32.300
So, what are typical random events which can be defined for random
30:32.300 --> 30:32.840
variables?
30:33.380 --> 30:38.540
So, for discrete random variables, a typical event that we might be
30:38.540 --> 30:44.620
interested in is something like this one here, the event that this
30:44.620 --> 30:49.820
discrete random variable Y takes on a certain integer, now that Y is
30:49.820 --> 30:50.520
equal to 4.
30:50.560 --> 30:54.620
So, we write it here with these square brackets.
30:56.400 --> 31:00.860
That means that inside of these square brackets there is, so to say, a
31:00.860 --> 31:04.280
condition that describes the event, yeah?
31:04.480 --> 31:09.800
Or another event for discrete variables could be that the random
31:09.800 --> 31:14.960
variable Y is inside of a certain interval of numbers.
31:15.240 --> 31:16.780
That's also a typical event.
31:17.400 --> 31:21.340
And another one would be that we have a certain subset of the integers
31:21.340 --> 31:27.240
and define as an event that this variable Y
31:27.240 --> 31:32.920
takes on a value within the subset of the integers, yeah?
31:33.160 --> 31:36.900
For each of those events, again, we can define probabilities, yeah?
31:36.900 --> 31:41.640
We can say what is the probability of the event that Y is equal to 4
31:41.640 --> 31:45.600
or that Y is in the set of 2, 3, 7, and 11.
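The events just described for a discrete random variable can be sketched in a few lines of Python. The uniform distribution over the values 0 to 11 is purely hypothetical, chosen only to make the three kinds of events concrete:

```python
# Sketch: random events for a discrete random variable Y.
# Hypothetical uniform distribution over the values 0..11.
values = list(range(12))
probs = {y: 1.0 / 12 for y in values}  # must sum to one

# Event [Y = 4]: Y takes on exactly the value 4.
p_eq_4 = probs[4]

# Event [2 <= Y <= 5]: Y lies inside an interval of integers.
p_interval = sum(probs[y] for y in values if 2 <= y <= 5)

# Event [Y in {2, 3, 7, 11}]: Y falls into a subset of the integers.
p_subset = sum(probs[y] for y in {2, 3, 7, 11})
```

Each event probability is just a sum of the probabilities of the individual values it contains.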
31:46.720 --> 31:52.260
For continuous variables, things are a little bit different.
31:53.040 --> 31:59.300
So, for theoretical reasons, for axiomatic reasons, it's impossible to
31:59.300 --> 32:03.940
define an event that a continuous random variable takes on a certain
32:03.940 --> 32:09.120
value, that X is exactly equal to 2.3 or something like that.
32:09.380 --> 32:13.960
That is not a random event, yeah?
32:14.340 --> 32:20.080
We can, for continuous variables, we can only define random events
32:20.080 --> 32:21.140
based on intervals.
32:21.480 --> 32:27.420
We can define a random event, for instance, whether the
32:27.840 --> 32:33.660
variable X is within a certain interval, for instance, between 7.1
32:33.660 --> 32:34.600
and 8.3.
32:35.180 --> 32:39.200
And we can also, of course, take union of several intervals and ask
32:39.200 --> 32:46.680
how likely is it that the random variable X is between 4.5 and 8.2 or
32:46.680 --> 32:49.600
between 10.3 and 11.2.
32:50.080 --> 32:57.960
We can also have unbounded intervals, like the interval from 15 up to
32:57.960 --> 33:00.040
infinity, that is also possible.
33:00.220 --> 33:03.520
But the main, the important thing is that for continuous random
33:03.520 --> 33:08.020
variables, we can only define random events based on intervals and not
33:08.020 --> 33:09.960
on individual numbers.
33:12.800 --> 33:19.980
That comes from the theoretical basis of probability theory that we
33:19.980 --> 33:25.680
can only define random events for continuous variables
33:25.680 --> 33:26.600
based on intervals.
33:27.560 --> 33:37.540
Okay, so now, once we have defined these probabilities and
33:37.540 --> 33:44.440
these random events, we might introduce some rules to calculate these
33:44.440 --> 33:45.100
probabilities.
33:45.920 --> 33:51.700
And the first rule is called the marginalization rule.
33:53.140 --> 34:01.320
It relates a simple probability for a single event with a joint
34:01.320 --> 34:05.480
probability of two events, or one event and a random variable.
34:05.820 --> 34:08.980
And it's defined like that for discrete random variables.
34:09.460 --> 34:13.960
So we assume we have an event A, doesn't matter what it is, just an
34:13.960 --> 34:17.640
arbitrary event A, and we have a discrete random variable Y.
34:19.080 --> 34:28.640
So now we might ask, in which way is this probability of A related to
34:28.640 --> 34:34.040
the joint probability of A, and the event that Y, the random variable
34:34.040 --> 34:36.580
Y, takes on a certain value.
34:37.140 --> 34:41.940
And the solution, or this relationship, is shown here.
34:42.660 --> 34:52.340
So once we know all these joint probabilities of the event A, and the
34:52.340 --> 34:56.940
events that the random variable capital Y takes on a certain value,
34:57.780 --> 35:04.060
then we can derive the probability of A, without considering Y, from
35:04.060 --> 35:08.960
these joint probabilities in this way as it is shown on the slide.
35:09.520 --> 35:14.040
We consider all these joint probabilities of event A, and all possible
35:14.040 --> 35:19.740
values of random variable Y, and sum up over all these possibilities.
35:20.800 --> 35:25.160
And by doing that, we get rid of this second random variable Y.
35:25.960 --> 35:32.900
So we can eliminate the influence of a second random variable, or a
35:32.900 --> 35:40.140
certain random variable, by summing up joint probabilities over all
35:40.140 --> 35:43.740
possible values that this random variable might take.
35:45.440 --> 35:50.340
And by doing that, we can get from joint probabilities to
35:50.340 --> 35:54.220
probabilities of a single event, or vice versa.
35:54.500 --> 35:57.960
Once we have a probability for a single event, and we want to
35:57.960 --> 36:06.220
introduce a second random variable Y, we can relate this probability
36:06.220 --> 36:11.280
of A to the joint probabilities of A, and the random variable Y.
36:12.500 --> 36:18.800
Okay, this rule is known as the marginalization rule, and in this
36:18.800 --> 36:23.760
case, this probability on the left hand side, where the random
36:23.760 --> 36:29.840
variable Y has disappeared, is called the marginal distribution of
36:29.840 --> 36:32.220
this joint distribution.
36:32.780 --> 36:36.920
Yeah, marginal... the word marginal comes from the fact that if we
36:36.920 --> 36:41.680
write down all these joint probabilities in a large table and
36:41.680 --> 36:46.960
we calculate the sums in each row, then we get those marginal
36:46.960 --> 36:47.980
probabilities.
36:48.220 --> 36:51.740
So we write them at the margin of a table, and therefore they are
36:51.740 --> 36:52.680
called marginals.
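The table picture just described can be sketched in Python. The joint probabilities in the table below are made-up numbers for an event A and a discrete random variable Y with values 1 to 4; summing along a row gives the marginal written at the margin of the table:

```python
# Sketch of the marginalization rule on a hypothetical joint table.
# Keys are (event outcome, value of Y); numbers are illustrative.
table = {
    ("A", 1): 0.10, ("A", 2): 0.25, ("A", 3): 0.05, ("A", 4): 0.20,
    ("not A", 1): 0.05, ("not A", 2): 0.10,
    ("not A", 3): 0.15, ("not A", 4): 0.10,
}

# Marginalization: P(A) = sum over all y of P(A, Y = y).
p_a = sum(p for (a, _), p in table.items() if a == "A")
p_not_a = sum(p for (a, _), p in table.items() if a == "not A")
```

Summing over all values of Y eliminates the influence of Y and leaves the marginal probability of A alone.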
36:53.800 --> 36:58.260
Then the second calculation rule is this one here.
36:59.100 --> 37:03.660
It relates the conditional probability to the marginal and the joint
37:03.660 --> 37:04.360
probability.
37:05.240 --> 37:11.060
And it says that the joint probability of two events A and B is equal
37:11.060 --> 37:16.560
to the conditional probability of A given B times the marginal
37:16.560 --> 37:17.840
probability of B.
37:18.480 --> 37:23.340
And since the order of these events in the joint probability A and B
37:23.340 --> 37:26.840
doesn't matter, it can be exchanged without any change.
37:27.240 --> 37:32.480
This is also equal to the conditional probability of B given A times
37:32.480 --> 37:34.300
the marginal probability of A.
37:34.880 --> 37:39.460
So this is an important rule to get this relationship between joint
37:39.460 --> 37:43.640
probabilities and conditional probabilities, and describes in which
37:43.640 --> 37:47.080
way they are related to each other.
37:48.620 --> 37:54.920
We can also extend this rule if we are faced with three or more random
37:54.920 --> 37:55.700
events.
37:57.620 --> 38:01.800
For instance, if we have three random events A, B, and C, then we can
38:01.800 --> 38:06.840
group those random events into two groups, and then apply this rule
38:06.840 --> 38:09.900
for those groups of random events.
38:10.080 --> 38:14.440
So for instance, we could say we group those three random events A, B,
38:14.520 --> 38:18.660
and C into one small group with only event A, and one larger group
38:18.660 --> 38:21.940
with events B and C, and then apply this rule.
38:22.380 --> 38:26.120
And then this means this is equal to the probability of A given B and
38:26.120 --> 38:29.520
C times the joint probability of B and C.
38:30.600 --> 38:32.720
We can also group it in a different way.
38:32.800 --> 38:37.100
We can also say, okay, we group the events A and B in one group and
38:37.100 --> 38:41.140
the event C in another group, and then we get the probability of A
38:41.140 --> 38:43.840
and B given C times the probability of C.
38:44.360 --> 38:51.860
And like that, we can calculate with those probabilities.
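The product rule for joint and conditional probabilities can be checked numerically on a tiny example. The joint table over two binary events A and B below is hypothetical; both factorizations of the joint probability must agree:

```python
# Sketch: P(A and B) = P(A | B) * P(B) = P(B | A) * P(A),
# verified on a made-up joint distribution over two binary events.
joint = {
    (True, True): 0.30, (True, False): 0.10,   # keys are (A, B)
    (False, True): 0.20, (False, False): 0.40,
}
p_b = joint[(True, True)] + joint[(False, True)]   # marginal P(B)
p_a = joint[(True, True)] + joint[(True, False)]   # marginal P(A)
p_a_given_b = joint[(True, True)] / p_b            # conditional P(A | B)
p_b_given_a = joint[(True, True)] / p_a            # conditional P(B | A)
```

Both products, conditional times marginal, recover the same joint probability P(A and B) = 0.30.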
38:53.140 --> 38:59.880
And finally, the third important rule that is actually derived from
38:59.880 --> 39:05.720
this rule for conditional probabilities is the so-called Bayes
39:05.720 --> 39:06.400
theorem.
39:09.220 --> 39:13.880
That is given here, and it's actually... you can derive it directly
39:13.880 --> 39:18.120
from this line here by dividing it by the probability of B.
39:18.340 --> 39:20.140
Then we get this theorem.
39:21.000 --> 39:26.180
It says that the conditional probability of A given B is equal to the
39:26.180 --> 39:33.040
probability of B given A times the marginal probability of A over the
39:33.040 --> 39:34.620
marginal probability of B.
39:35.480 --> 39:37.520
In which cases is that interesting?
39:37.820 --> 39:40.000
So let's go back to our example.
39:40.480 --> 39:42.460
We observe a car at an intersection.
39:43.360 --> 39:47.980
One event A is that the car is turning right at the
39:47.980 --> 39:48.460
intersection.
39:49.260 --> 39:54.140
The event B is that it's blinking right, so its
39:54.140 --> 39:55.860
indicator lights are active.
40:00.700 --> 40:09.350
So then this probability means, how likely is it that a car that is
40:09.350 --> 40:15.630
blinking right, that has its indicator lights active, is really
40:15.630 --> 40:16.390
turning right?
40:18.300 --> 40:24.030
And this probability here means, how likely is it that a car blinks
40:24.030 --> 40:25.490
that wants to turn right?
40:27.330 --> 40:33.050
So these are two different probabilities, but they are related to each
40:33.050 --> 40:33.390
other.
40:34.010 --> 40:35.610
But they are not the same.
40:36.250 --> 40:40.930
They are related by the relationship that is given by Bayes' theorem.
40:42.250 --> 40:46.450
And sometimes it's easy to determine one of those conditional
40:46.450 --> 40:50.670
probabilities, but very difficult to determine the other kind of
40:50.670 --> 40:51.090
probability.
40:51.990 --> 40:58.570
So maybe we are able to state that if a car wants to turn
40:58.570 --> 41:04.290
right, it is activating its indicator lights with a large probability,
41:04.470 --> 41:07.730
say with 90 percent or so, that might be easy.
41:08.670 --> 41:15.710
It is more difficult to conclude: if the car is blinking right,
41:15.710 --> 41:19.090
how likely is it really turning right?
41:19.950 --> 41:24.090
That is what we want to know when we are following this car and have
41:24.090 --> 41:29.970
to interact with it, and want to calculate its future behavior.
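This inversion can be sketched numerically. Only the 90 percent figure comes from the discussion above; the prior probability of turning and the rate of blinking without turning are made-up numbers for illustration:

```python
# Sketch of Bayes' theorem for the blinker example.
p_blink_given_turn = 0.90     # as suggested above: turning cars usually blink
p_turn = 0.25                 # hypothetical prior P(turn right)
p_blink_given_no_turn = 0.05  # hypothetical: blinking without turning

# P(blink) via marginalization over turn / no turn.
p_blink = (p_blink_given_turn * p_turn
           + p_blink_given_no_turn * (1.0 - p_turn))

# Bayes: P(turn | blink) = P(blink | turn) * P(turn) / P(blink).
p_turn_given_blink = p_blink_given_turn * p_turn / p_blink
```

So the easy-to-state probability P(blink | turn) lets us compute the one we actually need when following the car, P(turn | blink), which here comes out to roughly 0.86.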
41:31.610 --> 41:33.110
Okay, so that's Bayes' theorem.
41:33.430 --> 41:38.670
Again, we can extend it to more than two variables, and then again we
41:38.670 --> 41:46.030
can group those variables and deal with groups of variables, or we can
41:46.030 --> 41:50.390
keep variables here in this condition part without changing them.
41:50.770 --> 41:54.530
So in this case, for instance, we could keep C always in the condition
41:54.530 --> 41:59.230
part of all the probabilities here, and only exchange the role of A
41:59.230 --> 41:59.630
and B.
42:00.070 --> 42:01.130
That's also possible.
42:01.520 --> 42:08.270
Yeah, if you want to prove that, you just have to apply this
42:08.270 --> 42:11.850
definition here, or in the case of three variables, these definitions
42:11.850 --> 42:15.130
here, and then you can prove that this is also true.
42:17.130 --> 42:22.650
Okay, these are the three major, or the three really important rules
42:22.650 --> 42:25.410
for calculating with these probabilities.
42:25.630 --> 42:30.030
Once you know these rules, then you can actually do all of
42:30.030 --> 42:31.230
probability theory.
42:32.510 --> 42:37.510
So now, the next concept is the concept of stochastic independence.
42:37.770 --> 42:38.350
What's that?
42:39.350 --> 42:43.950
Well, if we observe these two events, car is blinking and car is
42:43.950 --> 42:49.690
turning at an intersection, we easily see that there is some
42:49.690 --> 42:51.450
relationship between these events.
42:51.790 --> 42:55.590
Yeah, if we observe a car blinking, then it's actually very probable
42:55.590 --> 42:59.750
that it's also turning at the intersection, so there is a strong
42:59.750 --> 43:02.250
connection between these two events.
43:03.030 --> 43:09.970
However, if we consider two different events, say the car ahead of us
43:09.970 --> 43:14.330
is turning right at the intersection, that's event A, and event B
43:14.330 --> 43:22.130
would be the event that it's raining tomorrow, we hardly see any
43:22.130 --> 43:24.450
relationship between these events.
43:24.830 --> 43:29.010
If we know that it's raining tomorrow, this doesn't help us at all to
43:29.010 --> 43:32.970
predict whether the car ahead of us will turn right or not at the
43:32.970 --> 43:33.390
intersection.
43:33.710 --> 43:36.970
So they are not coupled, they are independent of each other.
43:37.330 --> 43:42.510
And this idea of being independent, of not having any influence on
43:42.510 --> 43:45.570
each other, is called stochastic independence.
43:46.210 --> 43:52.570
And it's defined in such a way that two events A and B are said to be
43:52.570 --> 43:57.510
stochastically independent if the conditional probability of B given A
43:57.510 --> 44:00.350
is equal to the marginal probability of B.
44:01.090 --> 44:09.530
That means, if we only consider cases in which A, the event A occurs,
44:09.830 --> 44:17.290
and we ask what happens with event B, whether it occurs or not, then
44:17.290 --> 44:21.470
this knowledge of event A doesn't help us at all.
44:22.290 --> 44:31.130
If we ignore this knowledge, we can say neither more nor less than if we
44:31.130 --> 44:33.310
know whether A is true or not.
44:33.890 --> 44:35.590
So this is stochastic independence.
44:35.770 --> 44:39.390
And of course, many things are assumed to be stochastically
44:39.390 --> 44:40.070
independent.
44:41.030 --> 44:47.510
What happens on the road here in Europe, of course, seems to be, at
44:47.510 --> 44:52.590
least we can assume, that it is independent of what goes on on the
44:52.590 --> 44:55.730
roads somewhere else on earth, for instance.
44:56.530 --> 45:02.270
Or we might also assume that a
45:05.940 --> 45:11.780
decision of one driver and a decision of another driver that is not
45:11.780 --> 45:17.820
directly in the vicinity of the first driver, that those decisions and
45:17.820 --> 45:20.520
those behaviors are also independent of each other.
45:21.200 --> 45:25.980
So independence is really a nice concept because it simplifies all
45:25.980 --> 45:26.620
calculations.
45:27.400 --> 45:28.900
And that's the important thing.
45:30.040 --> 45:35.880
So actually, the definition is, as I said, the conditional probability
45:35.880 --> 45:40.820
of B given A is equal to the probability of B, then those events
45:40.820 --> 45:43.040
are said to be stochastically independent.
45:43.260 --> 45:47.960
And this condition is equivalent to those two conditions that are
45:47.960 --> 45:48.500
given here.
45:49.360 --> 45:54.020
We can use the rules for calculating those probabilities and then you
45:54.020 --> 45:56.300
can easily show that this is actually the same.
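The equivalence is easy to check numerically. The joint table below is built to be independent by construction, with hypothetical marginals P(A) = 0.3 and P(B) = 0.6, so both the defining condition P(B | A) = P(B) and the product form P(A and B) = P(A) * P(B) hold:

```python
# Sketch: checking stochastic independence on a hypothetical joint
# table over two binary events A and B.
joint = {  # independent by construction: P(a, b) = P(a) * P(b)
    (True, True): 0.3 * 0.6, (True, False): 0.3 * 0.4,
    (False, True): 0.7 * 0.6, (False, False): 0.7 * 0.4,
}
p_a = joint[(True, True)] + joint[(True, False)]   # marginal P(A)
p_b = joint[(True, True)] + joint[(False, True)]   # marginal P(B)
p_b_given_a = joint[(True, True)] / p_a            # P(B | A)

# Independence: conditioning on A does not change the probability of B.
independent = abs(p_b_given_a - p_b) < 1e-12
```

For the blinking-and-turning events from before, this check would fail: conditioning on blinking changes the probability of turning considerably.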
45:58.380 --> 46:02.240
So now, let's go on.
46:02.320 --> 46:07.580
For discrete variables, we saw that we can identify events,
46:07.940 --> 46:15.120
where we can say, okay, we are interested
46:15.120 --> 46:20.300
in the probability that this discrete random variable takes on a
46:20.300 --> 46:22.240
certain integer number.
46:23.140 --> 46:28.020
And if you like, we can create a large table with all the integers
46:28.020 --> 46:34.580
which this variable can take on and ask for each of these integer
46:34.580 --> 46:38.940
numbers, how probable is it that this random variable takes on this
46:38.940 --> 46:39.520
value.
46:39.660 --> 46:46.020
So we can represent the probabilities for all individual integers
46:46.020 --> 46:51.600
explicitly and by doing that describe the whole probability
46:51.600 --> 46:52.620
distribution.
46:53.620 --> 46:58.900
For continuous random variables, as I said, this is not that easy
46:58.900 --> 47:04.960
because, well, all these individual numbers, the real numbers, they
47:04.960 --> 47:06.220
are uncountable.
47:06.220 --> 47:08.520
So we cannot make such a table.
47:08.960 --> 47:09.880
It's impossible.
47:10.540 --> 47:15.080
And we have seen or I've said that we can only define probabilities
47:15.080 --> 47:22.400
for intervals, that the variable, the continuous variable, is located
47:22.400 --> 47:25.400
inside of a certain interval between two numbers.
47:26.160 --> 47:29.640
Now the question is, how can we represent these probabilities
47:29.640 --> 47:30.760
efficiently?
47:31.020 --> 47:36.020
So there is, of course, also an uncountable number of intervals on
47:36.020 --> 47:36.820
the real axis.
47:37.400 --> 47:42.100
So we cannot make a table and for each interval write down its
47:42.100 --> 47:42.760
probability.
47:43.080 --> 47:45.180
So how can we represent that efficiently?
47:45.840 --> 47:50.120
And the solution is to introduce something, a technique that is called
47:50.120 --> 47:54.280
probability density functions, or PDFs for short.
47:54.700 --> 47:56.280
So probability density function.
47:56.360 --> 48:05.400
That's a function that is non-negative for all values, so it can
48:05.400 --> 48:09.680
be zero, it can take on real positive numbers but never negative
48:09.680 --> 48:10.100
numbers.
48:10.900 --> 48:16.060
And we need that the integral from minus infinity to plus infinity of
48:16.060 --> 48:19.080
this probability density function is equal to one.
48:19.200 --> 48:22.820
That comes from the axioms of probability theory.
48:23.800 --> 48:28.540
And with such a probability density function, we can represent the
48:28.540 --> 48:37.100
probability of intervals, so intervals like that, the probability that
48:37.100 --> 48:41.420
the random variable is located between two numbers a and b.
48:42.340 --> 48:47.580
And by this definition, so if we use a probability density function,
48:48.200 --> 48:52.660
we say that the probability of X being located between a and b is
48:52.660 --> 48:58.320
equal to the integral of this probability density function ranging
48:58.320 --> 48:59.860
from a to b.
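The definition can be sketched numerically. The density below is a simple exponential, p(x) = exp(-x) for x >= 0, chosen only for illustration, and the integral is approximated with the midpoint rule:

```python
import math

# Sketch: interval probabilities from a probability density function.
# The exponential density here is an illustrative choice.
def pdf(x):
    return math.exp(-x) if x >= 0.0 else 0.0

def prob_interval(a, b, n=100_000):
    """Approximate P(a <= X <= b) as the integral of pdf from a to b,
    using the midpoint rule with n subintervals."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

# The whole axis carries (approximately) probability one, while a
# small interval like [7.1, 8.3] carries a small probability.
total = prob_interval(0.0, 50.0)
p_7_8 = prob_interval(7.1, 8.3)
```

Note that the density value pdf(x) itself is not a probability; only such integrals over intervals are.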
49:00.880 --> 49:06.300
So we use this representation to represent the probability
49:06.300 --> 49:12.880
distribution, we say probability distribution of such a continuous
49:12.880 --> 49:13.860
random variable.
49:15.360 --> 49:21.120
So Px in this case is the probability density function for random
49:21.120 --> 49:24.740
variable capital X evaluated at a certain position.
49:25.460 --> 49:29.420
So the probability density function is not a probability, so it's not
49:29.420 --> 49:33.200
something that we can directly interpret as probability.
49:34.280 --> 49:39.380
And it can also be larger than one, unlike probabilities, which
49:39.380 --> 49:41.100
cannot be larger than one.
49:41.780 --> 49:45.400
But this probability density function can become larger than one, it
49:45.400 --> 49:47.060
can become very large.
49:49.980 --> 49:57.640
However, it is somehow, say, somehow proportional to the probability
49:57.640 --> 50:02.460
or approximately proportional to the probability that the random
50:02.460 --> 50:08.060
variable takes on a value in the vicinity of the special value at
50:08.060 --> 50:11.340
which we evaluate the probability density function.
50:12.540 --> 50:15.340
Okay, that's a very important concept.
50:15.980 --> 50:20.460
So for these probability density functions, we can do actually the same
50:20.460 --> 50:26.660
calculations as we did for discrete probabilities, for probabilities
50:26.660 --> 50:28.440
of discrete random variables.
50:28.800 --> 50:33.820
But everywhere where we had probabilities so far, now we have to use
50:33.820 --> 50:35.400
probability density functions.
50:35.700 --> 50:39.880
And everywhere where we had summation, we have to replace it by
50:39.880 --> 50:40.560
integration.
50:41.040 --> 50:45.140
Yeah, that means this marginalization rule that we introduced for
50:45.140 --> 50:49.900
discrete random variables can be rewritten for continuous random
50:49.900 --> 50:53.020
variables in the way that is shown here on the slide.
50:53.340 --> 50:56.680
So we see that here on the right hand side, we have a probability
50:56.680 --> 51:02.380
density function for the joint, for the pair of continuous random
51:02.380 --> 51:03.740
variables X and Y.
51:03.860 --> 51:08.580
This is defined in actually the same way as basic probability density
51:08.580 --> 51:11.260
functions, just for a pair of numbers.
51:12.680 --> 51:19.200
And we can relate this probability density function of this pair of
51:19.200 --> 51:26.540
random variables to the marginal probability density function only for
51:26.540 --> 51:32.140
variable capital X by integrating out the influence of random variable
51:32.140 --> 51:38.140
Y, by taking the integral of the joint probability density function
51:38.140 --> 51:43.280
over the interval from minus infinity to plus infinity.
51:47.630 --> 51:52.990
The next step, the conditional probabilities, the same applies as for
51:52.990 --> 51:55.090
discrete random variables.
51:55.090 --> 52:01.590
We just replace the probability term by the probability density here
52:01.590 --> 52:04.930
and the same rules actually apply.
52:05.870 --> 52:10.390
And for Bayes' rule, the same applies as well.
52:10.550 --> 52:15.230
We exchange probabilities by probability density functions and still
52:15.230 --> 52:16.890
the theorem holds.
52:18.890 --> 52:22.190
So now, these probability density functions.
52:22.590 --> 52:28.030
Of course, all functions which are non-negative and for which
52:28.030 --> 52:31.670
the integral from minus infinity to infinity is equal to one, can be
52:31.670 --> 52:33.710
used as probability density functions.
52:34.250 --> 52:37.250
And of course, which one is suitable to represent the real
52:37.250 --> 52:40.430
distribution of a certain random variable depends on the random
52:40.430 --> 52:41.890
variable, yeah?
52:42.470 --> 52:44.470
So, and there are...
52:45.390 --> 52:45.730
yeah.
52:46.350 --> 52:50.450
However, some choices of probability density functions have become
52:50.450 --> 52:56.310
very successful and very powerful and very useful to represent things
52:56.310 --> 52:58.730
in real life.
52:59.270 --> 53:02.910
And one of those, and maybe the most popular one, is a so-called
53:02.910 --> 53:05.430
Gaussian or normal distribution.
53:06.130 --> 53:10.390
That's a distribution that is in a certain way very powerful, can be
53:10.390 --> 53:14.870
used in very many circumstances, and has some very nice properties.
53:16.490 --> 53:17.770
So, let's introduce it.
53:18.330 --> 53:20.010
So, the density function of
53:20.010 --> 53:26.310
this Gaussian distribution is given here
53:26.310 --> 53:29.670
for the one-dimensional case of a single random variable.
53:30.690 --> 53:32.350
It's defined like that.
53:32.510 --> 53:38.530
So, actually, it's essentially the exponential function of minus x squared.
53:39.230 --> 53:43.050
And then we have two parameters, mu and sigma, with which we can
53:43.050 --> 53:46.110
control a little bit the shape of this function.
53:46.490 --> 53:50.490
For the basic choice of mu being equal to zero and sigma being equal
53:50.490 --> 53:54.910
to one, the plot of this function, the graph, is given here.
53:55.430 --> 54:00.870
So, symmetric around zero, taking its maximum at zero, being all
54:00.870 --> 54:07.570
positive, and asymptotically tending to zero for very large and very
54:07.570 --> 54:09.430
small values of x.
54:11.470 --> 54:14.110
Differentiable, etc.
54:14.590 --> 54:18.950
So, with the parameter mu, we can shift this curve a little bit to the
54:18.950 --> 54:19.990
left or the right.
54:20.150 --> 54:26.970
So, mu is actually the parameter that defines the center of symmetry
54:26.970 --> 54:28.590
of this shape.
54:28.770 --> 54:33.530
So, that means, for the basic choice, mu equal to zero, the center of
54:33.530 --> 54:35.670
symmetry is at the position zero.
54:36.090 --> 54:41.850
If we select mu equal to two, then the whole curve is shifted by two
54:41.850 --> 54:42.490
to the right.
54:42.870 --> 54:46.790
If we choose mu to be minus five, then the whole curve is shifted by
54:46.790 --> 54:47.930
minus five to the left.
54:48.570 --> 54:52.930
And the role of sigma is to control, actually, the width of this bell
54:52.930 --> 54:53.570
-shaped curve.
54:53.890 --> 54:57.670
The larger sigma is, the wider this bell-shaped curve becomes.
54:58.370 --> 55:04.330
The smaller sigma is, the narrower this bell-shaped
55:04.330 --> 55:10.190
curve becomes, and the higher the peak in the center becomes.
55:11.090 --> 55:17.490
The larger sigma is, the smaller this peak is; the smaller
55:17.490 --> 55:21.130
sigma is, the larger this maximum becomes.
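The one-dimensional Gaussian density and the roles of mu and sigma can be sketched directly; the specific parameter values below are only illustrative:

```python
import math

# Sketch of the one-dimensional Gaussian (normal) density with
# parameters mu (center of symmetry) and sigma (width of the bell).
def gaussian_pdf(x, mu=0.0, sigma=1.0):
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# mu shifts the curve: the maximum sits at x = mu.
peak_at_2 = gaussian_pdf(2.0, mu=2.0)

# sigma controls the width: a small sigma gives a high, narrow peak,
# and the density value can then exceed one (it is not a probability).
narrow_peak = gaussian_pdf(0.0, mu=0.0, sigma=0.1)
```

Evaluating the density at mu + d and mu - d gives the same value, reflecting the symmetry around mu.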
55:22.570 --> 55:27.290
We can extend this Gaussian distribution also to the case of several
55:27.290 --> 55:28.230
random variables.
55:28.330 --> 55:31.430
When we have several random variables and we want to describe the
55:31.430 --> 55:35.990
joint density function for several random variables, we can extend
55:35.990 --> 55:36.290
that.
55:36.590 --> 55:41.970
Let's assume that we write those random variables into this vector x
55:41.970 --> 55:48.090
here, and then we can define a probability density function like this
55:48.090 --> 55:48.630
one here.
55:49.010 --> 55:52.330
Yeah, we see that it's actually for the one-dimensional case, so if we
55:52.330 --> 55:57.450
assume that x is a one-dimensional vector, this becomes the same as
55:57.450 --> 55:58.810
this equation here.
56:00.690 --> 56:02.890
Yeah, looks like that.
56:03.230 --> 56:07.150
Again, we have this parameter mu, which controls the center of
56:07.150 --> 56:08.410
symmetry, where it is.
56:08.550 --> 56:12.010
So in the basic case, it's the zero vector, then the center of
56:12.010 --> 56:18.770
symmetry is at zero, but we can shift it to any place that we like.
56:20.850 --> 56:25.530
And yeah, here this graph shows a plot of this function in the two
56:25.530 --> 56:31.750
-dimensional case, so with the center being at zero and this matrix
56:31.750 --> 56:34.070
sigma being the identity matrix.
56:34.790 --> 56:37.710
So what is the role of this matrix sigma?
56:38.570 --> 56:45.110
So because it refers to the small sigma here somehow, it controls how
56:45.110 --> 56:53.030
wide this curve is, whether it's very peaky or very wide, and it
56:53.030 --> 56:53.830
controls...
56:54.370 --> 57:00.450
and this peakiness or this shape can change in different directions.
57:00.850 --> 57:09.750
So this density function might be very peaky in some directions, and
57:09.750 --> 57:13.010
it might be very wide in other directions at the same time.
57:13.250 --> 57:18.510
So we can control a little bit this kind of wideness of this function
57:18.510 --> 57:23.770
in different directions in this space here.
57:24.270 --> 57:28.530
And this is given with this so-called covariance matrix sigma.
57:28.890 --> 57:33.470
This covariance matrix sigma has to be a matrix that is symmetric and
57:33.470 --> 57:38.270
that is positive-definite, that means all eigenvalues must be larger
57:38.270 --> 57:43.970
than zero, and all those matrices for which these properties hold can
57:43.970 --> 57:47.230
be used as covariance matrices.
57:47.830 --> 57:53.070
And the entries in the diagonal of this matrix, they control the
57:53.070 --> 57:58.930
wideness, so to say, in the different directions of the coordinate
57:58.930 --> 58:07.450
system, and the off-diagonal elements somehow rotate this curve a little
58:07.450 --> 58:07.830
bit.
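For the two-dimensional case, the density with a covariance matrix Sigma can be written out explicitly. The sketch below hard-codes the 2x2 case in plain Python; the parameter values in the example are arbitrary illustrations:

```python
import math

# Sketch of the two-dimensional Gaussian density. The covariance
# matrix Sigma = [[sxx, sxy], [sxy, syy]] must be symmetric and
# positive-definite; the diagonal entries control the width along
# the coordinate axes, the off-diagonal entry tilts the bell shape.
def gaussian2d_pdf(x, y, mu=(0.0, 0.0), sigma=((1.0, 0.0), (0.0, 1.0))):
    (sxx, sxy), (_, syy) = sigma
    det = sxx * syy - sxy * sxy            # determinant of Sigma
    assert det > 0.0 and sxx > 0.0, "Sigma must be positive-definite"
    dx, dy = x - mu[0], y - mu[1]
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu), written out for 2x2
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# Identity covariance, centered at the origin: peak value 1 / (2*pi).
peak = gaussian2d_pdf(0.0, 0.0)
```

With an off-diagonal entry, e.g. Sigma = ((2.0, 0.5), (0.5, 1.0)), the level curves of this density become tilted ellipses instead of axis-aligned ones.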
58:11.950 --> 58:17.530
Okay, so, and of course, Gaussian distributions will be the ones that
58:17.530 --> 58:21.270
we need throughout the lecture, and with which we will calculate.
58:22.190 --> 58:29.010
So finally, let's summarize our basic repetition of probability theory
58:29.010 --> 58:30.850
with a little bit of notation stuff.
58:31.670 --> 58:36.970
So if we want to be fully correct in our notation, and we are faced
58:36.970 --> 58:42.170
with a random variable, say a discrete random variable x, and we want
58:42.170 --> 58:47.110
to express the event that x, this random variable x, takes on a
58:47.110 --> 58:52.450
certain value, which is given here with small x, then we would need to
58:52.450 --> 58:53.570
write it like that.
58:54.010 --> 58:58.410
P, capital P, so the probability of the event that the random variable
58:58.410 --> 59:03.310
capital X takes on the value small x.
59:03.550 --> 59:06.910
That's the fully correct mathematical notation.
59:07.650 --> 59:11.290
And for a probability density function, so if capital X is a
59:11.290 --> 59:15.730
continuous random variable, then we would say this is the probability
59:15.730 --> 59:21.750
density function of the variable capital X, and we evaluate it at the
59:21.750 --> 59:23.750
position which is given by small x.
59:23.970 --> 59:26.270
This is the fully correct notation.
59:27.390 --> 59:33.210
However, if we use this notation, we write formulas that are very,
59:33.290 --> 59:38.470
very long, and therefore many mathematicians prefer to have a short
59:38.470 --> 59:42.350
kind of writing, a shorter notation that looks like that.
59:43.070 --> 59:44.230
Sorry, it looks like that.
59:45.750 --> 59:53.350
P of capital X, and small p of capital X, instead of this and that
59:53.350 --> 59:53.990
notation.
59:55.750 --> 59:58.350
This might be confusing sometimes.
59:58.930 --> 01:00:01.090
Yeah, this might sometimes be really confusing.
01:00:01.490 --> 01:00:06.110
If you are confused by that, rewrite everything in this basic
01:00:06.110 --> 01:00:06.610
notation.
01:00:07.990 --> 01:00:13.610
Yeah, however, you will easily see that the terms that we get become
01:00:13.610 --> 01:00:19.050
very large, and therefore, I will also use the simplified notation if
01:00:19.050 --> 01:00:20.250
it's not too confusing.
01:00:20.470 --> 01:00:25.630
But if you face such a term, and you think, I don't understand it, I'm
01:00:25.630 --> 01:00:30.110
too confused, first rewrite it in the long form to understand it.
01:00:31.310 --> 01:00:34.230
And another thing is this kind of notation.
01:00:34.670 --> 01:00:40.930
You might write capital X, then a tilde, and then this script N of
01:00:40.930 --> 01:00:41.990
mu and sigma.
01:00:42.290 --> 01:00:46.950
This is a notation to write that we assume that this random variable
01:00:46.950 --> 01:00:51.110
capital X is distributed according to a Gaussian distribution.
01:00:51.230 --> 01:00:54.890
That means, we say that with this notation, we state that the
01:00:54.890 --> 01:01:00.330
probability density function of this random variable X is a Gaussian
01:01:00.330 --> 01:01:05.170
probability density function with these parameters mu and sigma, which
01:01:05.170 --> 01:01:05.810
are given here.
01:01:06.890 --> 01:01:08.870
Yeah, so this is the notation.
01:01:09.090 --> 01:01:14.190
So this says, we assume that the random variable X is distributed
01:01:14.190 --> 01:01:16.450
according to a Gaussian distribution.
01:01:16.450 --> 01:01:21.110
That means, that we have to use this Gaussian probability density
01:01:21.110 --> 01:01:28.190
function to describe the probabilities of this random variable.
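This notation can be made concrete in a few lines of code. The sketch below is a minimal illustration, assuming sigma denotes the standard deviation (not the variance); it evaluates the Gaussian probability density function with parameters mu and sigma at a point x:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Evaluate the Gaussian density N(mu, sigma) at the point x.

    In the lecture's notation, X ~ N(mu, sigma) states that the
    probability density function of the random variable X is this
    function. Here sigma is assumed to be the standard deviation.
    """
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# The density of a standard normal (mu = 0, sigma = 1) at its mean:
print(gaussian_pdf(0.0, 0.0, 1.0))  # about 0.3989
```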
01:01:29.310 --> 01:01:35.530
Okay, so far, the repetition of the very brief and repetition of the
01:01:35.530 --> 01:01:37.230
basics of probability theory.
01:01:37.770 --> 01:01:43.250
Now, let's use that to develop a model, which is very useful to do
01:01:43.250 --> 01:01:50.610
this kind of tracking of objects, estimating motion of objects over
01:01:50.610 --> 01:01:54.710
time, integrating measurements in an incremental way.
01:01:55.910 --> 01:02:03.690
So the main idea that we need for this is that we want to understand
01:02:03.690 --> 01:02:07.630
the world or the part of the world in which we want to model as a
01:02:07.630 --> 01:02:08.150
system.
01:02:09.210 --> 01:02:11.510
And the system is assumed to have a state.
01:02:11.630 --> 01:02:16.950
And we assume that this state contains all information that describes
01:02:16.950 --> 01:02:18.950
how the system behaves.
01:02:19.670 --> 01:02:26.210
Yeah, so just knowing the state of a system should be sufficient to
01:02:26.210 --> 01:02:31.950
understand or to be able to describe how the system behaves.
01:02:32.990 --> 01:02:39.630
Yeah, so... and this means that we can describe the changes of the
01:02:39.630 --> 01:02:45.230
system over time by means of describing the changes of its state over
01:02:45.230 --> 01:02:45.670
time.
01:02:45.810 --> 01:02:48.570
So if we know the state of the system, we know everything.
01:02:49.910 --> 01:02:53.030
Yeah, so that's the basic idea behind it.
01:02:53.090 --> 01:02:58.510
So let's go back to our standard example, a car that is moving with
01:02:58.510 --> 01:02:59.430
constant velocity.
01:03:00.050 --> 01:03:03.090
If you want to describe this car that is moving with constant
01:03:03.090 --> 01:03:07.650
velocity, it is fully sufficient to know its position and its velocity
01:03:07.650 --> 01:03:10.030
at a certain point in time.
01:03:10.030 --> 01:03:15.490
If you know its position and the velocity, then these two pieces of
01:03:15.490 --> 01:03:19.510
information are completely sufficient to describe the behavior of this
01:03:19.510 --> 01:03:21.210
car, to make predictions.
01:03:21.390 --> 01:03:23.130
Where will it be in 10 seconds?
01:03:23.270 --> 01:03:27.190
If I know where it is at the moment, and if I know how fast it is, I
01:03:27.190 --> 01:03:29.430
can easily predict where it will be.
01:03:29.930 --> 01:03:36.950
Maybe up to some randomness that still might occur, but in general,
01:03:37.130 --> 01:03:38.450
I'm able to do this prediction.
01:03:38.610 --> 01:03:43.490
I'm also able to describe where this vehicle was 10 seconds ago,
01:03:44.030 --> 01:03:44.810
something like that.
01:03:45.070 --> 01:03:49.150
So the behavior, at least the relevant behavior of this vehicle, is
01:03:49.150 --> 01:03:51.570
fully described by these state variables.
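To make this concrete, here is a minimal sketch (with made-up numbers) of how position and velocity alone let us predict the car forward or backward in time under the constant-velocity assumption:

```python
def predict_position(position, velocity, dt):
    """Constant-velocity law of movement: x(t + dt) = x(t) + v * dt.

    Knowing the state (position, velocity) is enough to predict the
    position dt seconds ahead, or dt seconds in the past (dt < 0).
    """
    return position + velocity * dt

# Hypothetical car at 12 m, moving at 3 m/s:
print(predict_position(12.0, 3.0, 10.0))   # position in 10 seconds: 42.0
print(predict_position(12.0, 3.0, -10.0))  # position 10 seconds ago: -18.0
```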
01:03:52.790 --> 01:03:56.890
If the vehicle is not
01:03:56.890 --> 01:04:01.030
moving with constant velocity, if this assumption is not fulfilled,
01:04:01.470 --> 01:04:04.510
then of course knowing just the position and the velocity is
01:04:04.510 --> 01:04:08.050
definitely not sufficient to describe the behavior of the vehicle.
01:04:08.810 --> 01:04:15.030
Then we are not able to predict where it will be in 10 seconds, if we
01:04:15.030 --> 01:04:18.490
don't know in which way it is accelerating.
01:04:19.750 --> 01:04:24.670
So that means then maybe we would need in such a case to add the
01:04:24.670 --> 01:04:29.730
acceleration of the vehicle as an additional variable to this state
01:04:29.730 --> 01:04:31.390
vector, to the state information.
01:04:32.050 --> 01:04:36.790
And if the car is not moving on a straight road, but maybe on a curved
01:04:36.790 --> 01:04:41.790
road, maybe we also need to know the steering angle of the vehicle or
01:04:41.790 --> 01:04:45.870
the yaw rate of the vehicle in order to describe in which way it is
01:04:45.870 --> 01:04:46.190
turning.
01:04:46.950 --> 01:04:50.570
So it depends on the system that we observe, on the assumptions that
01:04:50.570 --> 01:04:56.270
we can make about how it behaves, which pieces of information must
01:04:56.270 --> 01:04:59.230
be added to this state information.
01:05:00.950 --> 01:05:04.110
But we assume that it is possible to define it.
01:05:04.410 --> 01:05:08.830
We assume that it is possible to select a certain number of pieces of
01:05:08.830 --> 01:05:14.470
information, a finite number of pieces of information, and to assemble
01:05:14.470 --> 01:05:19.070
them into a state vector.
01:05:19.790 --> 01:05:23.310
And then with this knowledge, we can fully describe the behavior of
01:05:23.310 --> 01:05:23.750
the system.
01:05:26.170 --> 01:05:31.070
Okay, so based on that, then we can maybe create a transition model
01:05:31.070 --> 01:05:34.970
and say, okay, based on these pieces of information, we can predict
01:05:34.970 --> 01:05:40.530
where will the vehicle be in a certain number of seconds.
01:05:40.910 --> 01:05:42.150
How will it behave?
01:05:42.250 --> 01:05:43.430
We go back to that later.
01:05:46.570 --> 01:05:53.330
Okay, we also assume that we can observe the system.
01:05:53.710 --> 01:05:57.150
For instance, the car that is moving, we observe it with a sensor.
01:05:57.710 --> 01:06:02.310
So we also need to describe in a certain abstract way how this
01:06:02.310 --> 01:06:04.090
measurement works.
01:06:05.010 --> 01:06:10.390
So in this case, we assume that we can make a measurement, we can
01:06:10.390 --> 01:06:16.010
measure something, and this measurement depends on the state of the
01:06:16.010 --> 01:06:16.390
system.
01:06:17.990 --> 01:06:24.410
And we assume that there is a function that explains which
01:06:24.410 --> 01:06:31.310
measurement we obtain if the system is in a certain state.
01:06:31.570 --> 01:06:36.690
So we assume that the measurement, which is now written here
01:06:36.690 --> 01:06:42.630
throughout the slides with the letter Z, that it is created by mapping
01:06:42.630 --> 01:06:47.530
the state of the system at the present point in time, that is now
01:06:47.530 --> 01:06:51.790
denoted with S for state, onto the measurement.
01:06:52.610 --> 01:06:55.190
And maybe we have some measurement noise.
01:06:56.630 --> 01:06:59.050
That's the second part here that is written.
01:06:59.210 --> 01:07:03.550
That means there's just some random influence that is affecting the
01:07:03.550 --> 01:07:03.910
measurement.
01:07:04.910 --> 01:07:10.690
So we assume there is, first of all, there is a clear, well-defined
01:07:10.690 --> 01:07:16.270
relationship that explains how the measurement is related to the
01:07:16.270 --> 01:07:18.830
present state of the object of the system.
01:07:19.870 --> 01:07:24.210
And there is some additional noise, some additional random influence
01:07:26.330 --> 01:07:31.370
that occurs, that is somehow disturbing the measurement.
01:07:31.390 --> 01:07:37.990
So that we don't get the ideal measurement value, but that we get a
01:07:37.990 --> 01:07:43.550
randomly disturbed measurement.
01:07:44.250 --> 01:07:48.610
So for a car that we observe with the camera, we might observe, for
01:07:48.610 --> 01:07:52.810
instance, the position of the car.
01:07:53.390 --> 01:07:58.190
And then we say, okay, Z of t, the measurement that we get is equal to
01:07:58.190 --> 01:08:03.730
X of t, one of the state variables, plus some measurement noise, which
01:08:03.730 --> 01:08:07.970
we could in this case even rewrite into this matrix form that it is
01:08:07.970 --> 01:08:12.910
something like a row vector 1, 0 times this column vector that
01:08:12.910 --> 01:08:15.350
contains all the state variables plus some noise.
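A small sketch of this measurement model (the numbers and the noise level are assumed values, just for illustration): the row vector (1, 0) picks out only the position component of the state, so the velocity stays hidden from the sensor:

```python
import random

def measure(state, noise_sigma=0.5):
    """Measurement model z_t = H * s_t + noise, with H = (1, 0).

    state is (position, velocity); the row vector (1, 0) selects the
    position, so the velocity is not observed directly.
    noise_sigma is an assumed noise level for illustration.
    """
    h = (1.0, 0.0)  # measurement matrix, written as a row vector
    true_value = sum(h_i * s_i for h_i, s_i in zip(h, state))
    return true_value + random.gauss(0.0, noise_sigma)

state = (12.0, 3.0)   # position 12 m, velocity 3 m/s
z = measure(state)    # a noisy reading of the position, near 12
```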
01:08:16.890 --> 01:08:22.510
Okay, important for us is what we can observe, of course, is Z of t.
01:08:25.170 --> 01:08:29.890
Z of t is what we can observe, that's what we can measure, that is
01:08:29.890 --> 01:08:30.470
what we get.
01:08:31.170 --> 01:08:36.450
And of course what we want to determine is the state of the system,
01:08:36.630 --> 01:08:39.330
which contains, for instance, the velocity of the car.
01:08:39.730 --> 01:08:43.210
This is what we aim to estimate, what we want to get.
01:08:44.130 --> 01:08:48.590
We don't assume that we can observe that directly, we don't assume
01:08:48.590 --> 01:08:50.750
that we can observe all the state variables.
01:08:54.500 --> 01:08:58.180
Maybe that's hidden, so maybe we don't have a sensor that can sense
01:08:58.180 --> 01:08:59.500
the velocity of the vehicle.
01:09:01.280 --> 01:09:05.160
But of course, based on our measurements, we want to draw conclusions
01:09:05.160 --> 01:09:11.060
about the state of the system and estimate the current state of the
01:09:11.060 --> 01:09:11.380
system.
01:09:11.380 --> 01:09:16.080
So for doing that, let's introduce some stochastic model, a so-called
01:09:16.080 --> 01:09:17.140
hidden Markov model.
01:09:17.260 --> 01:09:18.360
What is a hidden Markov model?
01:09:19.580 --> 01:09:23.260
A hidden Markov model is a time discrete stochastic state transition
01:09:23.260 --> 01:09:23.760
system.
01:09:24.260 --> 01:09:27.680
Its observation and its successor state depend entirely on its present
01:09:27.680 --> 01:09:31.200
state and do not depend on previous states or observations.
01:09:32.120 --> 01:09:34.460
Okay, let's go through the definition step by step.
01:09:34.760 --> 01:09:37.260
So it's a state transition system.
01:09:37.420 --> 01:09:38.120
What does it mean?
01:09:38.560 --> 01:09:43.520
We assume the system has a certain state at a certain point in time
01:09:43.520 --> 01:09:48.180
and then it makes a transition to another state for the next point in
01:09:48.180 --> 01:09:48.420
time.
01:09:49.220 --> 01:09:53.760
The definition says also that it's a time discrete state transition
01:09:53.760 --> 01:09:54.220
system.
01:09:54.360 --> 01:09:59.000
That means we are not considering time as a continuum, but we are
01:09:59.000 --> 01:10:05.640
considering time only at a certain discrete sequence of points in
01:10:05.640 --> 01:10:05.900
time.
01:10:06.460 --> 01:10:10.520
So we assume, for instance, that we consider the state of the system
01:10:10.520 --> 01:10:17.260
only for points in time which are spaced by, say, one second.
01:10:17.380 --> 01:10:21.560
We're interested in the state of the system now, one second later, two
01:10:21.560 --> 01:10:24.260
seconds later, three seconds later, and so on.
01:10:24.380 --> 01:10:29.660
But we are not considering what happens between second zero and second
01:10:29.660 --> 01:10:29.940
one.
01:10:30.160 --> 01:10:34.780
We only consider these integer points in time.
01:10:35.520 --> 01:10:36.920
So that is time discrete.
01:10:37.800 --> 01:10:41.180
Then stochastic means, well, there's some randomness involved.
01:10:41.740 --> 01:10:46.580
So the state transition, the transition from one state to another
01:10:46.580 --> 01:10:48.500
state, is not fully deterministic.
01:10:49.100 --> 01:10:53.340
It's not fully explained by a transition function, but there is also
01:10:53.340 --> 01:10:56.780
some randomness that is affecting this state transition.
01:10:56.900 --> 01:10:59.820
That means if we know the state at the present point in time, we
01:10:59.820 --> 01:11:04.080
cannot completely determine what will be the successor state, but we
01:11:04.080 --> 01:11:08.020
can only determine that up to a probability distribution, because
01:11:08.020 --> 01:11:11.340
there is some randomness that is affecting this state transition.
01:11:13.140 --> 01:11:18.040
Okay, its observation and its successor state depend entirely on its
01:11:18.040 --> 01:11:21.460
present state and do not depend on previous states or observations.
01:11:22.310 --> 01:11:23.700
So what does it say?
01:11:23.940 --> 01:11:29.860
Okay, so its successor state depends entirely on its present state and
01:11:29.860 --> 01:11:33.760
not on previous states or observations.
01:11:35.200 --> 01:11:41.820
This means once we know the present state, we know everything that we
01:11:41.820 --> 01:11:48.540
need to predict what's going on in future, to explain the behavior of
01:11:48.540 --> 01:11:50.340
the system in future.
01:11:51.500 --> 01:11:53.780
We can forget about the past.
01:11:54.060 --> 01:12:00.000
We do not need to know how, in which way we entered a certain state,
01:12:00.100 --> 01:12:03.440
the system entered a certain state, but knowing the state is
01:12:03.440 --> 01:12:06.200
sufficient to describe its behavior.
01:12:07.860 --> 01:12:15.740
So think of, you want to drive to a certain place, yeah, you're
01:12:15.740 --> 01:12:20.320
driving on the road, you arrive at a certain place,
01:12:24.140 --> 01:12:28.280
not yet at your final destination, and you ask yourself, how do I get
01:12:28.280 --> 01:12:31.000
from this place to my final destination?
01:12:31.860 --> 01:12:38.780
Then it's definitely, it doesn't matter at all how you manage to come
01:12:38.780 --> 01:12:43.580
to this place, it only depends on where you are to plan your future
01:12:43.580 --> 01:12:46.020
path to your goal.
01:12:47.300 --> 01:12:51.040
And it doesn't matter whether you went there in a straight manner or
01:12:51.040 --> 01:12:55.860
not straight manner, that's completely irrelevant for describing how
01:12:55.860 --> 01:12:59.320
you can arrive at your final destination.
01:13:00.240 --> 01:13:04.320
So that is meant with, it depends entirely on its present state and
01:13:04.320 --> 01:13:05.440
not on the past.
01:13:06.200 --> 01:13:12.720
And also the observation that is related to the state only depends on
01:13:12.720 --> 01:13:15.380
the present state and not on the past.
01:13:15.960 --> 01:13:21.340
That means when we measure the position of a car with a camera, we
01:13:21.340 --> 01:13:26.080
assume that this measurement is completely independent
01:07:26.080 --> 01:07:28.020
of where the car came from.
01:13:28.820 --> 01:13:36.060
Yeah, or whether it was accelerating or decelerating in the past, that's
01:13:36.060 --> 01:13:38.100
completely irrelevant for this measurement.
01:13:40.600 --> 01:13:44.440
Okay, so that is an assumption that we have to make, and if these
01:13:44.440 --> 01:13:48.520
assumptions hold, then we are faced with something that is called a
01:13:48.520 --> 01:13:53.980
hidden Markov model in probability theory.
01:13:54.780 --> 01:13:59.920
Um, this definition is summarized in these two equations here.
01:14:00.500 --> 01:14:03.860
Now, these two equations actually describe this independence
01:14:03.860 --> 01:14:04.540
assumption.
01:14:04.540 --> 01:14:09.500
They state that, well, the probability of a certain successor state,
01:14:10.280 --> 01:14:13.620
knowing the present state, knowing the present observation, knowing
01:14:13.620 --> 01:14:17.360
the previous state, the previous observation, and so on, up to the
01:14:17.360 --> 01:14:25.100
very first state, can be simplified and is equal to just the
01:14:25.100 --> 01:14:28.760
probability of the successor state given the present state.
01:14:29.280 --> 01:14:35.160
That means if we want to predict what's going on in the future, if we
01:14:35.160 --> 01:14:39.940
want to say, okay, in which way will the system behave
01:14:39.940 --> 01:14:44.820
in the future, all that we need to know is the present state.
01:14:45.420 --> 01:14:50.200
And we can forget all the past states, we can forget all the past
01:14:50.200 --> 01:14:54.500
observations, they don't have any influence on the future behavior.
01:14:56.120 --> 01:15:00.180
The same holds for the observation: for the probability of making a certain
01:15:00.180 --> 01:15:05.520
observation, knowing the present state of the
01:15:05.520 --> 01:15:08.780
system is completely sufficient.
01:15:08.940 --> 01:15:11.580
So, knowing the present state is completely sufficient.
01:15:11.780 --> 01:15:17.120
If we know previous observations and previous state, this doesn't help
01:15:17.120 --> 01:15:24.000
us to explain better the present observation as long as we know the
01:15:24.000 --> 01:15:24.600
present state.
01:15:25.060 --> 01:15:29.200
So, again, we can forget about all the past because the past doesn't
01:15:29.200 --> 01:15:34.420
have influence on the future, if we know the present state of the
01:15:34.420 --> 01:15:34.700
system.
01:15:34.700 --> 01:15:38.420
So, the present state contains all the relevant information that is
01:15:38.420 --> 01:15:42.720
necessary to describe the behavior of the system and to describe the
01:15:42.720 --> 01:15:43.660
measurement process.
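These two independence assumptions can be illustrated with a toy simulation. Everything below is invented for illustration (the state names and probability tables are not from the slides); note how each observation is drawn from the present state only, and each successor state from the present state only:

```python
import random

# Made-up two-state toy model, for illustration only.
TRANSITION = {            # P(next state | present state)
    "cruise": {"cruise": 0.9, "brake": 0.1},
    "brake":  {"cruise": 0.3, "brake": 0.7},
}
OBSERVATION = {           # P(observation | present state)
    "cruise": {"steady": 0.8, "slowing": 0.2},
    "brake":  {"steady": 0.1, "slowing": 0.9},
}

def sample(dist):
    """Draw one value from a {value: probability} table."""
    r, total = random.random(), 0.0
    for value, p in dist.items():
        total += p
        if r < total:
            return value
    return value  # guard against rounding at the boundary

def simulate(steps, start="cruise"):
    """Generate states and observations step by step.

    The hidden Markov model assumptions appear directly in the code:
    z_t is sampled using only s_t, and s_{t+1} using only s_t --
    no earlier state or observation is ever consulted.
    """
    s, states, observations = start, [], []
    for _ in range(steps):
        states.append(s)
        observations.append(sample(OBSERVATION[s]))
        s = sample(TRANSITION[s])
    return states, observations

states, observations = simulate(20)
```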
01:15:46.270 --> 01:15:52.130
So, now, if we assume that we can model a system as a hidden Markov
01:15:52.130 --> 01:15:57.290
model, then what we want to get, or what we usually aim for, is that
01:15:57.290 --> 01:16:01.730
we want to get an idea of what is the present state of the system.
01:16:01.850 --> 01:16:06.310
So, we assume we are observing the system several points in time.
01:16:06.470 --> 01:16:12.070
We make several measurements, from point in time 1 to point in time t,
01:16:12.170 --> 01:16:12.850
in this case.
01:16:12.970 --> 01:16:17.470
So, having observed the whole sequence of observations, Z1 up to
01:16:17.470 --> 01:16:22.130
Zt, we want to know, well, what is the present state of the system.
01:16:22.270 --> 01:16:26.930
So, we want to calculate a probability or a probability distribution
01:16:26.930 --> 01:16:29.650
of the present state.
01:16:29.650 --> 01:16:34.270
So, we want to know, well, we made some observations of the position
01:16:34.270 --> 01:16:38.850
of the car, what is its present velocity, what is its present state,
01:16:38.970 --> 01:16:39.310
for instance.
01:16:40.350 --> 01:16:42.890
A variant of this question is given here.
01:16:43.670 --> 01:16:48.550
Namely, we made the certain sequence of observations, and we are not
01:16:48.550 --> 01:16:52.830
interested in what is the present state of the system, but we want to
01:16:52.830 --> 01:16:57.690
know what is the future state of the system, for the next point in
01:16:57.690 --> 01:16:57.970
time.
01:16:58.730 --> 01:17:02.970
If we observed up to now what was happening up to now, we want to
01:17:02.970 --> 01:17:06.630
know, well, what do we have to expect in future from the system?
01:17:06.790 --> 01:17:07.730
How will it behave?
01:17:08.350 --> 01:17:09.710
What will be the next state?
01:17:12.130 --> 01:17:14.530
Okay, so, let's derive that.
01:17:15.250 --> 01:17:19.790
Okay, for that purpose, I present to you these formulas, and we will
01:17:19.790 --> 01:17:21.810
go through all these formulas in detail.
01:17:22.510 --> 01:17:24.270
Let's start, where do we start?
01:17:24.550 --> 01:17:24.830
Here.
01:17:29.220 --> 01:17:32.320
Hey, no, let's derive them step by step.
01:17:32.640 --> 01:17:34.380
Okay, let's start here.
01:17:34.540 --> 01:17:38.220
So, the probability of St given the sequence of observations up to
01:17:38.220 --> 01:17:40.480
that point in time, what is it?
01:17:40.660 --> 01:17:48.020
We first apply Bayes' rule, and we exchange the roles
01:17:48.020 --> 01:17:49.680
of St and Zt.
01:17:50.700 --> 01:17:52.560
Yeah, this is done here.
01:17:52.700 --> 01:18:01.360
So, just St and Zt are exchanged, yes, and Z1 up to Zt minus 1 is
01:18:01.360 --> 01:18:02.980
preserved in the condition part.
01:18:03.660 --> 01:18:08.180
With Bayes' rule, this means this is equal to this equation here, to
01:18:08.180 --> 01:18:08.960
this term here.
01:18:10.120 --> 01:18:13.700
Yeah, okay, and now we simplify things.
01:18:14.640 --> 01:18:18.480
We look at the denominator, and we see that the denominator only
01:18:18.480 --> 01:18:20.040
contains observations.
01:18:20.920 --> 01:18:25.220
The observations from point in time 1 up to the present point in time
01:18:25.220 --> 01:18:25.460
t.
01:18:26.440 --> 01:18:31.140
We assume that we know those, we have observed them, so they don't
01:18:31.140 --> 01:18:33.600
change, they don't vary, they are fixed.
01:18:34.140 --> 01:18:37.680
And that means that this denominator is just a constant.
01:18:38.240 --> 01:18:41.500
It's independent of St. It's just a constant.
01:18:42.160 --> 01:18:47.060
And that means we can say this term here is proportional to its
01:18:47.060 --> 01:18:47.540
numerator.
01:18:48.520 --> 01:18:52.820
Furthermore, we assume that the system is a hidden Markov model.
01:18:53.420 --> 01:18:58.460
And since it is a hidden Markov model, we know that if we know the
01:18:58.460 --> 01:19:04.320
present state, we know everything that is necessary to describe the
01:19:04.320 --> 01:19:07.080
probability distribution over the observations.
01:19:07.620 --> 01:19:12.060
And all the past observations do not help us to make a better
01:19:12.060 --> 01:19:17.820
prediction of the observation or to make a better, to know more about
01:19:17.820 --> 01:19:18.980
the present observation.
01:19:19.360 --> 01:19:22.820
That is actually this independent assumption that comes from the
01:19:22.820 --> 01:19:23.700
hidden Markov model.
01:19:23.880 --> 01:19:29.440
That means we can simplify this term and leave away those old
01:19:29.440 --> 01:19:30.120
observations.
01:19:30.560 --> 01:19:31.520
They are not helpful.
01:19:31.840 --> 01:19:34.840
We don't need them because we assume that we are in a hidden Markov
01:19:34.840 --> 01:19:35.120
model.
01:19:35.520 --> 01:19:39.140
So this means this simplifies and becomes this term here.
01:19:40.000 --> 01:19:42.940
The second factor here is preserved.
01:19:43.300 --> 01:19:44.120
It's just the same.
01:19:46.020 --> 01:19:47.860
So that's the first equation.
01:19:48.200 --> 01:19:50.080
The second equation starts here.
01:19:50.180 --> 01:19:55.340
So that's about this future state that we, about which we want to say
01:19:55.340 --> 01:20:00.080
something once we are given the sequence of observations up to point
01:20:00.080 --> 01:20:00.700
in time t.
01:20:01.300 --> 01:20:05.360
For that purpose, we use the marginalization rule that we've
01:20:05.360 --> 01:20:05.820
introduced.
01:20:06.300 --> 01:20:11.240
And we add another additional random variable to this probability,
01:20:12.120 --> 01:20:16.260
namely the random variable St, which we need as a kind of linking
01:20:16.260 --> 01:20:21.700
element to later on simplify these equations.
01:20:21.880 --> 01:20:25.180
So we add this additional random variable St here.
01:20:25.300 --> 01:20:30.800
And we know if we add it at that point due to the marginalization
01:20:30.800 --> 01:20:34.980
rule, we have to sum up over all possible values that this random
01:20:34.980 --> 01:20:36.760
variable St might take.
01:20:38.140 --> 01:20:41.860
Okay, so that's actually the marginalization rule.
01:20:42.420 --> 01:20:47.240
Now we use the rule or the definition of conditional probabilities and
01:20:47.240 --> 01:20:48.260
reformulate that.
01:20:48.400 --> 01:20:53.000
So actually what we have here is a joint probability that is
01:20:53.000 --> 01:20:56.200
conditioned on some other variables.
01:20:56.480 --> 01:21:01.020
So now we use the definition of conditional distributions to change
01:21:01.020 --> 01:21:04.340
this joint probability into a conditional probability.
01:21:05.460 --> 01:21:09.880
By pushing, so to say, this St variable into the condition part.
01:21:10.520 --> 01:21:14.260
Now then we get this condition probability and we need the correction
01:21:14.260 --> 01:21:19.280
factor, namely the probability of S of t, the marginal, so to say, of
01:21:19.280 --> 01:21:19.980
St.
01:21:20.620 --> 01:21:25.240
However, since we kept all these old observations here in the
01:21:25.240 --> 01:21:27.220
condition part, we also have to have it here.
01:21:29.260 --> 01:21:35.160
So now we again use the knowledge that we are facing a hidden Markov
01:21:35.160 --> 01:21:35.560
model.
01:21:36.440 --> 01:21:40.460
And in a hidden Markov model, the state transition from a known state
01:21:40.460 --> 01:21:47.980
St to a future state St plus one is independent, as we defined, it is
01:21:47.980 --> 01:21:50.600
independent from all the observations.
01:21:51.080 --> 01:21:57.320
That means we can leave those observations away and simplify this to
01:21:57.320 --> 01:22:00.320
this simpler probability.
01:22:01.840 --> 01:22:07.380
Yeah, so no magic inside of these calculations, just applying the
01:22:07.380 --> 01:22:12.920
basic calculation rules for probabilities and using the fact that we
01:22:12.920 --> 01:22:15.940
are facing a hidden Markov model, that we are assuming that we are
01:22:15.940 --> 01:22:20.600
facing a hidden Markov model, in which the present observation only
01:22:20.600 --> 01:22:24.000
depends on the present state and not on past states and observations,
01:22:24.300 --> 01:22:28.160
and in which the present, the transition from the present state to the
01:22:28.160 --> 01:22:33.100
future state only depends on the present state and not on past states
01:22:33.100 --> 01:22:33.800
and observations.
01:22:35.260 --> 01:22:37.660
Okay, now we can have a look at what we see.
01:22:37.840 --> 01:22:42.140
So this is actually just the state transition probability that
01:22:42.140 --> 01:22:45.060
describes in which way the states are changing over time.
01:22:46.380 --> 01:22:51.460
This term here that we need here is actually something that we get as
01:22:51.460 --> 01:22:54.060
a result from this calculation here.
01:22:54.260 --> 01:22:57.940
So that's actually the same term that we can see here.
01:23:00.680 --> 01:23:02.040
Furthermore, what
01:23:16.750 --> 01:23:25.170
we see is that this term here is just the simple probability of an
01:23:25.170 --> 01:23:25.670
observation.
01:23:25.670 --> 01:23:28.790
It just says how likely is a certain observation if we are in a
01:23:28.790 --> 01:23:29.330
certain state.
01:23:30.930 --> 01:23:36.430
And we can also see that this probability is something that is
01:23:36.430 --> 01:23:42.210
calculated here, but for the previous point in time.
01:23:42.350 --> 01:23:47.090
So we see this has the same structure as this term.
01:23:47.510 --> 01:23:52.270
And the structure is the same, the probability of the state at a
01:23:52.270 --> 01:23:56.850
certain point in time given all the observations up to that point in
01:23:56.850 --> 01:23:57.110
time.
01:23:57.730 --> 01:24:03.990
And here it's the same kind of probability, but just for one point in
01:24:03.990 --> 01:24:04.550
time later.
01:24:06.870 --> 01:24:12.470
That means we get some recurrent relationship here.
01:24:13.350 --> 01:24:18.930
If we want to calculate the output, if we want to calculate this term
01:24:18.930 --> 01:24:24.970
here, we need these state transition probabilities and we need the
01:24:24.970 --> 01:24:30.350
result of the calculation of the upper equation here.
01:24:30.470 --> 01:24:34.490
So we need, so to say, if we interpret that as a kind of calculation
01:24:34.490 --> 01:24:42.230
where we get as a result the left-hand side, we need that result as
01:24:42.230 --> 01:24:45.030
input to the equation here.
01:24:46.150 --> 01:24:50.190
And if we want to evaluate this term, what we need is the probability
01:24:50.190 --> 01:24:55.790
of a certain observation and the result of a calculation which we did
01:24:55.790 --> 01:25:02.810
one step before by evaluating the equation at the bottom of the slide.
01:25:03.890 --> 01:25:09.730
So we get a kind of recurrent or recursive, better to say, recursive
01:25:09.730 --> 01:25:11.130
calculation over time.
01:25:11.830 --> 01:25:17.050
And this interpreted as an algorithm looks like that.
01:25:17.810 --> 01:25:20.090
So we have two steps.
01:25:20.550 --> 01:25:25.130
One is called the prediction step and this prediction step actually
01:25:25.130 --> 01:25:31.580
implements this lower equation here.
01:25:31.820 --> 01:25:37.280
So it takes as input these two things and it calculates this thing.
01:25:38.840 --> 01:25:38.840
Yeah?
01:25:43.300 --> 01:25:48.260
What does the sum sign mean,
01:25:51.300 --> 01:25:52.340
this one here?
01:25:54.140 --> 01:25:58.560
Well, we sum over all possible values that st can take.
01:25:58.560 --> 01:25:59.140
Okay,
01:26:05.480 --> 01:26:09.780
this is for the discrete state, for the discrete case.
01:26:10.660 --> 01:26:10.760
Yeah?
01:26:11.200 --> 01:26:14.280
Let's assume at this point that we're dealing with discrete random
01:26:14.280 --> 01:26:14.760
variables.
01:26:16.320 --> 01:26:19.780
The same applies for continuous random variables.
01:26:20.340 --> 01:26:24.940
Then we replace the sum by the integral and the probabilities by the
01:26:24.940 --> 01:26:26.320
probability density functions.
01:26:29.190 --> 01:26:29.250
Okay.
01:26:29.930 --> 01:26:34.370
But let's first assume that we have discrete random variables.
01:26:34.630 --> 01:26:34.770
Yeah?
01:26:34.810 --> 01:26:38.690
So we sum up over all possible values of a discrete random variable
01:26:38.690 --> 01:26:39.030
st.
01:26:40.990 --> 01:26:44.430
Okay, so that's this prediction step.
01:26:44.570 --> 01:26:52.050
It calculates the probability distribution over the future states.
01:26:52.270 --> 01:26:56.510
So we assume that we observed, we made the measurements up to point in
01:26:56.510 --> 01:27:04.110
time t and we want to ask, well, what is the expected state at the
01:27:04.110 --> 01:27:05.050
next point in time?
01:27:05.670 --> 01:27:05.790
Yeah?
01:27:05.830 --> 01:27:08.570
That's a prediction step, therefore called prediction step.
01:27:08.970 --> 01:27:11.810
The other step is called innovation or correction step.
01:27:12.270 --> 01:27:19.930
That's actually the step that is shown here at the upper equation here
01:27:19.930 --> 01:27:21.410
that implements this equation.
01:27:21.730 --> 01:27:26.730
And it's calculating, so to say, well, we observed all measurements up
01:27:26.730 --> 01:27:31.690
to now and now we want to know for this point in time, what is the
01:27:31.690 --> 01:27:34.190
present state of the system?
01:27:35.990 --> 01:27:39.850
And what is, so to say, the difference: we also know
01:27:39.850 --> 01:27:40.990
the present observation.
01:27:42.210 --> 01:27:48.750
In the prediction step, so to say, the
01:27:48.750 --> 01:27:53.970
measurements that we made do not cover the point in time for which we
01:27:53.970 --> 01:27:57.330
want to know the state of the system, but they end one point in time
01:27:57.330 --> 01:27:57.710
before.
01:27:58.270 --> 01:28:03.370
For the innovation step, we assume that we made all the observations
01:28:03.370 --> 01:28:04.810
up to that point in time.
01:28:05.370 --> 01:28:09.130
So that also means that we integrate, so to say, the last observation
01:28:09.130 --> 01:28:10.030
in our reasoning.
01:28:10.930 --> 01:28:16.130
So, and we execute these two steps one after the other.
01:28:17.010 --> 01:28:17.130
Yeah?
01:28:17.350 --> 01:28:22.030
Every time we make a new observation, we make one innovation step,
01:28:22.850 --> 01:28:30.130
then we get these kinds of probabilities, and then we make a
01:28:30.130 --> 01:28:34.930
prediction step to guess what will be the future state of the
01:28:34.930 --> 01:28:37.370
system one point in time later.
01:28:37.950 --> 01:28:40.730
Then we wait for a new measurement.
01:28:41.210 --> 01:28:45.950
Once we get the new measurement, we apply this new measurement in the
01:28:45.950 --> 01:28:46.790
innovation step.
01:28:47.210 --> 01:28:49.350
We go from this side to this side.
01:28:50.230 --> 01:28:52.910
Of course, what we also need is some initialization.
01:28:53.370 --> 01:28:58.310
So, yeah, at the beginning we need to start at some point, and well,
01:28:58.390 --> 01:29:02.150
it depends a little bit on the application whether we want to start
01:29:02.150 --> 01:29:05.430
here on the right-hand side or whether we want to start at the
01:29:05.430 --> 01:29:05.890
left-hand side.
01:29:06.330 --> 01:29:12.530
For some applications, it might be that we know the
01:29:12.530 --> 01:29:17.710
initial conditions of the system really well, that we know very well
01:29:17.710 --> 01:29:18.750
what the initial state is.
01:29:18.930 --> 01:29:21.810
Then we start here on the left-hand side.
01:29:22.310 --> 01:29:33.400
If we don't know the initial state very well... yeah, in other
01:29:33.400 --> 01:29:36.220
cases, we might start here on the right-hand side.
01:29:36.280 --> 01:29:38.300
That's up to the application.
01:29:39.000 --> 01:29:41.960
So, that means the whole process looks like that.
01:29:42.460 --> 01:29:47.040
So, we start with an initial guess for
01:29:47.040 --> 01:29:50.100
the first state, an initial probability distribution for the first
01:29:50.100 --> 01:29:50.380
state.
01:29:50.520 --> 01:29:53.840
Then we do an innovation step to integrate the first measurement.
01:29:54.120 --> 01:29:58.720
Then we make a prediction step to guess the probability distribution
01:29:58.720 --> 01:30:02.740
for the second point in time, knowing just the first measurement.
01:30:03.080 --> 01:30:06.200
Then we integrate the second measurement in an innovation step, and so
01:30:06.200 --> 01:30:06.900
on, and so on.
01:30:07.360 --> 01:30:10.980
And as I said, we might also start with a prediction step if we like,
01:30:11.440 --> 01:30:15.560
if that helps more than starting with an innovation step.
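The whole cycle just described, an initial guess, then innovation and prediction alternating as each new measurement arrives, might be sketched like this (a self-contained scalar toy with made-up noise values and sensor readings, not the lecture's exact formulation):

```python
x, p = 0.0, 10.0                     # initial guess: state, large uncertainty
q, r = 0.1, 0.5                      # assumed process and measurement noise
measurements = [1.0, 2.1, 2.9, 4.2]  # made-up sensor readings over time

for z in measurements:
    # innovation/correction step: integrate the new measurement z
    k = p / (p + r)                  # Kalman gain
    x = x + k * (z - x)
    p = (1.0 - k) * p
    # prediction step: guess the state one point in time later
    # (a constant-velocity model would also propagate x; identity here)
    p = p + q
```

Starting the loop with the innovation step, as here, suits a poorly known initial state; swapping the two blocks would start with a prediction step instead, which, as said, is equally possible depending on the application.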
01:30:16.280 --> 01:30:18.980
Okay, that's it for today.
01:30:19.160 --> 01:30:20.200
Yeah, both options are possible.
01:30:21.980 --> 01:30:23.960
Yeah, we continue next week.