WEBVTT
00:05.460 --> 00:08.880
So welcome back to our lecture on automotive vision.
00:09.580 --> 00:16.580
So now let's go to the major topic of our lecture today.
00:16.800 --> 00:18.000
This is road recognition.
00:18.560 --> 00:23.400
So what we discussed last week on Monday was SLAM, that is, self-
00:23.400 --> 00:28.320
localization combined with creating maps.
00:28.320 --> 00:33.000
And actually this is what we need for chapter number seven, when we
00:33.000 --> 00:34.380
talk about road recognition.
00:34.940 --> 00:39.360
In this chapter what we want to do is we want to create a map of the
00:39.360 --> 00:43.880
road, maybe a very simple map, but a map of the road while we are
00:43.880 --> 00:44.940
driving on the road.
00:45.440 --> 00:49.880
So that means we must localize ourselves in the map and we must
00:49.880 --> 00:54.220
somehow map the road.
00:54.340 --> 00:55.800
That is what we aim at.
00:55.800 --> 01:02.500
So the presentation slides are actually based on some slides that
01:02.500 --> 01:07.680
Heidi Lohse from Daimler AG created a couple of years ago.
01:07.840 --> 01:14.060
I've adapted them, so some of the slides are motivated by her slides
01:14.060 --> 01:14.680
at that time.
01:15.740 --> 01:20.580
Okay, so yeah, the references if you're interested.
01:20.580 --> 01:26.880
So maybe a very simple or old paper, but one still expressing the basic
01:26.880 --> 01:31.700
ideas behind what we see in the lecture is the paper by Enkelmann,
01:32.200 --> 01:33.860
video-based driver assistance.
01:34.360 --> 01:39.820
So this of course reflects the state of the art roughly 20 years ago,
01:40.420 --> 01:45.940
a bit old, but maybe to get some basic understanding still useful.
01:46.360 --> 01:51.700
The second paper, the Tharsol paper, also provides a
01:51.700 --> 01:56.020
good idea of how this approach works that we
01:56.020 --> 01:57.260
introduce here in the lecture.
01:57.880 --> 02:01.900
And of course the work of Heidi Lohse, who was a PhD student at Daimler
02:01.900 --> 02:07.320
and also here at KRT can be seen here in this paper from 2009.
02:08.940 --> 02:10.760
So what is our task?
02:10.760 --> 02:16.160
Well, we assume we are somewhere driving along a road like this one.
02:16.820 --> 02:18.280
What do we want to do?
02:18.420 --> 02:23.300
Well, we want to recognize where is the road ahead of us and we want
02:23.300 --> 02:25.720
to derive a geometric model of the road.
02:26.040 --> 02:31.160
That means the first thing is somehow we have to determine where the
02:31.160 --> 02:34.160
boundary of the road and the boundary of the lane is.
02:34.640 --> 02:40.260
Maybe we might detect the markings on the road for that purpose.
02:40.900 --> 02:45.840
Then we want to describe the whole scene in a geometric way by a
02:45.840 --> 02:46.680
geometric model.
02:47.100 --> 02:50.200
So in this case, we might say this is a straight piece of a road.
02:50.300 --> 02:55.460
So we want to determine the direction and width of the road and
02:55.460 --> 02:57.740
express the road geometry like that.
02:58.130 --> 03:03.140
And of course, this should be done not only for a single image, but we
03:03.140 --> 03:05.580
want to track the representation over time.
03:05.640 --> 03:11.200
So we want to update the map again and again when we get new images.
03:11.520 --> 03:15.060
So these three steps are what we will discuss here.
03:15.420 --> 03:20.900
Let's start with the first one, the recognition of the road
03:20.900 --> 03:25.020
boundary, actually, the road boundary or the lane boundary.
03:25.220 --> 03:26.680
Both are relevant for us.
03:26.680 --> 03:31.160
If we have a narrow road, we might be interested in both boundaries of
03:31.160 --> 03:31.500
the road.
03:32.100 --> 03:36.280
If we're on a wide road, maybe we are mainly interested in our ego
03:36.280 --> 03:41.600
lane and not so much in the boundary of the road as such.
03:43.160 --> 03:46.140
OK, so for which purpose can we use that?
03:46.420 --> 03:49.320
Well, there are different applications.
03:49.320 --> 03:54.660
So the simplest one, which you have already been able to buy for many years, is
03:54.660 --> 04:00.380
a driver assistance system that can be used
04:00.380 --> 04:09.400
to warn the driver once the driver is about to leave the lane;
04:09.780 --> 04:12.360
then the system warns the driver.
04:12.540 --> 04:14.560
This is called a lane departure warning system.
04:14.560 --> 04:20.700
So then there's a certain sound or a vibration of
04:20.700 --> 04:26.160
the steering wheel to make the driver aware that he is about to
04:26.160 --> 04:26.880
leave the lane.
04:27.300 --> 04:28.380
That's the first stage.
04:28.760 --> 04:31.200
The next stage would be a lane keeping support.
04:31.320 --> 04:37.020
That means you don't just warn the driver, but you apply a certain
04:37.020 --> 04:41.540
torque to the steering wheel to keep the vehicle in its lane.
04:41.540 --> 04:46.100
Of course, you don't know whether the driver wants to leave the lane
04:46.100 --> 04:48.420
intentionally.
04:48.960 --> 04:53.240
Therefore, you cannot force the car to stay in its lane, but you can
04:53.240 --> 04:58.700
apply a small torque to the steering wheel so that the driver is
04:58.700 --> 05:03.260
encouraged to stay in the lane, but can still intentionally leave the
05:03.260 --> 05:03.440
lane.
05:03.700 --> 05:08.280
That would be a lane keeping support system, also available in
05:08.280 --> 05:12.160
today's cars, at least the more expensive ones.
05:13.280 --> 05:19.620
And the last thing would then be a system that automatically steers
05:19.620 --> 05:22.640
the car and automatically keeps the lane.
05:22.780 --> 05:27.820
So this goes towards autonomous driving or automated driving at a
05:27.820 --> 05:34.940
level that is not just assistance of the driver; the vehicle
05:34.940 --> 05:39.240
can steer itself. Say, on a highway, if you want to keep your lane
05:39.240 --> 05:44.400
and just follow this lane, you could start that assistance system and
05:44.400 --> 05:48.800
the vehicle would keep its lane and not deviate from it.
05:48.940 --> 05:53.980
So full control: the vehicle takes over the steering.
05:55.680 --> 05:59.960
Okay, so these are the three different levels of assistance that you
05:59.960 --> 06:05.260
could provide with such a system, and for all of them, of course, it's
06:05.260 --> 06:08.200
relevant to recognize the road.
06:09.020 --> 06:12.920
So let's start with the first step.
06:13.260 --> 06:16.880
So the first step is to recognize the boundary of the road or the
06:16.880 --> 06:17.840
boundary of the lane.
06:18.500 --> 06:20.920
Of course, this depends very much on the situation.
06:21.080 --> 06:25.380
In urban traffic, recognizing the boundary of a lane is different
06:25.380 --> 06:27.020
from, say, highway driving.
06:27.600 --> 06:31.880
In the lecture here, we mainly focus on the highway scenario or
06:31.880 --> 06:36.720
the rural road scenario, not so much on the urban scenario, because
06:36.720 --> 06:41.240
to get the basic ideas across, it's easier to use the highway
06:41.240 --> 06:41.640
scenario.
06:42.240 --> 06:45.580
And in a highway scenario, we can use the fact that highways usually
06:45.580 --> 06:46.940
are marked very well.
06:47.260 --> 06:51.720
So we have the white lines or the dashed white lines, which determine
06:51.720 --> 06:53.040
the boundary of the lanes.
06:53.880 --> 06:56.140
This is different in urban settings.
06:56.140 --> 07:00.560
In urban settings, the boundary of the road might be a curbstone, it
07:00.560 --> 07:03.180
might be a line, it might be nothing, actually.
07:04.340 --> 07:12.960
It might be objects that stand somewhere next to the road, trees or
07:12.960 --> 07:15.020
other things that people have put there.
07:15.800 --> 07:16.660
So it's more difficult.
07:16.920 --> 07:21.060
But let's stick to the highway scenario or the rural road scenario.
07:21.420 --> 07:22.740
So a situation like that.
07:22.740 --> 07:28.100
Of course, then the key feature to detect the boundary are the white
07:28.100 --> 07:30.420
lane markings.
07:31.660 --> 07:34.160
So how can we detect them from an image?
07:34.680 --> 07:36.360
So there are several approaches.
07:36.560 --> 07:41.860
This is a traditional approach that works quite well for well-marked
07:41.860 --> 07:49.300
roads and is therefore also used in driver assistance systems for
07:49.300 --> 07:50.840
lane keeping support.
07:50.840 --> 07:55.840
So the idea is, when we have such a situation, we can clearly see the
07:55.840 --> 07:57.660
marking is bright white.
07:58.360 --> 08:01.440
The other parts of the road are dark.
08:01.880 --> 08:06.260
So there is a gray level difference and we make use of that.
08:07.220 --> 08:10.540
So that's an example image here on top.
08:11.360 --> 08:16.000
Below, we find a gradient image where the gray value
08:16.000 --> 08:19.720
of each pixel is related to the length of the gray value gradient at
08:19.720 --> 08:20.320
that position.
08:20.740 --> 08:25.020
The darker the pixel is in this visualization, the longer the gradient
08:25.020 --> 08:25.380
is.
08:25.780 --> 08:30.380
And we clearly see that at the boundary of the lane markings, we have
08:30.380 --> 08:31.580
very strong gradients.
08:31.920 --> 08:37.600
And this gives us some idea about how we can find these lane markings.
08:37.760 --> 08:41.340
So what we could do is we could go through this image, this gradient
08:41.340 --> 08:43.800
image, row by row.
08:43.800 --> 08:50.080
And within each row, we could have a look at the gradient length along
08:50.080 --> 08:51.040
the row.
08:51.560 --> 08:53.380
This is shown here in the plot below.
08:53.580 --> 09:00.100
So for each pixel position along the red line here, it shows the
09:00.100 --> 09:00.820
gradient length.
09:01.240 --> 09:04.620
And of course, what we can see is that here and here where we have the
09:04.620 --> 09:07.180
lane marking, we get very strong gradients.
09:07.320 --> 09:11.640
And we do not get just one strong gradient, but two strong gradients,
09:11.820 --> 09:16.840
namely at the position where the image changes from road to lane
09:16.840 --> 09:19.340
marking and then back from lane marking to road.
09:19.420 --> 09:26.040
So we get two strong gradients here at the boundary of the lane
09:26.040 --> 09:26.400
marking.
09:27.580 --> 09:31.060
Of course, there might also be other positions like this one where we
09:31.060 --> 09:32.040
have strong gradients.
09:33.640 --> 09:37.960
But it's apparent that the gradient length somehow is related to the
09:37.960 --> 09:39.380
existence of a lane marking.
09:40.440 --> 09:47.080
So how could we then detect those positions where we are faced with
09:47.080 --> 09:47.600
lane markings?
09:47.800 --> 09:51.760
Well, the first criterion would be that we say we need very long,
09:52.040 --> 09:53.020
strong gradients.
09:53.580 --> 09:59.580
Then we would say, OK, we need two strong gradients at a certain
09:59.580 --> 10:00.180
distance.
10:00.540 --> 10:05.380
We know the distance because there are some rules that state that the
10:05.380 --> 10:07.200
lane markings have a certain width.
10:08.160 --> 10:12.280
Actually, in Germany, there are two possible widths, the wide lane
10:12.280 --> 10:15.680
markings and the narrow lane markings.
10:16.830 --> 10:21.420
So we know which widths the lane markings have.
10:21.500 --> 10:27.940
And from that, we can conclude whether two peaks in this gradient plot
10:27.940 --> 10:34.220
here might belong to a single lane marking or might not belong to it.
10:34.620 --> 10:37.740
So we could exclude that, for instance, the combination of this peak
10:37.740 --> 10:42.880
and this peak belongs to the same lane marking because they are too
10:37.740 --> 10:42.880
different from each other:
10:44.720 --> 10:47.580
their positions are too far away from each other.
10:47.860 --> 10:52.480
But we can conclude that here is an interesting pair of peaks and here
10:52.480 --> 10:56.360
is an interesting pair of peaks in the gradient plot.
10:57.160 --> 11:00.800
Furthermore, we could also filter out isolated points like this
11:00.800 --> 11:04.740
one and conclude that there is no lane marking or it's at least not
11:04.740 --> 11:06.080
clear enough.
11:06.440 --> 11:10.700
We could use the distance and we could also use the orientation of the
11:10.700 --> 11:11.200
gradients.
11:11.580 --> 11:17.940
So we know that if we look at the gradient orientation at the two
11:17.940 --> 11:23.500
boundary points here of this lane marking, then at the
11:23.500 --> 11:28.940
left position the gradient direction must be opposite to the gradient
11:28.940 --> 11:32.240
direction at the right boundary.
11:33.460 --> 11:40.140
And we further know that the gradient at the left boundary
11:40.140 --> 11:45.020
must point roughly to the right, maybe a little bit downwards or upwards,
11:45.260 --> 11:50.160
but with its main direction to the right, while the gradient
11:50.160 --> 11:55.640
at the right boundary of the lane marking should
11:55.640 --> 11:57.720
more or less point to the left.
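The row-wise peak-pairing heuristic described above can be sketched in a few lines. This is a minimal illustration, not the lecture's actual implementation; the function name `find_marking_pairs`, the gradient threshold, and the width bounds in pixels are my own assumptions:

```python
import numpy as np

def find_marking_pairs(row, grad_thresh=40.0, min_w=3, max_w=12):
    """Find candidate lane-marking edge pairs in a single image row.

    A marking appears as a strong dark-to-bright gradient (left edge,
    gradient pointing right) followed, within a plausible marking width,
    by a strong bright-to-dark gradient (right edge, gradient pointing
    left). The opposite gradient signs encode the opposite orientations.
    """
    grad = np.gradient(np.asarray(row, dtype=float))
    lefts = np.where(grad > grad_thresh)[0]    # dark -> bright transitions
    rights = np.where(grad < -grad_thresh)[0]  # bright -> dark transitions
    pairs = []
    for l in lefts:
        # keep a pair only if the two opposite-sign edges lie a plausible
        # marking width apart; everything else is discarded as clutter
        cand = rights[(rights > l + min_w) & (rights <= l + max_w)]
        if cand.size:
            pairs.append((int(l), int(cand[0])))
    return pairs

# Synthetic row: dark road (gray 40) with one bright marking (gray 200)
row = np.full(100, 40.0)
row[50:56] = 200.0
print(find_marking_pairs(row))  # (left edge, right edge) column pairs
```

On a real gradient image one would run this over every row of the lower image half, as described in the lecture.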
11:59.290 --> 12:04.100
So using these criteria, we could already filter out a lot of these
12:04.100 --> 12:09.460
points so that after a while we might end up with such a situation.
12:10.520 --> 12:14.540
So what I also did is I didn't consider the rows here at the upper
12:14.540 --> 12:18.300
half of the image, because there we typically see sky and buildings, so
12:18.300 --> 12:20.000
no lane markings at all.
12:20.300 --> 12:24.500
So now this is the picture which we get when we use this very simple
12:24.500 --> 12:27.180
heuristic to find the lane markings.
12:27.360 --> 12:32.860
The green points here, those which are marked green, are points where
12:32.860 --> 12:38.460
we found left boundaries of lane markings, and the red points are where
12:38.460 --> 12:40.760
we found right boundaries of lane markings.
12:41.320 --> 12:44.980
Of course, there are some artifacts here in the delineator, here in
12:44.980 --> 12:48.780
the car, there are also some misdetections here in the grass.
12:48.780 --> 12:52.820
If we like, we can further filter out those misdetections if we have a
12:52.820 --> 12:58.320
binocular camera system by eliminating all points which are not on the
12:58.320 --> 13:02.880
ground, then we would get rid even of these points here.
13:04.800 --> 13:12.880
So this is a good heuristic to get a set of points which are very
13:12.880 --> 13:15.300
probably part of a lane marking.
13:15.300 --> 13:19.760
Of course, we see that it's not perfect in the sense that it does not
13:19.760 --> 13:21.200
recognize all lane markings.
13:21.380 --> 13:22.960
So here we have some shadows.
13:23.440 --> 13:26.760
So in the shadowed areas, the gradients are not long enough.
13:26.940 --> 13:31.560
So we don't find points there; also far away, we
13:31.560 --> 13:32.620
don't find that line.
13:33.080 --> 13:37.400
But for our approach, it's not necessary to get all the points on the
13:37.400 --> 13:40.760
lane markings, but just enough to get a good estimate.
13:42.020 --> 13:46.960
Okay, that's the more or less classical approach for detecting road
13:46.960 --> 13:50.240
boundaries, road areas, and lane markings.
13:51.000 --> 13:56.980
Meanwhile, there are more modern approaches which do not explicitly
13:56.980 --> 14:01.240
use lane markings, but which use different approaches to determine
14:01.240 --> 14:04.980
which pixels of the image belong to the road or to the ego lane or
14:04.980 --> 14:05.260
not.
14:05.840 --> 14:07.980
And this is what I want to show with this slide.
14:08.240 --> 14:12.520
So the technique that is used here is called deep learning.
14:12.880 --> 14:17.900
So maybe you heard of that, that many people are quite excited about
14:17.900 --> 14:18.580
this technique.
14:19.220 --> 14:26.120
It's a technique where we train an artificial neural network to solve the
14:26.120 --> 14:29.720
task of assigning pixels to object classes.
14:30.720 --> 14:36.000
So the task is, we take such an image here, and we want to know for
14:36.000 --> 14:39.000
each pixel to which kind of object it belongs.
14:39.500 --> 14:42.780
So whether it belongs to the road surface, whether it belongs to the
14:42.780 --> 14:47.140
sidewalk, whether it belongs to vegetation, whether it belongs to a
14:47.140 --> 14:51.080
car, whether it belongs to a pedestrian, whether it belongs to a
14:51.080 --> 14:56.060
bicycle, whether it belongs to a building, whether it belongs to sky,
14:56.600 --> 15:00.600
whether it belongs to a traffic light or a traffic sign, something
15:00.600 --> 15:01.140
like that.
15:01.140 --> 15:03.880
So, we define several categories.
15:04.780 --> 15:09.680
So, in this example, a typical number is between 20 and 30
15:09.680 --> 15:14.280
different categories for the typical kinds of objects that exist in these
15:14.280 --> 15:14.680
images.
15:16.100 --> 15:20.580
And then the task is to assign each pixel to one of these categories.
15:21.520 --> 15:26.820
So, we want that, for instance, the whole road area here is labeled by
15:26.820 --> 15:28.560
this neural network as road area.
15:28.560 --> 15:29.860
That's one of the categories.
15:31.000 --> 15:37.280
And the sidewalk here, which is indicated with this bright violet
15:37.280 --> 15:42.960
color, all these pixels should be assigned to
15:42.960 --> 15:44.320
the category sidewalk.
15:45.000 --> 15:48.900
And green in this case would be vegetation, light blue would be sky,
15:49.020 --> 15:56.020
dark blue would be cars, dark red would be bicycles, and bright red
15:56.020 --> 16:00.040
would be cyclists or pedestrians, and so on.
16:00.100 --> 16:05.080
So, yellow would be traffic signs, like that.
16:05.900 --> 16:14.800
So, a neural network is an approach, a kind of magic machine that
16:14.800 --> 16:16.260
you have to train.
16:17.040 --> 16:24.300
It's like training a pet to do something: you train a neural network
16:24.300 --> 16:28.860
to do something by providing examples,
16:32.480 --> 16:36.720
by providing examples of the task and then providing the ground
16:36.720 --> 16:36.980
truth.
16:37.120 --> 16:41.400
So, in this case, you would provide a set of images, many images,
16:41.660 --> 16:48.540
maybe thousands, 10,000 or 100,000 images, typical images of these
16:48.540 --> 16:49.300
kinds of scenes.
16:49.820 --> 16:53.620
And for each of these images, you would also provide for each pixel to
16:53.620 --> 16:55.000
which category it belongs.
16:56.440 --> 17:01.140
And then the neural network would start a kind of process that is
17:01.140 --> 17:02.360
called a training process.
17:02.780 --> 17:06.700
And this training process then adapts this neural network to the task
17:06.700 --> 17:07.240
to solve.
17:08.040 --> 17:12.240
And afterwards, if everything went well, the neural network is able to
17:12.240 --> 17:17.460
classify all images that you provide to it, that is, to
17:17.460 --> 17:18.260
solve the task.
17:18.940 --> 17:20.400
And this is done here.
17:21.440 --> 17:25.680
For instance, in this case, the categories that were used are, as I
17:25.680 --> 17:28.940
said, something like road and sidewalk and vegetation.
17:30.220 --> 17:33.900
And yeah, the performance of these approaches is rather good.
17:34.400 --> 17:38.540
So the best approaches at the moment achieve something like 80%
17:38.540 --> 17:39.020
accuracy.
17:39.620 --> 17:43.380
So for a single image, you get 80% of all pixels labeled correctly.
17:44.260 --> 17:50.020
And if you additionally use tracking to track the pixels over time,
17:50.420 --> 17:53.520
the accuracy, I guess, would be even higher than that.
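As a small illustration of how such a per-pixel accuracy figure is computed (the function name and the toy label images are mine, not from the lecture):

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted category matches the ground truth."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float(np.mean(pred == gt))

# Toy 4x4 label images with categories 0 = road, 1 = sidewalk, 2 = car
gt   = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 2, 2],
                 [0, 0, 2, 2]])
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 2],
                 [0, 0, 2, 2],
                 [0, 1, 2, 2]])
print(pixel_accuracy(pred, gt))  # 14 of 16 pixels correct -> 0.875
```

An 80% score in the lecture's sense means this fraction, averaged over a test set, reaches about 0.8.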
17:53.940 --> 18:00.120
Let's have a look at how this looks like on a real data set, on a real
18:00.120 --> 18:00.520
video.
18:00.520 --> 18:04.300
So this is some work that one of our PhD students did.
18:05.760 --> 18:08.560
Yeah, let's just stop it for a moment.
18:08.920 --> 18:12.940
You see the different colors here indicate the different categories.
18:13.440 --> 18:15.680
And we see the road area in front of us.
18:16.740 --> 18:20.960
So this one here, that is actually labeled as road.
18:21.840 --> 18:25.360
And you see, we also see that the boundaries of this road are met
18:25.360 --> 18:26.100
rather well.
18:26.560 --> 18:30.260
So if we look here at the curbstone, where is it?
18:30.480 --> 18:30.700
Here.
18:31.300 --> 18:35.460
If you look here at the curbstone, you see, yeah, it's really, the
18:35.460 --> 18:36.700
labeling is really good.
18:37.020 --> 18:41.160
So the boundary of the road is met very well.
18:41.980 --> 18:42.140
Yeah.
18:42.440 --> 18:45.440
Then the other labels, then you can see the other colors, for
18:45.440 --> 18:47.080
instance, dark blue means cars.
18:47.500 --> 18:51.060
And if you look at the boundary between cars and roads, it's really
18:51.060 --> 18:57.200
surprisingly good how the neural network is able to do the separation
18:57.200 --> 18:58.620
between the two areas.
18:59.220 --> 19:02.080
Then this light green is something like grass.
19:03.060 --> 19:06.640
Then this kind of green means trees and vegetation.
19:07.560 --> 19:11.200
Then the yellow things are traffic signs and traffic lights.
19:12.860 --> 19:14.400
Red means persons.
19:15.200 --> 19:16.560
Here we see some persons.
19:16.700 --> 19:18.040
Here we see a person on a bike.
19:19.640 --> 19:23.020
I think this color means sidewalk.
19:23.640 --> 19:26.360
Although in this case, we might discuss whether it's really a
19:26.360 --> 19:29.260
sidewalk, but it looks like a sidewalk.
19:30.640 --> 19:32.540
And here, light blue is sky.
19:34.100 --> 19:34.660
Yeah.
19:34.880 --> 19:37.760
There's a category for building, this kind of brown.
19:38.820 --> 19:41.440
And we see that it's rather good.
19:41.660 --> 19:44.840
And especially the road boundaries can be found rather well.
19:45.280 --> 19:47.660
So this is impressive.
19:47.840 --> 19:51.880
And meanwhile, I would say that for the state of the art, we would have to say
19:51.880 --> 19:53.560
that we need such a neural network.
19:53.680 --> 19:58.080
The old approach, which is gradient-based, would not work as well as
19:58.080 --> 20:02.840
this learning-based approach, based on deep learning.
20:08.260 --> 20:08.740
Yeah.
20:08.740 --> 20:10.460
So a very useful technique.
20:10.560 --> 20:13.940
Of course, you can imagine that this is also very much relevant for
20:13.940 --> 20:15.080
detecting objects.
20:15.700 --> 20:18.180
So what was interesting here, just to show you.
20:22.170 --> 20:26.670
So here you see that this bicycle and the person on the bicycle
20:26.670 --> 20:28.790
are labeled in different ways.
20:29.270 --> 20:35.150
So the dark red means bicycle, and the brighter red means person.
20:36.170 --> 20:41.270
So the person on the bicycle and the bicycle itself are separated well by
20:41.270 --> 20:41.930
the neural network.
20:41.930 --> 20:43.970
That is really, really amazing.
20:45.670 --> 20:49.230
And you also see, if you look here at the bicycle, the boundaries are
20:49.230 --> 20:50.310
met quite well.
20:52.490 --> 20:56.610
So this, of course, is a very interesting technique for us if we want
20:56.610 --> 20:57.790
to recognize the road.
20:58.230 --> 21:02.330
So for state-of-the-art systems, I would claim they must be based on
21:02.330 --> 21:02.870
deep learning.
21:04.570 --> 21:13.010
But it's not only possible to label the road versus the rest, but
21:13.010 --> 21:16.510
it's also possible to label individual lanes.
21:17.130 --> 21:21.030
So this is work that was done by Annika Meyer, so you know her
21:21.030 --> 21:21.450
already.
21:22.030 --> 21:24.530
And she did some labeling.
21:24.690 --> 21:28.630
She labeled the images in such a way that she said, okay, this is the
21:28.630 --> 21:32.270
ego lane, the lane on which the ego car is on.
21:32.810 --> 21:36.490
And then I have left neighboring lane and a right neighboring lane,
21:36.890 --> 21:40.030
and maybe lanes for the oncoming traffic.
21:40.330 --> 21:43.930
So all these lanes were labeled in the example images, and a neural
21:43.930 --> 21:49.770
network was trained on them, and afterwards it was applied to new video
21:49.770 --> 21:50.290
sequences.
21:50.550 --> 21:52.810
And this is an example of the result.
21:52.810 --> 21:58.750
So green means ego lane, yellow means left neighboring lane, orange
21:58.750 --> 22:00.390
means right neighboring lane.
22:00.990 --> 22:05.210
And you can see that the boundaries of the lane are met rather well.
22:06.210 --> 22:10.410
So now, of course, during the lane change, it's unclear which is the
22:10.410 --> 22:10.890
ego lane.
22:11.030 --> 22:15.170
Therefore, the results were not so good for a moment.
22:15.370 --> 22:19.550
But now the lane change has happened, and now we are on the left lane,
22:19.590 --> 22:22.930
and we see that the right lane is correctly labeled as right
22:22.930 --> 22:23.790
neighboring lane.
22:26.110 --> 22:30.830
Here, the red pixels belong to lanes for oncoming traffic.
22:31.630 --> 22:33.510
They are labeled here with red.
22:35.550 --> 22:39.770
Lane change to the right lane, and we see it's rather good.
22:39.970 --> 22:44.350
So, of course, at some pixels there are sometimes some problems, but
22:44.350 --> 22:48.370
in general, the quality of the labeling that we achieve is rather good,
22:48.370 --> 22:52.990
so that we can also determine not just that this is road and this
22:52.990 --> 23:00.090
is not road, but that this is the ego lane and these are other lanes, so that
23:00.090 --> 23:05.550
we really get a partitioning of the road into
23:05.550 --> 23:06.290
lanes.
23:06.930 --> 23:14.530
Okay, so now we know how we can detect lane boundaries in images,
23:14.670 --> 23:17.670
either based on gradients or based on deep learning.
23:18.570 --> 23:23.650
And now we want to build a map, a geometric representation of the lane
23:23.650 --> 23:28.050
and estimate it just from the lane boundaries
23:28.050 --> 23:29.930
that we have detected in the image.
23:30.570 --> 23:34.490
So to do that, let's start with the simplest case of a straight road.
23:34.710 --> 23:37.210
So we assume the road is just straight.
23:37.570 --> 23:39.410
There are several lanes on the road.
23:39.570 --> 23:42.310
We are mainly interested in our own, in the ego lane.
23:43.110 --> 23:45.330
So how can we model that?
23:46.470 --> 23:54.910
So first, which variables do we need to estimate for such a lane to
23:54.910 --> 23:56.050
describe the situation?
23:56.630 --> 24:03.890
So the red car is our ego car, which is driving on the lane.
24:04.470 --> 24:08.110
So the first thing that might happen is that the ego car is not
24:08.110 --> 24:11.050
driving perfectly in the middle of the ego lane.
24:11.210 --> 24:15.130
So what we have to consider is that the lateral position of the
24:15.130 --> 24:18.570
ego vehicle might deviate from the center line of the lane.
24:19.510 --> 24:20.970
This is shown just by a shift.
24:21.330 --> 24:27.890
Of course, it might also be not perfectly aligned with the center
24:27.890 --> 24:30.070
line, but there might be some yaw angle.
24:30.510 --> 24:32.190
We must consider that as well.
24:32.930 --> 24:38.830
And of course, it might be located anywhere along the lane,
24:38.950 --> 24:44.430
not necessarily at a position which we might declare the origin of the
24:44.430 --> 24:45.730
road or something like that.
24:46.910 --> 24:52.930
So it's positioned somewhere on the road with a certain yaw angle, and
24:52.930 --> 24:57.170
we do not know in advance where it is located.
24:57.330 --> 25:00.330
So to say, this is the self-localization part of the task.
25:00.970 --> 25:03.930
Determine where the car is, with which orientation.
25:04.730 --> 25:08.570
The second thing: we can describe all of that by
25:08.570 --> 25:11.070
providing a coordinate transform.
25:11.670 --> 25:15.790
So we assume that we have a coordinate system that is fixed to the
25:15.790 --> 25:18.850
world in which we want to describe the world.
25:19.530 --> 25:21.270
That means, in this case, the road.
25:21.670 --> 25:24.930
This is shown here as a green coordinate system with a green
25:24.930 --> 25:25.790
coordinate axis.
25:26.250 --> 25:30.670
The origin, we assume, is somewhere on the center line of the ego
25:30.670 --> 25:32.290
lane at a certain position.
25:32.870 --> 25:37.670
The x-axis is pointing just parallel to the road and the y-axis
25:37.670 --> 25:39.190
orthogonal to the road.
25:39.870 --> 25:43.710
And we have a vehicle coordinate system that is fixed to the vehicle.
25:44.650 --> 25:46.050
This is the blue one.
25:46.690 --> 25:51.030
You can put the origin wherever you want at the vehicle; here,
25:51.030 --> 25:55.830
in this visualization, I just put it at the center of the rear axle,
25:56.110 --> 26:00.470
but you might also choose another position for this coordinate system.
26:01.290 --> 26:03.270
But it must be fixed to the car.
26:04.970 --> 26:07.190
This coordinate system also has an x-axis.
26:07.470 --> 26:11.450
This is typically the longitudinal axis of the car, so it points
26:11.450 --> 26:11.950
forward.
26:12.510 --> 26:17.450
And a y-axis that typically points to the left.
26:18.750 --> 26:22.330
So we assume that both coordinate systems use the same length unit.
26:22.510 --> 26:27.970
So both, say, calculate positions in meters, so that we don't have any
26:27.970 --> 26:28.810
scaling effects.
26:29.510 --> 26:33.850
And then we know that we can determine the position or the pose of the
26:33.850 --> 26:39.970
vehicle by a coordinate transform that explains how we must shift the
26:39.970 --> 26:43.630
coordinate system and rotate the coordinate system to transform the
26:43.630 --> 26:46.890
green into the blue or the blue into the green coordinate system.
26:47.690 --> 26:50.830
Furthermore, of course, if you want to describe the situation and we
26:50.830 --> 26:54.830
know there is a straight lane, straight road, what we need to know is
26:54.830 --> 26:58.190
the width of the road, or better in this case, the width of the lane,
26:58.350 --> 26:59.470
say capital B.
27:00.130 --> 27:04.810
And furthermore, of course, we need the parameters that describe the
27:04.810 --> 27:09.350
position of the vehicle: say d_lat for the lateral offset, d_long for the
27:09.350 --> 27:12.810
longitudinal offset, and the yaw angle psi.
27:13.850 --> 27:18.470
So the three, d_lat, d_long, and psi, determine the position, the
27:18.470 --> 27:22.370
pose of the vehicle, and B describes the environment,
27:22.610 --> 27:25.610
actually the width of the lane.
27:27.130 --> 27:31.170
Okay, so that means the parameters that we have to determine, when we
27:31.170 --> 27:35.610
want to solve this task, which actually is a SLAM task,
27:35.610 --> 27:40.370
are B, the lane width, together with d_long, d_lat,
27:40.670 --> 27:40.970
and psi.
27:42.670 --> 27:51.930
So to do that, first we need, of course, to think about how this
27:51.930 --> 27:56.290
coordinate transform between the vehicle coordinate system and the
27:56.290 --> 27:57.850
road coordinate system works.
27:58.750 --> 28:03.730
Well, these are two right-handed coordinate systems that use the same
28:03.730 --> 28:04.130
scale.
28:05.470 --> 28:11.630
And that means the transformation is done with a shift and rotation
28:11.630 --> 28:12.210
matrix.
28:12.990 --> 28:13.990
And this is shown here.
28:14.210 --> 28:18.410
If we are given a position in vehicle coordinates, then we can
28:18.410 --> 28:21.970
determine the road coordinates of this position.
28:22.250 --> 28:26.310
So you see the color encoding, blue always stands for vehicle
28:26.310 --> 28:32.110
coordinates, and green for road coordinates, by first rotating the
28:32.110 --> 28:35.870
vehicle coordinates by a rotation matrix, so a two-by-two rotation
28:35.870 --> 28:42.670
matrix that reflects the yaw angle of the vehicle, psi, and then adding
28:42.670 --> 28:45.550
the vehicle position d_long, d_lat to that.
28:46.010 --> 28:49.430
And of course, we can also calculate the inverse mapping,
28:49.850 --> 28:50.810
and it looks like that.
28:50.930 --> 28:54.390
So nothing special, just a coordinate transform.
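This back-and-forth transform can be sketched in a few lines (a minimal numeric sketch assuming numpy; the function names are illustrative, not from the lecture — d_long, d_lat, psi are the pose parameters from the slides):

```python
import numpy as np

def vehicle_to_road(p_vehicle, d_long, d_lat, psi):
    """Transform a point from vehicle (blue) to road (green) coordinates:
    first rotate by the yaw angle psi, then add the vehicle position."""
    R = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
    return R @ np.asarray(p_vehicle) + np.array([d_long, d_lat])

def road_to_vehicle(p_road, d_long, d_lat, psi):
    """Inverse mapping: subtract the position, then rotate back by -psi
    (the transpose of an orthonormal rotation matrix is its inverse)."""
    R = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
    return R.T @ (np.asarray(p_road) - np.array([d_long, d_lat]))
```

Applying one mapping after the other returns the original point, which is a quick sanity check that the two directions are consistent.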
28:55.430 --> 29:00.090
So based on that, we can derive how we can estimate the road geometry
29:00.090 --> 29:03.890
from detected road boundaries.
29:04.730 --> 29:11.630
So first of all, we want to model the boundary of the road, so the
29:11.630 --> 29:13.770
markings, where the markings can be found.
29:14.570 --> 29:18.590
First, let's start in the road coordinate system, to describe it in
29:18.590 --> 29:19.350
road coordinates.
29:20.310 --> 29:27.230
Well, all the positions on the left boundary of the ego lane, in
29:27.230 --> 29:30.990
the green coordinate system, can be represented easily.
29:31.530 --> 29:38.970
We first go to the left by half the width of the lane, so the origin
29:38.970 --> 29:42.690
of the road coordinate system lies on the center line of the lane, so we
29:42.690 --> 29:49.290
have to go by half the width of the lane to the left, and then we can go
29:49.290 --> 29:55.770
along the lane, the orientation of the road, as far as we like.
29:55.990 --> 30:01.010
So all the positions here on the left boundary of the lane can be
30:01.010 --> 30:08.270
represented by this representation, where we have the vector (0,
30:08.270 --> 30:13.830
B/2), plus tau times (1, 0), where tau is just a real number.
30:14.290 --> 30:18.030
This represents all the points on the left boundary.
30:19.230 --> 30:22.690
The points on the right boundary, for them, of course, we have to go
30:22.690 --> 30:29.390
to the right by an amount of B half, and then we can go any distance
30:29.390 --> 30:37.710
forth or back, as we like, and we will end up on the boundary of the
30:37.710 --> 30:37.950
lane.
30:38.270 --> 30:43.670
So this means minus the vector (0, B/2), plus tau times (1, 0).
30:44.550 --> 30:50.210
That means in this representation, we can describe all the points
30:50.210 --> 30:52.530
which are on the lane boundary of interest.
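In road coordinates, the two boundary parameterizations are just one line each (a minimal sketch assuming numpy; function names are mine, not from the lecture):

```python
import numpy as np

def left_boundary_point(B, tau):
    # (0, +B/2) + tau * (1, 0): half a lane width to the left of the
    # center line, then any distance tau along the road direction.
    return np.array([0.0, B / 2]) + tau * np.array([1.0, 0.0])

def right_boundary_point(B, tau):
    # (0, -B/2) + tau * (1, 0): same, but half a lane width to the right.
    return np.array([0.0, -B / 2]) + tau * np.array([1.0, 0.0])
```

Every boundary point has y-coordinate plus or minus B/2 in road coordinates, regardless of tau.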
30:53.770 --> 30:58.510
Now we can transform that into vehicle coordinates, then we get that
30:58.510 --> 31:02.190
in vehicle coordinates, those positions look like that.
31:02.310 --> 31:06.330
So this is just combining this transformation formula from the
31:06.330 --> 31:11.730
previous slide with the formula that we derived here for the road
31:11.730 --> 31:12.150
boundary.
31:13.150 --> 31:17.270
Now, so what can we do now?
31:17.470 --> 31:22.290
So now we have this equation here, which we have on top.
31:23.090 --> 31:29.090
It's actually two equations, one for x and one for y.
31:29.670 --> 31:32.170
And those equations depend on tau.
31:32.990 --> 31:37.250
So now let's combine these two equations in order to eliminate tau.
31:37.910 --> 31:42.070
So it's not really difficult, it's just a little bit of linear
31:42.070 --> 31:44.770
transformations of a linear equation.
31:45.330 --> 31:49.450
And if we do that, we end up with this formula that is shown here.
31:49.530 --> 31:56.070
So we can say that for all points on the lane boundaries it holds that the y
31:56.070 --> 32:01.930
position in the vehicle coordinate system of that point is equal
32:01.930 --> 32:07.190
to minus tangent psi times x in vehicle coordinates, plus plus-or-minus B half minus d_lat, times one
32:07.190 --> 32:07.730
over cosine psi,
32:07.730 --> 32:08.990
and you can read it for yourself.
32:09.810 --> 32:14.650
So what we get is actually a linear dependency between the x position
32:14.650 --> 32:19.830
of the point on the road boundary and the y position.
32:20.730 --> 32:32.310
And the factors that we have to consider to establish this relationship
32:32.310 --> 32:39.310
depend on psi, the yaw angle of the vehicle, on the lateral offset,
32:39.310 --> 32:40.610
and on the road width.
32:43.010 --> 32:47.310
Okay, so now we have a linear relationship between x and y.
32:48.110 --> 32:55.090
And if we assume, now we assume that we received a camera image, an
32:55.090 --> 33:00.250
image from the camera, that we extracted all the boundary pixels from
33:00.250 --> 33:05.010
the image, that we transform those pixel positions into vehicle
33:05.010 --> 33:05.510
positions.
33:07.290 --> 33:10.950
And they are available as vehicle positions now.
33:11.470 --> 33:15.910
And our goal is to determine psi, B, and d_lat.
33:18.570 --> 33:20.410
Based on those observations.
33:21.550 --> 33:26.690
So to simplify the task, what we do is we simplify this equation a
33:26.690 --> 33:27.150
little bit.
33:27.750 --> 33:32.290
So here come in the tangent of psi and the cosine of psi.
33:33.510 --> 33:35.810
Psi is the yaw angle of the vehicle.
33:35.950 --> 33:40.230
In normal operation mode, when we are just driving on a rural road or
33:40.230 --> 33:43.970
on a highway, we can assume that the yaw angle of the vehicle is
33:43.970 --> 33:44.710
rather small.
33:45.850 --> 33:50.150
We will be more or less driving parallel to the line, to the
33:50.150 --> 33:51.730
orientation of the road.
33:52.350 --> 33:58.050
Now, we will not have a yaw angle of 45 degrees or 90 degrees.
33:58.490 --> 34:00.670
That's not in normal operation.
34:00.670 --> 34:06.770
Only if something strange happens, or when we enter such a road, but
34:06.770 --> 34:11.050
definitely not when we are just driving along the road; there we will have
34:11.050 --> 34:13.470
small yaw angles.
34:13.750 --> 34:17.610
Even if we overtake another vehicle, the yaw angle that the vehicle has
34:17.610 --> 34:18.690
is typically small.
34:19.390 --> 34:24.690
And of course for small yaw angles psi, we can approximate the
34:24.690 --> 34:28.470
tangent of psi by psi and the cosine of psi by one.
34:29.250 --> 34:32.250
When psi, of course, is represented in radians.
34:32.890 --> 34:35.870
That's a typical approximation that is very often used.
34:36.370 --> 34:37.450
We use it here as well.
34:37.610 --> 34:41.850
So with this approximation, we can simplify this task here, and we get
34:41.850 --> 34:42.690
this relationship.
34:43.190 --> 34:47.690
And the interesting thing is that those unknown parameters psi, B, and
34:47.690 --> 34:50.770
d_lat now also occur in a linear way here.
34:52.390 --> 34:57.710
The trigonometric nonlinear functions disappeared, and psi, B, and
34:57.710 --> 35:02.450
d_lat occur here as linear contributions to the right-hand
35:02.450 --> 35:03.410
side of this equation.
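As a quick check of how little the small-angle approximation costs, we can compare the exact boundary equation with its linearized form (a minimal sketch assuming numpy; function names and the sample values are illustrative):

```python
import numpy as np

def y_boundary_exact(x, psi, B, d_lat, side):
    """Exact relation: y = -tan(psi) * x + (side * B/2 - d_lat) / cos(psi).
    side = +1 for the left boundary, -1 for the right boundary."""
    return -np.tan(psi) * x + (side * B / 2 - d_lat) / np.cos(psi)

def y_boundary_linear(x, psi, B, d_lat, side):
    """Small-angle approximation: tan(psi) ~ psi, cos(psi) ~ 1,
    giving the linear form y = -psi * x + side * B/2 - d_lat."""
    return -psi * x + side * B / 2 - d_lat
```

For a typical highway yaw angle of around one degree, the two agree to well below a centimeter even twenty meters ahead of the vehicle.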
35:05.030 --> 35:10.510
Now, from the camera image we observed, say, N_L
35:10.510 --> 35:17.270
points on the left boundary of
35:17.270 --> 35:23.050
the ego lane, and N_R points on the right boundary of the ego lane,
35:23.210 --> 35:31.530
say, with these coordinates (x_i^L, y_i^L) or (x_j^R, y_j^R).
35:34.710 --> 35:38.370
We assume that those are given in vehicle coordinates, yeah, after
35:38.370 --> 35:41.410
transforming image coordinates into vehicle coordinates.
35:42.350 --> 35:49.630
So now what we can say is, if we assume that we know the true yaw
35:49.630 --> 35:54.130
angle psi, the true lane width B, and the true lateral offset
35:54.130 --> 36:00.950
d_lat, from the equation on the last slide, we get that when we put
36:00.950 --> 36:06.770
such a left point from the left lane marking onto the right-hand side
36:06.770 --> 36:13.070
of this equation, this should typically be equal to zero.
36:13.550 --> 36:21.330
If we go back, we see y should be equal to minus psi times x, plus-or-minus B
36:21.330 --> 36:22.550
half, minus d_lat.
36:22.910 --> 36:27.130
So if we subtract y on both sides of the equation, then it should be
36:27.130 --> 36:33.110
zero equal to minus psi times x, plus-or-minus B half, minus d_lat, minus y.
36:33.950 --> 36:37.130
And this is actually what we find here.
36:37.690 --> 36:40.250
So this is the right-hand side.
36:40.750 --> 36:45.670
So this should be, in an optimal case, be equal to zero.
36:46.450 --> 36:50.150
And for all the points on the right boundary of the lane marking,
36:50.330 --> 36:55.330
those ones, this term here, the second term here, should be equal to
36:55.330 --> 36:55.610
zero.
36:55.790 --> 36:58.890
The only difference, as you can see, is whether we have plus here or
36:58.890 --> 36:59.590
minus here.
36:59.910 --> 37:03.430
Now whether we consider the left boundary or the right boundary.
37:03.990 --> 37:08.090
So in an ideal case, if all the measurements are perfect and all
37:08.090 --> 37:12.530
assumptions are met perfectly, all these terms here on the
37:12.530 --> 37:14.470
right-hand side should be equal to zero.
37:14.670 --> 37:17.950
Of course, in practice, they aren't, even if we would know the true
37:17.950 --> 37:21.890
position and orientation of the vehicle and the true lane width.
37:21.990 --> 37:27.210
Because there are measurement errors, the road might not be perfectly
37:27.210 --> 37:29.670
straight, etc., etc.
37:30.450 --> 37:37.290
But what we still can at least assume is that the right-hand sides of
37:37.290 --> 37:41.350
these two terms are small, close to zero.
37:43.350 --> 37:49.370
And now, what we can do when we don't know psi, B, and d_lat is to
37:49.370 --> 37:51.470
state an optimization problem.
37:51.950 --> 37:58.430
We want to determine psi, B, and d_lat by minimizing the squared errors
37:58.430 --> 38:02.770
that remain in these equations, in these terms.
38:03.470 --> 38:07.830
If we denote these remaining errors with these epsilon terms,
38:07.970 --> 38:13.890
epsilon_i^L or epsilon_j^R, then what we want is we go over the pixels
38:13.890 --> 38:20.410
on the left boundary of the ego lane, calculate the squared errors
38:20.410 --> 38:26.250
that are remaining, add up all of those and do the same with the
38:26.250 --> 38:29.730
pixels on the right boundary that we have detected.
38:30.070 --> 38:33.950
Then we say a good estimate or maybe the best estimate for the road
38:33.950 --> 38:38.550
geometry and our ego position on this road geometry is when we
38:38.550 --> 38:42.770
minimize this term and use the parameters that minimize this term.
38:42.890 --> 38:45.570
So this is actually, again, a least squares approach.
38:45.910 --> 38:53.610
Now we minimize the sum of the squared errors of our
38:53.610 --> 38:54.670
road geometry model.
38:56.070 --> 39:01.450
So if you do that, if you substitute these epsilon terms here by the
39:01.450 --> 39:06.610
right hand side of the equations above and then do just some simple
39:06.610 --> 39:12.450
transformations and rewrite everything in matrix form, we will end up
39:12.450 --> 39:16.330
with a system of linear equations as shown here.
39:16.710 --> 39:20.630
So a certain matrix, three by three matrix, times the three unknown
39:20.630 --> 39:25.490
parameters should be equal to a column vector.
39:26.210 --> 39:33.530
And as soon as this matrix here has full rank, as long as it
39:33.530 --> 39:38.810
has rank three, we can solve this system of equations in a unique way,
39:38.990 --> 39:44.850
get a unique solution, and know the best parameters for B, d_lat,
39:44.990 --> 39:45.330
and psi.
39:46.250 --> 39:46.470
Okay?
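The whole least-squares estimation can be sketched compactly (a minimal sketch assuming numpy; the function name and the state ordering (B, d_lat, psi) are mine — the lecture only fixes the model y = -psi*x + side*B/2 - d_lat):

```python
import numpy as np

def estimate_road_geometry(left_pts, right_pts):
    """Least-squares fit of (B, d_lat, psi) from boundary points given in
    vehicle coordinates, using the linearized boundary model
    y = -psi * x + side * B/2 - d_lat  (side = +1 left, -1 right)."""
    rows, ys = [], []
    for side, pts in ((+1.0, left_pts), (-1.0, right_pts)):
        for x, y in pts:
            rows.append([side / 2.0, -1.0, -x])  # coefficients of (B, d_lat, psi)
            ys.append(y)
    # A unique solution requires the stacked matrix to have rank three,
    # which needs points on both boundaries at varying x.
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(ys), rcond=None)
    B, d_lat, psi = theta
    return B, d_lat, psi
```

Feeding in synthetic boundary points generated from known parameters recovers those parameters exactly, since the model is then noise-free.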
39:50.110 --> 39:59.010
So that means the very simple way to estimate the road geometry from a
39:59.010 --> 40:01.670
camera image is the first one.
40:01.930 --> 40:07.090
Well, the first thing is we detect the road boundaries in the image,
40:07.270 --> 40:12.450
either gradient-based or with deep-learning-based semantic
40:12.450 --> 40:13.410
segmentation.
40:13.810 --> 40:18.910
The second thing is we transform those pixel positions into a top
40:18.910 --> 40:19.230
view.
40:19.450 --> 40:24.490
That means we project them onto the ground plane around us, and this
40:24.490 --> 40:27.270
yields such a representation.
40:27.770 --> 40:33.830
So here is a vehicle coordinate system, and the red lines are those
40:33.830 --> 40:40.550
lines or the position where we found boundary pixels in the image, but
40:40.550 --> 40:44.010
now projected onto the road into bird's eye view.
40:45.230 --> 40:49.230
The next thing is that we estimate the position parameters and the
40:49.230 --> 40:49.710
lane width.
40:49.710 --> 40:55.390
That means we fit, so to say, this piece of a straight road to the
40:55.390 --> 40:57.050
observations that we had.
40:57.490 --> 41:02.070
So that's the third step, this least squares estimation step.
41:02.850 --> 41:06.850
And then at the end, the situation looks like that, and we see that
41:06.850 --> 41:13.010
those positions in red where we have seen lane boundaries should, in
41:13.010 --> 41:20.790
an ideal case, fit perfectly to the lane markings that are shown in
41:20.790 --> 41:21.190
the model.
41:22.390 --> 41:27.890
Okay, so that's the first and, well, simplest approach to estimate
41:27.890 --> 41:29.570
those lane boundaries.
41:29.890 --> 41:34.710
Of course, what we see is we started that with the fact that our model
41:34.710 --> 41:39.630
has four parameters, and we see that we can only estimate three of
41:39.630 --> 41:39.990
those.
41:41.070 --> 41:46.050
We can only estimate the yaw angle, the lateral position, and the
41:46.050 --> 41:50.010
width of the lane, but we can't estimate the longitudinal position.
41:50.210 --> 41:53.390
It does not occur in the equations that we had.
41:53.990 --> 42:00.670
And of course, the reason is we could shift this piece of a road forth
42:00.670 --> 42:07.590
and back, and still it would fit as well to the observations as it
42:07.590 --> 42:09.170
fits in this configuration.
42:09.870 --> 42:12.030
That means we cannot determine d long.
42:12.390 --> 42:17.430
Some people, therefore, prefer to just say d long is equal to zero, or
42:17.430 --> 42:21.050
maybe we need some other sources of information from which we can
42:21.050 --> 42:21.910
estimate d long.
42:23.570 --> 42:31.030
So maybe we have something like markings on the road which are lines
42:31.030 --> 42:31.650
like that.
42:31.910 --> 42:35.850
So stop lines, for instance, those could be helpful, or other
42:35.850 --> 42:36.270
structures.
42:36.430 --> 42:39.970
We have seen in the example about self-localization with the particle
42:39.970 --> 42:48.310
filter that the poles next to the road were used as features to
42:48.310 --> 42:51.230
estimate the longitudinal position of the vehicle.
42:51.650 --> 42:53.950
Things like that could be used for that purpose.
42:55.210 --> 42:59.490
But just from the lane boundaries, it's not possible to determine the
42:59.490 --> 43:00.550
longitudinal position.
43:01.510 --> 43:04.790
Sometimes, of course, what you could do is, if you have dashed lines
43:04.790 --> 43:08.670
like here in the middle, you could use the position of the dashed
43:08.670 --> 43:11.950
lines or the endpoints of the dashed lines, for instance, also as
43:11.950 --> 43:15.670
features that enable you to estimate the longitudinal position, at
43:15.670 --> 43:20.130
least to restrict it to some possible positions.
43:23.300 --> 43:26.960
Okay, so now that was the first thing.
43:27.220 --> 43:30.760
So a one-shot localization, so to say, and a one-shot mapping.
43:32.000 --> 43:35.620
Like in the chapter number six, when we were talking about self
43:35.620 --> 43:38.040
-localization, this is the first step.
43:38.300 --> 43:43.000
But often we do not want to estimate our position just from a single
43:43.000 --> 43:45.500
image, but we want to track the position over time.
43:46.400 --> 43:51.420
So let's extend our modeling in such a way that we can track the
43:51.420 --> 43:57.180
unknown parameters over time, so that we track our position and also
43:57.180 --> 44:01.880
track the width of the road over time.
44:02.540 --> 44:06.920
So therefore, we have to go back to the full state vector with four
44:06.920 --> 44:10.940
variables, including also the longitudinal position of the vehicle.
44:11.820 --> 44:13.080
Now, what happens?
44:13.220 --> 44:17.780
So we assume this is the initial situation at point in time t.
44:18.000 --> 44:22.360
So the upper index t here is not an exponent, but just an upper
44:22.360 --> 44:24.260
index to indicate the time.
44:25.240 --> 44:30.080
So at point in time t, the vehicle is located like that, let's assume,
44:30.260 --> 44:33.420
as given here by the blue coordinate, vehicle coordinate system.
44:34.080 --> 44:36.820
Then it somehow moves.
44:37.480 --> 44:42.800
We assume that maybe we know the translation that it makes, that is
44:42.800 --> 44:44.420
shown here by the vector m.
44:46.020 --> 44:49.880
For instance, from on-board sensors like wheel encoders, or we can use
44:49.880 --> 44:53.560
visual odometry or things like that to determine these positions.
44:53.680 --> 44:57.880
But there is actually some shift going on, the vehicle is moving
44:57.880 --> 44:58.780
somehow forward.
44:59.740 --> 45:04.760
Furthermore, also the yaw angle might change a little bit, due to
45:04.760 --> 45:06.360
steering.
45:06.480 --> 45:11.960
We might use sensors that sense the steering angle or something like
45:11.960 --> 45:14.800
that to determine the change in yaw angle.
45:14.920 --> 45:16.240
Let's denote it phi.
45:16.600 --> 45:20.960
So that afterwards the new vehicle coordinate system at point in time
45:20.960 --> 45:25.400
t plus one is given by the violet coordinate system.
45:25.480 --> 45:30.480
And we see it is shifted by this vector m and it's rotated by this
45:30.480 --> 45:31.500
angle phi.
45:32.120 --> 45:38.120
That means what we can conclude is that now the new position d long d
45:38.120 --> 45:43.840
lat at point in time t plus one is equal to the old position plus,
45:44.260 --> 45:50.260
well, a rotation matrix, namely the rotation matrix for the current yaw
45:50.260 --> 45:53.140
angle psi, times the shift vector m.
45:55.200 --> 46:01.140
And the yaw angle has changed from psi t to psi t plus phi.
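This motion update is short enough to write out directly (a minimal sketch assuming numpy; the function name is mine — the rotation of m by the current yaw angle is the point discussed next):

```python
import numpy as np

def predict_pose(d_long, d_lat, psi, m, phi):
    """Motion update: the odometry shift m is measured in the (blue)
    vehicle frame, so it must be rotated into road coordinates by the
    current yaw angle psi before it is added; the yaw increases by phi."""
    R = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
    d_new = np.array([d_long, d_lat]) + R @ np.asarray(m)
    return d_new[0], d_new[1], psi + phi
```

With zero yaw the rotation is the identity and the shift simply adds to the longitudinal position, as one would expect.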
46:01.960 --> 46:05.660
Now, why do we need this rotation matrix here?
46:06.000 --> 46:06.920
You might ask.
46:07.240 --> 46:11.640
Well, we need it if we assume that this vector m is typically sensed
46:11.640 --> 46:16.020
by on-board sensors, so it provides the shift of the vehicle
46:16.740 --> 46:19.900
represented in the blue vehicle coordinate system.
46:20.500 --> 46:24.560
Now that is the typical way you represent such a movement.
46:25.240 --> 46:29.420
Now you represent in the blue vehicle coordinate system the shift, but
46:29.420 --> 46:33.560
of course what we want to determine is the new position of the vehicle
46:33.560 --> 46:37.940
in world coordinates, in road coordinates, so in coordinates which are
46:37.940 --> 46:40.200
fixed to the world and not fixed to the vehicle.
46:40.960 --> 46:47.620
And therefore, we have to consider first transform this vector m into
46:47.620 --> 46:52.400
road coordinates, and this is done by multiplying it here with this
46:52.400 --> 46:53.260
rotation matrix.
46:54.060 --> 46:58.260
So if you like, we can simplify the rotation matrix again if we assume
46:58.260 --> 46:59.980
that the yaw angle is small.
47:00.680 --> 47:06.680
We approximate cosine of psi by one, and we approximate sine of psi by
47:06.680 --> 47:12.460
psi itself, which again simplifies things and linearizes things which
47:12.460 --> 47:16.160
will turn out to be useful for later steps in tracking.
47:18.080 --> 47:22.080
So of course, the last thing that we have to consider is how the
47:22.080 --> 47:23.640
lane width changes over time.
47:24.240 --> 47:28.840
So we also have to specify how b t plus one for the next point in
47:28.840 --> 47:31.780
time depends on the present lane width.
47:31.900 --> 47:35.240
And here for simplicity, let's just assume that it stays the same,
47:35.660 --> 47:36.280
more or less.
47:37.020 --> 47:40.280
I think for most roads this assumption is met.
47:40.860 --> 47:43.420
So we assume b t plus one is equal to b t.
47:44.580 --> 47:49.160
So that means we can represent the transformation of the state
47:49.160 --> 47:54.880
variables from point in time t to point in time t plus one in this way
47:54.880 --> 47:56.780
by this linear relationship.
47:57.660 --> 48:02.280
So this is actually just showing the same thing that we derived on the
48:02.280 --> 48:05.660
last slide, but now as a matrix vector multiplication.
48:06.720 --> 48:12.100
So if we have a look at that, b t plus one, that's the first entry here,
48:12.100 --> 48:15.720
as we can see when we multiply the first row here with the state
48:15.720 --> 48:17.980
vector s t: it is equal to b t.
48:18.340 --> 48:21.620
So what comes out here is b t plus one equal to b t.
48:21.620 --> 48:26.420
Yeah, and for the yaw angle here, for the fourth component, we
48:26.420 --> 48:32.840
get that the new yaw angle psi t plus one is equal to one times psi
48:32.840 --> 48:36.860
t, the present yaw angle, plus phi.
48:37.080 --> 48:41.800
That was actually this change of the orientation.
48:42.960 --> 48:48.240
And if we look at the position d_long, d_lat, we see it's the
48:48.240 --> 48:53.860
old position, which is represented here by these two entries of one
48:53.860 --> 49:04.540
here on the diagonal, plus minus m_y times the yaw angle psi plus
49:04.540 --> 49:05.140
m_x.
49:05.880 --> 49:10.360
And if you compare this to the last
49:10.360 --> 49:14.060
slide, you will see that this was actually the formula that we derived
49:14.060 --> 49:21.420
to represent the change of d_long.
49:22.120 --> 49:27.740
Yeah, and the next row, respectively, represents the change of d_lat.
49:29.360 --> 49:32.020
So this is nothing else than what we just derived.
49:32.120 --> 49:34.120
And we see it's a nice linear relationship.
49:34.480 --> 49:37.820
So it has a perfect shape for a Kalman filter.
49:38.940 --> 49:40.740
Yeah, very nice.
49:41.660 --> 49:44.200
So that means we can represent it like that.
49:44.360 --> 49:50.240
So s t plus one is equal to A t times s t plus u t.
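The transition matrix just walked through can be written down explicitly (a minimal sketch assuming numpy; the state ordering s = (B, d_long, d_lat, psi) and the function name are mine, chosen to match the slide's rows):

```python
import numpy as np

def transition(m_x, m_y, phi):
    """Build A and u so that s_{t+1} = A s_t + u, with the state
    s = (B, d_long, d_lat, psi).  The small-angle linearization puts
    the rotated odometry shift partly into A and partly into u."""
    A = np.array([[1.0, 0.0, 0.0, 0.0],   # B stays the same
                  [0.0, 1.0, 0.0, -m_y],  # d_long += m_x - m_y * psi
                  [0.0, 0.0, 1.0,  m_x],  # d_lat  += m_y + m_x * psi
                  [0.0, 0.0, 0.0, 1.0]])  # psi    += phi
    u = np.array([0.0, m_x, m_y, phi])
    return A, u
```

Because m_x, m_y, and phi change with each odometry reading, A and u carry the time index t, which is exactly why the Kalman filter model allows a time-dependent transition matrix and offset.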
49:50.560 --> 49:54.280
And now you see why we need, for instance, such an offset in the
49:54.280 --> 49:56.100
Kalman filter model.
49:56.520 --> 50:04.940
And why this transition matrix A might
50:04.940 --> 50:07.000
depend on the point in time t.
50:07.380 --> 50:11.780
Well, because this m_x, m_y depends on the time.
50:12.500 --> 50:14.800
It's not the same for all points in time.
50:16.220 --> 50:20.680
Okay, but this is the shape that we said is required for a Kalman
50:20.680 --> 50:21.000
filter.
50:21.360 --> 50:24.720
Now let's have a look at the observations and measurements, whether
50:24.720 --> 50:29.280
they can also somehow be modeled in such a way that they are suitable for a
50:29.280 --> 50:29.880
Kalman filter.
50:31.400 --> 50:33.180
So the observations.
50:33.560 --> 50:38.080
So again, we observe nl points on the left marking and nr points on
50:38.080 --> 50:41.600
the right marking in vehicle coordinates, already transformed into
50:41.600 --> 50:42.460
vehicle coordinates.
50:43.380 --> 50:54.320
So if we take one of those points, and we draw a line into this bird's
50:54.320 --> 51:02.100
eye view plot, the line that contains all points with the respective x
51:02.100 --> 51:05.900
-coordinate, then this line would look like that.
51:06.080 --> 51:10.000
So it would be parallel to the y-axis of the vehicle coordinate
51:10.000 --> 51:10.460
system.
51:10.600 --> 51:14.760
So the coordinate system in the plot is the vehicle coordinate system.
51:15.200 --> 51:21.940
And of course, this x, this line here, with constant value of x, would
51:21.940 --> 51:26.240
just be parallel to the y-axis at a certain distance, which is
51:26.240 --> 51:27.120
actually x.
51:27.120 --> 51:32.500
And of course, we would assume that those points here that we observe
51:32.500 --> 51:39.360
on the boundaries, they have this point here for the left boundary and
51:39.360 --> 51:40.960
that point for the right boundary.
51:42.500 --> 51:47.040
Okay, so how can we determine them?
51:56.560 --> 52:00.660
We know the x-coordinate in the vehicle coordinate system.
52:03.360 --> 52:10.540
That means we can represent all these points here on the line, on this
52:10.540 --> 52:21.360
dotted line, in vehicle coordinates, as the point (x, 0),
52:21.760 --> 52:27.560
that's actually this point, plus tau times (0, 1).
52:28.040 --> 52:33.680
So (0, 1), the vector, would be this vector parallel to the y-axis, and we
52:33.680 --> 52:38.640
just go a certain amount tau along this dashed line.
52:38.640 --> 52:41.520
So tau is a real number.
52:42.040 --> 52:46.840
Then we take this line representation and transform it into the road
52:46.840 --> 52:52.480
coordinate system, assuming a certain yaw angle and a certain position
52:52.480 --> 52:53.240
of the vehicle.
52:54.820 --> 53:01.500
And afterwards, we eliminate the tau value for those points which are
53:01.500 --> 53:02.380
on the boundaries.
53:02.380 --> 53:08.380
We get a line representation, and we, so to say, intersect this line
53:08.380 --> 53:16.140
with the left boundary and the right boundary of the ego lane in the
53:16.140 --> 53:18.440
road coordinate system.
53:22.820 --> 53:23.300
Sorry, no.
53:25.520 --> 53:27.600
Okay, so somehow like that.
53:28.060 --> 53:32.900
So actually, so what we do is, we know the position of the lane
53:32.900 --> 53:35.640
boundaries in the road coordinate system.
53:35.740 --> 53:40.740
If we assume that we know the state parameters, the parameters of lane
53:40.740 --> 53:44.620
widths and everything, then we can transform it into the vehicle
53:44.620 --> 53:45.640
coordinate system.
53:46.060 --> 53:53.540
And then we can intersect that with this line x equals a certain
53:53.540 --> 54:00.240
value, and then we get the position of the y-coordinates, yl and yr.
54:01.020 --> 54:02.720
And that is actually what we see here.
54:02.720 --> 54:06.040
So I was wrong, so the transformation was just the other way around,
54:06.080 --> 54:06.980
as I explained it.
54:07.340 --> 54:10.400
So and that is, yields these equations.
54:11.040 --> 54:14.800
Yeah, with a little bit of calculation, not too much, we get this
54:14.800 --> 54:15.600
equation here.
54:16.660 --> 54:22.060
So, and when we have this equation, we can, we see, actually, it's a
54:22.060 --> 54:24.760
linear equation in the state variables.
54:25.380 --> 54:29.140
So psi, B, and d_lat occur in a linear way here.
54:29.140 --> 54:34.680
So again, these are linear observation equations, and therefore, we
54:34.680 --> 54:36.380
can represent it like that.
54:36.820 --> 54:43.460
So, we do this for all the observed positions on the lane markings.
54:44.180 --> 54:48.200
Each one is represented with a certain x and a certain y value, and
54:48.200 --> 54:53.620
each one creates one row in this observation equation that we see
54:53.620 --> 54:53.960
here.
54:54.420 --> 54:55.600
So, why?
54:56.260 --> 55:03.640
So for the left boundary points, we know those should be
55:03.640 --> 55:10.800
equal to, well, one half times B, plus zero times d_long, that doesn't
55:10.800 --> 55:17.400
matter, minus d_lat, minus psi times x_i^L, the x-coordinate.
55:18.260 --> 55:22.140
So we treat the x-coordinate that we have sensed here as constants,
55:22.940 --> 55:28.360
which are not affected by noise, which we treat as being known, and
55:28.360 --> 55:34.200
the y-coordinate as a noisy term, a noisy measurement, that we model
55:34.200 --> 55:35.720
here in the measurement equation.
55:37.000 --> 55:43.480
So this is done for all points on the left boundary, yielding the first
55:43.480 --> 55:50.340
N_L rows here of the matrix, and then we continue with the
55:50.340 --> 55:57.400
points on the right boundary, yielding a further N_R rows here.
55:57.480 --> 56:01.060
And the only difference that you can see is that here in the first
56:01.060 --> 56:06.560
column, we have for the left boundary points the plus one-half factor, while
56:06.560 --> 56:10.900
we have for the right boundary points the minus one-half factor.
56:11.840 --> 56:13.200
So that's actually
56:18.420 --> 56:21.980
the observation equation that we get.
56:22.360 --> 56:25.640
And we see it's a linear observation equation and it actually
56:26.300 --> 56:30.840
perfectly meets the requirements for a Kalman filter.
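Stacking those rows gives the observation matrix directly (a minimal sketch assuming numpy; the function name and the state ordering s = (B, d_long, d_lat, psi) are mine, matching the sketch of the transition model):

```python
import numpy as np

def observation_matrix(xs_left, xs_right):
    """Build H so that the predicted measurements are H @ s with
    s = (B, d_long, d_lat, psi): one row per detected boundary point,
    y_i = (+-1/2) * B + 0 * d_long - d_lat - psi * x_i.
    The sensed x-coordinates enter as known constants; only the
    y-coordinates are treated as noisy measurements."""
    rows = []
    for side, xs in ((+0.5, xs_left), (-0.5, xs_right)):
        for x in xs:
            rows.append([side, 0.0, -1.0, -x])
    return np.array(rows)
```

Note the zero in the d_long column of every row: the longitudinal position never appears in the measurements, which is the unobservability discussed earlier.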
56:31.840 --> 56:35.940
So now we know that we have a linear state transition model and a
56:35.940 --> 56:40.120
linear observation model, and of course we can assume that both are
56:40.120 --> 56:41.600
affected by some noise.
56:41.620 --> 56:45.400
If we assume it's Gaussian noise, then we have the perfect conditions
56:45.400 --> 56:50.200
to use a Kalman filter to solve this estimation task for us.
56:52.460 --> 56:57.400
So linear system model, linear observation model, use a Kalman filter
56:57.400 --> 56:59.060
to estimate the state variables.
57:01.980 --> 57:02.260
Okay?
57:02.420 --> 57:04.140
And the rest is just implementation.
57:04.720 --> 57:07.620
The rest is just implementing the Kalman filter for this system.
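One predict/innovate cycle of that implementation could look like this (a minimal generic sketch assuming numpy; Q and R as Gaussian process and measurement noise covariances are the standard Kalman assumptions, not something fixed in the lecture):

```python
import numpy as np

def kalman_step(s, P, A, u, Q, H, z, R):
    """One predict/innovate cycle for the lane tracker."""
    # Prediction: propagate state and covariance with the motion model
    s_pred = A @ s + u
    P_pred = A @ P @ A.T + Q
    # Innovation: correct with the observed boundary-point y-coordinates z
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    s_new = s_pred + K @ (z - H @ s_pred)
    P_new = (np.eye(len(s)) - K @ H) @ P_pred
    return s_new, P_new
```

Each camera frame contributes one such cycle: odometry drives the prediction, the detected lane-marking points drive the innovation.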
57:09.620 --> 57:09.940
Okay?
57:14.800 --> 57:15.340
Yeah?
57:16.100 --> 57:16.380
No?
57:21.380 --> 57:24.360
Okay, then let's assume that's clear.
57:26.020 --> 57:27.380
So next step.
57:28.280 --> 57:32.120
Okay, and of course if we have a Kalman filter, how does it
57:32.120 --> 57:33.320
work?
57:33.440 --> 57:35.240
Well, we have a prediction step.
57:35.500 --> 57:39.340
In the prediction step, we use the onboard sensors to predict in which
57:39.340 --> 57:40.620
way the vehicle has moved.
57:41.040 --> 57:44.040
So in which way the blue coordinate system changed to the violet
57:44.040 --> 57:45.000
coordinate system.
57:45.540 --> 57:47.900
Then we do the image pre-processing.
57:48.020 --> 57:54.400
We detect the lane markings and transform those into bird's eye view.
57:54.780 --> 57:59.160
So we do a coordinate transform and transform those lane markings into
57:59.160 --> 58:00.280
the top view here.
58:01.180 --> 58:05.060
And afterwards, we do the innovation step and then the innovation step
58:05.060 --> 58:08.520
is moving the coordinate system to a position at which the
58:08.520 --> 58:12.680
observations fit better to the model of the road than before.
58:12.920 --> 58:14.660
That's actually how this works.
58:14.820 --> 58:17.940
And then we go through that again and again.
58:19.720 --> 58:22.780
So unfortunately, the world is not just straight.
58:22.920 --> 58:25.380
So we have sometimes roads which are curved.
58:25.560 --> 58:30.200
So let's extend this reasoning to roads which are not straight.
58:30.200 --> 58:36.460
The first extension is to consider roads which just follow a circle, a
58:36.460 --> 58:37.280
circular arc.
58:37.780 --> 58:40.060
So then we would be faced with this situation.
58:40.860 --> 58:45.680
And yeah, we have a world, a road coordinate system here.
58:46.060 --> 58:50.040
Again, the origin is on the center line of the ego lane, which is now
58:50.040 --> 58:50.560
curved.
58:51.480 --> 58:54.460
We have the vehicle coordinate system here.
58:56.000 --> 59:02.500
And yeah, we want to describe the situation now as well.
59:02.780 --> 59:06.060
Of course, what we still need is a lane width that's clear.
59:07.240 --> 59:09.580
That's the same as for the straight case.
59:10.120 --> 59:15.820
Then, of course, we need to describe somehow the curvature of the
59:15.820 --> 59:16.160
road.
59:17.100 --> 59:24.680
So in this case, we could use either the radius of this curve or we
59:24.680 --> 59:26.280
could use the curvature.
59:26.580 --> 59:29.900
And the curvature kappa is just one over the radius.
59:31.120 --> 59:34.040
And sometimes it's more convenient to use the curvature.
59:34.280 --> 59:36.600
Sometimes it's more convenient to use the radius.
59:37.080 --> 59:41.160
But if you want to, they are just related like that.
59:41.600 --> 59:46.400
Furthermore, consider that the radius here is a signed number.
59:47.780 --> 59:53.220
So this is a curve to the left, but we also need to be able to represent
59:53.220 --> 59:54.320
curves to the right.
59:55.240 --> 59:59.840
And we do that by using the sign of the radius or the sign of the
59:59.840 --> 01:00:00.180
curvature.
01:00:01.500 --> 01:00:06.760
So if the sign is negative, it's a curve in one direction.
01:00:07.040 --> 01:00:11.020
And if the sign is positive, it's a curve in the other direction.
01:00:11.440 --> 01:00:16.140
In this case, since the radius points from left to right, while the
01:00:16.140 --> 01:00:20.480
road coordinate system is pointing from the right to the left, the
01:00:20.480 --> 01:00:23.300
radius here would be negative in the left turn.
01:00:24.120 --> 01:00:27.960
But for a curve to the right, the radius would be positive.
01:00:28.660 --> 01:00:29.600
Consider that.
01:00:29.900 --> 01:00:32.060
So we need to consider the sign.
01:00:33.620 --> 01:00:38.360
Then, of course, we still have a lateral offset of the vehicle to the
01:00:38.360 --> 01:00:39.600
center line of the road.
01:00:40.180 --> 01:00:44.940
And we have a longitudinal position, which is now not just a straight
01:00:44.940 --> 01:00:50.820
vector, but it's the length of the arc from the origin of the road
01:00:50.820 --> 01:00:55.360
coordinate system to that position here
01:00:55.360 --> 01:00:58.160
on the center line that is the closest to the real vehicle.
01:00:58.480 --> 01:01:02.420
So this arc length is now considered the longitudinal position.
01:01:04.400 --> 01:01:06.760
So then we still have a yaw angle.
01:01:07.100 --> 01:01:11.800
And of course, this yaw angle is not a yaw angle with respect
01:01:11.800 --> 01:01:18.180
to the road coordinate system, but it's a yaw angle with respect to
01:01:18.180 --> 01:01:26.120
this radial line that intersects the vehicle position.
01:01:26.340 --> 01:01:31.760
So this angle now is the yaw angle that we typically want to model.
01:01:33.000 --> 01:01:37.340
So that means the entire situation can be expressed now with five
01:01:37.340 --> 01:01:38.220
parameters.
01:01:39.260 --> 01:01:44.640
Compared to the straight case, the additional parameter is the radius
01:01:44.640 --> 01:01:45.720
or the curvature.
01:01:46.240 --> 01:01:48.560
So we could use either one or the other.
01:01:49.900 --> 01:01:54.500
So however, the disadvantage of that is that the whole model becomes
01:01:54.500 --> 01:01:55.060
non-linear.
01:01:55.200 --> 01:01:59.020
I don't derive it here, but you can easily imagine that with such a
01:01:59.020 --> 01:02:04.940
curvature, we get some cosine and sine terms in the modeling.
01:02:05.480 --> 01:02:10.080
And we cannot assume anymore that this angle here is close to zero.
01:02:10.340 --> 01:02:13.020
So we cannot use a linearization of those terms.
01:02:13.480 --> 01:02:15.480
And therefore, the whole thing becomes non-linear.
01:02:16.160 --> 01:02:21.380
And that means if we want to estimate this vector here for this more
01:02:21.380 --> 01:02:27.480
general case of a road that might have a constant curvature, we will
01:02:27.480 --> 01:02:31.940
not be able to use a Kalman filter anymore, but we have to go to
01:02:31.940 --> 01:02:32.780
non-linear methods.
01:02:33.100 --> 01:02:36.640
So either to non-linear filters like the extended Kalman filter,
01:02:36.760 --> 01:02:40.440
unscented Kalman filter, particle filter, or if we want to use a
01:02:40.440 --> 01:02:43.700
regression approach, so a least squares approach, we would need the
01:02:43.700 --> 01:02:48.980
non-linear regression approach as we have derived it.
01:02:49.100 --> 01:02:55.840
So we would still try to minimize the squared errors, the squared
01:02:55.840 --> 01:03:00.840
observation errors, but the term that we have to minimize cannot be
01:03:00.840 --> 01:03:04.940
resolved analytically, but we would need some numerical solver that
01:03:04.940 --> 01:03:06.840
minimizes functions.
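As a toy illustration of that numerical minimization idea, one can fit the curvature of a small-angle arc model to observed lane-marking points by simply scanning candidate curvatures for the smallest squared error. The data points, the model y ≈ ½κx², and the scan range are all made up for illustration; a real system would use a proper numerical solver rather than a grid scan.

```python
# Toy numerical least squares: fit the curvature kappa of a small-angle arc
# model y = 0.5 * kappa * x**2 to lane-marking observations. The points and
# the candidate range are invented for illustration.

xs = [5.0, 10.0, 15.0, 20.0, 25.0]
true_kappa = 0.01                                  # 1/m, assumed ground truth
ys = [0.5 * true_kappa * x ** 2 for x in xs]       # noise-free observations

def squared_error(kappa):
    # Sum of squared observation errors that we want to minimize.
    return sum((y - 0.5 * kappa * x ** 2) ** 2 for x, y in zip(xs, ys))

# Crude numerical "solver": evaluate the objective on a grid of candidates
# (a real implementation would use Gauss-Newton or a similar method).
candidates = [k / 10000.0 for k in range(-200, 201)]
best_kappa = min(candidates, key=squared_error)
```

The grid search recovers the curvature that generated the data; replacing it with a gradient-based solver changes the efficiency, not the idea.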
01:03:08.960 --> 01:03:13.140
Okay, that would then be the approach for this curved road case.
01:03:13.740 --> 01:03:20.820
So still this is not sufficient to describe the real shape of roads,
01:03:21.180 --> 01:03:26.160
though, because if we would assume that we have a situation like that,
01:03:26.220 --> 01:03:30.640
a straight segment of a road and then a curved segment with a constant
01:03:30.640 --> 01:03:35.140
curvature, then the curvature would change over the arc length like
01:03:35.140 --> 01:03:35.400
that.
01:03:35.480 --> 01:03:39.180
So in the straight line the curvature would be zero, then we would
01:03:39.180 --> 01:03:43.400
enter at that point the curve, and the curvature would be different
01:03:43.400 --> 01:03:44.780
from zero but constant.
01:03:45.180 --> 01:03:45.780
So like that.
01:03:46.240 --> 01:03:47.200
So it would be like that.
01:03:47.900 --> 01:03:52.020
That would mean for the driver here in this area the driver has to
01:03:52.020 --> 01:03:57.520
keep the steering wheel in the center position, and then at this
01:03:57.520 --> 01:04:02.600
position the driver has to turn it very quickly into a different
01:04:02.600 --> 01:04:05.520
position to follow the curve.
01:04:06.260 --> 01:04:12.620
That would be not that nice for the driver, and therefore roads are
01:04:12.620 --> 01:04:14.380
not built like that, if possible.
01:04:14.780 --> 01:04:17.560
Yeah, of course, sometimes you cannot build them differently than
01:04:17.560 --> 01:04:22.660
this, because there are buildings or whatever that do not allow other
01:04:22.660 --> 01:04:29.160
shapes, but on highways especially, highways are not built like that.
01:04:29.720 --> 01:04:31.400
They are built like that.
01:04:32.020 --> 01:04:34.800
Or more similar to that.
01:04:35.120 --> 01:04:39.760
So we start on a straight area, a straight segment of the road with
01:04:39.760 --> 01:04:45.520
curvature zero, then the curvature slowly increases up to a
01:04:45.520 --> 01:04:48.600
certain point, and then it's constant for the rest.
01:04:49.000 --> 01:04:53.940
And that means throughout this part here of the road, the driver can
01:04:53.940 --> 01:05:00.020
slowly turn the steering wheel into the necessary position to follow
01:05:00.020 --> 01:05:00.700
this circle.
01:05:01.400 --> 01:05:08.440
And this shape that changes from straight to circle with constant
01:05:08.440 --> 01:05:11.100
radius is called a clothoid.
01:05:11.680 --> 01:05:17.680
So a clothoid is a curve that has a linearly increasing curvature.
01:05:19.080 --> 01:05:22.920
For instance, it starts with curvature zero, and then over the arc
01:05:22.920 --> 01:05:26.180
length of the clothoid, the curvature is increasing.
01:05:26.880 --> 01:05:32.180
And that's at least similar to how highways are built.
01:05:32.300 --> 01:05:37.300
And therefore, it's very useful to assume a clothoid as a
01:05:37.300 --> 01:05:39.340
geometric model for roads.
01:05:39.580 --> 01:05:41.840
So let's have a look at these clothoids.
01:05:42.500 --> 01:05:48.360
So actually, a clothoid is defined such that the curvature of the
01:05:48.360 --> 01:05:54.540
curve depends on the arc length, capital L, with a linear
01:05:54.540 --> 01:05:55.220
relationship.
01:05:55.720 --> 01:06:02.160
So it's an initial curvature kappa zero plus kappa one times L, where
01:06:02.160 --> 01:06:06.080
kappa one is a change rate for the curvature.
01:06:07.160 --> 01:06:11.720
That means the curvature is changing linearly over the arc length.
01:06:12.440 --> 01:06:15.200
So if you want to draw a clothoid, it looks like that.
01:06:15.440 --> 01:06:20.200
So here the curvature is changing linearly over the arc length,
01:06:20.360 --> 01:06:23.940
starting with zero curvature here at the beginning, and then
01:06:23.940 --> 01:06:25.500
increasing curvature.
01:06:26.440 --> 01:06:30.440
So for us, mainly, this part of the clothoid is interesting.
01:06:31.220 --> 01:06:34.260
That part here is not interesting.
01:06:34.420 --> 01:06:36.620
No one builds a road like that, of course.
01:06:37.140 --> 01:06:42.260
But this initial part here with small curvatures, this is interesting
01:06:42.260 --> 01:06:42.820
for us.
01:06:44.920 --> 01:06:49.640
So now let's have a closer look at the clothoid.
01:06:49.820 --> 01:06:54.200
We assume that we have a coordinate system, a road
01:06:54.200 --> 01:07:00.580
coordinate system like that, and the clothoid represents the center
01:07:00.580 --> 01:07:01.800
line of our lane.
01:07:02.540 --> 01:07:09.000
So we assume that here in the origin, the clothoid, the tangent of the
01:07:09.000 --> 01:07:15.240
clothoid is equal to the x-axis, and that the initial curvature kappa
01:07:15.240 --> 01:07:17.340
zero in this case is equal to zero.
01:07:19.160 --> 01:07:24.740
Now let's have a look at the azimuth angle, the tangent angle at the
01:07:24.740 --> 01:07:26.280
clothoid for a certain point.
01:07:26.400 --> 01:07:27.840
So let's consider this point.
01:07:28.180 --> 01:07:30.100
It has a certain arc length L.
01:07:30.680 --> 01:07:36.860
And we might be interested in this angle here, which describes
01:07:36.860 --> 01:07:40.000
actually the tangential direction of the clothoid.
01:07:41.580 --> 01:07:43.680
How can we calculate it?
01:07:43.800 --> 01:07:50.240
Well, we know that the curvature of the clothoid is given by this law
01:07:50.240 --> 01:07:50.580
here.
01:07:52.020 --> 01:07:58.240
And that means when we follow the clothoid for a certain arc length,
01:07:58.360 --> 01:08:02.660
what we have to do is integrate the curvature over this arc
01:08:02.660 --> 01:08:02.940
length.
01:08:03.440 --> 01:08:07.540
So the present curvature tells us something about the change of the
01:08:07.540 --> 01:08:11.200
orientation, the local change of orientation, when we follow the
01:08:11.200 --> 01:08:11.660
clothoid.
01:08:12.100 --> 01:08:17.280
And when we follow a longer segment of the clothoid, we have to
01:08:17.280 --> 01:08:21.980
integrate these local changes of the orientation, that means these
01:08:21.980 --> 01:08:22.920
local curvatures.
01:08:23.220 --> 01:08:28.100
So what we do is we calculate the integral from zero to capital L of
01:08:28.100 --> 01:08:29.420
kappa of lambda d lambda.
01:08:30.060 --> 01:08:34.360
And of course, this is a polynomial, so everyone knows how to calculate
01:08:34.360 --> 01:08:40.220
the integral, this is equal to kappa zero times L plus a half kappa
01:08:40.220 --> 01:08:41.340
one times L squared.
01:08:42.780 --> 01:08:47.680
Now we know this angle here, the tangent angle of the clothoid at each
01:08:47.680 --> 01:08:48.140
position.
01:08:48.320 --> 01:08:51.240
And as we can see, we can easily calculate that.
01:08:51.360 --> 01:08:54.760
We find a closed form solution for it.
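As a sanity check, this closed form, xi(L) = kappa0 * L + 0.5 * kappa1 * L^2, can be compared against a direct numerical integration of kappa(lambda) = kappa0 + kappa1 * lambda. The curvature values below are arbitrary illustrative numbers, not from the lecture.

```python
def clothoid_heading(kappa0, kappa1, L):
    # Closed form: xi(L) = kappa0 * L + 0.5 * kappa1 * L**2
    return kappa0 * L + 0.5 * kappa1 * L ** 2

def heading_numeric(kappa0, kappa1, L, n=10000):
    # Midpoint-rule integration of kappa(lambda) = kappa0 + kappa1 * lambda.
    dl = L / n
    return sum((kappa0 + kappa1 * (i + 0.5) * dl) * dl for i in range(n))

# Example values (arbitrary): kappa1 = 0.001 1/m^2 over 50 m of arc length.
xi_closed = clothoid_heading(0.0, 0.001, 50.0)
xi_numeric = heading_numeric(0.0, 0.001, 50.0)
```

Both routes give the same tangent angle, which is the point of having the closed form: no numerical integration is needed for xi.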
01:08:55.280 --> 01:08:59.620
Now let's ask where is the position of this point.
01:08:59.780 --> 01:09:05.740
So if we know the arc length L, what is the position in this xy
01:09:05.740 --> 01:09:07.660
coordinate system of that point?
01:09:08.420 --> 01:09:09.720
What do we have to do?
01:09:09.860 --> 01:09:16.520
Well, we follow the clothoid from here to that position, and we just
01:09:16.520 --> 01:09:21.860
add up small vectors, small tangent vectors along this clothoid.
01:09:23.160 --> 01:09:28.120
You can imagine we calculate the tangent vector here, and we follow it
01:09:28.120 --> 01:09:29.220
by a certain amount.
01:09:29.600 --> 01:09:33.600
Then at this position, we again calculate the tangent vector, follow
01:09:33.600 --> 01:09:35.860
it by a certain distance, and so on.
01:09:36.140 --> 01:09:37.340
Like that, we...
01:09:39.620 --> 01:09:41.680
step by step, we go to this position.
01:09:41.820 --> 01:09:45.380
And now we shorten the length of the tangent vectors more and more,
01:09:45.600 --> 01:09:52.980
and in the infinitesimally short tangent case, we do not follow pieces
01:09:52.980 --> 01:09:58.420
of the tangent, but we just integrate those tangent vectors over the
01:09:58.420 --> 01:09:58.900
arc length.
01:09:59.500 --> 01:10:04.180
By doing that, we get the x and y position on the clothoid.
01:10:04.400 --> 01:10:07.860
So what we have to do, so the tangent vector, of course, here is
01:10:07.860 --> 01:10:13.680
cosine of xi, sine of xi.
01:10:15.460 --> 01:10:20.840
And so what we have to do is integrate from 0 to L cosine of xi of
01:10:20.840 --> 01:10:26.540
lambda d lambda, and to get y, we integrate sine of xi of lambda d
01:10:26.540 --> 01:10:28.120
lambda from 0 to L.
01:10:29.520 --> 01:10:33.720
So the problem here, of course, as you can see, is there is no closed
01:10:33.720 --> 01:10:35.860
form solution for this integral anymore.
01:10:36.360 --> 01:10:42.160
So we have a cosine of a polynomial, and no one could tell me what the
01:10:42.160 --> 01:10:43.980
closed form solution of this integral is.
01:10:44.000 --> 01:10:45.340
So it doesn't exist, actually.
01:10:45.980 --> 01:10:48.240
So the integral exists, of course.
01:10:49.060 --> 01:10:53.640
This is a power series, and we can always calculate the integral
01:10:53.640 --> 01:10:57.720
of a power series, but there is no closed form solution.
01:10:58.060 --> 01:10:59.920
That's the big disadvantage here.
01:11:01.020 --> 01:11:05.120
And if we don't have a closed form solution, again, we run into the
01:11:05.120 --> 01:11:08.540
problem that we have to solve everything numerically, and that's not
01:11:08.540 --> 01:11:09.080
really nice.
01:11:09.140 --> 01:11:13.040
But what we could do is, again, we could do some simplifications.
01:11:14.180 --> 01:11:20.080
So what we could do is, if we stay in this area close to curvature 0,
01:11:20.720 --> 01:11:26.980
then we can conclude that also this angle xi is not that large.
01:11:27.640 --> 01:11:32.860
And if it's not that large, we can simplify it and say, okay, we can
01:11:32.860 --> 01:11:37.540
use the standard linear approximations of sine of a small angle and
01:11:37.540 --> 01:11:38.900
cosine of a small angle.
01:11:39.280 --> 01:11:44.520
That means we can assume that cosine of xi of lambda, if xi of lambda
01:11:44.520 --> 01:11:49.840
is small, is, well, almost the same as 1.
01:11:50.580 --> 01:11:54.020
And therefore, we just integrate 1 from 0 to L.
01:11:55.520 --> 01:11:57.240
And of course, this is very simple.
01:11:57.440 --> 01:12:01.540
This is just capital L, so as you can see here.
01:12:02.100 --> 01:12:06.280
And for the y of L, we have to integrate sine of xi of lambda.
01:12:06.420 --> 01:12:11.880
If we assume that xi of lambda is small, close to 0, still close to 0,
01:12:12.240 --> 01:12:16.820
we approximate the sine of xi of lambda by xi of lambda, d lambda.
01:12:17.060 --> 01:12:21.520
Of course, this, again, can be calculated in closed form because xi of
01:12:21.520 --> 01:12:22.560
lambda is a polynomial.
01:12:23.240 --> 01:12:27.440
So if we do it like that, we get for y of L this term
01:12:27.440 --> 01:12:30.000
here, like a polynomial of degree 3.
01:12:31.040 --> 01:12:36.140
So what we did is we approximate the positions, x and y positions, of
01:12:36.140 --> 01:12:44.980
the centerline of this clothoid by a third-order polynomial.
01:12:44.980 --> 01:12:51.800
And that's, of course, nicer because we can deal with it in a
01:12:51.800 --> 01:12:52.500
nicer way.
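The quality of this third-order polynomial approximation can be checked numerically: integrate sin(xi(lambda)) directly and compare with kappa0 * L^2 / 2 + kappa1 * L^3 / 6. The curvature values below are arbitrary small numbers of the kind one would expect on a highway.

```python
import math

def clothoid_y_approx(kappa0, kappa1, L):
    # Small-angle result: y(L) = kappa0 * L**2 / 2 + kappa1 * L**3 / 6
    return 0.5 * kappa0 * L ** 2 + kappa1 * L ** 3 / 6.0

def clothoid_y_numeric(kappa0, kappa1, L, n=20000):
    # Reference: integrate sin(xi(lambda)) over the arc length (midpoint rule).
    dl = L / n
    y = 0.0
    for i in range(n):
        lam = (i + 0.5) * dl
        xi = kappa0 * lam + 0.5 * kappa1 * lam ** 2
        y += math.sin(xi) * dl
    return y

# Example (arbitrary values): kappa1 = 1e-4 1/m^2 over 50 m of arc length.
y_poly = clothoid_y_approx(0.0, 1e-4, 50.0)
y_exact = clothoid_y_numeric(0.0, 1e-4, 50.0)
```

For these small curvatures the two values agree to a few millimetres; for tight curves, where xi is no longer small, the polynomial form degrades, which matches the remark that the approximation only holds close to zero curvature.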
01:12:54.720 --> 01:13:01.620
Okay, so now, when we want to use a clothoid to model a road like
01:13:01.620 --> 01:13:03.120
this, what do we need?
01:13:03.220 --> 01:13:06.960
Which variables do we need to represent it?
01:13:07.420 --> 01:13:10.240
Well, again, we have a road coordinate system like that.
01:13:10.640 --> 01:13:14.120
We have a vehicle coordinate system like that.
01:13:14.800 --> 01:13:17.140
Of course, again, what we need is a lane width.
01:13:18.580 --> 01:13:22.840
What we need is an initial curvature of the clothoid.
01:13:23.320 --> 01:13:25.840
So there's a curvature here at the beginning.
01:13:26.160 --> 01:13:29.080
We need a change of curvature, kappa 1.
01:13:29.540 --> 01:13:31.760
So these two describe the clothoid.
01:13:31.940 --> 01:13:34.700
These two describe the centerline here.
01:13:35.600 --> 01:13:39.640
Then, of course, there is, again, a lateral offset between the vehicle
01:13:39.640 --> 01:13:41.040
position and the centerline.
01:13:41.440 --> 01:13:45.740
There is a longitudinal offset that is actually the arc length along
01:13:45.740 --> 01:13:46.320
the clothoid.
01:13:47.080 --> 01:13:51.560
And there is still a yaw angle here for the vehicle.
01:13:51.960 --> 01:13:57.960
So now we have a six-parameter state space, b, kappa 0, kappa 1, d
01:13:57.960 --> 01:14:01.900
long, d lat, and psi, with which we can represent the situation.
01:14:02.060 --> 01:14:06.240
So three parameters, b, kappa 0, kappa 1, describe the geometry of the
01:14:06.240 --> 01:14:06.480
road.
01:14:07.040 --> 01:14:11.960
And three parameters, d long, d lat, and psi, to describe the pose of
01:14:11.960 --> 01:14:12.400
the vehicle.
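The six-parameter state just listed could be collected in a small container like the following. The class name and the boundary helper are hypothetical additions for illustration; the helper uses the third-order small-angle approximation derived earlier.

```python
from dataclasses import dataclass

@dataclass
class RoadState:
    # Hypothetical container for the six-parameter state space.
    b: float        # lane width [m]
    kappa0: float   # initial curvature [1/m]
    kappa1: float   # curvature change rate [1/m^2]
    d_long: float   # longitudinal position (arc length) [m]
    d_lat: float    # lateral offset of the vehicle [m]
    psi: float      # yaw angle [rad]

    def boundary_offsets(self, L):
        # Lateral positions of left boundary, centerline and right boundary
        # at arc length L, using the third-order small-angle approximation.
        yc = 0.5 * self.kappa0 * L ** 2 + self.kappa1 * L ** 3 / 6.0
        return yc - self.b / 2.0, yc, yc + self.b / 2.0

state = RoadState(b=3.5, kappa0=0.0, kappa1=0.0,
                  d_long=0.0, d_lat=0.2, psi=0.01)
left, center, right = state.boundary_offsets(20.0)
```

The first three fields describe the road geometry, the last three the vehicle pose, mirroring the split made above.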
01:14:13.960 --> 01:14:19.060
Of course, you can easily imagine that the whole problem is nonlinear.
01:14:19.160 --> 01:14:27.120
If we want to estimate those parameters from a set of observations,
01:14:27.380 --> 01:14:34.060
we run into a nonlinear system.
01:14:39.640 --> 01:14:44.080
So we have to use nonlinear methods, say an extended Kalman filter,
01:14:44.460 --> 01:14:47.560
unscented Kalman filter, or a nonlinear regression.
01:14:48.480 --> 01:14:53.720
But this is the most general case that is typically used in modeling
01:14:53.720 --> 01:14:54.800
at least highways.
01:14:56.380 --> 01:14:58.100
So here's an example.
01:14:58.420 --> 01:15:02.860
So this work is roughly 10 years old.
01:15:03.340 --> 01:15:07.140
So just estimating the geometry of the road.
01:15:07.440 --> 01:15:12.740
Here the width of the road was not estimated, but just the geometry of
01:15:12.740 --> 01:15:19.300
the lane, indicated by the geometry of the right lane boundary that
01:15:19.300 --> 01:15:22.560
is visualized here.
01:15:22.800 --> 01:15:27.400
Yeah, the whole thing was estimated with a clothoid, as we have derived
01:15:27.400 --> 01:15:27.600
it.
01:15:27.780 --> 01:15:29.600
And let's run this.
01:15:31.260 --> 01:15:36.200
And we can see how the shape of this road is estimated.
01:15:36.820 --> 01:15:39.320
Also the curvature of the road is estimated.
01:15:39.520 --> 01:15:44.520
We can now see, of course, the blue line that is shown here is not
01:15:44.520 --> 01:15:47.120
very long.
01:15:47.460 --> 01:15:53.860
In real world coordinates, it's maybe 20 meters long, because
01:15:53.860 --> 01:15:58.320
for longer distances, the approximation with the clothoid and the
01:15:58.320 --> 01:16:01.160
estimation from an image doesn't work that well.
01:16:01.480 --> 01:16:05.600
But at least for the next 20 or 30 meters, we can estimate the
01:16:05.600 --> 01:16:11.320
geometry of the road rather well, at least for highway scenarios, as
01:16:11.320 --> 01:16:12.380
we can see it here.
01:16:24.010 --> 01:16:29.010
And with bad contrast, of course, it is difficult to detect the lane
01:16:29.010 --> 01:16:31.770
markings.
01:16:33.190 --> 01:16:37.590
And now, of course, you will see that the whole thing doesn't work if
01:16:37.590 --> 01:16:42.190
the curvature of the road is too large, because the first thing is you
01:16:42.190 --> 01:16:46.910
don't see the lane markings anymore, because it's out of the visible
01:16:46.910 --> 01:16:48.990
area of the camera.
01:16:49.350 --> 01:16:54.870
And also this model doesn't fit very well anymore if we have very
01:16:54.870 --> 01:16:57.990
highly curved boundaries here.
01:16:58.450 --> 01:17:03.390
But as soon as we are back on the highway, again, it works.
01:17:04.470 --> 01:17:07.130
At least, not that bad.
01:17:11.260 --> 01:17:13.300
Yeah, this
01:17:16.490 --> 01:17:17.270
one, okay.
01:17:18.170 --> 01:17:23.930
So, and this is already everything I want to tell today about the road
01:17:23.930 --> 01:17:24.730
estimation.
01:17:26.710 --> 01:17:31.470
Yeah, so we have now 10 minutes left, and I want to use the 10 minutes
01:17:31.470 --> 01:17:32.870
for the lecture evaluation.
01:17:33.310 --> 01:17:37.830
So you know the story, every lecture is evaluated by you, so you can
01:17:37.830 --> 01:17:39.130
give marks to me now.
01:17:39.890 --> 01:17:42.330
I distribute the evaluation sheets.
01:17:42.430 --> 01:17:49.350
I need one volunteer who is willing to just go to the
01:17:42.430 --> 01:17:49.350
evaluation office over there and give the envelope to the evaluation
01:17:49.350 --> 01:17:57.130
office. Who's willing to do that?
01:18:01.170 --> 01:18:04.350
It just takes two or three minutes, but I'm not allowed to do it.
01:18:04.450 --> 01:18:05.050
You will do it?
01:18:05.110 --> 01:18:05.910
Thank you very much.
01:18:06.230 --> 01:18:16.570
Okay, then I will just distribute the evaluation sheets, and I'm
01:18:16.570 --> 01:18:18.910
always happy to get comments.
01:18:19.570 --> 01:18:23.310
Yeah, so if you have some ideas how to improve the lecture or what was
01:18:23.310 --> 01:18:25.850
not good, just write a comment on that.
01:18:26.090 --> 01:18:30.210
I really read it and try to improve it next time, so...