WEBVTT

00:00.800 --> 00:04.360
Hello, I welcome you to the lecture on basic computer science 2.

00:05.100 --> 00:07.980
Today it actually starts on time.

00:08.600 --> 00:09.980
I hope you can understand me in the back.

00:10.020 --> 00:12.300
I have the impression that the microphone is a bit quiet.

00:13.260 --> 00:16.960
If it's too quiet, you have to report, then we'll make it a bit louder

00:16.960 --> 00:17.440
in the hall.

00:18.120 --> 00:26.340
Okay, last time we dealt with various topics, again to the points that

00:26.340 --> 00:26.780
we had made.

00:26.780 --> 00:32.820
We had already looked at the regular languages ​​before and found that

00:32.820 --> 00:35.300
languages ​​can also be characterized by operations.

00:37.000 --> 00:42.800
And we had seen that you actually get something that makes sense as a

00:42.800 --> 00:46.580
description of individual languages.

00:46.580 --> 00:55.000
Such regular expressions that indicate which operations actually have

00:55.000 --> 00:58.140
to be performed to generate a certain language.

01:01.440 --> 01:04.140
And that was the definition of regular expressions.

01:04.320 --> 01:06.600
I had given you some examples.

01:15.760 --> 01:25.240
We had seen that you can construct an automaton for each expression.

01:26.720 --> 01:31.060
These were the examples of how to make automata from individual basic

01:31.060 --> 01:31.580
elements.

01:31.580 --> 01:35.580
To show that an automaton can actually be constructed for each regular

01:35.580 --> 01:35.960
expression.

01:37.920 --> 01:43.360
That is, we have the statement that any language that is defined by a

01:43.360 --> 01:46.840
regular expression can also be accepted by a finite automaton.

01:48.260 --> 01:49.900
You could also do it the other way around.

01:49.900 --> 01:54.760
We have looked at an example where we have seen how you can look at an

01:54.760 --> 01:58.980
automaton, in this case this one with four states, and be able to

01:58.980 --> 02:01.480
derive a regular expression by systematically looking at and analyzing

02:01.480 --> 02:02.520
the individual processes.

02:06.100 --> 02:09.320
You can do this very systematically through an algorithm that I don't

02:09.320 --> 02:12.740
present to you, because that would take too much time for me.

02:12.740 --> 02:18.600
But at least with the automata that we present to you, you can see it

02:18.600 --> 02:19.100
relatively quickly.

02:19.340 --> 02:21.360
If it gets too big, it gets a little more complex.

02:22.120 --> 02:26.220
But here you can see very quickly that you can describe the language

02:26.220 --> 02:28.480
of the automaton through a regular expression.

02:29.180 --> 02:32.360
And with a regular expression you can see which structure the words

02:32.360 --> 02:34.240
have, which words are actually in it.

02:34.560 --> 02:39.360
That means the regular expressions have the advantage that I know

02:39.360 --> 02:40.960
something about the structure of the words.

02:40.960 --> 02:45.000
The finite automata have the advantage that I know something about how

02:45.000 --> 02:48.540
I can actually analyze a word and determine whether it belongs to it

02:48.540 --> 02:49.000
or not.

02:49.640 --> 02:54.580
And with the grammars, which came only afterwards, we then looked at

02:54.580 --> 02:59.380
the type 3 languages, right-linear languages, and found that you can

02:59.380 --> 03:04.040
describe the regular languages or language finite automata very well

03:04.040 --> 03:04.040
with them.

03:04.850 --> 03:08.440
And we found this analogy between automata and grammars.

03:09.060 --> 03:16.580
And then we were able to define a grammar and vice versa with the help

03:16.580 --> 03:16.580
of an automaton.

03:16.820 --> 03:20.180
And then we found that in principle everything is the same language

03:20.180 --> 03:20.480
class.

03:20.900 --> 03:24.620
We only know that there are languages that we cannot recognize or

03:24.620 --> 03:25.500
generate with them.

03:27.080 --> 03:29.660
These were languages that contain these brackets.

03:30.300 --> 03:36.180
And that's why we went to the next chapter, namely to the chapter in

03:36.180 --> 03:37.640
which we deal with cellar automata.

03:38.240 --> 03:44.460
And there we saw that by adding another band, this cellar band, you

03:44.460 --> 03:46.160
are actually able to do more.

03:46.760 --> 03:50.540
In particular, you could actually generate languages such as a, o, n,

03:50.580 --> 03:51.820
b, o, n with such a cellar.

03:53.660 --> 03:56.680
And then we made some further characterizations.

03:58.460 --> 04:07.240
And we saw that there is a language, this language here, which looks

04:07.240 --> 04:12.260
similar, or it is a language of palindromes, in which all words have

04:12.260 --> 04:14.940
the structure that they start with a certain word.

04:16.000 --> 04:19.280
Then comes a c and then comes another word.

04:19.280 --> 04:24.200
The right part of the word is exactly the inversion of the initial

04:24.200 --> 04:26.920
part of the word, so that you can reproduce exactly the operation that

04:26.920 --> 04:28.600
you have in a cellar with it.

04:30.040 --> 04:32.600
We have seen that you can also easily construct an automaton for this,

04:32.900 --> 04:36.620
which can be seen here at the bottom of the automaton with these

04:36.620 --> 04:39.180
relatively few rules that are specified here.

04:39.500 --> 04:41.280
So you can actually recognize such a language.

04:41.280 --> 04:48.600
You can recognize or accept exactly these words with it.

04:50.500 --> 04:54.080
And the next thing was the consideration that there is a language.

04:54.900 --> 04:57.000
And that was the last slide from the last time.

04:57.120 --> 05:01.820
That's why I'm doing it again here as the next one.

05:02.040 --> 05:05.420
Oops, I always have to put this aside for a moment.

05:06.460 --> 05:08.740
And I can now slide it over here again.

05:10.200 --> 05:16.420
I then said, there are the words, or these actual palindromes, they

05:16.420 --> 05:20.100
don't have to have an odd number of characters, but they normally even

05:20.100 --> 05:22.100
have an even number of characters.

05:22.220 --> 05:28.340
The problem is, I don't necessarily know where I have to start to

05:28.340 --> 05:31.860
check whether what I have just seen, what I have written in the

05:31.860 --> 05:37.660
cellar, actually also follows as the next one in reverse order.

05:38.680 --> 05:42.440
And that means, if I don't have such an extra switch point, indicated

05:42.440 --> 05:47.140
by the sign C, as in the language on the previous slide, then I don't

05:47.140 --> 05:50.000
really know where I have to start to check whether it is a palindrome

05:50.000 --> 05:50.460
or not.

05:50.520 --> 05:52.440
I don't know when I got exactly in the middle of the word.

05:53.620 --> 05:55.900
Then I would have to go all the way through, would have to count that

05:55.900 --> 05:58.040
there are n characters, would have to go back to half.

05:58.680 --> 05:59.700
But I can't go back.

05:59.700 --> 06:02.620
I have to read it from left to right, I can only go through it once.

06:03.580 --> 06:07.240
And if I don't do it deterministically, then I can do exactly that.

06:07.960 --> 06:14.640
Then I know, if there is a possibility, a possibility to determine via

06:14.640 --> 06:21.020
a sequence of actions that this is a palindrome, then I can accept it.

06:21.800 --> 06:25.640
And if I say, not deterministically, I allow myself to decide at every

06:25.640 --> 06:27.000
point in this word,

06:33.180 --> 06:41.500
whether I want to continue now or whether I want to turn around, then

06:41.500 --> 06:46.040
I can decide, especially at the middle position, to turn around and

06:46.040 --> 06:49.060
check whether the first part occurs again.

06:49.060 --> 06:54.480
And that would be such a path through the many, if this is our start

06:54.480 --> 07:00.940
configuration, our S0 and some W on it, then there are many

07:00.940 --> 07:04.100
possibilities to continue from there, sequence configuration.

07:04.820 --> 07:08.940
And at some point you have a possibility to go through such a path,

07:10.760 --> 07:15.200
which then actually gives you the answer, is the word in it or not.

07:15.260 --> 07:17.560
And if there is a path that says the word is actually in the

07:17.560 --> 07:19.900
palindrome, then I have accepted that and I don't really care about

07:19.900 --> 07:21.540
all the other paths that don't lead to the goal.

07:24.100 --> 07:29.060
So, just like with the deterministic or finite automata, the expansion

07:29.060 --> 07:34.440
is simply so that we no longer have only one activity, a sequence

07:34.440 --> 07:37.960
state and something that we write in the cellar, but a lot of possible

07:37.960 --> 07:40.220
activities, therefore not deterministically.

07:40.720 --> 07:44.260
And accepting that with such a non-deterministic cellar automaton

07:44.260 --> 07:47.300
happens just like with the finite automata, does not mean

07:47.300 --> 07:52.300
deterministically, there is a sequence, a configuration sequence that

07:52.300 --> 07:53.220
leads to a final state.

07:53.860 --> 07:57.200
And if that is the case, then we are happy and say, aha, the word is

07:57.200 --> 07:59.240
actually an element of the language.

08:00.340 --> 08:03.260
And now we have an example of this.

08:03.400 --> 08:07.140
The automaton that you would have to build to recognize the language

08:07.140 --> 08:13.200
of the palindrome must have the possibility to continue reading at any

08:13.200 --> 08:13.560
position, i.e.

08:13.580 --> 08:18.080
to continue writing things in, or to check whether what is in it now

08:18.080 --> 08:19.300
comes back in reverse order.

08:20.480 --> 08:25.040
That means, what we do is very simple.

08:25.180 --> 08:30.520
We basically have the automaton as before, only we say at every point,

08:31.280 --> 08:38.680
so here we have the possibility from S1, the input of a sign E and

08:38.680 --> 08:42.060
something on top of the cellar.

08:42.760 --> 08:47.280
So there is already something in here, on top of it is now some K, and

08:47.280 --> 08:50.060
if we now read an E, then we just write the E on top.

08:51.000 --> 09:00.200
And now we can, if we have E and E here, i.e.

09:00.240 --> 09:06.820
if the sign is the same, that was if K is not equal to E, if a new

09:06.820 --> 09:08.900
sign comes, then it can not be a point of conversion.

09:09.440 --> 09:13.840
But if the next sign is the same as the one that is already on it, i

09:13.840 --> 09:13.920
.e.

09:14.000 --> 09:17.840
we read on our input tape, we are just in a position, there is also an

09:17.840 --> 09:21.960
E, and then we know, aha, these two are identical here, that could be

09:21.960 --> 09:22.420
a point of conversion.

09:23.380 --> 09:28.520
So you say, okay, then I write, I have both possibilities, I stay in

09:28.520 --> 09:33.860
the state and just write the E on top, or I say, I'll just try it and

09:33.860 --> 09:39.980
throw it, so I switch to check, throw the E that was on top of it away

09:40.880 --> 09:42.860
and go into a new state.

09:43.000 --> 09:46.560
I have to go into a new state, because I decide, now I want to check

09:46.560 --> 09:50.460
if everything that comes now is actually exactly the same in the

09:50.460 --> 09:50.640
cellar.

09:51.100 --> 09:58.060
That is, in the state S2 I only have transitions in which I can do

09:58.060 --> 10:03.040
something, as long as the sign and the input are the same as what is

10:03.040 --> 10:03.520
on top of the cellar.

10:04.300 --> 10:06.540
That is always thrown away, I stay in S2.

10:07.160 --> 10:12.540
And if I actually get to the point where the cellar is empty during

10:12.540 --> 10:17.120
this constant check, that means, no, here, there, that I am in a

10:17.120 --> 10:24.880
state, that I have come back to K0, that means, I see K0 again, all

10:24.880 --> 10:29.100
the signs that were up here, K0 is always at the bottom, the lowest

10:29.100 --> 10:33.860
sign, everything I wrote on top is at this passage of the state S2,

10:34.340 --> 10:38.040
via this only possibility to continue there, that always the same

10:38.040 --> 10:40.020
signs were there, as on the input tape.

10:40.780 --> 10:43.220
I threw everything out again, landed back at K0.

10:43.860 --> 10:46.940
At that moment I say, now I'm doing a lambda transition, so I don't

10:46.940 --> 10:53.700
read on, but go into a state S3 and leave the cellar intact, of

10:53.700 --> 10:53.700
course.

10:54.300 --> 10:56.780
From S3 there are no further transitions.

10:57.000 --> 11:02.040
So if my input tape is now fully processed, did I have a palindrome?

11:02.360 --> 11:07.040
If it is not fully processed, then it was a dead end, I'm not done, I

11:07.040 --> 11:07.760
switched somewhere too early.

11:10.380 --> 11:13.820
And yes, these are actually the only possibilities.

11:14.120 --> 11:18.620
If I am in a state S2 and I read something different on the cellar

11:18.620 --> 11:23.500
than what I just have on the input tape, at that moment I can't go on,

11:23.800 --> 11:27.940
because I just want to check here, is something identical?

11:28.160 --> 11:30.280
If it is not identical, the machine stops.

11:30.760 --> 11:36.600
That was also a path through the configuration, which did not lead to

11:36.600 --> 11:36.920
success.

11:37.480 --> 11:43.260
That means we have the opportunity to switch at the right place, by

11:43.260 --> 11:45.140
installing a real non-determinism here at this one point.

11:46.640 --> 11:53.040
We allow to switch at any point to this state S2 and check whether

11:53.040 --> 11:54.200
what comes is identical.

11:54.640 --> 11:57.740
This is what is usually done here, as it stands here.

11:58.120 --> 12:02.960
We already had this before, but we didn't have the opportunity to

12:02.960 --> 12:03.760
switch at any point.

12:04.720 --> 12:08.720
It was only possible to switch when we saw the sign C.

12:09.840 --> 12:13.480
And we don't have that here, so it's not deterministic.

12:14.200 --> 12:15.120
So that's it.

12:15.120 --> 12:21.300
The fact that you can recognize palindromes in this way, is obvious,

12:22.400 --> 12:26.640
because only then, when something is identical to what was already

12:26.640 --> 12:28.720
inside, will we actually come to the end state.

12:30.260 --> 12:34.580
So this is a possibility, this language we saw, obviously we can't

12:34.580 --> 12:35.420
recognize it deterministically.

12:37.860 --> 12:42.740
The language we can't recognize deterministically, we can still

12:42.740 --> 12:44.160
recognize it non-deterministically.

12:44.160 --> 12:54.240
We have a language class of languages that are accepted by

12:54.240 --> 12:58.200
deterministic cellar automata, and a language class that are accepted

12:58.200 --> 13:00.220
by non-deterministic cellar automata.

13:01.500 --> 13:05.640
And it's obvious that we have a real relation between the two.

13:05.680 --> 13:11.000
So we saw at least this one language of palindromes can't be accepted

13:11.000 --> 13:13.700
by deterministic but by non-deterministic cellar automata.

13:14.680 --> 13:17.520
And with that it's a real extension.

13:18.160 --> 13:22.500
Unlike the final automata, the transition from deterministic to non

13:22.500 --> 13:29.640
-deterministic was not a principled change of the expressiveness of

13:29.640 --> 13:30.600
the acceptance behavior.

13:31.120 --> 13:33.460
Here, non-determinism is actually something new.

13:34.440 --> 13:37.380
And this is shown by this one example.

13:38.660 --> 13:43.940
Now the question is whether there are languages that can't possibly be

13:43.940 --> 13:44.980
accepted by cellar automata.

13:46.560 --> 13:53.640
We asked ourselves the same question There is a language that can't be

13:53.640 --> 13:55.400
accepted by final automata.

13:56.260 --> 13:58.180
We can accept it now.

13:58.340 --> 14:00.540
Could there be something else?

14:01.060 --> 14:04.860
And surprisingly there is actually such a language.

14:05.780 --> 14:11.580
So this language a-n-b-n-c-n can't be recognized by a cellar

14:11.580 --> 14:12.060
automaton.

14:13.380 --> 14:14.820
Why is that so?

14:14.840 --> 14:22.420
Imagine you have a word consisting of the same number of a's, b's and

14:22.420 --> 14:23.220
c's.

14:23.340 --> 14:26.300
So here everything is a's, there b's, there c's.

14:27.340 --> 14:35.560
Now you read the beginning you read an a you read a whole time a's and

14:35.560 --> 14:36.900
you write them all into the cellar.

14:37.460 --> 14:45.920
Then you have all the a's and our beautiful k0 and now comes the first

14:45.920 --> 14:46.260
b.

14:47.400 --> 14:52.180
When the first b comes you have to check if what was in the cellar is

14:52.180 --> 14:54.160
actually as many a's as b's.

14:54.580 --> 14:55.900
We know how to do that.

14:56.640 --> 15:00.760
And then I would see if there are as many a's as b's.

15:01.400 --> 15:06.720
And that means after a while I ended up in the situation that I have

15:06.720 --> 15:09.560
here in the cellar where the k0 is.

15:09.560 --> 15:14.280
All the a's flew out because I realized there were as many b's as

15:14.280 --> 15:14.280
there were.

15:14.360 --> 15:16.860
And now I'm in this situation where a c comes.

15:17.520 --> 15:23.140
Now I would have to check if there are as many c's as there were a's

15:23.140 --> 15:23.720
and b's before.

15:24.940 --> 15:25.960
That was difficult.

15:26.820 --> 15:29.040
I just threw out what I had as a marker.

15:30.200 --> 15:31.000
All the a's flew out.

15:32.440 --> 15:33.880
And I couldn't remember them.

15:35.260 --> 15:36.920
So you see there's a problem.

15:37.360 --> 15:39.500
I can't build up random numbers here.

15:39.560 --> 15:42.780
I can't say I remember the number 17 if I want to recognize a to the

15:42.780 --> 15:44.200
power of 17, b to the power of 17, c to the power of 17.

15:45.040 --> 15:50.840
I remember the number 17 and then I count several times from 17 to 0.

15:51.940 --> 15:54.100
That doesn't work with this cellar.

15:55.600 --> 16:00.900
We would need a bigger cellar where we could write all the a's and

16:00.900 --> 16:01.560
b's.

16:02.640 --> 16:08.720
If we had another cellar where we could write the b's and then we

16:08.720 --> 16:09.320
could throw out both.

16:10.960 --> 16:12.180
That would be possible.

16:12.480 --> 16:13.540
But then we would have two cellars.

16:14.900 --> 16:19.400
And you will see later that if we have two cellar bands we can do a

16:19.400 --> 16:20.560
lot more.

16:20.560 --> 16:23.780
Then we are in an area that we will later describe with Turing

16:23.780 --> 16:24.080
machines.

16:25.320 --> 16:29.740
And that means another cellar band would increase the possibilities we

16:29.740 --> 16:34.880
have for order of magnitude, for a very large area.

16:35.640 --> 16:36.920
That means it's not that easy.

16:37.380 --> 16:41.940
So that was a short intuitive explanation why there can't be a cellar

16:41.940 --> 16:42.760
machine for this language.

16:43.920 --> 16:46.300
But that was purely intuitive.

16:46.960 --> 16:52.280
We can say right away he can do something else if I throw out

16:52.280 --> 16:52.300
something.

16:52.320 --> 16:56.420
I could always throw out two, write in two a's.

16:57.520 --> 17:03.960
I counted two n's and then with the b's and c's I have to determine if

17:03.960 --> 17:07.160
I have the same number of b's and c's.

17:07.560 --> 17:10.040
b to the power of n, c to the power of n, that's two n elements.

17:10.440 --> 17:11.380
That could work.

17:11.780 --> 17:13.080
It's not that easy.

17:13.080 --> 17:15.800
You always have to be critical when someone tries to make something

17:15.800 --> 17:16.200
plausible.

17:16.800 --> 17:17.880
There could be a catch.

17:18.700 --> 17:21.600
The argument I just had is not enough.

17:21.640 --> 17:26.140
It's plausible and intuitive but be careful when someone tells you

17:26.140 --> 17:26.560
something like that.

17:27.280 --> 17:28.640
I will prove it to you later.

17:29.680 --> 17:36.360
And then we will see that this language of the machines or the class

17:36.360 --> 17:40.000
of languages that are accepted by non-deterministic cellar machines is

17:40.000 --> 17:45.340
actually a real part of all languages that can be formally defined.

17:46.580 --> 17:49.620
This is a formal definition of a language.

17:50.220 --> 17:52.620
If it's not in there, there is more.

17:53.200 --> 17:54.920
We will take a closer look at that.

17:55.580 --> 17:58.740
First of all, we know that we have a certain hierarchy.

17:59.240 --> 18:00.060
It looks pretty long.

18:00.980 --> 18:02.580
Here in the front, it's identical.

18:03.340 --> 18:06.600
Here we have a real content for the deterministic cellar machines.

18:06.720 --> 18:08.840
There we have one for the non-deterministic cellar machines.

18:09.980 --> 18:14.520
And here we have content for even higher language classes.

18:15.040 --> 18:18.280
It may be that this is irrelevant because we don't need a language

18:18.280 --> 18:19.320
like A, B or C.

18:20.240 --> 18:26.440
Something where we have a box and then something in the same number.

18:26.920 --> 18:27.580
Maybe we don't even need that.

18:29.300 --> 18:34.780
And if you look at how it looks with the effort for language

18:34.780 --> 18:35.180
recognition.

18:35.600 --> 18:39.660
You remember when I said we want to determine the ability of a formal

18:39.660 --> 18:44.160
model regarding the formal description of programming languages.

18:45.120 --> 18:47.140
Whether they are expressive enough.

18:48.100 --> 18:50.780
Can all that I need be described in a programming language?

18:51.760 --> 18:58.520
And the other thing was whether I can still efficiently check the

18:58.520 --> 18:59.620
syntactic correctness.

18:59.720 --> 19:02.600
Whether a word actually belongs to the language or not.

19:02.600 --> 19:04.360
Or whether a program belongs to the language.

19:05.300 --> 19:08.300
And here we can see that there is a certain difference in language

19:08.300 --> 19:08.300
recognition.

19:08.900 --> 19:12.960
With the deterministic cellar machines the effort is certainly linear

19:12.960 --> 19:13.600
in the length of the words.

19:13.840 --> 19:18.420
With every sign that we read or we read the 3 through.

19:18.960 --> 19:25.420
If we read a word on the input then we read the beautiful successive

19:25.420 --> 19:25.860
through.

19:26.000 --> 19:30.920
We can of course stop somewhere and make lambda transitions.

19:31.800 --> 19:35.100
But such a lambda transition we can only go through a certain number

19:35.100 --> 19:37.540
of states and then we're done.

19:38.580 --> 19:42.220
That's just a finite number of possibilities to make transitions.

19:42.820 --> 19:48.300
I can't do something that depends on the length of the word is

19:48.300 --> 19:48.840
influenced.

19:49.500 --> 19:53.820
What I can do here with lambda transitions in the middle is only

19:53.820 --> 19:57.240
dependent on the number of states I have and the number of different

19:57.240 --> 19:58.660
signs that I have in the cellar alphabet.

19:59.500 --> 20:01.420
That means I can't do a lot.

20:01.660 --> 20:03.700
I can do a lot internally.

20:03.980 --> 20:06.840
We'll see later how we're going to do that.

20:07.660 --> 20:14.280
But I still have a constant effort per sign.

20:14.640 --> 20:19.080
If I make a constant number of steps per sign that means I'm linear in

20:19.080 --> 20:19.580
the word length.

20:22.580 --> 20:25.820
effort is capital O of length of W.

20:25.820 --> 20:28.600
That's different with non-deterministic cellar automata.

20:30.420 --> 20:33.360
There I can try as many possibilities as I want.

20:34.180 --> 20:39.680
With the finite non-deterministic automata it was still possible in

20:39.680 --> 20:40.940
linear time.

20:41.420 --> 20:48.620
But here it's different because here with the non-deterministic cellar

20:48.620 --> 20:58.360
automata we have different possibilities to look through different

20:58.360 --> 20:59.240
cellar content.

21:00.420 --> 21:03.900
And the different possibilities to store something in the cellar with

21:03.900 --> 21:08.660
the different threads this effort is clearly bigger.

21:09.740 --> 21:14.800
You could be afraid if I do something with backtracking then I have

21:14.800 --> 21:18.860
something that goes from polynomial to exponential very quickly.

21:19.340 --> 21:20.480
I can calm it down.

21:20.620 --> 21:22.000
The effort is not too high.

21:22.320 --> 21:24.040
The effort goes to O of n to the power of 3.

21:24.340 --> 21:26.080
You can't see that yet.

21:27.080 --> 21:28.780
You will see it when we go a bit further.

21:29.200 --> 21:36.180
Then I'll show you a method how to check a non-deterministic cellar

21:36.180 --> 21:40.400
automata in any context for the word problem.

21:40.600 --> 21:43.540
That works with n to the power of 3 but n to the power of 3 is of

21:43.540 --> 21:44.840
course a pretty high effort.

21:45.840 --> 21:48.880
That means it grows quite a lot.

21:48.960 --> 21:54.500
If you have a growth curve that goes up depending on the number of

21:54.500 --> 21:56.780
characters in the program that is not cheap.

21:57.220 --> 22:02.460
What we would like is that our programming languages can be described

22:02.460 --> 22:04.600
by deterministic cellar automata.

22:05.900 --> 22:07.340
And that is the case.

22:07.940 --> 22:10.780
Everything that is relevant in programming languages can be recognized

22:10.780 --> 22:14.740
with deterministic cellar automata or can be generated with

22:14.740 --> 22:16.420
corresponding grammar.

22:19.320 --> 22:21.180
You don't know yet what kind of grammar types these are.

22:22.100 --> 22:23.200
But we'll get to that in a moment.

22:24.080 --> 22:27.840
We'll talk about O of n to the power of 3 a bit later.

22:29.340 --> 22:35.720
At the end of this part we'll talk about cellar automata and then

22:35.720 --> 22:38.060
we'll talk about context-free languages.

22:38.240 --> 22:41.620
We have to do the following now.

22:53.560 --> 22:56.720
We want to see how we can generate languages.

22:57.440 --> 23:01.560
On the automaton side our goal would be deterministic cellar automata.

23:02.400 --> 23:05.920
But we want to not only accept languages but also be able to generate

23:05.920 --> 23:06.160
them.

23:06.600 --> 23:09.420
We want to know how we have to write the programs.

23:10.180 --> 23:13.120
I don't know how to write a program when I look at a cellar automaton.

23:14.040 --> 23:15.000
I need a grammar for that.

23:15.960 --> 23:17.400
Back to our grammars.

23:20.560 --> 23:22.600
We already looked at the type 3 grammars.

23:23.560 --> 23:25.600
Now we'll look at the context-free grammars.

23:27.980 --> 23:29.700
Grammar is a type 2 grammar.

23:29.960 --> 23:32.980
I defined it at the beginning when I presented the Chomsky hierarchy.

23:34.780 --> 23:44.380
If we want to see a terminal sign in a word in a line and there is a

23:44.380 --> 23:48.440
non -terminal sign then the rules of the context-free grammar are

23:48.440 --> 23:55.340
designed in such a way that regardless of what is written here we can

23:55.340 --> 24:00.060
replace the A if there is a rule with a C.

24:02.160 --> 24:03.960
Left and right are the same.

24:03.960 --> 24:09.320
Here we have the usual derivation step because we apply a direct rule.

24:10.360 --> 24:11.260
A can be replaced by C.

24:12.300 --> 24:18.220
You remember in the context-free grammars we allowed C to be empty.

24:18.520 --> 24:21.220
That means A can be thrown out.

24:21.760 --> 24:24.560
It was a small thing we can do without.

24:25.260 --> 24:25.860
But it's just written here.

24:26.960 --> 24:29.660
In general we are not allowed to have an empty word on the right.

24:30.380 --> 24:34.980
This grammar is therefore context-free because we can replace it

24:34.980 --> 24:39.700
regardless of the context of a non-terminal sign as long as there is a

24:39.700 --> 24:40.400
suitable rule.

24:41.240 --> 24:45.220
That was in contrast to the context-sensitive rules where non-terminal

24:45.220 --> 24:48.080
signs could only be replaced by a certain context.

24:49.000 --> 24:56.220
There we always had a 4.1 and a 4.2 which had to correspond to each

24:56.220 --> 24:56.220
other.

24:57.160 --> 24:59.100
But this is something we don't need in context-free grammar.

25:00.880 --> 25:02.980
Examples are very simple.

25:03.260 --> 25:05.260
Here we have the derivation again.

25:05.860 --> 25:09.220
What I drew up there is exactly what is written here.

25:09.900 --> 25:14.520
This would be a derivation from one word to the next word.

25:14.600 --> 25:16.660
I replace A with C.

25:17.580 --> 25:24.200
And if I create a simple example I take the simplest example for the

25:24.200 --> 25:26.480
one language we were interested in.

25:26.760 --> 25:28.660
The language A to B to N.

25:29.580 --> 25:33.920
We know that we can characterize any brackets in any way.

25:34.900 --> 25:39.640
We need a very small variation of our right-linear grammar.

25:40.280 --> 25:42.280
In this case we even have a linear grammar.

25:43.180 --> 25:47.460
This is even more special because we still only have one non-terminal

25:47.460 --> 25:49.280
sign on the right side.

25:49.800 --> 25:53.840
So, S can be replaced by A, S, B.

25:54.000 --> 26:01.180
That means I leave S and write A and B Or I simply delete S.

26:03.200 --> 26:06.060
This way I can create from the starting symbol.

26:09.240 --> 26:12.560
I replace S again and get A, A, B, B.

26:12.740 --> 26:15.680
I can replace it again and get A, A, A, S, B, B, B.

26:15.680 --> 26:21.520
I can replace S as much as I want or I can delete it and get A, A, B,

26:21.540 --> 26:21.900
B.

26:23.500 --> 26:27.240
It's obvious that I can only create such words and nothing else.

26:27.400 --> 26:28.140
It's very simple.

26:28.520 --> 26:30.400
It's even a special case of context-free languages.

26:30.680 --> 26:33.240
I don't need all the power of context-free languages to be able to

26:33.240 --> 26:34.940
create this simple language.

26:37.660 --> 26:40.920
It's clear that it's possible.

26:41.100 --> 26:46.160
It's obviously not right- or left-linear but it's linear and in this

26:46.160 --> 26:47.640
sense context-free.

26:48.400 --> 26:56.800
If I now imagine any grammar I want to make the derivations I've shown

26:56.800 --> 27:00.860
here a bit easier.

27:01.400 --> 27:03.460
We'll think about it in a moment.

27:03.560 --> 27:10.860
The problem is when I have a word form several different non-terminal

27:10.860 --> 27:11.260
characters appear.

27:11.260 --> 27:17.700
Then I could replace any non-terminal character at any point and I

27:17.700 --> 27:22.580
would get a sequence of all possible words that are derived one after

27:22.580 --> 27:24.300
the other by applying the rules of grammar.

27:24.780 --> 27:26.880
But there's very little structure.

27:27.420 --> 27:30.180
What we always want is to recognize a structure.

27:31.840 --> 27:37.500
Programming languages also have a certain structure and we like to

27:37.500 --> 27:37.900
represent it.

27:37.900 --> 27:44.140
So, if we want to see structures in a derivation, we do it with a so

27:44.140 --> 27:45.300
-called derivation tree.

27:45.960 --> 27:48.920
We have a tree here and we had a simple derivation sequence.

27:49.600 --> 27:51.340
So, how does the tree come about?

27:52.020 --> 27:58.580
We usually start with the starting symbol S.

27:59.460 --> 28:02.120
Then we can derive something from it.

28:03.120 --> 28:06.400
We can apply a rule S to anything.

28:07.860 --> 28:11.100
So, for example, let's say

28:15.580 --> 28:20.080
U1, U2, and so on up to Uk.

28:22.080 --> 28:30.340
To illustrate this, I have a rule in my grammar S to U1 up to Uk.

28:31.200 --> 28:32.200
That's my rule.

28:33.140 --> 28:34.380
I'm going to make a tree out of it.

28:36.040 --> 28:42.100
I can replace S with the sequence U1 up to Uk.

28:42.600 --> 28:45.580
The Uk would not be crossed out, but only lead to it.

28:45.900 --> 28:46.940
I drew a small tree.

28:47.820 --> 28:50.620
On top is S at the root, and below is U1 up to Uk.

28:51.240 --> 28:56.720
It may be that one of them is a non-terminal sign and there's another

28:57.920 --> 28:58.480
rule for that.

28:58.480 --> 29:11.020
So, I could replace U2 with a sequence of let's say U1 dash or U1 dash

29:11.020 --> 29:15.160
and so on up to U R dash.

29:15.380 --> 29:23.360
We can make a sequence of signs because we have a rule that says that

29:24.560 --> 29:31.600
U2 can be derived from U1 dash and so on up to U R dash.

29:31.700 --> 29:34.540
If the rule was in the grammar, we would write it down here.

29:34.720 --> 29:37.440
It's written in a more general form.

29:37.880 --> 29:46.120
If we have a word P1 A P2 in our sequence, we turn a rule on it A to

29:46.120 --> 29:57.920
Psi and every sign from P1 A P2 corresponds to exactly a leaf node of

29:57.920 --> 29:58.940
the previously generated tree.

29:59.880 --> 30:05.320
What we have seen here are the leaf nodes and for every sign XI is

30:05.320 --> 30:17.400
written as X1 to XK and for every sign X1 to XK we generate new nodes

30:17.400 --> 30:24.540
son nodes, daughter nodes or successor nodes and these nodes I will

30:24.540 --> 30:27.280
simply label with these signs in there.

30:28.060 --> 30:29.920
That's exactly what I indicated up here.

30:30.660 --> 30:35.760
That was U1 to UK or U1 dash to U R dash and here are the X1 to XK.

30:36.360 --> 30:40.940
And if we have Psi equals Lambda on the right side then we also

30:40.940 --> 30:43.000
generate a successor even though there is no successor.

30:43.640 --> 30:46.040
But the A is replaced by an empty word.

30:46.160 --> 30:51.140
That's why we write if we have such a rule then we would in such a

30:51.140 --> 30:57.600
tree with something on A the Lambda would arise.

30:58.320 --> 30:59.640
Lambda is an empty word.

30:59.720 --> 31:04.340
That means the word we generate is something we can read from a leaf

31:04.340 --> 31:04.680
node.

31:05.540 --> 31:06.660
We will see that in an example.

31:07.840 --> 31:13.220
The leaves of the result tree from left to right give the word W that

31:13.220 --> 31:15.700
we derive from our starting symbol.

31:15.700 --> 31:17.920
I can show you that with this simple example.

31:19.680 --> 31:27.400
We arrived at a point from S we have through a sequence of derivations

31:27.400 --> 31:38.040
using the rules of grammar a word I1A I2 and now the next step is to

31:38.040 --> 31:55.560
get I1C I2 and that is I1 in this case X1 to Xk and then I2 I1 and I2

31:55.560 --> 32:01.140
are the symbols on the leaves in the left and right part of the tree

32:01.140 --> 32:08.900
next to the A and to the A we added the symbols X1 to Xk I can leave

32:08.900 --> 32:12.980
it like this and at some point I don't have any non-terminal symbols

32:12.980 --> 32:18.380
on the leaves of the tree so I have on all the leaves either a

32:18.380 --> 32:23.500
terminal symbol or the empty word lambda and then I just read the word

32:23.500 --> 32:27.760
from left to right so I have an arrangement of the leaves from left to

32:27.760 --> 32:32.540
right and then I have the word that is generated here.

32:33.720 --> 32:40.960
In this case if we have here from S to ASB what I just showed you if

32:40.960 --> 32:47.380
we draw this as a tree then it looks like this so we have the S

32:48.240 --> 32:56.020
directed to ASB that was the first step the next step that says AASBB

32:56.020 --> 33:04.100
that is what we see here AASBB that is the next word and then the

33:04.100 --> 33:10.300
third word that is generated here AASBB that would be this if we run

33:10.300 --> 33:17.700
over here that would be our next and if I want to draw it differently

33:18.170 --> 33:22.600
then I would draw it like this so that I just write them next to each

33:22.600 --> 33:28.540
other there we see the structure exactly so we replaced our S four

33:28.540 --> 33:34.080
times four times three times by ASB one time by empty word that is a

33:34.080 --> 33:40.380
very simple tree because we only have a single symbol I will give you

33:40.380 --> 33:45.780
another example we want to do something that looks like a program you

33:45.780 --> 33:49.300
remember that we looked at a Bacchus-Nower form so if we want to

33:49.300 --> 33:54.280
define a procedural language in which some loop constructs occur for

33:54.280 --> 33:58.280
example something that looks like a repeat statement because you only

33:58.280 --> 34:01.060
know Java you don't even know what a repeat statement is it is

34:01.060 --> 34:06.220
basically a while loop where the condition is at the end and not at

34:06.220 --> 34:11.400
the beginning and it is just formulated like this that is a structure

34:11.400 --> 34:15.100
I can now describe it as a Bacchus-Nower form I can also describe it

34:15.100 --> 34:18.520
as a syntax diagram so that I have to include it in a block with

34:18.520 --> 34:23.440
beginning and end so that I have a statement sequence in there that a

34:23.440 --> 34:30.340
statement sequence a sequence of statements is divided or separated by

34:30.340 --> 34:37.760
some semicolon a single statement is either an instruction or a block

34:37.760 --> 34:42.640
or a repeat statement and as a repeat statement I have a repeat, an

34:42.640 --> 34:47.120
until, a statement sequence and a printout and the printout is some

34:48.150 --> 34:52.260
printout, some e whatever that means you can see the structure right

34:52.260 --> 34:56.920
away and now we want to write it down as a grammar this is a short

34:56.920 --> 35:02.480
version of grammar our rules look like that we can represent a block

35:03.640 --> 35:10.400
and this A can be either a block or a sequence of blocks and an

35:10.400 --> 35:19.340
instruction and the C this B can be on an A, on an S or C so it can

35:19.340 --> 35:24.820
have a loop and so on you can see the rules of grammar with which we

35:24.820 --> 35:29.900
want to write simple programs these are our variables a little

35:29.900 --> 35:34.240
shortened here with S, A, B, C, D and A, B, C, D should just stand

35:34.240 --> 35:38.140
here for this block, statement, sequence statement, repeat statement

35:39.020 --> 35:44.180
expression and the spec end, rep, and should be these key words

35:44.180 --> 35:49.960
beginning, end, repeat and until if we look at a word this word here

35:50.720 --> 36:00.320
of the language is a sequence of such constants of our alphabet yes if

36:00.320 --> 36:05.080
I go back here to the previous slide there is our terminal alphabet

36:05.940 --> 36:12.420
spec end, rep, and A, E and a semicolon and if I look at this I want

36:12.420 --> 36:15.840
to check is this really a word of the language or can I create it from

36:15.840 --> 36:21.400
the start symbol then we see we can create it from the start symbol by

36:21.400 --> 36:28.980
starting here with the S the one rule, S replaced by spec end and then

36:28.980 --> 36:35.300
we can pass the A in B semicolon A we can now replace the B for

36:35.300 --> 36:43.720
example by an A could have been replaced differently the semicolon A B

36:45.120 --> 36:56.240
C C D and so on this is the application of rules of grammar and these

36:56.240 --> 37:04.320
rules allow us to create such a word but if you only see this word you

37:04.320 --> 37:09.100
see much less than if you look at the terminal and here in the

37:09.100 --> 37:13.740
terminal you can see the structure of our program that we have

37:13.740 --> 37:18.800
formulated here the interesting thing is this representation of the

37:18.800 --> 37:25.120
terminal can lead to different terminal sequences or have been created

37:25.120 --> 37:32.760
from different terminal sequences if I replace the B by A or the A by

37:32.760 --> 37:39.280
B we don't really care as long as at some point I get to the terminal

37:39.280 --> 37:46.400
sequence and now we want to do this systematically and systematically

37:46.400 --> 37:55.700
means for example a right derivation I have my word that I derive and

37:55.700 --> 38:00.460
there is something in here this part here is something that is already

38:00.460 --> 38:08.760
from P star and there is some sign the rightmost sign and here is some

38:08.760 --> 38:14.260
U the rightmost sign I always want to replace that means I always look

38:14.260 --> 38:17.900
at my derivation that I systematically always take the rightmost not

38:17.900 --> 38:23.180
terminal sign then I have a right derivation at some point I do the

38:23.180 --> 38:28.180
others but I do it systematically from right to left I could also do

38:28.180 --> 38:32.180
it the other way around I could always take the leftmost sign and

38:32.180 --> 38:36.120
always replace that I can also take any other derivation but that

38:36.120 --> 38:42.440
would be two standard procedures always the rightmost not terminal

38:42.440 --> 38:47.900
sign or always take the leftmost these are standard derivations I can

38:47.900 --> 38:51.200
generate as many as I want and now you have to think what do we want

38:51.200 --> 38:58.600
to do later when we have a program we want to find out if there is a

38:58.600 --> 39:05.780
derivation tree from which this program was created and we do that by

39:09.000 --> 39:14.640
reconstructing a derivation from our word and we assume that we want

39:14.640 --> 39:21.200
to reconstruct a right derivation we can do that systematically we

39:21.200 --> 39:28.340
will see how to do that another example another grammar something we

39:28.340 --> 39:33.540
always need in programming languages arithmetic expressions we want to

39:33.540 --> 39:36.640
be able to calculate something we need the possibility to write some

39:36.640 --> 39:42.560
arithmetic expressions for example like this a plus b times a plus c

39:42.560 --> 39:46.580
in brackets we have certain structures we know how to combine the

39:46.580 --> 39:54.720
elements we have variables operation plus times that is enough we have

39:54.720 --> 40:00.840
brackets and we can do that in different ways we could do that with a

40:00.840 --> 40:06.840
simple grammar where I only need a single variable a single non

40:06.840 --> 40:11.080
-terminal sign we could do that with an s here it is a bit more

40:11.080 --> 40:14.800
complicated because we want to have more structure we say that every

40:14.800 --> 40:24.120
arithmetic expression is either a term or a sum of terms therefore t

40:24.120 --> 40:30.740
stands for term so I have a sum those are two terms that I add and the

40:30.740 --> 40:35.700
s can be any arithmetic expression and if I look at a term in a sum

40:35.700 --> 40:46.360
this can be a factor or a factor times a term here we see a structure

40:46.360 --> 40:53.540
here we have a product this can be a term a factor in there can be

40:53.540 --> 41:04.340
either an i, an identifier or it can also be a bracket so here the

41:04.340 --> 41:09.740
bracket s bracket so if I open and close brackets there can be any

41:09.740 --> 41:16.800
expression in there that is what is here at this point and then of

41:16.800 --> 41:22.940
course I can replace any identifier with any sign quite simply with a,

41:23.020 --> 41:26.860
b, c you could take any words we had the example of how to define

41:26.860 --> 41:32.620
names in programming languages that was our own grammar here we just

41:32.620 --> 41:38.620
have the brackets a, b, c that is enough so these are our let me move

41:38.620 --> 41:45.500
this a bit further I have a question by the way there are relatively

41:45.500 --> 41:47.980
few people who are involved in this you don't have to be involved in

41:47.980 --> 41:57.260
these tools I can ask you do you think it makes sense that I use these

41:57.260 --> 42:01.820
tools who is in favor of using these tools these feedback tools in the

42:01.820 --> 42:07.040
lecture who thinks it makes sense aha, ok if there are so many then

42:07.040 --> 42:12.480
that is enough then I will continue well not if you don't think it

42:12.480 --> 42:15.940
makes sense but it is enough if I have a number of approving messages

42:16.680 --> 42:22.140
and immediately four people reported that everything is fine so this

42:22.140 --> 42:28.420
is our grammar and you can see the structure we have a term, we have

42:28.420 --> 42:32.860
factors, we have identifiers and we can of course derive something

42:32.860 --> 42:39.580
oops, I didn't do that but it is easy to see if I start with s and I

42:39.580 --> 42:43.380
want to generate this expression how would I do that I would start

42:43.380 --> 42:54.080
with s derive something s plus t and then I could and so on derive

42:54.080 --> 42:56.880
more and more but that is too cumbersome for me I prefer to do it as a

42:56.880 --> 43:06.780
derivation tree that is easier from the s I can derive my s plus oops

43:06.780 --> 43:15.100
and then a t from the s if the a should come out I can derive a t I

43:15.100 --> 43:22.840
can derive an f I can derive an i I can derive an a that is one

43:22.840 --> 43:31.220
sequence I can continue with the plus from the t I can derive f times

43:31.220 --> 43:44.500
t and so on we can see from the f could be an i and from that a b that

43:44.500 --> 43:53.120
was a star and from the t could be for example an f and from that a

43:53.120 --> 44:00.060
on, s and close, and so on so you can see how the trees are generated

44:00.060 --> 44:08.460
because of this multiple variables s, t, f and i the paths in this

44:08.460 --> 44:13.420
tree become a bit long you can see the structure and at the end you

44:13.420 --> 44:17.420
can read your expression you can do this as an exercise and you will

44:17.420 --> 44:25.140
have to do it well now it is about how I can relate the context-free

44:25.140 --> 44:34.640
grammars to the cellar automata and we will see I can create a non

44:34.640 --> 44:41.420
-deterministic cellar automaton that accepts the same language as this

44:41.420 --> 44:50.980
grammar and I am impressed 12 people said it is ok how do we do this?

44:51.420 --> 45:01.360
first that we can create a context-free grammar that is a bit

45:01.360 --> 45:06.720
complicated a bit formal stuff I will leave it it takes too long then

45:06.720 --> 45:10.720
that I create a non-deterministic cellar automaton for each grammar it

45:10.720 --> 45:15.000
is relatively easy I can show you how to formulate the lambda

45:15.000 --> 45:25.340
transitions and we will do it as shown we have our context-free

45:25.340 --> 45:29.280
grammar that has non-terminal signs, terminal signs, rules and a

45:29.280 --> 45:34.060
starting symbol and now we want to create a cellar automaton that

45:34.060 --> 45:41.100
characterizes the same language how many people can be active at the

45:41.100 --> 45:46.060
same time good now we have something to do how do we start?

45:47.080 --> 45:52.780
the input alphabet has the terminal signs of our grammar the states

45:52.780 --> 45:59.000
you can not understand why there are three but we come up with three

45:59.000 --> 46:06.740
states for a random grammar it is enough to look at three states so we

46:06.740 --> 46:16.220
only have three different activities to do to determine whether a word

46:16.220 --> 46:21.660
belongs to the grammar or not then we need a cellar alphabet in the

46:21.660 --> 46:26.200
cellar we write the terminal signs, non-terminal signs and of course

46:26.200 --> 46:31.080
our cellar starting symbol and we have a final state that is the S2 so

46:31.080 --> 46:35.500
obviously we only need two different possibilities, namely S0 and S1

46:35.500 --> 46:39.360
the S2 will be the final state in which we go in and out we do not

46:39.360 --> 46:42.060
come out what do we do?

46:43.100 --> 46:50.880
we just start with imagine we have our input tape and there is a word

46:50.880 --> 46:58.460
on it something with ABC or something A1, A2 and so on and now we have

46:58.460 --> 47:02.880
to think what do we write in the cellar and in the cellar we write

47:02.880 --> 47:11.300
something we ignore the tape we write our starting symbol S and then

47:11.300 --> 47:17.860
we look can we replace this starting symbol by a right side of our

47:17.860 --> 47:23.420
rule so if I have a rule with which I can replace the starting symbol

47:23.420 --> 47:32.180
by a word then I write this word FI in there so I take out the S and

47:32.180 --> 47:38.200
make it a FI and now I have to answer a question what is the

47:38.200 --> 47:41.000
difference between a deterministic and a non-deterministic cellar

47:41.000 --> 47:48.140
machine aha I made it very clear on the previous slide this question

47:48.140 --> 47:51.120
can only indicate that someone did not pay attention a few minutes ago

47:52.320 --> 47:58.420
if we have several possibilities in a situation state, input sign and

47:58.420 --> 48:02.900
cellar the cellar symbol if we have several transition possibilities

48:02.900 --> 48:06.900
then it is a non-deterministic cellar machine so I think that is clear

48:06.900 --> 48:10.700
if you have questions about that please ask them in the exercises

48:10.700 --> 48:15.520
there you will have more opportunities to look at it here we have now

48:17.760 --> 48:22.840
a non-deterministic cellar machine because we say if I am in a state

48:23.640 --> 48:32.680
first in S0 in the initial state I do something with an empty cellar

48:32.680 --> 48:38.240
no, sorry, I make a lambda transition I ignore the input band I have

48:38.240 --> 48:42.880
the initial cellar symbol K0 now I write the S in there and go into

48:42.880 --> 48:48.040
state S1 and in state S1 I do the following that I continue to do a

48:48.040 --> 48:52.780
lambda transition as long as here are no terminal signs as the highest

48:52.780 --> 49:03.140
so if there is an A up here I try to replace it for every rule that I

49:03.140 --> 49:12.940
have from which I can replace the A for every such rule I can make

49:12.940 --> 49:17.720
such a transition that is, I stay in state S1 and I now have the

49:17.720 --> 49:21.240
possibility to write in various such right sides as long as they are

49:21.240 --> 49:27.100
in my grammar I can choose any of them here we have the non

49:27.100 --> 49:34.960
-determinism we had seen before was an example I could derive an S on

49:34.960 --> 49:42.820
A, S, B or on lambda two possibilities yes we already have two

49:42.820 --> 49:49.320
elements in here now is the other case that we as the highest cellar

49:49.320 --> 49:58.840
symbol see a terminal sign that is through this replacing of some

49:58.840 --> 50:04.560
signs as long as there were non-terminal signs we replaced them and

50:04.560 --> 50:10.480
then comes a constant and now we see if the sign that is up here, the

50:10.480 --> 50:19.360
terminal sign is this A1 this is the first sign so we have found

50:20.520 --> 50:26.320
starting from our oops, from our S we have through several transitions

50:26.320 --> 50:30.260
in principle we have always made a left derivation we have the

50:30.260 --> 50:35.180
leftmost sign that is always up here the leftmost sign as long as it

50:35.180 --> 50:43.960
is a non-terminal sign if it is a terminal sign some A then it will

50:43.960 --> 50:48.260
stay there and it has to be the first sign of my word so it can be an

50:48.260 --> 50:54.340
element of the language then I check if it is the first sign then I

50:54.340 --> 51:00.920
delete the A from the cellar and now I have a new cellar symbol so now

51:00.920 --> 51:11.260
I have this one transition and if afterwards it could have been if

51:11.260 --> 51:20.800
afterwards there was nothing left and I see that starting cellar sign

51:20.800 --> 51:24.820
K0 in state S1 then it means that I have deleted everything that I

51:24.820 --> 51:31.160
wrote in there in the cellar then all the signs that I have seen as

51:31.160 --> 51:36.820
terminal signs also on the tape and somewhere in the corresponding

51:36.820 --> 51:41.280
positions I have obviously seen every sign that I could generate in

51:41.280 --> 51:46.080
the input tape then it was a correct word and therefore I can go to

51:46.080 --> 51:50.140
the final state and leave the cellar intact this is the lambda

51:50.140 --> 51:54.480
transition to the final state and the other possibility we don't need

51:54.480 --> 51:57.280
any other possibility we are done why are we done?

51:57.960 --> 52:02.840
the other possibility was here if we have deleted the A it could be

52:02.840 --> 52:08.020
that there is a non-terminal sign for example this A and then we look

52:08.020 --> 52:13.760
again can we replace it with lambda transitions internally we simulate

52:13.760 --> 52:19.900
internally our rules of grammar this is a sequence and if we have

52:19.900 --> 52:24.380
generated a terminal sign again then we look again is the next sign

52:24.380 --> 52:31.260
here maybe some B is that also on the tape and accordingly we can in

52:31.260 --> 52:36.360
this way through the lambda transitions through such transitions

52:36.360 --> 52:42.680
simulate a sequence in which we on the left side have no further

52:42.680 --> 52:47.700
terminal signs as soon as a terminal sign is created there we check is

52:47.700 --> 52:54.100
this sign on the tape in the input if yes, we are happy if not, we

52:54.100 --> 52:58.000
stop and cancel the recognition because we know the word is not in the

52:58.000 --> 53:04.320
language so this is the simple way how we can with a non-deterministic

53:04.320 --> 53:11.820
cellar machine simulate a grammar and again the non-determinism can be

53:11.820 --> 53:19.480
seen here we can or we have to for every possible right side a rule

53:19.480 --> 53:25.240
with A on the left side make a transition to S1, V we have to be able

53:25.240 --> 53:31.980
to write V in the cellar so this is our simulation lambda transitions

53:31.980 --> 53:36.720
for the simulation of derivations real transitions to check if the

53:36.720 --> 53:42.380
sign that we have generated actually also in the input was in the

53:42.380 --> 53:48.960
input with this we can show or we have to show that this is really

53:48.960 --> 53:54.500
correct so we have to show that from the start configuration in the

53:54.500 --> 54:01.800
initial state with input word W and the initial cellar symbol K0 we

54:01.800 --> 54:07.520
actually have a sequence of configuration transitions where at the end

54:07.520 --> 54:11.400
I come to the final state that I have read the input word completely

54:11.400 --> 54:18.700
and it is actually only the K0 in my memory, that means I have thrown

54:18.700 --> 54:23.620
out everything that I have simulated in the cellar of derivations you

54:23.620 --> 54:31.380
can like by induction about the length of the word prove I actually on

54:31.380 --> 54:36.240
this formal proof make such an example here for the configuration

54:36.240 --> 54:40.700
transitions that we get this is of course again a very simple example

54:40.700 --> 54:44.900
the other with our little programming language with the repeat loop or

54:44.900 --> 54:47.960
with the arithmetic expressions would be a bit more complex but I

54:47.960 --> 54:53.620
don't think that's so problematic we start with our A3 B3 on our

54:53.620 --> 54:58.560
memory not on the memory sorry, on the input band are in the initial

54:58.560 --> 55:04.820
state first write the start symbol our grammar in the memory now we

55:04.820 --> 55:10.840
have this S1 with a non-terminal sign as the top cellar sign and must

55:10.840 --> 55:17.320
first internally simulate a derivation so turn the rule SG to ASGB

55:17.960 --> 55:25.340
that means the SG is replaced by ASGB and now we can by application of

55:25.340 --> 55:30.340
the rule where we check whether the characters agree throw out the A

55:31.480 --> 55:35.300
then again have the SG as the top cellar sign must again the SG

55:35.300 --> 55:39.600
somehow internally simulate can do that for example as it stands here

55:39.600 --> 55:43.320
we could also have used the lambda rule but it would not have brought

55:43.320 --> 55:47.100
us any further I want to show that this is correct I can do that again

55:47.100 --> 55:57.020
so I can again throw out the A then again the SG stand there I can

55:57.020 --> 56:03.100
replace it again but in this case by my lambda then the S falls away

56:03.100 --> 56:09.760
and now you see here I have BBB stand in the cellar so here is now BBB

56:09.760 --> 56:14.580
in the cellar as the top sign BBB is also still on the band I can now

56:14.580 --> 56:19.660
read through the series and then come through a sequence of transfers

56:19.660 --> 56:25.540
to S1 lambda K0 because I can throw out all Bs and so I can because I

56:25.540 --> 56:32.500
have S1 and K0 I can with the lambda transition to S2 and I'm done

56:32.500 --> 56:37.040
with the sequence of configurations, I have a final configuration

56:37.040 --> 56:37.440
reached.

56:38.240 --> 56:46.040
So I have now shown how to a grammar with a cellar can simulate namely

56:46.040 --> 56:55.080
because we look at here the sequence of words like 1 like 2 and so on

56:55.080 --> 56:59.380
which we here successive deduce using our grammar.

56:59.960 --> 57:04.840
What we do here is the following, that we always have words and we

57:04.840 --> 57:10.060
replace always the leftmost sign, i.e.

57:10.160 --> 57:13.680
what we in front of terminal signs deduce.

57:13.980 --> 57:17.800
Later we have here a word, our word here consists only of terminal

57:17.800 --> 57:22.700
signs because we always have the leftmost sign as the top cellar sign

57:22.700 --> 57:26.180
have the leftmost non-terminal sign we have here a left derivation

57:26.180 --> 57:27.660
which we simulate.

57:28.940 --> 57:34.580
It is also just below that this in this construction always a left

57:34.580 --> 57:35.420
derivation is.

57:36.260 --> 57:40.160
It is often done with cellar machines so that you actually left

57:40.160 --> 57:41.340
derivations simulated.

57:41.800 --> 57:46.360
This here is an example of how automatically a left derivation is

57:46.360 --> 57:46.760
created.

57:48.020 --> 57:51.080
Well, now I have animated something wrong here.

57:51.780 --> 57:56.340
Let's see what this kA star means but it will certainly be a sequence

57:56.340 --> 57:57.460
of configuration transitions.

57:58.140 --> 58:00.920
Exactly, here it is about the arithmetic expressions.

58:01.560 --> 58:04.300
Also for them we can of course do something like this.

58:04.440 --> 58:06.400
Here I have thrown out a little something simplified a little.

58:07.820 --> 58:12.940
No longer a factor and an identifier written in, but only S and T used

58:12.940 --> 58:15.280
as non-terminal signs.

58:15.680 --> 58:18.220
Now you see that it is also easier to construct such arithmetic

58:18.220 --> 58:18.820
expressions.

58:20.520 --> 58:26.740
If we do this in cellar machines then we would start with our S0 would

58:26.740 --> 58:30.820
have, for example, a word a plus a in brackets on the input band.

58:30.960 --> 58:35.800
At the beginning the k0, first write the S in the cellar, go to S1,

58:36.280 --> 58:41.620
first simulate diligently here our grammar and for example when we

58:41.620 --> 58:43.920
apply this rule S...

58:44.740 --> 58:46.440
It's not that easy.

58:47.100 --> 58:52.540
First we have to replace the S the S...

58:52.540 --> 58:53.620
Yes, there it is.

58:55.040 --> 58:57.040
S goes to S in brackets.

58:57.560 --> 58:58.700
That would be this one transition.

58:58.920 --> 59:02.520
Then we have a bracket open here and we can immediately throw the

59:02.520 --> 59:04.280
bracket out from the cellar.

59:04.440 --> 59:06.600
Then we only have a plus a here.

59:06.880 --> 59:10.120
We go to the next sign, we can replace the S again.

59:10.580 --> 59:13.220
We can, for example, replace the S by S plus T.

59:13.740 --> 59:14.640
There is still a S left.

59:15.420 --> 59:18.560
We can replace the S by a T.

59:19.280 --> 59:20.860
Still a non-terminal sign.

59:21.160 --> 59:22.440
We stay in S1.

59:23.060 --> 59:24.180
Nothing changes on the input band.

59:25.420 --> 59:27.640
We replace the T by an A in this case.

59:28.040 --> 59:29.600
Then we have a non-terminal sign there.

59:30.500 --> 59:35.340
And if we now throw out the A, it stays on the input band plus A.

59:35.340 --> 59:41.640
If I simulate our input band here, it stays on the input band.

59:41.760 --> 59:43.100
Oops, bracket open.

59:43.260 --> 59:46.080
A plus A bracket closed.

59:46.880 --> 59:48.600
First we had this sign.

59:49.700 --> 59:54.500
We saw at some point that this was actually the highest sign on our

59:54.500 --> 59:55.540
cellar.

59:55.980 --> 59:58.740
We moved on and came to this sign.

59:58.820 --> 01:00:00.860
We waited a while until we saw it again.

01:00:01.080 --> 01:00:01.620
Namely at

01:00:05.120 --> 01:00:06.860
this position here.

01:00:07.060 --> 01:00:08.660
Then we could move on to another sign.

01:00:09.560 --> 01:00:12.200
The next thing we see is another terminal sign.

01:00:13.760 --> 01:00:17.500
That's why the plus sign is there.

01:00:18.240 --> 01:00:19.500
It's also there in the cellar.

01:00:19.560 --> 01:00:20.420
We can erase that.

01:00:20.500 --> 01:00:22.300
That's why we come to A and T.

01:00:22.460 --> 01:00:23.880
We have non-terminal signs there.

01:00:24.660 --> 01:00:28.820
We can replace the non-terminal sign T internally.

01:00:29.460 --> 01:00:34.280
For example we can replace the terminal sign T with anything.

01:00:35.020 --> 01:00:36.440
But also with an A.

01:00:37.300 --> 01:00:38.280
That's one possibility.

01:00:39.260 --> 01:00:42.480
If we do it this way, we have the same sign on the input band and on

01:00:42.480 --> 01:00:44.900
the highest sign in the cellar.

01:00:46.200 --> 01:00:47.560
We can delete that.

01:00:48.080 --> 01:00:51.940
Now we have the bracket and only the bracket left.

01:00:52.140 --> 01:00:53.460
We can also throw out the bracket.

01:00:54.040 --> 01:00:58.380
Now we have the S1 and in the cellar only the bracket K0.

01:00:59.460 --> 01:01:01.620
We can move on to S2 because of the lambda transition.

01:01:02.800 --> 01:01:06.140
We would have erased everything because we threw them out.

01:01:07.700 --> 01:01:13.340
That's a simple simulation of a branch sequence or a branch tree or a

01:01:13.340 --> 01:01:19.100
left branch of an expression bracket to A plus A bracket to.

01:01:19.400 --> 01:01:21.460
We saw that it works with the rules of grammar.

01:01:22.040 --> 01:01:26.780
We can integrate it Now I should do it slower.

01:01:28.520 --> 01:01:29.080
A bit fast.

01:01:30.260 --> 01:01:31.520
I'm sorry if it's too fast.

01:01:32.020 --> 01:01:34.380
But I know what I still have to do in this lecture.

01:01:34.640 --> 01:01:35.280
That's the problem.

01:01:36.080 --> 01:01:37.020
And I'm already a bit behind.

01:01:38.580 --> 01:01:41.060
I think the example is quite instructive.

01:01:41.180 --> 01:01:43.100
We go in order from left to right.

01:01:44.200 --> 01:01:48.580
We always simulate on the band what we can delete.

01:01:49.520 --> 01:01:52.940
And we can only do something if we have a non-terminal sign which we

01:01:52.940 --> 01:01:55.920
replace and go from left to right.

01:01:56.900 --> 01:02:00.660
My remark did not lead to fewer people saying it slower but rather

01:02:00.660 --> 01:02:01.900
more people saying it slower.

01:02:02.320 --> 01:02:05.680
It could be that they just came later and therefore those who said

01:02:05.680 --> 01:02:11.980
earlier that's right, they fall out again It's like that they are

01:02:11.980 --> 01:02:16.500
displayed here for a while and then they are forgotten again.

01:02:16.740 --> 01:02:20.500
There is only a certain section of the whole report which is displayed

01:02:20.500 --> 01:02:20.660
there.

01:02:21.100 --> 01:02:26.440
So now I will show you the next thing.

01:02:27.720 --> 01:02:31.580
The next idea is the grammars can be quite complicated.

01:02:32.540 --> 01:02:35.980
And now we want to think about can I simplify the grammars?

01:02:37.440 --> 01:02:42.480
Can I therefore define grammars with a very simple structure?

01:02:43.000 --> 01:02:47.480
Now I will tell you about two possibilities to construct a so-called

01:02:47.480 --> 01:02:49.280
normal form for grammars.

01:02:50.780 --> 01:02:57.160
That means I would like to have a kind which tells me that every

01:02:57.160 --> 01:03:02.060
grammar which is supposed to describe a context-free language can be

01:03:02.060 --> 01:03:04.320
described in a very simple way.

01:03:06.500 --> 01:03:10.240
First of all I want to throw out these lambda rules.

01:03:11.400 --> 01:03:12.340
They are a bit annoying.

01:03:12.340 --> 01:03:13.780
They also bothered us when we looked at the hierarchy.

01:03:16.780 --> 01:03:26.120
That's why I have or rather I say now we want to look at can we throw

01:03:26.120 --> 01:03:26.780
them out?

01:03:27.360 --> 01:03:32.920
So we say a context-free grammar is lambda-free if there are no rules

01:03:32.920 --> 01:03:34.260
of this form A on lambda.

01:03:35.680 --> 01:03:39.040
One thing is clear then the empty word is certainly not an element of

01:03:39.040 --> 01:03:39.400
the language.

01:03:39.400 --> 01:03:40.520
It can't create it.

01:03:40.920 --> 01:03:45.140
But that's not so relevant or probably we will see that it's not

01:03:45.140 --> 01:03:45.500
relevant.

01:03:46.160 --> 01:03:52.440
In any case there is a nice sentence which says that for every context

01:03:52.440 --> 01:03:57.300
-free grammar there is a lambda-free grammar which creates the same

01:03:57.300 --> 01:04:00.500
language except for the empty word.

01:04:00.820 --> 01:04:01.740
I can't create that.

01:04:02.320 --> 01:04:04.960
But the empty word is usually not so relevant.

01:04:07.040 --> 01:04:08.780
That means I can't create all the relevant words.

01:04:10.220 --> 01:04:14.100
And I can very easily present this grammar and construct it.

01:04:14.380 --> 01:04:18.480
And I do that because it shows how to argue when you work with

01:04:18.480 --> 01:04:20.180
grammatical conversions.

01:04:21.980 --> 01:04:22.940
How can you do that?

01:04:24.020 --> 01:04:26.280
When can I throw out an empty word?

01:04:26.800 --> 01:04:32.220
What can actually happen if I have an empty word on the right side of

01:04:32.220 --> 01:04:32.780
a rule?

01:04:33.980 --> 01:04:44.840
For example imagine that you created some words like 1 and so on and

01:04:44.840 --> 01:04:50.100
came to a word which consists only of a's.

01:04:50.660 --> 01:04:51.660
A on lambda.

01:04:52.020 --> 01:04:58.620
Then you created a big word and suddenly you only use this a on lambda

01:04:58.620 --> 01:05:02.060
and that merges and you have created this empty word.

01:05:03.810 --> 01:05:04.420
With stars.

01:05:05.460 --> 01:05:10.500
So I could use this a on lambda and I would have the empty word there.

01:05:11.620 --> 01:05:14.580
That means I can't look at a word of a certain length.

01:05:16.860 --> 01:05:21.840
Does that mean that the word resulting from it e.g.

01:05:22.580 --> 01:05:26.360
the programming language or the final word has a certain minimum

01:05:26.360 --> 01:05:31.960
length because I see that a whole series of non-terminal symbols

01:05:31.960 --> 01:05:32.420
disappear.

01:05:33.740 --> 01:05:36.200
I would like to do that in the following way.

01:05:36.960 --> 01:05:42.840
I would like to find all non-terminal symbols from which the lambda

01:05:42.840 --> 01:05:43.760
can be derived.

01:05:45.440 --> 01:05:52.580
So here is how I can replace a non-terminal symbol with an empty word.

01:05:53.520 --> 01:05:57.540
Ableitbar means I have a non-terminal symbol and from it I can derive

01:05:57.540 --> 01:05:59.160
by a sequence of derivations.

01:06:03.020 --> 01:06:08.780
That means somehow through transitions to other variables that all go

01:06:08.780 --> 01:06:10.040
to lambda at some point I can derive it.

01:06:10.920 --> 01:06:13.220
When can I derive the lambda from a non-terminal symbol?

01:06:14.520 --> 01:06:19.680
First of all, if there is such a rule that A goes directly to lambda

01:06:19.680 --> 01:06:21.800
then I can derive the lambda from it.

01:06:22.380 --> 01:06:34.400
Then it may be that I have a rule A to B and the right side of my rule

01:06:34.400 --> 01:06:39.600
the phi consists of symbols all in the set U.

01:06:41.240 --> 01:06:53.620
So I can derive the word B, C and B and C are each elements of U.

01:06:54.060 --> 01:06:57.480
In U were all that go directly to the empty word.

01:06:59.080 --> 01:07:01.840
Now I have all of them with me.

01:07:02.700 --> 01:07:08.960
That means if I can replace both B and C then I could derive B, C from

01:07:08.960 --> 01:07:09.720
A.

01:07:09.720 --> 01:07:12.740
First of all derive B, C.

01:07:13.580 --> 01:07:15.580
From that I could derive C.

01:07:16.240 --> 01:07:17.380
From that I could derive lambda.

01:07:17.880 --> 01:07:20.140
That means I could derive lambda from A through several steps.

01:07:21.340 --> 01:07:22.400
Because I can derive B and C.

01:07:24.180 --> 01:07:30.020
And here through this one rule I can derive lambda from it.

01:07:30.060 --> 01:07:31.580
So I have generated the empty word.

01:07:31.920 --> 01:07:32.840
That makes sense.

01:07:33.780 --> 01:07:40.860
An A, a non-terminal symbol that can be replaced by a sequence of

01:07:40.860 --> 01:07:44.080
symbols that can all be derived from an empty word.

01:07:45.420 --> 01:07:49.120
So I can take A because then I can derive A from the empty word.

01:07:50.100 --> 01:07:53.500
And I do that until the U changes.

01:07:53.600 --> 01:07:56.520
If the U remains constant then I'm done.

01:07:57.460 --> 01:08:04.080
Now I have all non-terminal symbols from which I can derive the empty

01:08:04.080 --> 01:08:07.660
word either directly or through a sequence of steps.

01:08:10.300 --> 01:08:13.780
And now I create a uniform rule system.

01:08:14.040 --> 01:08:14.900
In the following way.

01:08:15.980 --> 01:08:18.660
First of all, all the rules I had are in there.

01:08:20.180 --> 01:08:22.560
I just leave them all in there.

01:08:23.060 --> 01:08:29.100
And then I look at a rule A to phi 1 B to phi 2.

01:08:29.460 --> 01:08:34.000
A rule where A does not go to lambda but I have on the right a non

01:08:34.000 --> 01:08:34.400
-empty word.

01:08:35.020 --> 01:08:40.220
A symbol with B from U.

01:08:40.860 --> 01:08:47.280
So if on the right side of a rule a non-terminal symbol appears that

01:08:47.280 --> 01:08:52.880
is from my set U and this set U contains all the symbols I can derive

01:08:52.880 --> 01:09:01.680
from an empty word then that means if I take a word that was here this

01:09:01.680 --> 01:09:09.680
phi 1 B phi 2, if I have that on the right then I can replace B with

01:09:09.680 --> 01:09:14.820
lambda at some point and could also make phi 1 phi 2 directly from A.

01:09:15.880 --> 01:09:17.300
At least that's a possibility.

01:09:18.660 --> 01:09:22.380
There is the possibility to derive lambda from B so I just write the

01:09:22.380 --> 01:09:26.980
rule in that I can generate phi 1 phi 2 from A.

01:09:27.000 --> 01:09:30.620
I leave the other rule in because I don't need to throw it out.

01:09:31.200 --> 01:09:38.120
But I know that I can derive the empty word from B and then I just

01:09:38.120 --> 01:09:40.540
write the rule A to phi 1 phi 2 directly in.

01:09:42.900 --> 01:09:45.440
And I do that until I don't have any changes.

01:09:45.840 --> 01:09:50.300
So wherever on the right side of my rule non-terminal symbols appear

01:09:50.300 --> 01:09:56.640
that can be derived from an empty word I do that so that I just erase

01:09:56.640 --> 01:10:00.740
the B and draw a shortened word on the right.

01:10:02.310 --> 01:10:08.900
And now I throw out all lambda rules, that means I throw out all rules

01:10:08.900 --> 01:10:12.320
that have this form a non-terminal symbol is replaced by the empty

01:10:12.320 --> 01:10:12.580
word.

01:10:14.160 --> 01:10:15.660
Why am I allowed to do that?

01:10:16.540 --> 01:10:21.580
Because I replace all possibilities to replace non-terminal symbols by

01:10:21.580 --> 01:10:25.840
the empty word by these rules that I have built in.

01:10:26.960 --> 01:10:32.180
Whenever I am able to throw out a symbol I have built in these rules.

01:10:33.480 --> 01:10:37.520
And that's why I have the same expression as before.

01:10:37.580 --> 01:10:41.320
But you have to show that I didn't create new possibilities by writing

01:10:41.320 --> 01:10:42.580
something else on the right.

01:10:44.320 --> 01:10:47.220
In this case everything is relatively easy to see.

01:10:47.340 --> 01:10:53.280
We have to think about which consequences can arise and what can I

01:10:53.280 --> 01:10:55.960
change so that I don't get new consequences.

01:10:56.960 --> 01:11:00.180
And the ones that bothered me namely this replacement by the empty

01:11:00.180 --> 01:11:01.520
word I threw out all of them.

01:11:02.840 --> 01:11:07.920
And now on the right of my rules I have at least one symbol everywhere

01:11:08.780 --> 01:11:10.540
non -terminal words.

01:11:10.720 --> 01:11:12.680
And with that I achieved exactly what I wanted.

01:11:13.360 --> 01:11:18.620
Now I have all the words that I could create in here except for the

01:11:18.620 --> 01:11:19.180
empty word.

01:11:19.780 --> 01:11:20.520
That's not possible anymore.

01:11:21.120 --> 01:11:23.760
And now you can do a simple trick and say, if I want to have the empty

01:11:23.760 --> 01:11:29.160
word I can simply create a new starting symbol and this new starting

01:11:29.160 --> 01:11:36.580
symbol let's call it S goes to lambda and it also goes to the old

01:11:36.580 --> 01:11:39.900
starting symbol S and then I only have for the new starting symbol S'

01:11:40.060 --> 01:11:42.280
that doesn't appear anywhere else in the new starting symbol that

01:11:42.280 --> 01:11:48.820
creates lambda and apart from that I do everything I can with my rules

01:11:48.820 --> 01:11:51.280
of grammar with the rules P'.

01:11:51.280 --> 01:11:54.920
And with that I even created the whole language L of G.

01:11:55.680 --> 01:12:00.120
So that's something you can easily do and with that I now have the

01:12:00.120 --> 01:12:01.560
possibility to throw out the lambda rules.

01:12:03.120 --> 01:12:07.280
Now comes the next normal form or now comes a real normal form.

01:12:07.780 --> 01:12:11.940
We didn't want to have lambda rules and now I do something that looks

01:12:11.940 --> 01:12:12.820
very drastic.

01:12:13.480 --> 01:12:18.680
I claim that I can of course I can always do that I can say I have a

01:12:18.680 --> 01:12:21.760
normal form where I only allow very simple rules.

01:12:22.620 --> 01:12:27.500
Either there are two on the right not terminal symbols or one terminal

01:12:27.500 --> 01:12:29.460
symbol and that's it.

01:12:30.780 --> 01:12:32.000
Only rules of this kind.

01:12:33.660 --> 01:12:35.520
But that looks very drastic.

01:12:36.600 --> 01:12:39.180
Only two possibilities on the right.

01:12:39.740 --> 01:12:41.140
Two not terminal symbols.

01:12:41.680 --> 01:12:47.660
How can that ever be something that is a normal form in the sense that

01:12:47.660 --> 01:12:57.080
I can find a grammar in normal form that creates the same language.

01:12:57.740 --> 01:12:59.620
But that looks pretty drastic.

01:13:00.460 --> 01:13:05.700
And it's actually the case that you can find such a context-free

01:13:05.700 --> 01:13:12.000
grammar in normal form which, of course, since it's lambda-free can't

01:13:12.000 --> 01:13:12.560
generate an empty word.

01:13:13.980 --> 01:13:17.000
But if that works it's a great thing.

01:13:17.320 --> 01:13:19.600
If I only have such rules, they are all very simple.

01:13:20.560 --> 01:13:24.720
That means our machine that we build will also be very simple with all

01:13:24.720 --> 01:13:28.080
the rules that I write in our cellar.

01:13:29.820 --> 01:13:32.480
I'm not making any formal evidence here.

01:13:32.480 --> 01:13:36.300
I'm just giving you an example of what you have to watch out for.

01:13:37.460 --> 01:13:44.400
And we'll first look at a grammar that should generate all well-formed

01:13:44.400 --> 01:13:44.840
brackets.

01:13:45.920 --> 01:13:52.260
You remember, well-formed brackets were all those words where the

01:13:52.260 --> 01:13:56.860
number of open brackets in the prefix is never smaller than the number

01:13:56.860 --> 01:13:57.780
of closed brackets.

01:13:58.940 --> 01:14:07.080
So, a grammar that generates exactly such well-formed brackets is this

01:14:07.080 --> 01:14:10.460
one with these three rules S on SS.

01:14:11.160 --> 01:14:13.880
So I can write brackets next to each other.

01:14:14.720 --> 01:14:18.240
I can make brackets open brackets, S closed brackets.

01:14:19.040 --> 01:14:22.400
And I can delete the S and make an empty word out of it.

01:14:23.240 --> 01:14:27.220
First, I'd have to make the whole thing lambda-free.

01:14:28.200 --> 01:14:32.140
If I make this grammar lambda-free, the Chomsky grammar is lambda

01:14:32.140 --> 01:14:32.460
-free.

01:14:33.480 --> 01:14:38.280
If I want to make it lambda-free, I have to see that the S goes to

01:14:38.280 --> 01:14:38.640
lambda.

01:14:39.660 --> 01:14:43.100
I only have a non-terminal sign, so the S is in the set U.

01:14:44.020 --> 01:14:45.800
Which new rule do I have to generate?

01:14:45.920 --> 01:14:48.300
I know that the S goes to lambda.

01:14:48.880 --> 01:14:51.300
So I can't just generate 2S, but I also have to be able to generate

01:14:51.300 --> 01:14:51.840
1S.

01:14:53.200 --> 01:14:55.780
I can't just generate open brackets, S closed brackets.

01:14:56.440 --> 01:14:59.100
I also have to generate open brackets, closed brackets.

01:14:59.320 --> 01:15:01.180
Because the S could also go to lambda.

01:15:02.520 --> 01:15:04.040
And I'm done with that.

01:15:04.260 --> 01:15:13.900
This is the lambda-free variant of our set P. I gave it the same

01:15:13.900 --> 01:15:14.380
letters.

01:15:14.960 --> 01:15:16.100
Actually, I should say P'.

01:15:16.100 --> 01:15:22.820
So this is now our lambda-free variant of rules with which I generate

01:15:22.820 --> 01:15:25.100
open brackets, except for the blank word.

01:15:25.920 --> 01:15:28.500
And now I eliminate pure renaming.

01:15:29.180 --> 01:15:30.280
This is the next step.

01:15:30.920 --> 01:15:33.180
In the general process, you proceed in the same way.

01:15:33.800 --> 01:15:38.640
First make everything lambda-free, then throw out all simple renaming.

01:15:40.860 --> 01:15:44.740
If I only replace a variable with another, I can also do without it.

01:15:44.840 --> 01:15:46.100
This is not an essential step.

01:15:47.260 --> 01:15:50.720
In this case, we have a rule where S is replaced by S.

01:15:51.040 --> 01:15:52.120
I don't need that, of course.

01:15:52.380 --> 01:15:53.340
I can delete it right away.

01:15:55.500 --> 01:15:59.640
Then I have P' here.

01:16:00.400 --> 01:16:05.520
S goes to S S, to brackets S closed brackets, or to brackets closed

01:16:05.520 --> 01:16:05.700
brackets.

01:16:07.200 --> 01:16:12.240
This looks very good, but it is not yet in the form we need for the

01:16:12.240 --> 01:16:13.160
Chomsky normal form.

01:16:14.800 --> 01:16:18.400
Here we have two non-terminal characters.

01:16:18.700 --> 01:16:21.320
Here we have a combination of non-terminal characters and terminal

01:16:21.320 --> 01:16:21.760
characters.

01:16:22.180 --> 01:16:24.260
And here we have two terminal characters.

01:16:24.400 --> 01:16:28.620
This also does not work in the Chomsky grammar.

01:16:28.980 --> 01:16:34.660
The next step is that we know that in the Chomsky grammar on the right

01:16:34.660 --> 01:16:40.680
there are either only non-terminal characters, possibly only two, or

01:16:40.680 --> 01:16:42.260
there is a terminal character.

01:16:43.240 --> 01:16:48.020
Here we have the case that we have more than one terminal character.

01:16:48.660 --> 01:16:49.980
So we do the following.

01:16:50.220 --> 01:16:58.020
First of all, on each right side either only one terminal character or

01:16:59.360 --> 01:17:02.340
there are many non-terminal symbols.

01:17:02.640 --> 01:17:06.420
That means we simply replace every terminal character that appears

01:17:06.420 --> 01:17:11.900
here on the right side, there and there, by a new variable.

01:17:11.900 --> 01:17:18.720
We simply add a new non-terminal character for each such symbol, ca,

01:17:20.420 --> 01:17:26.500
and add another rule in which we can replace this symbol directly by

01:17:26.500 --> 01:17:26.940
an a.

01:17:28.120 --> 01:17:29.600
That means, now we have made it a bit more complicated.

01:17:30.680 --> 01:17:33.500
We have inserted at a point where the terminal symbol already existed,

01:17:36.860 --> 01:17:43.000
but we can replace it again by the terminal symbol This on the example

01:17:43.000 --> 01:17:50.260
means we have to write a variable c, on and c, to at the points where

01:17:51.620 --> 01:17:52.780
And c,

01:17:56.200 --> 01:17:59.200
on is replaced by ca, on and c, to by ca, to.

01:17:59.680 --> 01:18:03.020
Now we are very close to the Chomsky normal form.

01:18:03.340 --> 01:18:07.980
We are already at the point where we have two non-terminal symbols,

01:18:08.880 --> 01:18:14.660
here a rule, two non-terminal symbols on the right, here a rule with

01:18:14.660 --> 01:18:17.640
one terminal symbol on the right, one terminal symbol on the right.

01:18:17.900 --> 01:18:22.060
The only rule that still gives us problems is this one, where we have

01:18:22.060 --> 01:18:23.460
three symbols.

01:18:24.480 --> 01:18:26.340
But they are all non-terminal symbols.

01:18:27.060 --> 01:18:33.860
Now we have to make sure in general that rules where the number of non

01:18:33.860 --> 01:18:38.020
-terminal symbols on the right is greater than two are replaced by

01:18:38.020 --> 01:18:40.080
rules with only two non-terminal symbols.

01:18:41.300 --> 01:18:42.840
What I am doing now is the following.

01:18:43.560 --> 01:18:51.920
I have a rule a, on b1 to bn up there it says s, on c, on sc, to.

01:18:53.300 --> 01:18:56.280
If I want to simulate something like this with a sequence of new

01:18:56.280 --> 01:18:58.900
rules, then I make the following assumption.

01:19:00.460 --> 01:19:03.600
I want to simulate a left derivation with this.

01:19:07.460 --> 01:19:09.840
Here it is indeed a right derivation.

01:19:10.160 --> 01:19:10.780
So what do I do?

01:19:11.860 --> 01:19:14.180
I have the b1 to bn, that should arise.

01:19:15.860 --> 01:19:18.840
And I simulate it with a left derivation.

01:19:19.360 --> 01:19:20.240
Not quite.

01:19:21.180 --> 01:19:23.500
I take the word back with right or left derivation.

01:19:23.800 --> 01:19:24.720
I make the following construction.

01:19:25.900 --> 01:19:28.680
I simply notice that I first want to generate the b1.

01:19:29.600 --> 01:19:32.900
The first part of my word here, which was on the right side.

01:19:33.820 --> 01:19:37.380
And I have generated the b1 and I notice that I have to do more.

01:19:38.640 --> 01:19:40.820
On the right it said b2 to bn.

01:19:41.140 --> 01:19:42.320
I have to generate them.

01:19:42.860 --> 01:19:50.740
Then I add a rule where this new sign d1 is replaced by b2 d2.

01:19:50.900 --> 01:19:52.040
So I have already generated the b2.

01:19:53.760 --> 01:19:55.880
And then I notice that I have to do more.

01:19:56.560 --> 01:20:01.400
Until I have generated my last sign.

01:20:02.660 --> 01:20:04.580
That means I always have two signs.

01:20:04.820 --> 01:20:11.020
With each rule that is here I generate one of the signs that were in

01:20:11.020 --> 01:20:12.260
the original rule.

01:20:12.980 --> 01:20:13.940
b1 to bn.

01:20:15.000 --> 01:20:17.440
And I still notice that I have to do more.

01:20:18.540 --> 01:20:23.720
And this I have to do more are my variables d1 to dn-2.

01:20:24.600 --> 01:20:28.180
From dn-2 I can generate bn-1 and bn-1.

01:20:29.320 --> 01:20:32.920
That means I have here non-terminal signs that were only generated for

01:20:32.920 --> 01:20:33.960
this one rule.

01:20:36.600 --> 01:20:40.860
These are new non-terminal symbols and I only have these rules for

01:20:40.860 --> 01:20:41.940
these non-terminal signs.

01:20:43.280 --> 01:20:49.400
So this d1 to dn-2 cannot appear on any other right side of a rule.

01:20:50.440 --> 01:20:55.920
That means they only appear in this context that I want to subtract b1

01:20:55.920 --> 01:20:56.560
to bn.

01:20:57.060 --> 01:21:12.620
So in the changed grammar I can generate b1 to bn I can do it like

01:21:12.620 --> 01:21:16.540
this that I generate b1 to bk.

01:21:17.100 --> 01:21:23.500
Then I still have dk here and accordingly it continues until I have

01:21:23.500 --> 01:21:25.380
generated b1 to bn.

01:21:26.420 --> 01:21:29.300
So I always notice that I still have to do something on the right side

01:21:29.300 --> 01:21:32.400
and these variables d only appear for this one rule.

01:21:33.100 --> 01:21:34.500
Otherwise they don't appear at all.

01:21:35.320 --> 01:21:38.000
In the example it is very simple.

01:21:38.540 --> 01:21:41.080
I only have to generate a d1.

01:21:41.960 --> 01:21:48.680
c on d1 with the d1 that is already my dn-2 in this case.

01:21:48.680 --> 01:21:51.260
I directly generate s and c.

01:21:52.260 --> 01:21:57.560
These are the last two characters and I'm done.

01:21:58.400 --> 01:22:04.780
Now I have rules that all have two non-terminal characters on the

01:22:04.780 --> 01:22:04.780
right side.

01:22:05.340 --> 01:22:11.160
And the other possibility is that I have one terminal character on the

01:22:11.160 --> 01:22:11.380
right side.

01:22:11.960 --> 01:22:15.500
Now I'm in a form that corresponds to the Chomsky normal form.

01:22:16.320 --> 01:22:19.940
And I have shown in the example what it looks like when you generate

01:22:19.940 --> 01:22:20.780
the Chomsky normal form.

01:22:21.780 --> 01:22:23.880
By the way, I have shown you the elementary steps.

01:22:24.680 --> 01:22:25.800
The first was to make lambda free.

01:22:26.480 --> 01:22:28.440
The second was to throw out the simple renamings.

01:22:29.540 --> 01:22:32.700
The third was to replace the terminal characters with non-terminal

01:22:32.700 --> 01:22:33.060
characters.

01:22:34.060 --> 01:22:41.020
The fourth was to shorten the longer right sides by a sequence of new

01:22:41.020 --> 01:22:47.280
rules where I always notice how much I still have to do to be able to

01:22:47.280 --> 01:22:48.140
derive the whole word.

01:22:49.280 --> 01:22:52.040
This is the way to construct it.

01:22:52.320 --> 01:22:53.620
It generally goes like this.

01:22:54.560 --> 01:22:59.720
This example has shown how to do it and what kind of grammar is

01:22:59.720 --> 01:23:00.160
generated.

01:23:01.400 --> 01:23:05.420
Just briefly, there is another normal form, the Greibach normal form.

01:23:06.420 --> 01:23:08.700
The Greibach normal form looks different.

01:23:09.760 --> 01:23:14.680
The Greibach normal form has very easily describable rules.

01:23:15.440 --> 01:23:18.080
Non-terminal characters are context-free.

01:23:18.960 --> 01:23:24.740
The first character is an a followed by a phi, a rest, a phi of n

01:23:24.740 --> 01:23:24.980
stars.

01:23:25.680 --> 01:23:28.380
It looks very similar to our right-linear grammars.

01:23:29.000 --> 01:23:30.920
But there was a significant difference.

01:23:31.120 --> 01:23:33.540
The phi was always a single non-terminal character.

01:23:34.100 --> 01:23:35.560
Here, the phi can be anything.

01:23:36.620 --> 01:23:40.240
That means, we have a sequence of characters and the first character

01:23:40.240 --> 01:23:42.700
is always a terminal character.

01:23:43.260 --> 01:23:46.620
Think about what that means when we look at our cellar construction.

01:23:47.540 --> 01:23:53.620
Down here we had our imac 0 and a word was always written in here.

01:23:54.240 --> 01:23:56.660
There was always a phi in here.

01:23:57.220 --> 01:24:01.900
Now the highest character on the right side when we replaced something

01:24:02.640 --> 01:24:04.780
is immediately a terminal character.

01:24:05.420 --> 01:24:09.640
That means, with this construction of a cellar automaton from a

01:24:09.640 --> 01:24:14.800
grammar, we would replace a non-terminal character in the cellar with

01:24:14.800 --> 01:24:17.020
the highest character on the right side.

01:24:17.020 --> 01:24:19.380
We would immediately have a terminal character.

01:24:19.580 --> 01:24:21.520
We would have to see if it was on the board.

01:24:22.000 --> 01:24:26.420
Then I can do another derivation, replace a variable and go on.

01:24:27.100 --> 01:24:32.440
That means, if I translate a grammar with this form in my cellar

01:24:32.440 --> 01:24:37.680
automaton as I have shown you, who always simulates the derivation in

01:24:37.680 --> 01:24:40.540
the cellar and as soon as terminal characters appear, they are also on

01:24:40.540 --> 01:24:43.560
the board, I would only have to do one step internally.

01:24:43.980 --> 01:24:46.700
One step to read characters, one step internally to read a character.

01:24:47.500 --> 01:24:49.120
That goes very quickly.

01:24:50.360 --> 01:24:53.820
Of course, I have to choose the right consequences, so the whole thing

01:24:53.820 --> 01:24:56.220
is a bit more complicated.

01:24:56.480 --> 01:25:02.880
But that shows, I can basically derive such a word very quickly.

01:25:06.140 --> 01:25:10.400
In every second step, I have a terminal character.

01:25:10.700 --> 01:25:16.720
Then I have time to go through it.

01:25:17.400 --> 01:25:24.240
So, I have S on W in exactly the same derivation steps, because in

01:25:24.240 --> 01:25:25.960
every step a variable is replaced by a constant.

01:25:25.960 --> 01:25:37.080
That's obvious, if I derive S by some constant a1 and some phi, then

01:25:37.080 --> 01:25:42.340
it would be derived to some a1, a2 and some phi'.

01:25:42.600 --> 01:25:43.660
And so on.

01:25:43.820 --> 01:25:48.180
And then I would have derived my a1 to an by internal derivation

01:25:48.180 --> 01:25:48.180
steps.

01:25:48.900 --> 01:25:51.000
So, there are very short derivations.

01:25:51.180 --> 01:25:52.040
Very simple.

01:25:52.040 --> 01:25:56.140
But can you say that this works with every context-free grammar?

01:25:57.140 --> 01:25:58.140
It actually works.

01:25:59.400 --> 01:26:02.400
But I don't suggest that it's so extensive that you can do it.

01:26:03.160 --> 01:26:06.400
So, we can actually generate a Greibach normal form for every context

01:26:06.400 --> 01:26:07.120
-free grammar.

01:26:09.080 --> 01:26:12.140
Just so you can see, you can proceed differently with such a normal

01:26:12.140 --> 01:26:12.440
form.

01:26:15.360 --> 01:26:18.400
And this connection between the Greibach normal form and the cellar

01:26:18.400 --> 01:26:19.780
automaton, I've already shown you.

01:26:20.480 --> 01:26:25.400
So, there's always a derivative in the cellar, and then you check if

01:26:25.400 --> 01:26:26.500
the sign is actually there.

01:26:26.760 --> 01:26:28.040
So, it's very fast.

01:26:28.580 --> 01:26:31.360
And now we come to the Chomsky normal form, after I've shown you this

01:26:31.360 --> 01:26:32.580
normal form,

01:26:36.000 --> 01:26:38.980
to characterize the structure of context-free grammar.

01:26:39.780 --> 01:26:44.320
You remember, we also looked at structures of context-free grammar.

01:26:44.340 --> 01:26:45.820
And then we came to the pumping lemma.

01:26:46.380 --> 01:26:47.440
That's exactly what we're going to do now.

01:26:48.600 --> 01:26:50.440
We're looking for a corresponding pumping lemma.

01:26:51.220 --> 01:26:54.140
And now you just need to remember, you know what a pumping lemma is.

01:26:54.640 --> 01:26:58.120
An infinite system, an infinitely long sequence of situations.

01:26:59.040 --> 01:27:03.080
And then we know, if I can finally distinguish many situations, one

01:27:03.080 --> 01:27:03.840
situation appears twice.

01:27:04.520 --> 01:27:05.360
At least one.

01:27:07.000 --> 01:27:12.300
So, in context-free language, we also want to make a corresponding

01:27:12.300 --> 01:27:12.920
pumping lemma.

01:27:13.980 --> 01:27:15.620
And I'll take that into account.

01:27:16.260 --> 01:27:21.660
In context-free language, we'll find that for long words, we get two

01:27:21.660 --> 01:27:22.400
pumping points.

01:27:23.480 --> 01:27:25.780
And you can just look at that.

01:27:26.420 --> 01:27:32.680
You just look at a derivative tree for a sufficiently large word.

01:27:33.720 --> 01:27:37.580
And if I look at a derivative tree, then I have all kinds of paths in

01:27:37.580 --> 01:27:39.080
there that run through here.

01:27:40.020 --> 01:27:45.500
And I know the length of each node, there's a non-terminal sign here,

01:27:45.500 --> 01:27:47.820
until I get to a terminal sign at some point.

01:27:48.660 --> 01:27:53.280
Since I only have a lot of non-terminal signs, I know that at some

01:27:53.280 --> 01:27:56.060
point a sign has to reappear.

01:27:56.420 --> 01:27:57.820
For example, this A.

01:27:59.380 --> 01:28:00.780
It reappears again.

01:28:02.360 --> 01:28:06.800
On the way to the first appearance of the A, something has appeared in

01:28:06.800 --> 01:28:11.880
the tree, which also has something to do with terminal signs.

01:28:13.560 --> 01:28:19.340
On the left side, I derived a word from my S.

01:28:19.960 --> 01:28:24.640
Through a series of derivations, I come to a U.

01:28:25.780 --> 01:28:27.040
Then the A can be there.

01:28:28.000 --> 01:28:29.340
It says here that

01:28:32.540 --> 01:28:33.740
Y is next to it.

01:28:33.740 --> 01:28:35.940
So that's what we have at some point.

01:28:37.020 --> 01:28:40.040
U, the U, the A, and the Y.

01:28:41.180 --> 01:28:48.160
And then I can derive a series of derivations, a series of

01:28:48.160 --> 01:28:50.920
replacements, which looks like this.

01:28:51.400 --> 01:28:54.840
Then the A reappears, but in between I have a V and an X on the left

01:28:54.840 --> 01:28:57.800
and right next to the A.

01:28:58.980 --> 01:29:01.760
The A has already generated a V and an X.

01:29:03.640 --> 01:29:05.280
And there is the A again.

01:29:06.360 --> 01:29:11.800
And then it may be that I do even more and at some point the A is

01:29:11.800 --> 01:29:14.680
actually replaced by a series of terminal signs.

01:29:15.960 --> 01:29:18.140
Then something like this appears.

01:29:18.880 --> 01:29:23.080
This is the word that appears at the end of the derivation tree.

01:29:23.100 --> 01:29:24.240
These are the signs on the leaves.

01:29:24.240 --> 01:29:29.900
And obviously I can do whatever happens here as often as I want.

01:29:30.260 --> 01:29:34.800
This is exactly the situation where one sign appears twice and

01:29:34.800 --> 01:29:39.020
everything that happens I can do as often as I want.

01:29:40.020 --> 01:29:43.180
Everything is still in accordance with the rules of my grammar.

01:29:43.900 --> 01:29:48.240
That means, in between the V and the X are generated.

01:29:49.180 --> 01:29:52.820
If I can do this as often as I want, that means I can write an I here

01:29:52.820 --> 01:29:54.680
and there as well.

01:29:55.340 --> 01:29:58.160
So I can write U V I W X I Y.

01:29:59.440 --> 01:30:02.820
And that means we have these two pumping points here.

01:30:03.180 --> 01:30:07.620
I could also write V V X X here.

01:30:08.740 --> 01:30:10.040
No problem at all.

01:30:10.920 --> 01:30:15.300
This is definitely the right word of the language because it would

01:30:15.300 --> 01:30:19.840
have been generated in the same way by corresponding derivations.

01:30:20.160 --> 01:30:23.100
But just like the first V and the first X.

01:30:24.180 --> 01:30:26.160
And with that I immediately have this sentence.

01:30:26.700 --> 01:30:30.220
You could immediately write down how these boundary conditions of this

01:30:30.220 --> 01:30:31.080
sentence look like.

01:30:31.480 --> 01:30:34.140
This is the pumping lemma for context-free language.

01:30:35.040 --> 01:30:39.140
You must already have understood it because it is only this idea that

01:30:39.140 --> 01:30:43.320
can distinguish a finite number of events from an infinite number of

01:30:43.320 --> 01:30:43.320
events.

01:30:44.320 --> 01:30:48.200
With an infinitely long sequence one event appears twice.

01:30:48.960 --> 01:30:51.000
In this case, these are the non-terminal signs.

01:30:51.440 --> 01:30:53.460
With the finite automata, these were the conditions.

01:30:53.960 --> 01:30:56.440
We'll take a closer look at that on Wednesday.

01:30:57.140 --> 01:30:57.560
Thank you very much.

