WEBVTT

00:00.920 --> 00:03.780
Hello and welcome back to the Python course.

00:04.260 --> 00:08.280
We are now at the last part of the main section Tools and

00:08.280 --> 00:12.760
Packages, 3.7: additional packages that I want to show you.

00:14.780 --> 00:19.220
In principle, this will be a very short summary, a relatively

00:19.220 --> 00:25.240
small part covering just a few useful packages that you

00:25.240 --> 00:27.080
should know about when working with Python.

00:27.420 --> 00:31.100
Of course, there is an extremely large number of interesting

00:31.100 --> 00:35.080
packages, but here is a small selection that is useful

00:35.080 --> 00:36.040
for certain kinds of problems.

00:38.000 --> 00:41.600
One thing that, if you work with Python a little longer,

00:42.600 --> 00:46.540
often comes up in comparison to other programming languages

00:46.540 --> 00:48.700
that are pre-compiled, i.e.

00:48.760 --> 00:53.720
statically compiled languages like C or C++, is that with larger

00:53.720 --> 00:59.220
projects Python sometimes runs into speed limits because it is

00:59.220 --> 01:04.460
interpreted first. There are some solutions to improve that if

01:04.460 --> 01:10.740
you have a certain algorithm or function that consumes the

01:10.740 --> 01:14.120
main part of the time and, for example, is called again and again

01:14.120 --> 01:16.820
in an iteration.

01:16.820 --> 01:19.280
There are several ways to accelerate Python code.

01:20.960 --> 01:24.560
Three examples are shown here.

01:24.820 --> 01:28.220
There are really several approaches, and several people have already come

01:28.220 --> 01:29.360
up with solutions to this in great packages.

01:31.520 --> 01:35.440
The first one that I would like to briefly mention is called Cython.

01:36.660 --> 01:39.080
The idea is actually already a bit in the name.

01:39.540 --> 01:45.320
So the P of Python was exchanged for a capital C, and that is also the

01:45.320 --> 01:46.440
background of the package.

01:46.860 --> 01:55.240
You can convert Python code into C code and then compile it, producing

01:55.240 --> 02:03.580
either a DLL on a Windows machine or the corresponding pre-compiled

02:03.580 --> 02:07.360
libraries on other platforms, and thus of course greatly reduce the

02:07.360 --> 02:07.760
execution time.

02:09.900 --> 02:16.260
What is also possible in Cython is to have a mixture of Python

02:16.260 --> 02:19.740
syntax and C syntax.

02:20.380 --> 02:24.680
The language itself is also called Cython, and you can write C

02:24.680 --> 02:30.700
code in a very Python-like style and embed it directly into your

02:30.700 --> 02:36.040
larger Python scripts, so to speak, and it is then

02:36.040 --> 02:38.120
compiled automatically.

02:38.960 --> 02:40.520
Exactly, let's take a look at the overview here for a moment.

02:41.560 --> 02:45.480
So it's basically a static compiler that optimizes Python code.

02:46.360 --> 02:50.720
So you can either start directly with Python code and run a few

02:50.720 --> 02:57.000
Cython commands to improve that code or, as I said, go all the

02:57.000 --> 03:04.940
way to direct C code, write it in C, pre-compile it, and

03:04.940 --> 03:06.480
thus reduce the execution time.

03:07.520 --> 03:10.500
So this is a very popular package that is in use a lot.

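As a rough illustration of the idea, here is a small sketch of what a Cython source file might look like; the file name and function are made up for this example, and the code would still need to be compiled with Cython's build toolchain (e.g. via cythonize) before it can be imported from Python.

```cython
# fib.pyx -- hypothetical example file.
# The cdef declarations give the loop variables static C types,
# so the compiled extension runs the loop at C speed.
def fib(int n):
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a
```

After compiling, the function is imported and called like any ordinary Python function.
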
03:11.840 --> 03:18.080
In general, if you already have certain things in C code and want to

03:18.080 --> 03:22.780
use them in your Python environment, of course that is also possible.

03:22.780 --> 03:28.920
This is already possible with the standard C interface of Python and

03:28.920 --> 03:30.220
there is a nice explanation of how to do this here in these SciPy

03:30.220 --> 03:31.300
lecture notes.

03:35.140 --> 03:38.060
There are also several options for how to embed C code.

03:39.340 --> 03:45.820
I have also used the Python C API myself, and it is

03:45.820 --> 03:49.780
explained here in principle how to put it together.

03:50.060 --> 03:56.160
There are also nice examples and you can actually write C code and

03:56.160 --> 04:00.740
then use certain options that are explained here to embed it directly

04:00.740 --> 04:03.060
into your Python script, so to speak.

04:05.120 --> 04:07.880
This is also possible, for example, with numpy.

04:09.360 --> 04:10.320
Just so that you have heard of it.

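Besides the full Python C API, the standard library's ctypes module is a simple way to call already-compiled C code from Python; a minimal sketch, assuming a Unix-like system where the C math library can be located:

```python
# Calling a pre-compiled C function from Python via ctypes (stdlib only).
import ctypes
import ctypes.util

# Locate and load the C math library (assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(16.0))  # 4.0
```

The same pattern works for your own shared libraries compiled from C.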
04:12.060 --> 04:16.840
Otherwise, setting the pre-compiled static programming languages

04:16.840 --> 04:22.980
aside for a moment, there is also the

04:22.980 --> 04:27.600
possibility to parallelize code, of course, and the

04:27.600 --> 04:34.320
most famous library that you can use for that in Python is Dask, which

04:34.320 --> 04:37.920
is really a parallel computing package.

04:38.960 --> 04:44.900
It can basically analyze the Python code that you have and

04:44.900 --> 04:50.060
optimize it into parallelizable processes.

04:51.920 --> 04:56.300
There is also a nice tutorial that you can take a look at.

04:56.420 --> 05:05.620
There is also great documentation with a few overviews showing

05:05.940 --> 05:07.740
how everything is distributed and so on.

05:09.580 --> 05:15.320
And with the package you can basically run your code on parallel

05:15.320 --> 05:17.280
processes quite automatically.

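A minimal sketch of the dask.delayed pattern, assuming the dask package is installed; the function is made up for illustration:

```python
# Task parallelism with dask.delayed: build a lazy task graph,
# then let Dask schedule and execute the tasks in parallel.
import dask

@dask.delayed
def square(x):
    return x * x

# Nothing is computed here yet -- only a graph is built.
tasks = [square(i) for i in range(5)]
total = dask.delayed(sum)(tasks)

# compute() triggers the (potentially parallel) execution.
result = total.compute()
print(result)  # 0 + 1 + 4 + 9 + 16 = 30
```

The same graph can later be run unchanged on a distributed cluster scheduler.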
05:21.920 --> 05:27.420
The last package that is also used here and there is Numba.

05:27.960 --> 05:30.540
This is another possibility to actually compile Python and numpy

05:30.540 --> 05:35.940
code and make it execute quickly again.

05:37.160 --> 05:42.200
And yes, colleagues of mine also use that to accelerate certain larger

05:42.200 --> 05:43.840
algorithms, so to speak.

05:45.300 --> 05:50.400
So if you get the feedback that Python is so terribly slow and

05:50.400 --> 05:56.980
therefore actually not applicable for certain problems, you now have a

05:56.980 --> 06:00.300
few tips here with which you can accelerate those workloads.

06:01.300 --> 06:04.180
So a lot of people have already thought about it and the packages,

06:04.660 --> 06:09.660
namely Cython, Dask and Numba, are very, very mature and there are

06:09.660 --> 06:14.200
also large communities where you can see in tutorials how to

06:14.200 --> 06:17.440
accelerate certain problems either by pre-compiling or by

06:17.440 --> 06:18.240
parallelizing.

06:20.440 --> 06:21.960
That would be one topic.

06:23.720 --> 06:26.540
Then again a bit in the direction of machine learning.

06:26.760 --> 06:35.840
So earlier we got to know the package Scikit-Learn, which is

06:35.840 --> 06:41.840
very well suited for learning machine learning and self-study, but is

06:41.840 --> 06:42.420
also used productively.

06:42.420 --> 06:49.020
But there are other machine learning packages that go in the direction

06:49.020 --> 06:54.380
of scalability and acceleration and also more in the direction of deep

06:54.380 --> 06:54.680
learning.

06:55.720 --> 06:59.340
The most famous of these packages are TensorFlow and Keras.

07:00.960 --> 07:11.280
Keras is basically a user-friendly interface, you could say, for

07:11.280 --> 07:12.760
creating such deep learning networks.

07:13.840 --> 07:18.260
And it builds on one of two backends, that is either Theano or

07:18.260 --> 07:18.700
TensorFlow.

07:19.560 --> 07:23.100
And these are then backends that deal with tensor calculations.

07:25.420 --> 07:31.660
And yes, that is all written in Python, partly pre-compiled, and these

07:31.660 --> 07:35.540
are the state-of-the-art machine learning packages that are really used

07:35.540 --> 07:43.060
at large scale to deploy and operate bigger machine learning

07:43.060 --> 07:43.060
applications.

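To give a flavor of how user-friendly the Keras interface is, here is a minimal sketch of defining and compiling a tiny feed-forward network, assuming TensorFlow (which bundles Keras) is installed; the layer sizes are arbitrary and no training is shown:

```python
# A tiny feed-forward network defined with the Keras Sequential API.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(8,)),                          # 8 input features
    keras.layers.Dense(16, activation="relu"),        # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),      # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 8*16+16 weights and biases in the first layer, 16+1 in the second.
print(model.count_params())  # 161
```

Training would then be a single call to model.fit with your data.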
07:43.720 --> 07:45.640
Just so that you have heard the names.

07:46.440 --> 07:50.420
And if you want to go deeper, there is of course also a nice

07:50.420 --> 07:53.660
documentation with Keras.

07:54.180 --> 07:59.660
So these are really packages that are actively maintained and are in

07:59.660 --> 08:00.940
use in many, many companies.

08:02.460 --> 08:07.060
And then there is also a getting-started guide that shows how

08:07.060 --> 08:07.720
the API works.

08:08.860 --> 08:12.780
So if you want to go deeper into machine learning, these are actually

08:12.780 --> 08:14.800
the packages that are used in practice.

08:21.000 --> 08:26.080
There is also a special area, probabilistic programming.

08:27.000 --> 08:33.140
This is about trying to determine parameter distributions with Bayesian

08:33.140 --> 08:33.140
methods.

08:34.720 --> 08:39.240
This also has applications in the field of machine learning and Gaussian

08:39.240 --> 08:40.120
process modeling.

08:40.680 --> 08:43.500
And there is a package, if you want to deal with something like that,

08:44.360 --> 08:45.300
which I can highly recommend.

08:45.520 --> 08:47.160
This is the PyMC3 package.

08:48.600 --> 08:54.260
And the big difference to the well-known approach of mathematical

08:54.260 --> 08:59.760
optimization, or the frequentist approach, is that with the Bayesian

08:59.760 --> 09:05.200
approach, which is the other main branch of statistics, you basically

09:05.200 --> 09:10.540
assume prior distributions for your unknown parameters and then

09:10.540 --> 09:15.340
perform Bayesian inference based on data to get the posterior

09:15.340 --> 09:17.340
distribution of these parameters.

09:19.080 --> 09:21.960
So in principle, you don't just get a point estimate for your problem,

09:21.960 --> 09:25.880
like from a classic optimizer, but really a full

09:25.880 --> 09:26.380
distribution.

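To illustrate the idea of prior plus data giving a posterior distribution, here is a toy hand-written Metropolis sampler using only the standard library; PyMC3 automates and vastly generalizes this, and the data and distributions are made up for the example:

```python
# Toy Bayesian inference: sample the posterior of a mean mu
# with a hand-written Metropolis sampler (stdlib only).
import math
import random

random.seed(42)
data = [2.1, 1.9, 2.3, 2.0, 1.8]  # made-up observations

def log_posterior(mu):
    # Prior: mu ~ Normal(0, 10); likelihood: each x ~ Normal(mu, 1).
    log_prior = -mu**2 / (2 * 10**2)
    log_lik = -sum((x - mu) ** 2 for x in data) / 2
    return log_prior + log_lik

samples = []
mu = 0.0
for _ in range(5000):
    proposal = mu + random.gauss(0, 0.5)
    # Accept with probability min(1, posterior ratio).
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

# Discard burn-in; the remaining samples approximate the posterior.
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
print(round(posterior_mean, 1))  # close to the data mean of about 2.0
```

Instead of one point estimate, the samples describe the whole posterior distribution of mu.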
09:27.620 --> 09:32.980
And the package is also very, very well documented and you also get a

09:32.980 --> 09:36.860
lot of introductory tutorials to understand the method behind

09:36.860 --> 09:37.160
it.

09:38.800 --> 09:45.000
As I said, this is such a statistical area that will come into

09:45.000 --> 09:47.400
practice much more in the near future.

09:47.940 --> 09:50.520
So this is now also being used more in practice.

09:51.960 --> 09:54.180
It is moving a bit out of the pure-theory drawer.

09:55.420 --> 10:01.820
And yes, it will soon see much broader adoption.

10:02.780 --> 10:06.260
And you can take a look at the package if you want to take a look at

10:06.260 --> 10:08.560
this probabilistic programming.

10:11.160 --> 10:15.320
This is also written in Python and is one of the leading packages

10:15.320 --> 10:17.240
where you can find the most up-to-date algorithms.

10:20.840 --> 10:26.620
And finally, another package that is also very interesting for

10:26.620 --> 10:27.820
statistical modeling.

10:28.120 --> 10:29.580
This is called StatsModels.

10:30.100 --> 10:32.860
It is something of a sister package to Scikit-learn.

10:34.120 --> 10:38.780
Here we are more in the frequentist area of

10:38.780 --> 10:42.200
statistics, and really many classic statistical methods are implemented

10:42.200 --> 10:43.240
there as algorithms.

10:44.000 --> 10:47.600
So then of course you get linear regression, but also robust

10:47.600 --> 10:50.940
estimators, Huber estimators, for example.

10:51.780 --> 10:57.420
There are statistical tests, there are also online estimators that you can

10:57.420 --> 10:59.820
work with, and time series modeling.

11:00.360 --> 11:04.000
And this package also has a really great documentation.

11:08.780 --> 11:11.140
Let's see, what about the website?

11:12.240 --> 11:12.640
There it is.

11:14.460 --> 11:17.100
So you can easily install that via conda, of course.

11:19.940 --> 11:24.940
And then you also have hypothesis tests that you can perform.

11:27.680 --> 11:29.820
Ordinary least squares has been shown here.

11:38.670 --> 11:43.720
And here you will find all kinds of estimates that are in there.

11:44.040 --> 11:48.080
So with the regression and linear models you have linear regression,

11:48.680 --> 11:52.820
but also robust methods, and ANOVA methods, for example.

11:53.660 --> 11:55.740
So there you will find a lot already implemented.

11:56.320 --> 11:57.660
Also here in time series modeling.

11:58.000 --> 12:02.260
So if you want to model time series data, there is a lot there.

12:03.480 --> 12:09.720
And the package is definitely worth taking a look at if you want to

12:09.720 --> 12:10.500
continue working in modeling.

12:11.420 --> 12:13.360
A lot has already been implemented.

12:13.780 --> 12:15.840
And there are also a few datasets and so on.

12:17.560 --> 12:23.120
Exactly, that would actually be this very short part here, just a few

12:23.120 --> 12:28.480
packages that are very well documented and are used very, very often

12:29.360 --> 12:30.920
and are actively being developed.

12:31.700 --> 12:39.400
And also packages that work on topics that are simply very much in

12:39.400 --> 12:45.040
demand at the moment, such as deep learning or parallelization of any

12:45.040 --> 12:45.760
algorithms.

12:46.840 --> 12:54.840
Or here this frequentist, classic statistical modeling, or modeling

12:54.840 --> 12:56.900
with the Bayesian approach.

12:58.600 --> 13:01.780
Then thank you for listening and then we'll see you in the next main part.

