CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2023/2024
NOTICE: Study plans for the next academic year are available.

Distributed Data Mining

The course is not on the "Without time-table" list.
Code     Completion  Credits  Range  Language
NI-DDM   KZ          4        3C     English
Course guarantor:
Lecturer:
Tutor:
Supervisor:
Department of Applied Mathematics
Synopsis:

The course focuses on state-of-the-art approaches to distributed data mining and the parallelization of machine learning algorithms. Students will gain hands-on experience with the large-scale data processing framework Apache Spark and with existing distributed DM/ML algorithms. They will learn the principles behind the parallel implementations of these algorithms and will be able to propose approaches for parallelizing other algorithms.

The course is presented in Czech.

Requirements:

Knowledge of at least one of the programming languages Python, Java, or Scala. Knowledge of the fundamentals of machine learning algorithms.

Syllabus of lectures:

There are no lectures.

Syllabus of tutorials:

1) Introduction to MapReduce, Apache Spark and cluster infrastructure

2) Data structures of the Apache Spark framework: RDDs, DataFrames, Datasets

3) Apache Spark ML pipelines, MLlib

4) Distributed data, data exploration, basic statistics

5) Distributed data preprocessing (feature extraction and transformation, feature selection, dimensionality reduction)

6) Association rule mining, collaborative filtering, alternating least squares

7) Distributed classification and regression algorithms

8) Distributed clustering algorithms

9) Distributed ensemble algorithms

10) Algorithms for information retrieval and text mining

11) Deep learning and artificial neural networks

12) Stream processing, online algorithms

Study Objective:
Study materials:

Pentreath, Nick. Machine Learning with Spark. Packt Publishing Ltd, 2015.

Note:
Further information:
https://courses.fit.cvut.cz/MI-DDM/
No time-table has been prepared for this course
The course is a part of the following study plans:
Data valid to 2024-03-27
For updates to the information above, see https://bilakniha.cvut.cz/en/predmet6175606.html