CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2018/2019

Distributed Data Mining

Code: MI-DDM
Completion: KZ
Credits: 4
Range: 0+3
Language:
Lecturer:
Tomáš Borovička (guarantor), Ondřej Stuchlík
Tutor:
Tomáš Borovička (guarantor), Ondřej Stuchlík
Supervisor:
Department of Applied Mathematics
Synopsis:

The course focuses on state-of-the-art approaches to distributed data mining and the parallelization of machine learning algorithms. Students will gain hands-on experience with Apache Spark, a large-scale data processing framework, and with existing distributed DM / ML algorithms. They will learn the principles behind the parallel implementations of these algorithms and will be able to propose approaches for parallelizing other algorithms.

Requirements:

Knowledge of at least one of the programming languages Python, Java, or Scala. Knowledge of the fundamentals of machine learning algorithms.

Syllabus of lectures:
Syllabus of tutorials:

1) Introduction to MapReduce, Apache Spark and cluster infrastructure

2) Data structures of the Apache Spark framework: RDDs, DataFrames, Datasets

3) Apache Spark ML pipelines, MLlib

4) Distributed data, data exploration, basic statistics

5) Distributed data preprocessing (feature extraction and transformation, feature selection, dimensionality reduction)

6) Association rule mining, collaborative filtering, alternating least squares

7) Distributed classification and regression algorithms

8) Distributed clustering algorithms

9) Distributed ensemble algorithms

10) Algorithms for information retrieval and text mining

11) Deep learning and artificial neural networks

12) Stream processing, online algorithms

Study Objective:
Study materials:

Pentreath, Nick. Machine Learning with Spark. Packt Publishing Ltd, 2015.

Note:
Time-table for winter semester 2018/2019:
Time-table is not available yet
Time-table for summer semester 2018/2019:
Mon 16:15–18:45, room T9:349 (Dejvice, NBFIT PC classroom), parallel no. 101, Borovička T., Stuchlík O.
The course is a part of the following study plans:
Data valid to 2019-04-18
For updated information see http://bilakniha.cvut.cz/en/predmet5463206.html