Distributed Data Mining
Code | Completion | Credits | Range | Language |
---|---|---|---|---|
MI-DDM | KZ | 4 | 3C | English |
- Course guarantor:
- Lecturer:
- Tutor:
- Supervisor:
- Department of Applied Mathematics
- Synopsis:
-
The course focuses on state-of-the-art approaches to distributed data mining and the parallelization of machine learning algorithms. Students will gain hands-on experience with the large-scale data processing framework Apache Spark and with existing distributed DM/ML algorithms. They will learn the principles behind the parallel implementations of these algorithms and will be able to propose approaches for parallelizing other algorithms.
The course is presented in the Czech language.
- Requirements:
-
Knowledge of at least one of the programming languages Python, Java, or Scala. Knowledge of the fundamentals of machine learning algorithms.
- Syllabus of lectures:
- Syllabus of tutorials:
-
1) Introduction to MapReduce, Apache Spark and cluster infrastructure
2) Data structures of the Apache Spark framework: RDDs, DataFrames, Datasets
3) Apache Spark ML pipelines, MLlib
4) Distributed data, data exploration, basic statistics
5) Distributed data preprocessing (feature extraction and transformation, feature selection, dimensionality reduction)
6) Association rule mining, collaborative filtering, alternating least squares
7) Distributed classification and regression algorithms
8) Distributed clustering algorithms
9) Distributed ensemble algorithms
10) Algorithms for information retrieval and text mining
11) Deep learning and artificial neural networks
12) Stream processing, online algorithms
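As an informal illustration of topic 1, the MapReduce pattern can be sketched in plain Python without a cluster. This is a minimal sketch, not course material: the function names (`map_partition`, `merge_counts`, `word_count`) and the sample data are hypothetical, and the partitions are processed sequentially here, whereas a framework such as Apache Spark would distribute them across workers.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count words inside one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial results (associative and commutative)."""
    return left + right


def word_count(partitions):
    """Apply the map step to every partition, then fold the partial counts together."""
    # On a real cluster each partition would be mapped on a different worker;
    # here the list comprehension stands in for that parallel step.
    partials = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data: two partitions of a small text collection.
partitions = [
    ["spark makes rdds", "rdds are partitioned"],
    ["spark runs on a cluster"],
]
totals = word_count(partitions)
```

Because the merge step is associative and commutative, the partial counts can be combined in any order, which is the property that makes this kind of aggregation safe to parallelize.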
- Study Objective:
- Study materials:
-
Pentreath, Nick. Machine Learning with Spark. Packt Publishing Ltd, 2015.
- Note:
- Further information:
- https://courses.fit.cvut.cz/MI-DDM/
- No time-table has been prepared for this course
- The course is a part of the following study plans:
-
- Master branch Knowledge Engineering, in Czech, 2016-2017 (elective course)
- Master branch Computer Security, in Czech, 2016-2019 (elective course)
- Master branch Computer Systems and Networks, in Czech, 2016-2019 (elective course)
- Master branch Design and Programming of Embedded Systems, in Czech, 2016-2019 (elective course)
- Master branch Web and Software Engineering, spec. Info. Systems and Management, in Czech, 2016-2019 (elective course)
- Master branch Web and Software Engineering, spec. Software Engineering, in Czech, 2016-2019 (elective course)
- Master branch Web and Software Engineering, spec. Web Engineering, in Czech, 2016-2019 (elective course)
- Master program Informatics, unspecified branch, in Czech, version 2016-2019 (elective course)
- Master branch System Programming, spec. System Programming, in Czech, 2016-2019 (elective course)
- Master branch System Programming, spec. Computer Science, in Czech, 2016-2017 (elective course)
- Master specialization Computer Science, in Czech, 2018-2019 (elective course)
- Master branch Knowledge Engineering, in Czech, 2018-2019 (elective course)