Data Mining
Code  Completion  Credits  Range  Language 

BIVZD  Z,ZK  4  2P+2C  Czech 
 Lecturer:
 Daniel Vašata, Karel Klouda
 Tutor:
 Daniel Vašata, Klára Hájková, Karel Klouda
 Supervisor:
 Department of Applied Mathematics
 Synopsis:

Students are introduced to the basic methods of discovering knowledge in data. In particular, they learn the basic techniques of data preprocessing, multidimensional data visualization, statistical techniques of data transformation, and the fundamental principles of knowledge discovery methods. Students will understand the relationship between model bias and variance and know the fundamentals of assessing model quality. Data mining software is used extensively in the module. Students will be able to apply basic data mining tools to common problems (classification, regression, clustering).
 Requirements:

Knowledge of calculus, linear algebra, and probability theory is assumed.
 Syllabus of lectures:

1. Introduction to the field and applications
2. Decision trees; training, validation, and test sets
3. Ensemble methods (random forest, AdaBoost)
4. Hierarchical clustering, k-means algorithm
5. kNN (k-nearest neighbours)
6. Naive Bayes
7. Linear regression
8. Logistic regression
9. Ridge regression and regularisation
10. Dimensionality reduction
11. Neural networks
12. Natural language processing
 Syllabus of tutorials:

1. Jupyter notebooks and machine learning packages
2. Decision trees, hyperparameter tuning
3. Ensemble methods (random forest, AdaBoost)
4. Hierarchical clustering, k-means algorithm
5. kNN (k-nearest neighbours), cross-validation
6. Naive Bayes classifier
7. Linear regression
8. Logistic regression
9. Ridge regression
10. Dimensionality reduction
11. Neural networks
12. Natural language processing
 Study Objective:

The module aims to introduce students to a rapidly developing field: knowledge discovery in data.
 Study materials:

1. Data Mining: Practical Machine Learning Tools and Techniques, I. H. Witten, E. Frank, M. A. Hall, Elsevier, 2011, ISBN 9780080890364.
2. Deep Learning, I. Goodfellow, Y. Bengio, A. Courville, MIT Press, 2016, ISBN 9780262035613.
3. Machine Learning: A Probabilistic Perspective, K. P. Murphy, MIT Press, 2012, ISBN 9780262018029.
 Note:
 Further information:
 https://courses.fit.cvut.cz/BIVZD/
 Timetable for winter semester 2019/2020:

[Weekly timetable grid, two-hour slots from 06:00 to 24:00; entries not recoverable]
 Timetable for summer semester 2019/2020:
 Timetable is not available yet
 The course is a part of the following study plans:

 Information Technology, Version for those who enrolled in 2014 (in Czech) (elective course)
 Information Systems and Management, Version for those who enrolled in 2014 (in Czech) (elective course)
 Bc. Programme Informatics, in Czech, Version 2015 to 2019 (VO)
 Bc. Branch Security and Information Technology, in Czech, Version 2015 to 2019 (elective course)
 Bc. Branch Computer Science, in Czech, Version 2015 to 2019 (compulsory course of the specialization)
 Bc. Branch Computer Engineering, in Czech, Version 2015 to 2019 (elective course)
 Bachelor Branch Information Systems and Management, in Czech, Version 2015 to 2019 (elective course)
 Bachelor Branch Knowledge Engineering, in Czech, Version 2015, 2016 and 2017 (compulsory course of the specialization)
 Bachelor Branch WSI, Specialization Software Engineering, in Czech, Version 2015 to 2019 (elective course)
 Bachelor Branch, Specialization Web Engineering, in Czech, Version 2015 to 2019 (elective course)
 Bachelor Branch WSI, Specialization Computer Graphics, in Czech, Version 2015 to 2019 (elective course)
 Bachelor Branch Knowledge Engineering, in Czech, Version 2018 to 2019 (compulsory course of the specialization)