Data Mining
Code | Completion | Credits | Range | Language |
---|---|---|---|---|
BI-VZD | Z,ZK | 4 | 2P+2C | Czech |
- Course guarantor:
- Pavel Kordík
- Lecturer:
- Karel Klouda, Alexander Kovalenko, Ondřej Tichý, Daniel Vašata
- Tutor:
- Karel Klouda, Alexander Kovalenko, Ivan Rychtera, Ladislava Smítková Janků, Ondřej Tichý, Daniel Vašata
- Supervisor:
- Department of Applied Mathematics
- Synopsis:
-
Students are introduced to the basic methods of discovering knowledge in data. In particular, they learn the basic techniques of data preprocessing and multidimensional data visualization, statistical techniques of data transformation, and the fundamental principles of knowledge discovery methods. Students will understand the relationship between model bias and variance and know the fundamentals of assessing model quality. Data mining software is used extensively in the module. Students will be able to apply basic data mining tools to common problems (classification, regression, clustering).
- Requirements:
-
Knowledge of calculus, linear algebra, and probability theory is assumed.
- Syllabus of lectures:
-
1. Introduction to the field and applications
2. Decision trees; training, validation, and test sets
3. Ensemble methods (random forest, AdaBoost)
4. Hierarchical clustering, k-means algorithm
5. kNN (k-nearest neighbours)
6. Naive Bayes
7. Linear regression
8. Logistic regression
9. Ridge regression and regularisation
10. Dimensionality reduction
11. Neural networks
12. Natural language processing
- Syllabus of tutorials:
-
1. Jupyter notebooks and machine learning packages
2. Decision trees, hyperparameter tuning
3. Ensemble methods (random forest, AdaBoost)
4. Hierarchical clustering, k-means algorithm
5. kNN (k-nearest neighbours), cross-validation
6. Naive Bayes classifier
7. Linear regression
8. Logistic regression
9. Ridge regression
10. Dimensionality reduction
11. Neural networks
12. Natural language processing
- Study Objective:
-
The module aims to introduce students to a rapidly developing field: knowledge discovery in data.
- Study materials:
-
1. Data Mining: Practical Machine Learning Tools and Techniques, I. H. Witten, E. Frank, M. A. Hall, Elsevier, 2011, ISBN 978-0080890364.
2. Deep Learning, I. Goodfellow, Y. Bengio, A. Courville, MIT Press, 2016, ISBN 978-0262035613.
3. Machine Learning: A Probabilistic Perspective, K. P. Murphy, MIT Press, 2012, ISBN 978-0262018029.
- Note:
- Further information:
- https://courses.fit.cvut.cz/BI-VZD/
- Timetable for winter semester 2024/2025:
- The timetable is not available yet
- Timetable for summer semester 2024/2025:
- The timetable is not available yet
- The course is a part of the following study plans:
-
- Bachelor program Informatics, unspecified branch, in Czech, 2015-2020 (VO)
- Bachelor branch Security and Information Technology, in Czech, 2015-2020 (elective course)
- Bachelor branch Computer Science, in Czech, 2015-2020 (compulsory course of the specialization)
- Bachelor branch Computer Engineering, in Czech, 2015-2020 (elective course)
- Bachelor branch Information Systems and Management, in Czech, 2015-2020 (elective course)
- Bachelor branch Web and Software Engineering, spec. Software Engineering, in Czech, 2015-2020 (elective course)
- Bachelor branch Web and Software Engineering, spec. Web Engineering, in Czech, 2015-2020 (elective course)
- Bachelor branch Web and Software Engineering, spec. Computer Graphics, in Czech, 2015-2020 (elective course)
- Bachelor branch Knowledge Engineering, in Czech, 2018-2020 (compulsory course of the specialization)