Data Preprocessing
| Code | Completion | Credits | Range | Language |
|---|---|---|---|---|
| NI-PDD | Z,ZK | 5 | 2P+1C | Czech |
- Course guarantor:
- Marcel Jiřina
- Lecturer:
- Marcel Jiřina
- Tutor:
- Magda Friedjungová, Marcel Jiřina, Daniel Vašata
- Supervisor:
- Department of Applied Mathematics
- Synopsis:
-
Students learn to prepare raw data for further processing and analysis. They learn which algorithms can be used to extract information from various data sources, such as images, texts, and time series, and gain the skills to apply these concepts to specific problems in individual projects, e.g., extracting features from images or web pages.
- Requirements:
-
Fundamentals of statistics, FCD course in data mining.
The recommended prerequisite is BIE-VZD.
- Syllabus of lectures:
-
1. Introduction, KDDM standards, CRISP-DM, DM software.
2. Visualization and data exploration.
3. Methods for determining the significance of features.
4. Problems in data: preparation, representation, validation, cleaning, missing values, date format, conversion of non-numeric data.
5. Problems in data: discretization / binning, outliers, cluster analysis, false predictors, group balancing, transformation, sampling.
6. Data reduction: nearest neighbor rule, boundaries between groups, CNN (condensed nearest neighbor), distance graphs, Wilson editing, multi-edit method.
7. Data reduction: class balancing, Tomek links, SMOTE method, extended nearest neighbor rule.
8. Projection methods: PCA, ICA, LDA.
9. Preprocessing of time series and extraction of features.
10. Text preprocessing and feature extraction.
11. Image preprocessing and feature extraction: image description, filtering, edge detection, Fourier transform.
12. Image preprocessing and feature extraction: edge and area segmentation, description of objects in the image, feature and structural methods.
- Syllabus of tutorials:
-
1. Assignment of course projects.
2. Consultations.
3. Presentation of course projects.
- Study Objective:
-
Data preprocessing is crucial for successful data processing and is time-consuming, usually taking longer than the data processing itself. Knowledge of algorithms for extracting parameters from various data sources is a fundamental part of knowledge engineering.
- Study materials:
-
1. Pyle, D.: Data Preparation for Data Mining. Morgan Kaufmann, 1999. ISBN 1558605290.
2. Guyon, I. - Gunn, S. - Nikravesh, M. - Zadeh, L. A.: Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer, 2006. ISBN 3540354875.
3. García, S. - Luengo, J. - Herrera, F.: Data Preprocessing in Data Mining (Intelligent Systems Reference Library). Springer, 2015. ISBN 978-3319102467.
4. Blokdyk, G.: Data pre-processing (2nd Edition). CreateSpace Independent Publishing Platform, 2018. ISBN 978-1987493245.
- Note:
- Further information:
- https://courses.fit.cvut.cz/MI-PDD/
- Time-table for winter semester 2024/2025:
- Time-table is not available yet
- Time-table for summer semester 2024/2025:
- Time-table is not available yet
- The course is a part of the following study plans:
-
- Bachelor program Informatics, unspecified branch, in Czech, 2015-2020 (elective course)
- Bachelor branch Security and Information Technology, in Czech, 2015-2020 (elective course)
- Bachelor branch Computer Science, in Czech, 2015-2020 (elective course)
- Bachelor branch Computer Engineering, in Czech, 2015-2020 (elective course)
- Bachelor branch Information Systems and Management, in Czech, 2015-2020 (elective course)
- Bachelor branch Web and Software Engineering, spec. Software Engineering, in Czech, 2015-2020 (elective course)
- Bachelor branch Web and Software Engineering, spec. Web Engineering, in Czech, 2015-2020 (elective course)
- Bachelor branch Web and Software Engineering, spec. Computer Graphics, in Czech, 2015-2020 (elective course)
- Master specialization Computer Science, in Czech, 2018-2019 (elective course)
- Bachelor branch Knowledge Engineering, in Czech, 2018-2020 (elective course)
- Master specialization Computer Security, in Czech, 2020 (elective course)
- Master specialization Design and Programming of Embedded Systems, in Czech, 2020 (elective course)
- Master specialization Computer Systems and Networks, in Czech, 2020 (elective course)
- Master specialization Management Informatics, in Czech, 2020 (elective course)
- Master specialization Software Engineering, in Czech, 2020 (elective course)
- Master specialization System Programming, in Czech, version from 2020 (elective course)
- Master specialization Web Engineering, in Czech, 2020 (elective course)
- Master specialization Knowledge Engineering, in Czech, 2020 (PS)
- Master specialization Computer Science, in Czech, 2020 (elective course)
- Master program, for the phase of study without specialization, version for 2020 and later (VO, elective course)
- Bachelor specialization Information Security, in Czech, 2021 (elective course)
- Bachelor specialization Management Informatics, in Czech, 2021 (elective course)
- Bachelor specialization Computer Graphics, in Czech, 2021 (elective course)
- Bachelor specialization Computer Engineering, in Czech, 2021 (elective course)
- Bachelor program, unspecified specialization, in Czech, 2021 (elective course)
- Bachelor specialization Web Engineering, in Czech, 2021 (elective course)
- Bachelor specialization Artificial Intelligence, in Czech, 2021 (elective course)
- Bachelor specialization Computer Science, in Czech, 2021 (elective course)
- Bachelor specialization Software Engineering, in Czech, 2021 (elective course)
- Bachelor specialization Computer Systems and Virtualization, in Czech, 2021 (elective course)
- Bachelor specialization Computer Networks and Internet, in Czech, 2021 (elective course)
- Study plan for Ukrainian refugees (elective course)
- Master Specialization Digital Business Engineering, 2023 (VO)
- Master specialization System Programming, in Czech, version from 2023 (elective course)
- Master specialization Computer Science, in Czech, 2023 (elective course)
- Bachelor specialization Information Security, in Czech, 2024 (elective course)
- Bachelor program, unspecified specialization, in Czech, 2024 (elective course)
- Bachelor specialization Management Informatics, in Czech, 2024 (elective course)
- Bachelor specialization Computer Graphics, in Czech, 2024 (elective course)
- Bachelor specialization Software Engineering, in Czech, 2024 (elective course)
- Bachelor specialization Web Engineering, in Czech, 2024 (elective course)
- Bachelor specialization Computer Networks and Internet, in Czech, 2024 (elective course)
- Bachelor specialization Computer Engineering, in Czech, 2024 (elective course)
- Bachelor specialization Computer Systems and Virtualization, in Czech, 2024 (elective course)
- Bachelor specialization Artificial Intelligence, in Czech, 2024 (elective course)
- Bachelor specialization Computer Science, in Czech, 2024 (elective course)