CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2024/2025

Analysis and Recognition of Multidimensional Data

The course is not on the list. Without time-table.
Code:
F7ADTARVD
Completion:
ZK (examination)
Credits:
5
Range:
14P+7C
Language:
English
Course guarantor:
Olga Štěpánková
Lecturer:
Olga Štěpánková
Tutor:
Václav Křemen, Olga Štěpánková, Lenka Vysloužilová
Supervisor:
Department of Natural Sciences
Synopsis:

The course offers an overview of tools for knowledge extraction from data and demonstrates their use on practical tasks with the open-source software R (the R Project). Special attention is paid to the illustrative presentation of the results obtained at each step, which greatly facilitates communication with the data owner (e.g. a doctor), who can then better help choose further directions of the search. Topics include clustering; improving model quality by combining multiple base models (bagging, boosting, AdaBoost); data dimension reduction and feature selection (e.g. PCA, ICA, factor analysis); and anomaly detection.

Requirements:

Form of verification of study results: oral examination.

As standard, the course is taught in person, with lectures and exercises. If fewer than 5 students are enrolled, teaching may take the form of guided self-study with regular consultations. In this case, in addition to the examination, the student is required to produce a written study on an assigned topic.

For the combined form of study:

Teaching takes the form of guided self-study with regular consultations. In addition to the examination, the student is required to prepare a written study on a given topic.

Syllabus of lectures:

1. Basic concepts for data description, machine learning and recognition: observation, feature, feature space, classification.

2. Knowledge discovery in data: description and methodology of the CRISP-DM process. Exploratory analysis and visualization of multidimensional data.

3. Clustering for modelling unlabelled (unclassified) data: basic algorithms. Evaluation of the resulting model and its application.

4. Basic procedures for modelling labelled (classified) data: the nearest neighbour method and decision tree induction, and their properties. Examples of applications.

5. Measures for comparing the performance of different classification models (accuracy, specificity, ..., ROC curve). Methods for estimating model performance: cross-validation, bootstrapping, learning curve.

6. Changing the data representation with support vector machines (SVM). An example illustrating how a derived attribute can replace several others.

7. Construction of association rules for unlabelled (unclassified) data and their use.

8. Methods for improving the quality of processed data: identification of outliers and incorrect values. Understanding data and data preparation: discretization, normalization, completion (imputation) of missing values, and data aggregation.

9. Improving model quality by combining multiple base models - bagging, boosting, AdaBoost.

10. Data dimension reduction and feature selection: principal component analysis (PCA), PCA for classification tasks, factor analysis, regression, partial least squares.

11. Strategies for testing the models being developed (multiple testing and the corresponding corrections).

12. Examples of other tools for data modelling: creation of regression trees, use of neural networks.

13. Recognition of anomalies in multivariate data.

14. Emerging topics in data mining (DM), e.g. working with structured data.


Syllabus of tutorials:

Exercises take the form of practical projects in which students apply and verify the knowledge acquired in lectures.

Study Objective:
Study materials:

Memon Q. A., Khoja S. A.: Data Science: Theory, Analysis and Applications. CRC Press, 2019

Recommended:

Daróczi G.: Mastering Data Analysis with R. Packt Publishing, 2015, ISBN 978-1783982028

R software, freely downloadable at https://www.r-project.org/

Note:
Further information:
No time-table has been prepared for this course
The course is a part of the following study plans:
Data valid as of 2024-05-29
Updates to the above information can be found at https://bilakniha.cvut.cz/en/predmet7803206.html