CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2023/2024
NOTICE: Study plans for the following academic year are available.

Statistical Data Analysis

Code: B4M36SAN
Completion: Z,ZK (assessment and examination)
Credits: 6
Range: 2P+2C (2 lecture hours + 2 tutorial hours per week)
Language: Czech

It is not possible to register for the course B4M36SAN if the student is concurrently registered for or has already completed the course BE4M36SAN (mutually exclusive courses).

The requirement for course B4M36SAN can be fulfilled by substitution with the course BE4M36SAN.

Course guarantor:
Jiří Kléma
Lecturer:
Jiří Kléma
Tutor:
Alikhan Anuarbekov, Jan Blaha, Jiří Kléma, Zdeněk Míkovec, Tomáš Pevný
Supervisor:
Department of Computer Science
Synopsis:

This course builds on the skills developed in introductory statistics courses. It is practically oriented and serves as an introduction to applied statistics. It focuses mainly on multivariate statistical analysis and modelling, i.e., methods that help to understand, interpret, visualize and model potentially high-dimensional data. It can be seen as a purely statistical counterpart to machine learning and data mining courses.

Requirements:

General statistical concepts as covered in the course B0B01PST. Knowledge of linear classification, clustering and dimensionality reduction; see B4B33RPZ for details.

Syllabus of lectures:

1. Introduction, motivation, a course map, review of the basic statistical terms and methods.

2. Dimension reduction (PCA and kernel PCA).

3. Dimension reduction (other non-linear methods).

4. Clustering (basic methods, spectral clustering).

5. Clustering (biclustering, semi-supervised clustering).

6. Multivariate confirmatory analysis (ANOVA and MANOVA).

7. Discriminant analysis (categorical dependent variable, LDA, logistic regression).

8. Multivariate regression (continuous dependent variable, linear regression, p-values, overfitting).

9. Multivariate regression (non-linear models, polynomial and local regression).

10. Anomaly detection.

11. Robust statistics.

12. Empirical studies, their design and evaluation.

13. Power analysis.

14. The final review, spare lecture.

Syllabus of tutorials:

1. Programming in R, introduction.

2. R libraries, statistical packages, the swirl learning package.

3. Data visualization in R.

4. Dimension reduction - assignment.

5. Clustering - assignment.

6. Multivariate confirmatory analysis - assignment.

7. Discriminant analysis - assignment.

8. Mid-term test.

9. Multivariate linear regression - assignment.

10. Multivariate non-linear regression - assignment.

11. Anomaly detection - assignment.

12. Empirical study design - assignment.

13. Power analysis - assignment.

14. Spare lab, credits.

Study Objective:
Study materials:

1. Hair, J. F., et al.: Multivariate Data Analysis: A Global Perspective. 7th ed., Prentice Hall, 2009.

2. James, G., et al.: An Introduction to Statistical Learning with Applications in R. Springer, 2013.

Note:
Further information:
https://cw.fel.cvut.cz/wiki/courses/B4M36SAN
Time-table for winter semester 2023/2024:
Lecture (lecture parallel 1): 12:45–14:15, Kléma J., room KN:E-301, Karlovo nám., Šrámkova posluchárna K9
Tutorial parallel 101: 14:30–16:00, Kléma J., Míkovec Z., room KN:E-311, Karlovo nám., Lab K311
Tutorial parallel 102: 16:15–17:45, Blaha J., Pevný T., room KN:E-311, Karlovo nám., Lab K311
Tutorial parallel 103: 18:00–19:30, Anuarbekov A., room KN:E-311, Karlovo nám., Lab K311
Tutorial parallel 104: 14:30–16:00, Blaha J., room KN:E-328, Karlovo nám., Bourací učebna
Time-table for summer semester 2023/2024:
Time-table is not available yet
The course is a part of the following study plans:
Data valid as of 2024-04-22
Updates to the information above can be found at https://bilakniha.cvut.cz/en/predmet4702306.html