Data processing
Code | Completion | Credits | Range |
---|---|---|---|
14PD | Z,ZK | 6 | 2P+4C |
- Course guarantor:
- Martin Šrotýř
- Lecturer:
- Michal Jeřábek, Martin Šrotýř, Miroslav Vaniš
- Tutor:
- Michal Jeřábek, Martin Šrotýř, Miroslav Vaniš
- Supervisor:
- Department of Applied Informatics in Transportation
- Synopsis:
-
Students learn to use tools for data processing and analysis and work through practical examples of the most common data processing techniques, including advanced options for presenting analysis results. As an advanced method, they also carry out an analysis using Bayesian networks. Finally, students independently analyze data from existing open systems.
- Requirements:
-
Ability to think logically, and knowledge of the basics of algorithm design and of at least one programming language, at a level appropriate to the year of study at a technical university.
- Syllabus of lectures:
-
Part 1 introduces data processing tools and is divided into 3 blocks:
Block 1: Introduction to R - environment, concept, basics, simple examples, basic libraries and their usage (students install R)
Block 2: Applied R - applied examples from practice, map library, retrieving data from different sources (GIS, RDBMS, CSV, etc.) and modifying it
Block 3: Advanced R - interactive presentation module (shiny), other modules by agreement
Part 2 deals with a specific model for data processing, Bayesian networks, and is also divided into 3 blocks:
Block 1: Basics of Bayesian networks, specialized software for Bayesian networks, modeling, basics of graph theory and probability.
Block 2: Preparing data for subsequent use of Bayesian networks, plotting the first Bayesian network, algorithms for network learning, parameters, inference; linking with GeNia.
Block 3: Performing inference in Bayesian networks.
- Syllabus of tutorials:
-
Part 1 introduces data processing tools and is divided into 3 blocks:
Block 1: Introduction to R - environment, concept, basics, simple examples, basic libraries and their usage (students install R)
Block 2: Applied R - applied examples from practice, map library, retrieving data from different sources (GIS, RDBMS, CSV, etc.) and modifying it
Block 3: Advanced R - interactive presentation module (shiny), other modules by agreement
Part 2 deals with a specific model for data processing, Bayesian networks, and is also divided into 3 blocks:
Block 1: Basics of Bayesian networks, specialized software for Bayesian networks, modeling, basics of graph theory and probability.
Block 2: Preparing data for subsequent use of Bayesian networks, plotting the first Bayesian network, algorithms for network learning, parameters, inference; linking with GeNia.
Block 3: Performing inference in Bayesian networks.
- Study Objective:
-
The primary aim of the course is to familiarize students with tools for data processing and analysis and to let them try out the most common data processing options, including advanced options for presenting analysis results.
- Study materials:
-
Jan Rauch, Milan Šimůnek: Dobývání znalostí z databází, LISp-Miner a GUHA. Praha: Oeconomica VŠE, 2014.
Petr Berka: Dobývání znalostí z databází. Praha: Academia, 2003.
Irena Holubová, Karel Minařík, David Novák, Jiří Kosek: Big Data a NoSQL databáze.
Arun K. Somani, Ganesh Chandra Deka: Big Data Analytics. CRC Press, 2017.
- Note:
- Timetable for winter semester 2024/2025:
- Timetable is not available yet
- Timetable for summer semester 2024/2025:
- Timetable is not available yet
- The course is a part of the following study plans:
-
- Master Full-Time IS (CS) from 2022/23 (compulsory course)
- Master Full-Time IS (CS) from 2023/24 (compulsory course)
- Master Full-Time IS (CS) from 2024/25 (compulsory course)