CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2024/2025

Big Data tools and architecture

Code: 18BIG
Completion: Z
Credits: 3
Range: 1P+1C
Language: Czech
Course guarantor:
Petr Pokorný
Lecturer:
Petr Pokorný
Tutor:
Petr Pokorný
Supervisor:
Department of Software Engineering
Synopsis:

This is a practically oriented course. After completing it, students will understand the basic tools and procedures used in modern Big Data repositories (Lakehouses). They will gain a basic understanding of integration with other systems (data consumption and data provisioning) and of the architecture of modern analytics platforms with respect to the business data model, data governance, orchestration, and data freshness. The course also introduces the Spark distributed computing framework, tools for managing machine learning models (MLOps), and data visualization.

Requirements:

Knowledge of SQL databases is a benefit.

Syllabus of lectures:

1. Basic description and evolution of DWH, Data Lake, Lakehouse, PaaS, IaaS, Business Driven Development

2. Data sources and their advantages: queues (Kafka, Event Hub), object storage, JDBC, selected integration patterns

3. Lakehouse system design, medallion architecture layers, their purpose

4. Creating a Core Business Model in Silver layer, physical model, DDL SQL, dbt

5. Data governance: security, sharing models, sensitivity, quality, data lineage

6. Data flows, Spark processing, scaling, impact of distributed computing

7. Orchestration: different approaches, change data capture. Tools: Airflow, Delta Live Tables, Dagster

8. NoSQL, Key-Value stores (CosmosDB, Redis Cache), Operational Data Store

9. MLOps: versioning, monitoring, and orchestration of ML models

10. Data presentation and visualization: PowerBI

Syllabus of tutorials:
Study Objective:

The aim of the course is to introduce students to modern technological approaches and tools for working with big data.

Study materials:

Recommended literature:

[1] Bill Inmon, Building the Data Lakehouse

[2] The Big Book of Data Engineering, 2nd Edition: a collection of technical blogs, including code samples and notebooks

Note:
Time-table for winter semester 2024/2025:
Wed 14:00–15:50, lecture (parallel 1), room TR:115, Trojanova 13, Pokorný P.
Time-table for summer semester 2024/2025:
Time-table is not available yet
The course is a part of the following study plans:
Data valid to 2024-12-13
For updated information see http://bilakniha.cvut.cz/en/predmet7917006.html