CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2023/2024
NOTICE: Study plans for the next academic year are available.

Big Data tools and architecture

The course is not on the list. Without time-table.
Code    Completion  Credits  Range  Language
18BIG   Z           3        1P+1C  Czech
Course guarantor:
Lecturer:
Tutor:
Supervisor:
Department of Software Engineering
Synopsis:

A practically oriented course. After completing it, the student will understand the basic tools and procedures used in modern Big Data repositories (Lakehouses). The student will gain a basic understanding of integration with other systems (data consumption and data provisioning) and of the architecture of modern analytics platforms with respect to the business data model, data governance, orchestration, and data freshness. The course also introduces the Spark distributed computing framework, tools for managing machine learning models (MLOps), and data visualization.

Requirements:

Prior knowledge of SQL databases is an advantage.

Syllabus of lectures:

1. Basic description and evolution of DWH, Data Lake, Lakehouse, PaaS, IaaS, Business Driven Development

2. Data sources and their advantages: queues (Kafka, Event Hub), object storage, JDBC, selected integration patterns

3. Lakehouse system design, medallion architecture layers, their purpose

4. Creating a Core Business Model in Silver layer, physical model, DDL SQL, dbt

5. Data governance: security, sharing models, sensitivity, quality, data lineage

6. Data flows, Spark processing, scaling, impact of distributed computing

7. Orchestration: different approaches, change data capture. Tools: Airflow, Delta Live Tables, Dagster

8. NoSQL, Key-Value stores (CosmosDB, Redis Cache), Operational Data Store

9. MLOps: versioning and monitoring of ML models, their orchestration

10. Data presentation and visualization: PowerBI

Syllabus of tutorials:
Study Objective:

The aim of the course is to introduce students to modern technological approaches and tools for working with big data.

Study materials:

Recommended literature:

[1] Bill Inmon, Building the Data Lakehouse

[2] The Big Book of Data Engineering 2nd Edition - A collection of technical blogs, including code samples and notebooks

Note:
Further information:
No time-table has been prepared for this course
The course is a part of the following study plans:
Data valid to 2024-05-03
For updates of the above information, see https://bilakniha.cvut.cz/en/predmet7917006.html