Big Data tools and architecture
Code | Completion | Credits | Range | Language |
---|---|---|---|---|
18BIG | Z (assessment) | 3 | 1P+1C (1 h lecture + 1 h tutorial weekly) | Czech |
- Course guarantor:
- Petr Pokorný
- Lecturer:
- Petr Pokorný
- Tutor:
- Petr Pokorný
- Supervisor:
- Department of Software Engineering
- Synopsis:
A practically oriented course. After completing it, students will understand the basic tools and procedures used in modern Big Data repositories (Lakehouses). They will have a basic understanding of integration with other systems (data consumption and data provisioning) and will understand the architecture of modern analytics platforms with respect to the business data model, data governance, orchestration, and data freshness. The course also introduces the Spark distributed computing framework, machine learning model management tools (MLOps), and data visualization.
- Requirements:
Knowledge of SQL databases is an advantage.
- Syllabus of lectures:
1. Basic description and evolution of data warehouses (DWH), Data Lakes, and Lakehouses; PaaS and IaaS; Business Driven Development
2. Data sources and their advantages: message queues (Kafka, Event Hubs), object storage, JDBC; selected integration patterns
3. Lakehouse system design: medallion architecture layers and their purpose
4. Creating a core business model in the Silver layer: physical model, SQL DDL, dbt
5. Data governance: security, sharing models, data sensitivity, data quality, data lineage
6. Data flows, Spark processing, scaling, and the impact of distributed computing
7. Orchestration: different approaches, change data capture (CDC); tools: Airflow, Delta Live Tables, Dagster
8. NoSQL and key-value stores (Cosmos DB, Redis Cache), Operational Data Store
9. MLOps: versioning, monitoring, and orchestration of ML models
10. Data presentation and visualization: Power BI
- Syllabus of tutorials:
- Study Objective:
The aim of the course is to introduce students to modern technological approaches and tools for working with big data.
- Study materials:
Recommended literature:
[1] Bill Inmon: Building the Data Lakehouse.
[2] The Big Book of Data Engineering, 2nd Edition: A collection of technical blogs, including code samples and notebooks.
- Note:
- Time-table for winter semester 2024/2025:
- Time-table for summer semester 2024/2025:
- Time-table is not available yet
- The course is a part of the following study plans:
- Aplikace informatiky v přírodních vědách (Applications of Informatics in Natural Sciences; elective course)