Big Data Technologies

Code	Completion	Credits	Range	Language
B0M33BDT	Z,ZK	4	2P+1C	Czech

Relations:

It is not possible to register for the course B0M33BDT if the student is concurrently registered for or has already completed the course BE0M33BDT (mutually exclusive courses).

The requirement for course B0M33BDT can be fulfilled by substitution with the course BE0M33BDT.

Course guarantor:

Jan Hučín, Petr Paščenko, Marek Sušický

Lecturer:

Petr Filas, Jan Hučín, Martin Oharek, Petr Paščenko, Marek Sušický

Tutor:

Alisa Benešová, Petr Filas, Jan Hučín, Michal Janeček, Martin Oharek, Petr Paščenko, Sergii Stamenov, Marek Sušický

Supervisor:

Department of Computer Science

Synopsis:

The objective of this elective course is to familiarize students with new trends and technologies for storing, managing, and processing Big Data. The course focuses on methods for data extraction and analysis, as well as on the selection of hardware infrastructure for managing persistent and streamed data, such as data from social networks. The course also presents how to apply traditional artificial intelligence and machine learning methods to Big Data analysis.

Requirements:

Seminars are run in the standard way. Students are expected to bring their own computers for editing scripts; computations are executed on a remotely accessed computer cluster. For the practical exercises, students will use a pre-loaded text database. The seminars focus on the practical application of the technologies to specific examples. Two short tests on the subject matter are scheduled during the semester.

Syllabus of lectures:

1. Introduction, Big Data processing motivation, requirements

2. Hadoop overview - all components and how they work together

i) Hadoop Common: The common utilities that support the other Hadoop modules.

ii) Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.

iii) Hadoop YARN: A framework for job scheduling and cluster resource management.

iv) Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

3. Introduction to MapReduce and how to use the pre-installed data. Basic skeleton for running a word histogram in Java

4. HDFS, NoSQL databases, HBase, Cassandra, SQL access, Hive

5. What is Mahout, what are the basic algorithms

6. Streamed data - real time processing

7. Twitter data processing, simple sentiment algorithm
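
Lecture 3 above introduces MapReduce through a word histogram. As a rough illustration only, the following Python sketch mimics the map, shuffle, and reduce phases in a single process. It is not the Java skeleton used in the course and it does not use Hadoop; the function names (map_phase, shuffle_phase, reduce_phase, word_histogram) and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain

# All names below are hypothetical and chosen for illustration;
# they do not come from Hadoop or from the course materials.

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_histogram(lines):
    """Run map, shuffle, and reduce sequentially over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]  # made-up input
    print(word_histogram(sample))
```

On a real cluster the mappers and reducers would run in parallel on different nodes and the shuffle would move data over the network; here the three phases are kept as separate functions only to make that division of work visible.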

Syllabus of tutorials:

1. The OpenStack cloud computing cluster: basic commands, virtualization.

2. Installing Hadoop: hardware and software requirements, administration (creating access), introduction to the basic setup on our cluster, monitoring. Running the word histogram in a single thread.

3. The bag of words notion, TF-IDF, run SVD, LDA.

4. Data manipulation, how to scale HDFS up and down, how to run and monitor the progress of a computation, how to organize the computation.

5. Running a random forest classification task using the Mahout algorithms; showing how much faster the MapReduce implementation is compared to a single thread on one machine.

6. Semester work presentation and course credit (zápočet)
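
Tutorial 3 above mentions the bag-of-words notion and TF-IDF. A minimal single-machine sketch in Python is given below, assuming whitespace tokenization, term frequency normalized by document length, and the plain logarithmic inverse document frequency ln(N / df). Other weighting variants exist, and the course may use a different one. The function name tf_idf and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting (one common variant, not necessarily the course's):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = ln(N / df(t)), with N documents and df(t) documents containing t
    """
    # Bag of words: each document becomes a list of lower-cased tokens.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

if __name__ == "__main__":
    docs = ["hadoop stores data", "spark processes data"]  # made-up input
    for doc_weights in tf_idf(docs):
        print(doc_weights)
```

With this variant a term that occurs in every document, such as "data" in the sample, gets weight zero, which is the intended effect: terms that do not distinguish documents are suppressed.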

Study Objective:

The goal of the course is to demonstrate the basic methods for processing Big Data on practical examples. The examples focus on statistical data processing.

Study materials:

Hadoop: The Definitive Guide, 4th Edition, by Tom White

Note:

Further information:

https://cw.fel.cvut.cz/wiki/courses/B0M33BDT

Time-table for winter semester 2025/2026:

Day	Time	Weeks	Room	Teacher	Parallel	Location
Wed	09:15–10:45	even week	KN:E-307	-	lecture parallel1, parallel nr. 999	Karlovo nám.
Wed	11:00–12:30	odd week	KN:E-310	-	lecture parallel1, parallel nr. 101	Karlovo nám.
Wed	12:45–14:15	odd week	KN:E-310	-	lecture parallel1, parallel nr. 102	Karlovo nám.
Wed	09:15–10:45	-	KN:A-310	Paščenko P., Sušický M.	lecture parallel1	Karlovo nám.

Time-table for summer semester 2025/2026:

Time-table is not available yet

The course is a part of the following study plans: