DB Technologies for Big Data

Code: BI-BIG.21
Completion: KZ (graded assessment)
Credits: 5
Range: 2P+2C (2 hours of lectures and 2 hours of tutorials per week)
Language: Czech
Course guarantor:
Monika Borkovcová
Lecturer:
Monika Borkovcová
Tutors:
Monika Borkovcová, Jan Matoušek
Supervising department:
Department of Software Engineering

Students will be introduced to the field of Big Data processing, where non-relational (NoSQL) database engines are typically used today. The course has a practical focus: after completing it, students should be able to choose suitable tools (mostly open source) and techniques, and to design and implement a simple, reproducible data-processing workflow (data collection, transformation/aggregation, presentation). Students become acquainted with various architectures for processing and storing big data. The theoretical foundations and the presentation of individual technologies are supplemented with concrete examples from practice.


Requirements:

Basic knowledge of relational databases and of working with the command line is recommended, as is familiarity with Docker.

Syllabus of lectures:

1. Introduction to the subject, distributed solutions, basic concepts (Big Data, cluster, distributed file systems, CAP theorem, ...)

2. NoSQL key-value database (Redis)

3. NoSQL document database (MongoDB)

4. NoSQL wide-column database (Apache Cassandra)

5-6. NoSQL graph database (Neo4j)

7-9. The Elastic Stack (Elasticsearch, Beats, Logstash, Kibana)

10. Hadoop ecosystem (Hadoop, MapReduce, HDFS, YARN)

11-12. Apache Spark

13. Credit test

Syllabus of tutorials:

1. Introduction to the laboratory environment

2. Introduction to working with a Cassandra cluster

3. Basics of Redis

4. Basics of MongoDB

5. Basics of Apache Cassandra

6. Basics of Neo4j

7. Basics of Elasticsearch

8. Options for presenting data using the ELK Stack

9. Basics of working with Apache Spark using the Scala language

10. Practical workshop on a selected topic

11. Consultations on the semester project

12. Defense of the semester project, part 1

13. Defense of the semester project, part 2

Study Objective:

After completing this course, students will be able to distinguish between the individual types of NoSQL databases and to work with Big Data. They will be able to design and implement suitable solutions for various use cases. They will learn to work with key-value, document, wide-column and graph NoSQL databases at a moderately advanced level. Practical and theoretical knowledge of the Elastic Stack ecosystem and a basic overview of the Hadoop ecosystem (MapReduce, HDFS, YARN, Apache Spark) are also part of the course. Finally, students will learn about data visualization options and the process of cleaning and transforming various datasets.

Study materials:

1. Holubová I., Minařík K., Novák D., Kosek J.: Big Data a NoSQL databáze. Grada, 2015. ISBN 978-80-247-5466-6.

2. Meier A., Kaufmann M.: SQL & NoSQL Databases. Springer, 2019. ISBN 978-3-658-24549-8.

3. Bradshaw S., Brazil E., Chodorow K.: MongoDB: The Definitive Guide: Powerful and Scalable Data Storage. O'Reilly Media, 2019. ISBN 9781491954461.

4. https://redis.io

5. https://cassandra.apache.org/

6. https://neo4j.com/

7. https://www.mongodb.com/

8. https://www.elastic.co/

Further information:
Timetable for winter semester 2024/2025:
Timetable is not available yet
Timetable for summer semester 2024/2025:
Timetable is not available yet
The course is part of the following study plans:
Data valid as of 2024-06-16
For updates to the information above, see https://bilakniha.cvut.cz/en/predmet6608206.html