CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2023/2024
NOTICE: Study plans for the next academic year are available.

Informatics 4

Code Completion Credits Range Language
155IN4G Z,ZK 5 2P+2C Czech
Course guarantor:
Jan Pytel
Lecturer:
Jan Pytel
Tutor:
Jan Pytel
Supervisor:
Department of Geomatics
Synopsis:

The course introduces students to techniques for handling large amounts of data. It starts with preprocessing data using command-line tools before importing it into a database. The main focus is on relational databases, NoSQL databases, ElasticSearch, the R language, and the cloud.

Requirements:

Informatics 2 and Informatics 3

Syllabus of lectures:

1. Big data - evolution and basic concepts

2. Data preprocessing with command-line tools

3. Data preprocessing with command-line tools 2

4. Relational SQL databases - indexes, partitioning, performance tuning, ACID

5. NoSQL database - concepts

6. NoSQL database - Apache Cassandra

7. NoSQL database - graph databases (Neo4j), document-oriented databases

8. Cloud basics

9. Installing a NoSQL database in the cloud - hands-on redundancy, CAP theorem

10. Apache ecosystem I: Hadoop, HBase, Spark, Pig

11. ElasticSearch

12. Statistical language R

13. Statistical language R - in connection with Apache Spark

Syllabus of tutorials:

1. Big data - evolution and basic concepts

2. Data preprocessing with command-line tools

3. Data preprocessing with command-line tools 2

4. Relational SQL databases - indexes, partitioning, performance tuning, ACID

5. NoSQL database - concepts

6. NoSQL database - Apache Cassandra

7. NoSQL database - graph databases (Neo4j), document-oriented databases

8. Cloud basics

9. Installing a NoSQL database in the cloud - hands-on redundancy, CAP theorem

10. Apache ecosystem I: Hadoop, HBase, Spark, Pig

11. ElasticSearch

12. Statistical language R

13. Statistical language R - in connection with Apache Spark

Study Objective:

The aim is to familiarize students with techniques and tools for processing large amounts of data. Students will also gain a good understanding of how NoSQL databases work.

Study materials:

Apache Cassandra/Hadoop/HBase/Spark/Pig - http://www.apache.org/

Neo4j - https://neo4j.com/

ElasticSearch - https://www.elastic.co/

Language R - https://www.r-project.org

Note:
Time-table for winter semester 2023/2024:
Time-table is not available yet
Time-table for summer semester 2023/2024:
Time-table is not available yet
The course is a part of the following study plans:
Data valid to 2024-04-18
Updates to the information above can be found at https://bilakniha.cvut.cz/en/predmet7546306.html