Fulltext Systems

Code: 18FULS
Completion: KZ
Credits: 4
Range: 2+2
Language: Czech
Tomáš Liška
Department of Software Engineering

The Fulltext Systems course covers methods and algorithms for free-text processing, including search and compression methods.

Syllabus of lectures:

The Fulltext Systems lectures give a thorough explanation of methods used for managing, searching, and mining unstructured data. Students are introduced to approaches to full-text search: elementary methods and more sophisticated methods based on pattern preprocessing and text preprocessing. Both index-based and signature-based methods are also studied in depth.

The second substantial part of unstructured data management covers methods for data compression and data encoding. We study the Shannon-Fano and Huffman methods and their modifications. Last but not least, we also study dictionary-based methods, namely the work of Lempel, Ziv, and Welch.

Syllabus of tutorials:

1. Introduction to text information systems, basic search algorithm

2. Search algorithms, pattern matching methods: KMP, AC

3. Search algorithms, pattern matching methods: BM, CW, finite automaton

4. Index-based methods

5. Signature-based methods

6. Data compression methods, data encoding: binary, Fibonacci

7. Data compression methods, data encoding: Elias encoding methods and other methods

8. Coding-tree-based methods: Shannon-Fano, Huffman

9. Coding-tree-based methods: adaptive Huffman method

10. Dictionary-based data compression methods: Lempel-Ziv methods and their modifications, principles

11. Dictionary-based data compression methods: details of LZ77, LZ78, LZW and further modifications

12. Principles of data crawling and management of huge search indexes

13. Principles of distributed computing for managing data search and data mining

Study Objective:

The lectures provide a comprehensive view of the methods and approaches for processing, managing, and analytically retrieving information from large volumes of unstructured data. Students will draw heavily on their prior knowledge of graph theory, mainly trees. Their practical skills in applying data coding principles will be further extended.

Study materials:

Key references:

Melichar B., Text information systems, lectures 1 and 2
Melichar B., Text information systems, case studies

Recommended references:

Google search engine http://google.com

Time-table for winter semester 2020/2021:
Time-table is not available yet
Time-table for summer semester 2020/2021:
Time-table is not available yet
The course is a part of the following study plans:
Data valid to 2021-03-02
For updated information see http://bilakniha.cvut.cz/en/predmet24706205.html