Fulltext systems

Code	Completion	Credits	Range	Language
18FULS	KZ	4	2+2	Czech

Lecturer:

Tomáš Liška

Tutor:

Tomáš Liška

Supervisor:

Department of Software Engineering in Economy

Synopsis:

The course Fulltext Systems covers methods and algorithms for free-text processing, including searching and compression methods.

Requirements:

Syllabus of lectures:

The lecture Fulltext Systems provides a thorough explanation of the methods used for managing, searching and mining unstructured data. Students become acquainted with approaches to and methods of full-text searching: the elementary method, and more sophisticated methods based on pattern pre-processing and text pre-processing. Both index-based and signature-based methods are also studied in depth.
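To illustrate what pattern pre-processing means, here is a minimal Python sketch of the Knuth-Morris-Pratt (KMP) method, one of the pattern matching algorithms covered in the tutorials. It assumes a single pattern and in-memory strings; the function names `failure_function` and `kmp_search` are hypothetical and not taken from the course materials. The pattern is pre-processed once into a failure table, which lets the search scan the text left to right without ever moving backwards in it.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched border
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next character can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start indices of all (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, reuse the pre-processed table instead of re-reading the text.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue so that overlapping matches are found
    return matches


print(kmp_search("abababca", "abab"))  # → [0, 2]
```

The table costs O(m) time to build for a pattern of length m, and the search then runs in O(n) time over a text of length n.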

The second substantial part of unstructured data management consists of methods for data compression and data encoding. We study the Shannon-Fano and Huffman methods and their modifications. Last but not least, we also study dictionary-based methods, i.e. the work of Lempel, Ziv and Welch (LZ77, LZ78, LZW).
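The following minimal Python sketch of LZW compression and decompression shows how a dictionary-based method builds its phrase table on the fly. It assumes 8-bit characters (an initial dictionary of 256 single-character entries) and a dictionary that grows without limit; practical implementations usually cap the code width and reset or freeze the dictionary. The names `lzw_compress` and `lzw_decompress` are hypothetical.

```python
def lzw_compress(data: str) -> list[int]:
    # Initial dictionary: every single character with codes 0..255 (assumes 8-bit characters).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""  # longest phrase matched so far
    codes = []
    for c in data:
        wc = w + c
        if wc in dictionary:
            w = wc  # keep extending the current phrase
        else:
            codes.append(dictionary[w])  # emit the code of the longest known phrase
            dictionary[wc] = next_code   # register the new phrase
            next_code += 1
            w = c
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the phrase being defined in this very step.
            entry = w + w[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[0]  # rebuild the encoder's dictionary entry
        next_code += 1
        w = entry
    return "".join(out)


print(lzw_compress("TOBEORNOTTOBEORTOBEORNOT"))
# → [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
```

The decoder reconstructs the same dictionary from the code stream alone, so the dictionary itself never has to be transmitted.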

Syllabus of tutorials:

1. Introduction to text information systems, the basic search algorithm

2. Search algorithms, pattern matching methods: KMP (Knuth-Morris-Pratt), AC (Aho-Corasick)

3. Search algorithms, pattern matching methods: BM (Boyer-Moore), CW (Commentz-Walter), finite automata

4. Index-based methods

5. Signature-based methods

6. Data compression methods, data encoding: binary and Fibonacci codes

7. Data compression methods, data encoding: Elias codes and other methods

8. Coding-tree-based methods: Shannon-Fano, Huffman

9. Coding-tree-based methods: the adaptive Huffman method

10. Dictionary-based data compression methods: principles of the Lempel-Ziv methods and their modifications

11. Dictionary-based data compression methods: details of LZ77, LZ78, LZW and further modifications

12. Principles of data crawling and of managing huge data indexes for search

13. Principles of managing data search and data mining with distributed computing
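Tutorials 6 and 7 cover integer codes. As a hedged illustration, the sketch below produces Elias gamma and Fibonacci codewords for positive integers, returned as strings of '0'/'1' characters for readability (an assumption; a real encoder would pack the bits). The function names `elias_gamma` and `fibonacci_code` are hypothetical. Elias gamma writes floor(log2(n)) zeros followed by the binary form of n; the Fibonacci code writes the Zeckendorf representation of n over the Fibonacci numbers 1, 2, 3, 5, 8, ... least significant first and appends a terminating 1, so every codeword ends in "11".

```python
def elias_gamma(n: int) -> str:
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]  # binary form without the '0b' prefix
    # Unary length prefix (len - 1 zeros), then the value itself.
    return "0" * (len(binary) - 1) + binary


def fibonacci_code(n: int) -> str:
    if n < 1:
        raise ValueError("Fibonacci coding is defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    # Greedy Zeckendorf representation: take the largest Fibonacci numbers first.
    bits = ["0"] * len(fibs)
    remaining = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remaining:
            bits[i] = "1"
            remaining -= fibs[i]
    # Drop unused high positions, then append the terminating '1'.
    last_one = max(i for i, b in enumerate(bits) if b == "1")
    return "".join(bits[: last_one + 1]) + "1"


print([elias_gamma(n) for n in (1, 2, 3, 4, 9)])
# → ['1', '010', '011', '00100', '0001001']
print([fibonacci_code(n) for n in (1, 2, 3, 4, 11)])
# → ['11', '011', '0011', '1011', '001011']
```

Because a Zeckendorf representation never uses two consecutive Fibonacci numbers, "11" can only occur at the end of a codeword, which makes the Fibonacci code prefix-free and easy to resynchronize.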

Study Objective:

The course provides a comprehensive view of the methods and approaches for processing, managing and analytically retrieving information from large volumes of unstructured data. Students will make extensive use of their previous knowledge of graph theory, mainly trees, and will further develop their skills in applying data-coding principles.

Study materials:

Key references:

Melichar B., Text information systems, lectures 1 and 2
Melichar B., Text information systems, case studies

Recommended references:

Google search engine http://google.com

Note:

Time-table for winter semester 2011/2012:

Time-table is not available yet

Time-table for summer semester 2011/2012:

Time-table is not available yet

The course is a part of the following study plans: