- Course guarantor:
- Department of Software Engineering
The Fulltext Systems course covers methods and algorithms for free-text processing, including searching and compression methods.
- Syllabus of lectures:
The Fulltext Systems lectures give a thorough explanation of the methods used for managing, searching, and mining unstructured data. Students are introduced to approaches to full-text searching: the elementary method and more sophisticated methods that pre-process the pattern or the text. Index-based and signature-based methods are also studied in depth.
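As an illustration of searching with pattern pre-processing, the following is a minimal sketch of the Knuth-Morris-Pratt (KMP) method listed in the tutorials. The function names `build_failure` and `kmp_search` are chosen here for illustration and are not taken from the course materials.

```python
def build_failure(pattern: str) -> list[int]:
    """Pre-process the pattern.

    fail[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it.
    """
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # reuse the pre-processing, never re-read the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return hits


if __name__ == "__main__":
    print(kmp_search("abracadabra", "abra"))
```

Because the text pointer never moves backwards, the search runs in time linear in the combined length of the text and the pattern, in contrast to the elementary method.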
The second major part of unstructured data management covers methods for data compression and data encoding. We study the Shannon-Fano and Huffman methods and their modifications. Last but not least, we also study dictionary-based methods, namely the work of Lempel, Ziv, and Welch.
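To make the coding-tree idea concrete, here is a minimal sketch of static Huffman code construction. It assumes symbol frequencies are counted from the input text itself; the function name `huffman_code` is illustrative only, and the exact bit patterns depend on how ties between equal weights are broken.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix-free code in which frequent symbols get shorter codewords."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}

    # Heap entries: (weight, tie_breaker, {symbol: codeword_so_far}).
    # The unique tie_breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two lightest subtrees under a new parent node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


if __name__ == "__main__":
    code = huffman_code("abracadabra")
    encoded = "".join(code[ch] for ch in "abracadabra")
    print(code, len(encoded))
```

The adaptive Huffman method covered in the tutorials differs in that the tree is updated while the data is being read, so no frequency table has to be transmitted.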
- Syllabus of tutorials:
1. Introduction to text information systems, basic search algorithm
2. Search algorithms, pattern matching methods: KMP, AC
3. Search algorithms, pattern matching methods: BM, CW, finite automaton
4. Index-based methods
5. Signature-based methods
6. Data compression methods, data encoding: binary, Fibonacci
7. Data compression methods, data encoding: Elias encoding methods and other methods
8. Coding-tree-based methods: Shannon-Fano, Huffman
9. Coding-tree-based methods: adaptive Huffman method
10. Dictionary-based data compression methods: the work of Lempel and Ziv and its modifications, principles
11. Dictionary-based data compression methods: details of LZ77, LZ78, LZW, and further modifications
12. Principles of data crawling and of managing huge data indexes for search
13. Principles of distributed computing for managing data search and data mining
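As an example of the integer encodings named in the tutorial list, the sketch below shows Fibonacci coding of positive integers. It assumes the usual convention in which bits are written from the smallest Fibonacci number upwards and the codeword is closed by an extra 1; the function names are illustrative.

```python
def fibonacci_encode(n: int) -> str:
    """Encode a positive integer as a Fibonacci codeword ending in '11'."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")

    # Fibonacci numbers 1, 2, 3, 5, 8, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    while fibs[-1] > n:
        fibs.pop()

    # Greedy decomposition from the largest number down; the greedy choice
    # never selects two consecutive Fibonacci numbers.
    bits = []
    remainder = n
    for f in reversed(fibs):
        if f <= remainder:
            bits.append("1")
            remainder -= f
        else:
            bits.append("0")

    # Emit the smallest Fibonacci number first, then the terminating 1.
    return "".join(reversed(bits)) + "1"


def fibonacci_decode(code: str) -> int:
    """Decode a single Fibonacci codeword produced by fibonacci_encode."""
    if len(code) < 2 or not code.endswith("11"):
        raise ValueError("a Fibonacci codeword must end in '11'")
    total = 0
    a, b = 1, 2
    for bit in code[:-1]:  # skip the terminating 1
        if bit == "1":
            total += a
        a, b = b, a + b
    return total


if __name__ == "__main__":
    for value in (1, 2, 3, 4, 11):
        print(value, fibonacci_encode(value))
```

Since the pair `11` can only occur at the end of a codeword, codewords can be concatenated and split again without any length prefix.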
- Study Objective:
The course provides a comprehensive view of the methods and approaches for processing, managing, and analytically retrieving information from large volumes of unstructured data. Students will make extensive use of their previous knowledge of graph theory, mainly trees. Their skills in applying data coding principles will be further extended.
- Study materials:
Melichar B., Text information systems, lectures 1 and 2
Melichar B., Text information systems, case studies
Google search engine http://google.com
- Further information:
- No time-table has been prepared for this course
- The course is a part of the following study plans:
- Aplikace softwarového inženýrství (Applications of Software Engineering; compulsory course of the specialization)