Logo ČVUT
CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2023/2024

Fulltext search

Login to KOS for course enrollment Display time-table
Code Completion Credits Range Language
18FV Z 2 2C Czech
Garant předmětu:
Lecturer:
Zuzana Petříčková
Tutor:
Zuzana Petříčková
Supervisor:
Department of Software Engineering
Synopsis:

The aim of the „Fulltext search“ course is to inform students about methods and data structures for efficient search of strings in text files and with the basic principles of searching text files in text databases.

Requirements:

To obtain credit for the course, students have to submit two seminar works – 1. Implementation and evaluation of a chosen string-searching algorithm, 2. design and evaluation of a simple index for searching over a text database. The topics should be approved by the course instructor.

Syllabus of lectures:
Syllabus of tutorials:

1. Introduction to search of strings in text files and to fulltext search, basic principles and definitions.

2. Sequential algorithms for string-search: naive algorithm, Boyer-Moore and its variants, Knuth-Morris-Pratt, …

3. Index based algorithms (suffix tree, …)

4. Index based algorithms (suffix automaton, …)

5. Algorithms for searching a finite set of patterns (Aho-Corasick, …) and infinite number of patterns (regular expressions)

6. Full text search: Boolean retrieval system and Inverted index

7. Parsing a document, tokenization, term vocabulary…

8. Index construction and index compression

9. Ranking search results, relevance, term weighting

10. Vector space model

11. Relevance feedback and query expansion

12. Introduction to Web search

13. Information retrieval frameworks

Study Objective:
Study materials:

[1] M. Crochemore, C. Hancart and T. Lecroq, Algorithms on Strings, Cambridge University

Press, 2007.

[2] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

[3] G. G. Chowdhury, Introduction to Modern Information Retrieval, 3rd Edition, Facet Publishing, 2017.

Note:
Time-table for winter semester 2023/2024:
Time-table is not available yet
Time-table for summer semester 2023/2024:
Time-table is not available yet
The course is a part of the following study plans:
Data valid to 2024-04-26
Aktualizace výše uvedených informací naleznete na adrese https://bilakniha.cvut.cz/en/predmet7697306.html