Efficient Text Pattern Matching
Code | Completion | Credits | Range | Language |
---|---|---|---|---|
NI-EVY | Z,ZK | 5 | 2P+1C | English |
- Course guarantor:
- Jan Holub
- Lecturer:
- Jan Holub
- Tutor:
- Jan Holub
- Supervisor:
- Department of Theoretical Computer Science
- Synopsis:
-
Students get knowledge of efficient algorithms for text pattern matching. They learn to use so called succinct data structures that are efficient in both access time and memory complexity. They will be able to use the knowledge in design of applications that utilize pattern matching.
- Requirements:
-
Knowledge of basic data structures, fundamentals of computer programming, and theory of finite automata.
- Syllabus of lectures:
-
1. Introduction, basic definitions, border array.
2. Text full index: Suffix array.
3. Text full index: Suffix tree, LCP construction.
4. Text full index: Factor, suffix automata, on-line construction.
5. Exact pattern matching algorithms.
6. FFT in pattern matching.
7. Succinct data structure: rank & select.
8. Succinct data structure: wavelet tree.
9. FM-Index.
10. Dictionary representation, spell checking.
11. Approximate pattern matching.
12. Pattern matching in bioinformatics and musicology.
13. Pattern matching in bioinformatics and musicology.
- Syllabus of tutorials:
-
1. Basic definitions.
2. Text full index
3. Exact pattern matching algorithms.
4. FFT in pattern matching.
5. Succinct data structure
6. FM-Index, approximate pattern matching.
- Study Objective:
-
The pattern matching is a key task in most applications. Its importance is increasing with enlargement of the mass of information collected. Efficient algorithms and data structures for pattern matching are therefore crucial in information processing, but also in other fields like security (e.g., intrusion detection, virus scans). This module provides an extensive overview of such algorithms and data structures.
- Study materials:
-
W. F. Smyth: ''Computing Patterns in Strings'', Pearson Addison Wesley (UK), 2003, 423 pp. ISBN-10: 0201398397
M. Crochemore, W. Rytter: ''Jewels of Stringology''. World Scientific Publishing Company, 2003. ISBN-10: 9810248970.
G. Navarro, M. Raffinot: ''Flexible Pattern Matching in Strings''. Cambridge University Press, 2008. ISBN-10: 0521039932.
M. Crochemore, C. Hancart, T. Lecroq: ''Algorithms on Strings''. Cambridge University Press, 2007. ISBN-10: 0521848997
- Note:
- Further information:
- https://courses.fit.cvut.cz/MI-EVY/
- Time-table for winter semester 2024/2025:
-
06:00–08:0008:00–10:0010:00–12:0012:00–14:0014:00–16:0016:00–18:0018:00–20:0020:00–22:0022:00–24:00
Mon Tue Wed Thu Fri - Time-table for summer semester 2024/2025:
- Time-table is not available yet
- The course is a part of the following study plans:
-
- Master specialization Computer Security, in Czech, 2020 (elective course)
- Master specialization Design and Programming of Embedded Systems, in Czech, 2020 (elective course)
- Master specialization Computer Systems and Networks, in Czech, 202 (elective course)
- Master specialization Management Informatics, in Czech, 2020 (elective course)
- Master specialization Software Engineering, in Czech, 2020 (elective course)
- Master specialization System Programming, in Czech, version from 2020 (elective course)
- Master specialization Web Engineering, in Czech, 2020 (elective course)
- Master specialization Knowledge Engineering, in Czech, 2020 (elective course)
- Master specialization Computer Science, in Czech, 2020 (PS)
- Mgr. programme, for the phase of study without specialisation, ver. for 2020 and higher (VO)
- Master Specialization Digital Business Engineering, 2023 (VO)
- Master specialization System Programming, in Czech, version from 2023 (elective course)
- Master specialization Computer Science, in Czech, 2023 (PS)