Automata in Text Pattern Matching
Code  Completion  Credits  Range  Language 

MIAVY  Z,ZK  4  2P+1C  Czech 
 Course guarantor:
 Lecturer:
 Tutor:
 Supervisor:
 Department of Theoretical Computer Science
 Synopsis:

Searching in text (pattern matching), and in data more generally, poses problems with interesting solutions from both theoretical and practical perspectives. The data may be interpreted and searched as one-dimensional (text) or multidimensional (tree, picture). The object of the search may be known (a pattern: a string or a set specified by a regular expression) or unknown (for example, a regularity). Matching can be either exact or approximate. This course presents a taxonomy of searching problems and focuses on algorithms based on some kind of automaton (finite, pushdown, linear-bounded, or tree).
 Requirements:

Students are expected to know formal language theory and algorithms on finite automata (BIEAAG course). In particular, they should be familiar with the Chomsky hierarchy of languages, the subset construction, and the removal of epsilon-transitions.
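
As a refresher on the prerequisite, the following is a minimal sketch of the subset construction, applied to a small nondeterministic search automaton for exact matching. The function names (`subset_construction`, `search_nfa`, `find_occurrences`) and the dictionary encoding of the transition function are assumptions made for this illustration and are not taken from the course materials. The sketch assumes an automaton without epsilon-transitions and a text that uses only symbols from the given alphabet.

```python
from collections import deque


def subset_construction(alphabet, delta, start, finals):
    """Determinise an NFA without epsilon-transitions.

    delta maps (state, symbol) to a set of successor states.
    Returns (dfa_delta, dfa_start, dfa_finals); DFA states are frozensets.
    """
    dfa_start = frozenset([start])
    dfa_delta = {}
    dfa_finals = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A subset is final if it contains at least one final NFA state.
        if current & finals:
            dfa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(
                q for state in current for q in delta.get((state, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_delta, dfa_start, dfa_finals


def search_nfa(pattern, alphabet):
    """Nondeterministic search automaton for exact matching of `pattern`."""
    # Self-loop on the initial state for every symbol of the alphabet.
    delta = {(0, c): {0} for c in alphabet}
    # Chain of transitions spelling out the pattern.
    for i, c in enumerate(pattern):
        delta.setdefault((i, c), set()).add(i + 1)
    return delta, 0, {len(pattern)}


def find_occurrences(pattern, text, alphabet):
    """Return the end positions (exclusive) of all occurrences of pattern."""
    delta, start, finals = search_nfa(pattern, alphabet)
    dfa_delta, state, dfa_finals = subset_construction(
        alphabet, delta, start, finals
    )
    ends = []
    for i, c in enumerate(text):
        state = dfa_delta[(state, c)]
        if state in dfa_finals:
            ends.append(i + 1)
    return ends
```

For example, `find_occurrences("aba", "ababa", "ab")` reports the end positions 3 and 5, since the two occurrences of the pattern overlap.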
 Syllabus of lectures:

1. Finite automata, basic operations with finite automata. Taxonomy of pattern matching problems for exact and approximate matching. Forward pattern matching, models of searching algorithms. Nondeterministic search automata.
2. Deterministic search finite automata and their state complexity.
3. Construction of prefix and suffix automata. Construction of factor automata. Computation of borders and periods of text. Searching exact and approximate repetitions in text.
4. Searching other string regularities and other indexing automata applications.
5. Regular expressions with backreferences.
6. Synchronizing finite automata.
7. Locally testable languages.
8. Classes of deterministic and nondeterministic pushdown automata. Determinisation of pushdown automata.
9. Tree automata.
10. Tree pattern matching & indexing, nonlinear tree patterns.
11. Combinatorial pattern matching and indexing of multidimensional text.
12. Tree regular expressions.
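
To give a flavour of topic 3, the following is a minimal sketch that computes all borders and periods of a text. It uses the classical border-array (failure-function) computation rather than an automaton-based construction, so it should not be read as the method taught in the course. The function names `border_array` and `borders_and_periods` are assumptions made for this illustration.

```python
def border_array(text):
    """border[i] is the length of the longest proper border of text[:i]."""
    border = [0] * (len(text) + 1)
    k = 0
    for i in range(1, len(text)):
        # Fall back to shorter borders until the next symbol can extend one.
        while k > 0 and text[i] != text[k]:
            k = border[k]
        if text[i] == text[k]:
            k += 1
        border[i + 1] = k
    return border


def borders_and_periods(text):
    """Return the lengths of all nonempty proper borders and all periods."""
    n = len(text)
    border = border_array(text)
    borders = []
    k = border[n]
    # Every border of the text is a border of its longest border.
    while k > 0:
        borders.append(k)
        k = border[k]
    # A border of length b corresponds to a period of length n - b;
    # the length of the text itself is always a period.
    periods = [n - b for b in borders] + [n]
    return borders, periods
```

For the text `"abacaba"`, the sketch yields borders of lengths 3 and 1 (`"aba"` and `"a"`) and periods 4, 6, and 7.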
 Syllabus of tutorials:

The tutorials cover the same twelve topics as the lectures (see the syllabus of lectures above).
 Study Objective:

Students become familiar with algorithms for text, tree, and image pattern matching that are based on finite, pushdown, linear-bounded, and tree automata. They also become familiar with a taxonomy of pattern matching problems and learn the principles of constructing automata that solve these problems. Students can use this knowledge to develop pattern matching applications (for example, for DNA or data streams).
 Study materials:

1. Melichar, B.; Holub, J.; Polcar, T. Text Searching Algorithms. Volume I: Forward String Matching. Available from: https://psc.fit.cvut.cz/athens/TextSearchingAlgorithms/
2. Aho, A. V. Algorithms for Finding Patterns in Strings. In Handbook of Theoretical Computer Science, Algorithms and Complexity, 255-300. Elsevier, 1990. ISBN 9780444880710. DOI: 10.1016/B9780444880710.500102.
3. Alur, R.; Madhusudan, P. Visibly pushdown languages. In Proc. 36th Int. ACM Symposium on Theory of Computing (STOC), 2004.
4. Van Tang, N. A tighter bound for the determinization of visibly pushdown automata. In 11th International Workshop on Verification of Infinite-State Systems, INFINITY 2009, 2009.
5. Nowotka, D.; Srba, J. Height-Deterministic Pushdown Automata. In 32nd International Symposium on Mathematical Foundations of Computer Science, MFCS'07, 2007.
6. Černý, J. Poznámka k homogénnym experimentom s konečnými automatmi. Matematicko-fyzikálny časopis Slovenskej Akadémie Vied, 14: 208-216. Available from: https://dml.cz/handle/10338.dmlcz/126647
7. Pin, J.-E. On two combinatorial problems arising from automata theory. Combinatorial mathematics (Marseille-Luminy, 1981), 1983, Marseille-Luminy, pp. 535-548. Available from: https://hal.archivesouvertes.fr/hal00143937
8. Holub, J.; Štekr, S. Implementation of deterministic finite automata on parallel computers. Colloquium and Festschrift at the occasion of the 60th birthday of Derrick Kourie (Computer Science), Windy Brow, South Africa, 28 June 2008. Available from: http://hdl.handle.net/2263/9145
9. Zalcstein, Y. Locally testable languages. Journal of Computer and System Sciences 6, 151-167 (1972). Available from: https://doi.org/10.1016/S00220000(72)800205
10. Rogers, J.; Lambert, D. Extracting Forbidden Factors from Regular Stringsets. Proceedings of the 15th Meeting on the Mathematics of Language, 2017, London, pp. 36-46. Available from: https://www.aclweb.org/anthology/W173404/
 Note:
 Further information:
 https://courses.fit.cvut.cz/MIAVY
 No timetable has been prepared for this course
 The course is a part of the following study plans:

 Master branch Knowledge Engineering, in Czech, 2016-2017 (elective course)
 Master branch Computer Systems and Networks, in Czech, 2016-2019 (elective course)
 Master branch Design and Programming of Embedded Systems, in Czech, 2016-2019 (elective course)
 Master branch Web and Software Engineering, spec. Info. Systems and Management, in Czech, 2016-2019 (elective course)
 Master branch Web and Software Engineering, spec. Software Engineering, in Czech, 2016-2019 (elective course)
 Master branch Web and Software Engineering, spec. Web Engineering, in Czech, 2016-2019 (elective course)
 Master program Informatics, unspecified branch, in Czech, version 2016-2019 (VO)
 Master branch System Programming, spec. System Programming, in Czech, 2016-2019 (elective course)
 Master branch System Programming, spec. Computer Science, in Czech, 2016-2017 (compulsory course of the branch)
 Master branch Knowledge Engineering, in Czech, 2018-2019 (elective course)