Web Data Mining

The course is not on the list. Without timetable.
Code: MIE-DDW.16
Completion: Z,ZK
Credits: 5
Range: 2P+1C
Language: English
Department of Software Engineering

Students learn in detail about various search and data mining methods. They will be able to select a suitable method of automatic web data processing for a given application and to manage its use.

Requirements:

Familiarity with the basic principles of WWW data representation, such as the HTML language.

Syllabus of lectures:

1. Main topics of web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining.

2. An overview of practical web mining applications.

3. Web Content Mining: document indexing and retrieval in the web environment, Boolean and vector retrieval models, latent semantic indexing (LSI), result ranking, meta-search.

4. Web Content Mining: web document categorization and clustering.

5. Natural Language Processing methods used for web information retrieval: lemmatization, part-of-speech tagging, disambiguation, shallow syntactic parsing, etc.

6. Web Structure Mining: automated web traversal (crawling, spidering), link topology analysis, the PageRank and HITS methods.

7. Global analysis of the Web; social network analysis.

8. Web Usage Mining: mining user behavior on the web, internet marketing.

9. Information Extraction as a specific type of web content mining: wrapper-based vs. token-activated extraction.

10. Specific applications: opinion mining vs. fact mining, web spam analysis, comparative shopping, etc.

11. Web information integration using schema mapping.

12. Web Mining and its relation to the Semantic Web: automatic semantic annotation, ontology learning, Semantic Web search.

Syllabus of tutorials:
Study Objective:

Provide students with an overview of web mining technologies and enable them to apply some of these technologies in practice.

Study materials:

1. Chakrabarti, S. "Mining the Web: Discovering Knowledge from Hypertext Data". Morgan Kaufmann, 2002. ISBN 1558607544.

2. Konchady, M. "Building Search Applications: Lucene, LingPipe, and Gate". Mustru Publishing, 2008. ISBN 0615204252.

Further information:
No timetable has been prepared for this course.
Data valid to 2020-10-22
For updated information see http://bilakniha.cvut.cz/en/predmet4652906.html