Web Data Mining
Code | Completion | Credits | Range | Language of instruction |
---|---|---|---|---|
MIE-DDW | Z,ZK | 4 | 2P+1C | English |
- Course guarantor:
- Lecturer:
- Tutor:
- Course provided by:
- Department of Software Engineering
- Synopsis:
-
Students learn in detail about various search and data mining methods. They will be able to select a suitable method of automatic web data processing for a given application and to control how that method is applied.
- Requirements:
-
Familiarity with basic principles of WWW data representation, such as the HTML language.
- Syllabus of lectures:
-
1. Main topics of web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining.
2. An overview of practical web mining applications.
3. Web Content Mining: document indexing and retrieval in the web environment, Boolean and vector retrieval models, latent semantic indexing (LSI), results ordering, meta-search.
4. Web Content Mining: categorization and clustering of web documents.
5. Natural Language Processing methods used for web information retrieval: lemmatization, part-of-speech tagging, disambiguation, shallow syntactic parsing, etc.
6. Web Structure Mining: primary web browsing (crawling, spidering), link topology analysis, the PageRank and HITS methods.
7. Global analysis of the Web; social network analysis.
8. Web Usage Mining: mining for user behavior on the web, internet marketing.
9. Information Extraction as a specific type of web content mining: wrapper-based vs. token-activated extraction.
10. Specific applications: opinion mining vs. fact mining, web spam analysis, comparative shopping, etc.
11. Web information integration, use of schema mappings.
12. Web Mining and its relation to the Semantic Web: automatic semantic annotation, ontology learning, Semantic Web search.
- Syllabus of tutorials:
- Study objective:
-
Provide students with an overview of web mining technologies and enable them to apply some of these technologies in practice.
- Study materials:
-
1. Chakrabarti, S. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2002. ISBN 1558607544.
2. Konchady, M. Building Search Applications: Lucene, LingPipe, and Gate. Mustru Publishing, 2008. ISBN 0615204252.
- Note:
-
Course information and courseware are available at https://courses.fit.cvut.cz/MI-DDW/
- Further information:
- https://courses.fit.cvut.cz/MI-DDW/
- No timetable is prepared for this course
- The course is part of the following study plans: