CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2011/2012

Web Data Mining

Code: MI-DDW
Completion: Z,ZK
Credits: 4
Range: 2+1
Language: Czech
Lecturer:
Vojtěch Svátek (guarantor)
Tutor:
Vojtěch Svátek (guarantor), Milan Dojchinovski, Ivo Lašek
Supervisor:
Department of Software Engineering
Synopsis:

Students learn in detail about various search and data mining methods. They will be able to select a suitable method of automatic web data processing for a given application and to manage how it is applied.

Requirements:

Familiarity with basic principles of WWW data representation, such as the HTML language.

Syllabus of lectures:

1. Main topics of web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining.

2. An overview of practical web mining applications.

3. Web Content Mining: document indexing and retrieval in the web environment, Boolean and vector retrieval models, latent semantic indexing (LSI), result ranking, meta-search.

4. Web Content Mining: web document categorization and clustering.

5. Natural Language Processing methods used for web information retrieval: lemmatization, part-of-speech tagging, disambiguation, shallow syntactic parsing, etc.

6. Web Structure Mining: web crawling (spidering), link topology analysis, the PageRank and HITS methods.

7. Global analysis of the Web; social network analysis.

8. Web Usage Mining: mining user behavior on the web, internet marketing.

9. Information Extraction as a specific type of web content mining: wrapper-based vs. token-activated extraction.

10. Specific applications: opinion mining vs. fact mining, web spam analysis, comparison shopping, etc.

11. Web information integration, use of schema mappings.

12. Web Mining and its relation to the Semantic Web: automatic semantic annotation, ontology learning, Semantic Web search.

Syllabus of tutorials:
Study Objective:

Provide students with an overview of web mining technologies and enable them to apply some of these technologies in practice.

Study materials:

1. Chakrabarti, S. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2002. ISBN 1558607544.

2. Konchady, M. Building Search Applications: Lucene, LingPipe, and Gate. Mustru Publishing, 2008. ISBN 0615204252.

Note:
Time-table for winter semester 2011/2012:
Time-table is not available yet
Time-table for summer semester 2011/2012:
Thu 14:30–16:00, odd week: Dojchinovski M., room T9:350 (NBFIT PC classroom), Dejvice (lecture parallel 1, parallel no. 101)
Thu 16:15–17:45, odd week: Lašek I., room T9:350 (NBFIT PC classroom), Dejvice (lecture parallel 1, parallel no. 103)
Thu 14:30–16:00, even week: Lašek I., room T9:350 (NBFIT PC classroom), Dejvice (lecture parallel 1, parallel no. 102)
Thu 16:15–17:45, even week: Lašek I., room T9:350 (NBFIT PC classroom), Dejvice (lecture parallel 1, parallel no. 104)
Fri 12:45–14:15: Svátek V., room VE:RB-209 (VŠE lecture room), VŠE Žižkov (lecture parallel 1)
The course is a part of the following study plans:
Generated on 2012-7-9
For updated information see http://bilakniha.cvut.cz/en/predmet1433806.html