Content
The module begins with an introduction to traditional text information retrieval (IR), including Boolean retrieval, the vector space model, and tolerant retrieval. It then covers the technical basics of Web IR, starting with Web size estimation and duplicate detection, followed by link analysis and crawling. This leads on to modern search engine evaluation methods and various test collections. Finally, applications of classification and clustering in the IR domain are discussed. The theory is illustrated with examples from search systems such as Google, AltaVista, and Clusty.
The course covers algorithms, structures, and innovative systems that are relevant in the context of the World Wide Web or that the World Wide Web has made possible. Its core topics are Web search (Web crawling, text indexing, ranking mechanisms), the analysis and structure of the World Wide Web, data management (search, topologies, systems), and further current topics.
Lecturers
Teaching Assistants
Recommended Literature
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. Available online at nlp.stanford.edu/IR-book/
Participants
Computer science students (recommended from the third semester onward) and ITIS students.
Lecture and Exercise dates
Lectures take place Tuesdays, 14:15 – 15:45 in room 3703-023.
Tutorial sessions take place Thursdays, 16:30 – 18:00 in room 1101-F142.
Please refer to Stud.IP for more information.
Exam
The exam will be in English, and you can answer in English. All topics discussed in the lectures, exercises, and programming exercises are relevant.
Duration: 120 minutes.
Permitted aids: a non-programmable calculator and a dictionary.
Lecture notes
We mainly use the book "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, which is available online and as a PDF at nlp.stanford.edu/IR-book/.
Lectures and Dates
April 12, 2022 Boolean retrieval
April 19, 2022 Document ingestion, Dictionary and tolerant retrieval
April 26, 2022 Dictionary and tolerant retrieval, Indexing, Index compression
May 3, 2022 Index compression, Scoring, Term weighting, Vector space model
May 10, 2022 Evaluation
May 19, 2022 Query expansion
May 26, 2022 Query expansion (continued), Probabilistic information retrieval
May 31, 2022 Language models for IR
June 14, 2022 Text classification and Naive Bayes
June 21, 2022 Vector space classification
June 28, 2022 Learning to rank
July 5, 2022 Flat and hierarchical clustering
July 12, 2022 Link analysis
Exercises
Exercises and their solutions are published via Stud.IP.