Information Retrieval Guide: Introduction to Information Retrieval

1. Information retrieval using the Boolean model
- An example information retrieval problem
- A first take at building an inverted index
- Processing Boolean queries
- Boolean querying, extended Boolean querying, and ranked retrieval
- References and further reading
- Exercises

2. Dictionary terms and postings lists
- Document delineation and character sequence decoding
- Obtaining the character sequence in a document
- Choosing a document unit
- Determining dictionary terms
- Tokenization
- Dropping common terms: stop words
- Normalization (equivalence classing of terms)
- Stemming and lemmatization
- Postings lists, revisited
- Faster postings list intersection: Skip pointers
- Phrase queries
- Biword indexes
- Positional indexes
- Combination schemes
- References and further reading
- Exercises

3. Tolerant retrieval
- Wildcard queries
- General wildcard queries
- Permuterm indexes
- k-gram indexes
- Spelling correction
- Implementing spelling correction
- Forms of spell correction
- Edit distance
- k-gram indexes
- Context sensitive spelling correction
- Phonetic correction
- References and further reading

4. Index construction
- Hardware basics
- indexing
- Single-pass in-memory indexing
- Distributed indexing
- Dynamic indexing
- Other types of indexes
- References and further reading
- Exercises

5. Index compression
- Statistical properties of terms in information retrieval
- Heaps' law: Estimating the number of term types
- Zipf's law: Modeling the distribution of terms
- Dictionary compression
- Dictionary-as-a-string
- Blocked storage
- Postings file compression
- Variable byte codes
- References and further reading
- Exercises

6. Term weighting and vector space models
- Parametric and zone indexes
- Weighted zone scoring
- Learning weights
- Term frequency and weighting
- Inverse document frequency
- Tf-idf weighting
- Variants in tf-idf functions
- Sublinear tf scaling
- Maximum tf normalization
- Document length and Euclidean normalization
- Scoring from term weights
- The vector space model for scoring
- Inner products
- Queries as vectors
- Document and query weighting schemes
- Computing vector scores
- References and further reading

7. Computing scores in a complete search system
- Efficient scoring and ranking
- Inexact top K document retrieval
- Index elimination
- Champion lists
- Static quality scores and ordering
- Impact ordering
- Cluster pruning
- Components of a basic information retrieval system
- Tiered indexes
- Query-term proximity
- Designing parsing and scoring functions
- Machine-learned scoring
- Putting it all together
- Interaction between vector space and other retrieval methods
- Boolean retrieval
- Wildcard queries
- Phrase queries
- References and further reading

8. Evaluation in information retrieval
- Evaluating information retrieval systems and search engines
- Standard test collections
- Evaluation of unranked retrieval sets
- Evaluation of ranked retrieval results
- Assessing relevance
- Document relevance: critiques and justifications of the concept
- A broader perspective: System quality and user utility
- System issues
- User utility
- Refining a deployed system
- Results snippets
- References and further reading
- Exercises

9. Relevance feedback and query expansion
- Relevance feedback and pseudo-relevance feedback
- The Rocchio algorithm for relevance feedback
- Probabilistic relevance feedback
- When does relevance feedback work?
- Relevance feedback on the web
- Evaluation of relevance feedback strategies
- Pseudo-relevance feedback
- Indirect relevance feedback
- Summary
- Global methods for query reformulation
- Vocabulary tools for query reformulation
- Query expansion
- Automatic thesaurus generation
- References and further reading
- Exercises

10. XML retrieval
- Basic XML concepts
- Challenges in XML retrieval
- A vector space model for XML retrieval
- Evaluation of XML Retrieval
- Content-centric vs. structure-centric XML retrieval
- References and further reading
- Exercises

11. Probabilistic information retrieval
- Review of basic probability theory
- The Probability Ranking Principle
- The 1/0 loss case
- The PRP with retrieval costs
- The Binary Independence Model
- Deriving a ranking function for query terms
- Probability estimates in theory
- Probability estimates in practice
- Probabilistic approaches to relevance feedback
- The assumptions of the Binary Independence Model
- An appraisal and some extensions
- An appraisal of probabilistic models
- Okapi BM25: a non-binary model
- Bayesian network approaches to IR
- References and further reading
- Exercises

12. Language models for information retrieval
- The query likelihood model
- Using query likelihood language models in IR
- Estimating the query generation probability
- Ponte and Croft's Experiments
- Language modeling versus other approaches in IR
- Extended language modeling approaches
- References and further reading

13. Text classification and Naive Bayes
- The text classification problem
- Naive Bayes text classification
- Relation to multinomial unigram language model
- The Bernoulli model
- Feature selection
- Mutual information
- Chi2 feature selection
- Frequency-based feature selection
- Comparison of feature selection methods
- Evaluation of text classification
- References and further reading
- Exercises

14. Vector space classification
- Rocchio classification
- k nearest neighbor
- Linear vs. nonlinear classifiers
- More than two classes
- References and further reading
- Exercises

15. Support vector machines and classifier design
- Support vector machines: The linearly separable case
- Soft margin classification
- Nonlinear SVMs
- Experimental data
- Issues in the classification of text documents
- Choosing what kind of classifier to use
- Tweaking performance
- References and further reading

16. Flat clustering
- Clustering in information retrieval
- Problem statement
- Cardinality - the number of clusters
- Evaluation of clustering
- K-means
- References and further reading
- Exercises

17. Hierarchical clustering
- Hierarchical agglomerative clustering
- Single-link and complete-link clustering
- Time complexity
- Group-average agglomerative clustering
- Centroid clustering
- Divisive clustering
- Cluster labeling
- Implementation notes
- References and further reading
- Exercises

18. Matrix decompositions and Latent Semantic Indexing
- Linear algebra review
- Matrix decompositions
- Term-document matrices and singular value decompositions
- Low-rank approximations and latent semantic indexing
- References and further reading

19. Web search basics
- Background and history
- Web characteristics
- The web graph
- Spam
- Advertising as the economic model
- The search user experience
- User query needs
- Index size and estimation
- Near-duplicates and shingling
- Shingling
- References and further reading

20. Web crawling and indexes
- Overview
- Features a crawler must provide
- Features a crawler should provide
- Crawling
- Crawler architecture
- Distributing the crawler
- DNS resolution
- The URL frontier
- Distributing indexes
- Connectivity servers
- References and further reading

21. Link analysis
- The web as a graph
- Anchor text and the web graph
- Pagerank
- Markov chains
- The Pagerank computation
- Topic-specific Pagerank
- Hubs and Authorities
- Choosing the subset of the web
- References and further reading