Search Engine Specialist Site - ZaLab
 

Information Retrieval Guide: Introduction to Information Retrieval

1. Information retrieval using the Boolean model
  - An example information retrieval problem
  - A first take at building an inverted index
  - Processing Boolean queries
  - Boolean querying, extended Boolean querying, and ranked retrieval
  - References and further reading
  - Exercises


2. Dictionary terms and postings lists
  - Document delineation and character sequence decoding
    - Obtaining the character sequence in a document
    - Choosing a document unit
  - Determining dictionary terms
    - Tokenization
    - Dropping common terms: stop words
    - Normalization (equivalence classing of terms)
    - Stemming and lemmatization
  - Postings lists, revisited
    - Faster postings list intersection: Skip pointers
    - Phrase queries
      - Biword indexes
      - Positional indexes
      - Combination schemes
  - References and further reading
  - Exercises


3. Tolerant retrieval
  - Wildcard queries
    - General wildcard queries
      - Permuterm indexes
    - k-gram indexes
  - Spelling correction
    - Implementing spelling correction
    - Forms of spell correction
    - Edit distance
    - k-gram indexes
    - Context sensitive spelling correction
  - Phonetic correction
  - References and further reading


4. Index construction
  - Hardware basics
  - Blocked sort-based indexing
  - Single-pass in-memory indexing
  - Distributed indexing
  - Dynamic indexing
  - Other types of indexes
  - References and further reading
  - Exercises


5. Index compression
  - Statistical properties of terms in information retrieval
    - Heaps' law: Estimating the number of term types
    - Zipf's law: Modeling the distribution of terms
  - Dictionary compression
    - Dictionary-as-a-string
    - Blocked storage
  - Postings file compression
    - Variable byte codes
  - References and further reading
  - Exercises


6. Term weighting and vector space models
  - Parametric and zone indexes
    - Weighted zone scoring
    - Learning weights
  - Term frequency and weighting
    - Inverse document frequency
    - Tf-idf weighting
  - Variants in tf-idf functions
    - Sublinear tf scaling
    - Maximum tf normalization
    - Document length and Euclidean normalization
    - Scoring from term weights
  - The vector space model for scoring
    - Inner products
    - Queries as vectors
    - Document and query weighting schemes
    - Computing vector scores
  - References and further reading


7. Computing scores in a complete search system
  - Efficient scoring and ranking
    - Inexact top K document retrieval
    - Index elimination
    - Champion lists
    - Static quality scores and ordering
    - Impact ordering
    - Cluster pruning
  - Components of a basic information retrieval system
    - Tiered indexes
    - Query-term proximity
    - Designing parsing and scoring functions
    - Machine-learned scoring
    - Putting it all together
    - Interaction between vector space and other retrieval methods
      - Boolean retrieval
      - Wildcard queries
      - Phrase queries
  - References and further reading


8. Evaluation in information retrieval
  - Evaluating information retrieval systems and search engines
  - Standard test collections
  - Evaluation of unranked retrieval sets
  - Evaluation of ranked retrieval results
  - Assessing relevance
    - Document relevance: critiques and justifications of the concept
  - A broader perspective: System quality and user utility
    - System issues
    - User utility
    - Refining a deployed system
  - Results snippets
  - References and further reading
  - Exercises


9. Relevance feedback and query expansion
  - Relevance feedback and pseudo-relevance feedback
    - The Rocchio algorithm for relevance feedback
    - Probabilistic relevance feedback
    - When does relevance feedback work?
    - Relevance feedback on the web
    - Evaluation of relevance feedback strategies
    - Pseudo-relevance feedback
    - Indirect relevance feedback
    - Summary
  - Global methods for query reformulation
    - Vocabulary tools for query reformulation
    - Query expansion
    - Automatic thesaurus generation
  - References and further reading
  - Exercises


10. XML retrieval
  - Basic XML concepts
  - Challenges in XML retrieval
  - A vector space model for XML retrieval
  - Evaluation of XML retrieval
  - Content-centric vs. structure-centric XML retrieval
  - References and further reading
  - Exercises


11. Probabilistic information retrieval
  - Review of basic probability theory
  - The Probability Ranking Principle
    - The 1/0 loss case
    - The PRP with retrieval costs
  - The Binary Independence Model
    - Deriving a ranking function for query terms
    - Probability estimates in theory
    - Probability estimates in practice
    - Probabilistic approaches to relevance feedback
    - The assumptions of the Binary Independence Model
  - An appraisal and some extensions
    - An appraisal of probabilistic models
    - Okapi BM25: a non-binary model
    - Bayesian network approaches to IR
  - References and further reading
  - Exercises


12. Language models for information retrieval
  - The query likelihood model
    - Using query likelihood language models in IR
    - Estimating the query generation probability
  - Ponte and Croft's experiments
  - Language modeling versus other approaches in IR
  - Extended language modeling approaches
  - References and further reading


13. Text classification and Naive Bayes
  - The text classification problem
  - Naive Bayes text classification
    - Relation to multinomial unigram language model
  - The Bernoulli model
  - Feature selection
    - Mutual information
    - Chi-square feature selection
    - Frequency-based feature selection
    - Comparison of feature selection methods
  - Evaluation of text classification
  - References and further reading
  - Exercises


14. Vector space classification
  - Rocchio classification
  - k nearest neighbor
  - Linear vs. nonlinear classifiers
  - More than two classes
  - References and further reading
  - Exercises


15. Support vector machines and classifier design
  - Support vector machines: The linearly separable case
  - Soft margin classification
  - Nonlinear SVMs
  - Experimental data
  - Issues in the classification of text documents
    - Choosing what kind of classifier to use
    - Tweaking performance
  - References and further reading


16. Flat clustering
  - Clustering in information retrieval
  - Problem statement
    - Cardinality - the number of clusters
  - Evaluation of clustering
  - K-means
  - References and further reading
  - Exercises


17. Hierarchical clustering
  - Hierarchical agglomerative clustering
  - Single-link and complete-link clustering
    - Time complexity
  - Group-average agglomerative clustering
  - Centroid clustering
  - Divisive clustering
  - Cluster labeling
  - Implementation notes
  - References and further reading
  - Exercises


18. Matrix decompositions and Latent Semantic Indexing
  - Linear algebra review
    - Matrix decompositions
  - Term-document matrices and singular value decompositions
  - Low-rank approximations and latent semantic indexing
  - References and further reading


19. Web search basics
  - Background and history
  - Web characteristics
    - The web graph
    - Spam
  - Advertising as the economic model
  - The search user experience
    - User query needs
  - Index size and estimation
  - Near-duplicates and shingling
    - Shingling
  - References and further reading


20. Web crawling and indexes
  - Overview
    - Features a crawler must provide
    - Features a crawler should provide
  - Crawling
    - Crawler architecture
      - Distributing the crawler
    - DNS resolution
    - The URL frontier
  - Distributing indexes
  - Connectivity servers
  - References and further reading


21. Link analysis
  - The web as a graph
    - Anchor text and the web graph
  - PageRank
    - Markov chains
    - The PageRank computation
    - Topic-specific PageRank
  - Hubs and Authorities
    - Choosing the subset of the web
  - References and further reading
