Document & Collection Analysis Engine (cont.)
Index document collection (cont.)
- compute the weight Wdk that associates each keyword k with this document d, using inverse document frequency weighting
- Wdk = TOTFREQ * [log(N) - log(DOCFREQ) + 1]
- where N is the number of documents in the collection
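A minimal sketch of the Wdk formula above, assuming that TOTFREQ is the number of times keyword k occurs in document d, that DOCFREQ is the number of documents in the collection containing k, and that "log" is the natural logarithm (the slide does not fix a base). The function name `keyword_weight` is hypothetical.

```python
import math


def keyword_weight(totfreq: int, docfreq: int, n_docs: int) -> float:
    """Wdk = TOTFREQ * [log(N) - log(DOCFREQ) + 1].

    Assumptions (not stated on the slide): totfreq counts occurrences of
    keyword k in document d, docfreq counts documents containing k, and
    log is the natural logarithm.
    """
    if not 1 <= docfreq <= n_docs:
        raise ValueError("DOCFREQ must lie between 1 and N")
    # Rare keywords (small DOCFREQ) get a larger boost; the "+ 1" keeps
    # keywords that occur in every document from being weighted to zero.
    return totfreq * (math.log(n_docs) - math.log(docfreq) + 1)


# A keyword occurring 3 times in d and appearing in 10 of 1000 documents:
print(keyword_weight(3, 10, 1000))
```

Note the role of the "+ 1" term: a keyword present in all N documents has log(N) - log(DOCFREQ) = 0, so its weight falls back to TOTFREQ rather than vanishing.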
Conclusion: IR Document & Collection (D&C) Analysis Engine
- a database holding a computational representation of each document in the collection
- a document representation is a list of word tokens, each with its associated Wdk and DOCFREQ
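The resulting database can be sketched as a mapping from document ID to a list of (token, Wdk, DOCFREQ) entries. This is an illustration under stated assumptions, not the engine's actual storage format: documents are assumed to be already tokenized, TOTFREQ is taken to be the token's count within the document, "log" is the natural logarithm, and the name `build_index` is hypothetical.

```python
import math
from collections import Counter


def build_index(docs: dict[str, list[str]]) -> dict[str, list[tuple[str, float, int]]]:
    """Map each document ID to its representation: (token, Wdk, DOCFREQ) entries.

    Assumes docs maps a document ID to its already-tokenized keywords and
    that TOTFREQ is the token's count within that document.
    """
    n = len(docs)
    # DOCFREQ: number of documents in which each token appears at least once.
    docfreq: Counter[str] = Counter()
    for tokens in docs.values():
        docfreq.update(set(tokens))

    index: dict[str, list[tuple[str, float, int]]] = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)  # per-document token frequencies
        representation = []
        for token, freq in sorted(counts.items()):
            df = docfreq[token]
            wdk = freq * (math.log(n) - math.log(df) + 1)
            representation.append((token, wdk, df))
        index[doc_id] = representation
    return index


index = build_index({"d1": ["cat", "sat", "cat"], "d2": ["dog", "sat"]})
for doc_id, rep in index.items():
    print(doc_id, rep)
```

Here "sat" occurs in both documents, so its Wdk in d1 reduces to its in-document frequency (1.0), while "cat" is boosted by the log(N) - log(DOCFREQ) term because it occurs in only one document.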