Keyword Distribution Information
- Use a stemming algorithm to obtain the keyword distribution across the document collection
- Compute the indexing constant: log(N), where N is the number of documents
- Goal: identify words that are rare enough to likely carry content, yet frequent enough to describe several documents
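
The steps above can be sketched roughly as follows. This is a minimal illustration, not the method from the slides: `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemming algorithm, the `min_df` and `max_fraction` thresholds are assumed values chosen for the example, and scoring each stem as log(N) - log(df) is one plausible use of the indexing constant rather than something the outline states.

```python
import math
import re
from collections import Counter


def crude_stem(word):
    """Hypothetical toy stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def document_frequencies(docs):
    """Count, for each stem, the number of documents it appears in."""
    df = Counter()
    for doc in docs:
        # A set ensures each stem is counted at most once per document.
        stems = {crude_stem(w) for w in re.findall(r"[a-z]+", doc.lower())}
        df.update(stems)
    return df


def candidate_keywords(docs, min_df=2, max_fraction=0.5):
    """Return log(N) and the stems that are neither too rare nor too common.

    min_df and max_fraction are assumed thresholds, not values from the source.
    """
    n = len(docs)
    indexing_constant = math.log(n)  # log(N), N = number of documents
    df = document_frequencies(docs)
    selected = {}
    for stem, count in df.items():
        if min_df <= count <= max_fraction * n:
            # Assumed scoring: log(N) - log(df), larger for rarer stems.
            selected[stem] = indexing_constant - math.log(count)
    return indexing_constant, selected


docs = [
    "The cat chased the mouse",
    "Cats and dogs chase balls",
    "The dog sleeps on the mat",
    "The stock market fell sharply",
]
constant, keywords = candidate_keywords(docs)
print(constant, sorted(keywords))
```

In this small collection, "the" appears in too many documents to be selected and most other stems appear in only one, so only the stems shared by exactly two documents survive the filter. Note that the toy stemmer maps "chased" and "chase" to different stems, which is the kind of mismatch a proper stemming algorithm is meant to avoid.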