Information Retrieval Terms
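Keyword is a descriptor (index term) used to build a document representative.
Document Representative (inverted document) is the reduced description of a document that is used for retrieval in place of its full text. It is produced by removing high and low frequency words, stripping suffixes, and merging equivalent stems.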
High Frequency Words occur so often that they do not help discriminate between documents. They tell us little about the content of any particular document.
Low Frequency Words occur too rarely to contribute significantly to a document representative, so they are removed as well.
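The two frequency cutoffs above can be sketched in a few lines of Python. This is only an illustration: the function name filter_by_frequency and the cutoff values 2 and 4 are assumptions made for this example, not values taken from the notes, and a real system would tune the cutoffs to its collection.

```python
from collections import Counter

def filter_by_frequency(tokens, low_cutoff=2, high_cutoff=4):
    """Keep only words whose count lies between the two cutoffs (inclusive).

    low_cutoff and high_cutoff are hypothetical values chosen for this toy text.
    """
    counts = Counter(tokens)
    return {word: n for word, n in counts.items()
            if low_cutoff <= n <= high_cutoff}

text = ("the cat sat on the mat the cat saw the dog "
        "the dog saw the cat a bird flew")
kept = filter_by_frequency(text.split())
print(kept)  # "the" is too frequent; "sat", "on", "mat", ... are too rare
```

Only the mid-frequency words survive as candidate keywords; everything above or below the cutoffs is discarded before the representative is built.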
Stemming presupposes that the variant forms of a word can be mapped onto a single representation, the stem. It rests on the assumption that words with the same stem have similar meanings.
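As a rough sketch of suffix stripping followed by stem merging, the Python below removes one suffix from each word and groups the words that end up with the same stem. The suffix list, the minimum stem length, and the names strip_suffix and merge_stems are all assumptions made for this example; this is a deliberately crude stripper, not a published stemming algorithm.

```python
# Illustrative suffix list (an assumption), ordered so longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "s")

def strip_suffix(word, min_stem=3):
    """Remove the first matching suffix, leaving at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word

def merge_stems(words):
    """Group words under their common stem (stem merging)."""
    groups = {}
    for word in words:
        groups.setdefault(strip_suffix(word), []).append(word)
    return groups

groups = merge_stems(["connect", "connected", "connecting", "connects",
                      "index", "indexes", "indexing"])
for stem, variants in groups.items():
    print(stem, variants)
```

The same sketch also shows where the assumption breaks down: strip_suffix("news") returns "new", which merges two words with unrelated meanings.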