An open toolkit for analyzing POS-tagged corpora: discovering words and n-grams, collocations, frequencies, concordances, and SOA