Load the large-scale data and read the files sentence by sentence, because a combination of words only carries meaning within a single sentence.
Based on this fact, building the N-gram model from meaningful word combinations gives us better predictions.
The model format is <key, value> : <word combination, count>
Take one sentence as an example: "how to build Ngram model from scratch". Each of its 2-grams and 3-grams becomes one <key, value> pair.
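
As a rough sketch of how the 2-gram and 3-gram pairs for this example sentence could be produced, the snippet below splits the sentence on whitespace and emits every 2-word and 3-word window with a count of 1. The function name `ngram_pairs`, the whitespace tokenization, and the initial count of 1 are assumptions made for illustration, not details taken from the project.

```python
from typing import Iterator, Tuple


def ngram_pairs(sentence: str, n_min: int = 2, n_max: int = 3) -> Iterator[Tuple[str, int]]:
    """Yield (word combination, 1) for every n-gram with n_min <= n <= n_max.

    Hypothetical helper: tokenization is a plain whitespace split (assumption).
    """
    words = sentence.split()
    for n in range(n_min, n_max + 1):
        # Slide a window of n words across the sentence.
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n]), 1


pairs = list(ngram_pairs("how to build Ngram model from scratch"))
for key, value in pairs:
    print(f"<{key}, {value}>")
```

For the seven-word example sentence this yields six 2-grams and five 3-grams, starting with `<how to, 1>` and ending with `<model from scratch, 1>`.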
The reducer takes these <key, value> pairs as input and sums the counts for each specific prefix.
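
The summing step can be illustrated with a small in-memory sketch. It is not the actual reducer of a MapReduce framework: the name `reduce_counts` and the sample input are hypothetical, and a real job would receive the values already grouped by key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts of all pairs that share the same key (hypothetical helper)."""
    totals: Dict[str, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Made-up mapper output in which "how to" appears in two sentences.
mapped = [("how to", 1), ("to build", 1), ("how to", 1), ("how to build", 1)]
totals = reduce_counts(mapped)
print(totals)
```

Here the two `("how to", 1)` pairs collapse into a single entry with a count of 2, while the other keys keep a count of 1.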