- While building a language model, we try to estimate the probability of a sentence or a document.
- Given sequences (sentences or documents) like $W = w_1 w_2 \dots w_N$,
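- the (bigram) language model will be $P(W) = P(w_1 w_2 \dots w_N) = \prod_{i=1}^{N} P(w_i \mid w_{i-1})$ for each sequence given by the above equation, with $w_0$ taken as a start-of-sentence symbol.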
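- Once we apply Maximum Likelihood Estimation (MLE), we get a value for each term $P(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i)}{C(w_{i-1})}$, where $C(\cdot)$ counts occurrences in the training data.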
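- Perplexity is the inverse of the $N$-th root of the likelihood, $PP(W) = P(w_1 w_2 \dots w_N)^{-1/N}$, where $N$ is the number of words in the sequence. So the lower the perplexity, the better the model. Note that the root is taken over the sequence length $N$, not over the n-gram order, so a bigram model does not use a square root. For more explanation on perplexity, visit this question.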
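As a rough illustration of the steps above, here is a minimal Python sketch of an MLE bigram model and its perplexity. It assumes whitespace tokenization, adds `<s>`/`</s>` sentence markers, and applies no smoothing, so any unseen bigram gives zero probability and infinite perplexity. The function names and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def train_bigram_mle(sentences):
    """MLE bigram estimates: P(cur | prev) = C(prev cur) / C(prev)."""
    context_counts = Counter()  # C(prev): times prev appears as a history
    bigram_counts = Counter()   # C(prev cur)
    for sent in sentences:
        # Hypothetical convention: <s> plays the role of w_0, </s> ends the sentence
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {pair: c / context_counts[pair[0]] for pair, c in bigram_counts.items()}

def sequence_log_prob(model, sentence):
    """log P(W) = sum of log P(w_i | w_{i-1}); -inf if any bigram is unseen."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_p = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = model.get((prev, cur), 0.0)
        if p == 0.0:
            return float("-inf")  # no smoothing in this sketch
        log_p += math.log(p)
    return log_p

def perplexity(model, sentence):
    """PP(W) = P(W) ** (-1/N), with N = number of predicted tokens (incl. </s>)."""
    n = len(sentence.split()) + 1
    return math.exp(-sequence_log_prob(model, sentence) / n)

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
model = train_bigram_mle(corpus)
print(model[("<s>", "I")])             # 2 of 3 sentences start with "I"
print(perplexity(model, "I am Sam"))   # (1/9) ** (-1/4), about 1.732
print(perplexity(model, "green Sam"))  # inf: "<s> green" never occurs in training
```

Working in log space avoids underflow when multiplying many small probabilities for long documents; a real model would also add smoothing so unseen bigrams do not drive the perplexity to infinity.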