<img style="float:left; width:75%" src="images/dinosaur_ngrams.png"/>
Recap by the Berkeley NLP group
Let's combine them (for a 3-gram LM, $k=2$):
\begin{align} P_{SLI}(x_n|x_{n-1},x_{n-2}) &= \lambda_1 P(x_n|x_{n-1},x_{n-2})\\ &+ \lambda_2 P(x_n|x_{n-1})\\ &+ \lambda_3 P(x_n) \qquad \lambda_i > 0, \sum_i \lambda_i = 1 \end{align}
Every model can be fooled; avoid relying on any one exclusively!
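A minimal sketch of this interpolation in Python, using maximum-likelihood counts (the $\lambda$ weights below are illustrative placeholders; in practice they are tuned on held-out data):

```python
from collections import Counter

def train_counts(tokens):
    """Collect unigram, bigram, and trigram counts from a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri

def p_interp(w, w1, w2, uni, bi, tri, lambdas=(0.5, 0.3, 0.2)):
    """P_SLI(w | w1, w2): interpolate trigram, bigram, and unigram MLEs.

    w1 is the previous word, w2 the one before it; lambdas must sum to 1.
    """
    l1, l2, l3 = lambdas
    p_uni = uni[w] / sum(uni.values())
    p_bi = bi[(w1, w)] / uni[w1] if uni[w1] else 0.0
    p_tri = tri[(w2, w1, w)] / bi[(w2, w1)] if bi[(w2, w1)] else 0.0
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

tokens = "the ship sailed and the ship sank".split()
uni, bi, tri = train_counts(tokens)
print(p_interp("ship", "the", "and", uni, bi, tri))
```

Because the unigram term is always nonzero for seen words, the interpolated probability never collapses to zero just because a trigram was never observed.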
<img style="float:left" src="images/depLM.png"/>
$$P(\text{binoculars}|\text{saw})$$
is more informative than:
$$P(\text{binoculars}|\text{strong},\text{very},\text{with},\text{ship},\text{the},\text{saw})$$
Words and contexts are represented by sets of numbers (vectors):
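As a toy illustration (the vectors below are made-up 3-dimensional values, not learned embeddings), related words get similar vectors, which we can measure with cosine similarity:

```python
import numpy as np

# Toy 3-dimensional word vectors (illustrative values only).
vec = {
    "binoculars": np.array([0.9, 0.1, 0.3]),
    "telescope":  np.array([0.8, 0.2, 0.4]),
    "ship":       np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    """Cosine similarity: how closely two word vectors point the same way."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(vec["binoculars"], vec["telescope"]))  # high: related words
print(cosine(vec["binoculars"], vec["ship"]))       # lower: less related
```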
Recap: