·Two issues with large-vocabulary bigram LMs:
 ·With vocabulary size V and N word exits per frame: N×V cross-word transitions per frame (worked numbers below)
 ·Bigram probabilities very sparse; most word pairs "back off" to unigrams
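For scale, an illustrative calculation (the numbers are assumptions, not from the source): with V = 20,000 and N = 100 word exits, the naive scheme evaluates 100 × 20,000 = 2,000,000 cross-word transitions every frame; the backoff node cuts this to roughly N + V ≈ 20,100, plus one update per explicitly stored bigram.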
·Optimize cross-word transitions using a "backoff node" (sketch below):
 ·Viterbi decision at the backoff node selects the single best predecessor
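A minimal Python sketch of one frame of backoff-node cross-word transitions (the function name, dict-based data layout, and toy scores are illustrative assumptions, not from the source):

```python
def cross_word_transitions(word_end_scores, unigram_logp, backoff_logw, bigram_logp):
    """One frame of cross-word transitions through a backoff node.

    word_end_scores: {w: Viterbi log-score of the best path ending in w}
    unigram_logp:    {v: log P(v)} for every vocabulary word v
    backoff_logw:    {w: log backoff weight b(w)}
    bigram_logp:     {(w, v): log P(v | w)}, sparse: stored bigrams only

    Returns {v: (start log-score, best predecessor)} for every word v,
    in O(N + V + stored bigrams) instead of O(N x V).
    """
    # Viterbi decision at the backoff node: among the N word exits, keep
    # only the single best predecessor after applying its backoff weight.
    bo_pred, bo_score = max(
        ((w, s + backoff_logw[w]) for w, s in word_end_scores.items()),
        key=lambda pair: pair[1],
    )

    # Default entry for every word v: backoff node plus unigram probability.
    starts = {v: (bo_score + lp, bo_pred) for v, lp in unigram_logp.items()}

    # Explicitly stored bigrams bypass the backoff node; keep the max.
    for (w, v), lp in bigram_logp.items():
        if w in word_end_scores:
            cand = word_end_scores[w] + lp
            if cand > starts[v][0]:
                starts[v] = (cand, w)
    return starts

# Toy usage (scores are made up):
ends = {"the": -12.0, "a": -13.5}
uni = {"cat": -3.0, "dog": -3.2}
bo = {"the": -0.5, "a": -0.6}
bi = {("the", "cat"): -1.0}
print(cross_word_transitions(ends, uni, bo, bi))
# -> {'cat': (-13.0, 'the'), 'dog': (-15.7, 'the')}
```

One design note: taking the max of the explicit-bigram path and the backoff path is a mild approximation, since the backoff route can occasionally outscore a stored bigram and slightly overestimate the LM score; backoff-node decoders generally accept this in exchange for the O(N + V) transition cost.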