If we examine a single time slice of the model, it can be seen as a mixture distribution with component densities given by $p(\mathbf{x}_n \mid \mathbf{z}_n)$.
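Summing out the latent variable makes the mixture explicit. Assume $K$ components, and write $z_{nk} = 1$ for "component $k$ is active at step $n$" (this notation is assumed here). The marginal of one slice is then

```latex
p(\mathbf{x}_n) = \sum_{\mathbf{z}_n} p(\mathbf{z}_n)\, p(\mathbf{x}_n \mid \mathbf{z}_n)
               = \sum_{k=1}^{K} p(z_{nk} = 1)\, p(\mathbf{x}_n \mid z_{nk} = 1)
```

The mixing coefficients are $p(z_{nk} = 1)$.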

It can be interpreted as an extension of a mixture model in which the choice of mixture component for each observation is not independent but depends on the choice of component for the previous observation.
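The following is a minimal sketch of this difference. It assumes NumPy, three components with 1-D Gaussian emissions, and made-up parameter values; `pi`, `A`, `means` and the helper names are all hypothetical. The HMM chooses each component from the row of a transition matrix that the previous component selects. The plain mixture chooses every component independently.

```python
import numpy as np

def sample_hmm(pi, A, means, std, n_steps, rng):
    """Ancestral sampling from an HMM with 1-D Gaussian emissions (assumed).

    pi    : (K,) initial state probabilities p(z_1)
    A     : (K, K) transition matrix, A[j, k] = p(z_n = k | z_{n-1} = j)
    means : (K,) emission mean of each component; std is shared
    """
    K = len(pi)
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.choice(K, p=pi)
    for n in range(1, n_steps):
        # The component at step n depends on the component at step n-1.
        states[n] = rng.choice(K, p=A[states[n - 1]])
    # Given its component, each observation is drawn independently.
    obs = rng.normal(loc=means[states], scale=std)
    return states, obs

def sample_mixture(pi, means, std, n_steps, rng):
    """Plain mixture model: a component is chosen independently at every step."""
    states = rng.choice(len(pi), size=n_steps, p=pi)
    obs = rng.normal(loc=means[states], scale=std)
    return states, obs

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
means = np.array([-5.0, 0.0, 5.0])

hmm_states, hmm_obs = sample_hmm(pi, A, means, 1.0, 200, rng)
mix_states, mix_obs = sample_mixture(pi, means, 1.0, 200, rng)

# Fraction of steps where the component switches: low for the "sticky" HMM,
# much higher for the independent mixture.
print("HMM switch rate:    ", np.mean(hmm_states[1:] != hmm_states[:-1]))
print("mixture switch rate:", np.mean(mix_states[1:] != mix_states[:-1]))
```

With a strongly diagonal `A`, the sampled components form long runs. The plain mixture lacks this sequential dependence. Setting every row of `A` equal to `pi` recovers the independent mixture.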

Applications

Speech recognition

Natural language modeling

On-line handwriting recognition

Analysis of biological sequences such as proteins and DNA

Transition probability

The latent variables $\mathbf{z}_n$ are discrete multinomial variables that describe which component of the mixture is responsible for generating the corresponding observation $\mathbf{x}_n$.
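One common encoding, assumed here, is the usual 1-of-$K$ convention. It represents each latent variable as a binary vector with exactly one nonzero entry:

```latex
\mathbf{z}_n = (z_{n1}, \dots, z_{nK})^\top, \qquad z_{nk} \in \{0, 1\}, \qquad \sum_{k=1}^{K} z_{nk} = 1
```

So $z_{nk} = 1$ means that the $k$-th component generated the observation at step $n$.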

The probability distribution of $\mathbf{z}_n$ depends on the previous latent variable $\mathbf{z}_{n-1}$ through a conditional distribution $p(\mathbf{z}_n \mid \mathbf{z}_{n-1})$.

Conditional distribution
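With $K$ states, this conditional distribution can be parameterized by a $K \times K$ transition matrix $\mathbf{A}$ (the symbol is assumed here). Its entries are $A_{jk} = p(z_{nk} = 1 \mid z_{n-1,j} = 1)$, with $0 \le A_{jk} \le 1$ and $\sum_k A_{jk} = 1$. Under a 1-of-$K$ encoding of the latent variables, it can be written compactly as

```latex
p(\mathbf{z}_n \mid \mathbf{z}_{n-1}, \mathbf{A}) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{\,z_{n-1,j}\, z_{nk}}
```

Only the factor with $z_{n-1,j} = z_{nk} = 1$ has a nonzero exponent, so the product picks out exactly $A_{jk}$. Because each row sums to one, $\mathbf{A}$ has $K(K-1)$ independent parameters.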

The initial latent node $\mathbf{z}_1$ does not have a parent node, so it has a marginal distribution $p(\mathbf{z}_1)$.
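Write the initial distribution as a vector $\boldsymbol{\pi}$ with $\pi_k = p(z_{1k} = 1)$ (notation assumed here). The latent chain then factorizes as $p(\mathbf{z}_1, \dots, \mathbf{z}_N) = p(\mathbf{z}_1) \prod_{n=2}^{N} p(\mathbf{z}_n \mid \mathbf{z}_{n-1})$. The NumPy sketch below evaluates this product in log space for a state sequence given as integer indices. The parameter values and the helper name are hypothetical.

```python
import numpy as np

def latent_chain_log_prob(states, pi, A):
    """log p(z_1, ..., z_N) for a sequence of state indices.

    states : (N,) integer state indices, states[n] = k means z_{nk} = 1
    pi     : (K,) initial distribution, pi[k] = p(z_{1k} = 1)
    A      : (K, K) transition matrix, A[j, k] = p(z_{nk} = 1 | z_{n-1,j} = 1)
    """
    states = np.asarray(states)
    # Marginal term for the parentless first node.
    log_p = np.log(pi[states[0]])
    # One transition factor per later node; log space avoids underflow on long chains.
    log_p += np.sum(np.log(A[states[:-1], states[1:]]))
    return log_p

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
seq = [0, 0, 1, 1]
# Expected probability: 0.6 * 0.7 * 0.3 * 0.8 = 0.1008 (up to floating-point rounding)
print(np.exp(latent_chain_log_prob(seq, pi, A)))
```

For a length-one sequence the transition sum is empty, so the result reduces to $\log p(\mathbf{z}_1)$. This matches the statement that the first node has only a marginal distribution.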