Hidden Markov model

Difference between a mixture model and an HMM

  • If we examine a single time slice of the model, it can be seen as a mixture distribution with component densities given by p(X|Z)
  • It can be interpreted as an extension of a mixture model in which the choice of mixture component for each observation is not independent, but depends on the choice of component for the previous observation through p(Z_n|Z_{n-1})
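To make the distinction concrete, here is a minimal sketch in plain Python of how latent states are drawn in each model, using a hypothetical two-component example (pi and A below are illustrative values, not from the notes): a mixture picks a component independently at every step, while an HMM picks it conditioned on the previous state.

```python
import random

random.seed(0)

# Hypothetical two-component example (values chosen for illustration).
pi = [0.5, 0.5]          # initial / mixing distribution
A = [[0.9, 0.1],         # A[j][k] = p(Z_n = k | Z_{n-1} = j)
     [0.2, 0.8]]

def sample_mixture_states(n):
    # Mixture model: each component is drawn independently from pi.
    return [random.choices([0, 1], weights=pi)[0] for _ in range(n)]

def sample_hmm_states(n):
    # HMM: the first state is drawn from pi; every later state depends
    # on the previous one through the transition matrix A.
    states = [random.choices([0, 1], weights=pi)[0]]
    for _ in range(n - 1):
        states.append(random.choices([0, 1], weights=A[states[-1]])[0])
    return states
```

With the strongly diagonal A above, HMM state sequences tend to stay in the same state for several steps, while the mixture sequence shows no such persistence.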


Applications of hidden Markov models

  • Speech recognition
  • Natural language modeling
  • On-line handwriting recognition
  • Analysis of biological sequences such as proteins and DNA

Transition probability

  • Latent variables: discrete multinomial variables Z_n that describe which component of the mixture is responsible for generating the corresponding observation X_n
  • The probability distribution of Z_n depends on the previous latent variable Z_{n-1} through conditional distribution p(Z_n|Z_{n-1})
  • Conditional distribution

p(Z_n|Z_{n-1}, A) = \displaystyle \prod_{k=1}^{K}\prod_{j=1}^{K} A_{jk}^{z_{n-1, j}z_{nk}}

  • The initial latent node Z_1 does not have a parent node, so it instead has a marginal distribution represented by a vector of probabilities π

p(Z_1|\pi) = \displaystyle \prod_{k=1}^{K} \pi_{k}^{z_{1k}}

  • Lattice or trellis diagram
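The two formulas above can be checked numerically. A minimal sketch, assuming a small two-state example (the values of A and π are illustrative): with one-of-K (one-hot) vectors for the latent states, the double product collapses to the single matrix entry A_{jk}, and the initial product to π_k.

```python
# Assumed example parameters (K = 2 states), for illustration only.
A = [[0.7, 0.3],
     [0.4, 0.6]]   # A[j][k] = p(Z_n = k | Z_{n-1} = j)
pi = [0.5, 0.5]

def transition_prob(z_prev, z_curr, A):
    # Evaluates prod_k prod_j A[j][k] ** (z_prev[j] * z_curr[k]);
    # with one-hot z vectors this picks out exactly one entry of A.
    K = len(A)
    p = 1.0
    for j in range(K):
        for k in range(K):
            p *= A[j][k] ** (z_prev[j] * z_curr[k])
    return p

def initial_prob(z1, pi):
    # Evaluates prod_k pi[k] ** z1[k], i.e. the entry of pi selected by z1.
    p = 1.0
    for k in range(len(pi)):
        p *= pi[k] ** z1[k]
    return p

# z_{n-1} = state 0, z_n = state 1  ->  picks out A[0][1]
transition_prob([1, 0], [0, 1], A)
```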

Emission probability


  • Examples: a mixture of three Gaussian distributions; the two-dice problem
  • Handwriting
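The examples above can be tied to the same notation as the transition section: writing φ for the emission parameters (with φ_k governing component k, mirroring the conditioning on A and π above), the emission probabilities take the form

p(X_n|Z_n, \phi) = \displaystyle \prod_{k=1}^{K} p(X_n|\phi_k)^{z_{nk}}

With a one-hot Z_n the product selects the density of the responsible component, e.g. one of three Gaussian densities for continuous X_n, or a die's outcome distribution in the two-dice problem.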

Markov Models

What is a Markov model?

  • A way to exploit the sequential structure of the data (e.g. correlations between observations that are close in the sequence).
  • For example, consider predicting whether a day will be rainy. If we treat the data as i.i.d., the only information we can glean from it is the overall frequency of rainy days, missing weather trends that last a few days. Knowing whether or not it rains today therefore helps to predict tomorrow’s weather.
  • A Markov Model is one of the simplest ways to relax such i.i.d. assumptions and express such effects in a probabilistic model.

Joint distribution for Markov chain 

  • General expression of the joint distribution for a sequence of observations

p(X_1, ..., X_N) = p(X_1)\displaystyle\prod_{n=2}^{N} p(X_n|X_1, ..., X_{n-1})

  • First-order Markov chain

If we assume that each of the conditional distributions on the right-hand side is independent of all previous observations except the most recent one, we obtain a first-order Markov chain.

p(X_1, ..., X_N) = p(X_1)\displaystyle\prod_{n=2}^{N} p(X_n|X_{n-1})

In other words, the conditional distribution for observation X_n is given by

p(X_n|X_1, ..., X_{n-1}) = p(X_n|X_{n-1})
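The factorized joint above is straightforward to evaluate. A minimal sketch, assuming example probabilities for a binary observation sequence (the values below are illustrative):

```python
# Assumed example values for p(x_1) and p(x_n | x_{n-1}), illustration only.
p_init = [0.5, 0.5]                  # p(x_1)
p_trans = [[0.9, 0.1],               # p_trans[a][b] = p(x_n = b | x_{n-1} = a)
           [0.3, 0.7]]

def joint_prob(seq):
    # p(x_1, ..., x_N) = p(x_1) * prod_{n=2}^{N} p(x_n | x_{n-1})
    p = p_init[seq[0]]
    for prev, curr in zip(seq, seq[1:]):
        p *= p_trans[prev][curr]     # only the most recent value matters
    return p

joint_prob([0, 0, 1])  # 0.5 * 0.9 * 0.1
```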

  • Second-order Markov chain

Likewise, for a second-order Markov chain, [Figure]

p(X_n|X_1, ..., X_{n-1}) = p(X_n|X_{n-1}, X_{n-2})

which means each observation is influenced only by the two previous observations and is independent of all earlier observations.
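The second-order conditional can be represented as a lookup keyed on the two most recent values. A minimal sketch with assumed probabilities for binary observations (the table below is illustrative):

```python
# Assumed example table for binary observations, illustration only:
# p2[(a, b)][c] = p(x_n = c | x_{n-2} = a, x_{n-1} = b)
p2 = {
    (0, 0): [0.8, 0.2],
    (0, 1): [0.4, 0.6],
    (1, 0): [0.5, 0.5],
    (1, 1): [0.1, 0.9],
}

def second_order_cond(seq, n):
    # p(x_n | x_1, ..., x_{n-1}) reduces to p(x_n | x_{n-1}, x_{n-2}):
    # everything earlier than the two previous observations is ignored.
    return p2[(seq[n - 2], seq[n - 1])][seq[n]]
```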

Sequential data

What is sequential data?

  • Data for which the i.i.d. assumption is a poor fit or does not hold at all
  • Often found in time series data. For instance,
    • Rainfall measurements on successive days at a particular location
    • Daily values of a currency exchange rate
    • Acoustic features at successive time frames used for speech recognition

Stationary vs. nonstationary sequential distributions

  • Stationary: data evolves in time but the distribution from which it is generated remains the same
  • Nonstationary: the generative distribution itself evolves with time

Applications of Markov Models

  • Financial forecasting: predict the next value in a time series given observations of the previous values
  • Speech recognition: predict the next speech spectrum given observations of the previous speech spectrum values
  • Note: Markov models are useful in applications where more recent observations are more informative than less recent observations

Latent variable

  • Hidden Markov model: the latent variables are discrete
  • Linear dynamical systems: the latent variables are Gaussian