Markov models are a way to exploit the sequential structure of data (e.g. correlations between observations that are close together in the sequence).

For example, suppose we record each day whether or not it rains. If we treat the data as i.i.d., the only information we can glean from it is the relative frequency of rainy days; any weather trends that last a few days are ignored. In practice such trends exist, so knowing whether or not it rains today helps to predict tomorrow's weather.

A Markov model is one of the simplest ways to relax the i.i.d. assumption and express such effects in a probabilistic model.
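The rainy-day example can be made concrete with a small sketch. The data below are made up for illustration, and the function name `rain_statistics` is hypothetical. It compares the single i.i.d. statistic (overall frequency of rain) with the conditional frequencies of rain given the previous day's weather, which is the extra information a Markov model captures.

```python
from collections import Counter

def rain_statistics(days):
    """days: list of 0/1 values, where 1 means it rained that day."""
    # i.i.d. view: only the overall frequency of rainy days
    marginal = sum(days) / len(days)

    # Sequential view: count consecutive (yesterday, today) pairs
    pair_counts = Counter(zip(days[:-1], days[1:]))
    rain_given_prev = {}
    for prev in (0, 1):
        total = pair_counts[(prev, 0)] + pair_counts[(prev, 1)]
        # Estimated probability of rain today given yesterday's weather
        rain_given_prev[prev] = pair_counts[(prev, 1)] / total if total else None
    return marginal, rain_given_prev

days = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]  # made-up observations
marginal, rain_given_prev = rain_statistics(days)
print(marginal)          # overall frequency of rain
print(rain_given_prev)   # frequency of rain after a dry day / after a rainy day
```

In this toy sequence rain is more frequent after a rainy day than after a dry one, a pattern the i.i.d. summary alone cannot express.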

Joint distribution for Markov chain

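By the product rule, the joint distribution for a sequence of $N$ observations can always be written as

$$p(x_1, \dots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_1, \dots, x_{n-1})$$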

First-order Markov chain

If we assume that each of the conditional distributions on the right-hand side is independent of all previous observations except the most recent one, we obtain the first-order Markov chain.

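In other words, the conditional distribution for observation $x_n$, given all earlier observations, is given by

$$p(x_n \mid x_1, \dots, x_{n-1}) = p(x_n \mid x_{n-1})$$

so the joint distribution becomes

$$p(x_1, \dots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})$$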
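As a minimal sketch of the first-order factorization, the joint probability of a sequence is the probability of the first observation times a product of one-step transition probabilities. The state names, the probability values, and the function name `first_order_log_joint` are assumptions chosen for illustration only.

```python
import math

def first_order_log_joint(seq, initial, transition):
    """Log of p(x_1) * prod_{n>=2} p(x_n | x_{n-1}) for a discrete chain."""
    log_p = math.log(initial[seq[0]])
    # Each later observation depends only on its immediate predecessor
    for prev, cur in zip(seq[:-1], seq[1:]):
        log_p += math.log(transition[prev][cur])
    return log_p

# Hypothetical parameters: transition[prev][cur] = p(cur | prev)
initial = {"dry": 0.7, "rain": 0.3}
transition = {
    "dry": {"dry": 0.8, "rain": 0.2},
    "rain": {"dry": 0.4, "rain": 0.6},
}

seq = ["rain", "rain", "dry"]
joint = math.exp(first_order_log_joint(seq, initial, transition))
print(joint)  # equals 0.3 * 0.6 * 0.4 up to floating-point rounding
```

Working in log space avoids numerical underflow when the sequence is long.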

Second-order Markov chain

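Likewise, for a second-order Markov chain [Figure], the joint distribution is given by

$$p(x_1, \dots, x_N) = p(x_1)\, p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2})$$

which means that each observation is influenced only by the two previous observations and, given those two, is independent of all earlier observations.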
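A second-order chain can be sketched the same way: the first two observations are handled separately, and every later observation is conditioned on the pair of observations before it. All names and probability values here are hypothetical and chosen only for illustration.

```python
import math

def second_order_log_joint(seq, initial, first_step, transition):
    """Log of p(x_1) p(x_2 | x_1) prod_{n>=3} p(x_n | x_{n-1}, x_{n-2})."""
    log_p = math.log(initial[seq[0]])
    if len(seq) > 1:
        log_p += math.log(first_step[seq[0]][seq[1]])
    for n in range(2, len(seq)):
        # Context is the pair (x_{n-2}, x_{n-1}) preceding position n
        context = (seq[n - 2], seq[n - 1])
        log_p += math.log(transition[context][seq[n]])
    return log_p

# Hypothetical binary chain: 1 = rain, 0 = dry
initial = {0: 0.7, 1: 0.3}
first_step = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}
transition = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.6, 1: 0.4},
    (1, 1): {0: 0.2, 1: 0.8},
}

seq = [1, 1, 1, 0]
joint2 = math.exp(second_order_log_joint(seq, initial, first_step, transition))
print(joint2)  # equals 0.3 * 0.6 * 0.8 * 0.2 up to floating-point rounding
```

Note that the transition table now needs one row per pair of previous states, so the number of parameters grows quickly with the order of the chain.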