Forward-backward algorithm

In computer science, the forward-backward algorithm is a dynamic programming algorithm operating in the context of hidden Markov models; it computes the probability of a particular observation sequence given the parameters of the model.

Overview

The algorithm comprises three steps:

# computing forward probabilities
# computing backward probabilities
# computing smoothed values

The forward and backward steps are often called the "forward message pass" and "backward message pass". The wording originates from the way the algorithm processes the given observation sequence: the algorithm first moves forward, starting with the first observation in the sequence and proceeding to the last, and then returns to the first. At each observation in the sequence, probabilities to be used for calculations at the next observation are computed. During the backward pass the algorithm simultaneously performs the smoothing step. This step allows the algorithm to take past observations of output into account when computing more accurate results.

Formal description

The following description takes as its basis matrices of probability values rather than probability distributions. We transform the probability distributions related to a given hidden Markov model into matrix notation as follows. The transition probabilities \mathbf{P}(X_t \mid X_{t-1}) of a given random variable X_t, representing all possible states in the hidden Markov model, will be represented by the matrix \mathbf{T}, where the row index i represents the start state and the column index j represents the target state. For the example below, \mathbf{T} would be defined as:

\mathbf{T} = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}

Similarly, we will represent the probabilities of a new state, given some observed states as evidence, in an observation matrix \mathbf{O_t}, where each diagonal entry contains the probability of a new state given the particular observed state at t as evidence. Note that t indicates a particular observation in the observation sequence. All other entries in the matrix are zero. For example, in the example below the first observed evidence (t = 1) is "umbrella". Therefore \mathbf{O_1} would be defined as:

\mathbf{O_1} = \begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}
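To make the notation concrete, the two kinds of matrices can be written out in code, for example with NumPy. This is only an illustrative sketch: the names T and observation_matrix are not part of the original formulation, and the values for the "no umbrella" case are taken from the example further below.

import numpy as np

# Transition matrix: row index = start state, column index = target state.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])

def observation_matrix(umbrella_seen):
    """Diagonal observation matrix O_t for the umbrella example described below."""
    return np.diag([0.9, 0.2]) if umbrella_seen else np.diag([0.1, 0.8])

O_1 = observation_matrix(True)   # the first evidence (t = 1) is "umbrella"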

This matrix notation allows a very elegant description of how to compute the next forward probability. Let the set of forward probabilities be stored in yet another matrix \mathbf{f_{1:t+1}}, where 1:t+1 indicates that the computed probabilities depend on all forward probabilities from 1 to t+1, including the current one, which we describe with \mathbf{f_{1:t}}. Hence, \mathbf{f_{1:t+1}} is equal to the multiplication of the transposed transition matrix with the current forward probabilities and the observation matrix for the next evidence observation. Finally, the resulting matrix requires normalization, i.e. the resulting values in the matrix are divided by the sum of all values in the matrix. The normalization factor is denoted \alpha. The resulting forward computation is therefore described by the following general formula:

\mathbf{f_{1:t+1}} = \alpha\,\mathbf{O_{t+1}}\mathbf{T}^T\mathbf{f_{1:t}}
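Continuing the sketch above, one possible reading of this forward update in code (a minimal sketch, assuming the forward message is kept as a vector):

def forward_step(f_prev, O_next, T):
    """One forward update: f_{1:t+1} = alpha * O_{t+1} @ T^T @ f_{1:t}."""
    f = O_next @ T.T @ f_prev
    return f / f.sum()   # dividing by the sum is the alpha normalization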

We can describe the backward computation \mathbf{b_{k+1:t}}, which starts at the end of the sequence, in a similar manner. Let the end of the sequence be described by the index k, starting at 0. Running from k down to t = 0 and calculating each backward probability can then be described by the following formula:

\mathbf{b_{k+1:t}} = \alpha(\mathbf{T}\mathbf{O_{k+1}}\mathbf{b_{k+2:t}}) \times \mathbf{f_{1:t}}

Note that we use the non-transposed matrix \mathbf{T} and that the order of the terms has changed. Also note that the final product is not an ordinary matrix multiplication but a point product, which multiplies each value in one matrix with the corresponding value of the other. Finally, note that the description in Russell & Norvig 2003, p. 550, excludes the point product, although the procedure requires it earlier.
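A corresponding sketch of the backward update as described here, including the point product with the forward message (function and argument names are again illustrative, not part of the original formulation):

def backward_step(b_next, O_k1, T, f_t):
    """One backward update: b_{k+1:t} = alpha * (T @ O_{k+1} @ b_{k+2:t}) x f_{1:t}."""
    b = (T @ O_k1 @ b_next) * f_t   # '*' is the elementwise (point) product
    return b / b.sum()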

The third and final step involves the computation of the smoothed probabilities \mathbf{sv_t}. These are the point product of the backward and forward probabilities for each t. The formula is therefore defined as:

\mathbf{sv_t} = \alpha\,\mathbf{b_{k+1:t}} \times \mathbf{f_{1:t}}
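In code, the smoothing step then reduces to a pointwise product followed by normalization, for example:

def smooth(b, f):
    """Smoothed values sv_t: point product of backward and forward messages, normalized."""
    sv = b * f
    return sv / sv.sum()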

The example in the following section will show numerically the calculation of these matrices.

Example

This example takes as its basis the umbrella world in Russell & Norvig 2003, p. 540. The example uses a longer observation sequence {umbrella, umbrella, no umbrella, umbrella, umbrella} than the example described in the book. Assume that the probability of rain given the observation is 90% if an umbrella is observed and 20% if no umbrella is observed. Furthermore, assume that the probability of rain at t+1, given that it is raining now (i.e. at t), is 70%, and the probability that it is not raining then is 30%. The following matrices can be extracted from the umbrella world given the above observations:

\mathbf{O_1} = \begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix} \quad \mathbf{O_2} = \begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix} \quad \mathbf{O_3} = \begin{pmatrix} 0.1 & 0.0 \\ 0.0 & 0.8 \end{pmatrix} \quad \mathbf{O_4} = \begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix} \quad \mathbf{O_5} = \begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}

\mathbf{T} = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix} \quad \mathbf{T}^T = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}

Note that \mathbf{O_3} differs from the others because of the "no umbrella" observation. Also note that \mathbf{T} and \mathbf{T}^T are equal because the matrix is symmetric. Before we can start to compute the forward messages we need to define two special values: the first forward probability and the k+2 backward probability. The first forward probability at t = 0 is defined by the prior of the random variable. The k+2 backward probability is defined by the "true" matrix. It therefore follows:

\mathbf{f_{1:0}} = \begin{pmatrix} 0.5 & 0.5 \end{pmatrix} \quad \mathbf{b_{k+2:t}} = \begin{pmatrix} 1.0 & 1.0 \end{pmatrix}

Now we iterate through all t and compute the forward probabilities. The following matrices appear in the same order as described in the forward formula above. Some calculations may be less accurate due to the limited number of decimal places used in this example.

\mathbf{f_{1:1}} = \alpha\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.5000 & 0.5000 \end{pmatrix} = \alpha\begin{pmatrix} 0.4500 & 0.1000 \end{pmatrix} = \begin{pmatrix} 0.8182 & 0.1818 \end{pmatrix}

\mathbf{f_{1:2}} = \alpha\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.8182 & 0.1818 \end{pmatrix} = \alpha\begin{pmatrix} 0.5645 & 0.0745 \end{pmatrix} = \begin{pmatrix} 0.8834 & 0.1165 \end{pmatrix}

\mathbf{f_{1:3}} = \alpha\begin{pmatrix} 0.1 & 0.0 \\ 0.0 & 0.8 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.8834 & 0.1165 \end{pmatrix} = \alpha\begin{pmatrix} 0.0653 & 0.2772 \end{pmatrix} = \begin{pmatrix} 0.1906 & 0.8093 \end{pmatrix}

\mathbf{f_{1:4}} = \alpha\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.1906 & 0.8093 \end{pmatrix} = \alpha\begin{pmatrix} 0.3386 & 0.1247 \end{pmatrix} = \begin{pmatrix} 0.7308 & 0.2691 \end{pmatrix}

\mathbf{f_{1:5}} = \alpha\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.7308 & 0.2691 \end{pmatrix} = \alpha\begin{pmatrix} 0.5331 & 0.0815 \end{pmatrix} = \begin{pmatrix} 0.8673 & 0.1326 \end{pmatrix}

Now that we have computed the forward probabilities, we continue with the backward probabilities. Again the matrices appear in the same order as in the backward formula above.

\mathbf{b_{5:5}} = \alpha\left(\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 1.0000 & 1.0000 \end{pmatrix}\right) \times \begin{pmatrix} 0.8673 & 0.1326 \end{pmatrix} = \alpha\begin{pmatrix} 0.5984 & 0.0543 \end{pmatrix} = \begin{pmatrix} 0.9168 & 0.0831 \end{pmatrix}

\mathbf{b_{4:5}} = \alpha\left(\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.9168 & 0.0831 \end{pmatrix}\right) \times \begin{pmatrix} 0.7308 & 0.2691 \end{pmatrix} = \alpha\begin{pmatrix} 0.4257 & 0.0697 \end{pmatrix} = \begin{pmatrix} 0.8593 & 0.1407 \end{pmatrix}

\mathbf{b_{3:5}} = \alpha\left(\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.1 & 0.0 \\ 0.0 & 0.8 \end{pmatrix}\begin{pmatrix} 0.8593 & 0.1407 \end{pmatrix}\right) \times \begin{pmatrix} 0.1906 & 0.8093 \end{pmatrix} = \alpha\begin{pmatrix} 0.0178 & 0.0845 \end{pmatrix} = \begin{pmatrix} 0.1739 & 0.8260 \end{pmatrix}

\mathbf{b_{2:5}} = \alpha\left(\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.1739 & 0.8260 \end{pmatrix}\right) \times \begin{pmatrix} 0.8834 & 0.1165 \end{pmatrix} = \alpha\begin{pmatrix} 0.1405 & 0.0189 \end{pmatrix} = \begin{pmatrix} 0.8814 & 0.1185 \end{pmatrix}

\mathbf{b_{1:5}} = \alpha\left(\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.8814 & 0.1185 \end{pmatrix}\right) \times \begin{pmatrix} 0.8182 & 0.1818 \end{pmatrix} = \alpha\begin{pmatrix} 0.4600 & 0.0462 \end{pmatrix} = \begin{pmatrix} 0.9087 & 0.0912 \end{pmatrix}

Finally we will compute the smoothed probability values. The ordering of the matrices follows the smoothed values formula above.

\mathbf{sv_5} = \alpha\begin{pmatrix} 1.0000 & 1.0000 \end{pmatrix} \times \begin{pmatrix} 0.8673 & 0.1326 \end{pmatrix} = \alpha\begin{pmatrix} 0.8673 & 0.1326 \end{pmatrix} = \begin{pmatrix} 0.8673 & 0.1326 \end{pmatrix}

\mathbf{sv_4} = \alpha\begin{pmatrix} 0.9168 & 0.0831 \end{pmatrix} \times \begin{pmatrix} 0.7308 & 0.2691 \end{pmatrix} = \alpha\begin{pmatrix} 0.6699 & 0.0223 \end{pmatrix} = \begin{pmatrix} 0.9677 & 0.0322 \end{pmatrix}

\mathbf{sv_3} = \alpha\begin{pmatrix} 0.8593 & 0.1407 \end{pmatrix} \times \begin{pmatrix} 0.1906 & 0.8093 \end{pmatrix} = \alpha\begin{pmatrix} 0.1637 & 0.1138 \end{pmatrix} = \begin{pmatrix} 0.5899 & 0.4101 \end{pmatrix}

\mathbf{sv_2} = \alpha\begin{pmatrix} 0.1739 & 0.8260 \end{pmatrix} \times \begin{pmatrix} 0.8834 & 0.1165 \end{pmatrix} = \alpha\begin{pmatrix} 0.1536 & 0.0962 \end{pmatrix} = \begin{pmatrix} 0.6148 & 0.3852 \end{pmatrix}

\mathbf{sv_1} = \alpha\begin{pmatrix} 0.8814 & 0.1185 \end{pmatrix} \times \begin{pmatrix} 0.8182 & 0.1818 \end{pmatrix} = \alpha\begin{pmatrix} 0.7211 & 0.0215 \end{pmatrix} = \begin{pmatrix} 0.9710 & 0.0289 \end{pmatrix}
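Under the assumptions of the code sketches given in the formal description (T, observation_matrix, forward_step, backward_step, smooth), the whole example can be replayed in a few lines; the resulting values should approximately reproduce the numbers shown above, up to rounding:

# umbrella, umbrella, no umbrella, umbrella, umbrella
observations = [True, True, False, True, True]

f = [np.array([0.5, 0.5])]                        # f_{1:0}, the prior
for seen in observations:
    f.append(forward_step(f[-1], observation_matrix(seen), T))

b = np.array([1.0, 1.0])                          # b_{k+2:t}, the "true" matrix
sv = {}
for t in range(len(observations), 0, -1):         # t = 5, 4, ..., 1
    sv[t] = smooth(b, f[t])                       # e.g. sv[5] is roughly [0.867, 0.133]
    b = backward_step(b, observation_matrix(observations[t - 1]), T, f[t])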

Performance

The brute-force procedure for solving this problem is to generate all possible sequences of observed events and hidden states with their probabilities using the two transition matrices. The joint probability of two sequences, given the model, is calculated by multiplying the corresponding probabilities. The brute-force procedure has time complexity O(T \cdot N^T), where T is the length of the sequence and N is the number of symbols in the state alphabet. This is intractable for realistic problems, as the number of possible hidden node sequences is typically extremely high. In contrast, the forward-backward algorithm has time complexity O(N^2 T). Several enhancements to the general forward-backward algorithm are known which allow the computation to take place in constant space. In addition, algorithms have been developed to compute \mathbf{f_{1:t+1}} efficiently as t grows, through online smoothing such as the fixed-lag smoothing (FLS) algorithm (Russell & Norvig 2003, p. 552).

Pseudocode

ForwardBackward(guessState, sequenceIndex):
    if sequenceIndex is past the end of the sequence, return 1
    if (guessState, sequenceIndex) has been seen before, return saved result
    result = 0
    for each neighboring state n:
        result = result + (transition probability from guessState to n
                            given observation element at sequenceIndex)
                          * ForwardBackward(n, sequenceIndex + 1)
    save result for (guessState, sequenceIndex)
    return result
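A minimal runnable Python sketch of this memoized recursion, assuming the model is given as a transition matrix and an emission matrix; all names, and the encoding of states and observations as integers, are illustrative rather than part of the original pseudocode:

import numpy as np

def forward_backward(guess_state, seq_index, observations, T, emit, cache=None):
    """Memoized recursion following the pseudocode above.
    T[i, j]    -- transition probability from state i to state j
    emit[j, o] -- probability of emitting observation symbol o in state j
    Returns the probability of the remaining observations, given that the state
    reached before position seq_index is guess_state."""
    if cache is None:
        cache = {}
    if seq_index >= len(observations):          # past the end of the sequence
        return 1.0
    if (guess_state, seq_index) in cache:       # seen before: return saved result
        return cache[(guess_state, seq_index)]
    result = 0.0
    for n in range(T.shape[1]):                 # each neighboring state n
        step = T[guess_state, n] * emit[n, observations[seq_index]]
        result += step * forward_backward(n, seq_index + 1, observations, T, emit, cache)
    cache[(guess_state, seq_index)] = result
    return result

# Umbrella example: states 0 = rain, 1 = no rain; observations 0 = umbrella, 1 = no umbrella.
T = np.array([[0.7, 0.3], [0.3, 0.7]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1, 0, 0]
# Weighting the two initial states by the 0.5/0.5 prior gives the total observation probability.
p_obs = 0.5 * forward_backward(0, 0, obs, T, emit) + 0.5 * forward_backward(1, 0, obs, T, emit)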

See also

* Baum-Welch algorithm
* Hidden Markov model
* Viterbi algorithm

References

* Lawrence R. Rabiner, [http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition]. "Proceedings of the IEEE", 77 (2), pp. 257–286, February 1989.
* Rabiner, L. R.; Juang, B. H. (January 1986). "An Introduction to Hidden Markov Models". "IEEE ASSP Magazine", pp. 4–15.
* Charniak, Eugene (1993). "Statistical Language Learning". Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-53141-2.
* Russell, S.; Norvig, P. (2003). "Artificial Intelligence: A Modern Approach", 2nd edition. Upper Saddle River, New Jersey: Pearson Education. ISBN 0-13-080302-2.

External links

* [http://www.cs.jhu.edu/~jason/papers/#tnlp02 An Interactive Spreadsheet for Teaching the Forward-Backward Algorithm] (spreadsheet and article with step-by-step walk-through)
* [http://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html Tutorial of Hidden Markov Models including the forward-backward algorithm]
* [http://code.google.com/p/aima-java/ Collection of AI algorithms implemented in Java] (including HMM and the forward-backward algorithm)

