Seen the new capsules paper by Hinton? Yeah, the capsules aren't the fanciest thing in there; it's the EM algorithm.


What? Expectation-maximization routing.
How? Through the EM algorithm.
Why? To estimate latent (hidden) variables.
Where? From mixture distributions.

The Expectation-Maximization (EM) algorithm is a way to find maximum-likelihood estimates for a model's parameters when your data is incomplete, has missing data points, or has unobserved (hidden) latent variables. Thus, it can be used in deep learning, where a neural network might have unaccounted-for parameters (e.g., transformation variations).

EM is an iterative way to approximate the maximum-likelihood estimate. While maximum likelihood estimation can find the “best fit” model for a set of data, it doesn’t work particularly well for incomplete data sets. The EM algorithm, which is more involved, can find model parameters even if you have missing data. It works by choosing starting values for the missing data points and using those guesses to estimate the model parameters. The new parameter estimates are then used to produce better guesses for the missing values, and the two steps alternate until the algorithm converges on a fixed point.
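A toy sketch of that loop, assuming one specific kind of incomplete data: estimating the mean of a Gaussian (with a known standard deviation) when some observations are right-censored, i.e. we only know they exceed a cutoff c. The E-step “fills in” each censored point with its conditional expectation under the current guess, the M-step re-averages, and the two steps repeat. The data, cutoff, and sigma below are made-up illustrative values, not anything from the post.

```python
import math

def em_censored_mean(observed, n_censored, c, sigma, n_iter=50):
    mu = sum(observed) / len(observed)      # initial guess from the observed points
    n = len(observed) + n_censored
    for _ in range(n_iter):
        # E-step: expected value of a censored point, given x > c and the current mu
        a = (c - mu) / sigma
        phi = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)
        Phi = 0.5 * (1 + math.erf(a / math.sqrt(2)))
        imputed = mu + sigma * phi / (1 - Phi)
        # M-step: re-estimate mu as the average of observed and filled-in values
        mu = (sum(observed) + n_censored * imputed) / n
    return mu

# Five fully observed points plus three points known only to exceed 2.5
print(em_censored_mean(observed=[1.2, 0.7, 2.1, 1.5, 0.3], n_censored=3,
                       c=2.5, sigma=1.0))
```

Each pass nudges the mean upward to account for the censored points, and the estimate quickly settles on a fixed point.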

MLE vs. EM

Although Maximum Likelihood Estimation (MLE) and EM can both find “best-fit” parameters, how they find the models is very different. MLE accumulates all of the data first and then uses that data to construct the most likely model. EM takes a guess at the parameters first, accounting for the missing data, then tweaks the model to fit both the guesses and the observed data.

The basic steps for the EM algorithm are:
  1. An initial guess is made for the model’s parameters and a probability distribution is created. This is sometimes called the “E-step,” for the “expected” distribution.
  2. Newly observed data is fed into the model. 
  3. The probability distribution from the E-step is tweaked to include the new data. This is sometimes called the “M-step.” 
  4. Steps 2 and 3 are repeated until stability (i.e. a distribution that doesn’t change from the E-step to the M-step) is reached.
The EM algorithm always improves a parameter’s estimate through this multi-step process. However, it sometimes needs a few random starts to find the best model, because the algorithm can home in on a local maximum that isn’t close to the (optimal) global maximum. In other words, it can perform better if you force it to restart and take that “initial guess” from Step 1 over again. From all of the candidate parameter sets, you can then choose the one with the greatest likelihood.
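A minimal sketch of the E-step / M-step loop above, fitting a two-component 1-D Gaussian mixture with NumPy. The component count, initialization scheme, and convergence tolerance are illustrative assumptions, not taken from the post.

```python
import numpy as np

def em_gmm_1d(x, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initial guess for the parameters (means, variances, mixing weights)
    mu = rng.choice(x, size=2, replace=False)
    var = np.full(2, x.var())
    pi = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities = P(component k | x_i) under the current parameters
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        weighted = pi * dens
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibility-weighted data
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
        # Repeat until the log-likelihood stops improving (a fixed point)
        ll = np.log(weighted.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, var, pi, ll

# Toy data: two overlapping clusters
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 200)])
print(em_gmm_1d(x))
```

To apply the random-restart idea from the paragraph above, you would run em_gmm_1d with several different seeds and keep the run with the highest final log-likelihood.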

The EM algorithm has many applications, including:
  • Disentangling superimposed signals
  • Estimating Gaussian mixture models (GMMs)
  • Estimating hidden Markov models (HMMs)
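For the GMM case, library implementations handle the details for you. For example, scikit-learn’s GaussianMixture fits a mixture by EM, and its n_init argument gives the random restarts discussed above (the data and settings here are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: two overlapping 1-D clusters, shaped (n_samples, n_features)
X = np.concatenate([np.random.normal(-2, 1, (300, 1)),
                    np.random.normal(3, 1.5, (200, 1))])

gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)
print(gmm.means_.ravel(), gmm.weights_)
```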

