Many applications require the random sampling of matrices with prescribed structure for modeling, statistical, or aesthetic purposes. What does it mean for a random variable to be matrix-valued? What can we say about the eigenvalues of a random matrix? How can we design algorithms to sample from a target distribution on a group or manifold? More generally, what can we say about deterministic algorithms with random inputs? Our study of random matrices will lead us to the subgroup algorithm (Diaconis 1987), which subsumes many familiar random sampling procedures.
These notes provide a theoretical treatment of Expectation-Maximization (EM), an iterative parameter estimation algorithm used to find local maxima of the likelihood function in the presence of hidden variables. Introductory textbooks (MLAPP, PRML) typically state the algorithm without explanation and expect students to work blindly through derivations. We find this approach unsatisfying, and instead choose to tackle the theory head-on, followed by plenty of examples. Following Neal & Hinton (1998), we view EM as coordinate ascent on the Evidence Lower Bound. This perspective takes much of the mystery out of the algorithm and allows us to easily derive variants like Hard EM and Variational Inference.
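
For readers who want the coordinate-ascent view in symbols, here is a brief sketch. The notation is an assumption made for this summary and is not taken from the notes themselves: $x$ denotes the observed data, $z$ the hidden variables, $\theta$ the parameters, and $q$ an arbitrary distribution over $z$.

```latex
% Evidence Lower Bound (assumed notation: x observed, z hidden, \theta parameters)
\mathcal{L}(q, \theta)
  = \mathbb{E}_{q(z)}\big[\log p(x, z \mid \theta)\big]
  - \mathbb{E}_{q(z)}\big[\log q(z)\big]

% The log-likelihood splits into the bound plus a nonnegative KL term
\log p(x \mid \theta)
  = \mathcal{L}(q, \theta)
  + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big)

% E-step: maximize over q with \theta fixed (the KL term vanishes at the posterior)
q^{(t+1)} = \arg\max_{q} \mathcal{L}\big(q, \theta^{(t)}\big)
          = p\big(z \mid x, \theta^{(t)}\big)

% M-step: maximize over \theta with q fixed
\theta^{(t+1)} = \arg\max_{\theta} \mathcal{L}\big(q^{(t+1)}, \theta\big)
```

Because the KL term is nonnegative, $\mathcal{L}$ never exceeds the log-likelihood, and each of the two steps can only increase it or leave it unchanged. Under this reading, the variants mentioned above can be understood as restricting the family of $q$ searched in the first step: roughly, point masses for Hard EM and a tractable family for Variational Inference.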