In the previous post I derived the evidence lower bound (ELBO),
\[\mathcal{F}(\lambda; x) = \int q(z;\lambda) \left[\log p(x,z) - \log q(z;\lambda)\right]dz ,\]
which variational inference attempts to maximize. Now I’ll describe a method to perform this maximization using only the model gradient $\nabla_z \log p(x, z)$.
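As a concrete (and hypothetical) sketch of what such a method can look like, the snippet below fits a Gaussian variational family $q(z) = \mathcal{N}(\mu, \sigma^2)$ by stochastic gradient ascent on the ELBO, using the reparameterization trick $z = \mu + \sigma\epsilon$ so that only $\nabla_z \log p(x,z)$ is ever evaluated. The toy model, the function names (`grad_log_p`, `fit_gaussian_q`), and the hyperparameters are all my own choices for illustration, not anything from the post:

```python
import numpy as np

def grad_log_p(x, z):
    # Toy conjugate model: z ~ N(0,1), x | z ~ N(z,1), so
    # log p(x,z) = -z^2/2 - (x-z)^2/2 + const and
    # d/dz log p(x,z) = -z + (x - z) = x - 2z.
    return x - 2.0 * z

def fit_gaussian_q(x, steps=3000, lr=0.05, batch=64, seed=0):
    """Maximize the ELBO over q(z) = N(mu, sigma^2) using only grad_log_p.

    With z = mu + sigma * eps, eps ~ N(0,1), the reparameterized ELBO
    gradients are
        d F / d mu        = E[ d/dz log p(x, z) ]
        d F / d log sigma = E[ d/dz log p(x, z) * sigma * eps ] + 1,
    where the +1 comes from the entropy of the Gaussian q.
    """
    rng = np.random.default_rng(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        eps = rng.standard_normal(batch)
        sigma = np.exp(log_sigma)
        z = mu + sigma * eps
        g = grad_log_p(x, z)
        mu += lr * g.mean()
        log_sigma += lr * ((g * sigma * eps).mean() + 1.0)
    return mu, np.exp(log_sigma)

mu, sigma = fit_gaussian_q(x=2.0)
# For this model the exact posterior is N(x/2, 1/2),
# so mu should land near 1.0 and sigma near 0.707.
```

Because the toy model is conjugate, we can check the fitted $(\mu, \sigma)$ against the exact posterior; for a non-conjugate model the same loop works as long as `grad_log_p` is available (e.g. via autodiff).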
Variational inference has a reputation for being complicated, or for involving mathematical black magic. I think part of this is because the standard derivation uses Jensen’s inequality in a way that seems unintuitive. Here’s a derivation that feels easier to me, using only the notion of KL divergence between probability distributions.