tl;dr: I was confused about the precise expression for the quotient of two multivariate Gaussian densities, so I’m writing it up here.

Suppose you want to multiply two Gaussian densities, $N(x; a, A)$ and $N(x; b, B)$.[1] It’s a standard result[2] that the product of two Gaussian densities is an (unnormalized) Gaussian in the same variable, \[N(x; a, A)N(x; b, B) = \alpha \cdot N(x; c, C)\\ C = \left(A^{-1} + B^{-1}\right)^{-1}\\ c = C\left(A^{-1}a + B^{-1}b\right)\\ \alpha = N(a; b, A+B).\]The precision matrices add, the means are combined in a precision-weighted average, and (surprisingly?) the normalizing constant $\alpha$ is itself in the form of a Gaussian density. All of this is straightforward, though annoying, to prove: just expand out the product of the two densities, complete the square, and collect the terms that involve $x$ (for the Gaussian factor) and those that don’t (for the normalizing constant).
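If you don’t feel like doing the algebra, a quick numerical check is convincing enough. Here’s a minimal sketch (not part of any standard derivation) that verifies the product formula pointwise, using scipy’s `multivariate_normal` for the density in footnote 1 and arbitrary example means and covariances:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(0)

# Two arbitrary 2-D Gaussians N(x; a, A) and N(x; b, B).
a, b = np.array([1.0, -0.5]), np.array([0.3, 2.0])
A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[1.5, -0.2], [-0.2, 0.8]])

# Product formula: the precisions add, the means are precision-weighted,
# and the normalizer alpha is itself a Gaussian density, N(a; b, A + B).
C = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
c = C @ (np.linalg.inv(A) @ a + np.linalg.inv(B) @ b)
alpha = mvn.pdf(a, mean=b, cov=A + B)

# N(x; a, A) * N(x; b, B) = alpha * N(x; c, C) should hold pointwise.
for x in rng.normal(size=(5, 2)):
    lhs = mvn.pdf(x, mean=a, cov=A) * mvn.pdf(x, mean=b, cov=B)
    rhs = alpha * mvn.pdf(x, mean=c, cov=C)
    assert np.isclose(lhs, rhs)
```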

What happens if you take the quotient of two Gaussian densities? This comes up, for example, in expectation propagation, where a newly updated Gaussian approximate posterior is divided by the previous approximation to get the “message” that would have transformed the latter into the former. It turns out the result is \[\frac{N(x; a, A)}{N(x; b, B)} = \beta \cdot N(x; d, D)\\ D = \left(A^{-1} - B^{-1}\right)^{-1}\\ d = D\left(A^{-1}a - B^{-1}b\right)\\ \beta = \frac{|B|}{|B-A|}\frac{1}{N(a; b, B-A)}.\] Note that the form of the message, i.e. the mean and covariance $(d, D)$, is the same as you would have gotten by plugging the negated covariance $-B$ into the product formula above; in this sense, dividing by a Gaussian is like multiplying by a Gaussian with negative variance. This follows directly from the identity $1/e^x = e^{-x}$: dividing by $\exp\left(-\frac{1}{2}(x-b)^TB^{-1}(x-b)\right)$ is the same as multiplying by $\exp\left(-\frac{1}{2}(x-b)^T(-B)^{-1}(x-b)\right)$. A Gaussian with negative variance (or more generally, a negative-definite covariance matrix) is kind of a weird beast: the bell curve opens upwards instead of downwards, so it can’t be normalized and is not a valid probability density, but we can treat it as a formal object that “cancels out” a certain amount of observation. If I have a Gaussian belief about some quantity, and then observe that quantity with negative-variance Gaussian noise, I am now more uncertain than I was before!
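The same kind of numerical check works for the quotient. Here’s a sketch with arbitrary example parameters, chosen so that $B - A$ is positive definite (otherwise the quotient isn’t a normalizable Gaussian):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(1)

a, b = np.array([1.0, -0.5]), np.array([0.3, 2.0])
A = np.array([[1.0, 0.2], [0.2, 0.8]])
B = np.array([[2.0, 0.3], [0.3, 1.5]])  # chosen so that B - A is positive definite

# Quotient formula: the precisions subtract, and the normalizer beta
# involves N(a; b, B - A) together with a ratio of determinants.
D = np.linalg.inv(np.linalg.inv(A) - np.linalg.inv(B))
d = D @ (np.linalg.inv(A) @ a - np.linalg.inv(B) @ b)
beta = (np.linalg.det(B) / np.linalg.det(B - A)) / mvn.pdf(a, mean=b, cov=B - A)

# N(x; a, A) / N(x; b, B) = beta * N(x; d, D) should hold pointwise.
for x in rng.normal(size=(5, 2)):
    lhs = mvn.pdf(x, mean=a, cov=A) / mvn.pdf(x, mean=b, cov=B)
    rhs = beta * mvn.pdf(x, mean=d, cov=D)
    assert np.isclose(lhs, rhs)
```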

But now we get to the point of this post: the interpretation of division as multiplication by a negative-variance density is workable if you only care about the form of the result, but it falls apart when you need to compute the normalization constant (for example, as a Bayesian model evidence). Plugging $-B$ into the formula for $\alpha$ above does not give the correct normalization constant for the quotient case; in fact, it doesn’t in general even give a real number: whenever the quotient is normalizable, $B - A$ is positive definite, so $A - B$ is negative definite, its determinant is negative in odd dimensions, and the $|2\pi(A-B)|^{1/2}$ factor in $N(a; b, A-B)$ becomes imaginary. Doing the derivation from scratch for the quotient case yields the correct normalization constant $\beta$, given above.
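To see the failure concretely, here’s a scalar sketch (example numbers are arbitrary) comparing the correct $\beta$ with the value you’d get by naively plugging $-B$ into the formula for $\alpha$:

```python
import numpy as np

# Scalar example: A < B, so the quotient N(x; a, A) / N(x; b, B) is a valid
# (normalizable) Gaussian with variance D = 1 / (1/A - 1/B).
a, A = 0.0, 1.0
b, B = 1.0, 2.0

# Correct normalizer from the quotient derivation:
#   beta = (B / (B - A)) / N(a; b, B - A)
beta = (B / (B - A)) * np.sqrt(2 * np.pi * (B - A)) * np.exp(0.5 * (a - b) ** 2 / (B - A))

# Naive "negative-variance" normalizer N(a; b, A - B): here A - B < 0, so the
# 1 / sqrt(2*pi*(A - B)) factor is imaginary and the result is not even real.
naive = np.exp(-0.5 * (a - b) ** 2 / (A - B)) / np.sqrt(2 * np.pi * (A - B) + 0j)

print(beta)   # ~8.27, real
print(naive)  # purely imaginary, and not equal to beta in any sense
```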

  1. Notation: let $N(x; a, A) = \frac{1}{|2\pi A|^{1/2}}\exp\left(-\frac{1}{2}\left(x-a\right)^TA^{-1}\left(x-a\right)\right)$ denote a multivariate Gaussian density in the variable $x$ with mean $a$ and covariance matrix $A$.

  2. I first saw this in the late Sam Roweis’ notes on Gaussian identities.