Optimization of Expected Value

Framework

Say we have a function $f$ that maps some high-dimensional space to a real value. In a conventional optimization approach, we would try to minimize $f(x)$, and the relevant gradient would be $\nabla_x f(x)$. An alternative approach seeks to minimize the expected value of $f$ with respect to a probability distribution. In particular, assume we have a distribution $p_\theta$ over the domain of $f$ that’s parametrized by $\theta$. Unlike the direct approach, we can compute the gradient of this expectation without differentiating through $f$: $\nabla_\theta \mathbb{E}_{x \sim p_\theta}[f(x)] = \mathbb{E}_{x \sim p_\theta}[f(x) \nabla_\theta \log p_\theta(x)]$.
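The reason no derivative of the function is needed can be seen from the log-derivative identity. The steps below are a sketch in assumed notation ($f$ for the function, $p_\theta$ for the distribution, $\theta$ for its parameters). They assume that differentiation and integration can be exchanged and that $p_\theta(x) > 0$ wherever $f(x) \neq 0$.

```latex
\begin{aligned}
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}[f(x)]
  &= \nabla_\theta \int f(x) \, p_\theta(x) \, dx \\
  &= \int f(x) \, \nabla_\theta p_\theta(x) \, dx \\
  &= \int f(x) \, p_\theta(x) \, \nabla_\theta \log p_\theta(x) \, dx \\
  &= \mathbb{E}_{x \sim p_\theta}\!\left[ f(x) \, \nabla_\theta \log p_\theta(x) \right]
\end{aligned}
```

The third line uses $\nabla_\theta \log p_\theta(x) = \nabla_\theta p_\theta(x) / p_\theta(x)$. The only gradient that appears is that of the log-density, so $f$ can be a black box.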

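So as long as we can draw samples $x_1, \dots, x_N$ from $p_\theta$, we have an unbiased estimator of the gradient: $\frac{1}{N} \sum_{i=1}^{N} f(x_i) \nabla_\theta \log p_\theta(x_i)$.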
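A minimal Python sketch of such a sample-based estimator follows. The helper names (`score_function_gradient`, `gaussian_sample`, `gaussian_score`) are hypothetical, and the identity-covariance Gaussian and the quadratic test function are assumptions chosen only because the exact gradient is known in that case.

```python
import numpy as np

def score_function_gradient(f, sample, score, theta, n_samples, rng):
    """Monte Carlo estimate of the gradient of E_{x ~ p_theta}[f(x)] w.r.t. theta.

    Averages f(x_i) * grad_theta log p_theta(x_i) over samples x_i ~ p_theta.
    f is only evaluated, never differentiated.
    """
    xs = sample(theta, n_samples, rng)                 # shape (n_samples, dim)
    fs = np.array([f(x) for x in xs])                  # shape (n_samples,)
    scores = np.array([score(theta, x) for x in xs])   # shape (n_samples, n_params)
    return (fs[:, None] * scores).mean(axis=0)

# Assumed illustrative distribution: p_theta = N(theta, I),
# whose score with respect to theta is (x - theta).
def gaussian_sample(theta, n, rng):
    return theta + rng.standard_normal((n, theta.shape[0]))

def gaussian_score(theta, x):
    return x - theta

# Assumed test function: squared norm, treated as a black box by the estimator.
def f(x):
    return np.sum(x ** 2)

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])
grad_estimate = score_function_gradient(
    f, gaussian_sample, gaussian_score, theta, 50_000, rng
)

# Here E[f(x)] = ||theta||^2 + dim, so the exact gradient is 2 * theta.
print(grad_estimate, 2 * theta)
```

The estimate is noisy, so it only approaches `2 * theta` as the number of samples grows. In practice a baseline is often subtracted from `f` to reduce variance, which this sketch omits.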

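So we can do gradient descent on $\theta$ to improve our distribution, that is, to lower the expected value of $f$ under it: with a step size $\alpha > 0$ and a gradient estimate $\hat{g}$, we repeatedly update $\theta \leftarrow \theta - \alpha \hat{g}$.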
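The loop below sketches this kind of stochastic gradient descent on the parameters of a distribution. Everything specific in it is an assumption made for illustration: the distribution is a Gaussian with a learnable mean `mu` and a fixed isotropic standard deviation `sigma`, the objective is a toy quadratic, and the step size, sample count, and iteration count are arbitrary.

```python
import numpy as np

def f(xs):
    # Toy objective, evaluated row-wise on a batch of points. The optimizer
    # below only calls it and never needs its derivative.
    target = np.array([3.0, -1.0])
    return np.sum((xs - target) ** 2, axis=1)

def estimate_gradient(mu, sigma, n_samples, rng):
    # Draw x_i ~ N(mu, sigma^2 I) and average f(x_i) * grad_mu log p(x_i),
    # where grad_mu log p(x) = (x - mu) / sigma^2 for this distribution.
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    xs = mu + sigma * eps
    scores = eps / sigma  # equals (xs - mu) / sigma**2
    return (f(xs)[:, None] * scores).mean(axis=0)

rng = np.random.default_rng(0)
mu = np.zeros(2)       # initial mean
sigma = 0.5            # fixed standard deviation (not optimized here)
learning_rate = 0.05

for step in range(300):
    grad = estimate_gradient(mu, sigma, 1000, rng)
    mu = mu - learning_rate * grad

print(mu)
```

With these settings the mean should drift toward the minimizer of the toy objective, up to noise from the sampled gradients. Only the mean is updated here; adapting the covariance as well requires its own score term.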

Using a Multivariate Gaussian

Suppose our distribution is a multivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$.
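For reference, the log-density of an $n$-dimensional Gaussian and its gradients with respect to the mean and covariance are given below. These are standard results, stated here under the assumption that the distribution is parametrized directly by $\mu$ and $\Sigma$ and that the entries of $\Sigma$ are treated as unconstrained. The text above does not say which parametrization it goes on to use.

```latex
\begin{aligned}
\log p(x) &= -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)
             - \tfrac{1}{2} \log \det \Sigma
             - \tfrac{n}{2} \log 2\pi \\
\nabla_\mu \log p(x) &= \Sigma^{-1} (x - \mu) \\
\nabla_\Sigma \log p(x) &= \tfrac{1}{2} \Sigma^{-1} (x - \mu)(x - \mu)^\top \Sigma^{-1}
             - \tfrac{1}{2} \Sigma^{-1}
\end{aligned}
```

Multiplying either gradient by the function value at a sample and averaging over samples gives the corresponding piece of the gradient of the expectation.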