Generate 100 points $x$ uniformly distributed between 0 and 1, and let $y = 2+3x+\zeta$, where $\zeta$ is a Gaussian random variable with a standard deviation of 0.5. Use an SVD to fit $y = a + bx$ to this data set, finding $a$ and $b$. Evaluate the errors in $a$ and $b$:
(a) With equation (12.34)
My initial attempt was only a partial solution; the code below, adapted from Neil's example, gives the full correct solution.
# Adapted from Neil's svdfit.r code
import matplotlib.pyplot as plt
import numpy as np

npts = 100
nfits = 100
std = 0.5

# Generate the synthetic data set y = 2 + 3x + zeta
x = np.random.uniform(0, 1, (npts, 1))
zeta = np.random.normal(0, std, (npts, 1))
y = 2 + 3*x + zeta

# Build the design matrix M = [1  x] and take its reduced SVD
M = np.concatenate((np.ones((npts, 1)), x), axis=1)
U, w, VT = np.linalg.svd(M, full_matrices=False)
V = VT.T

# Pseudo-inverse solution: coefficients = V diag(1/w) U^T y
ans = V.dot(np.diag(1/w)).dot(U.T).dot(y)

# Errors from equation (12.34): var(a_i) = sum_j (V_ij/w_j)^2 * std^2
err1 = np.sqrt((V[0, :]**2/w**2).sum())*std
err2 = np.sqrt((V[1, :]**2/w**2).sum())*std
print("SVD: a =", ans[0], " astd =", err1, " b =", ans[1], " bstd =", err2)
(b) By bootstrap sampling to generate 100 data sets
a = a2 = b = b2 = 0
for i in range(nfits):
    # Resample the original data set with replacement
    # (np.random.randint replaces the original ceil-of-uniform indexing,
    # which could almost never draw index 0)
    index = np.random.randint(0, npts, npts)
    xboot = x[index]
    yboot = y[index]
    # Refit each bootstrap sample by SVD
    M = np.concatenate((np.ones((npts, 1)), xboot), axis=1)
    U, w, VT = np.linalg.svd(M, full_matrices=False)
    coeff = VT.T.dot(np.diag(1/w)).dot(U.T).dot(yboot)
    a += coeff[0]
    a2 += coeff[0]**2
    b += coeff[1]
    b2 += coeff[1]**2

# Means and standard deviations over the bootstrap fits
a_mean = a/nfits
a_std = np.sqrt(a2/nfits - a_mean**2)
b_mean = b/nfits
b_std = np.sqrt(b2/nfits - b_mean**2)
print("Bootstrap: a =", a_mean, " astd =", a_std, " b =", b_mean, " bstd =", b_std)
(c) From fitting an ensemble of 100 independent data sets
a = a2 = b = b2 = 0
for i in range(nfits):
    # Generate a fresh, independent data set for each fit
    x = np.random.uniform(0, 1, (npts, 1))
    zeta = np.random.normal(0, std, (npts, 1))
    y = 2 + 3*x + zeta
    M = np.concatenate((np.ones((npts, 1)), x), axis=1)
    U, w, VT = np.linalg.svd(M, full_matrices=False)
    coeff = VT.T.dot(np.diag(1/w)).dot(U.T).dot(y)
    a += coeff[0]
    a2 += coeff[0]**2
    b += coeff[1]
    b2 += coeff[1]**2

a_mean = a/nfits
a_std = np.sqrt(a2/nfits - a_mean**2)
b_mean = b/nfits
b_std = np.sqrt(b2/nfits - b_mean**2)
print("Ensemble: a =", a_mean, " astd =", a_std, " b =", b_mean, " bstd =", b_std)

# Plot the last data set together with its fitted line
yfit = coeff[0] + coeff[1]*x
plt.plot(x, y, 'o')
plt.plot(x, yfit)
plt.show()
Generate 100 points $x$ uniformly distributed between 0 and 1, and let $y = \sin(2+3x) + \zeta$, where $\zeta$ is a Gaussian random variable with a standard deviation of 0.1. Write a Levenberg-Marquardt routine to fit $y = \sin(a+bx)$ to this data set starting from $a = b = 1$ (remembering that the second-derivative term can be dropped in the Hessian), and investigate the convergence for both fixed and adaptively adjusted $\lambda$ values.
# From Neil's lm.py code
import matplotlib.pyplot as plt
import numpy as np

def run(L, L_scale):
    n_pts = 100
    std = 0.1
    mean = 0
    err = 0
    tolerance = 1e-6
    Lambda = L
    Lambda_scale = L_scale
    n_max = 100

    # Model and its partial derivatives with respect to a[0] and a[1]
    def f(x): return np.sin(a[0] + a[1]*x)
    def df0(x): return np.cos(a[0] + a[1]*x)
    def df1(x): return x*np.cos(a[0] + a[1]*x)

    # Generate the data set with the true parameters a = 2, b = 3
    zeta = np.random.normal(mean, std, n_pts)
    a = np.array([[2], [3]])
    x = np.random.rand(n_pts)
    y = f(x) + zeta
    plt.plot(x, y, 'ko')

    # Start the search from a = b = 1
    a = np.array([[1.0], [1.0]])
    x_plot = np.arange(0, 1, 0.01)
    M = np.zeros((2, 2))
    grad = np.zeros((2, 1))
    n = 0
    while True:
        n += 1
        if n > n_max:
            break
        # Gradient of the squared error
        grad[0] = -2*sum((y - f(x))*df0(x))
        grad[1] = -2*sum((y - f(x))*df1(x))
        # Approximate Hessian (second-derivative term dropped),
        # with the diagonal inflated by (1 + lambda)
        M[0][0] = sum(df0(x)*df0(x))*(1 + Lambda)
        M[0][1] = sum(df0(x)*df1(x))
        M[1][0] = sum(df1(x)*df0(x))
        M[1][1] = sum(df1(x)*df1(x))*(1 + Lambda)
        a = a - np.dot(np.linalg.inv(M), grad)
        plt.plot(x_plot, f(x_plot), 'k-')
        new_err = sum((y - f(x))**2)
        # print(n, a.ravel(), Lambda, new_err)  # uncomment to trace convergence
        if abs(new_err - err)/new_err < tolerance:
            break
        # Adapt lambda: raise it when the error grows, lower it when it shrinks
        if new_err > err:
            Lambda = Lambda*Lambda_scale
        else:
            Lambda = Lambda/Lambda_scale
        err = new_err
    plt.draw()

run(100, 1.0)   # large fixed lambda (scale factor 1 leaves it unchanged)
run(0, 1.0)     # lambda = 0: a pure Newton step
run(100, 3.0)   # adaptively adjusted lambda
plt.show()
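As an optional cross-check (a minimal sketch assuming SciPy is installed, with a hypothetical seed for repeatability), SciPy's `least_squares` with `method='lm'` runs MINPACK's Levenberg-Marquardt and should land near $a = 2$, $b = 3$ from the same starting point:

# Hedged cross-check against SciPy's Levenberg-Marquardt implementation
from scipy.optimize import least_squares

np.random.seed(0)  # hypothetical seed, only for repeatability
x = np.random.rand(100)
y = np.sin(2 + 3*x) + np.random.normal(0, 0.1, 100)
res = least_squares(lambda a: np.sin(a[0] + a[1]*x) - y, x0=[1.0, 1.0], method='lm')
print("scipy check: a =", res.x[0], " b =", res.x[1])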
An alternative way to choose among models is to select the one that makes the weakest assumptions about the data; this is the purpose of maximum entropy methods. Assume that what is measured is a set of expectation values for functions $f_i$ of a random variable $x$,
$\langle f_i(x) \rangle = \int_{-\infty}^{\infty} p(x) f_i(x) dx$
(a) Given these measurements, find the compatible normalized probability distribution $p(x)$ that maximizes the differential entropy
$S = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx$
Introducing a Lagrange multiplier $\lambda_1$ for normalization and one multiplier $\lambda_i$ for each measured expectation value, we set the variation of the constrained functional $\mathcal{I}$ to zero:
$0 = \delta \mathcal{I}$
$ = \delta \left[-\int_{-\infty}^{\infty} p(x) \log p(x)\, dx + \lambda_1 \int_{-\infty}^{\infty} p(x)\, dx + \sum\limits_i \lambda_i \int_{-\infty}^{\infty} p(x) f_i(x)\, dx\right]$
$\implies 0 = \frac{\partial}{\partial p} \left[-p \log p + \lambda_1 p + \sum\limits_i \lambda_i p f_i(x)\right]$
$ = -\log p - 1 + \lambda_1 + \sum\limits_i \lambda_i f_i(x)$
$\implies p(x) = e^{-1 + \lambda_1 + \sum_i \lambda_i f_i(x)}$
The Lagrange multipliers are then found from the constraints.
(b) What is the maximum entropy distribution if we know only the second moment?
$\sigma^2 = \int_{-\infty}^{\infty} p(x) x^2 dx$
In this case there is a single constraint function $f(x) = x^2$ with multiplier $\lambda_2$; absorbing a sign change into the multipliers gives
$p(x) = e^{-(1 + \lambda_1 + \lambda_2 x^2)}$
One constraint comes from normalization:
$1 = \int_{-\infty}^{\infty} p(x) dx$
$ = \frac{\sqrt{\pi} \text{erf} (x \sqrt{\lambda_2}) e^{-(1 + \lambda_1)}}{2 \sqrt{\lambda_2}}\bigg|_{-\infty}^{\infty}$
$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{2 \sqrt{\lambda_2}} + \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{2 \sqrt{\lambda_2}}$
$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{\sqrt{\lambda_2}}$
and the other comes from the second moment:
$\sigma^2 = \int_{-\infty}^{\infty} p(x) x^2 dx$
$ = \frac{\sqrt{\pi}\, \text{erf} (x \sqrt{\lambda_2})\, e^{-(1 + \lambda_1)}}{4 \lambda_2^{3/2}} - \frac{x e^{-\lambda_2 x^2} e^{-(1 + \lambda_1)}}{2 \lambda_2} \bigg|_{-\infty}^{\infty}$
$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{4 \lambda_2^{3/2}} + \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{4 \lambda_2^{3/2}}$
$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{2 \lambda_2^{3/2}}$
These two equations in two unknowns are solved by
$\lambda_1 = \log \sqrt{2 \pi \sigma^2} - 1, \; \lambda_2 = \frac{1}{2 \sigma^2}$
which gives us
$p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-x^2/(2 \sigma^2)}$
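As a numerical sanity check (a minimal sketch assuming SciPy is available, with an illustrative value of $\sigma^2$), we can solve the two constraint equations directly and compare against the closed-form multipliers; solving for $\log \lambda_2$ keeps $\lambda_2$ positive so the integrals converge:

# Numerically solve the normalization and second-moment constraints for
# (lambda_1, lambda_2) and compare with the closed-form answer above.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

sigma2 = 0.7  # illustrative value of the measured second moment

def constraints(lam):
    l1, logl2 = lam
    l2 = np.exp(logl2)  # keep lambda_2 positive so the integrals converge
    p = lambda x: np.exp(-(1 + l1 + l2*x**2))
    norm = quad(p, -np.inf, np.inf)[0]
    m2 = quad(lambda x: x**2*p(x), -np.inf, np.inf)[0]
    return [norm - 1, m2 - sigma2]

l1, logl2 = fsolve(constraints, [0.0, 0.0])
print(l1, np.log(np.sqrt(2*np.pi*sigma2)) - 1)  # these should agree
print(np.exp(logl2), 1/(2*sigma2))              # and so should these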
Now consider the reverse situation. Let's say that we know that a data set $\{x_n\}_{n=1}^N$ was drawn from a Gaussian distribution with variance $\sigma^2$ and unknown mean $\mu$. Try to find an optimal estimator of the mean (one that is unbiased and has the smallest possible error in the estimate).
As our estimator, we guess the average of the samples:
$\mu \approx f(x_1, \cdots, x_N) = \frac{1}{N} \sum\limits_{n=1}^N x_n$
First, we'll check to see if it's unbiased:
$\langle f \rangle = \left\langle \frac{1}{N} \sum\limits_{n=1}^N x_n \right\rangle$
$ = \frac{1}{N} \sum\limits_{n=1}^N \langle x_n \rangle$
$ = \frac{1}{N} \sum\limits_{n=1}^N \mu$
$ = \mu$
It is unbiased. Next, we find the variance in the estimate of the mean:
$\langle (f - \mu)^2 \rangle = \left\langle \left[ \frac{1}{N} \sum\limits_{n=1}^N x_n - \mu \right]^2 \right\rangle$
$ = \left\langle \left[ \frac{1}{N} \sum\limits_{n=1}^N (x_n - \mu) \right]^2 \right\rangle$
$ = \left\langle \frac{1}{N} \sum\limits_{n=1}^N (x_n - \mu) \frac{1}{N} \sum\limits_{n'=1}^N (x_{n'} - \mu) \right\rangle$
Since independent samples are uncorrelated, the cross terms ($n \ne n'$) vanish:
$ = \frac{1}{N^2} \sum\limits_{n=1}^N \langle (x_n - \mu)^2 \rangle$
$ = \frac{\sigma^2}{N}$
Finally, we calculate the Fisher information for the mean; because the $N$ samples are independent, the information is additive:
$J_N(\mu) = N J(\mu)$
$ = N \left\langle \left[ \frac{\partial}{\partial \mu} \log \left( \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2/(2 \sigma^2)} \right) \right]^2 \right\rangle$
$ = N \left\langle \left( \frac{x - \mu}{\sigma^2} \right)^2 \right\rangle$
$ = \frac{N}{\sigma^2}$
Comparing these, we see that
$\langle (f - \mu)^2 \rangle = \frac{1}{J_N(\mu)}$
so the sample mean saturates the Cramér-Rao bound: it is an unbiased estimator with the smallest possible variance, and hence optimal.
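A quick Monte Carlo check (a sketch with illustrative values of $\mu$, $\sigma$, and $N$) confirms both the unbiasedness and the $\sigma^2/N = 1/J_N(\mu)$ variance of the sample mean:

# Empirically estimate the mean and variance of the sample mean and
# compare the variance with sigma^2/N = 1/J_N(mu).
import numpy as np

mu, sigma, N, trials = 2.0, 0.5, 100, 10000  # illustrative values
samples = np.random.normal(mu, sigma, (trials, N))
means = samples.mean(axis=1)
print("empirical mean:", means.mean(), " (should be near mu =", mu, ")")
print("empirical var :", means.var())
print("sigma^2/N = 1/J_N:", sigma**2/N)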