(12.1)

Generate 100 points $x$ uniformly distributed between 0 and 1, and let $y = 2+3x+\zeta$, where $\zeta$ is a Gaussian random variable with a standard deviation of 0.5. Use an SVD to fit $y = a + bx$ to this data set, finding $a$ and $b$. Evaluate the errors in $a$ and $b$

(a) With equation (12.34)

My initial attempt was a good start, but to get the correct answer I followed Neil's example; the full solution below is adapted from it.
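Writing the design matrix as $M = U W V^T$, equation (12.34) propagates the measurement noise through the SVD pseudoinverse to give the variance of each fit coefficient; in terms of the singular vectors and values it is the quantity that err1 and err2 evaluate in the code below:

$\sigma_{a_i}^2 = \sum\limits_j \frac{V_{ij}^2}{w_j^2} \sigma^2$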

In [112]:
# Adapted from Neil's svdfit.r code

import matplotlib.pyplot as plt
import numpy as np

npts = 100
nfits = 100
std = 0.5

# Generate the data set y = 2 + 3x + zeta
x = np.random.uniform(0, 1, (npts,1))
zeta = np.random.normal(0, std, npts)
y = np.zeros((npts,1))
for i in range(npts):
    y[i] = 2 + 3*x[i] + zeta[i]

# Design matrix for the model y = a + b*x
m = np.ones((npts,1))
M = np.concatenate((m,x), axis=1)

# SVD of the design matrix and its pseudoinverse V W^-1 U^T
U, w, VT = np.linalg.svd(M)
w_inverse = np.zeros((2,npts))
w_inverse[0][0] = 1/w[0]
w_inverse[1][1] = 1/w[1]
UT = np.transpose(U)
V = np.transpose(VT)

ans = V.dot(w_inverse).dot(UT).dot(y)

# Errors from equation (12.34): var(a_i) = sum_j V_ij^2 / w_j^2 * sigma^2
err1 = np.sqrt((V[0,:]**2/w**2).sum())*std
err2 = np.sqrt((V[1,:]**2/w**2).sum())*std

print("SVD: a = ", ans[0], " astd = ", err1, " b = ", ans[1], " bstd = ", err2)
SVD: a =  [2.01348833]  astd =  0.10333808945439447  b =  [2.96917813]  bstd =  0.18314752201972775

(b) By bootstrap sampling to generate 100 data sets

In [113]:
a = 0
a2 = 0
b = 0
b2 = 0

for i in range(nfits):
    # Resample the data with replacement to form a bootstrap data set
    index = np.random.randint(0, npts, npts)
    xboot = x[index]
    yboot = y[index]
    # Fit the bootstrap data set by SVD, as above
    m = np.ones((npts,1))
    M = np.concatenate((m,xboot), axis=1)
    U, w, VT = np.linalg.svd(M)
    w_inverse = np.zeros((2,npts))
    w_inverse[0][0] = 1/w[0]
    w_inverse[1][1] = 1/w[1]
    UT = np.transpose(U)
    V = np.transpose(VT)
    coeff = V.dot(w_inverse).dot(UT).dot(yboot)
    # Accumulate the first and second moments of the fit coefficients
    a = a + coeff[0]
    a2 = a2 + coeff[0]**2
    b = b + coeff[1]
    b2 = b2 + coeff[1]**2

a_mean = a/nfits
a_std = np.sqrt(a2/nfits - a_mean**2)
b_mean = b/nfits
b_std = np.sqrt(b2/nfits - b_mean**2)

print("Bootstrap: a = ", a_mean, " astd = ", a_std, " b = ", b_mean, " bstd = ", b_std)
Bootstrap: a =  [2.00205103]  astd =  [0.09602075]  b =  [2.98703869]  bstd =  [0.1943796]

(c) From fitting an ensemble of 100 independent data sets

In [117]:
a = 0
a2 = 0
b = 0
b2 = 0

for i in range(nfits):
    # Generate an independent data set for each fit
    x = np.random.uniform(0, 1, (npts,1))
    zeta = np.random.normal(0, std, npts)
    y = np.zeros((npts,1))
    for j in range(npts):
        y[j] = 2 + 3*x[j] + zeta[j]
    # Fit the data set by SVD, as above
    m = np.ones((npts,1))
    M = np.concatenate((m,x), axis=1)
    U, w, VT = np.linalg.svd(M)
    w_inverse = np.zeros((2,npts))
    w_inverse[0][0] = 1/w[0]
    w_inverse[1][1] = 1/w[1]
    UT = np.transpose(U)
    V = np.transpose(VT)
    coeff = V.dot(w_inverse).dot(UT).dot(y)
    # Accumulate the first and second moments of the fit coefficients
    a = a + coeff[0]
    a2 = a2 + coeff[0]**2
    b = b + coeff[1]
    b2 = b2 + coeff[1]**2

a_mean = a/nfits
a_std = np.sqrt(a2/nfits - a_mean**2)
b_mean = b/nfits
b_std = np.sqrt(b2/nfits - b_mean**2)

print("Ensemble: a = ", a_mean, " astd = ", a_std, " b = ", b_mean, " bstd = ", b_std)

yfit = coeff[0] + coeff[1]*x

plt.plot(x, y, 'o')
plt.plot(x, yfit)

plt.show()
Ensemble: a =  [1.98940325]  astd =  [0.09636527]  b =  [3.00816549]  bstd =  [0.16385397]
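All three approaches give consistent results: $a \approx 2$ and $b \approx 3$, with an uncertainty of about 0.1 in $a$ and 0.16 to 0.19 in $b$.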

(12.2)

Generate 100 points $x$ uniformly distributed between 0 and 1, and let $y = \sin(2+3x) + \zeta$, where $\zeta$ is a Gaussian random variable with a standard deviation of 0.1. Write a Levenberg-Marquardt routine to fit $y = \sin(a+bx)$ to this data set starting from $a = b = 1$ (remembering that the second-derivative term can be dropped in the Hessian), and investigate the convergence for both fixed and adaptively adjusted $\lambda$ values.
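Following Neil's lm.py, the routine below iterates the damped update

$a \rightarrow a - M^{-1} \nabla \chi^2, \quad \frac{\partial \chi^2}{\partial a_i} = -2 \sum\limits_n \left[y_n - f(x_n)\right] \frac{\partial f}{\partial a_i}, \quad M_{ij} = \sum\limits_n \frac{\partial f}{\partial a_i} \frac{\partial f}{\partial a_j} (1 + \lambda \delta_{ij})$

where the second-derivative terms have been dropped from $M$. A large $\lambda$ turns the update into a small step down the gradient, while a small $\lambda$ approaches a Newton step with this approximate Hessian.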

In [134]:
# From Neil's lm.py code

import matplotlib.pyplot as plt
import numpy as np

def run(L, L_scale):
    n_pts = 100
    std = 0.1
    mean = 0
    err = 0
    tolerance = 1e-6
    Lambda = L
    Lambda_scale = L_scale
    n_max = 100

    # Model and its partial derivatives with respect to a[0] and a[1]
    def f(x): return np.sin(a[0] + a[1]*x)
    def df0(x): return np.cos(a[0] + a[1]*x)
    def df1(x): return x*np.cos(a[0] + a[1]*x)

    # Generate the data set with the true parameters a = 2, b = 3
    zeta = np.random.normal(mean, std, n_pts)
    a = np.array([[2],[3]])
    x = np.random.rand(n_pts)
    y = f(x) + zeta

    plt.plot(x, y, 'ko')

    # Start the fit from the initial guess a = b = 1
    a = np.array([[1],[1]])
    x_plot = np.arange(0, 1, 0.01)
    M = np.zeros((2,2))
    grad = np.zeros((2,1))
    n = 0

    # Iterate the Levenberg-Marquardt update until the error stops changing
    while 1:
        n = n + 1

        if (n > n_max):
            break

        # Gradient of the error sum((y - f(x))**2) with respect to a[0] and a[1]
        grad[0] = -2*sum((y - f(x))*df0(x))
        grad[1] = -2*sum((y - f(x))*df1(x))

        # Approximate Hessian (second-derivative terms dropped),
        # with the diagonal scaled by (1 + lambda)
        M[0][0] = sum(df0(x)*df0(x))*(1 + Lambda)
        M[0][1] = sum(df0(x)*df1(x))
        M[1][0] = sum(df1(x)*df0(x))
        M[1][1] = sum(df1(x)*df1(x))*(1 + Lambda)

        M_inv = np.linalg.inv(M)

        # Take the step a -> a - M^-1 grad
        a = a - np.dot(M_inv, grad)

        y_plot = f(x_plot)
        plt.plot(x_plot, y_plot, 'k-')

        new_err = sum((y - f(x))**2)

    #     print("n:", n)
    #     print("    a:", a)
    #     print("    lambda:", Lambda)
    #     print("    error:", new_err)

        # Stop when the relative change in the error is below tolerance
        if ((abs(new_err - err)/new_err) < tolerance):
            break

        # Adjust lambda: increase it if the error grew, decrease it if it shrank
        # (Lambda_scale = 1.0 leaves lambda fixed)
        if new_err > err:
            Lambda = Lambda*Lambda_scale
        else:
            Lambda = Lambda/Lambda_scale
        err = new_err

    plt.draw()
    return

run(100,1.0)
In [135]:
run(0,1.0)
In [136]:
run(100,3.0)
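These three calls investigate the convergence for different $\lambda$ strategies: run(100, 1.0) holds $\lambda$ fixed at a large value (with Lambda_scale = 1.0 the adjustment leaves $\lambda$ unchanged), so every iteration takes a small, gradient-descent-like step; run(0, 1.0) sets $\lambda = 0$, a pure Newton-type step with the approximate Hessian; and run(100, 3.0) adapts $\lambda$, growing it by a factor of 3 when the error increases and shrinking it when the error decreases, so the routine moves from cautious downhill steps toward Newton steps as it converges.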

(12.3)

An alternative way to choose among models is to select the one that makes the weakest assumptions about the data; this is the purpose of maximum entropy methods. Assume that what is measured is a set of expectation values for functions $f_i$ of a random variable $x$,

$\langle f_i(x) \rangle = \int_{-\infty}^{\infty} p(x) f_i(x) dx$

(a) Given these measurements, find the compatible normalized probability distribution $p(x)$ that maximizes the differential entropy

$S = -\int_{-\infty}^{\infty} p(x) \text{log} p(x) dx$

Introduce a Lagrange multiplier $\lambda_1$ for the normalization constraint and a multiplier $\lambda_i$ for each measured expectation value, and require that the variation of the resulting functional vanish:

$0 = \delta \mathcal{I}$

$ = \delta \left[-\int_{-\infty}^{\infty} p(x) \text{log} p(x) dx + \lambda_1 \int_{-\infty}^{\infty} p(x) dx + \sum\limits_i \lambda_i \int_{-\infty}^{\infty} p(x) f_i(x) dx\right]$

$\implies 0 = \frac{\partial}{\partial p} \left[-p \text{log} p + \lambda_1 p + \sum\limits_i \lambda_i p f_i(x)\right]$

$ = -\text{log} p - 1 + \lambda_1 + \sum\limits_i \lambda_i f_i(x)$

$\implies p(x) = e^{-1 + \lambda_1 + \sum_i \lambda_i f_i(x)}$

The Lagrange multipliers are then found from the constraints.

(b) What is the maximum entropy distribution if we know only the second moment?

$\sigma^2 = \int_{-\infty}^{\infty} p(x) x^2 dx$

In this case $f(x) = x^2$; absorbing an overall sign into the definitions of the multipliers (so that $\lambda_2 > 0$ gives a normalizable distribution), the distribution becomes

$p(x) = e^{-(1 + \lambda_1 + \lambda_2 x^2)}$

One constraint comes from normalization:

$1 = \int_{-\infty}^{\infty} p(x) dx$

$ = \frac{\sqrt{\pi} \text{erf} (x \sqrt{\lambda_2}) e^{-(1 + \lambda_1)}}{2 \sqrt{\lambda_2}}\bigg|_{-\infty}^{\infty}$

$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{2 \sqrt{\lambda_2}} + \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{2 \sqrt{\lambda_2}}$

$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{\sqrt{\lambda_2}}$

and the other comes from the second moment:

$\sigma^2 = \int_{-\infty}^{\infty} p(x) x^2 dx$

$ = \left[ \frac{\sqrt{\pi} \, \text{erf} (x \sqrt{\lambda_2}) \, e^{-(1 + \lambda_1)}}{4 \lambda_2^{3/2}} - \frac{x e^{-\lambda_2 x^2} e^{-(1 + \lambda_1)}}{2 \lambda_2} \right] \bigg|_{-\infty}^{\infty}$

$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{4 \lambda_2^{3/2}} + \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{4 \lambda_2^{3/2}}$

$ = \frac{\sqrt{\pi} e^{-(1 + \lambda_1)}}{2 \lambda_2^{3/2}}$

These two equations in two unknowns are solved by

$\lambda_1 = \text{log} \sqrt{2 \pi \sigma^2} - 1, \; \lambda_2 = \frac{1}{2 \sigma^2}$

which gives us

$p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-x^2/(2 \sigma^2)}$
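As a quick check (a minimal sympy sketch, not part of the original solution; the symbol names are arbitrary), this distribution is normalized and has second moment $\sigma^2$:

import sympy as sp

x = sp.symbols('x', real=True)
sigma = sp.symbols('sigma', positive=True)

# Maximum entropy distribution found above
p = sp.exp(-x**2/(2*sigma**2)) / sp.sqrt(2*sp.pi*sigma**2)

print(sp.integrate(p, (x, -sp.oo, sp.oo)))                    # 1
print(sp.simplify(sp.integrate(x**2*p, (x, -sp.oo, sp.oo))))  # sigma**2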

(12.4)

Now consider the reverse situation. Let's say that we know that a data set $\{x_n\}_{n=1}^N$ was drawn from a Gaussian distribution with variance $\sigma^2$ and unknown mean $\mu$. Try to find an optimal estimator of the mean (one that is unbiased and has the smallest possible error in the estimate).

We will guess the average of the samples

$\mu \approx f(x_1, \cdots, x_N) = \frac{1}{N} \sum\limits_{n=1}^N x_n$

First, we'll check to see if it's unbiased:

$\langle f \rangle = \left\langle \frac{1}{N} \sum\limits_{n=1}^N x_n \right\rangle$

$ = \frac{1}{N} \sum\limits_{n=1}^N \langle x_n \rangle$

$ = \frac{1}{N} \sum\limits_{n=1}^N \mu$

$ = \mu$

It is unbiased. Next, we find the variance in the estimate of the mean:

$\langle (f - \mu)^2 \rangle = \left\langle \left[ \frac{1}{N} \sum\limits_{n=1}^N x_n - \mu \right]^2 \right\rangle$

$ = \left\langle \left[ \frac{1}{N} \sum\limits_{n=1}^N (x_n - \mu) \right]^2 \right\rangle$

$ = \left\langle \frac{1}{N} \sum\limits_{n=1}^N (x_n - \mu) \frac{1}{N} \sum\limits_{n'=1}^N (x_{n'} - \mu) \right\rangle$

Since the samples are independent, the cross terms vanish, leaving

$ = \frac{1}{N^2} \sum\limits_{n=1}^N \langle (x_n - \mu)^2 \rangle$

$ = \frac{\sigma^2}{N}$

Finally, we calculate the Fisher information for the mean. Since the $N$ samples are independent and identically distributed, the Fisher information of the full data set is $N$ times that of a single sample:

$J_N(\mu) = N J(\mu)$

$ = N \left\langle \left[ \frac{\partial}{\partial \mu} \text{log} \left( \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2/(2 \sigma^2)} \right) \right]^2 \right\rangle$

$ = N \left\langle \left( \frac{x - \mu}{\sigma^2} \right)^2 \right\rangle$

$ = \frac{N}{\sigma^2}$

Comparing these, we see that the variance of the estimator equals the Cramér-Rao bound:

$\langle (f - \mu)^2 \rangle = \frac{1}{J_N(\mu)}$
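so the sample average is an optimal (minimum-variance unbiased) estimator of the mean.

As a quick numerical sanity check (a minimal sketch; the values of $N$, $\sigma$, $\mu$, and the number of trials are arbitrary choices):

import numpy as np

# Monte Carlo check that the sample mean is unbiased with variance sigma^2/N
N = 100          # samples per data set
sigma = 0.5      # known standard deviation
mu = 2.0         # true (unknown to the estimator) mean
trials = 10000   # number of independent data sets

samples = np.random.normal(mu, sigma, (trials, N))
f = samples.mean(axis=1)   # the estimator, evaluated on each data set

print("mean of estimator:    ", f.mean())        # ~ mu (unbiased)
print("variance of estimator:", f.var())         # ~ sigma^2/N
print("sigma^2/N = 1/J_N(mu):", sigma**2/N)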