Week 7 - Function Fitting

Problem 12.1:

Generate $100$ points $x$ uniformly distributed between $0$ and $1$, and let $y = 2+3x+ζ$, where $ζ$ is a Gaussian random variable with a standard deviation of $0.5$. Use an SVD to fit $y = a + bx$ to this data set, finding $a$ and $b$. Evaluate the errors in $a$ and $b$.

(a) With equation (12.34)

In [4]:
import numpy as np
import matplotlib.pyplot as plt
In [41]:
noise_sd = .5

x = np.random.random(100)
y = 2 + 3*x + np.random.normal(0, noise_sd, 100)

X = np.ones((100, 2))
X[:, 1] = x
In [43]:
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.transpose()

error_sd = noise_sd*np.sum((V*V)/(S*S), axis=1)
print(error_sd)
[ 0.02049148  0.06164562]
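Equation (12.34) gives the variance of each fitted parameter, $\sigma^2(a_i) = \sigma_\zeta^2 \sum_j V_{ij}^2 / w_j^2$, so the computation above uses the noise standard deviation where its square belongs and never takes the final square root; that is why these values come out much smaller than the bootstrap and ensemble estimates below. A corrected sketch, reusing the $U$, $S$, $V$ already computed:

In [ ]:
# Parameter variances from equation (12.34): sigma^2(a_i) = sigma_noise^2 * sum_j (V_ij / w_j)^2
param_var = (noise_sd**2)*np.sum((V*V)/(S*S), axis=1)
# The square root converts variances to standard errors, comparable to parts (b) and (c)
print(np.sqrt(param_var))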

(b) By bootstrap sampling to generate 100 data sets

In [53]:
def svd_inv(S):
    # Invert the singular values for the pseudoinverse, mapping any
    # zero singular values to zero instead of infinity.
    S_inv = np.zeros_like(S)
    nonzero = S != 0
    S_inv[nonzero] = 1/S[nonzero]
    return S_inv
In [71]:
fits = []

for i in range(100):
    # Resample the data with replacement to form a bootstrap data set
    idx = np.random.choice(100, 100, replace=True)
    new_X = X[idx]
    new_y = y[idx]
    
    U, S, Vt = np.linalg.svd(new_X, full_matrices=False)
    V = Vt.transpose()
    
    S_inv = svd_inv(S)

    # Solve for [a, b] with the pseudoinverse V S^-1 U^T
    fits.append(V @ np.diag(S_inv) @ U.transpose() @ new_y)

error_sd = np.std(fits, axis=0)
print(error_sd)
[ 0.09517691  0.15238391]

(c) From fitting an ensemble of 100 independent data sets

In [80]:
fits = []

for i in range(100):
    # Generate a fresh, independent data set from the same model
    x = np.random.random(100)
    new_y = 2 + 3*x + np.random.normal(0, noise_sd, 100)

    new_X = np.ones((100, 2))
    new_X[:, 1] = x
    
    U, S, Vt = np.linalg.svd(new_X, full_matrices=False)
    V = Vt.transpose()
    
    S_inv = svd_inv(S)

    # Solve for [a, b] with the pseudoinverse V S^-1 U^T
    fits.append(V @ np.diag(S_inv) @ U.transpose() @ new_y)

error_sd = np.std(fits, axis=0)
print(error_sd)
[ 0.09004906  0.15082748]
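As a sanity check on the SVD route, NumPy's built-in least-squares solver fits the original data set directly — a minimal sketch:

In [ ]:
# Direct least-squares fit of y = a + b*x on the original design matrix X
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # should recover a ≈ 2, b ≈ 3 up to the noise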

Problem 12.2:

Generate $100$ points $x$ uniformly distributed between $0$ and $1$, and let $y = \sin(2 + 3x) + ζ$, where $ζ$ is a Gaussian random variable with a standard deviation of $0.1$. Write a Levenberg-Marquardt routine to fit $y = \sin(a+bx)$ to this data set starting from $a = b = 1$, and investigate the convergence for both fixed and adaptively adjusted $λ$ values.

In [35]:
noise_sd = .1

x = np.random.random(100)
y = np.sin(2 + 3*x) + np.random.normal(0, noise_sd, 100)
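For reference, here is a sketch of the quantities the routine below assembles, writing $\chi^2(a,b) = \sum_n \left[ y_n - \sin(a + b x_n) \right]^2$ and dropping the second-derivative terms from the Hessian as usual:

$$ \nabla \chi^2 = -2 \begin{pmatrix} \sum_n [y_n - \sin(a+bx_n)]\cos(a+bx_n) \\ \sum_n [y_n - \sin(a+bx_n)]\cos(a+bx_n)\, x_n \end{pmatrix}, \qquad H \approx 2 \begin{pmatrix} \sum_n \cos^2(a+bx_n) & \sum_n x_n \cos^2(a+bx_n) \\ \sum_n x_n \cos^2(a+bx_n) & \sum_n x_n^2 \cos^2(a+bx_n) \end{pmatrix} $$

The Levenberg-Marquardt step then solves $M \Delta = -\nabla \chi^2$ with $M_{ij} = \tfrac{1}{2} H_{ij} (1 + \lambda\, \delta_{ij})$, so large $\lambda$ pushes the update toward gradient descent and small $\lambda$ toward a Newton step.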
In [39]:
def LM(x, y, a, b, l, adapt=False):
    a = [a]
    b = [b]
    l = [l]
    error = np.sum((y-np.sin(a[-1]+b[-1]*x))**2)

    for step in range(10**3):
        # Gradient of the squared error with respect to (a, b) for the model sin(a + b*x)
        grad = -2*np.array([np.sum((y - np.sin(a[-1]+b[-1]*x))*np.cos(a[-1]+b[-1]*x)),
                              np.sum((y - np.sin(a[-1]+b[-1]*x))*np.cos(a[-1]+b[-1]*x)*x)])
        
        # Gauss-Newton approximation to the Hessian (second-derivative terms dropped)
        H = 2*np.array([[np.sum(np.cos(a[-1]+b[-1]*x)**2), np.sum(x*(np.cos(a[-1]+b[-1]*x)**2))],
                        [np.sum(x*(np.cos(a[-1]+b[-1]*x)**2)), np.sum(x*x*(np.cos(a[-1]+b[-1]*x)**2))]])

        # Levenberg-Marquardt matrix: the diagonal of H is scaled by (1 + lambda)
        M = .5*H*np.array([[1+l[-1], 1], [1, 1+l[-1]]])

        # Parameter step
        D = -1 * np.matmul(np.linalg.inv(M), grad.transpose())

        a.append(a[-1] + D[0])
        b.append(b[-1] + D[1])

        new_error = np.sum((y-np.sin(a[-1]+b[-1]*x))**2)
    
        if adapt:
            # Shrink lambda after a successful step, grow it otherwise
            # (the step itself is kept in either case)
            if new_error < error:
                l.append(l[-1]*.8)
            else:
                l.append(l[-1]*1.3)

        error = new_error

    print('Estimation: a=%f, b=%f' % (a[-1], b[-1]))
    print('Sum of squared errors = %f' % error)
    
    print('Estimated function:')
#     fig, ax = plt.subplots(ncols=1, nrows=4)
#     ax[0] = plt.scatter(x, np.sin(a[-1]+b[-1]*x))
#     ax[0].set_label('Estimated function')
    plt.scatter(x, np.sin(a[-1]+b[-1]*x))
    plt.show()
    
    print('Convergence of a:')
    plt.plot(np.arange(len(a)), a)
    plt.show()
    
    print('Convergence of b:')
    plt.plot(np.arange(len(b)), b)
    plt.show()
    
    if adapt:
        print('Lambda:')
        plt.plot(np.arange(len(l)), l)
        plt.show()

Non-Adaptive, initial a=b=1, initial $\lambda$=100:

In [47]:
LM(x, y, 1, 1, 100, adapt=False)
Estimation: a=2.034997, b=2.884819
Sum of squared errors = 0.848885
Estimated function:
Convergence of a:
Convergence of b:

Non-Adaptive, initial a=b=1, initial $\lambda$=1:

In [41]:
LM(x, y, 1, 1, 1, adapt=False)
Estimation: a=2.013658, b=2.931337
Sum of squared errors = 0.844730
Estimated function:
Convergence of a:
Convergence of b:

Adaptive, initial a=b=1, initial $\lambda$=100:

In [48]:
LM(x, y, 1, 1, 100, adapt=True)
Estimation: a=2.013658, b=2.931337
Sum of squared errors = 0.844730
Estimated function:
Convergence of a:
Convergence of b:
Lambda:

Adaptive, initial a=b=1, initial $\lambda$=1:

In [45]:
LM(x, y, 1, 1, 1, adapt=True)
Estimation: a=2.013483, b=2.930582
Sum of squared errors = 0.844745
Estimated function:
Convergence of a:
Convergence of b:
Lambda:
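As a cross-check on the hand-rolled routine, SciPy's curve_fit (which wraps MINPACK's Levenberg-Marquardt when method='lm') can fit the same model from the same starting point — a minimal sketch, assuming SciPy is available:

In [ ]:
from scipy.optimize import curve_fit

def model(x, a, b):
    # Same model as above: y = sin(a + b*x)
    return np.sin(a + b*x)

# method='lm' selects the MINPACK Levenberg-Marquardt implementation
popt, pcov = curve_fit(model, x, y, p0=[1, 1], method='lm')
print('a=%f, b=%f' % tuple(popt))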

Problem 12.3:

An alternative way to choose among models is to select the one that makes the weakest assumptions about the data; this is the purpose of maximum entropy methods. Assume that what is measured is a set of expectation values for functions $f_i$ of a random variable $x$, $\langle f_i(x) \rangle = \int_{-\infty}^\infty p(x)f_i(x)\, dx$.

(a) Given these measurements, find the compatible normalized probability distribution $p(x)$ that maximizes the differential entropy $S = −\int_{-\infty}^\infty p(x) \log{p(x)} dx$ .
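A sketch of the standard variational argument: maximize $S$ with Lagrange multipliers for normalization and for each measured expectation,

$$ \frac{\delta}{\delta p(x)} \left[ -\int p \log p \, dx + \lambda_0 \left( \int p \, dx - 1 \right) + \sum_i \lambda_i \left( \int p f_i \, dx - \langle f_i \rangle \right) \right] = -\log p(x) - 1 + \lambda_0 + \sum_i \lambda_i f_i(x) = 0 $$

which gives an exponential-family distribution

$$ p(x) = \frac{1}{Z} \exp\left( \sum_i \lambda_i f_i(x) \right), \qquad Z = \int_{-\infty}^{\infty} \exp\left( \sum_i \lambda_i f_i(x) \right) dx $$

with the multipliers $\lambda_i$ chosen so that the expectations $\int p(x) f_i(x)\, dx$ match the measured values.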

In [ ]:
 

(b) What is the maximum entropy distribution if we know only the second moment $\sigma^2 = \int_{-\infty}^\infty p(x)\, x^2\, dx$?
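With only the normalization and second-moment constraints, the same argument (a sketch) gives $p(x) = \frac{1}{Z} e^{\lambda x^2}$; normalizability forces $\lambda < 0$, and matching the second moment fixes $\lambda = -1/(2\sigma^2)$, so

$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2 / 2\sigma^2} $$

i.e. the maximum entropy distribution with a known second moment is a zero-mean Gaussian.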

In [ ]:
 

Problem 12.4:

Now consider the reverse situation. Let’s say that we know that a data set $\{x_n\}^N_{n=1}$ was drawn from a Gaussian distribution with variance $σ^2$ and unknown mean $µ$. Try to find an optimal estimator of the mean (one that is unbiased and has the smallest possible error in the estimate).

$$ P_\mu (x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

The score for the whole data set is

$$ V = \frac{\partial}{\partial \mu} \log \prod_{n=1}^{N} P_\mu (x_n) = \sum_{n=1}^{N} \frac{1}{P_\mu (x_n)} \frac{\partial P_\mu (x_n)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{n=1}^{N} (x_n-\mu) $$

and, since the samples are independent, the Fisher information is

$$ J(\mu) = \left \langle \left [ \frac{1}{\sigma^2} \sum_{n=1}^{N} (x_n-\mu) \right ]^2 \right \rangle = \frac{1}{\sigma^4} \sum_{n=1}^{N} \left \langle (x_n-\mu)^2 \right \rangle = \frac{N}{\sigma^2} $$

An optimal unbiased estimator saturates the Cramér-Rao bound: $$ \sigma^2(f) \geq \frac{1}{J(\mu)} $$

Therefore the smallest achievable variance is $$ \sigma^2(f) = \frac{\sigma^2}{N} $$

And by the definition of unbiased estimator, $$ \langle f \rangle = \mu $$

Now, let's look at the estimator $$ f(X) = \frac{1}{N} \sum_{n=1}^{N} x_n = \bar{x} $$

$$ \langle f \rangle = \left \langle \frac{1}{N} \sum_{n=1}^{N} x_n \right \rangle = \frac{1}{N} \sum_{n=1}^{N} \langle x_n \rangle = \mu $$

$$ \sigma^2(f) = \left \langle \left [ \frac{1}{N} \sum_{n=1}^{N} x_n - \mu \right ]^2 \right \rangle = \left \langle \left [ \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu) \right ]^2 \right \rangle = \frac{1}{N^2} \sum_{n=1}^{N} \left \langle (x_n - \mu)^2 \right \rangle = \frac{\sigma^2}{N} $$

(the cross terms vanish because the $x_n$ are independent)

Therefore $f = \bar{x}$ is unbiased and attains the Cramér-Rao bound, so it meets the requirements for an optimal estimator!
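A quick numerical check of this result (a minimal sketch with assumed values $\mu = 2$, $\sigma = 0.5$, $N = 100$): the variance of the sample mean over many synthetic data sets should approach $\sigma^2/N$.

In [ ]:
mu, sigma, N = 2.0, 0.5, 100

# Variance of the sample mean across many independent data sets
means = [np.mean(np.random.normal(mu, sigma, N)) for _ in range(10000)]
print(np.var(means), sigma**2/N)  # both should be close to 0.0025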