# Week 7 - Function Fitting

## Problem 12.1:

### Generate $100$ points $x$ uniformly distributed between $0$ and $1$, and let $y = 2+3x+ζ$, where $ζ$ is a Gaussian random variable with a standard deviation of $0.5$. Use an SVD to fit $y = a + bx$ to this data set, finding $a$ and $b$. Evaluate the errors in $a$ and $b$.

#### (a) With equation (12.34)

In [4]:
import numpy as np
import matplotlib.pyplot as plt

In [41]:
noise_sd = .5

x = np.random.random(100)
y = 2 + 3*x + np.random.normal(0, noise_sd, 100)

# Design matrix: a column of ones (intercept) and a column of x.
X = np.ones((100, 2))
X[:, 1] = x

In [43]:
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.transpose()

# Eq. (12.34) gives the parameter variances as sigma^2 * sum_i V_ji^2 / w_i^2;
# the error bars are their square roots.
error_sd = noise_sd*np.sqrt(np.sum((V*V)/(S*S), axis=1))
print(error_sd)

[ 0.10122125  0.17556426]
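
As a cross-check (an aside, not part of the assignment), the same error bars should follow from the diagonal of the classical least-squares covariance matrix $\sigma^2 (X^T X)^{-1}$:

In [ ]:
# Sanity check: the least-squares covariance matrix sigma^2 (X^T X)^{-1}
# should reproduce the SVD-based error bars above.
cov = noise_sd**2 * np.linalg.inv(np.matmul(X.transpose(), X))
print(np.sqrt(np.diag(cov)))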


#### (b) By bootstrap sampling to generate 100 data sets

In [53]:
def svd_inv(S):
    # Invert the singular values, zeroing any that blow up
    # (the pseudo-inverse convention).
    S = 1/S
    for i in range(len(S)):
        if S[i] == np.inf:
            S[i] = 0

    return S
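
An aside (an assumed equivalence, not from the text): `np.linalg.pinv` applies the same convention, inverting the singular values and zeroing the negligible ones, so the fit can be checked in one call:

In [ ]:
# Pseudo-inverse solution V S^+ U^T y in one call; should land
# near the true parameters (a, b) = (2, 3).
print(np.matmul(np.linalg.pinv(X), y))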

In [71]:
a = []

for i in range(100):
    # Resample the data set with replacement.
    idx = np.random.choice(100, 100, replace=True)
    new_X = X[idx]
    new_y = y[idx]

    U, S, Vt = np.linalg.svd(new_X, full_matrices=False)
    V = Vt.transpose()

    S_inv = svd_inv(S)

    # Pseudo-inverse solution: theta = V S^-1 U^T y
    a.append(np.matmul(np.matmul(np.matmul(V, np.diag(S_inv)),
                                 U.transpose()), new_y))

error_sd = np.std(a, axis=0)
print(error_sd)

[ 0.09517691  0.15238391]


#### (c) From fitting an ensemble of 100 independent data sets

In [80]:
a = []

for i in range(100):
    # Draw a fresh, independent data set each time.
    x = np.random.random(100)
    new_y = 2 + 3*x + np.random.normal(0, noise_sd, 100)

    new_X = np.ones((100, 2))
    new_X[:, 1] = x

    U, S, Vt = np.linalg.svd(new_X, full_matrices=False)
    V = Vt.transpose()

    S_inv = svd_inv(S)

    a.append(np.matmul(np.matmul(np.matmul(V, np.diag(S_inv)),
                                 U.transpose()), new_y))

error_sd = np.std(a, axis=0)
print(error_sd)

[ 0.09004906  0.15082748]
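
The three approaches give comparable error estimates, $\sigma(a) \approx 0.1$ and $\sigma(b) \approx 0.15$–$0.18$, so the analytic formula, the bootstrap, and the independent ensemble are consistent with one another.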


## Problem 12.2:

### Generate $100$ points $x$ uniformly distributed between $0$ and $1$, and let $y = \sin(2 + 3x) + ζ$, where $ζ$ is a Gaussian random variable with a standard deviation of $0.1$. Write a Levenberg-Marquardt routine to fit $y = \sin(a+bx)$ to this data set starting from $a = b = 1$, and investigate the convergence for both fixed and adaptively adjusted $λ$ values.

In [35]:
noise_sd = .1

x = np.random.random(100)
y = np.sin(2 + 3*x) + np.random.normal(0, noise_sd, 100)

In [39]:
def LM(x, y, a, b, l, adapt=False):
    # Keep the full parameter and lambda histories so convergence can be plotted.
    a = [a]
    b = [b]
    l = [l]
    error = np.sum((y - np.sin(a[-1] + b[-1]*x))**2)

    for step in range(10**3):
        # Gradient of the squared error with respect to (a, b).
        grad = -2*np.array([np.sum((y - np.sin(a[-1]+b[-1]*x))*np.cos(a[-1]+b[-1]*x)),
                            np.sum((y - np.sin(a[-1]+b[-1]*x))*np.cos(a[-1]+b[-1]*x)*x)])

        # Gauss-Newton approximation to the Hessian.
        H = 2*np.array([[np.sum(np.cos(a[-1]+b[-1]*x)**2), np.sum(x*(np.cos(a[-1]+b[-1]*x)**2))],
                        [np.sum(x*(np.cos(a[-1]+b[-1]*x)**2)), np.sum(x*x*(np.cos(a[-1]+b[-1]*x)**2))]])

        # Levenberg-Marquardt matrix: H/2 with its diagonal scaled by (1 + lambda).
        M = .5*H*np.array([[1+l[-1], 1], [1, 1+l[-1]]])

        D = -np.matmul(np.linalg.inv(M), grad)

        a.append(a[-1] + D[0])
        b.append(b[-1] + D[1])

        new_error = np.sum((y - np.sin(a[-1]+b[-1]*x))**2)

        if adapt:
            # Shrink lambda after an improvement, grow it after a failure.
            if new_error < error:
                l.append(l[-1]*.8)
            else:
                l.append(l[-1]*1.3)

        error = new_error

    print('Estimation: a=%f, b=%f' % (a[-1], b[-1]))
    print('Sum of squared errors = %f' % error)

    print('Estimated function:')
    plt.scatter(x, np.sin(a[-1] + b[-1]*x))
    plt.show()

    print('Convergence of a:')
    plt.plot(np.arange(len(a)), a)
    plt.show()

    print('Convergence of b:')
    plt.plot(np.arange(len(b)), b)
    plt.show()

    if adapt:
        print('Lambda:')
        plt.plot(np.arange(len(l)), l)
        plt.show()
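
For reference, the step computed inside the loop is the Levenberg-Marquardt update (notation mine, matching the code above): $$\begin{pmatrix} \Delta a \\ \Delta b \end{pmatrix} = -M^{-1} \nabla E, \qquad M_{ij} = \frac{1}{2} H_{ij} \left( 1 + \lambda \delta_{ij} \right),$$ where $E = \sum_n \left[ y_n - \sin(a + b x_n) \right]^2$ and $H$ is the Gauss-Newton approximation to its Hessian. A large $\lambda$ gives a cautious gradient-descent step, while $\lambda \to 0$ recovers Newton's method.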


#### Non-adaptive, initial a=b=1, fixed $\lambda$=100:

In [47]:
LM(x, y, 1, 1, 100, adapt=False)

Estimation: a=2.034997, b=2.884819
Sum of squared errors = 0.848885
Estimated function:

Convergence of a:

Convergence of b:


#### Non-adaptive, initial a=b=1, fixed $\lambda$=1:

In [41]:
LM(x, y, 1, 1, 1, adapt=False)

Estimation: a=2.013658, b=2.931337
Sum of squared errors = 0.844730
Estimated function:

Convergence of a:

Convergence of b:


#### Adaptive, initial a=b=1, initial $\lambda$=100:

In [48]:
LM(x, y, 1, 1, 100, adapt=True)

Estimation: a=2.013658, b=2.931337
Sum of squared errors = 0.844730
Estimated function:

Convergence of a:

Convergence of b:

Lambda:


#### Adaptive, initial a=b=1, initial $\lambda$=1:

In [45]:
LM(x, y, 1, 1, 1, adapt=True)

Estimation: a=2.013483, b=2.930582
Sum of squared errors = 0.844745
Estimated function:

Convergence of a:

Convergence of b:

Lambda:


## Problem 12.3:

### An alternative way to choose among models is to select the one that makes the weakest assumptions about the data; this is the purpose of maximum entropy methods. Assume that what is measured is a set of expectation values for functions $f_i$ of a random variable $x$, $\langle f_i(x) \rangle = \int_{-\infty}^\infty p(x)f_i(x)\, dx$.

#### (a) Given these measurements, find the compatible normalized probability distribution $p(x)$ that maximizes the differential entropy $S = −\int_{-\infty}^\infty p(x) \log{p(x)}\, dx$.

In [ ]:
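
A sketch of the standard Lagrange-multiplier argument: maximize $S$ subject to normalization and the measured expectation values by introducing multipliers $\lambda_0, \lambda_i$ and setting the functional derivative to zero:

$$\frac{\delta}{\delta p(x)} \left[ -\int p \log p \, dx + \lambda_0 \left( \int p \, dx - 1 \right) + \sum_i \lambda_i \left( \int p f_i \, dx - \langle f_i \rangle \right) \right] = -\log{p(x)} - 1 + \lambda_0 + \sum_i \lambda_i f_i(x) = 0$$

$$\Rightarrow \quad p(x) = \frac{1}{Z} e^{\sum_i \lambda_i f_i(x)}, \qquad Z = e^{1-\lambda_0} = \int_{-\infty}^\infty e^{\sum_i \lambda_i f_i(x)} \, dx,$$

an exponential-family distribution, with the $\lambda_i$ fixed by requiring the constraints $\langle f_i \rangle$ to hold.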



#### (b) What is the maximum entropy distribution if we know only the second moment $\sigma^2 = \int_{-\infty}^\infty p(x)\, x^2\, dx$?

In [ ]:
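
A sketch, applying the general form from (a) with the single constraint $f(x) = x^2$, $\langle x^2 \rangle = \sigma^2$: then $p(x) \propto e^{\lambda x^2}$, which is normalizable only for $\lambda < 0$, and matching the second moment gives $\lambda = -1/(2\sigma^2)$:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/2\sigma^2},$$

a zero-mean Gaussian. Knowing only the variance, the least presumptive distribution is the Gaussian.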



## Problem 12.4:

### Now consider the reverse situation. Let’s say that we know that a data set $\{x_n\}^N_{n=1}$ was drawn from a Gaussian distribution with variance $σ^2$ and unknown mean $µ$. Try to find an optimal estimator of the mean (one that is unbiased and has the smallest possible error in the estimate).

$$P_\mu (x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

$$V = \frac{\partial}{\partial \mu} \log{P_\mu (x)} = \frac{1}{P_\mu (x)} \frac{\partial P_\mu (x)}{\partial \mu} = \frac{1}{\sigma^2} (x-\mu)$$

$$J(\mu) = \left \langle \left [ \frac{1}{\sigma^2} (x-\mu) \right ]^2 \right \rangle = \frac{1}{\sigma^4} \left \langle (x-\mu)^2 \right \rangle = \frac{1}{\sigma^2}$$

Since the $N$ samples are independent, the Fisher information of the whole data set adds up to $N J(\mu) = N/\sigma^2$. An optimal unbiased estimator saturates the Cramér-Rao bound: $$\sigma^2(f) = \frac{1}{N J(\mu)}$$

Therefore, $$\sigma^2(f) = \frac{\sigma^2}{N}$$

And by the definition of unbiased estimator, $$\langle f \rangle = \mu$$

Now, let's look at the estimator $$f(X) = \frac{1}{N} \sum_{n=1}^{N} x_n = \bar{x}$$

$$\langle f \rangle = \left \langle \frac{1}{N} \sum_{n=1}^{N} x_n \right \rangle = \frac{1}{N} \sum_{n=1}^{N} \langle x_n \rangle = \mu$$

$$\sigma^2(f) = \left \langle \left [ \frac{1}{N} \sum_{n=1}^{N} x_n - \mu \right ]^2 \right \rangle = \left \langle \left [ \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu) \right ]^2 \right \rangle = \frac{1}{N^2} \sum_{n=1}^{N} \left \langle (x_n - \mu)^2 \right \rangle = \frac{\sigma^2}{N},$$

where the cross terms vanish because the samples are independent.

Therefore, $f = \bar{x}$ is unbiased and saturates the Cramér-Rao bound: the sample mean is an optimal estimator of the mean!
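
A quick numerical check of this result (an addition, in the spirit of the earlier problems):

In [ ]:
# Simulate many size-N Gaussian data sets and compare the spread of the
# sample mean with the Cramer-Rao bound sigma^2 / N.
N, sigma, mu = 100, .5, 2.
means = np.random.normal(mu, sigma, (10**4, N)).mean(axis=1)
print(means.var(), sigma**2/N)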