NMM week 8: Function Fitting¶

Question 1¶

Generate 100 points x uniformly distributed between 0 and 1, and let y = 2+3x+ζ, where ζ is a Gaussian random variable with a standard deviation of 0.5. Use an SVD to fit y = a + bx to this data set, finding a and b. Evaluate the errors in a and b (a) With equation (12.34), (b) By bootstrap sampling to generate 100 data sets, (c) From fitting an ensemble of 100 independent data sets.¶

(a) With equation (12.34)¶

The equation is:

$$ \frac{\sigma^2_{\eta,i}}{\sigma^2_\zeta} = \sum_j{\frac{V^2_{ij}}{w^2_j}} $$

$\eta$ is number of samples

Approach

  • Recreate matrix (12.18). --> concatenate
  • First two columns of the matrix
  • Trying to reproduce a and b from y=a+bx, a~2 b~3.
In [82]:
import numpy as np

N = 100 
x = np.random.uniform(0, 1, N)
σ = 0.5 # Standard deviation
ζ = np.random.normal(0, σ, N) # Gaussian random variable with a standard deviation of 0.5.
y = 2 + 3 * x + ζ 

print(y)
[3.83841573 2.37934419 4.64126511 2.38579234 3.18519615 5.1696597
 3.83204179 2.83691669 3.45737728 3.30458597 4.15034004 4.32694262
 3.12813568 3.40180422 5.10324228 2.73471991 4.15302545 3.96879935
 2.13793946 4.21953912 4.71864388 2.74989445 4.65145764 3.32987668
 3.58648756 3.57106525 5.24812975 4.54063726 2.34132565 2.90598766
 2.8917754  2.73334846 5.32056485 4.21352491 2.66590592 3.6695892
 3.9231272  4.68720366 4.22082257 3.0760049  4.06808217 5.0279926
 1.42526115 3.34682996 3.97973206 3.52066468 2.2452014  3.5902408
 3.89653017 2.4185273  4.09930324 2.916523   4.32150642 3.68584578
 2.65654489 4.85304576 2.86077689 4.59538269 1.37524084 4.17659751
 4.41362427 2.86854573 2.62213989 2.15033856 5.32324482 3.81417235
 3.13008336 2.77314183 4.09951682 3.34544619 5.52118401 2.7541611
 4.5504779  4.22965538 3.42319438 5.36750224 3.91001752 3.96522175
 2.99839066 5.01517731 3.15000966 4.98179781 5.3446218  3.3154342
 3.14356095 3.47310786 1.9938163  5.09279428 4.48077119 4.67747833
 2.08394667 3.81591355 4.56303597 3.57195345 4.18132896 4.77283027
 3.6721198  4.31432804 4.72268064 4.78742079]

np.stack is for joining a sequence of arrays along a new axis (Matplotlib).

In [107]:
import numpy as np
import matplotlib.pyplot as plt

# Constants and given values
N = 100
x = np.random.uniform(0, 1, N)
y = 2 + 3 * x + ζ 
σ = 0.5
ζ = np.random.normal(0, σ, N) # Gaussian random variable with a standard deviation of 0.5.

# Create dataset
def dataset(N, σ, x, y, ζ):
    return x, y

x, y = dataset(N, σ, x, y, ζ)

plt.plot(x, y, 'o', color="pink")
plt.title('Dataset of x = 0 to 1 and y=2+3x+ζ')
plt.show()

# SVD fitting to y = a + bx
def svd_fitting(x, y):
    matrix = np.stack([np.ones(N), x], axis=-1)
    print("A", matrix.shape)

    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    print("U", u.shape)
    print("W", s.shape)
    print("V^T", vh.shape)

    mat_inv = (vh.T * (1 / s)).dot(u.T)
    print("V * W^{-1} * U^T", mat_inv.shape)

    solution = mat_inv.dot(y)
    return solution

solution = svd_fitting(x, y)
print("Solution: ", solution)
A (100, 2)
U (100, 2)
W (2,)
V^T (2, 2)
V * W^{-1} * U^T (2, 100)
Solution:  [1.83369504 3.11963543]

Next, adding the error estimation.

In [86]:
import numpy as np
import matplotlib.pyplot as plt

# Constants and given values
N = 100
x = np.random.uniform(0, 1, N)
y = 2 + 3 * x + ζ 
σ = 0.5
ζ = np.random.normal(0, σ, N) # Gaussian random variable with a standard deviation of 0.5.

# Create dataset
def dataset(N, σ, x, y, ζ):
    return x, y

x, y = dataset(N, σ, x, y, ζ)

plt.plot(x, y, 'o', color="pink")
plt.title('Dataset of x is 0 to 1 and y=2+3x+ζ')
plt.show()

# SVD fitting to y = a + bx
def svd_fitting(x, y):
    matrix = np.stack([np.ones(N), x], axis=-1)
    print("A", matrix.shape)

    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    print("U", u.shape)
    print("W", s.shape)
    print("V^T", vh.shape)

    mat_inv = (vh.T * (1 / s)).dot(u.T)
    print("V * W^{-1} * U^T", mat_inv.shape)

    solution = mat_inv.dot(y)
    return solution

solution = svd_fitting(x, y)
print("Solution: ", solution)


# Error estimation
errs = []
for i in range(2):
    err = 0
    for j in range(2):
        err += (vh[j, i] / s[j]) ** 2
    errs.append(err)
    
print("a error: %.4f" % np.sqrt(errs[0] * (σ ** 2)))
print("b error: %.4f" % np.sqrt(errs[1] * (σ ** 2)))
A (100, 2)
U (100, 2)
W (2,)
V^T (2, 2)
V * W^{-1} * U^T (2, 100)
Solution:  [2.02601046 2.92431077]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[86], line 46
     44     err = 0
     45     for j in range(2):
---> 46         err += (vh[j, i] / s[j]) ** 2
     47     errs.append(err)
     49 print("a error: %.4f" % np.sqrt(errs[0] * (σ ** 2)))

NameError: name 'vh' is not defined