
Function Architectures

Week 10

Link to completed assignment

Problem 14.1

To find the first five diagonal Pade approximants [1/1], ..., [5/5] to the function $e^x$, we start by taking the Taylor series expansion of the function: $$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$ As in Eq. (14.3), we compare that Taylor expansion to the general form $$f(x) = \sum_{l=0}^{\infty} c_l x^l$$ which fits directly into the Pade approximant formula of Eq. (14.4): $$\frac{\sum_{n=0}^N a_n x^n}{1 + \sum_{m=1}^M b_m x^m} = \sum_{l=0}^L c_l x^l$$ Equating powers of x gives the equations for the a and b coefficients. Eq. (14.6) gives the a's: $$a_n = c_n + \sum_{m=1}^N b_m c_{n-m} \quad (n=0,\ldots,N)$$ Remembering that for $e^x$ $$c_n = \frac{1}{n!}$$ tells us how to index the coefficient arrays and matrices in the code. Eq. (14.7) gives the b's: $$0 = c_l + \sum_{m=1}^M b_m c_{l-m} \quad (l=N+1,\ldots,N+M)$$ Solving this as a linear system extracts the b coefficients. Once all that is taken care of, we evaluate the left-hand side of Eq. (14.4) at x = 1 to get the value of the Pade approximation, and check how close it comes to the true value $e^1 = 2.718281828459045$.
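
To make the indexing concrete, here is the [1/1] case worked out by hand as a sanity check (this worked example is my own, not part of the assignment text): with $c_0 = 1$, $c_1 = 1$, $c_2 = 1/2$, Eq. (14.7) with $l = 2$ gives $0 = c_2 + b_1 c_1$, so $b_1 = -1/2$, and Eq. (14.6) then gives $a_0 = c_0 = 1$ and $a_1 = c_1 + b_1 c_0 = 1/2$. Putting these together, $$[1/1] = \frac{1 + x/2}{1 - x/2}, \qquad \left.\frac{1 + x/2}{1 - x/2}\right|_{x=1} = \frac{3/2}{1/2} = 3,$$ which matches the first value printed by the code below.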


      import math
      import numpy as np

      def pade_approx(N):
          # evaluate the diagonal [N/N] Pade approximant of e^x at x = 1,
          # where the numerator reduces to the sum of the a's and the
          # denominator to 1 plus the sum of the b's
          b = solve_b_coef(N)
          a = solve_a_coef(N, b)
          num = a.dot(np.ones(N+1))
          den = 1 + b.dot(np.ones(N))
          val = num / den
          return val

      def solve_b_coef(N):
          # Eq. (14.7): 0 = c_l + sum_m b_m c_{l-m} for l = N+1, ..., 2N,
          # rearranged into the linear system C_lm b = C_l with c_n = 1/n!
          C_lm = np.zeros((N, N))
          C_l = np.zeros(N)
          for i in range(N):
              C_l[i] = -1/math.factorial(N+1+i)
              for j in range(N):
                  C_lm[i][j] = 1/math.factorial(N+1+i-(1+j))
          B = np.linalg.pinv(C_lm).dot(C_l)
          return B

      def solve_a_coef(N, B):
          # Eq. (14.6): a_n = c_n + sum_{m=1}^{n} b_m c_{n-m}
          A = np.zeros(N+1)
          for i in range(N+1):
              A[i] += 1/math.factorial(i)
              for j in range(1, i+1):
                  A[i] += B[j-1]/math.factorial(i-j)
          return A

      e = 2.718281828459045
      start = 1
      stop = 5
      print()

      for N in range(start, stop+1):
          val = pade_approx(N)
          error = abs(val - e)

          print("Pade Approximant", N)
          print("Value:    ", val)
          print("Actual:   ", e)
          print("|Error|:  ", error)
          print()
    

The code above gives us the following results:


      Pade Approximant 1
      Value:     3.0
      Actual:    2.718281828459045
      |Error|:   0.2817181715409549

      Pade Approximant 2
      Value:     2.7142857142857144
      Actual:    2.718281828459045
      |Error|:   0.003996114173330678

      Pade Approximant 3
      Value:     2.71830985915493
      Actual:    2.718281828459045
      |Error|:   2.8030695884861956e-05

      Pade Approximant 4
      Value:     2.718281718281719
      Actual:    2.718281828459045
      |Error|:   1.1017732592932816e-07

      Pade Approximant 5
      Value:     2.7182818287356953
      Actual:    2.718281828459045
      |Error|:   2.7665025825740486e-10
    

We can see that the error improves rapidly with the order of the Pade approximant: across the first five diagonal approximants, each step gains roughly two to three orders of magnitude in accuracy.

Problem 14.2

We will be working with the following data set: $$x = \{-10,\ -9,\ \ldots,\ 9,\ 10\}$$ $$y = \begin{cases} 0, & x \le 0 \\ 1, & x > 0 \end{cases}$$


      import matplotlib
      matplotlib.use('Agg')
      import matplotlib.pyplot as plt
      import numpy as np

      def data(x_min, x_max):
          x = np.arange(x_min, x_max + 1)
          n = len(x)
          y = np.zeros(n)
          for i in range(n):
              if (x[i] > 0):
                  y[i] = 1
          return x, y, n

      def plot_data(x, y):
          plt.clf()
          plt.plot(x, y, label='Data')
          plt.xlim(x[0] - 1, x[-1] + 1)
          plt.ylim(y[0] - 0.2, y[-1] + 0.2)

      x, y, n = data(-10, 10)
      plot_data(x, y)
    

(a) First, we will fit the data with a polynomial with 5, 10, and 15 terms, using a pseudo-inverse of the Vandermonde matrix.


      def vandermonde(n, m, x, y):
          A = np.zeros((n, m))
          for i in range(n):
              for j in range(m):
                  A[i][j] = x[i]**j
          return np.linalg.pinv(A).dot(y)

      def vandermonde_fit(n, m, x, coef):
          y = np.zeros(n)
          for i in range(n):
              for j in range(m):
                  y[i] += coef[j]*x[i]**j
          return y

      terms = [5, 10, 15]

      for m in terms:
          coef = vandermonde(n, m, x, y)
          y_vand = vandermonde_fit(n, m, x, coef)
          plt.plot(x, y_vand, label=str(m) + ' Terms')

      plt.title("Psuedo-Inverse Vandermonde Fit")
      plt.legend()
      plt.savefig('./plots/function_architecture/vandermonde.png')
    
Data fit to pseudo-inverse of Vandermonde matrix

(b) Then, we will fit the data with 5, 10, and 15 $r^3$ RBFs, with centers drawn uniformly at random between x = -10 and x = 10.


      def centers(m):
          # draw m RBF centers uniformly at random in [-10, 10)
          c = np.zeros(m)
          for i in range(m):
              c[i] = (20)*(np.random.rand()-0.5)
          return c

      def rbf(n, m, x, y, c):
          A = np.zeros((n, m))
          for i in range(n):
              for j in range(m):
                  A[i][j] = np.abs(x[i]-c[j])**3
          return np.linalg.pinv(A).dot(y)

      def rbf_fit(n, m, x, c, coef):
          y = np.zeros(n)
          for i in range(n):
              for j in range(m):
                  y[i] += coef[j]*np.abs(x[i]-c[j])**3
          return y

      plot_data(x, y)

      for m in terms:
          c = centers(m)
          coef = rbf(n, m, x, y, c)
          y_rbf = rbf_fit(n, m, x, c, coef)
          plt.plot(x, y_rbf, label=str(m) + ' Terms')

      plt.title("RBF Fit")
      plt.legend()
      plt.savefig('./plots/function_architecture/rbf.png')
    
Data fit to $r^3$ RBFs

(c) Next, using the coefficients found for these six fits, we will evaluate the total out-of-sample error at $$x = \{-10.5,\ -9.5,\ \ldots,\ 9.5,\ 10.5\}.$$


      x_oos, y_oos, n_oos = data(-10.5, 10.5)
      plot_data(x_oos, y_oos)

      # evaluate the coefficients fit to the original data at the
      # out-of-sample points and total the squared error
      for m in terms:
          coef = vandermonde(n, m, x, y)
          y_vand_oos = vandermonde_fit(n_oos, m, x_oos, coef)
          error = np.sum((y_oos - y_vand_oos)**2)
          print("Vandermonde,", m, "terms, out-of-sample error:", error)
          plt.plot(x_oos, y_vand_oos, label=str(m) + ' Terms')

      plt.title("Out-of-Sample Pseudo-Inverse Vandermonde Fit")
      plt.legend()
      plt.savefig('./plots/function_architecture/vandermonde_oos.png')

      plot_data(x_oos, y_oos)

      # new random centers are drawn here, so this refits on the original
      # data before evaluating out of sample
      for m in terms:
          c = centers(m)
          coef = rbf(n, m, x, y, c)
          y_rbf_oos = rbf_fit(n_oos, m, x_oos, c, coef)
          error = np.sum((y_oos - y_rbf_oos)**2)
          print("RBF,", m, "terms, out-of-sample error:", error)
          plt.plot(x_oos, y_rbf_oos, label=str(m) + ' Terms')

      plt.title("Out-of-Sample RBF Fit")
      plt.legend()
      plt.savefig('./plots/function_architecture/rbf_oos.png')
    
Out-of-sample data fit to pseudo-inverse of Vandermonde matrix
Out-of-sample data fit to $r^3$ RBFs

(d) Finally, using a 10th-order polynomial, we will fit the data with the curvature regularizer in equation (14.53): $$I = \sum_{n=1}^N \left[ y_n - f(x_n, a) \right]^2 + \lambda \int \left[ \frac{d^2 f}{dx^2} \right]^2 dx$$ We will then plot the fits for λ = 0, 0.01, 0.1, 1.
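
A minimal sketch of how this regularized fit could be set up, assuming the curvature integral is taken over the data range [-10, 10]: for a polynomial $f(x) = \sum_j a_j x^j$ the penalty is quadratic in the coefficients, $\int (f'')^2\, dx = a^T R a$, so minimizing Eq. (14.53) gives $a = (A^T A + \lambda R)^{-1} A^T y$ with the same Vandermonde design matrix A as in part (a). The function name curvature_fit, the plot title, and the output filename below are placeholders of my own, and the snippet reuses the x, y, plot_data, and plt defined above.


      def curvature_fit(x, y, order, lam, x_lo=-10.0, x_hi=10.0):
          # regularized least squares for f(x) = sum_j a_j x^j with penalty
          # lam * int (f'')^2 dx over [x_lo, x_hi]
          m = order + 1
          A = np.vander(x, m, increasing=True)   # design matrix, A[i][j] = x_i^j
          R = np.zeros((m, m))                   # int (f'')^2 dx = a^T R a
          for j in range(2, m):
              for k in range(2, m):
                  p = j + k - 3                  # power left after integrating x^(j+k-4)
                  R[j, k] = j*(j-1)*k*(k-1)*(x_hi**p - x_lo**p)/p
          # regularized normal equations: a = (A^T A + lam R)^+ A^T y
          return np.linalg.pinv(A.T.dot(A) + lam*R).dot(A.T).dot(y)

      plot_data(x, y)
      for lam in [0, 0.01, 0.1, 1]:
          a = curvature_fit(x, y, 10, lam)
          plt.plot(x, np.vander(x, 11, increasing=True).dot(a), label='lambda = ' + str(lam))
      plt.title("Curvature-Regularized 10th-Order Polynomial Fit")
      plt.legend()
      plt.savefig('./plots/function_architecture/regularized.png')


With λ = 0 this reduces to the unregularized pseudo-inverse fit of part (a); increasing λ trades in-sample accuracy for smoothness by penalizing curvature.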

Problem 14.3

We will train a neural network on the output from an order 4 maximal LFSR (see Problem 6.3) and learn to reproduce it. We will then analyze how the results depend on the network depth and architecture. First, the code to produce our data:


      import numpy as np

      def generate_data(n, x):
          # generate n (register state, next bit) pairs from the 4-bit LFSR:
          # the new bit is the XOR of the first and last bits of the register,
          # which is then shifted in at the front
          input = []
          output = []
          for i in range(n):
              input.append(x)
              x_n = x[0] ^ x[3]
              output.append(x_n)
              x = [x_n, x[0], x[1], x[2]]
          return input, output

      seed = [1, 1, 1, 1]
      x_data, y_data = generate_data(16, seed)

      for i in range(len(x_data)):
          print("Input:", x_data[i], "Output:", y_data[i])
    

This gives us the following data. Note that the maximal-length sequence has period 15, so the data set repeats from there; with 16 iterations of generate_data(n, x) the last row is the same as the first.


      Input: [1, 1, 1, 1] Output: 0
      Input: [0, 1, 1, 1] Output: 1
      Input: [1, 0, 1, 1] Output: 0
      Input: [0, 1, 0, 1] Output: 1
      Input: [1, 0, 1, 0] Output: 1
      Input: [1, 1, 0, 1] Output: 0
      Input: [0, 1, 1, 0] Output: 0
      Input: [0, 0, 1, 1] Output: 1
      Input: [1, 0, 0, 1] Output: 0
      Input: [0, 1, 0, 0] Output: 0
      Input: [0, 0, 1, 0] Output: 0
      Input: [0, 0, 0, 1] Output: 1
      Input: [1, 0, 0, 0] Output: 1
      Input: [1, 1, 0, 0] Output: 1
      Input: [1, 1, 1, 0] Output: 1
      Input: [1, 1, 1, 1] Output: 0
    

The following code is my attempt to use TensorFlow to write my own neural network, trained on the data I generated (I would generate many more sets for the actual training data), so that it learns to predict and reproduce the output bit from the input state. Unfortunately, I wasn't able to completely figure out how to get the code to run. I set up this neural network by following this tutorial: https://adventuresinmachinelearning.com/python-tensorflow-tutorial/


      import tensorflow.compat.v1 as tf
      import tensorflow_datasets as tfds
      tf.disable_v2_behavior()

      # data loading carried over from the MNIST tutorial; this is part of what
      # keeps the code from running, since the MNIST images are 784-dimensional
      # while the placeholders below expect the 4-bit LFSR states (and
      # tensorflow.examples.tutorials has been removed from recent releases)
      dataset = tfds.load(name="mnist", split=tfds.Split.TRAIN)

      from tensorflow.examples.tutorials.mnist import input_data
      mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

      learning_rate = 0.5
      epochs = 10
      batch_size = 100

      x = tf.placeholder(tf.float32, [None, 4])
      y = tf.placeholder(tf.float32, [None, 2])

      W1 = tf.Variable(tf.random_normal([4, 3], stddev=0.03), name='W1')
      b1 = tf.Variable(tf.random_normal([3]), name='b1')
      W2 = tf.Variable(tf.random_normal([3, 2], stddev=0.03), name='W2')
      b2 = tf.Variable(tf.random_normal([2]), name='b2')

      hidden_out = tf.add(tf.matmul(x, W1), b1)
      hidden_out = tf.nn.relu(hidden_out)

      y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))

      y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
      cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped) + (1 - y) * tf.log(1 - y_clipped), axis=1))

      optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

      init_op = tf.global_variables_initializer()

      correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
      accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

      with tf.Session() as sess:
         sess.run(init_op)
         total_batch = int(len(mnist.train.labels) / batch_size)
         for epoch in range(epochs):
              avg_cost = 0
              for i in range(total_batch):
                  batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
                  _, c = sess.run([optimiser, cross_entropy], feed_dict={x: batch_x, y: batch_y})
                  avg_cost += c / total_batch
              print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))
         print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
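
For completeness, here is a minimal sketch of how the same kind of network could instead be fed the LFSR data generated above, assuming x_data and y_data from generate_data() are still in scope. The single sigmoid output unit (in place of the two-class softmax above), the hidden-layer size of 8, the epoch count, and names such as x_train and y_train are choices of my own for illustration, not part of the tutorial.


      import numpy as np
      import tensorflow.compat.v1 as tf
      tf.disable_v2_behavior()

      # the 16 (register state, next bit) pairs from generate_data() above
      x_train = np.array(x_data, dtype=np.float32)
      y_train = np.array(y_data, dtype=np.float32).reshape(-1, 1)

      x = tf.placeholder(tf.float32, [None, 4])
      y = tf.placeholder(tf.float32, [None, 1])

      # one hidden layer; the XOR of two register bits is representable here
      W1 = tf.Variable(tf.random_normal([4, 8], stddev=0.5), name='W1_lfsr')
      b1 = tf.Variable(tf.zeros([8]), name='b1_lfsr')
      W2 = tf.Variable(tf.random_normal([8, 1], stddev=0.5), name='W2_lfsr')
      b2 = tf.Variable(tf.zeros([1]), name='b2_lfsr')

      hidden = tf.nn.tanh(tf.matmul(x, W1) + b1)
      logits = tf.matmul(hidden, W2) + b2

      loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
      train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

      predicted_bit = tf.cast(tf.nn.sigmoid(logits) > 0.5, tf.float32)
      accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted_bit, y), tf.float32))

      with tf.Session() as sess:
          sess.run(tf.global_variables_initializer())
          for epoch in range(2000):
              _, c = sess.run([train_step, loss], feed_dict={x: x_train, y: y_train})
          print("final loss:", c)
          print("training accuracy:", sess.run(accuracy, feed_dict={x: x_train, y: y_train}))


Deeper or wider variants of this network (more hidden units or extra layers) would then give the comparison of depth and architecture asked for in the problem.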