The results below report the accuracy of the network. They show that increasing the number of neurons improves accuracy, but it also requires more computation and more epochs to reach a given accuracy; in return, a larger network can give better results on complex systems. Increasing the number of layers changes the hypothesis, and the output of the hypothesis is what the cost function evaluates. Sigmoid is used as the non-linearity because it works well in networks with few layers. Gradient descent minimizes the cost, and TensorFlow computes the derivatives automatically.
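
To make the chain of hypothesis, cost, and gradient descent concrete, here is a minimal sketch in plain Python. It is not the TensorFlow code behind the results: the toy OR dataset, the single hidden layer, the cross-entropy cost, the function names, and the hyperparameters are all assumptions chosen for illustration. The derivatives are written out by hand only to show the kind of computation TensorFlow performs automatically.

```python
import math
import random

# Toy dataset (assumption): logical OR on two binary inputs.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Hypothesis: one sigmoid hidden layer followed by a sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def cost(data, W1, b1, W2, b2):
    """Mean cross-entropy between the hypothesis and the labels (assumed cost)."""
    total = 0.0
    for x, y in data:
        _, h = forward(x, W1, b1, W2, b2)
        h = min(max(h, 1e-12), 1.0 - 1e-12)  # keep log() away from 0
        total -= y * math.log(h) + (1.0 - y) * math.log(1.0 - h)
    return total / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent; returns the parameters and the cost history."""
    rng = random.Random(seed)
    n_in, m = len(data[0][0]), len(data)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    history = [cost(data, W1, b1, W2, b2)]
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, y in data:
            hidden, out = forward(x, W1, b1, W2, b2)
            d_out = out - y  # derivative of the cost w.r.t. the output pre-activation
            gb2 += d_out
            for j in range(n_hidden):
                gW2[j] += d_out * hidden[j]
                # Chain rule through the hidden sigmoid: s'(z) = s(z) * (1 - s(z))
                d_hid = d_out * W2[j] * hidden[j] * (1.0 - hidden[j])
                gb1[j] += d_hid
                for i in range(n_in):
                    gW1[j][i] += d_hid * x[i]
        # Gradient descent step: move each parameter against its averaged gradient.
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j] / m
            b1[j] -= lr * gb1[j] / m
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / m
        b2 -= lr * gb2 / m
        history.append(cost(data, W1, b1, W2, b2))
    return (W1, b1, W2, b2), history


if __name__ == "__main__":
    params, history = train(DATA)
    print("initial cost:", history[0])
    print("final cost:", history[-1])
```

Changing `n_hidden` or `epochs` in this sketch is a small-scale way to explore the trade-off described above between network size, training time, and the cost that is reached.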