Gradient Descent - Learning Rate
- Plot the value of the cost function J against the number of iterations of gradient descent. (A minimal sketch follows this list.)
- The curve should slope downward, though the decrease typically slows as the iterations go on. (It is hard to tell in advance how many iterations convergence will take.)
- Automatic convergence test: declare convergence if a single iteration decreases J by less than some small threshold epsilon (e.g. 1/1000), though it can be difficult to choose this threshold; see the second sketch below.
- If J is increasing, the learning rate is probably too large and gradient descent may never converge. (The fix is to use a smaller learning rate; see the third sketch below.)
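
A minimal sketch of the plotting diagnostic, assuming plain batch gradient descent on a toy linear-regression problem. The `cost` and `gradient_descent` helpers, the learning rate `alpha = 0.1`, and the synthetic data are all illustrative choices, not from the notes:

```python
import numpy as np
import matplotlib.pyplot as plt

def cost(X, y, theta):
    """Mean squared error cost J(theta) for linear regression."""
    residuals = X @ theta - y
    return residuals @ residuals / (2 * len(y))

def gradient_descent(X, y, alpha=0.1, num_iters=200):
    """Batch gradient descent that records J(theta) after every iteration."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / len(y)
        theta -= alpha * gradient
        history.append(cost(X, y, theta))
    return theta, history

# Toy data: y = 1 + 2x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
X = np.column_stack([np.ones_like(x), x])   # intercept column + feature
y = 1 + 2 * x + 0.1 * rng.standard_normal(50)

theta, history = gradient_descent(X, y)
plt.plot(history)
plt.xlabel("iteration")
plt.ylabel("J(theta)")
plt.title("Cost vs. iterations: should slope downward")
plt.show()
```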
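One way to turn the threshold test into code, again only a sketch: the `run_until_converged` helper, the `epsilon = 1e-3` default, and the toy one-dimensional cost J(theta) = (theta - 3)^2 are all assumptions for illustration. The stopping point depends on the chosen epsilon, which is exactly the difficulty the note mentions:

```python
def run_until_converged(grad, J, theta0, alpha=0.1, epsilon=1e-3, max_iters=10_000):
    """Stop once a single iteration decreases J by less than epsilon."""
    theta = theta0
    prev_cost = J(theta)
    for i in range(1, max_iters + 1):
        theta = theta - alpha * grad(theta)
        curr_cost = J(theta)
        # Require an actual (small) decrease: a rising cost signals divergence,
        # not convergence, and is handled in the next sketch.
        if 0 <= prev_cost - curr_cost < epsilon:
            return theta, i
        prev_cost = curr_cost
    return theta, max_iters   # the test never fired within the iteration cap

# Toy 1-D cost: J(theta) = (theta - 3)^2, minimum at theta = 3.
theta, iters = run_until_converged(
    grad=lambda t: 2 * (t - 3),
    J=lambda t: (t - 3) ** 2,
    theta0=0.0,
)
# With epsilon = 1e-3 this stops after about 20 iterations at theta of roughly
# 2.97, short of the true minimum: the note's point that epsilon is hard to pick.
print(f"theta = {theta:.4f} after {iters} iterations")
```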
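The note only says to switch to a smaller learning rate. One hypothetical way to automate that, a backtracking-style heuristic that is my own illustration and not anything prescribed by the notes, is to halve `alpha` whenever an update makes the cost rise:

```python
def gradient_descent_adaptive(grad, J, theta0, alpha=1.2, num_iters=100):
    """Gradient descent that halves alpha whenever a step increases the cost."""
    theta = theta0
    prev_cost = J(theta)
    for _ in range(num_iters):
        candidate = theta - alpha * grad(theta)
        curr_cost = J(candidate)
        if curr_cost > prev_cost:   # cost went up: alpha is too big
            alpha *= 0.5            # shrink the learning rate and retry
            continue
        theta, prev_cost = candidate, curr_cost
    return theta, alpha

# Same toy problem: J(theta) = (theta - 3)^2. The initial alpha = 1.2
# overshoots the minimum, so the first step raises the cost and alpha
# is cut to 0.6, after which the iterates converge to theta = 3.
theta, alpha = gradient_descent_adaptive(
    grad=lambda t: 2 * (t - 3),
    J=lambda t: (t - 3) ** 2,
    theta0=0.0,
)
print(f"theta = {theta:.4f}, final alpha = {alpha}")
```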