Normal Equation
- An analytical (closed-form) way to find the best-fit parameters in one step, without iterating
numpy.linalg.pinv(x.T @ x) @ x.T @ y
- Gradient Descent vs. Normal Equation
- The latter might be faster, but only if the number of features n is small, because inverting the n x n matrix costs roughly O(n^3). n = 10,000 might be about the limit, depending on computing power.
- Noninvertibility
- Redundant features: If two features are linearly dependent, then the matrix x.T @ x is noninvertible (e.g. area in square meters and area in square feet)
- Too many features (m <= n, where m is the number of training examples and n the number of features): delete some features or use regularization
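
The notes above can be tried out with a small sketch. It is only an illustration under stated assumptions: the function name `normal_equation`, the parameter `lam`, and the toy data are hypothetical and not from the notes, and the regularized variant assumes the common convention of leaving the intercept (column 0) unpenalized.

```python
import numpy as np


def normal_equation(X, y, lam=0.0):
    """Solve for theta in closed form (hypothetical helper).

    With lam == 0 this is the plain formula pinv(X^T X) X^T y.
    With lam > 0 it adds an L2 penalty on every parameter except the
    intercept, which is assumed to be column 0 of X.
    """
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.pinv(X.T @ X + lam * penalty) @ X.T @ y


# Hypothetical toy data generated from price = 1 + 2 * area_m2.
area_m2 = np.array([50.0, 80.0, 120.0, 200.0])
price = 1.0 + 2.0 * area_m2

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(area_m2), area_m2])
theta = normal_equation(X, price)  # should come out close to [1, 2]

# Add the same area in square feet: a linearly dependent (redundant) column.
area_ft2 = area_m2 * 10.7639
X_redundant = np.column_stack([X, area_ft2])

# X^T X is now singular up to rounding, so its condition number is huge.
cond = np.linalg.cond(X_redundant.T @ X_redundant)

# Regularization makes the matrix invertible again, and the predictions
# stay close to the targets even though the weights are split across
# the two redundant columns.
theta_reg = normal_equation(X_redundant, price, lam=1.0)
predictions = X_redundant @ theta_reg
```

Using `pinv` rather than `inv` means the unregularized call still returns a usable answer when the matrix is singular, whereas a plain inverse would fail or give unstable results.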