Normal Equation

theta = (X^T X)^-1 X^T y, computed in NumPy as: theta = numpy.linalg.pinv(X.T @ X) @ X.T @ y
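A minimal runnable sketch of the formula above, assuming X is an m x (n+1) design matrix whose first column is all ones (the intercept term) and y is a length-m vector of targets. The data and the helper name normal_equation are hypothetical, chosen so the exact answer is known (y = 1 + 2x). Note that `@` is matrix multiplication; `*` on NumPy arrays would multiply element-wise.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta minimizing the squared error ||X @ theta - y||^2.

    X is assumed to include a leading column of ones for the intercept.
    pinv (pseudo-inverse) is used instead of inv so the expression still
    returns a result if X.T @ X happens to be singular.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical toy data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])  # prepend intercept column x0 = 1

theta = normal_equation(X, y)
print(theta)  # approximately [1, 2]
```

No learning rate, feature scaling, or iteration count is needed: theta comes out of a single linear-algebra expression.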

Gradient Descent vs. Normal Equation
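The normal equation can be faster, but only when the number of features n is small: it has to invert the (n+1) x (n+1) matrix X^T X, which costs roughly O(n^3). Gradient descent needs a learning rate and many iterations, but it scales much better to large n. Around n = 10,000 is a rough point beyond which gradient descent is usually preferable, depending on the available computing power.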
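Both methods minimize the same squared-error cost, so on a well-behaved problem they should reach the same theta. The sketch below checks that on hypothetical random data; the learning rate alpha = 0.1 and the 5,000 iterations are assumed values that happen to converge for these standard-normal features, not general recommendations.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent on J(theta) = (1/2m) * ||X @ theta - y||^2."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of J with respect to theta
        theta -= alpha * grad
    return theta

def normal_equation(X, y):
    """Closed-form solution: one (n+1) x (n+1) pseudo-inversion, no iterations."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical data: 200 examples, 3 features plus an intercept column.
rng = np.random.default_rng(0)
m, n = 200, 3
X = np.column_stack([np.ones(m), rng.standard_normal((m, n))])
true_theta = np.array([4.0, -1.0, 2.0, 0.5])
y = X @ true_theta + 0.01 * rng.standard_normal(m)  # small noise

theta_gd = gradient_descent(X, y)
theta_ne = normal_equation(X, y)
print(np.max(np.abs(theta_gd - theta_ne)))  # tiny: both reach the same minimum
```

With only 4 parameters the normal equation is trivially cheap; the trade-off described above only shows up when n grows into the thousands and the O(n^3) inversion starts to dominate.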

Noninvertibility
Redundant features: if two features are linearly dependent, X^T X is noninvertible (e.g. the same area given in square meters and in square feet).
Too many features (m <= n, where m is the number of training examples and n the number of features): delete some features or use regularization.
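A small sketch of both points, on hypothetical house-area data: storing the area in square meters and again in square feet (about 10.7639 ft^2 per m^2) makes one column a multiple of another, so X has rank 2 instead of 3 and X^T X is singular. The fix shown is the usual regularized normal equation, theta = (X^T X + lambda * L)^-1 X^T y, where L is the identity with L[0, 0] = 0 so the intercept is not penalized; lambda = 1.0 is an assumed value. Adding lambda * L makes the matrix invertible even when features are redundant or m <= n.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50
area_m2 = rng.uniform(50, 200, m)       # hypothetical areas in square meters
area_ft2 = area_m2 * 10.7639            # same information in square feet
X = np.column_stack([np.ones(m), area_m2, area_ft2])
y = 1000.0 * area_m2 + rng.normal(0, 100, m)  # hypothetical prices

# Rank 2, not 3: the ft^2 column is a multiple of the m^2 column,
# so X^T X (which has the same rank as X) cannot be inverted.
rank = np.linalg.matrix_rank(X)
print(rank)

# Regularized normal equation: penalize every parameter except the intercept.
lam = 1.0
L = np.eye(X.shape[1])
L[0, 0] = 0.0
theta_reg = np.linalg.solve(X.T @ X + lam * L, X.T @ y)

rmse = np.sqrt(np.mean((X @ theta_reg - y) ** 2))
print(theta_reg, rmse)  # a well-defined theta that still fits the data
```

Dropping one of the two area columns would solve the same problem without regularization; regularization is the more general fix when there is no single obvious column to remove.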