Factor analysis is over a century old, yet choosing the number of factors for a given data set remains difficult. We provide a systematic review of current methods and then introduce a method based on bi-cross-validation, which uses randomly held-out submatrices of the data to choose the optimal number of factors. We find that it performs better than many existing methods, especially when both the number of variables and the sample size are large and some of the factors are relatively weak. Our performance criterion is based on recovery of an underlying signal, equal to the product of the usual factor and loading matrices. Like previous comparisons, our work is simulation based. Recent advances in random matrix theory provide principled choices for the number of factors when the noise is homoscedastic, but not for the heteroscedastic case. Our simulations are designed using guidance from random matrix theory. In particular, we include factors that are asymptotically too small to detect, factors large enough to detect but not large enough to improve the estimate, and two classes of factors (weak and strong) large enough to be useful. We also find that a form of early stopping regularization improves the recovery of the signal matrix.
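
To make the held-out-submatrix idea concrete, the following is a minimal sketch of bi-cross-validation for choosing a rank. It is not the estimator studied in this work: as a simplifying assumption it uses a plain truncated SVD on the retained block, which ignores heteroscedastic noise and omits the early stopping regularization mentioned above. The function names `bcv_error` and `choose_num_factors`, the half-and-half holdout fraction, and the number of repetitions are all illustrative choices.

```python
import numpy as np


def bcv_error(Y, k, held_rows, held_cols):
    """Squared error on a held-out block, predicted from a rank-k fit.

    Y is partitioned as [[A, B], [C, D]], where A holds the held-out rows
    and columns. A is predicted by B @ pinv(D_k) @ C, with D_k the rank-k
    truncated SVD of the fully retained block D.
    """
    r = np.zeros(Y.shape[0], dtype=bool)
    c = np.zeros(Y.shape[1], dtype=bool)
    r[held_rows] = True
    c[held_cols] = True

    A = Y[np.ix_(r, c)]    # held-out rows, held-out columns
    B = Y[np.ix_(r, ~c)]   # held-out rows, retained columns
    C = Y[np.ix_(~r, c)]   # retained rows, held-out columns
    D = Y[np.ix_(~r, ~c)]  # retained rows, retained columns

    if k == 0:
        # Rank zero predicts the held-out block as all zeros.
        return float(np.sum(A ** 2))

    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Pseudo-inverse of the rank-k truncation of D.
    D_k_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
    A_hat = B @ D_k_pinv @ C
    return float(np.sum((A - A_hat) ** 2))


def choose_num_factors(Y, k_max, n_reps=20, holdout_frac=0.5, seed=0):
    """Return the rank in 0..k_max with the smallest average held-out error."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    n_hold = max(1, int(holdout_frac * n))
    p_hold = max(1, int(holdout_frac * p))
    if k_max > min(n - n_hold, p - p_hold):
        raise ValueError("k_max exceeds the rank of the retained block")

    errors = np.zeros(k_max + 1)
    for _ in range(n_reps):
        # Draw a random submatrix (rows and columns) to hold out.
        rows = rng.choice(n, size=n_hold, replace=False)
        cols = rng.choice(p, size=p_hold, replace=False)
        for k in range(k_max + 1):
            errors[k] += bcv_error(Y, k, rows, cols)

    errors /= n_reps
    return int(np.argmin(errors)), errors
```

Given an n-by-p data matrix `Y`, calling `choose_num_factors(Y, k_max=8)` returns the selected rank together with the average held-out error for each candidate rank. Holding out rows and columns simultaneously means the held-out block is never used in the fit, which is what lets the held-out error penalize ranks that only fit noise.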