What could prevent a model from ever overfitting perfectly?
I tried to fit my model on a small batch of 128 samples for binary classification. The model should be powerful enough, since it has hundreds of thousands of parameters, so it should be able to overfit to 100% accuracy. However, it fits to only 96% at best, which is about the same as when I train it on 30,000 samples. I tried the following, but all of these attempts failed:
Using a smaller batch of 16 samples: it still cannot overfit.
Using different optimizers, including Adam, SGD, and Adagrad, and even resetting the optimizer every 1,000 epochs: not working.
In every epoch, training only on the samples that are misclassified: not working.
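For reference, here is a minimal sketch of the kind of small-batch overfitting check I mean. It is not my actual model: it assumes a tiny one-hidden-layer network with a sigmoid output, written in plain Python and trained with full-batch gradient descent on random data. The names `overfit_small_batch`, `forward`, and `sigmoid`, as well as the layer size, learning rate, and epoch count, are made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, W1, b1, w2, b2):
    """Return the hidden activations and the sigmoid output for one sample."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    p = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, p


def overfit_small_batch(xs, ys, hidden=16, lr=0.5, epochs=500, seed=0):
    """Run full-batch gradient descent on one fixed batch.

    Returns (training accuracy, indices of samples still misclassified).
    The hyperparameters are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    W1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, y in zip(xs, ys):
            h, p = forward(x, W1, b1, w2, b2)
            dz = (p - y) / n  # gradient of mean cross-entropy w.r.t. the logit
            gb2 += dz
            for j in range(hidden):
                gw2[j] += dz * h[j]
                dh = dz * w2[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                gb1[j] += dh
                for k in range(d):
                    gW1[j][k] += dh * x[k]
        # Plain gradient-descent update on all parameters
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            for k in range(d):
                W1[j][k] -= lr * gW1[j][k]
        b2 -= lr * gb2

    # Threshold the sigmoid output at 0.5 and collect the remaining errors
    wrong = [i for i, (x, y) in enumerate(zip(xs, ys))
             if (forward(x, W1, b1, w2, b2)[1] >= 0.5) != (y == 1)]
    return 1.0 - len(wrong) / n, wrong


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical batch: 16 random samples with 4 features and random labels
    xs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]
    ys = [rng.randint(0, 1) for _ in range(16)]
    acc, wrong = overfit_small_batch(xs, ys)
    print(f"training accuracy: {acc:.3f}, still misclassified: {wrong}")
```

The returned indices show which samples stay misclassified, so they can be inspected directly instead of only looking at the overall accuracy.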
The problem should be with this network, since another, more basic neural network can fit 100%, while this one can only fit 99.2%. The top layer is indeed a sigmoid.
Does anyone have an idea what the problem could be?
Topic machine-learning
Category Data Science