What to do when the random seed has a big impact on model performance?
I have a training procedure set up for an image recognition task. Each time I train a model, I record training loss, validation loss, validation precision and validation recall.
Recently I switched from EfficientNet to a ResNet-based model. Both models are pretrained, so weight initialization is deterministic. With the old model I ran 5 experiments (each with 5-fold cross-validation) with exactly the same parameters, varying only the seed, and got a standard deviation of around 0.001 for validation loss. With the new model I ran 3 experiments (also with 5-fold cross-validation) with exactly the same parameters and got a standard deviation of 0.028 for validation loss.
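For reference, here is a minimal sketch of how I seed the runs and compute the run-to-run standard deviation. The `set_seed` helper and the loss values are illustrative, not my actual numbers:

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Fix the usual RNGs so a run is reproducible for a given seed.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Hypothetical per-experiment mean validation losses, one value per seed.
resnet_val_losses = [0.412, 0.458, 0.441]                 # 3 seeded runs (new model)
effnet_val_losses = [0.301, 0.302, 0.300, 0.301, 0.299]   # 5 seeded runs (old model)

print("ResNet std:      ", np.std(resnet_val_losses, ddof=1))
print("EfficientNet std:", np.std(effnet_val_losses, ddof=1))
```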
This high variance makes it very hard to interpret the effect of changing non-seed parameters in new experiments with the new model, since in many cases the difference in performance sits within one standard deviation. Experiments take multiple hours each, so it is not feasible to run several repeats for every condition.
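To make the problem concrete, a small sketch with made-up numbers: with a seed-induced standard deviation of 0.028, a single run cannot tell a real improvement from seed noise whenever the gap between two configurations is smaller than that:

```python
seed_std = 0.028        # run-to-run std of validation loss (new model)
baseline_loss = 0.441   # hypothetical baseline configuration
candidate_loss = 0.425  # hypothetical run with one parameter changed

diff = baseline_loss - candidate_loss
# If the difference is smaller than the seed-induced std, one run per
# condition cannot distinguish the parameter change from a lucky seed.
print(f"difference = {diff:.3f}, seed std = {seed_std:.3f}, "
      f"within noise: {abs(diff) < seed_std}")
```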
What does one do in cases like this?
Topic data-science-model machine-learning
Category Data Science