How should I sample my validation set if I randomly sample training data?

I have:

a training dataset of size 150k,

a validation dataset of size 19k.

At each epoch I randomly sample 10k datapoints (without replacement) for training, because using the full set gives me out-of-memory errors.

I need to downsample my validation set too. Which of the following methods seems more appropriate (a sketch of both options follows the list):

  • Randomly sampling a validation subset that is x% of 10k and reusing that same subset across every epoch.
  • Randomly sampling a fresh validation subset that is x% of 10k at every epoch.
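For concreteness, here is a minimal sketch of the two options, assuming a plain NumPy, index-based pipeline; the variable names and the 10% value for x are only illustrative:

    import numpy as np

    rng = np.random.default_rng(seed=0)
    n_train, n_val = 150_000, 19_000
    sample_size = 10_000           # training subsample per epoch
    val_fraction = 0.10            # "x%" -- here 10% of 10k = 1k points
    val_size = int(sample_size * val_fraction)
    num_epochs = 5

    # Option 1: draw one fixed validation subset and reuse it every epoch.
    fixed_val_idx = rng.choice(n_val, size=val_size, replace=False)

    for epoch in range(num_epochs):
        # Fresh 10k training subsample each epoch, without replacement.
        train_idx = rng.choice(n_train, size=sample_size, replace=False)

        # Option 2: draw a fresh validation subset each epoch instead.
        val_idx = rng.choice(n_val, size=val_size, replace=False)

        # ...train on train_idx, then evaluate on fixed_val_idx (option 1)
        # or on val_idx (option 2).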

Actually, you should never apply sampling techniques to your testing/evaluation data, because doing so can bias your evaluation results. If your dataset is imbalanced, you can apply oversampling (e.g. SMOTE) or undersampling techniques to your training data only. To benchmark a multi-class classifier, you should rely on e.g. the confusion matrix, recall, precision, and the F1 score. Keep in mind that accuracy cannot be meaningfully interpreted when your data is heavily imbalanced.
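As a minimal sketch of that workflow, assuming scikit-learn and imbalanced-learn are available (the synthetic data and the logistic-regression model are only placeholders for your own): SMOTE is applied to the training split only, and the untouched evaluation split is scored with the confusion matrix and per-class precision/recall/F1:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report, confusion_matrix
    from imblearn.over_sampling import SMOTE

    # Placeholder data: ~10% positives, so accuracy alone would look
    # deceptively good.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20))
    y = (rng.random(2000) < 0.1).astype(int)

    # Split first; the evaluation split is never resampled.
    X_train, X_eval, y_train, y_eval = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Oversample the minority class in the TRAINING data only.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    model = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    y_pred = model.predict(X_eval)

    # Confusion matrix plus per-class precision, recall, and F1.
    print(confusion_matrix(y_eval, y_pred))
    print(classification_report(y_eval, y_pred))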
