How to resample

I have to work with a small dataset. I thought I might use resampling methods to enlarge the sample and improve the performance of my regression algorithm. I have heard about SMOTE, but it is used for classification on imbalanced datasets. Is there a method for creating synthetic data from a small dataset? Thanks.

Topic regression python

Category Data Science


Please check out the imbalanced-learn library (Python). Here is an example:

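# Assuming you already have a feature matrix X and class labels y

from imblearn.over_sampling import SMOTE

# Oversample only the minority class.
# Recent imbalanced-learn releases use sampling_strategy (formerly ratio)
# and fit_resample (formerly fit_sample).
smote = SMOTE(sampling_strategy='minority')
X_sm, y_sm = smote.fit_resample(X, y)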

Documentation:

https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.under_sampling.TomekLinks.html

https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html


As others have suggested, use SMOTE oversampling. It balances your dataset by creating synthetic minority-class examples (interpolated between nearest neighbors) until the minority class is as large as the majority class.
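To make the neighbor-based idea concrete, here is a minimal pure-Python sketch of that kind of interpolation. It is not the imbalanced-learn implementation; the function name `interpolate_neighbors`, its parameters, and the toy `minority` points are all made up for illustration.

```python
import math
import random


def interpolate_neighbors(points, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen point and one of its k nearest neighbors."""
    if len(points) < 2:
        raise ValueError("need at least two points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(points) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        # Sort the other points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(points) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbor.
        lam = rng.random()
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical toy data: four 2-D samples from the minority class.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = interpolate_neighbors(minority, n_new=6)
```

Because every synthetic point lies between two existing points, the new samples stay inside the region already covered by the original data; they add density, not new information.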


You can also try GANs (generative adversarial networks) to generate pseudo-data.

