How to deal with columns that have a different value in only 1 or 2 rows?
I have very high-dimensional data. Almost 20% of the columns take a different value in fewer than 1% of the rows. All of these are binary columns, and many of them are 0 in roughly 98% or more of the rows.
Some more info:
The target variable is an imbalanced (91.9% : 8.1%) binary variable.
All of my variables except 3 are binary.
I would like some ideas on how to deal with columns like this. Should I drop them, or use SMOTE to generate more data?
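To make the "drop them" option concrete, here is a minimal sketch of how such columns could be flagged before deciding what to do with them. It is only an illustration under assumptions: the helper name `rare_value_columns`, the 1% default threshold, and the toy data are all made up for this example, and the data is assumed to fit in memory as a list of rows.

```python
from collections import Counter


def rare_value_columns(rows, max_minority_fraction=0.01):
    """Return indices of columns in which the non-dominant values cover
    less than `max_minority_fraction` of the rows.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    if not rows:
        return []
    n_rows = len(rows)
    flagged = []
    # zip(*rows) transposes the row list so we can iterate column by column.
    for col_idx, column in enumerate(zip(*rows)):
        # Number of rows holding the most frequent value in this column.
        dominant_count = Counter(column).most_common(1)[0][1]
        minority_fraction = (n_rows - dominant_count) / n_rows
        if minority_fraction < max_minority_fraction:
            flagged.append(col_idx)
    return flagged


# Toy data: 200 rows, 3 binary columns.
# Column 0 is 1 in a single row, column 1 is balanced, column 2 is constant.
toy_rows = [[1 if i == 0 else 0, i % 2, 0] for i in range(200)]
flagged = rare_value_columns(toy_rows)
print(flagged)
```

Flagging a column this way does not mean it is safe to drop: a feature that is 1 in only a handful of rows can still matter if those rows are concentrated in the 8.1% minority class, so it may be worth checking the flagged columns against the target first.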
Thanks in advance.
Topic data-cleaning data-mining machine-learning
Category Data Science