Statistically increase mean difference between two data sets?
A dataset will be used to train a binary classification model. For easier understanding and visualization, the dataset was split into two subsets:
- one with all the rows that result in a prediction value of 1
- another with all the rows that result in a prediction value of 0
Comparing the two subsets, there are small but expected differences in many of the features, for example a lower mean or median in one subset than in the other.
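To make the comparison concrete, here is a minimal sketch of the kind of per-feature summary described above. It assumes pandas and NumPy, a hypothetical label column named `target`, and synthetic data with a made-up feature `feature_a`; none of these names come from the actual dataset. The standardized mean difference (gap in means divided by a pooled standard deviation) is one common way to put the gaps of different features on a comparable scale.

```python
import numpy as np
import pandas as pd


def compare_groups(df: pd.DataFrame, label_col: str = "target") -> pd.DataFrame:
    """Per-feature mean/median gap between the label-1 and label-0 rows."""
    features = df.drop(columns=[label_col])
    pos = features[df[label_col] == 1]
    neg = features[df[label_col] == 0]
    # Pooled standard deviation puts every feature's gap on a common scale.
    pooled_std = np.sqrt((pos.var(ddof=1) + neg.var(ddof=1)) / 2)
    mean_diff = pos.mean() - neg.mean()
    return pd.DataFrame({
        "mean_diff": mean_diff,
        "median_diff": pos.median() - neg.median(),
        "std_mean_diff": mean_diff / pooled_std,
    })


# Synthetic data for illustration only (hypothetical column names).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "feature_a": np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(0.3, 1.0, n)]),
    "target": np.repeat([0, 1], n),
})
summary = compare_groups(demo)
print(summary)
```

A table like this shows how large each gap is relative to the feature's spread, which is usually more informative than the raw difference in means.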
Is there a sound way to statistically enhance the data to make the differences more visible to a model?
For example, going from this small difference in the mean (the red dotted line):
To something where the mean difference is much more pronounced? (Please excuse the crudeness of the drawing)
I've thought about applying a logarithmic transformation, but since my data does not behave exponentially, I wanted to check possible approaches with the community first.
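As a rough way to check what a candidate transformation does, the sketch below compares the standardized mean difference before and after transforming. It uses synthetic gamma-distributed values (an assumption, not the real data) and a hypothetical helper `std_mean_diff`. A purely linear rescaling stretches the plot but leaves the gap relative to the spread unchanged, whereas a nonlinear transform such as `np.log1p` can change it in either direction depending on the data.

```python
import numpy as np


def std_mean_diff(x1, x0):
    """Difference in means divided by the pooled standard deviation."""
    pooled = np.sqrt((np.var(x1, ddof=1) + np.var(x0, ddof=1)) / 2)
    return (np.mean(x1) - np.mean(x0)) / pooled


# Synthetic, non-negative values for illustration only.
rng = np.random.default_rng(1)
x0 = rng.gamma(shape=2.0, scale=1.0, size=1000)  # stand-in for class-0 rows
x1 = rng.gamma(shape=2.0, scale=1.2, size=1000)  # stand-in for class-1 rows

raw = std_mean_diff(x1, x0)
scaled = std_mean_diff(10 * x1, 10 * x0)            # linear rescaling
logged = std_mean_diff(np.log1p(x1), np.log1p(x0))  # candidate log transform

print(f"raw={raw:.3f} scaled={scaled:.3f} log1p={logged:.3f}")
```

Running the same comparison on the real features would show whether the transformation actually separates the two subsets more, rather than just changing the axis.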
Let me know if my question needs clarification. Thanks :D
Topic classification statistics machine-learning
Category Data Science