Statistically increase mean difference between two data sets?

Question


PeJota

May 5, 2022 21:19

I have a data set that will be used to train a binary classification model. To understand and visualize it better, I split it into two subsets:

one set with all the rows whose prediction value is 1
another set with all the rows whose prediction value is 0

Comparing the two sets, I see small but expected differences in many of the features, for example a lower mean or median in one set than in the other.
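As a rough illustration of the comparison described above, the gap between the two sets on a single feature can be put on a common scale by dividing the mean difference by the pooled standard deviation (Cohen's d). This is only a sketch: the feature values are synthetic, and the names `class_1`, `class_0`, and `standardized_mean_difference` are hypothetical, not taken from the actual data set.

```python
import random
import statistics


def standardized_mean_difference(a, b):
    """Mean of a minus mean of b, divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Synthetic stand-ins for one feature in each subset (assumption, not real data):
# the rows labelled 1 have a slightly lower mean than the rows labelled 0.
rng = random.Random(0)
class_1 = [rng.gauss(10.0, 2.0) for _ in range(2000)]
class_0 = [rng.gauss(10.5, 2.0) for _ in range(2000)]

mean_diff = statistics.mean(class_1) - statistics.mean(class_0)
median_diff = statistics.median(class_1) - statistics.median(class_0)
d = standardized_mean_difference(class_1, class_0)

print(f"mean difference:              {mean_diff:.3f}")
print(f"median difference:            {median_diff:.3f}")
print(f"standardized mean difference: {d:.3f}")
```

Running the same function over every feature gives a quick ranking of which features differ most between the two sets relative to their spread.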

Is there a sound way to statistically enhance the data to make the differences more visible to a model? For example, going from this small difference in the mean (the red dotted line):

To something where the mean difference is much more pronounced? (Please excuse the crudeness of the drawing)

I've thought about applying a logarithmic transformation, but since my data does not behave exponentially, I wanted to check possible approaches with the community first.
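One way to check a candidate transformation such as the logarithm before committing to it is to measure the standardized mean difference of a feature before and after applying it. The sketch below does this on synthetic data and assumes strictly positive feature values, since the logarithm is undefined otherwise. All names and distribution parameters are hypothetical.

```python
import math
import random
import statistics


def standardized_mean_difference(a, b):
    """Mean of a minus mean of b, divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Synthetic, strictly positive feature values (assumption, not real data).
# Gamma draws are always > 0, so math.log is safe to apply.
rng = random.Random(0)
class_1 = [rng.gammavariate(25.0, 0.40) for _ in range(2000)]
class_0 = [rng.gammavariate(25.0, 0.42) for _ in range(2000)]

d_raw = standardized_mean_difference(class_1, class_0)
d_log = standardized_mean_difference(
    [math.log(x) for x in class_1],
    [math.log(x) for x in class_0],
)

print(f"standardized mean difference, raw values: {d_raw:.3f}")
print(f"standardized mean difference, log values: {d_log:.3f}")
```

If the two numbers are close, the transformation did little to separate the groups on that feature. The result on synthetic data says nothing about the real features; the same check would need to be run on them.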

Let me know if my question needs clarification. Thanks :D

Topic: classification, statistics, machine-learning

Category: Data Science
