Imbalanced dataset with 3 classes: xgboost scale_pos_weight parameter

The xgboost classifier documentation describes the scale_pos_weight parameter only for 2-class problems.

I have a highly imbalanced dataset with 3 classes. Classes '1' and '-1' are very rare (~1% of dataset) and class '0' is very common.

How do I set this scale_pos_weight parameter in the xgboost classifier correctly for my classification problem?

Topic xgboost python machine-learning

Category Data Science


The parameter scale_pos_weight only works for two classes (binary classification).

For three or more classes, use the weight argument of the xgb.DMatrix function instead. It takes one weight per sample, so every sample gets the weight of its class. The class weights can be computed like this:

weights = total_samples / (n_classes * class_samples * 1.0)
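
As a rough illustration of the formula above, here is a minimal sketch in plain Python. The toy label list and the helper name balanced_class_weights are made up for this example; they only mimic the ~1% / ~1% / ~98% split from the question.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: total_samples / (n_classes * class_samples)}."""
    counts = Counter(labels)
    total_samples = len(labels)
    n_classes = len(counts)
    return {cls: total_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical toy labels: classes 1 and -1 are rare, class 0 is common.
labels = [0] * 98 + [1] * 1 + [-1] * 1

class_weights = balanced_class_weights(labels)

# xgb.DMatrix expects one weight per sample, so look up each sample's class.
sample_weights = [class_weights[label] for label in labels]

print(class_weights)
```

With these weights every class contributes the same total weight, so the rare classes are no longer drowned out by class '0'.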

For my multiclass classification problem with similarly imbalanced data, I used the output of sklearn's compute_class_weight function:

https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html

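sklearn.utils.class_weight.compute_class_weight(class_weight, *, classes, y)

Note that classes and y must be passed as keyword arguments in current scikit-learn versions, and class_weight='balanced' gives the same weights as the formula above.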
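
Putting the pieces together, something like the following should work. This is only a sketch: the toy labels and the helper make_weighted_dmatrix are invented for illustration, and the remapping of the labels -1, 0, 1 to 0, 1, 2 is there because xgboost expects multiclass labels in the range 0 to num_class - 1.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical toy labels with the same kind of imbalance as in the question.
y = np.array([0] * 98 + [1, -1])

# classes is the sorted array of unique labels; y_encoded holds, for every
# sample, the index of its label in classes (so -1, 0, 1 become 0, 1, 2).
classes, y_encoded = np.unique(y, return_inverse=True)

# One weight per class, in the same order as classes.
class_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# One weight per sample, looked up through the encoded labels.
sample_weights = class_weights[y_encoded]

def make_weighted_dmatrix(X, y_encoded, sample_weights):
    """Build a DMatrix that carries the per-sample weights (illustrative helper)."""
    # Imported here so the weight computation above also runs without xgboost.
    import xgboost as xgb
    return xgb.DMatrix(X, label=y_encoded, weight=sample_weights)
```

Given a feature matrix X with one row per label, dtrain = make_weighted_dmatrix(X, y_encoded, sample_weights) can then be passed to xgb.train with parameters such as {"objective": "multi:softprob", "num_class": 3}. If you use the scikit-learn wrapper instead, XGBClassifier.fit accepts the same array through its sample_weight argument.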
