Decide threshold for each class for optimal precision/recall in a multi-class classification problem

Say I have three classes $C_1$, $C_2$, $C_3$ and a model $M$ which outputs a confidence score $P$ for each class on a sample $X$, i.e. $M(X) = [P(C_1), P(C_2), P(C_3)]$ (note: we only want to predict one class).

Say I have created 3 one-vs-rest precision/recall plots and I decide that the optimal thresholds for each class are

$T_1 = 0.6$

$T_2 = 0.7$

$T_3 = 0.5$

We can then define the logic for assigning $X$ to a class like so:

Let $i$ be the index of the biggest score of $M(X)$. If that score is greater than or equal to $T_i$, assign $X$ to $C_i$; else, don't assign $X$ to anything. See the two examples below for two inputs $X$:

$M(X_1) = [0.8,0.1,0.1] \rightarrow C_1\quad$ since the biggest score is $0.8$, which is for class 1, and $0.8 \ge T_1$

$M(X_2) = [0.3,0.6,0.1] \rightarrow \text{None}\quad$ since the biggest score is $0.6$, which is for class 2, but $0.6 < T_2$
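
For reference, here is a minimal sketch of this decision rule (the function and variable names are mine, not from any library):

```python
import numpy as np

# Per-class thresholds T_1, T_2, T_3 from the precision/recall plots.
THRESHOLDS = np.array([0.6, 0.7, 0.5])

def assign_class(scores):
    """Return the index of the winning class, or None if its score
    falls below that class's threshold."""
    scores = np.asarray(scores)
    i = int(np.argmax(scores))  # index of the biggest score
    return i if scores[i] >= THRESHOLDS[i] else None

print(assign_class([0.8, 0.1, 0.1]))  # 0 -> C_1, since 0.8 >= T_1 = 0.6
print(assign_class([0.3, 0.6, 0.1]))  # None,    since 0.6 <  T_2 = 0.7
```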

But something tells me that this way we don't preserve the optimal precision/recall for each class that we used to decide the thresholds in the first place, so my questions are:

  1. Can we have dynamic thresholds for multi-class classification, i.e. a threshold for each class which preserves the optimal precision/recall for all classes at the same time?
  2. Is there a better way to decide thresholds for multi-class classification, as per my problem above, when we want to control the precision/recall for each class?

EDIT:

Say I have the following results on my validation set:

conf | pred | target
-----+------+-------
0.9  |  C1  |  C1
0.8  |  C1  |  C1
0.76 |  C1  |  C2
...
0.93 |  C2  |  C2
0.9  |  C2  |  C2
0.83 |  C2  |  C3
...

Wouldn't this overcome the issue with the one-vs-rest example, since we now have the confidence when all three classes are involved, and not 3x one-vs-rest?
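
For what it's worth, here is a sketch of how such a validation table could be turned into per-class precision/recall curves (assuming `y_true` holds integer labels 0-2 and `y_score` is the $n \times 3$ matrix of $M(X)$ outputs; both names are hypothetical):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def per_class_pr_curves(y_true, y_score, n_classes=3):
    """Per-class precision/recall curves computed from the joint model
    scores: class c vs. rest, using the c-th score column."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    curves = {}
    for c in range(n_classes):
        precision, recall, thresholds = precision_recall_curve(
            (y_true == c).astype(int), y_score[:, c]
        )
        curves[c] = (precision, recall, thresholds)
    return curves
```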


I think there's a confusion between multi-class and multi-label classification:

  • In multi-class classification, every instance has a single label. This means that the classifier returns the single most likely label among the possible labels, not all the labels which may apply. In probabilistic terms, this implies that the output probabilities sum to 1 over all the classes.
  • In multi-label classification, every instance can have any number of labels (including no label at all). This is equivalent to training an independent model for each class. In probabilistic terms, every class probability $p$ represents the likelihood that the instance has this label as opposed to not having it.
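
To make the probabilistic difference concrete, here is a small numerical sketch (the logits are made up for illustration and not tied to any particular model):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])  # raw scores for the three classes

# Multi-class: softmax couples the classes, so the probabilities sum to 1.
p_multiclass = np.exp(logits) / np.exp(logits).sum()
print(p_multiclass, p_multiclass.sum())   # sums to 1.0

# Multi-label: one independent sigmoid per class ("this label vs. not"),
# so the probabilities need not sum to 1.
p_multilabel = 1.0 / (1.0 + np.exp(-logits))
print(p_multilabel, p_multilabel.sum())   # each in (0, 1); sum can exceed 1
```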

Applying a custom threshold for each class in the multi-class case means that some instances may have zero or more than one class, i.e. it transforms the problem to multi-label.

So the problem should be clearly defined from the start:

  • If it's multi-class, then there's no way to use a custom threshold for every class: the most likely class is always the only one assigned.
  • If it's multi-label, then the classifiers are independent and custom thresholds can be used. But the problem is different, and in theory the training data should be consistent and contain instances with zero or multiple labels.
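
A small sketch contrasting the two decision rules in this list, using the scores and thresholds from the question (the variable names are mine):

```python
import numpy as np

scores = np.array([0.3, 0.6, 0.1])      # M(X_2) from the question
thresholds = np.array([0.6, 0.7, 0.5])  # T_1, T_2, T_3

# Multi-class: the most likely class is always the one assigned.
multiclass_label = int(np.argmax(scores))                # -> 1, i.e. C_2

# Multi-label: each class is accepted independently against its own
# threshold, so an instance can end up with zero or several labels.
multilabel_labels = np.nonzero(scores >= thresholds)[0]  # -> [] here

print(multiclass_label, multilabel_labels)
```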

[edit following OP's comment]

I'm talking about multi-label because practically what you want to do is some kind of hybrid multi-class/multi-label classification. For example, in multi-class it's impossible for an instance not to have any label, as in your $X_2$ example, because there's always a most likely class ($C_2$ in your example).

It's important to understand that in multi-class classification the predicted scores cannot be interpreted independently: the classifier only knows how to distinguish between the possible classes. For example, if one classifies images into three classes (dog, fish, and plant), an image of an elephant would be predicted as class dog with a very high probability because it's the closest. Whereas if one classifies dog vs. anything else, the probability of dog should be very low for the same image of an elephant.

Using custom thresholds by class "breaks" the dependency that exists between the predicted scores for an instance; it's likely to cause a bias which favors some classes at the expense of the others.

Also, I think that logically the possibility of zero labels implies that multiple labels should also be possible, but I'm not sure that this is an important point.
