Imbalanced Binary Dataset in Keras. Finding the best threshold after fit such that sensitivity and specificity are maximized?

I have made an ANN in Keras that works on an imbalanced binary dataset. After fitting, the model is used to predict the binary classes, and I want to choose a threshold such that sensitivity and specificity are maximized.

This is the code I am using right now. It iterates through thresholds from 0 to 1 in steps of 0.001 and picks the best one using the G-mean score.

```python
import math
from numpy import arange

# model_p, Xt and yt come from the surrounding script
predictions = model_p.predict(Xt).ravel()  # Keras returns shape (n, 1); flatten to (n,)
thresholds = arange(0, 1, 0.001)
threshold = -1
best_Gscore = 0
false_positive = 0
true_positive = 0
false_negative = 0
true_negative = 0

for z in thresholds:
    print("Threshold = %f" % z)
    fp = 0
    fn = 0
    tp = 0
    tn = 0
    # A sample is predicted positive when its score is above the threshold
    for i in range(len(yt)):
        if yt[i] == 0 and predictions[i] > z:
            fp += 1
        elif yt[i] == 1 and predictions[i] > z:
            tp += 1
        elif yt[i] == 1 and predictions[i] <= z:
            fn += 1
        elif yt[i] == 0 and predictions[i] <= z:
            tn += 1

    # Skip if a class is missing (the rates below would divide by zero)
    if (tp + fn) == 0:
        continue
    if (tn + fp) == 0:
        continue
    TPR = tp / (tp + fn)  # sensitivity
    FPR = fp / (fp + tn)  # 1 - specificity
    Gscore = math.sqrt(TPR * (1 - FPR))  # G-mean = sqrt(sensitivity * specificity)

    print("G-mean = %f" % Gscore, flush=True)

    if Gscore > best_Gscore:
        best_Gscore = Gscore
        false_positive = fp
        false_negative = fn
        true_positive = tp
        true_negative = tn
        threshold = z
```
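A possible faster alternative, sketched under the assumption that `yt` holds 0/1 labels and the predictions are flattened to one score per sample. Instead of recounting the confusion matrix for 1000 fixed thresholds (roughly thresholds × samples work), it sorts the scores once and sweeps every distinct score as a candidate cut, updating TP and FP incrementally. That is O(n log n) and cannot miss a cut that falls between grid points. The helper name `best_gmean_threshold` is hypothetical; it keeps the rule of the loop above (positive when score > threshold).

```python
import math


def best_gmean_threshold(y_true, scores):
    """Hypothetical helper: exact G-mean search over every distinct score.

    Predicts positive when score > threshold and returns
    (threshold, sensitivity, specificity, g_mean). Assumes 0/1 labels.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    pos = sum(1 for _, y in pairs if y == 1)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")

    tp = fp = 0
    best = None
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Tied scores always fall on the same side of a threshold,
        # so move the whole tie group to the positive side at once.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # Every sample scoring >= `score` is now positive. Using the next lower
        # distinct score as the threshold reproduces this split under "> t".
        threshold = pairs[i][0] if i < len(pairs) else -math.inf
        sens = tp / pos
        spec = (neg - fp) / neg
        g = math.sqrt(sens * spec)
        if best is None or g > best[3]:
            best = (threshold, sens, spec, g)
    return best
```

With the question's variables this would be called as `best_gmean_threshold(yt, model_p.predict(Xt).ravel())`. Because it only evaluates thresholds where the confusion matrix actually changes, there is nothing left to refine with a finer grid.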

But is there a better way to maximize sensitivity and specificity? Perhaps by finding a sens and spec such that

| sens - spec | < 0.05 and sens*spec > score_max

and then, once this score_max is found, searching with smaller steps within about ±0.2 on both? Or is there another way to find the maximum sensitivity and specificity?
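A sketch of the two-stage idea in the question, assuming "±0.2 on both" means a window of ±0.2 around the best coarse threshold (that reading, the 0.05 coarse step and the 0.001 fine step are assumptions). The coarse pass keeps only thresholds where |sens - spec| < 0.05 and picks the one with the largest sens*spec; the fine pass repeats this inside the window, clipped to [0, 1]. The helper names `sens_spec`, `grid`, `best_on_grid` and `coarse_to_fine` are hypothetical.

```python
def sens_spec(y_true, scores, t):
    # Predict positive when score > t; return (sensitivity, specificity).
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s > t)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s <= t)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s <= t)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s > t)
    return tp / (tp + fn), tn / (tn + fp)


def grid(lo, hi, step):
    # Evenly spaced thresholds from lo to hi inclusive (no float drift from +=).
    n = int(round((hi - lo) / step))
    return [lo + k * step for k in range(n + 1)]


def best_on_grid(y_true, scores, thresholds, tol):
    # Keep balanced thresholds only, then maximize sens * spec.
    best = None
    for t in thresholds:
        sens, spec = sens_spec(y_true, scores, t)
        if abs(sens - spec) < tol and (best is None or sens * spec > best[1]):
            best = (t, sens * spec)
    return best


def coarse_to_fine(y_true, scores, tol=0.05, coarse=0.05, fine=0.001, window=0.2):
    best = best_on_grid(y_true, scores, grid(0.0, 1.0, coarse), tol)
    if best is None:
        return None  # no coarse threshold met the balance constraint
    lo = max(0.0, best[0] - window)
    hi = min(1.0, best[0] + window)
    refined = best_on_grid(y_true, scores, grid(lo, hi, fine), tol)
    # Never return something worse than the coarse result.
    return refined if refined is not None and refined[1] >= best[1] else best
```

One caveat this makes visible: a coarse step can skip a good cut entirely. For example, with labels [0, 0, 1, 1] and scores [0.10, 0.52, 0.53, 0.9], every threshold in [0.52, 0.53) separates the classes perfectly, but no 0.05-step grid point lands there, so no coarse threshold passes the balance constraint and the function returns None.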

Topic keras tensorflow python

Category Data Science


It's impossible in general to optimize both sensitivity and specificity, in the sense of finding a threshold for which sensitivity is maximum and specificity is maximum:

  • sensitivity is high when TP is high and FN is low
  • specificity is high when TN is high and FP is low
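For reference, the standard definitions show which counts each rate depends on, together with the G-mean used in the question's loop:

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}, \qquad
G = \sqrt{\text{sensitivity} \cdot \text{specificity}}
```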

However, both are tied to the same threshold:

  • When the threshold is increased, more instances are predicted negative, so TN and FN increase while TP and FP decrease.
  • When the threshold is decreased, more instances are predicted positive, so TP and FP increase while TN and FN decrease.

Therefore, one cannot have the lowest possible FP and the lowest possible FN at the same time.

In other words, maximum sensitivity is reached when all instances are predicted positive, while maximum specificity is reached when all instances are predicted negative. Clearly the two are incompatible.
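A small check of this trade-off on made-up labels and scores (the data and the helper name `sens_spec_at` are hypothetical). As the threshold rises from 0 to 1, sensitivity never increases and specificity never decreases; at threshold 0 every instance is predicted positive, and at threshold 1 every instance is predicted negative.

```python
def sens_spec_at(y_true, scores, t):
    # Predict positive when score > t, then compute (sensitivity, specificity).
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s > t)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s <= t)
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    return tp / pos, tn / neg


# Made-up labels and scores for illustration
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
s = [0.05, 0.2, 0.3, 0.35, 0.5, 0.6, 0.7, 0.75, 0.85, 0.95]

# (threshold, sensitivity, specificity) for thresholds 0.0, 0.1, ..., 1.0
curve = [(t / 10, *sens_spec_at(y, s, t / 10)) for t in range(11)]
for t, sens, spec in curve:
    print(f"t={t:.1f}  sens={sens:.2f}  spec={spec:.2f}")
```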

Instead, one can only optimize a combination of both, similar to the F-score, which is the harmonic mean of precision (which, like specificity, is penalized by false positives) and recall (sensitivity).
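Some common single-number combinations, written as a minimal sketch from confusion-matrix counts; the function name `combined_scores` is hypothetical. The G-mean is the criterion the question's loop uses, Youden's J (sensitivity + specificity - 1) is what a "J statistic" usually refers to, and F1 is the F-score mentioned above. Any of them can be maximized over thresholds instead of trying to maximize sensitivity and specificity separately.

```python
import math


def combined_scores(tp, fp, tn, fn):
    # G-mean, Youden's J and F1 for one confusion matrix.
    sens = tp / (tp + fn)                      # recall / true positive rate
    spec = tn / (tn + fp)                      # true negative rate
    prec = tp / (tp + fp) if tp + fp else 0.0  # precision
    g_mean = math.sqrt(sens * spec)
    youden_j = sens + spec - 1
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"g_mean": g_mean, "youden_j": youden_j, "f1": f1}


# Imbalanced example: 10 positives, 990 negatives
print(combined_scores(tp=8, fp=90, tn=900, fn=2))
```

In this example sensitivity (0.80) and specificity (about 0.91) are both reasonably high, yet F1 is only about 0.15 because precision is 8 / 98. On imbalanced data, the G-mean or Youden's J may therefore match a "sensitivity and specificity" goal more directly than F1.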

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.