How to Implement a Custom Loss Function with Keras for a Sparse Dataset

My dataset comes from a mostly idle system that, at certain time instants, receives requests. I'm trying to predict these instants from a clock. Since the requests are sparsely distributed (I've forced them to last for a while so they don't get too sparse), I wanted to create a new loss function that penalizes the model if it just predicts zero for everything. My attempt simply adds a penalty factor to the standard cross-entropy:

from tensorflow.keras import backend as K

def sparse_penalty_logits(y_true, y_pred):
    penalty = 10.0
    # Clip predictions away from exactly 0 and 1 so K.log never returns -inf/NaN
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    # Element-wise binary cross-entropy
    bce = -(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
    # A Python "if y_true != 0:" cannot branch on a tensor; weight element-wise instead:
    # weight is 1 where y_true == 0 and penalty where y_true == 1
    weights = 1.0 + (penalty - 1.0) * y_true
    return K.sum(weights * bce)
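
As a quick sanity check (assuming TensorFlow 2 with eager execution; the values below are made up purely for illustration), the loss can be evaluated on a couple of toy tensors. With the element-wise weighting, an equally confident positive example should cost roughly penalty times as much as a negative one:

import tensorflow as tf

y_neg_true, y_neg_pred = tf.constant([[0.0]]), tf.constant([[0.1]])  # negative example, fairly confident
y_pos_true, y_pos_pred = tf.constant([[1.0]]), tf.constant([[0.9]])  # positive example, equally confident

print(sparse_penalty_logits(y_neg_true, y_neg_pred).numpy())  # ~0.105
print(sparse_penalty_logits(y_pos_true, y_pos_pred).numpy())  # ~1.054, i.e. about 10x the negative case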

Is this correct? (I have also tried it with TensorFlow.) Every time I run it I either get a lot of NaNs as the loss or predictions that are not binary at all. I wonder if I'm also doing something wrong when setting up the model, because binary_crossentropy is not working properly either. My model looks like this (the targets are a single column of 0s and 1s):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(100, activation = 'relu', input_shape = (train.shape[1],)))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'adam', loss = sparse_penalty_logits)

If I run it, as I said, I get very strange results (boy, do I feel like I've messed up badly...).

From the problems you describe, this looks like a case of exploding gradients. The exploding gradient problem can be identified by:

  • The model is unable to get traction on your training data (e.g. poor loss).
  • The model is unstable, resulting in large changes in loss from update to update.
  • The model loss goes to NaN during training.

More about the exploding gradient problem can be found in this article.

I would suggest using a gradient clipping technique in your code; this should prevent NaNs from appearing during model training. A minimal sketch is shown below.
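
As a minimal sketch of that suggestion (reusing the model and loss from the question), clipping can be enabled directly on the Adam optimizer via its clipnorm or clipvalue arguments:

from tensorflow.keras.optimizers import Adam

# clipnorm clips each weight's gradient so its L2 norm is at most 1.0;
# clipvalue=0.5 would instead clip each gradient component to [-0.5, 0.5]
opt = Adam(clipnorm=1.0)

model.compile(optimizer=opt, loss=sparse_penalty_logits)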
