How to Implement a Custom Loss Function with Keras for a Sparse Dataset

My dataset comes from a mostly idle system that, at certain time instants, receives requests. I'm trying to predict these instants from a clock. Since the requests are sparsely distributed (I've forced them to last for a while so they don't get too sparse), I wanted to create a new loss function that penalizes the model if it just predicts zero for everything. My attempt simply adds a penalty factor to the standard cross-entropy:

from tensorflow.keras import backend as K

def sparse_penalty_logits(y_true, y_pred):
    penalty = 10.0
    # Clip predictions away from exactly 0 and 1 so K.log never returns -inf/NaN
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    # Element-wise binary cross-entropy
    bce = -(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
    # A Python "if y_true != 0:" cannot branch on a tensor; weight element-wise instead:
    # weight is 1 where y_true == 0 and penalty where y_true == 1
    weights = 1.0 + (penalty - 1.0) * y_true
    return K.sum(weights * bce)
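
As a quick sanity check (assuming TensorFlow 2 with eager execution; the values below are made up purely for illustration), the loss can be evaluated on a couple of toy tensors. With the element-wise weighting, an equally confident positive example should cost roughly penalty times as much as a negative one:

import tensorflow as tf

y_neg_true, y_neg_pred = tf.constant([[0.0]]), tf.constant([[0.1]])  # negative example, fairly confident
y_pos_true, y_pos_pred = tf.constant([[1.0]]), tf.constant([[0.9]])  # positive example, equally confident

print(sparse_penalty_logits(y_neg_true, y_neg_pred).numpy())  # ~0.105
print(sparse_penalty_logits(y_pos_true, y_pos_pred).numpy())  # ~1.054, i.e. about 10x the negative case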

Is this correct? (I have also tried it with TensorFlow.) Every time I run it I either get a lot of NaNs as the loss or predictions that are not binary at all. I wonder if I'm also doing something wrong when setting up the model, because binary_crossentropy is not working properly either. My model looks like this (the targets are a single column of 0s and 1s):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(100, activation = 'relu', input_shape = (train.shape[1],)))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'adam', loss = sparse_penalty_logits)

If I run it, as I said, I get very strange results (boy, do I feel like I've messed up badly...).

From the problems you describe, this looks like a case of exploding gradients. The exploding gradient problem can be identified by:

  • The model is unable to get traction on your training data (e.g. poor loss).
  • The model is unstable, resulting in large changes in loss from update to update.
  • The model loss goes to NaN during training.

More about the exploding gradient problem can be found in this article.

I would suggest using a gradient clipping technique in your code; this should prevent NaNs from appearing during model training. A minimal sketch is shown below.
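
As a minimal sketch of that suggestion (reusing the model and loss from the question), clipping can be enabled directly on the Adam optimizer via its clipnorm or clipvalue arguments:

from tensorflow.keras.optimizers import Adam

# clipnorm clips each weight's gradient so its L2 norm is at most 1.0;
# clipvalue=0.5 would instead clip each gradient component to [-0.5, 0.5]
opt = Adam(clipnorm=1.0)

model.compile(optimizer=opt, loss=sparse_penalty_logits)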
