Is my dataset unbalanced?

I'm trying to implement Nvidia's end-to-end driving paper so that an agent in CARLA simply follows the lanes. I'm trying to predict the car's steering angle from the RGB images of the front camera.

I'm getting an r_score of about 1.20%.

Every image in the dataset has a steering angle.

So here are plots for the distribution of steering angles.

I've also augmented the images but when I run it, the car still fails to follow the lane.

Am I doing something wrong, or should I take another approach? Thanks :)!

Topic self-driving visualization dataset python machine-learning

Category Data Science


To answer your question: no.

The term "imbalance" usually refers to classification problems. In your case, i.e. a regression problem, you can only look at the distribution of your target variable.

If by "balance" you mean that the angles have a uniform distribution, you could argue that they are, in fact, imbalanced. However, I'd argue that this is not the problem here. When steering, you rarely need an extreme angle, and I think this dataset represents that well.
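If you want to put numbers on that distribution rather than just eyeball a plot, here is a minimal sketch. It assumes your steering angles are normalized to [-1, 1]; the function names, the bin count and the 0.05 "near zero" tolerance are arbitrary choices of mine, and the sample angles are synthetic, not your data.

```python
import random


def histogram(angles, n_bins=10, lo=-1.0, hi=1.0):
    """Count angles per equal-width bin over [lo, hi]; values outside are skipped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for a in angles:
        if lo <= a <= hi:
            # An angle exactly equal to `hi` goes into the last bin.
            idx = min(int((a - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def near_zero_fraction(angles, tol=0.05):
    """Fraction of samples whose absolute steering angle is at most `tol`."""
    return sum(abs(a) <= tol for a in angles) / len(angles)


if __name__ == "__main__":
    # Synthetic stand-in for recorded angles (assumption: mostly straight driving).
    rng = random.Random(0)
    angles = [max(-1.0, min(1.0, rng.gauss(0.0, 0.15))) for _ in range(2000)]

    counts = histogram(angles, n_bins=10)
    for i, c in enumerate(counts):
        left = -1.0 + i * 0.2
        print(f"[{left:+.1f}, {left + 0.2:+.1f}) {c:5d} {'#' * (c // 20)}")
    print("fraction near zero:", round(near_zero_fraction(angles), 3))
```

A large near-zero fraction with thin tails is what you would expect from lane following, so on its own it is not a sign that something is wrong.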

Your goal is also a bit unclear. Are you trying to "predict the steering angle" or "follow the lane"? If it is the former, I'd first suggest changing your metric. Use one that tells you how well you are actually doing, e.g. MSE or MAE.
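As a rough illustration of those two metrics, here is a plain-Python sketch. The helper names and the sample values are made up for the example. Comparing against a constant baseline that always predicts the mean angle is my own suggestion, as a quick way to see whether the model has learned anything beyond "drive straight".

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the steering error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: like MAE, but penalizes large errors more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline(y_true):
    """Constant predictor that always outputs the mean of the targets."""
    mean = sum(y_true) / len(y_true)
    return [mean] * len(y_true)


if __name__ == "__main__":
    # Hypothetical validation targets and model outputs, for illustration only.
    y_true = [0.00, 0.02, -0.10, 0.35, -0.40]
    y_pred = [0.01, 0.00, -0.05, 0.20, -0.25]

    baseline = mean_baseline(y_true)
    print("model    MAE:", mae(y_true, y_pred), "MSE:", mse(y_true, y_pred))
    print("baseline MAE:", mae(y_true, baseline), "MSE:", mse(y_true, baseline))
```

Both metrics are in units you can interpret (steering angle and squared steering angle), and if the model does not clearly beat the baseline on held-out frames, the problem is more likely in the training setup than in the angle distribution.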

A lot of things could be to blame for poor performance. I don't think the distribution of the angles is one of them.
