Neural Network Hidden Neuron Selection Strategy

I'm trying to determine the best number of hidden neurons for my MATLAB neural network. I was thinking of adopting the following strategy:

  • Loop over a range of hidden-neuron counts, e.g. 1 to 40;
  • For each network with a fixed number of hidden neurons, perform a certain number of training runs (e.g. 40), limiting the number of epochs for time reasons (I was thinking of doing this because the network seems hard to train: the MSE after some epochs is still very high);
  • Store the MSE obtained with all the nets with different numbers of hidden neurons;
  • Repeat the previous procedure more than once, e.g. 4 times, to account for the random initial weights, and take the average of the MSEs;
  • Select the number of hidden neurons that minimizes the previously computed MSE and perform the "real" training on a network of that size (a sketch of this search loop is given below).

The MSE I'm referring to is the validation MSE: my samples are split into training, testing and validation sets (70%, 15% and 15% respectively) to avoid overfitting.
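A minimal sketch of the search loop I have in mind, assuming the Deep Learning Toolbox fitnet/train interface; the epoch limit, the repeat count (collapsed from "40 trainings repeated 4 times" for brevity) and the variable names x and t are placeholders, not fixed parts of the procedure:

```matlab
% Grid search over hidden-layer sizes, averaging the validation MSE
% over several random initializations (sketch, not a tuned script).
% x: 9-by-1630 input matrix, t: 2-by-1630 target matrix (assumed names).
hiddenSizes = 1:40;
nRepeats    = 4;                          % restarts per size, to average out random init
meanValMSE  = zeros(size(hiddenSizes));

for i = 1:numel(hiddenSizes)
    valMSE = zeros(1, nRepeats);
    for r = 1:nRepeats
        net = fitnet(hiddenSizes(i));
        net.divideParam.trainRatio = 0.70;   % 70/15/15 split as in the question
        net.divideParam.valRatio   = 0.15;
        net.divideParam.testRatio  = 0.15;
        net.trainParam.epochs      = 100;    % limited epochs for time reasons (assumed value)
        net.trainParam.showWindow  = false;
        [net, tr] = train(net, x, t);
        yVal      = net(x(:, tr.valInd));    % validation MSE of this run
        valMSE(r) = mean(mean((t(:, tr.valInd) - yVal).^2));
    end
    meanValMSE(i) = mean(valMSE);
end

[~, best] = min(meanValMSE);
fprintf('Best hidden-layer size by mean validation MSE: %d\n', hiddenSizes(best));
```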

Other information related to my problem:

  • fitting problem
  • 9 input neurons
  • 2 output neurons
  • 1630 samples

Could this strategy work? Is there a better criterion to adopt? Thank you.

Edit: Test done. Do the results suggest I should adopt 12 neurons (low validation MSE and a number of neurons lower than 2 * numberOfInputNeurons)? But 18 could also be good...

Topic neural-network machine-learning

Category Data Science


Top level:

The rule is to choose the simplest network that can perform satisfactorily. See this publication and its PDF.

The Methodology:

So do your proposed test (training many networks at each number of hidden nodes) and plot the results. At the minimum number of nodes you'll see the worst performance. As you increase the number of nodes, you'll see performance improve (the error decrease). At some point N, performance will seem to hit an upper limit, and increasing the number of nodes beyond it will stop giving significant gains (further increases may even start to hurt performance a little, as training gets more difficult). That point N is the number of nodes you want.
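A minimal sketch of that plot-and-pick step, assuming the meanValMSE vector from the search loop in the question; the 1% tolerance used to define "no significant gain" is an illustrative choice, not something prescribed here:

```matlab
% Plot mean validation MSE against hidden-layer size and pick the
% smallest size whose error is within a tolerance of the best one.
plot(hiddenSizes, meanValMSE, '-o');
xlabel('Hidden neurons');
ylabel('Mean validation MSE');
grid on;

tol     = 0.01;                              % "no significant gain" threshold (assumed)
bestMSE = min(meanValMSE);
N = hiddenSizes(find(meanValMSE <= (1 + tol) * bestMSE, 1, 'first'));
fprintf('Selected hidden-layer size N = %d\n', N);
```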

How it worked for me:

The first time I used this methodology, it produced a beautiful, almost-sigmoid-looking curve with a very clear number of nodes needed to achieve good results. I hope this works for you as well as it worked for me.


A rule of thumb approach is:

  • Start with a number of hidden neurons equal to (or slightly higher than) the number of features.
  • In your case that would be 9. My suggestion is to start with 9 * 2 = 18 to cover a wider range of possibilities.
  • Be sure your test and validation sets are selected "fairly": a random selection, varying the seed a few times to test different configurations, would be fine.

In general, a number of hidden neurons equal to the number of features will tend to make each hidden neuron try to learn the particular thing that each feature contributes, so one could say it is "learning each feature" separately. Although this sounds good, it may tend toward overfitting.

Since your number of inputs and your dataset size are small, it's fine to start with a hidden layer of double that size (18) and work your way down. When the training error and test error stabilize with a difference below some threshold, you may have found a better-generalizing model.
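A minimal sketch of that shrink-until-the-gap-is-small idea, assuming the same fitnet/train interface and data names as in the question; the gap threshold is an illustrative assumption:

```matlab
% Start at 18 hidden neurons and decrease, stopping when training and
% test error come within a chosen threshold of each other (sketch).
gapThreshold = 0.05;                          % assumed acceptable |train - test| gap
for h = 18:-1:1
    net = fitnet(h);
    net.trainParam.showWindow = false;
    [net, tr] = train(net, x, t);
    yTrain   = net(x(:, tr.trainInd));
    yTest    = net(x(:, tr.testInd));
    trainMSE = mean(mean((t(:, tr.trainInd) - yTrain).^2));
    testMSE  = mean(mean((t(:, tr.testInd)  - yTest ).^2));
    if abs(trainMSE - testMSE) < gapThreshold
        fprintf('Candidate size: %d (train %.4f, test %.4f)\n', h, trainMSE, testMSE);
        break;
    end
end
```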

Neural networks are very good at finding local optima by exploring a solution deeply from a starting point. However, the starting point is also very important. If you are not getting good generalization, you might try to find good initial starting points with hybrid neural network methods. A common one, for example, is using a genetic algorithm to find an initial combination of weights and then starting the neural network training from that point, so that your search space is covered better (in case your problem actually needs that).
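A minimal sketch of that genetic-algorithm warm start, assuming MATLAB's Global Optimization Toolbox ga together with getwb/setwb from the Deep Learning Toolbox; the hidden-layer size, bounds, population size and generation count are placeholders:

```matlab
% Genetic-algorithm warm start for the network weights (hybrid approach).
% Requires the Global Optimization Toolbox for ga.
net0 = configure(fitnet(18), x, t);      % fix layer sizes for this data
nWts = numel(getwb(net0));               % total number of weights and biases

% Fitness of a candidate weight vector: MSE of the network using those weights.
fitFun = @(wb) perform(net0, t, sim(setwb(net0, wb(:)), x));

opts   = optimoptions('ga', 'PopulationSize', 50, 'MaxGenerations', 30);  % assumed settings
wbBest = ga(fitFun, nWts, [], [], [], [], -2*ones(1, nWts), 2*ones(1, nWts), [], opts);

net       = setwb(net0, wbBest(:));      % start gradient training from the GA solution
[net, tr] = train(net, x, t);
```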

As with every problem in machine learning, it is very important to clean your data before feeding it to the NN. Try to be very thorough, so the NN does not have to learn things you already know. For example, if you know how two features are correlated, improve the input data by making this correlation explicit, so less workload is left to the NN (extra workload that might actually get you in trouble).
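As a small illustration of making a known relation explicit, one might append a derived feature before training; the ratio of two inputs here is purely a hypothetical example, not something taken from the question:

```matlab
% Hypothetical feature engineering: if features 3 and 4 are known to act
% through their ratio, add that ratio as an extra input row.
eps0 = 1e-8;                            % guard against division by zero
xAug = [x; x(3, :) ./ (x(4, :) + eps0)];
net  = fitnet(18);
[net, tr] = train(net, xAug, t);        % train on the augmented inputs
```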
