ValueError: Expected 2D array, got 1D array instead

I would like to extract the 20 most informative features from a very large feature set $X$, taken from a dataset of clinical data, using the RFE class from scikit-learn in Python.

$X$ is a $68 \times 1140$ matrix, where:

  • Each row represents a recorded session.
  • For each subject, there are 4 recorded sessions.
  • There are 17 subjects in the dataset ($17 \times 4 = 68$ sessions).

My idea is to use 70% of the dataset (i.e., $0.7 \times 1140 = 798$ random features from each recording) and extract 50 features from the whole dataset.

$Y$ contains one label per session, a ranking from 0 to 2.
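A minimal sketch of the layout described above, using random numbers as a stand-in for the clinical features (the values, the seed and the `subject` array are hypothetical). Each row is one session, so $X$ is 2D with shape (68, 1140), and $Y$ holds one label in {0, 1, 2} per session.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed

n_subjects, sessions_per_subject, n_features = 17, 4, 1140
n_sessions = n_subjects * sessions_per_subject  # 68 rows

# Stand-in data: one row per recorded session, one column per feature
X = rng.standard_normal((n_sessions, n_features))
# One label (ranking 0, 1 or 2) per session
Y = rng.integers(0, 3, size=n_sessions)
# Hypothetical subject id for each row: 0, 0, 0, 0, 1, 1, 1, 1, ...
subject = np.repeat(np.arange(n_subjects), sessions_per_subject)

print(X.shape, Y.shape, subject.shape)  # (68, 1140) (68,) (68,)
```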


My implementation is the following:

    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    # X = features
    # Y = labels
    p = 0.7
    n_perc = round(X.shape[1]*X.shape[0]*p)  # 70% of the data: number of elements (height x width x 70%)
    rand_idx = np.random.randint(X.shape[1]*X.shape[0], size=n_perc)  # random indices (70% of the data)
    X_rnd = X.flatten()[rand_idx]  # select that 70% of X
    Y_rnd = np.repeat(Y, round(X.shape[1]*p))  # match the dimensions of Y_rnd to X_rnd
    selector = RFE(estimator=LogisticRegression(C=1), n_features_to_select=20)  # run RFE
    selector.fit(X_rnd.reshape, Y_rnd)  # select best features

The idea is that I flatten all the values of $X$ and take a random 70% of its elements, i.e., $X_{rnd}$ (and adapt $Y$ accordingly, i.e., $Y_{rnd}$).
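A quick way to see what this flattening does to the shapes, using a small hypothetical array in place of the real $X$: `flatten()` turns the (n_samples, n_features) matrix into a single 1D vector, and indexing it with random positions gives another 1D vector, so the row/column (sample/feature) structure is gone. Note also that `np.random.randint` samples with replacement, so some elements can be picked more than once.

```python
import numpy as np

X = np.random.standard_normal((4, 5))  # small stand-in: 4 sessions x 5 features

p = 0.7
n_perc = round(X.size * p)  # 70% of all elements: 14 of 20
rand_idx = np.random.randint(X.size, size=n_perc)  # random flat positions (with replacement)
X_rnd = X.flatten()[rand_idx]

print(X.ndim, X.shape)          # 2 (4, 5)
print(X_rnd.ndim, X_rnd.shape)  # 1 (14,)
```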

    ValueError: Expected 2D array, got 1D array instead:
    array=[-0.25367578  0.8069118  -0.63161352 ...  0.5500815  -0.37418711
     0.2580666 ].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

For some reason I'm getting this error, which I don't understand. It says I should reshape the array if my data has a single feature or a single sample, but that is not my case.
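scikit-learn estimators expect `X` with shape (n_samples, n_features), and this message is raised whenever `fit` receives a 1D array; the two `reshape` hints only cover the cases where a 1D array really is a single column or a single row. A small sketch with hypothetical data that reproduces the error and shows what each suggested reshape produces:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x_1d = np.array([-0.25, 0.81, -0.63, 0.55, -0.37, 0.26])  # hypothetical 1D input
y = np.array([0, 1, 2, 0, 1, 2])

caught = None
try:
    LogisticRegression().fit(x_1d, y)  # 1D X is rejected by input validation
except ValueError as err:
    caught = err
    print("ValueError:", str(err).splitlines()[0])

# reshape(-1, 1): 6 samples with one feature each
print(x_1d.reshape(-1, 1).shape)  # (6, 1)
# reshape(1, -1): one sample with 6 features
print(x_1d.reshape(1, -1).shape)  # (1, 6)
```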

Does anybody know what I should do? Is this how I should approach the problem? Should I reshape $X$ in another manner?

Thanks.


    selector.fit(X_rnd.reshape, Y_rnd)  # select best features

X_rnd should be a 2D matrix of shape (n_samples, n_features).
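A minimal sketch of passing a 2D matrix instead, assuming the goal is to rank the 1140 columns: keep $X$ as (n_sessions, n_features), take a random 70% of the rows (sessions) without replacement, and fit RFE on that submatrix. The random data, the seed, `max_iter=1000` and `step=50` are hypothetical choices; `n_features_to_select=20` follows the question. Sampling whole rows rather than individual elements keeps each column tied to one feature, which is what RFE ranks.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((68, 1140))  # stand-in for the real 68 x 1140 feature matrix
Y = rng.integers(0, 3, size=68)      # stand-in labels in {0, 1, 2}

p = 0.7
n_rows = round(X.shape[0] * p)                             # 48 of the 68 sessions
rows = rng.choice(X.shape[0], size=n_rows, replace=False)  # no repeated sessions
X_rnd, Y_rnd = X[rows], Y[rows]                            # still 2D: (48, 1140)

selector = RFE(
    estimator=LogisticRegression(C=1, max_iter=1000),
    n_features_to_select=20,
    step=50,  # drop 50 features per iteration to keep the run short (hypothetical choice)
)
selector.fit(X_rnd, Y_rnd)

selected = np.flatnonzero(selector.support_)  # column indices of the 20 kept features
print(X_rnd.shape, selected.shape)  # (48, 1140) (20,)
```

`selector.ranking_` gives rank 1 to every selected column, with higher numbers for features eliminated earlier.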

I believe this error was raised when the fit method was called with X_rnd, but your code passes X_rnd.reshape, i.e. the method itself rather than the result of calling it. In that case, the error should be:

    ValueError: Expected 2D array, got scalar array instead:
    array=<built-in method reshape of numpy.ndarray object at 0x000000000C28EAD0>.
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
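The difference can be checked directly: without parentheses, `X_rnd.reshape` is the bound method object, not an array; calling it is what returns a reshaped array. The small `X_rnd` below is hypothetical, and `reshape(-1, 1)` is used only to illustrate the call.

```python
import numpy as np

X_rnd = np.array([-0.25, 0.81, -0.63, 0.55])  # small hypothetical 1D array

not_called = X_rnd.reshape     # the method object itself (no parentheses)
called = X_rnd.reshape(-1, 1)  # the method's result: a 2D array

print(callable(not_called), isinstance(not_called, np.ndarray))  # True False
print(called.ndim, called.shape)  # 2 (4, 1)
```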
