could not convert string to float: 'YELLOW'

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

#read data
data=pd.read_csv('adult-stretch.data', header=None)

#convert to arrays
x=data.iloc[:, :4].to_numpy()
t=data[4].replace(['ADULT','STRETCH'],[0,1])

t=t.to_numpy()

#split the dataset: 80% of the rows for training, 20% for testing
xTrain, xTest, tTrain, tTest = train_test_split(x, t, test_size=0.2, random_state=3)

#create the model/net
net=MLPClassifier(hidden_layer_sizes=(2,), max_iter=4000, random_state=0) 

#model training
net=net.fit(xTrain,tTrain)   

#model run/testing for TRAIN
yTrain=net.predict(xTrain)    

accuracyTrain=accuracy_score(tTrain,yTrain)    #accuracy!!
print('Train accuracy is ',accuracyTrain)

#model run/testing for TEST
yTest=net.predict(xTest)    

accuracyTest=accuracy_score(tTest,yTest)    #accuracy for test
print('Test accuracy is ',accuracyTest)

M=confusion_matrix(tTest,yTest)
print('Confusion matrix =')
print (M)
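The failure can be reproduced without the data file. This is a minimal sketch, assuming two hypothetical rows shaped like the question's `x` (four string features per row, stored with `dtype=object` the way `DataFrame.to_numpy()` returns mixed or string columns):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Two hypothetical rows shaped like the question's x: four string features each,
# stored as dtype=object just as DataFrame.to_numpy() returns them
x = np.array([["YELLOW", "SMALL", "STRETCH", "ADULT"],
              ["PURPLE", "LARGE", "DIP", "CHILD"]], dtype=object)
t = np.array(["T", "F"])

net = MLPClassifier(hidden_layer_sizes=(2,), max_iter=4000, random_state=0)
message = None
try:
    net.fit(x, t)
except ValueError as err:
    # fit() casts x to float before training and stops at the first string it meets
    message = str(err)
print(message)
```

The string targets in `t` are not the problem here: scikit-learn classifiers accept string class labels. Only the feature matrix has to be numeric.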



I guess this is the data you are working on. Attribute information (class: Inflated, T or F):

Color: yellow, purple
Size: large, small
Act: stretch, dip
Age: adult, child
Inflated: T, F
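Because every attribute listed above takes exactly two values, a plain 0/1 map per column is already a valid numeric encoding. This is a sketch only: the two data rows are hypothetical, the column names are our own labels, and the uppercase spelling of the values is assumed from the `'YELLOW'` in the error message.

```python
import io

import pandas as pd

# Hypothetical sample in the same comma-separated, header-less layout as adult-stretch.data
raw = io.StringIO(
    "YELLOW,SMALL,STRETCH,ADULT,T\n"
    "PURPLE,LARGE,DIP,CHILD,F\n"
)
# Column names are illustrative labels for the attributes listed above
columns = ["color", "size", "act", "age", "inflated"]
data = pd.read_csv(raw, header=None, names=columns)

# Every attribute has exactly two values, so one 0/1 map per column is enough
binary_maps = {
    "color": {"YELLOW": 0, "PURPLE": 1},
    "size": {"SMALL": 0, "LARGE": 1},
    "act": {"STRETCH": 0, "DIP": 1},
    "age": {"ADULT": 0, "CHILD": 1},
    "inflated": {"F": 0, "T": 1},
}
# Series.map leaves NaN for any value missing from the dict, which makes typos easy to spot
encoded = data.apply(lambda col: col.map(binary_maps[col.name]))

x = encoded[["color", "size", "act", "age"]].to_numpy(dtype=float)
t = encoded["inflated"].to_numpy()
print(x)
print(t)
```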

ML models such as MLPClassifier only accept numeric vectors as input. Because your data has categorical (string) features, you need to encode them as numbers before fitting the model. That is why you get ValueError: could not convert string to float: 'YELLOW'.


This is probably because your dataset has categorical values which need to be converted into numerical values before being fed into the model.
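One quick way to do that conversion in pandas is `pd.get_dummies`, which turns every (column, value) pair into its own 0/1 column. A small sketch, assuming hypothetical string features that stand in for `data.iloc[:, :4]` from the question:

```python
import pandas as pd

# Hypothetical string features standing in for data.iloc[:, :4] from the question
features = pd.DataFrame({
    "color": ["YELLOW", "PURPLE", "YELLOW"],
    "size": ["SMALL", "LARGE", "LARGE"],
})
# Each (column, value) pair becomes its own 0/1 column, categories in sorted order
encoded = pd.get_dummies(features, dtype=int)
print(encoded.columns.tolist())
print(encoded.to_numpy())
```

If `get_dummies` is applied to the training and test sets separately, the two results can end up with different columns when a value is missing from one split; encoding once before splitting, or using scikit-learn's encoder, avoids that.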

This is exactly what the error is saying. You have to preprocess the data before training the model, and one of the preprocessing steps is to encode the categorical values. There are several methods for that (one-hot encoding, ordinal encoding, and others); a good place to start is the basic OneHotEncoder.
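As a sketch of how OneHotEncoder could slot into the question's workflow, the encoder and the network can be chained with `make_pipeline`, so the encoder is fitted on the training split only and reused on the test split. The feature rows and labels below are toy data generated for illustration (the labelling rule is an assumption, not read from the real file); the model settings are copied from the question.

```python
from itertools import product

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for data.iloc[:, :4]: every combination of the listed attribute values
x = np.array(list(product(["YELLOW", "PURPLE"], ["SMALL", "LARGE"],
                          ["STRETCH", "DIP"], ["ADULT", "CHILD"])), dtype=object)
# Toy labels for illustration only (assumed rule, not taken from adult-stretch.data)
t = np.array(["T" if row[2] == "STRETCH" or row[3] == "ADULT" else "F" for row in x])

xTrain, xTest, tTrain, tTest = train_test_split(x, t, test_size=0.2, random_state=3)

# The encoder learns its categories from xTrain only; handle_unknown="ignore"
# encodes a category never seen during training as all zeros instead of failing
net = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    MLPClassifier(hidden_layer_sizes=(2,), max_iter=4000, random_state=0),
)
net.fit(xTrain, tTrain)

yTest = net.predict(xTest)
print('Test accuracy is', accuracy_score(tTest, yTest))
print('Confusion matrix =')
print(confusion_matrix(tTest, yTest))
```

With the real file, the toy `x` and `t` would be replaced by `data.iloc[:, :4].to_numpy()` and `data[4].to_numpy()` after `pd.read_csv('adult-stretch.data', header=None)`; the string labels `T`/`F` can be passed to the classifier as they are.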

Cheers!
