Dataframe has no column names. How to add a header?

I am using a dataset to practice for building a decision tree classifier.

Here is my code:

import pandas as pd 
tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', header=0)
tdf.info()

The column has no name, and i have problem to add the column name, already tried reindex, pd.melt, rename, etc.

The column names Ι want to assign are:

  1. Sample code number: id number
  2. Clump Thickness: 1 - 10
  3. Uniformity of Cell Size: 1 - 10
  4. Uniformity of Cell Shape: 1 - 10
  5. Marginal Adhesion: 1 - 10
  6. Single Epithelial Cell Size: 1 - 10
  7. Bare Nuclei: 1 - 10
  8. Bland Chromatin: 1 - 10
  9. Normal Nucleoli: 1 - 10
  10. Mitoses: 1 - 10
  11. Class: (2 for benign, 4 for malignant)

Thanks,

Topic pandas python

Category Data Science


For any dataframe, say df , you can add/modify column names by passing the column names in a list to the df.columns method: For example, if you want the column names to be 'A', 'B', 'C', 'D'],use this:

df.columns = ['A', 'B', 'C', 'D']

In your code , can you remove header=0? This basically tells pandas to take the first row as the column headers . Once you remove that, use the above to assign the column names.


df = pd.read_csv("Price Data.csv", names=['Date', 'Price'])

use the names field to add a header to your pandas dataframe.


I tried the code above and you are missing the first line of data.

1. original

tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', header=0)
tdf.shape

(698, 11)

2. as the previous questions, removing header=0

tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',')
tdf.shape

(698, 11)

3. new answer, adding column names while reading csv, does get all the rows

 tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', names=['Sample code number: id number','Clump Thickness: 1 - 10','Uniformity of Cell Size: 1 - 10','Uniformity of Cell Shape: 1 - 10','Marginal Adhesion: 1 - 10','Single Epithelial Cell Size: 1 - 10','Bare Nuclei: 1 - 10','Bland Chromatin: 1 - 10','Normal Nucleoli: 1 - 10','Mitoses: 1 - 10','Class: (2 for benign, 4 for malignant)'])  
    tdf.shape

(699, 11)

You can assign the names of the columns when reading the csv file

import pandas as pd 
tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', names=['Sample code number: id number','Clump Thickness: 1 - 10','Uniformity of Cell Size: 1 - 10','Uniformity of Cell Shape: 1 - 10','Marginal Adhesion: 1 - 10','Single Epithelial Cell Size: 1 - 10','Bare Nuclei: 1 - 10','Bland Chromatin: 1 - 10','Normal Nucleoli: 1 - 10','Mitoses: 1 - 10','Class: (2 for benign, 4 for malignant)'])

You can check the dataframe using

tdf.head()

and you get

enter image description here

You can check the code on https://gist.github.com/e94b31914dbaebda7d11f6bfe0cfbdec

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.