Skip to content
Advertisement

How to remove quotes from Numeric data in Python

I have one numeric feature in a data frame but in excel some of the values contain quotes which need to be removed.

Below table is what my data appears to be in Excel file now I want to remove quotes from last 3 rows using python.

Col1 Col2
123 A
456 B
789 C
“123” D
“456” E
“789” F

I have used following code in Python:

df["Col1"] = df['Col1'].replace('"', ' ').astype(int)

But above code gives me error message: invalid literal for int() with base 10: ‘”123″‘.

I have also tried strip() function but still it is not working.

If I do not convert the data type and use below code

df["Col1"] = df['Col1'].replace('"', ' ')

Then the code is getting executed without any error however while saving the file into CSV it is still showing quotes.

Advertisement

Answer

One way is to use converter function while reading Excel file. Something along those lines (assuming that data provided is in Excel file in columns ‘A’ and ‘B’):

import pandas as pd

def conversion(value):
    if type(value) == int:
        return value
    else:
        return value.strip('"')

df = pd.read_excel('remove_quotes_excel.xlsx', header=None,
                   converters={0: conversion})

# df
     0  1
0  123  A
1  456  B
2  789  C
3  123  D
4  456  E
5  789  F

Both columns are object type, but now (if needed) it is straightforward to convert to int:

df[0] = df[0].astype(int)
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement