Why does a string cause an entire pandas DataFrame to be non-numerical?

If I create a pandas DataFrame from numerical values, the DataFrame is numerical. However, if the first element is a string, e.g. 'a', the entire DataFrame goes grey and all numbers in it are converted to strings, e.g. 3 becomes '3'. Why does this happen, and how can I retain datatype diversity?

import numpy as np
import pandas as pd

AA = pd.DataFrame(np.asarray([1,2,3]))
AA2 = pd.DataFrame(np.asarray(['a','b',3]))

The output is

[image: screenshot of the output]

Answer

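The first problem is that np.asarray(['a','b',3]) converts all values to strings, because a NumPy array has a single dtype for all of its elements. pandas then stores those strings in a column of dtype object: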

AA2 = pd.DataFrame(np.asarray(['a','b',3]))
print (AA2.dtypes)
0    object
dtype: object

print (AA2[0].apply(lambda x: type(x)))
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
Name: 0, dtype: object
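
The conversion happens in NumPy, before pandas ever sees the data. A minimal sketch of this, assuming a standard NumPy and pandas install (the names arr_str, arr_obj and df_obj are only illustrative): it checks the dtype NumPy picks on its own, and shows that passing dtype=object to np.asarray keeps the original Python objects.

```python
import numpy as np
import pandas as pd

# Without an explicit dtype NumPy picks one common type for all elements,
# here a Unicode string type, so the integer 3 becomes the string '3'.
arr_str = np.asarray(['a', 'b', 3])
print(arr_str.dtype.kind)  # 'U' stands for Unicode string

# With dtype=object the elements stay as the original Python objects.
arr_obj = np.asarray(['a', 'b', 3], dtype=object)
df_obj = pd.DataFrame(arr_obj)
types = df_obj[0].map(type).tolist()
print(types)
```

With dtype=object the column is still of dtype object in pandas, but the third element stays an int rather than becoming a str.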

If you pass a plain list instead, the values keep their original types, so the column holds a mix of strings and numbers:

AA2 = pd.DataFrame(['a','b',3])

print (AA2.dtypes)
0    object
dtype: object

print (AA2[0].apply(lambda x: type(x)))
0    <class 'str'>
1    <class 'str'>
2    <class 'int'>
Name: 0, dtype: object

But working with mixed values is problematic: most numeric operations fail on such a column, so it is best to avoid it.
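
As a rough illustration of why mixed columns are awkward, and two possible ways around them, here is a sketch. The names mixed, numeric, label and value are hypothetical. Which option fits depends on your data; these are not the only approaches.

```python
import pandas as pd

# A mixed column: arithmetic raises, because 'a' + 1 is not defined for str and int.
mixed = pd.Series(['a', 'b', 3])
try:
    mixed + 1
    failed = False
except TypeError:
    failed = True
print(failed)

# Option 1: coerce to numbers; values that cannot be parsed become NaN.
numeric = pd.to_numeric(mixed, errors='coerce')
print(numeric.dtype)

# Option 2: keep one type per column, so each column gets its own dtype.
df = pd.DataFrame({'label': ['a', 'b', 'c'], 'value': [1, 2, 3]})
print(df.dtypes)
```

Option 1 loses the non-numeric values, while option 2 keeps both kinds of data by storing them in separate columns.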

User contributions licensed under: CC BY-SA