If I create a pandas DataFrame using numerical values, this is reflected in the DataFrame. However, if the first element is a string, e.g. 'a', the entire DataFrame goes grey and all numbers in it are converted to strings, e.g. 3 becomes '3'. Why does this happen, and how can I retain datatype diversity?
import numpy as np
import pandas as pd

AA = pd.DataFrame(np.asarray([1, 2, 3]))
AA2 = pd.DataFrame(np.asarray(['a', 'b', 3]))
The output is
Answer
The first problem is the use of np.asarray(['a','b',3]). A NumPy array holds a single dtype, so all values are converted to strings, and the DataFrame column ends up with object dtype:
AA2 = pd.DataFrame(np.asarray(['a', 'b', 3]))
print(AA2.dtypes)
0    object
dtype: object

print(AA2[0].apply(lambda x: type(x)))
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
Name: 0, dtype: object
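To check that the conversion happens inside NumPy, before pandas ever receives the data, it may help to inspect the array on its own. This is a minimal sketch; the names `arr` and `obj_arr` are illustrative and not part of the original code.

```python
import numpy as np

# A NumPy array has one dtype for all elements, so the int 3 is
# converted to the string '3' to match 'a' and 'b'.
arr = np.asarray(['a', 'b', 3])
print(arr.dtype.kind)    # 'U' means a Unicode string dtype
print(repr(arr[2]))      # the third element is now a string

# Requesting dtype=object keeps the original Python objects instead.
obj_arr = np.asarray(['a', 'b', 3], dtype=object)
print(type(obj_arr[2]))  # <class 'int'>
```

Passing `dtype=object` is one way to keep the integer intact while still going through NumPy, although the resulting column is still a mixed object column.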
If you pass a list instead, you get mixed data, with numbers kept alongside strings:
AA2 = pd.DataFrame(['a', 'b', 3])
print(AA2.dtypes)
0    object
dtype: object

print(AA2[0].apply(lambda x: type(x)))
0    <class 'str'>
1    <class 'str'>
2    <class 'int'>
Name: 0, dtype: object
But working with mixed values is problematic: most numeric operations fail, so it is best to avoid them.
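As a rough illustration of that last point, the sketch below shows a numeric operation failing on a mixed column, followed by two possible ways around it: coercing the column with `pd.to_numeric`, or keeping strings and numbers in separate columns so that each gets its own dtype. The column names `label` and `value` and the variable names are made up for this example.

```python
import pandas as pd

mixed = pd.DataFrame(['a', 'b', 3])

# Adding a number to a column that mixes str and int raises TypeError,
# because 'a' + 1 is not defined.
try:
    mixed[0] + 1
    failed = False
except TypeError:
    failed = True
print(failed)  # True

# Option 1: coerce to numbers; values that cannot be parsed become NaN.
numbers = pd.to_numeric(mixed[0], errors='coerce')
print(numbers.sum())  # 3.0

# Option 2: one column per kind of data, so dtypes stay separate.
typed = pd.DataFrame({'label': ['a', 'b'], 'value': [1, 2]})
print(typed.dtypes)
```

Which option fits depends on whether the strings carry information that needs to be kept; coercion discards them, while separate columns preserve both.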