If I create a pandas DataFrame using numerical values, this is reflected in the DataFrame. However, if the first element is a string, e.g. 'a', the entire DataFrame goes grey and all numbers in it are converted to strings, e.g. 3 becomes '3'. Why does this happen, and how can I retain datatype diversity?
```python
import numpy as np
import pandas as pd

AA = pd.DataFrame(np.asarray([1,2,3]))
AA2 = pd.DataFrame(np.asarray(['a','b',3]))
```
The output is
Answer
The first problem is the use of np.asarray(['a','b',3]). A NumPy array holds a single dtype, so all values are converted to strings, and the resulting column has object dtype.
```python
AA2 = pd.DataFrame(np.asarray(['a','b',3]))
print(AA2.dtypes)
# 0    object
# dtype: object

print(AA2[0].apply(lambda x: type(x)))
# 0    <class 'str'>
# 1    <class 'str'>
# 2    <class 'str'>
# Name: 0, dtype: object
```
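To see that the conversion happens in NumPy before pandas is involved, the following minimal sketch inspects the array directly. Passing `dtype=object` to `np.asarray` is an addition not in the original answer; it is shown here only as one possible way to keep the original Python objects when an array is required.

```python
import numpy as np
import pandas as pd

# NumPy picks one dtype for the whole array. With strings and ints mixed,
# it falls back to a Unicode string dtype, so 3 is stored as '3'.
arr = np.asarray(['a', 'b', 3])
print(arr.dtype.kind)   # 'U' stands for a Unicode string dtype
print(type(arr[2]))     # a NumPy string type, not int

# Assumption (not from the original answer): requesting dtype=object
# makes NumPy store the Python objects unchanged.
arr_obj = np.asarray(['a', 'b', 3], dtype=object)
df = pd.DataFrame(arr_obj)
element_types = df[0].map(type).tolist()
print(element_types)    # two str entries followed by int
```

The column is still object dtype in both cases; the difference is only whether the last element is the string '3' or the integer 3.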
If you pass a plain list instead, you get mixed data, with numeric values kept alongside strings:
```python
AA2 = pd.DataFrame(['a','b',3])

print(AA2.dtypes)
# 0    object
# dtype: object

print(AA2[0].apply(lambda x: type(x)))
# 0    <class 'str'>
# 1    <class 'str'>
# 2    <class 'int'>
# Name: 0, dtype: object
```
But working with mixed values is problematic: most numeric operations fail, so it is best to avoid it.
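As a rough illustration of why mixed columns are awkward, and of two possible ways around them, here is a minimal sketch. The column names `label` and `value` are hypothetical, and both workarounds are suggestions rather than part of the original answer.

```python
import pandas as pd

mixed = pd.Series(['a', 'b', 3])

# Arithmetic on a mixed object column fails when it reaches a string.
raised = False
try:
    mixed + 1
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Option 1: coerce to numbers; entries that cannot be parsed become NaN.
numeric = pd.to_numeric(mixed, errors='coerce')
print(numeric)

# Option 2: keep strings and numbers in separate columns,
# so each column gets its own dtype.
df = pd.DataFrame({'label': ['a', 'b', 'c'], 'value': [1, 2, 3]})
print(df.dtypes)
print(df['value'].sum())
```

Which option fits depends on whether the strings carry information you need to keep (separate columns) or are just noise in a numeric column (coercion).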