I have a `DataFrame` with two columns: a column of `int` and a column of `str`.
- I understand that if I insert `NaN` into the `int` column, Pandas will convert all the `int` into `float` because there is no `NaN` value for an `int`.
- However, when I insert `None` into the `str` column, Pandas converts all my `int` to `float` as well. This doesn't make sense to me: why does the value I put in column 2 affect column 1?
Here's a simple working example:
    import pandas as pd

    df = pd.DataFrame()
    df["int"] = pd.Series([], dtype=int)
    df["str"] = pd.Series([], dtype=str)

    df.loc[0] = [0, "zero"]
    print(df)
    print()

    df.loc[1] = [1, None]
    print(df)
The output is:
       int   str
    0    0  zero

       int   str
    0  0.0  zero
    1  1.0   NaN
Is there any way to make the output the following:
       int   str
    0    0  zero

       int   str
    0    0  zero
    1    1   NaN

without recasting the first column to `int`?
- I prefer using `int` instead of `float` because the actual data in that column are integers. If there's no workaround, I'll just use `float` though.
- I prefer not having to recast because in my actual code, I don't store the actual `dtype`.
- I also need the data inserted row-by-row.
Answer
If you set `dtype=object`, your series will be able to contain arbitrary data types:
    df["int"] = pd.Series([], dtype=object)
    df["str"] = pd.Series([], dtype=str)

    df.loc[0] = [0, "zero"]
    print(df)
    print()

    df.loc[1] = [1, None]
    print(df)

Output:

       int   str
    0    0  zero
    1  NaN   NaN

       int   str
    0    0  zero
    1    1  None
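To see the dtype behaviour in isolation, here is a minimal sketch that builds each `Series` directly instead of enlarging a frame row-by-row. The variable names are made up for illustration. The nullable `"Int64"` dtype in the last step is not part of the answer above; it is included only as a possible alternative to `dtype=object`.

```python
import numpy as np
import pandas as pd

# An integer Series that receives NaN is upcast to float64,
# because NumPy's int64 has no missing-value representation.
with_nan = pd.Series([0, 1, np.nan])
print(with_nan.dtype)  # float64

# With dtype=object each element keeps its original Python type,
# so the integers stay integers next to None.
as_object = pd.Series([0, 1, None], dtype=object)
print([type(v).__name__ for v in as_object])

# Assumption (not from the answer): the nullable "Int64" extension
# dtype stores the missing entry as pd.NA and keeps the rest integral.
nullable = pd.Series([0, 1, None], dtype="Int64")
print(nullable.dtype)
print(nullable.isna().tolist())
```

Whether these dtypes survive row-by-row enlargement with `df.loc[...]` may depend on the pandas version, so it is worth printing `df.dtypes` after each insert to confirm.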