Stop Pandas from converting int to float due to an insertion in another column

I have a DataFrame with two columns: a column of int and a column of str.

I understand that if I insert NaN into the int column, Pandas will convert all the int into float because there is no NaN value for an int.
However, when I insert None into the str column, Pandas converts all my int to float as well. This doesn’t make sense to me – why does the value I put in column 2 affect column 1?
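
One likely explanation, sketched here under the assumption that pandas first packs the new row into a single Series before splitting it across the columns (an assumption about internals, not something confirmed in this thread): a list that mixes an int with None is inferred as float64, and None is turned into NaN. That would also account for the str column showing NaN rather than None.

```python
import pandas as pd

# A list mixing an int with None is inferred as float64; None becomes NaN.
row_inferred = pd.Series([1, None])
print(row_inferred.dtype)        # float64

# With dtype=object the original Python objects are kept as they are.
row_object = pd.Series([1, None], dtype=object)
print(row_object.tolist())       # [1, None]
```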

Here’s a simple working example:

import pandas as pd
df = pd.DataFrame()
df["int"] = pd.Series([], dtype=int)
df["str"] = pd.Series([], dtype=str)

df.loc[0] = [0, "zero"]
print(df)
print()

df.loc[1] = [1, None]
print(df)

The output is:

   int   str
0    0  zero

   int   str
0  0.0  zero
1  1.0   NaN

Is there any way to make the output the following:

   int   str
0    0  zero

   int   str
0    0  zero
1    1   NaN

without recasting the first column to int?

I prefer using int instead of float because the actual data in that column are integers. If there’s no workaround, I’ll just use float though.
I prefer not having to recast because in my actual code, I don’t store the actual dtype.
I also need the data inserted row-by-row.

Answer

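If you set dtype=object, your series will be able to contain arbitrary data types. The snippet below appears to reuse the df from the question rather than a fresh DataFrame, which is why the first printout still shows a row 1 filled with NaN: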

df["int"] = pd.Series([], dtype=object)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)

   int   str
0    0  zero
1  NaN   NaN

  int   str
0   0  zero
1   1  None
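
A self-contained variant of the same idea, as a minimal sketch rather than a definitive fix: `append_row` is a hypothetical helper (not part of pandas), and it swaps `.loc` enlargement for `pd.concat` with a one-row, object-dtype frame so that no dtype inference runs on the new row. It assumes both columns are created with dtype=object.

```python
import pandas as pd

def append_row(df, values):
    """Return df plus one row, keeping every column as dtype=object.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Build the row as an object-dtype frame so 1 stays an int and
    # None is not converted by dtype inference.
    row = pd.DataFrame([values], columns=df.columns, dtype=object)
    return pd.concat([df, row], ignore_index=True)

# Start from empty object-dtype columns, following the answer's idea.
df = pd.DataFrame({
    "int": pd.Series([], dtype=object),
    "str": pd.Series([], dtype=object),
})

# Rows are still added one at a time.
df = append_row(df, [0, "zero"])
df = append_row(df, [1, None])

print(df["int"].tolist())  # [0, 1]
print(df.dtypes)
```

Calling pd.concat once per row copies the frame each time, so for a large number of rows it may be cheaper to collect the rows in a list and build the DataFrame once at the end.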
