Slightly expanding the Example 1: Merge on Multiple Columns with Different Names, results in the following Python code using Pandas pandas.DataFrame.merge:
# Create and view first DataFrame:
df1 = pd.DataFrame({'a1': [0, 0, 1, 1, 2],
                    'b': [0, 0, 1, 1, 1],
                    'c': [11, 8, 10, 6, 6]})
print(df1)
print(df1.dtypes)
print()
# Create and view second DataFrame:
df2 = pd.DataFrame({'a2': [0, 1, 1, 1, 3],
                    'b': [0, 0, 0, 1, 1],
                    'd': [22, 24, 25, 33, 37]})
print(df2)
print(df2.dtypes)
print()
# Merge df1 and df2:
df_merge = pd.merge(df1, df2, how='left', left_on=['a1', 'b'], right_on=['a2', 'b'])
print(df_merge)
print(df_merge.dtypes)
The resulting output (I’ve added line numbers):
1 a1 b c 2 0 0 0 11 3 1 0 0 8 4 2 1 1 10 5 3 1 1 6 6 4 2 1 6 7 a1 int64 8 b int64 9 c int64 10 dtype: object 11 12 a2 b d 13 0 0 0 22 14 1 1 0 24 15 2 1 0 25 16 3 1 1 33 17 4 3 1 37 18 a2 int64 19 b int64 20 d int64 21 dtype: object 22 23 a1 b c a2 d 24 0 0 0 11 0.00 22.00 25 1 0 0 8 0.00 22.00 26 2 1 1 10 1.00 33.00 27 3 1 1 6 1.00 33.00 28 4 2 1 6 NaN NaN 29 a1 int64 30 b int64 31 c int64 32 a2 float64 33 d float64 34 dtype: object
Notice the type of a2 and d columns in the resulting df_merge dataframe on lines 24 through 27 have changed from the original int64 to float64.  Why would it need to change the types?
Even the example in the manual at df1.merge(df2, how=’left’, on=’a’) shows a 3.0 where I would have expected it to stay an int64:
df1.merge(df2, how='left', on='a')
      a  b  c
0   foo  1  3.0
1   bar  2  NaN
But it doesn’t explain why.  I see How to left merge two dataframes with nan without changing types from integer to float types indicates that NaN‘s seem to be a factor, but doesn’t answer my specific question here as to why the type conversion happens.
If I change df1 to remove the last row:
df1 = pd.DataFrame({'a1': [0,  0,  1, 1, ],
                    'b':  [0,  0,  1, 1, ],
                    'c':  [11, 8, 10, 6, ]})
Then the output becomes what I would expect:
a1 b c a2 d 0 0 0 11 0 22 1 0 0 8 0 22 2 1 1 10 1 33 3 1 1 6 1 33 a1 int64 b int64 c int64 a2 int64 d int64 dtype: object
Advertisement
Answer
The reason for this is that NaN is of type float.
import numpy as np print(type(np.nan)) <class 'float'>
There are good reasons why this is the case which I can explain in comments if needed.
So when you do the merge and there are missing values filled with NaN then the column type will be automatically changed to float. This is because all rows of that column must be of the same data type.
If you specifically wanted to have int after the merge then you’d need to use the fillna method and define what integers should replace the missing values. E.g. people sometimes use -1 for counts. A simple example:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'a': [1, 2, 3, np.nan, 5],
    'b': [1, np.nan, 3, 4, 5]
})
df = df.fillna(value=-1).astype(int)
print(df.dtypes)
Result is:
a int64 b int64 dtype: object
