pandas – Merging on string columns not working (bug?)

I’m trying to do a simple merge between two dataframes. These come from two different SQL tables, where the joining keys are strings:

JavaScript

I try to merge them using this:

JavaScript

The result of the inner join is empty, which first made me think that there might not be any entries in the intersection:

JavaScript

But when I try to match a single element, I see this really odd behavior.

JavaScript

So, the columns are defined with the ‘object’ dtype. Searching for them as strings doesn’t yield any results. Searching for them as integers does return a result, and I think this is the reason why the merge above doesn’t work.
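One way to check whether an ‘object’ column really holds only strings is to look at the Python type of each element. The following is a minimal sketch with made-up data (the frame and its values are hypothetical, not the original tables); it reproduces the "string lookup fails, integer lookup succeeds" symptom:

```python
import pandas as pd

# Hypothetical sample data: an object column holding both ints and strings.
df1 = pd.DataFrame({"col1": [123, "456", "abc"]})

# The column-level dtype does not reveal the mix of element types.
print(df1["col1"].dtype)

# Count the Python type of every element in the column.
type_counts = df1["col1"].map(type).value_counts()
print(type_counts)

# Comparisons are element-wise, so "123" does not match the int 123.
str_matches = df1[df1["col1"] == "123"]
int_matches = df1[df1["col1"] == 123]
print(len(str_matches), len(int_matches))
```

If `type_counts` lists more than one type, the column is mixed, and string keys will not match their integer counterparts during a merge.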

Any ideas what’s going on?

It’s almost as though Pandas converts df1.col1 to an integer just because it can, even though it should be treated as a string while matching.

(I tried to replicate this using sample dataframes, but for small examples, I don’t see this behavior. Any suggestions on how I can find a more descriptive example would be appreciated as well.)

Answer

The issue was that the object dtype is misleading. I thought it meant that all items were strings. But apparently, while reading the file, pandas was converting some elements to ints and leaving the rest as strings.

The solution was to make sure that every field is a string:

JavaScript

Then the merge works as expected.
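For illustration, one common way to force every key to a string is `Series.astype(str)`. This is a sketch on hypothetical frames (the column names `col1`/`col2` and all values are assumptions), not necessarily the exact code used above:

```python
import pandas as pd

# Hypothetical frames: df1's key column mixes ints and strings.
df1 = pd.DataFrame({"col1": [123, "456", "abc"], "a": [1, 2, 3]})
df2 = pd.DataFrame({"col2": ["123", "456", "xyz"], "b": [10, 20, 30]})

# Convert both key columns so every element is a string.
df1["col1"] = df1["col1"].astype(str)
df2["col2"] = df2["col2"].astype(str)

# The inner join now matches "123" and "456".
merged = df1.merge(df2, left_on="col1", right_on="col2", how="inner")
print(merged)
```

One caveat: if a key was read as a float, `astype(str)` produces text like "123.0", which still will not match "123", so it is worth checking the element types before converting.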

(I wish there was a way of specifying a dtype of str…)
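If the data is loaded from a CSV file (an assumption; the question mentions SQL tables), `pandas.read_csv` does accept a `dtype` argument that keeps a column as text. A small sketch with an in-memory file and hypothetical column names:

```python
import io
import pandas as pd

# Hypothetical CSV content; "001" would lose its leading zeros if parsed as int.
csv_text = "col1,value\n001,a\n123,b\n"

# Ask read_csv to keep col1 as strings instead of inferring a numeric type.
df = pd.read_csv(io.StringIO(csv_text), dtype={"col1": str})
print(df["col1"].tolist())
```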

User contributions licensed under: CC BY-SA