Skip to content
Advertisement

Pandas merge stop at first match like vlookup instead of duplicating

I have two tables, PO data and commodity code data. Some genius decided that some material group codes should be the same as they are differentiated at a lower level by GL accounts. Because of that, I can’t merge on material groups, as I’ll get duplicate rows.

Assume the following:

JavaScript

As you can see, PO 123456 shows up twice, as there are multiple rows for material 1001 in the material groups table.

The desired behavior is that merge only merges once, finds the first entry for the material group, adds it, and nothing else, like how vlookup works. The long commodity code might be incorrect in some cases (always showing the first one), that’s an acceptable inaccuracy.

ps.: while suggestions are welcome how to tackle this problem outside of the scope of this question (like merging on GL accounts, which is not feasible for other reasons) assume the following: The available data is a PO list from SAP ME81N and an Excel file with the list of material groups/commodity codes.

Advertisement

Answer

pandas’ merge behaves (mostly) like a SQL merge and will provide all combinations of matching keys. If you only want the first item, simply remove it from the data you feed to merge.

Use drop_duplicates on mat_grp:

JavaScript

output:

JavaScript
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement