I am trying to delete rows of a df which are not part of an other columns entry from another table. For further explanation: I have a table with transactions including materialnumbers and another table with production information also including materialnumbers. I want to delete every row where a materialnumber is contained which is not in the other table.
My full code is not working. Tho the code is doing what I expect when used on a small sample. See below.
import pandas as pd import numpy as np import os file_path = os.path.realpath(__file__) dic_path = os.path.dirname(file_path) os.chdir(dic_path) df_V = pd.read_excel("V.xlsx", dtype ='str') mn = df_V.MAT print(mn.dtype) mn = mn.drop_duplicates() print(mn) df_L = pd.read_excel("L.xlsx", sheet_name = "Sheet1", dtype ='str') df_LH = df_L.head() print(df_LH) df_LH = df_LH[df_LH.MAT.isin(mn) == True] print(df_LH)
Works as predicted
df_L = df_L[df_L.MAT.isin(mn) == True] df_L.to_excel("correct_L.xlsx") print("done")
both files new_L aswell as L contain the same values though in the head() part some rows get removed.
The Tables can be seen as following:
Table V index MAT Value 1. 1 any 2. 2 any 3. 2 any 4. 3 any Table L index MAT value 1. 1 any 2. 1 any 3. 2 any 4. 3 any 5. 4 any predicted outcome index MAT value 1. 1 any 2. 1 any 3. 2 any 4. 3 any
Many Thanks in advance
Advertisement
Answer
You probably want to be using the merge function in pandas opposed to isin.
The code below is a simple demonstration of how to use the function
We use how='left'
so that only ‘materials’ that are in the left dataframe are included. The on='MAT'
is used to tell pandas to look at this column to decide what should be merged.
import pandas as pd v = pd.DataFrame([[1,9],[2,8],[2,7],[3,6]], columns=['MAT', 'V_vals']) l = pd.DataFrame([[1,5],[1,4],[2,3],[3,2],[4,1]], columns=['MAT', 'M_vals']) print('Table V:n', v) print('Table M:n', l) output = pd.merge(v,l, how='left', on='MAT') print('Merged table:n', output)
This produces the output shown below.
Table V: MAT V_vals 0 1 9 1 2 8 2 2 7 3 3 6 Table M: MAT M_vals 0 1 5 1 1 4 2 2 3 3 3 2 4 4 1 Merged table: MAT V_vals M_vals 0 1 9 5 1 1 9 4 2 2 8 3 3 2 7 3 4 3 6 2