I am trying to delete rows of a df which are not part of an other columns entry from another table. For further explanation: I have a table with transactions including materialnumbers and another table with production information also including materialnumbers. I want to delete every row where a materialnumber is contained which is not in the other table.
My full code is not working. Tho the code is doing what I expect when used on a small sample. See below.
import pandas as pd
import numpy as np
import os
file_path = os.path.realpath(__file__)
dic_path = os.path.dirname(file_path)
os.chdir(dic_path)
df_V = pd.read_excel("V.xlsx", dtype ='str')
mn = df_V.MAT
print(mn.dtype)
mn = mn.drop_duplicates()
print(mn)
df_L = pd.read_excel("L.xlsx", sheet_name = "Sheet1", dtype ='str')
df_LH = df_L.head()
print(df_LH)
df_LH = df_LH[df_LH.MAT.isin(mn) == True]
print(df_LH)
Works as predicted
df_L = df_L[df_L.MAT.isin(mn) == True]
df_L.to_excel("correct_L.xlsx")
print("done")
both files new_L aswell as L contain the same values though in the head() part some rows get removed.
The Tables can be seen as following:
Table V
index MAT Value
1. 1 any
2. 2 any
3. 2 any
4. 3 any
Table L
index MAT value
1. 1 any
2. 1 any
3. 2 any
4. 3 any
5. 4 any
predicted outcome
index MAT value
1. 1 any
2. 1 any
3. 2 any
4. 3 any
Many Thanks in advance
Advertisement
Answer
You probably want to be using the merge function in pandas opposed to isin.
The code below is a simple demonstration of how to use the function
We use how='left'
so that only ‘materials’ that are in the left dataframe are included. The on='MAT'
is used to tell pandas to look at this column to decide what should be merged.
import pandas as pd
v = pd.DataFrame([[1,9],[2,8],[2,7],[3,6]], columns=['MAT', 'V_vals'])
l = pd.DataFrame([[1,5],[1,4],[2,3],[3,2],[4,1]], columns=['MAT', 'M_vals'])
print('Table V:n', v)
print('Table M:n', l)
output = pd.merge(v,l, how='left', on='MAT')
print('Merged table:n', output)
This produces the output shown below.
Table V:
MAT V_vals
0 1 9
1 2 8
2 2 7
3 3 6
Table M:
MAT M_vals
0 1 5
1 1 4
2 2 3
3 3 2
4 4 1
Merged table:
MAT V_vals M_vals
0 1 9 5
1 1 9 4
2 2 8 3
3 2 7 3
4 3 6 2