Skip to content
Advertisement

Pandas Selection of rows not working propelry

I am trying to delete rows of a df which are not part of an other columns entry from another table. For further explanation: I have a table with transactions including materialnumbers and another table with production information also including materialnumbers. I want to delete every row where a materialnumber is contained which is not in the other table.

My full code is not working. Tho the code is doing what I expect when used on a small sample. See below.

import pandas as pd
import numpy as np
import os

file_path = os.path.realpath(__file__)
dic_path = os.path.dirname(file_path)

os.chdir(dic_path)

df_V = pd.read_excel("V.xlsx", dtype ='str')

mn = df_V.MAT
print(mn.dtype)
mn = mn.drop_duplicates()
print(mn)

df_L = pd.read_excel("L.xlsx", sheet_name = "Sheet1", dtype ='str')

df_LH = df_L.head()

print(df_LH)

df_LH = df_LH[df_LH.MAT.isin(mn) == True]

print(df_LH)

Works as predicted

df_L = df_L[df_L.MAT.isin(mn) == True]

df_L.to_excel("correct_L.xlsx")

print("done")

both files new_L aswell as L contain the same values though in the head() part some rows get removed.

The Tables can be seen as following:

Table V

index MAT Value
1.    1   any
2.    2   any
3.    2   any
4.    3   any

Table L
index MAT value
1.    1   any
2.    1   any
3.    2   any
4.    3   any
5.    4   any

predicted outcome

index MAT value
1.    1   any
2.    1   any
3.    2   any
4.    3   any

Many Thanks in advance

Advertisement

Answer

You probably want to be using the merge function in pandas opposed to isin.

The code below is a simple demonstration of how to use the function

We use how='left' so that only ‘materials’ that are in the left dataframe are included. The on='MAT' is used to tell pandas to look at this column to decide what should be merged.

    import pandas as pd
    v = pd.DataFrame([[1,9],[2,8],[2,7],[3,6]], columns=['MAT', 'V_vals'])
    l = pd.DataFrame([[1,5],[1,4],[2,3],[3,2],[4,1]], columns=['MAT', 'M_vals'])
    print('Table V:n', v)
    print('Table M:n', l)
    output = pd.merge(v,l, how='left', on='MAT')
    print('Merged table:n', output)

This produces the output shown below.

Table V:
    MAT  V_vals
0    1       9
1    2       8
2    2       7
3    3       6
Table M:
    MAT  M_vals
0    1       5
1    1       4
2    2       3
3    3       2
4    4       1
Merged table:
    MAT  V_vals  M_vals
0    1       9       5
1    1       9       4
2    2       8       3
3    2       7       3
4    3       6       2

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement