Skip to content
Advertisement

Delete rows based on calculated number

I have a dataframe that defines list of call (call)[List]. each call has an answer status (call)[Status]. I created a column to have a unique field (call)[Key]

Call DataFrame which appears as following:

a =  {'List':['List1','List1','List1','List1','List2','List3','List3','List1','List2','List2', 'List3'], 
      'Status': ['A', 'A', 'DO','A','A', 'A', 'DO','C','A', 'A', 'A'],
      'Key': ['List1 2022-02-09', 'List1 2022-02-09', 'List1 2022-02-09','List1 2022-02-09','List2 2022-02-09', 'List3 2022-02-09', 'List3 2022-02-09','List1 2022-02-10','List2 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10']}


call = pd.DataFrame(data = a)

A seconde Dataframe with a calculated column (DimsDrops)[# drop] ; The join with the (Call)table is done with the (Call)[Key].

DimsDrops table which appears as following:

b = {'Key': ['List1 2022-02-09', 'List2 2022-02-09', 'List3 2022-02-09','List1 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10'], 
     '# drop': [2,1,1,0,1,1]}

DimsDrops = pd.DataFrame(data = b)

I want deleting the rows based on (DimsDrops)[# drop] and with (call)[Status] = “A”

Desired result is the following:

|   List          |  Status  |    Key
--------------------------------------------------
|   List1         |  A        |   List1 2022-02-09
|   List1         |  DO       |   List1 2022-02-09
|   List3         |  DO       |   List3 2022-02-09
|   List1         |  C        |   List1 2022-02-10
|   List2         |  A        |   List2 2022-02-10

By using call.drop( export.query(‘Status = “A” & …).index, inplace= True)), I am blocked how to recover the number of line to delete by making the join with “key”

Advertisement

Answer

EDIT:

First repeat rows by number of deleted rows by Index.repeat and DataFrame.loc and create counter column by GroupBy.cumcount:

df1 = DimsDrops.loc[DimsDrops.index.repeat(DimsDrops['# drop'])]
df1['g'] = df1.groupby('Key').cumcount(ascending=False)
print (df1)
                Key  # drop  g
0  List1 2022-02-09       2  1
0  List1 2022-02-09       2  0
1  List2 2022-02-09       1  0
2  List3 2022-02-09       1  0
4  List2 2022-02-10       1  0
5  List3 2022-02-10       1  0

Then filter only rows with A and create counter in call:

call['g'] = call[call['Status'].eq('A')].groupby('Key').cumcount(ascending=False)
print (call)
     List Status               Key    g
0   List1      A  List1 2022-02-09  2.0
1   List1      A  List1 2022-02-09  1.0
2   List1     DO  List1 2022-02-09  NaN
3   List1      A  List1 2022-02-09  0.0
4   List2      A  List2 2022-02-09  0.0
5   List3      A  List3 2022-02-09  0.0
6   List3     DO  List3 2022-02-09  NaN
7   List1      C  List1 2022-02-10  NaN
8   List2      A  List2 2022-02-10  1.0
9   List2      A  List2 2022-02-10  0.0
10  List3      A  List3 2022-02-10  0.0

Join both DataFrames by left join and indicator=True parameter:

df = call.merge(df1, how='left', indicator=True)
print (df)
     List Status               Key    g  # drop     _merge
0   List1      A  List1 2022-02-09  2.0     NaN  left_only
1   List1      A  List1 2022-02-09  1.0     2.0       both
2   List1     DO  List1 2022-02-09  NaN     NaN  left_only
3   List1      A  List1 2022-02-09  0.0     2.0       both
4   List2      A  List2 2022-02-09  0.0     1.0       both
5   List3      A  List3 2022-02-09  0.0     1.0       both
6   List3     DO  List3 2022-02-09  NaN     NaN  left_only
7   List1      C  List1 2022-02-10  NaN     NaN  left_only
8   List2      A  List2 2022-02-10  1.0     NaN  left_only
9   List2      A  List2 2022-02-10  0.0     1.0       both
10  List3      A  List3 2022-02-10  0.0     1.0       both

So last filter not both rows with remove helper g column:

df = call[df['_merge'].ne('both')].drop('g', axis=1)
print (df)
    List Status               Key
0  List1      A  List1 2022-02-09
2  List1     DO  List1 2022-02-09
6  List3     DO  List3 2022-02-09
7  List1      C  List1 2022-02-10
8  List2      A  List2 2022-02-10
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement