I have a dataframe that defines list of call (call)[List]. each call has an answer status (call)[Status]. I created a column to have a unique field (call)[Key]
Call DataFrame which appears as following:
a =  {'List':['List1','List1','List1','List1','List2','List3','List3','List1','List2','List2', 'List3'], 
      'Status': ['A', 'A', 'DO','A','A', 'A', 'DO','C','A', 'A', 'A'],
      'Key': ['List1 2022-02-09', 'List1 2022-02-09', 'List1 2022-02-09','List1 2022-02-09','List2 2022-02-09', 'List3 2022-02-09', 'List3 2022-02-09','List1 2022-02-10','List2 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10']}
call = pd.DataFrame(data = a)
A seconde Dataframe with a calculated column (DimsDrops)[# drop] ; The join with the (Call)table is done with the (Call)[Key].
DimsDrops table which appears as following:
b = {'Key': ['List1 2022-02-09', 'List2 2022-02-09', 'List3 2022-02-09','List1 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10'], 
     '# drop': [2,1,1,0,1,1]}
DimsDrops = pd.DataFrame(data = b)
I want deleting the rows based on (DimsDrops)[# drop] and with (call)[Status] = “A”
Desired result is the following:
| List | Status | Key -------------------------------------------------- | List1 | A | List1 2022-02-09 | List1 | DO | List1 2022-02-09 | List3 | DO | List3 2022-02-09 | List1 | C | List1 2022-02-10 | List2 | A | List2 2022-02-10
By using call.drop( export.query(‘Status = “A” & …).index, inplace= True)), I am blocked how to recover the number of line to delete by making the join with “key”
Advertisement
Answer
EDIT:
First repeat rows by number of deleted rows by Index.repeat and DataFrame.loc and create counter column by GroupBy.cumcount:
df1 = DimsDrops.loc[DimsDrops.index.repeat(DimsDrops['# drop'])]
df1['g'] = df1.groupby('Key').cumcount(ascending=False)
print (df1)
                Key  # drop  g
0  List1 2022-02-09       2  1
0  List1 2022-02-09       2  0
1  List2 2022-02-09       1  0
2  List3 2022-02-09       1  0
4  List2 2022-02-10       1  0
5  List3 2022-02-10       1  0
Then filter only rows with A and create counter in call:
call['g'] = call[call['Status'].eq('A')].groupby('Key').cumcount(ascending=False)
print (call)
     List Status               Key    g
0   List1      A  List1 2022-02-09  2.0
1   List1      A  List1 2022-02-09  1.0
2   List1     DO  List1 2022-02-09  NaN
3   List1      A  List1 2022-02-09  0.0
4   List2      A  List2 2022-02-09  0.0
5   List3      A  List3 2022-02-09  0.0
6   List3     DO  List3 2022-02-09  NaN
7   List1      C  List1 2022-02-10  NaN
8   List2      A  List2 2022-02-10  1.0
9   List2      A  List2 2022-02-10  0.0
10  List3      A  List3 2022-02-10  0.0
Join both DataFrames by left join and indicator=True parameter:
df = call.merge(df1, how='left', indicator=True)
print (df)
     List Status               Key    g  # drop     _merge
0   List1      A  List1 2022-02-09  2.0     NaN  left_only
1   List1      A  List1 2022-02-09  1.0     2.0       both
2   List1     DO  List1 2022-02-09  NaN     NaN  left_only
3   List1      A  List1 2022-02-09  0.0     2.0       both
4   List2      A  List2 2022-02-09  0.0     1.0       both
5   List3      A  List3 2022-02-09  0.0     1.0       both
6   List3     DO  List3 2022-02-09  NaN     NaN  left_only
7   List1      C  List1 2022-02-10  NaN     NaN  left_only
8   List2      A  List2 2022-02-10  1.0     NaN  left_only
9   List2      A  List2 2022-02-10  0.0     1.0       both
10  List3      A  List3 2022-02-10  0.0     1.0       both
So last filter not both rows with remove helper g column:
df = call[df['_merge'].ne('both')].drop('g', axis=1)
print (df)
    List Status               Key
0  List1      A  List1 2022-02-09
2  List1     DO  List1 2022-02-09
6  List3     DO  List3 2022-02-09
7  List1      C  List1 2022-02-10
8  List2      A  List2 2022-02-10
