I have a dataframe that defines list of call (call)[List]. each call has an answer status (call)[Status]. I created a column to have a unique field (call)[Key]
Call DataFrame which appears as following:
a = {'List':['List1','List1','List1','List1','List2','List3','List3','List1','List2','List2', 'List3'],
'Status': ['A', 'A', 'DO','A','A', 'A', 'DO','C','A', 'A', 'A'],
'Key': ['List1 2022-02-09', 'List1 2022-02-09', 'List1 2022-02-09','List1 2022-02-09','List2 2022-02-09', 'List3 2022-02-09', 'List3 2022-02-09','List1 2022-02-10','List2 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10']}
call = pd.DataFrame(data = a)
A seconde Dataframe with a calculated column (DimsDrops)[# drop] ; The join with the (Call)table is done with the (Call)[Key].
DimsDrops table which appears as following:
b = {'Key': ['List1 2022-02-09', 'List2 2022-02-09', 'List3 2022-02-09','List1 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10'],
'# drop': [2,1,1,0,1,1]}
DimsDrops = pd.DataFrame(data = b)
I want deleting the rows based on (DimsDrops)[# drop] and with (call)[Status] = “A”
Desired result is the following:
| List | Status | Key -------------------------------------------------- | List1 | A | List1 2022-02-09 | List1 | DO | List1 2022-02-09 | List3 | DO | List3 2022-02-09 | List1 | C | List1 2022-02-10 | List2 | A | List2 2022-02-10
By using call.drop( export.query(‘Status = “A” & …).index, inplace= True)), I am blocked how to recover the number of line to delete by making the join with “key”
Advertisement
Answer
EDIT:
First repeat rows by number of deleted rows by Index.repeat and DataFrame.loc and create counter column by GroupBy.cumcount:
df1 = DimsDrops.loc[DimsDrops.index.repeat(DimsDrops['# drop'])]
df1['g'] = df1.groupby('Key').cumcount(ascending=False)
print (df1)
Key # drop g
0 List1 2022-02-09 2 1
0 List1 2022-02-09 2 0
1 List2 2022-02-09 1 0
2 List3 2022-02-09 1 0
4 List2 2022-02-10 1 0
5 List3 2022-02-10 1 0
Then filter only rows with A and create counter in call:
call['g'] = call[call['Status'].eq('A')].groupby('Key').cumcount(ascending=False)
print (call)
List Status Key g
0 List1 A List1 2022-02-09 2.0
1 List1 A List1 2022-02-09 1.0
2 List1 DO List1 2022-02-09 NaN
3 List1 A List1 2022-02-09 0.0
4 List2 A List2 2022-02-09 0.0
5 List3 A List3 2022-02-09 0.0
6 List3 DO List3 2022-02-09 NaN
7 List1 C List1 2022-02-10 NaN
8 List2 A List2 2022-02-10 1.0
9 List2 A List2 2022-02-10 0.0
10 List3 A List3 2022-02-10 0.0
Join both DataFrames by left join and indicator=True parameter:
df = call.merge(df1, how='left', indicator=True)
print (df)
List Status Key g # drop _merge
0 List1 A List1 2022-02-09 2.0 NaN left_only
1 List1 A List1 2022-02-09 1.0 2.0 both
2 List1 DO List1 2022-02-09 NaN NaN left_only
3 List1 A List1 2022-02-09 0.0 2.0 both
4 List2 A List2 2022-02-09 0.0 1.0 both
5 List3 A List3 2022-02-09 0.0 1.0 both
6 List3 DO List3 2022-02-09 NaN NaN left_only
7 List1 C List1 2022-02-10 NaN NaN left_only
8 List2 A List2 2022-02-10 1.0 NaN left_only
9 List2 A List2 2022-02-10 0.0 1.0 both
10 List3 A List3 2022-02-10 0.0 1.0 both
So last filter not both rows with remove helper g column:
df = call[df['_merge'].ne('both')].drop('g', axis=1)
print (df)
List Status Key
0 List1 A List1 2022-02-09
2 List1 DO List1 2022-02-09
6 List3 DO List3 2022-02-09
7 List1 C List1 2022-02-10
8 List2 A List2 2022-02-10