I have a dataframe that defines list of call (call)[List]. each call has an answer status (call)[Status]. I created a column to have a unique field (call)[Key]
Call DataFrame which appears as following:
a = {'List':['List1','List1','List1','List1','List2','List3','List3','List1','List2','List2', 'List3'], 'Status': ['A', 'A', 'DO','A','A', 'A', 'DO','C','A', 'A', 'A'], 'Key': ['List1 2022-02-09', 'List1 2022-02-09', 'List1 2022-02-09','List1 2022-02-09','List2 2022-02-09', 'List3 2022-02-09', 'List3 2022-02-09','List1 2022-02-10','List2 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10']} call = pd.DataFrame(data = a)
A seconde Dataframe with a calculated column (DimsDrops)[# drop] ; The join with the (Call)table is done with the (Call)[Key].
DimsDrops table which appears as following:
b = {'Key': ['List1 2022-02-09', 'List2 2022-02-09', 'List3 2022-02-09','List1 2022-02-10', 'List2 2022-02-10', 'List3 2022-02-10'], '# drop': [2,1,1,0,1,1]} DimsDrops = pd.DataFrame(data = b)
I want deleting the rows based on (DimsDrops)[# drop] and with (call)[Status] = “A”
Desired result is the following:
| List | Status | Key -------------------------------------------------- | List1 | A | List1 2022-02-09 | List1 | DO | List1 2022-02-09 | List3 | DO | List3 2022-02-09 | List1 | C | List1 2022-02-10 | List2 | A | List2 2022-02-10
By using call.drop( export.query(‘Status = “A” & …).index, inplace= True)), I am blocked how to recover the number of line to delete by making the join with “key”
Advertisement
Answer
EDIT:
First repeat rows by number of deleted rows by Index.repeat
and DataFrame.loc
and create counter column by GroupBy.cumcount
:
df1 = DimsDrops.loc[DimsDrops.index.repeat(DimsDrops['# drop'])] df1['g'] = df1.groupby('Key').cumcount(ascending=False) print (df1) Key # drop g 0 List1 2022-02-09 2 1 0 List1 2022-02-09 2 0 1 List2 2022-02-09 1 0 2 List3 2022-02-09 1 0 4 List2 2022-02-10 1 0 5 List3 2022-02-10 1 0
Then filter only rows with A
and create counter in call
:
call['g'] = call[call['Status'].eq('A')].groupby('Key').cumcount(ascending=False) print (call) List Status Key g 0 List1 A List1 2022-02-09 2.0 1 List1 A List1 2022-02-09 1.0 2 List1 DO List1 2022-02-09 NaN 3 List1 A List1 2022-02-09 0.0 4 List2 A List2 2022-02-09 0.0 5 List3 A List3 2022-02-09 0.0 6 List3 DO List3 2022-02-09 NaN 7 List1 C List1 2022-02-10 NaN 8 List2 A List2 2022-02-10 1.0 9 List2 A List2 2022-02-10 0.0 10 List3 A List3 2022-02-10 0.0
Join both DataFrames by left join and indicator=True
parameter:
df = call.merge(df1, how='left', indicator=True) print (df) List Status Key g # drop _merge 0 List1 A List1 2022-02-09 2.0 NaN left_only 1 List1 A List1 2022-02-09 1.0 2.0 both 2 List1 DO List1 2022-02-09 NaN NaN left_only 3 List1 A List1 2022-02-09 0.0 2.0 both 4 List2 A List2 2022-02-09 0.0 1.0 both 5 List3 A List3 2022-02-09 0.0 1.0 both 6 List3 DO List3 2022-02-09 NaN NaN left_only 7 List1 C List1 2022-02-10 NaN NaN left_only 8 List2 A List2 2022-02-10 1.0 NaN left_only 9 List2 A List2 2022-02-10 0.0 1.0 both 10 List3 A List3 2022-02-10 0.0 1.0 both
So last filter not both
rows with remove helper g
column:
df = call[df['_merge'].ne('both')].drop('g', axis=1) print (df) List Status Key 0 List1 A List1 2022-02-09 2 List1 DO List1 2022-02-09 6 List3 DO List3 2022-02-09 7 List1 C List1 2022-02-10 8 List2 A List2 2022-02-10