I am trying to create a mapping where there are duplicates in certain columns in a dataframe.
Here are two examples of dataframes I am working with:
JavaScript
x
11
11
1
issue_status trading_state reason
2
100 'A0'
3
100 None 'F'
4
400 None
5
100 None
6
400 None 'SL'
7
100 'B2'
8
400 None 'L'
9
100 None
10
400 'A6'
11
Here is what I need; 3 conditional python logic that does:
- when we see the first
issue_status
of100
andtrading_state
of None, mapF
in thereason
column. - when we see the second last
issue_status
of400
andtrading_state
of None, mapSL
in thereason
column. - when we see the last
issue_status
of400
andtrading_state
of None, mapL
in thereason
column.
Here is another example:
JavaScript
1
8
1
issue_status trading_state reason
2
400 None 'SL'
3
100 'A0'
4
400 None 'L'
5
400 'A0'
6
100 None 'F'
7
100 None
8
@jezrael, I am getting the following error, for your last line of code
(market_info_df1['reason'] = s1.combine_first(s2)
):
JavaScript
1
15
15
1
@wraps(func)
2
def outer(*args, **kwargs):
3
try:
4
return_value = func(*args, **kwargs)
5
except NormalisedDataError:
6
raise
7
except Exception as exc:
8
original = exc.__class__.__name__
9
> raise NormalisedDataError(f'{original}: {exc}', original_exception_type=original) from exc
10
SettingWithCopyError:
11
E A value is trying to be set on a copy of a slice from a DataFrame.
12
E Try using .loc[row_indexer,col_indexer] = value instead
13
E
14
E See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
15
Any idea what is causing this?
Advertisement
Answer
You can filter by 400
and None
values for df1
, create helper Series
with range
and mapping last and second last values, for first 100
and None
values use Series.duplicated
, last join both Series
by Series.combine_first
:
JavaScript
1
25
25
1
#if None is string
2
#m1 = df['trading_state'].eq('None')
3
m1 = df['trading_state'].isna()
4
5
m2 = df['issue_status'].eq(400)
6
m3 = df['issue_status'].eq(100)
7
8
df1 = df[m1 & m2].copy()
9
10
s1 = pd.Series(range(len(df1), 0, -1), index=df1.index).map({1:'L', 2:'SL'})
11
s2 = df.loc[m1 & m3, 'issue_status'].copy().duplicated().map({False:'F'})
12
13
df['reason'] = s1.combine_first(s2)
14
print (df)
15
issue_status trading_state reason
16
0 100 'A0' NaN
17
1 100 None F
18
2 400 None NaN
19
3 100 None NaN
20
4 400 None SL
21
5 100 'B2' NaN
22
6 400 None L
23
7 100 None NaN
24
8 400 'A6' NaN
25
For second:
JavaScript
1
10
10
1
df['reason'] = s1.combine_first(s2)
2
print (df)
3
issue_status trading_state reason
4
0 400 None SL
5
1 100 'A0' NaN
6
2 400 None L
7
3 400 'A0' NaN
8
4 100 None F
9
5 100 None NaN
10
If necessary empty strings in reason
column use:
JavaScript
1
2
1
df['reason'] = df['reason'].fillna('')
2