I am trying to create a mapping in a DataFrame where certain columns contain duplicate values.
Here are two examples of dataframes I am working with:
```
issue_status  trading_state  reason
100           'A0'
100           None           'F'
400           None
100           None
400           None           'SL'
100           'B2'
400           None           'L'
100           None
400           'A6'
```
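For reference, the first example can be reproduced with something like the sketch below. It assumes trading_state holds real missing values (None/NaN) rather than the string 'None', and that the quoted states are plain strings; the reason column is what needs to be filled in.

```python
import pandas as pd

# Hypothetical reconstruction of the first example frame,
# taken from the table above.
df = pd.DataFrame({
    'issue_status':  [100, 100, 400, 100, 400, 100, 400, 100, 400],
    'trading_state': ['A0', None, None, None, None, 'B2', None, None, 'A6'],
})
```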
Here is what I need: three conditional rules in Python that do the following:
- when we see the first issue_status of 100 with a trading_state of None, map 'F' in the reason column.
- when we see the second-last issue_status of 400 with a trading_state of None, map 'SL' in the reason column.
- when we see the last issue_status of 400 with a trading_state of None, map 'L' in the reason column.
Here is another example:
```
issue_status  trading_state  reason
400           None           'SL'
100           'A0'
400           None           'L'
400           'A0'
100           None           'F'
100           None
```
@jezrael, I am getting the following error for your last line of code (market_info_df1['reason'] = s1.combine_first(s2)):
```
    @wraps(func)
    def outer(*args, **kwargs):
        try:
            return_value = func(*args, **kwargs)
        except NormalisedDataError:
            raise
        except Exception as exc:
            original = exc.__class__.__name__
>           raise NormalisedDataError(f'{original}: {exc}', original_exception_type=original) from exc

SettingWithCopyError:
E   A value is trying to be set on a copy of a slice from a DataFrame.
E   Try using .loc[row_indexer,col_indexer] = value instead
E
E   See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```
Any idea what is causing this?
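(For context, this error is typically raised when market_info_df1 was itself produced by slicing another DataFrame, so pandas cannot tell whether the column assignment writes to a view or a copy. A minimal sketch of that situation and one common fix, using hypothetical names since the original frames are not shown:)

```python
import pandas as pd

# Hypothetical stand-in for the frame market_info_df1 was sliced from.
source_df = pd.DataFrame({'issue_status': [100, 400, 400],
                          'trading_state': [None, None, None]})

# A slice like this can be a view of source_df, so assigning a new
# column to it can raise SettingWithCopyError when pandas is set to
# raise instead of warn.
market_info_df1 = source_df[source_df['issue_status'].eq(400)]

# Taking an explicit copy before adding columns avoids the ambiguity:
market_info_df1 = source_df[source_df['issue_status'].eq(400)].copy()
market_info_df1['reason'] = None
```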
Answer
You can filter the 400 and None rows into df1 and create a helper Series from a descending range, mapping the last and second-last positions; for the first 100 and None row use Series.duplicated; finally join both Series with Series.combine_first:
```python
import pandas as pd

#if None is string
#m1 = df['trading_state'].eq('None')
m1 = df['trading_state'].isna()          # rows where trading_state is missing
m2 = df['issue_status'].eq(400)
m3 = df['issue_status'].eq(100)

# count 400/None rows from the end: 1 -> 'L', 2 -> 'SL', others stay NaN
df1 = df[m1 & m2].copy()
s1 = pd.Series(range(len(df1), 0, -1), index=df1.index).map({1:'L', 2:'SL'})

# only the first 100/None row is not a duplicate -> 'F'
s2 = df.loc[m1 & m3, 'issue_status'].copy().duplicated().map({False:'F'})

df['reason'] = s1.combine_first(s2)
print (df)
  issue_status trading_state reason
0          100          'A0'    NaN
1          100          None      F
2          400          None    NaN
3          100          None    NaN
4          400          None     SL
5          100          'B2'    NaN
6          400          None      L
7          100          None    NaN
8          400          'A6'    NaN
```
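To see how the pieces fit together, printing the two helper Series for the first example (assuming None is a real missing value) gives values like these; combine_first then takes s1 where it is set and falls back to s2, and every index missing from both ends up as NaN in reason:

```python
print(s1)
# 2    NaN
# 4     SL
# 6      L
# dtype: object

print(s2)
# 1      F
# 3    NaN
# 7    NaN
# Name: issue_status, dtype: object
```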
For the second example:
```python
df['reason'] = s1.combine_first(s2)
print (df)
  issue_status trading_state reason
0          400          None     SL
1          100          'A0'    NaN
2          400          None      L
3          400          'A0'    NaN
4          100          None      F
5          100          None    NaN
```
If you need empty strings instead of NaN in the reason column, use:

```python
df['reason'] = df['reason'].fillna('')
```