Skip to content
Advertisement

How can I use python conditionals to map columns in a dataframe with duplicates in them?

I am trying to create a mapping where there are duplicates in certain columns in a dataframe.

Here are two examples of dataframes I am working with:

issue_status        trading_state         reason
    100                  'A0'                
    100                  None               'F'
    400                  None
    100                  None
    400                  None               'SL'
    100                  'B2'
    400                  None               'L'
    100                  None 
    400                  'A6'

Here is what I need; 3 conditional python logic that does:

  • when we see the first issue_status of 100 and trading_state of None, map F in the reason column.
  • when we see the second last issue_status of 400 and trading_state of None, map SL in the reason column.
  • when we see the last issue_status of 400 and trading_state of None, map L in the reason column.

Here is another example:

 issue_status      trading_state      reason
     400                None            'SL'
     100                'A0'
     400                None            'L'
     400                'A0'
     100                None            'F'
     100                None 

@jezrael, I am getting the following error, for your last line of code (market_info_df1['reason'] = s1.combine_first(s2)):

@wraps(func)
    def outer(*args, **kwargs):
        try:
            return_value = func(*args, **kwargs)
        except NormalisedDataError:
            raise
        except Exception as exc:
            original = exc.__class__.__name__
>           raise NormalisedDataError(f'{original}: {exc}', original_exception_type=original) from exc
SettingWithCopyError: 
E           A value is trying to be set on a copy of a slice from a DataFrame.
E           Try using .loc[row_indexer,col_indexer] = value instead
E           
E           See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Any idea what is causing this?

Advertisement

Answer

You can filter by 400 and None values for df1, create helper Series with range and mapping last and second last values, for first 100 and None values use Series.duplicated, last join both Series by Series.combine_first:

#if None is string
#m1 = df['trading_state'].eq('None')
m1 = df['trading_state'].isna()

m2 = df['issue_status'].eq(400)
m3 = df['issue_status'].eq(100)

df1 = df[m1 & m2].copy()

s1 = pd.Series(range(len(df1), 0, -1), index=df1.index).map({1:'L', 2:'SL'})
s2 = df.loc[m1 & m3, 'issue_status'].copy().duplicated().map({False:'F'})

df['reason'] = s1.combine_first(s2)
print (df)
   issue_status trading_state reason
0           100          'A0'    NaN
1           100          None      F
2           400          None    NaN
3           100          None    NaN
4           400          None     SL
5           100          'B2'    NaN
6           400          None      L
7           100          None    NaN
8           400          'A6'    NaN

For second:

df['reason'] = s1.combine_first(s2)
print (df)
   issue_status trading_state reason
0           400          None     SL
1           100          'A0'    NaN
2           400          None      L
3           400          'A0'    NaN
4           100          None      F
5           100          None    NaN

If necessary empty strings in reason column use:

df['reason'] = df['reason'].fillna('')
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement