Skip to content
Advertisement

Better (maybe more SQL-ish) way to populate pandas dataframe column from row and meta data than iterating over rows, please

My data looks like this:

import pandas as pd
codes = ['a', 'b', 'c']
to_flatten = [[c] * 6 for c in codes]
nums = [1, 2, 3, 4, 5, 6]
data_df = pd.DataFrame({'code': [item for subl in to_flatten for item in subl], 'num': nums * 3})

Because I used a pandas.groupby() process to generate my metadata, it looks like this:

meta_df = pd.DataFrame({'code': codes, 'start_at': [3, 2, 2], 'end_at': [5, 3, 2]})

Now, if my metadata looked like:

meta_dict = {'a': (3, 5), 'b': (1, 3), 'c': (2, 2)}

I could easily write:

def f(row):
    start_at, end_at = meta_dict[row['code']]
    if start_at <= row['num'] <= end_at:
        return 1
    return 0
data_df['valid'] = data_df.apply(f, axis=1)
print(data_df)

I feel that there should be a different, pandas-oriented, way to directly use the metadata in the meta_df dataframe format that I have, and that it’ll probably be more efficient than looping through the whole of the data_df. But what?

Is there a better approach, or should I just wrangle the values for each code out of meta_df and iterate with my function f() upon the rows of data_df?

Advertisement

Answer

dfn = pd.merge(data_df, meta_df, on='code', how='left')
dfn['valid'] = np.where(dfn['num'].between(dfn.start_at, dfn.end_at), 1, 0)
print(dfn)
   code  num  start_at  end_at  valid
0     a    1         3       5      0
1     a    2         3       5      0
2     a    3         3       5      1
3     a    4         3       5      1
4     a    5         3       5      1
5     a    6         3       5      0
6     b    1         2       3      0
7     b    2         2       3      1
8     b    3         2       3      1
9     b    4         2       3      0
10    b    5         2       3      0
11    b    6         2       3      0
12    c    1         2       2      0
13    c    2         2       2      1
14    c    3         2       2      0
15    c    4         2       2      0
16    c    5         2       2      0
17    c    6         2       2      0
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement