
How to use pd.apply() to instantiate new columns?

Instead of doing this:

df['A'] = df['A'] if 'A' in df else None
df['B'] = df['B'] if 'B' in df else None
df['C'] = df['C'] if 'C' in df else None
df['D'] = df['D'] if 'D' in df else None
...

I want to do this in one line or function. Below is what I tried:

def populate_columns(df):
        
    col_names = ['A', 'B', 'C', 'D', 'E', 'F', ...]
               
    def populate_column(df, col_name):
        df[col_name] = df[col_name] if col_name in df else None
        return df[col_name]
        
    df[col_name] = df.apply(lambda x: populate_column(x) for x in col_names)    
    return df

But I just get Exception has occurred: ValueError. What can I do here?

Answer

Looks like you can replace your whole code with a reindex:

ensure_cols = ['A', 'B', 'C', 'D']
df = df.reindex(columns=df.columns.union(ensure_cols))

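NB. By default the fill value is NaN. Passing fill_value=None may not change that, since pandas can treat None the same as the default here. If you really need None, assign it explicitly, as in the loop below.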
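
To make the reindex behaviour concrete, here is a small sketch on a made-up frame (the data and the column "X" are just assumptions for illustration). It also shows that Index.union sorts the column labels by default, so the original column order is not kept. The second variant is one possible way to keep the existing order and append only the missing names.

```python
import pandas as pd

# Hypothetical example frame: has "B" and an extra column "X", lacks "A", "C", "D"
df = pd.DataFrame({"B": [1, 2], "X": [3, 4]})
ensure_cols = ["A", "B", "C", "D"]

# Variant 1: union of existing and required columns (labels come back sorted)
out_sorted = df.reindex(columns=df.columns.union(ensure_cols))

# Variant 2: keep the existing order, append only the missing names
missing = [c for c in ensure_cols if c not in df.columns]
out_ordered = df.reindex(columns=[*df.columns, *missing])

print(list(out_sorted.columns))
print(list(out_ordered.columns))
```

In both cases the existing columns keep their values and the newly created ones are filled with NaN.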

If you want to fix your code, just use a single loop:

col_names = ['A', 'B', 'C', 'D']
for c in col_names:
    if c not in df:
        df[c] = None
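
Since the question asked for a function, the loop above could be wrapped like this. This is only a sketch: the signature (passing col_names as an argument) and the choice to work on a copy rather than modify the caller's frame are my assumptions, not something from the question.

```python
import pandas as pd

def populate_columns(df, col_names):
    """Return a copy of df in which every name in col_names exists as a column."""
    out = df.copy()  # assumption: leave the caller's DataFrame untouched
    for c in col_names:
        if c not in out:
            out[c] = None  # new column, every row set to None
    return out
```

Usage would then be df = populate_columns(df, ['A', 'B', 'C', 'D']). Existing columns are left as they are, and missing ones are appended at the end in the order given.
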
User contributions licensed under: CC BY-SA