Instead of doing this:
    df['A'] = df['A'] if 'A' in df else None
    df['B'] = df['B'] if 'B' in df else None
    df['C'] = df['C'] if 'C' in df else None
    df['D'] = df['D'] if 'D' in df else None
    ...
I want to do this in one line or function. Below is what I tried:
    def populate_columns(df):
        col_names = ['A', 'B', 'C', 'D', 'E', 'F', ...]

        def populate_column(df, col_name):
            df[col_name] = df[col_name] if col_name in df else None
            return df[col_name]

        df[col_name] = df.apply(lambda x: populate_column(x) for x in col_names)
        return df
But I just get "Exception has occurred: ValueError". What can I do here?
Answer
Looks like you can replace your whole code with a reindex:
    ensure_cols = ['A', 'B', 'C', 'D']
    df = df.reindex(columns=df.columns.union(ensure_cols))
NB. By default the fill value is NaN; if you really want None, use fill_value=None.
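One thing to watch with the union approach: Index.union usually returns the labels sorted, so the existing column order may change. If order matters, a variant along these lines keeps the current columns where they are and appends only the missing ones. This is a minimal sketch; the helper name ensure_columns and the sample frame are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def ensure_columns(df, ensure_cols):
    """Return a new frame that has every name in ensure_cols as a column.

    Hypothetical helper: existing columns keep their order and data,
    missing ones are appended at the end and filled with NaN.
    """
    missing = [c for c in ensure_cols if c not in df.columns]
    return df.reindex(columns=[*df.columns, *missing])


# Sample data for illustration only
df = pd.DataFrame({"B": [1, 2], "A": [3, 4]})
out = ensure_columns(df, ["A", "B", "C", "D"])
print(list(out.columns))
```

Because reindex returns a new object, the original df is left untouched; assign the result back if you want to keep it.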
If you want to fix your code, just use a single loop:
    col_names = ['A', 'B', 'C', 'D']
    for c in col_names:
        if c not in df:
            df[c] = None
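Since the question asked for a function, the loop can be wrapped in one. The sketch below assumes you are happy to work on a copy rather than modify the caller's frame; the name populate_columns is borrowed from the question, while the col_names parameter and the sample data are assumptions added for illustration.

```python
import pandas as pd


def populate_columns(df, col_names):
    """Return a copy of df in which every name in col_names exists as a column."""
    out = df.copy()  # assumption: do not mutate the caller's DataFrame
    for c in col_names:
        if c not in out:
            out[c] = None  # new column, every row set to None
    return out


# Sample data for illustration only
df = pd.DataFrame({"A": [1, 2]})
result = populate_columns(df, ["A", "B", "C"])
print(list(result.columns))
```

Drop the copy() call if you would rather add the columns to the original frame in place, as the loop above does.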