I have multiple Pandas dataframes like this one (for different years):
df1=
   Unnamed: 0          b      c  Monthly Flow (2018)
1         nan  -0.041619  43.91            -0.041619
2         nan   0.011913  43.91            -0.041619
3         nan  -0.048801  43.91            -0.041619
4         nan   0.002857  43.91            -0.041619
5         nan   0.002204  43.91            -0.041619
6         nan  -0.007692  43.91            -0.041619
7         nan  -0.014992  43.91            -0.041619
8         nan  -0.035381  43.91            -0.041619
I would like to replace the nan values with the year that appears in the name of the Monthly Flow (2018) column, to get this output:
   Year          b      c  Monthly Flow (2018)
1  2018  -0.041619  43.91            -0.041619
2  2018   0.011913  43.91            -0.041619
3  2018  -0.048801  43.91            -0.041619
4  2018   0.002857  43.91            -0.041619
5  2018   0.002204  43.91            -0.041619
6  2018  -0.007692  43.91            -0.041619
7  2018  -0.014992  43.91            -0.041619
8  2018  -0.035381  43.91            -0.041619
I know how to replace these nan values with a specific year, one dataframe at a time.
But since I have a lot of dataframes (and will have more in the future), I would like a way to do this automatically, for example by extracting the year from the column name Monthly Flow (2018).
Answer
Assuming the Monthly Flow column is always the 5th column, you can do it like this:
import re

df = df.rename(columns={'Unnamed: 0': 'Year'})
# \d{4} matches the four-digit year in the name of the 5th column (index 4)
df['Year'] = int(re.search(r'\d{4}', df.columns[4]).group(0))
Explanation:
re.search looks for four consecutive digits in the name of the fifth column (df.columns[4]), and .group(0) returns them as a string.
I also rename the Unnamed: 0 column to Year.
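To see the regex step on its own, here is a minimal sketch that applies it to the column label from the question. The variable names are only illustrative, and the None fallback for a label without a year is my own addition.

```python
import re

# Column label taken from the question
column_name = "Monthly Flow (2018)"

# \d{4} matches the first run of four consecutive digits in the label
match = re.search(r"\d{4}", column_name)

# re.search returns None when nothing matches, so guard before calling .group()
year = int(match.group(0)) if match else None
print(year)  # 2018
```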
Working code:
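import pandas as pd
import numpy as np
import re

# Minimal example frame: an empty first column and the year in the fifth column's name
df = pd.DataFrame({'Unnamed: 0': {0: np.nan}, 'a': {0: 1}, 'a2': {0: 1}, 'a3': {0: 1}, 'Monthly Flow (2018)': {0: 'b'}})
df = df.rename(columns={'Unnamed: 0': 'Year'})
# \d{4} matches the four-digit year in the name of the 5th column (index 4)
df['Year'] = int(re.search(r'\d{4}', df.columns[4]).group(0))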
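Since the question is about many dataframes, the same idea could be wrapped in a function and applied in a loop. This is only a sketch: fill_year and its prefix parameter are hypothetical names, the sample frames are made up, and it assumes each frame has exactly one column whose name starts with "Monthly Flow" and contains a four-digit year. It looks the column up by name, so it does not depend on the column being in the 5th position.

```python
import re

import numpy as np
import pandas as pd


def fill_year(df, prefix="Monthly Flow"):
    """Return a copy of df with 'Unnamed: 0' renamed to 'Year' and set to the year
    found in the first column label that starts with `prefix` (hypothetical helper)."""
    for name in df.columns:
        match = re.search(r"\d{4}", str(name))
        if str(name).startswith(prefix) and match:
            year = int(match.group(0))
            break
    else:
        # No column label matched; fail loudly rather than guess a year
        raise ValueError(f"no '{prefix} (YYYY)' column found")
    out = df.rename(columns={"Unnamed: 0": "Year"})
    out["Year"] = year
    return out


# Made-up frames standing in for the real yearly data
frames = [
    pd.DataFrame({
        "Unnamed: 0": [np.nan, np.nan],
        "b": [0.1, 0.2],
        f"Monthly Flow ({y})": [1.0, 1.0],
    })
    for y in (2018, 2019)
]

filled = [fill_year(f) for f in frames]
print(filled[1])
```

If the dataframes are kept in a dict instead of a list, the same function can be applied with a dict comprehension.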