I have multiple Pandas dataframes like this one (for different years):
df1=
JavaScript
x
10
10
1
Unnamed: 0 b c Monthly Flow (2018)
2
1 nan -0.041619 43.91 -0.041619
3
2 nan 0.011913 43.91 -0.041619
4
3 nan -0.048801 43.91 -0.041619
5
4 nan 0.002857 43.91 -0.041619
6
5 nan 0.002204 43.91 -0.041619
7
6 nan -0.007692 43.91 -0.041619
8
7 nan -0.014992 43.91 -0.041619
9
8 nan -0.035381 43.91 -0.041619
10
And I would like to assign to the nan
the year in the Monthly Flow (2018)
column, thus achieving this output:
JavaScript
1
10
10
1
Year b c Monthly Flow (2018)
2
1 2018 -0.041619 43.91 -0.041619
3
2 2018 0.011913 43.91 -0.041619
4
3 2018 -0.048801 43.91 -0.041619
5
4 2018 0.002857 43.91 -0.041619
6
5 2018 0.002204 43.91 -0.041619
7
6 2018 -0.007692 43.91 -0.041619
8
7 2018 -0.014992 43.91 -0.041619
9
8 2018 -0.035381 43.91 -0.041619
10
I know how to replace these nan
by a specific year, one dataframe at a time.
But, since I have a lot of dataframes (and will have more in the future), I would like to know a way to do this automatically, for example by extracting the year value from column Monthly Flow (2018)
.
Advertisement
Answer
Assuming Monthly flow is always the 5th column, you can do it like this:
JavaScript
1
4
1
import re
2
df = df.rename(columns={'Unnamed: 0': 'Year'})
3
df.iloc[:,0] = re.search('d{4}', df.columns[4]).group(0)
4
Explanation:
re.search
looks for 4 numbers in a row and extracts them from the fifth column.
I rename the Unnamed
column as Year
.
Working code:
JavaScript
1
11
11
1
import pandas as pd
2
import numpy as np
3
import re
4
df = pd.DataFrame({'Unnamed: 0': {0: np.nan},
5
'a': {0: 1},
6
'a2': {0: 1},
7
'a3': {0: 1},
8
'Monthly Flow (2018)': {0: 'b'}})
9
df = df.rename(columns={'Unnamed: 0': 'Year'})
10
df.iloc[:,0] = re.search('d{4}', df.columns[4]).group(0)
11