I have a .csv file like this format
Then I want to convert it to
How can I do it with python pandas
Thank you
Advertisement
Answer
If you load it then you will have DataFrame
like
Y M 1 2 3 0 2019 1 A E H 1 2020 2 B F I 2 2021 3 C G J
Set multi-index usinig year
and month
df = df.set_index(['Y','M'])
1 2 3 Y M 2019 1 A E H 2020 2 B F I 2021 3 C G J
Reshape it using stack()
df = df.stack()
Y M 2019 1 1 A 2 E 3 H 2020 2 1 B 2 F 3 I 2021 3 1 C 2 G 3 J dtype: object
You can add name to column with day
df.index.set_names(['Y', 'M', 'D'], inplace=True)
Y M D 2019 1 1 A 2 E 3 H 2020 2 1 B 2 F 3 I 2021 3 1 C 2 G 3 J dtype: object
Reset index to get normal columns
df = df.reset_index()
Y M D 0 0 2019 1 1 A 1 2019 1 2 E 2 2019 1 3 H 3 2020 2 1 B 4 2020 2 2 F 5 2020 2 3 I 6 2021 3 1 C 7 2021 3 2 G 8 2021 3 3 J
Create column with dates
df['date'] = df.apply(lambda row: "{}/{}/{}".format(row['D'], row['M'], row['Y']), axis=1)
Y M D 0 date 0 2019 1 1 A 1/1/2019 1 2019 1 2 E 2/1/2019 2 2019 1 3 H 3/1/2019 3 2020 2 1 B 1/2/2020 4 2020 2 2 F 2/2/2020 5 2020 2 3 I 3/2/2020 6 2021 3 1 C 1/3/2021 7 2021 3 2 G 2/3/2021 8 2021 3 3 J 3/3/2021
You can remove year
, month
, day
df.drop(['Y', 'M','D'], axis=1, inplace=True)
0 date 0 A 1/1/2019 1 E 2/1/2019 2 H 3/1/2019 3 B 1/2/2020 4 F 2/2/2020 5 I 3/2/2020 6 C 1/3/2021 7 G 2/3/2021 8 J 3/3/2021
You can also rename column
df.rename(columns={0:'value'}, inplace=True)
value date 0 A 1/1/2019 1 E 2/1/2019 2 H 3/1/2019 3 B 1/2/2020 4 F 2/2/2020 5 I 3/2/2020 6 C 1/3/2021 7 G 2/3/2021 8 J 3/3/2021
And you can change order of columns
df = df[['date', 'value']]
date value 0 1/1/2019 A 1 2/1/2019 E 2 3/1/2019 H 3 1/2/2020 B 4 2/2/2020 F 5 3/2/2020 I 6 1/3/2021 C 7 2/3/2021 G 8 3/3/2021 J
Minimal working code
import pandas as pd data = { 'Y': [2019, 2020, 2021], 'M': [1,2,3], '1': ['A','B','C'], '2': ['E','F','G'], '3': ['H','I','J'], } df = pd.DataFrame(data) print(df) df = df.set_index(['Y','M']) print(df) df = df.stack() print(df) df.index.set_names(['Y', 'M', 'D'], inplace=True) print(df) df = df.reset_index() print(df) df['date'] = df.apply(lambda row: "{}/{}/{}".format(row['D'], row['M'], row['Y']), axis=1) print(df) df.drop(['Y', 'M','D'], axis=1, inplace=True) print(df) df.rename(columns={0:'value'}, inplace=True) print(df) df = df[['date', 'value']] print(df)