I have a .csv file in this format:
I want to convert it to this:
How can I do this with Python pandas?
Thank you.
Answer
If you load the file, you will get a DataFrame like this:
          Y  M  1  2  3
    0  2019  1  A  E  H
    1  2020  2  B  F  I
    2  2021  3  C  G  J
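As a minimal sketch of the loading step, assuming the file has a header row `Y,M,1,2,3`: the CSV text below is a stand-in reconstructed from the DataFrame printed above, not the actual file from the question.

```python
import io

import pandas as pd

# Stand-in for the real file; its contents are an assumption
# reconstructed from the DataFrame printed above.
csv_text = """Y,M,1,2,3
2019,1,A,E,H
2020,2,B,F,I
2021,3,C,G,J
"""

# With a real file you would pass its path instead of a StringIO buffer.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Note that column labels read from a CSV header are strings, so the day columns are `'1'`, `'2'`, `'3'` rather than integers.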
Set a multi-index using the year (Y) and month (M) columns:
    df = df.set_index(['Y', 'M'])

Result:

            1  2  3
    Y    M
    2019 1  A  E  H
    2020 2  B  F  I
    2021 3  C  G  J
Reshape it using stack()
    df = df.stack()

Result:

    Y     M
    2019  1  1    A
             2    E
             3    H
    2020  2  1    B
             2    F
             3    I
    2021  3  1    C
             2    G
             3    J
    dtype: object
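To see what `stack()` did, here is a small sketch. The frame is rebuilt from the example values above, and the string column labels and the names `wide` and `stacked` are assumptions for illustration. The former column labels become a third, unnamed index level, and the result is a Series rather than a DataFrame.

```python
import pandas as pd

# Example values copied from the frame above; string column labels are an assumption.
wide = pd.DataFrame({
    'Y': [2019, 2020, 2021],
    'M': [1, 2, 3],
    '1': ['A', 'B', 'C'],
    '2': ['E', 'F', 'G'],
    '3': ['H', 'I', 'J'],
}).set_index(['Y', 'M'])

stacked = wide.stack()

print(type(stacked).__name__)        # Series
print(stacked.index.names)           # the third level has no name yet
print(stacked.loc[(2019, 1, '2')])   # value for year 2019, month 1, column '2'
```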
You can give a name to the new index level that holds the day:
    df.index.set_names(['Y', 'M', 'D'], inplace=True)

Result:

    Y     M  D
    2019  1  1    A
             2    E
             3    H
    2020  2  1    B
             2    F
             3    I
    2021  3  1    C
             2    G
             3    J
    dtype: object
Reset the index to turn the levels back into normal columns:
    df = df.reset_index()

Result:

          Y  M  D  0
    0  2019  1  1  A
    1  2019  1  2  E
    2  2019  1  3  H
    3  2020  2  1  B
    4  2020  2  2  F
    5  2020  2  3  I
    6  2021  3  1  C
    7  2021  3  2  G
    8  2021  3  3  J
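The steps up to this point can also be written as one chain. This is a sketch of an alternative, not the method used in this answer: `rename_axis` names the three index levels, and `reset_index(name=...)` names the value column directly, so it does not end up labelled `0`. The names `wide` and `long` are illustrative.

```python
import pandas as pd

# Example values copied from the frame above; string column labels are an assumption.
wide = pd.DataFrame({
    'Y': [2019, 2020, 2021],
    'M': [1, 2, 3],
    '1': ['A', 'B', 'C'],
    '2': ['E', 'F', 'G'],
    '3': ['H', 'I', 'J'],
})

long = (
    wide.set_index(['Y', 'M'])
        .stack()
        .rename_axis(['Y', 'M', 'D'])   # name all three index levels
        .reset_index(name='value')      # Series values become a 'value' column
)
print(long)
```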
Create a column with dates:
    df['date'] = df.apply(lambda row: "{}/{}/{}".format(row['D'], row['M'], row['Y']), axis=1)

Result:

          Y  M  D  0      date
    0  2019  1  1  A  1/1/2019
    1  2019  1  2  E  2/1/2019
    2  2019  1  3  H  3/1/2019
    3  2020  2  1  B  1/2/2020
    4  2020  2  2  F  2/2/2020
    5  2020  2  3  I  3/2/2020
    6  2021  3  1  C  1/3/2021
    7  2021  3  2  G  2/3/2021
    8  2021  3  3  J  3/3/2021
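The row-wise `apply()` can be slow on large frames. As a hedged alternative, the same `D/M/Y` strings can be built column-wise. The sketch below uses a small frame copied from the first rows of the example, and the extra `timestamp` column is an optional addition that is not part of the original answer.

```python
import pandas as pd

# First three rows of the long-format frame shown above (values copied from the example).
df = pd.DataFrame({
    'Y': [2019, 2019, 2019],
    'M': [1, 1, 1],
    'D': [1, 2, 3],
    0: ['A', 'E', 'H'],
})

# Same "D/M/Y" strings as the apply() version, built column-wise.
df['date'] = df['D'].astype(str) + '/' + df['M'].astype(str) + '/' + df['Y'].astype(str)

# Optional: real timestamps. pd.to_datetime assembles dates from
# columns named year/month/day, so the columns are renamed first.
parts = df[['Y', 'M', 'D']].rename(columns={'Y': 'year', 'M': 'month', 'D': 'day'})
df['timestamp'] = pd.to_datetime(parts)
print(df)
```

Keeping a real datetime column is useful if you later need to sort or filter by date; the plain strings are enough if you only want to write the result back to a CSV.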
You can then remove the year, month, and day columns:
    df.drop(['Y', 'M', 'D'], axis=1, inplace=True)

Result:

       0      date
    0  A  1/1/2019
    1  E  2/1/2019
    2  H  3/1/2019
    3  B  1/2/2020
    4  F  2/2/2020
    5  I  3/2/2020
    6  C  1/3/2021
    7  G  2/3/2021
    8  J  3/3/2021
You can also rename the value column:
    df.rename(columns={0: 'value'}, inplace=True)

Result:

      value      date
    0     A  1/1/2019
    1     E  2/1/2019
    2     H  3/1/2019
    3     B  1/2/2020
    4     F  2/2/2020
    5     I  3/2/2020
    6     C  1/3/2021
    7     G  2/3/2021
    8     J  3/3/2021
And you can change the order of the columns:
    df = df[['date', 'value']]

Result:

           date value
    0  1/1/2019     A
    1  2/1/2019     E
    2  3/1/2019     H
    3  1/2/2020     B
    4  2/2/2020     F
    5  3/2/2020     I
    6  1/3/2021     C
    7  2/3/2021     G
    8  3/3/2021     J
Minimal working code
    import pandas as pd

    data = {
        'Y': [2019, 2020, 2021],
        'M': [1, 2, 3],
        '1': ['A', 'B', 'C'],
        '2': ['E', 'F', 'G'],
        '3': ['H', 'I', 'J'],
    }

    df = pd.DataFrame(data)
    print(df)

    # Move year and month into a multi-index
    df = df.set_index(['Y', 'M'])
    print(df)

    # Turn the day columns into a third index level
    df = df.stack()
    print(df)

    # Name the new level
    df.index.set_names(['Y', 'M', 'D'], inplace=True)
    print(df)

    # Convert the index levels back into columns
    df = df.reset_index()
    print(df)

    # Build a day/month/year string for every row
    df['date'] = df.apply(lambda row: "{}/{}/{}".format(row['D'], row['M'], row['Y']), axis=1)
    print(df)

    # Drop the columns that are no longer needed
    df.drop(['Y', 'M', 'D'], axis=1, inplace=True)
    print(df)

    # The stacked values end up in a column labelled 0
    df.rename(columns={0: 'value'}, inplace=True)
    print(df)

    # Put the date first
    df = df[['date', 'value']]
    print(df)
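For comparison, here is a shorter sketch that uses `melt()` instead of `set_index()` plus `stack()`. It is an alternative technique, not the approach of the answer above, and the names `long` and `result` are illustrative. It assumes the day column labels can be converted to integers.

```python
import pandas as pd

data = {
    'Y': [2019, 2020, 2021],
    'M': [1, 2, 3],
    '1': ['A', 'B', 'C'],
    '2': ['E', 'F', 'G'],
    '3': ['H', 'I', 'J'],
}
df = pd.DataFrame(data)

# Unpivot the day columns: one row per (Y, M, D) combination.
long = df.melt(id_vars=['Y', 'M'], var_name='D', value_name='value')

# Day labels are strings here; convert them so sorting is numeric, not lexicographic.
long['D'] = long['D'].astype(int)

# melt() lists all of column '1' first, then '2', and so on;
# sort to get the same row order as the stack() version.
long = long.sort_values(['Y', 'M', 'D'], ignore_index=True)

long['date'] = long['D'].astype(str) + '/' + long['M'].astype(str) + '/' + long['Y'].astype(str)
result = long[['date', 'value']]
print(result)
```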