I have the dataframe df1 with the columns type, Date and amount. My goal is to create a Dataframe df2 with a subset of dates from df1, in which each type has a column with the amounts of the type as values for the respective date.
Input Dataframe:
df1 =
JavaScript
x
14
14
1
,type,Date,amount
2
0,42,2017-02-01,4
3
1,42,2017-02-02,5
4
2,42,2017-02-03,7
5
3,42,2017-02-04,2
6
4,48,2017-02-01,6
7
5,48,2017-02-02,8
8
6,48,2017-02-03,3
9
7,48,2017-02-04,6
10
8,46,2017-02-01,3
11
9,46,2017-02-02,8
12
10,46,2017-02-03,3
13
11,46,2017-02-04,4
14
Desired Output, if the subset of Dates are 2017-02-02 and 2017-02-04:
df2 =
JavaScript
1
4
1
,Date,42,48,46
2
0,2017-02-02,5,8,8
3
1,2017-02-04,2,6,4
4
I tried it like this:
JavaScript
1
7
1
types = list(df1["type"].unique())
2
dates = ["2017-02-02","2017-02-04"]
3
df2 = pd.DataFrame()
4
df2["Date"]=dates
5
for t in types:
6
df2[t] = df1[(df1["type"]==t)&(df1[df1["type"]==t][["Date"]]==df2["Date"])][["amount"]]
7
but with this solution I get a lot of NaNs, it seems my comparison condition is wrong. This is the Ouput I get:
JavaScript
1
4
1
,Date,42,48,46
2
0,2017-02-02,,,
3
1,2017-02-04,,,
4
Advertisement
Answer
You can use .pivot_table()
and then filter data:
JavaScript
1
6
1
df2 = df1.pivot_table(
2
index="Date", columns="type", values="amount", aggfunc="sum"
3
)
4
dates = ["2017-02-02", "2017-02-04"]
5
print(df2.loc[dates].reset_index())
6
Prints:
JavaScript
1
4
1
type Date 42 46 48
2
0 2017-02-02 5 8 8
3
1 2017-02-04 2 4 6
4