I have the dataframe df1 with the columns type, Date and amount. My goal is to create a Dataframe df2 with a subset of dates from df1, in which each type has a column with the amounts of the type as values for the respective date.
Input Dataframe:
df1 =
,type,Date,amount 0,42,2017-02-01,4 1,42,2017-02-02,5 2,42,2017-02-03,7 3,42,2017-02-04,2 4,48,2017-02-01,6 5,48,2017-02-02,8 6,48,2017-02-03,3 7,48,2017-02-04,6 8,46,2017-02-01,3 9,46,2017-02-02,8 10,46,2017-02-03,3 11,46,2017-02-04,4
Desired Output, if the subset of Dates are 2017-02-02 and 2017-02-04:
df2 =
,Date,42,48,46 0,2017-02-02,5,8,8 1,2017-02-04,2,6,4
I tried it like this:
types = list(df1["type"].unique()) dates = ["2017-02-02","2017-02-04"] df2 = pd.DataFrame() df2["Date"]=dates for t in types: df2[t] = df1[(df1["type"]==t)&(df1[df1["type"]==t][["Date"]]==df2["Date"])][["amount"]]
but with this solution I get a lot of NaNs, it seems my comparison condition is wrong. This is the Ouput I get:
,Date,42,48,46 0,2017-02-02,,, 1,2017-02-04,,,
Advertisement
Answer
You can use .pivot_table()
and then filter data:
df2 = df1.pivot_table( index="Date", columns="type", values="amount", aggfunc="sum" ) dates = ["2017-02-02", "2017-02-04"] print(df2.loc[dates].reset_index())
Prints:
type Date 42 46 48 0 2017-02-02 5 8 8 1 2017-02-04 2 4 6