Skip to content
Advertisement

dataframe operations – column attributes to new columns in a new subset dataframe with conditions

I have the dataframe df1 with the columns type, Date and amount. My goal is to create a Dataframe df2 with a subset of dates from df1, in which each type has a column with the amounts of the type as values for the respective date.

Input Dataframe:
df1 =

,type,Date,amount
0,42,2017-02-01,4
1,42,2017-02-02,5
2,42,2017-02-03,7
3,42,2017-02-04,2
4,48,2017-02-01,6
5,48,2017-02-02,8
6,48,2017-02-03,3
7,48,2017-02-04,6
8,46,2017-02-01,3
9,46,2017-02-02,8
10,46,2017-02-03,3
11,46,2017-02-04,4

Desired Output, if the subset of Dates are 2017-02-02 and 2017-02-04:
df2 =

,Date,42,48,46
0,2017-02-02,5,8,8
1,2017-02-04,2,6,4

I tried it like this:

types = list(df1["type"].unique())
dates = ["2017-02-02","2017-02-04"]
df2 = pd.DataFrame()
df2["Date"]=dates
for t in types:
    df2[t] = df1[(df1["type"]==t)&(df1[df1["type"]==t][["Date"]]==df2["Date"])][["amount"]]

but with this solution I get a lot of NaNs, it seems my comparison condition is wrong. This is the Ouput I get:

,Date,42,48,46
0,2017-02-02,,,
1,2017-02-04,,,

Advertisement

Answer

You can use .pivot_table() and then filter data:

df2 = df1.pivot_table(
    index="Date", columns="type", values="amount", aggfunc="sum"
)
dates = ["2017-02-02", "2017-02-04"]
print(df2.loc[dates].reset_index())

Prints:

type        Date  42  46  48
0     2017-02-02   5   8   8
1     2017-02-04   2   4   6
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement