I have a dataframe like the one shown below. I would like to apply two rules to the logout_date column. Rule 1: if person type is B, C, D, or E AND logout_date is NaN, then copy the login date value. Rule 2: if person type is A AND logout_date is NaN, then add 2 days to the login date. I tried
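A minimal sketch of the two rules using boolean masks. The sample data and the column names person_type and login_date are assumptions, since the original dataframe is not shown; only logout_date is named in the question.

```python
import pandas as pd

# Hypothetical sample data; person_type and login_date are assumed column names.
df = pd.DataFrame({
    "person_type": ["A", "B", "C", "A", "E"],
    "login_date": pd.to_datetime(
        ["2021-01-01", "2021-01-05", "2021-01-07", "2021-02-01", "2021-02-10"]
    ),
    "logout_date": pd.to_datetime(
        ["2021-01-03", None, "2021-01-09", None, None]
    ),
})

# Compute the masks once, before any value is filled in.
missing = df["logout_date"].isna()
is_a = df["person_type"].eq("A")
copy_types = df["person_type"].isin(["B", "C", "D", "E"])

# Rule 1: types B, C, D, E with a missing logout date get the login date.
df.loc[missing & copy_types, "logout_date"] = df["login_date"]

# Rule 2: type A with a missing logout date gets the login date plus 2 days.
df.loc[missing & is_a, "logout_date"] = df["login_date"] + pd.Timedelta(days=2)
```

Rows that already have a logout date are left untouched because both assignments are restricted by the `missing` mask.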
Tag: pandas-groupby
Python Dataframe Sum and Rank the Rows based on the group they belong
My df has information on US states. I want to rank the states based on their contribution. My code: Expected answer: compute state_capacity by summing the state values from all years, then rank the states based on state capacity. My approach: I am able to compute the state capacity using groupby, but I ran into NaN when I mapped it back to the df.
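One possible approach, sketched under assumptions: the column names state, year, and value and the sample numbers are hypothetical, and state_rank is an illustrative name. Using transform keeps the result aligned with the original rows, which sidesteps the separate mapping step where the NaN values appeared.

```python
import pandas as pd

# Hypothetical sample data; the real column names are not shown in the question.
df = pd.DataFrame({
    "state": ["TX", "CA", "TX", "CA", "NY"],
    "year": [2019, 2019, 2020, 2020, 2020],
    "value": [10, 30, 15, 5, 20],
})

# Sum over all years per state, broadcast back to every row of that state.
df["state_capacity"] = df.groupby("state")["value"].transform("sum")

# Dense rank: the state with the largest capacity gets rank 1.
df["state_rank"] = (
    df["state_capacity"].rank(method="dense", ascending=False).astype(int)
)
```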
How to calculate 12 month rolling sum based on groupby?
I am trying to calculate the 12 month rolling sum for the number of orders and revenue based on a person’s name using Python for the following dataframe: In order to give the following output: The rolling sum should add up all the totals in the past 12 months grouped by the name column. I have tried the following: but
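A rough sketch of a per-name rolling sum. It assumes exactly one row per name per month, so that a window of 12 rows corresponds to 12 months; the column names date, orders, and revenue and the output names orders_12m and revenue_12m are hypothetical. If months can be missing, the data would need to be reindexed to a full monthly range first.

```python
import pandas as pd

# Hypothetical monthly data: 14 consecutive months for each of two names.
dates = pd.date_range("2020-01-01", periods=14, freq="MS")
df = pd.DataFrame({
    "name": ["Ann"] * 14 + ["Bob"] * 14,
    "date": list(dates) * 2,
    "orders": [1] * 14 + [2] * 14,
    "revenue": [10.0] * 14 + [20.0] * 14,
})

# Rows must be in chronological order within each name.
df = df.sort_values(["name", "date"]).reset_index(drop=True)

# Rolling window of 12 rows per name; min_periods=1 gives partial sums
# for the first 11 months instead of NaN.
rolled = (
    df.groupby("name")[["orders", "revenue"]]
    .rolling(window=12, min_periods=1)
    .sum()
    .reset_index(level=0, drop=True)  # drop the group level, keep the row index
)

df["orders_12m"] = rolled["orders"]
df["revenue_12m"] = rolled["revenue"]
```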
Using groupby and querying that group
I have a dataframe that I would like to group by one column (dadate) and then query another column (Place) to count only those with the value 1. The above is what I have tried, with the error "'DataFrameGroupBy' object has no attribute 'query'". Answer: Create the Boolean Series, then sum that within each group to see how many Places ==
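The idea in the answer can be sketched as follows. The column names dadate and Place come from the question; the sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "dadate": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-02"],
    "Place": [1, 2, 1, 1, 3],
})

# Boolean Series: True where Place equals 1.
place_is_one = df["Place"].eq(1)

# Summing booleans counts the True values within each dadate group.
counts = place_is_one.groupby(df["dadate"]).sum()
```

Grouping the Boolean Series by `df["dadate"]` avoids calling query on the groupby object, which is what raised the error.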
Python Rank with non numeric columns
I’m trying to find a way to do nested ranking (row number) in Python that is equivalent to the following in T-SQL: I have a table that looks like this: Looking for the Python equivalent to: The output should be: I’ve tried to use rank() and groupby(), but I keep running into the error "No numeric types to aggregate". Is
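A sketch of a row number within groups that works on non-numeric columns. The T-SQL is not shown, so this assumes it is a ROW_NUMBER() with a PARTITION BY and an ORDER BY; the column names customer and product are hypothetical. Sorting first and then using cumcount avoids rank() on strings altogether.

```python
import pandas as pd

# Hypothetical table with only string columns.
df = pd.DataFrame({
    "customer": ["x", "x", "y", "x", "y"],
    "product": ["pear", "apple", "kiwi", "fig", "apple"],
})

# Sort by the partition column, then by the ordering column.
df = df.sort_values(["customer", "product"]).reset_index(drop=True)

# cumcount numbers rows within each group starting at 0, so add 1.
df["row_num"] = df.groupby("customer").cumcount() + 1
```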
Pandas append row based on conditional sum in long form
So, I have some sample data as such: which gives a dataframe in long form like: I want to, for each pair/grouping of location and time, conditionally sum the value column based on the value in the fruit column. Specifically: I want to sum the apple and orange but NOT the banana rows for each grouping. Resulting in the below
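A minimal sketch of the conditional sum. The column names location, time, fruit, and value follow the question; the sample values and the label "apple+orange" for the appended rows are assumptions.

```python
import pandas as pd

# Made-up long-form data using the column names from the question.
df = pd.DataFrame({
    "location": ["north", "north", "north", "south", "south", "south"],
    "time": [1, 1, 1, 1, 1, 1],
    "fruit": ["apple", "orange", "banana"] * 2,
    "value": [3, 4, 5, 1, 2, 9],
})

# Keep only the fruits that should be summed, then total per group.
subset = df[df["fruit"].isin(["apple", "orange"])]
totals = subset.groupby(["location", "time"], as_index=False)["value"].sum()
totals["fruit"] = "apple+orange"  # hypothetical label for the new rows

# Append the totals as extra rows and regroup them with their location/time.
result = (
    pd.concat([df, totals], ignore_index=True)
    .sort_values(["location", "time"])
    .reset_index(drop=True)
)
```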
Pandas: calculate first purchase amount
I need to calculate the first purchase amount for every client. This is my code: ticket.groupby(['user_reference_id', 'total_amount']).reference_date.min().reset_index() And I have this result: user_reference_id total_amount reference_date I need it grouped by user_reference_id with the minimum reference_date (the first date on which a customer made a purchase) and the corresponding total_amount. In this case I need the following output: reference_date 2019-06-14, user_reference_id
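One way to get the row of the earliest purchase per client, sketched with made-up sample values. The column names come from the question. Grouping only by user_reference_id and using idxmin returns the index label of the earliest reference_date, which can then be used to pull the matching total_amount.

```python
import pandas as pd

# Made-up sample data using the column names from the question.
ticket = pd.DataFrame({
    "user_reference_id": [1, 1, 2, 2],
    "total_amount": [50.0, 20.0, 70.0, 10.0],
    "reference_date": pd.to_datetime(
        ["2019-07-01", "2019-06-14", "2019-05-02", "2019-08-09"]
    ),
})

# Index label of the earliest purchase for each client.
first_idx = ticket.groupby("user_reference_id")["reference_date"].idxmin()

# Select those rows, keeping the date, the client, and the amount.
first_purchase = ticket.loc[
    first_idx, ["reference_date", "user_reference_id", "total_amount"]
].reset_index(drop=True)
```

Including total_amount in the groupby keys, as in the original attempt, creates one group per distinct amount, which is why it returns more than one row per client.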
Replace duplicate value with NaN using groupby
Dataset (MWE) I am trying to replace duplicates in the columns {people_vaccinated, people_fully_vaccinated, people_vaccinated_per_hundred} with NaN while using groupby() on location. I tried some solutions online but couldn’t get them working, so I used the logic below instead: The above logic fails when there are consecutive nulls (more than 2). I need to replace duplicates with NaN while keeping the first instance. What is the
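A sketch of one way to blank out repeated values per location while keeping the first occurrence. The sample values are invented and only two of the three columns are shown for brevity. It assumes "duplicate" means any value already seen earlier within the same location, not only consecutive repeats.

```python
import pandas as pd

# Invented sample data; column names follow the question.
df = pd.DataFrame({
    "location": ["A", "A", "A", "A", "B", "B", "B"],
    "people_vaccinated": [10.0, 10.0, 10.0, 20.0, 10.0, 10.0, 30.0],
    "people_fully_vaccinated": [1.0, 1.0, 2.0, 2.0, 1.0, 3.0, 3.0],
})

cols = ["people_vaccinated", "people_fully_vaccinated"]

for col in cols:
    # True for every repeat of a value within the same location;
    # the first occurrence stays False.
    is_dup = (
        df.groupby("location")[col]
        .transform(lambda s: s.duplicated())
        .astype(bool)
    )
    # mask replaces the flagged entries with NaN.
    df[col] = df[col].mask(is_dup)
```

Because duplicated() is evaluated per group, the value 10.0 in location B is kept even though it also appears in location A.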
Groupby aggregate and transpose in pandas
df= Of all the genres in the genre field, I only need to consider 'Rock', 'Latin', 'Metal', and 'Blues', and build a new dataframe based on the following requirements: a. How many songs the singer has from that genre (the count for each genre must be in a separate column). b. Count of how many albums the singer has in the data. c. Count of
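A sketch covering requirements a and b only, since requirement c is cut off. The column names singer and album, the sample rows, and the output name album_count are assumptions; genre is the field named in the question. It also assumes one row per song and that albums are counted over all rows, not only the four selected genres.

```python
import pandas as pd

# Hypothetical sample data, one row per song.
df = pd.DataFrame({
    "singer": ["S1", "S1", "S1", "S2", "S2", "S2"],
    "genre": ["Rock", "Rock", "Pop", "Latin", "Metal", "Metal"],
    "album": ["a1", "a2", "a2", "b1", "b1", "b1"],
})

wanted = ["Rock", "Latin", "Metal", "Blues"]
kept = df[df["genre"].isin(wanted)]

# a. Songs per singer and genre, one column per genre.
#    reindex adds a zero column for genres with no songs at all.
genre_counts = (
    kept.groupby(["singer", "genre"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=wanted, fill_value=0)
)

# b. Number of distinct albums per singer in the full data.
album_counts = df.groupby("singer")["album"].nunique().rename("album_count")

summary = genre_counts.join(album_counts).reset_index()
```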
How to Efficiently Perform Multiplication within MultiIndex Groupby
I am trying to use two of my second level indices to calculate a third index. However, I can’t find an idiomatic way to do this. How can I calculate one second level index from two other second level indices? Each group has the same second level indices. My Code This produces the following data frame: What I Have Note
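A sketch of one way to derive a new second-level entry from two existing ones. The original data frame is not shown, so the level names group and field, the labels price, quantity, and total, and the values are all hypothetical. Taking cross-sections with xs drops the second level, so the two slices share the same first-level index and multiply element-wise without an explicit loop over groups.

```python
import pandas as pd

# Hypothetical frame: every group has the same second-level labels.
idx = pd.MultiIndex.from_product(
    [["g1", "g2"], ["price", "quantity"]], names=["group", "field"]
)
df = pd.DataFrame(
    {"x": [2.0, 3.0, 4.0, 5.0], "y": [1.0, 10.0, 2.0, 20.0]}, index=idx
)

# Cross-sections on the second level; both are indexed by group only.
price = df.xs("price", level="field")
quantity = df.xs("quantity", level="field")

# Element-wise product, aligned on group and column.
total = price * quantity

# Put the second level back so the result can be stacked under df.
total["field"] = "total"
total = total.set_index("field", append=True)

result = pd.concat([df, total]).sort_index()
```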