Tag: pandas-groupby

Elegant way to write np.where for different values in a column

dataframe pandas pandas-groupby python python-3.x

I have a dataframe like the one shown below. I would like to apply two rules to the logout_date column. Rule 1: if person type is B, C, D, or E AND logout_date is NaN, then copy the login date value. Rule 2: if person type is A AND logout_date is NaN, then add 2 days to the login date. I tried
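A minimal sketch of the two rules, using boolean masks with `.loc` rather than nested `np.where` calls. The column names `person_type` and `login_date` and the sample rows are assumptions, since the original dataframe is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical sample data; only logout_date is named in the question.
df = pd.DataFrame({
    "person_type": ["A", "B", "C", "A"],
    "login_date": pd.to_datetime(
        ["2021-01-01", "2021-01-05", "2021-01-10", "2021-02-01"]
    ),
    "logout_date": pd.to_datetime(["2021-01-02", None, None, None]),
})

# Compute the masks once, before any value is filled in.
missing = df["logout_date"].isna()
is_a = df["person_type"].eq("A")
is_bcde = df["person_type"].isin(["B", "C", "D", "E"])

# Rule 1: types B-E with a missing logout get the login date.
df.loc[missing & is_bcde, "logout_date"] = df["login_date"]

# Rule 2: type A with a missing logout gets the login date plus 2 days.
df.loc[missing & is_a, "logout_date"] = df["login_date"] + pd.Timedelta(days=2)
```

The right-hand side is a full-length Series; `.loc` aligns it on the index, so only the masked rows are overwritten.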

Python Dataframe Sum and Rank the Rows based on the group they belong

dataframe numpy pandas pandas-groupby python

My df has information related to US states. I want to rank the states based on their contribution. My code: Expected Answer: Compute state_capacity by summing state values from all years. Then rank the states based on the state capacity. My approach: I am able to compute the state capacity using groupby. I ran into NaN when I mapped it to the df.
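One possible way to avoid the NaN from mapping is `groupby(...).transform("sum")`, which returns a result aligned with the original rows. This is a sketch only: the column names `state`, `year`, and `value` and the numbers are assumptions, as the original code is not included in the excerpt.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "state": ["CA", "CA", "TX", "TX", "NY"],
    "year": [2019, 2020, 2019, 2020, 2019],
    "value": [10, 20, 5, 5, 40],
})

# Sum each state's values over all years, broadcast back to every row.
df["state_capacity"] = df.groupby("state")["value"].transform("sum")

# Rank states by capacity; the largest capacity gets rank 1.
df["rank"] = (
    df["state_capacity"].rank(method="dense", ascending=False).astype(int)
)
```

Because `transform` keeps the original index, no separate `map` or `merge` step is needed.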

How to calculate 12 month rolling sum based on groupby?

pandas pandas-groupby python rolling-computation

I am trying to calculate the 12 month rolling sum for the number of orders and revenue based on a person’s name using Python for the following dataframe: In order to give the following output: The rolling sum should add up all the totals in the past 12 months grouped by the name column. I have tried the following: but
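A rough sketch of a per-name rolling sum, under the assumption that the data has exactly one row per name per month, so that a window of 12 rows corresponds to 12 months. The column names `name`, `date`, `orders`, and `revenue` and the sample values are hypothetical; the original dataframe is not shown.

```python
import pandas as pd

# Hypothetical monthly data: 14 months for Ann, 2 months for Bob.
dates = pd.date_range("2020-01-01", periods=14, freq="MS")
df = pd.DataFrame({
    "name": ["Ann"] * 14 + ["Bob"] * 2,
    "date": list(dates) + list(dates[:2]),
    "orders": 1,
    "revenue": 10.0,
})

# Rows must be in date order within each name for the window to make sense.
df = df.sort_values(["name", "date"])

for col in ["orders", "revenue"]:
    # Sum the current row and up to 11 preceding rows, separately per name.
    df[f"{col}_12m"] = df.groupby("name")[col].transform(
        lambda s: s.rolling(12, min_periods=1).sum()
    )
```

If months can be missing for some names, a row-count window would no longer match 12 calendar months and the data would need to be reindexed to a complete monthly range first.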

Using groupby and querying that group

dataframe pandas pandas-groupby python

I have a dataframe that I would like to group by one column (dadate) and then query another column (Place) to count only those with the value 1. The above is what I have tried, with the error "'DataFrameGroupBy' object has no attribute 'query'". Answer: Create the Boolean Series, then sum that within each group to see how many Places ==
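The approach described in the answer can be sketched as follows. The column names `dadate` and `Place` come from the question; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "dadate": ["2021-01-01", "2021-01-01", "2021-01-01",
               "2021-01-02", "2021-01-02"],
    "Place": [1, 1, 2, 1, 3],
})

# Boolean Series: True where Place equals 1.
is_first_place = df["Place"].eq(1)

# Summing booleans counts the True values within each dadate group.
counts = is_first_place.groupby(df["dadate"]).sum()
```

This sidesteps `query` entirely, because the filter is expressed as a Series before grouping.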

Python Rank with non numeric columns

pandas pandas-groupby python

I’m trying to find a way to do nested ranking (row number) in Python that is equivalent to the following in TSQL: I have a table that looks like this: Looking for Python equivalent to: The output to be: I’ve tried to use rank() and groupby() but I keep running into the error "No numeric types to aggregate". Is
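Since the TSQL statement is not shown, the following is only a sketch of one common pattern: a row number within each group, ordered by a string column, built with `sort_values` and `cumcount` instead of `rank()`. The column names `group` and `item` and the data are hypothetical.

```python
import pandas as pd

# Hypothetical table with non-numeric columns.
df = pd.DataFrame({
    "group": ["x", "x", "y", "x", "y"],
    "item": ["pear", "apple", "kiwi", "fig", "banana"],
})

# Sort by the partition and ordering columns, then number rows per group.
# cumcount starts at 0, so add 1 to get a 1-based row number.
df["row_number"] = (
    df.sort_values(["group", "item"]).groupby("group").cumcount() + 1
)
```

The assignment aligns on the index, so the original row order of `df` is preserved.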

Pandas append row based on conditional sum in long form

dataframe pandas pandas-groupby python sum

So, I have some sample data as such: which gives a dataframe in long form like: I want to, for each pair/grouping of location and time, conditionally sum the value column based on the value in the fruit column. Specifically: I want to sum the apple and orange but NOT the banana rows for each grouping. Resulting in the below
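A minimal sketch of the conditional sum, assuming columns named `location`, `time`, `fruit`, and `value` as described in the question. The sample values and the label `apple+orange` for the appended rows are made up for illustration.

```python
import pandas as pd

# Hypothetical long-form data.
df = pd.DataFrame({
    "location": ["north"] * 3 + ["south"] * 3,
    "time": [1] * 6,
    "fruit": ["apple", "orange", "banana"] * 2,
    "value": [1, 2, 3, 4, 5, 6],
})

# Keep only the fruits that should be summed (banana is excluded).
subset = df[df["fruit"].isin(["apple", "orange"])]

# One total per (location, time) pair, labelled so it fits the long form.
totals = (
    subset.groupby(["location", "time"], as_index=False)["value"]
    .sum()
    .assign(fruit="apple+orange")
)

# Append the totals as new rows.
out = pd.concat([df, totals], ignore_index=True)
```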

Pandas: calculate first purchase amount

calculated-columns pandas pandas-groupby python select

I need to calculate the first purchase amount for every client. This is my code: ticket.groupby(['user_reference_id','total_amount']).reference_date.min().reset_index() And I have this result: user_reference_id total_amount reference_date* I need it grouped by user_reference_id with the minimum reference_date (the first date when a customer made a purchase) and the corresponding total_amount. In this case I need the following output: reference_date 2019-06-14, user_reference_id
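One way to get the earliest purchase per client together with its amount is to sort by date and keep the first row for each `user_reference_id`. The column names come from the question; the sample rows below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample of the ticket table.
ticket = pd.DataFrame({
    "user_reference_id": [1, 1, 2, 2],
    "reference_date": pd.to_datetime(
        ["2019-06-14", "2019-07-01", "2019-08-02", "2019-08-01"]
    ),
    "total_amount": [50.0, 20.0, 30.0, 70.0],
})

# Sort by date, then keep only the earliest row for each client.
first_purchase = (
    ticket.sort_values("reference_date")
    .drop_duplicates("user_reference_id")
    .sort_values("user_reference_id")
    .reset_index(drop=True)
)
```

Grouping by both `user_reference_id` and `total_amount`, as in the question's code, produces one row per distinct amount, which is why the result is not one row per client.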

Replace duplicate value with NaN using groupby

numpy pandas pandas-groupby python

Dataset (MWE): I am trying to replace duplicates in the columns {people_vaccinated, people_fully_vaccinated, people_vaccinated_per_hundred} with NaN while using groupby() on location. I tried some solutions online, but couldn’t get them working for me, so I used the logic below instead. The above logic fails when you have consecutive nulls (more than 2). I need to replace duplicates (while keeping the first instance) with NaNs. What is the
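A possible sketch that uses `DataFrame.duplicated` on the pair (location, column) in place of an explicit `groupby()`. It assumes "duplicate" means any repeated value within the same location, not only consecutive repeats. The column names come from the question; the sample values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with runs of repeated values and nulls.
df = pd.DataFrame({
    "location": ["A"] * 6 + ["B"] * 2,
    "people_vaccinated": [10, 10, np.nan, np.nan, np.nan, 20, 10, 10],
    "people_fully_vaccinated": [1, 1, 1, 2, 2, 3, 5, 5],
    "people_vaccinated_per_hundred": [0.1, 0.1, np.nan, np.nan, np.nan,
                                      0.2, 0.5, 0.5],
})

cols = [
    "people_vaccinated",
    "people_fully_vaccinated",
    "people_vaccinated_per_hundred",
]

for col in cols:
    # True for every repeat of a (location, value) pair after the first.
    repeated = df.duplicated(subset=["location", col])
    # Replace the repeats with NaN; first occurrences are left untouched.
    df[col] = df[col].mask(repeated)
```

Existing NaNs stay NaN regardless of how many appear in a row, so runs of consecutive nulls do not need special handling.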

Groupby aggregate and transpose in pandas

dataframe pandas pandas-groupby python python-3.x

df= Of all the genres in the genre field, I only need to consider ‘Rock’, ‘Latin’, ‘Metal’, ‘Blues’ and build a new dataframe based on the following requirements: a. How many songs the singer has from that genre (the count of each genre must be in a separate column). b. Count of how many albums the singer has in the data. c. Count of

How to Efficiently Perform Multiplication within MultiIndex Groupby

multi-index pandas pandas-groupby python

I am trying to use two of my second level indices to calculate a third index. However, I can’t find an idiomatic way to do this. How can I calculate one second level index from two other second level indices? Each group has the same second level indices. My Code This produces the following data frame: What I Have Note