Tag: pandas-groupby

How to groupby and calculate new field with python pandas?

I’d like to group by a specific column, ‘Fruit’, within a data frame and calculate the percentage of each fruit that is ‘Good’. See below for my initial dataframe. See below for my desired output data frame. Note: Because there is 1 “Good” Apple and 1 “Bad” Apple, the percentage of Good Apples is 50%. See below
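A minimal sketch of one way to compute such a percentage, assuming the quality label lives in a column named 'Quality' (that column name, the sample rows, and the output column 'Pct_Good' are assumptions, since the excerpt does not show the data). It relies on the fact that the mean of a boolean column is the fraction of True values.

```python
import pandas as pd

# Hypothetical sample data; only the 'Fruit' column name comes from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Banana", "Banana", "Banana"],
    "Quality": ["Good", "Bad", "Good", "Good", "Bad"],
})

# Flag the 'Good' rows, then average the flags within each fruit.
# The mean of True/False values is the share of 'Good' rows in the group.
pct_good = (
    df["Quality"].eq("Good")
    .groupby(df["Fruit"])
    .mean()
    .mul(100)
    .rename("Pct_Good")
    .reset_index()
)

print(pct_good)  # Apple has 1 Good out of 2 rows, so its Pct_Good is 50.0
```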

Get values from dataframe with MultiIndex index containing NaNs

multi-index pandas pandas-groupby python

I cannot access the values at an index position that has a NaN in it, and I wonder how I could solve this. (In my project this index has a very special meaning and I really need to keep it, otherwise I would need to make some dirty manual modifications: “there is always a solution” even if it is a very

Apply multiple criteria to select current and prior row – Pandas

dataframe pandas pandas-groupby python series

I have a dataframe like the one shown below. I would like to select rows based on the criteria below. Criteria 1: pick all rows where source-system = I. Criteria 2: pick the prior row (n-1) only when the source-system of the (n-1)th row is O and diff is zero. Criteria 2 should be applied only when the nth row has source-system =

perform df.loc to groupby df

dataframe pandas pandas-groupby python

I have a df consisting of person, origin and destination. The df: I have grouped the df with df_grouped = df.groupby(['O','D']) and matched it with another dataframe, taxi. Similarly, I grouped taxi by its O and D. Then I merged them after aggregating and counting the PersonID and TaxiID per O-D pair. I did it to see how
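The steps described above (count per O-D pair in each frame, then merge) could be sketched as follows. The column names O, D, PersonID and TaxiID come from the question; the sample rows, the output column names n_persons and n_taxis, and the choice of an outer merge are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data using the column names mentioned in the question.
df = pd.DataFrame({
    "PersonID": [1, 2, 3, 4],
    "O": ["A", "A", "B", "B"],
    "D": ["X", "X", "Y", "Z"],
})
taxi = pd.DataFrame({
    "TaxiID": [10, 11, 12],
    "O": ["A", "B", "B"],
    "D": ["X", "Y", "Y"],
})

# Count persons and taxis per origin-destination pair.
person_counts = df.groupby(["O", "D"], as_index=False).agg(n_persons=("PersonID", "count"))
taxi_counts = taxi.groupby(["O", "D"], as_index=False).agg(n_taxis=("TaxiID", "count"))

# Outer merge keeps O-D pairs that appear in only one frame;
# a missing count becomes 0 (the filled column is float as a result).
od = person_counts.merge(taxi_counts, on=["O", "D"], how="outer").fillna(0)
print(od)
```

Working on the merged, flat frame means ordinary df.loc row selection applies afterwards, which is usually simpler than indexing into a GroupBy object.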

How to get all last rows at second level in MultiIndex DataFrame whose second level has variable length

dataframe indexing pandas pandas-groupby python

I have this dataframe: I want to keep the last second-level row of each group, meaning that: for thread_id==0 I want to keep the row message_id_in_thread==1; for thread_id==1 I want to keep the row message_id_in_thread==2; for thread_id==2 I want to keep the row message_id_in_thread==1. This can easily be achieved by doing df.iterrows(), but I would like to know if there
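A minimal sketch of one vectorized alternative to iterating, assuming the two index levels are named thread_id and message_id_in_thread as in the question and that rows are already ordered by message within each thread. The 'text' column and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical frame: threads 0, 1 and 2 have 2, 3 and 2 messages respectively.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)],
    names=["thread_id", "message_id_in_thread"],
)
df = pd.DataFrame({"text": list("abcdefg")}, index=index)

# tail(1) keeps the final row of every group and preserves the original MultiIndex.
last_rows = df.groupby(level="thread_id").tail(1)
print(last_rows)  # rows (0, 1), (1, 2) and (2, 1)
```

If the rows are not guaranteed to be ordered, calling df.sort_index() first would make the "last" row well defined.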

Pandas groupby collapse 1st rows of group

pandas pandas-groupby python

I have a system that lets me export data in a table of this format: where there are many columns like ‘data’ and they can have any values that don’t necessarily follow a pattern. I need to get the data into this format: I’ve tried reading the documentation on groupby and searching similar questions, but I can’t find a

Group by Issue with Years Pandas

dataframe pandas pandas-groupby python

I’m following the answer to this StackOverflow post to group a column of years by decade, to make it easier for me to visualize later, but I’m not getting the same results. It seems that when DSM did it, it yielded integers for years, while mine is yielding floats for years. I’ve implemented: My Results: Picture of Results Answer You
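The excerpt does not show the data, so the cause here is a guess: one common reason for float decades is a missing value in the year column, which makes pandas store the whole column as float64. The sketch below illustrates that situation under this assumption, with a hypothetical 'year' series, and shows a conversion to the nullable Int64 dtype.

```python
import pandas as pd

# Hypothetical data: a single missing year forces the column to float64.
years = pd.Series([1987, 1994, None, 2003], name="year")

# Floor each year to its decade; the result stays float because of the NaN.
decade_float = (years // 10) * 10

# The nullable integer dtype keeps whole numbers and represents the gap as <NA>.
decade_int = decade_float.astype("Int64")

print(decade_float.dtype, decade_int.dtype)
```

Dropping or filling the missing years before the division would be another option when the gaps carry no meaning.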

Shift column position to right based on criteria using Pandas

dataframe pandas pandas-groupby python python-3.x

I have a dataframe that looks like the one below. I would like to shift the position by 1 cell to the right if there is NA in the column dep_id. I tried the below, but it wasn’t working. Is there any efficient and elegant approach to shift column position on big data? For example, I expect my output to be as shown below

Applying abbreviation to the column of a dataframe based on another column of the same dataframe

nlp pandas pandas-groupby python text-classification

I have two columns in the dataframe, one of which is a class and the other a description. In the description I have some abbreviations. I want to expand these abbreviations based on the class value. I have a dictionary with the class as key, and in the value I have another dictionary with abbreviations and their full forms. Since these
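A rough sketch of the lookup described above, assuming columns named 'class' and 'description' and a nested dictionary keyed by class. All names, sample rows and abbreviations here are hypothetical, and whole-word matching is an assumption about how the abbreviations appear in the text.

```python
import re

import pandas as pd

# Hypothetical nested dictionary: class -> {abbreviation: full form}.
abbreviations = {
    "medical": {"pt": "patient", "bp": "blood pressure"},
    "finance": {"pt": "point", "bp": "basis point"},
}

df = pd.DataFrame({
    "class": ["medical", "finance"],
    "description": ["pt has high bp", "rate moved 1 bp"],
})


def expand(row):
    """Replace whole-word abbreviations using the mapping for the row's class."""
    mapping = abbreviations.get(row["class"], {})
    if not mapping:
        return row["description"]
    # One alternation pattern per class; re.escape guards special characters.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], row["description"])


df["expanded"] = df.apply(expand, axis=1)
print(df["expanded"].tolist())
```

On a large frame, grouping by class and compiling each pattern once per group would likely avoid recompiling the regular expression for every row.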