I’d like to group a data frame by a specific column, ‘Fruit’, and calculate the percentage of each fruit that is ‘Good’. See below for my initial dataframe: Dataframe. See below for my desired output data frame. Note: Because there is 1 “Good” Apple and 1 “Bad” Apple, the percentage of Good Apples is 50%. See below
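A minimal sketch of one way to compute this, assuming the quality labels live in a column named `Quality` (that column name and the sample rows are hypothetical, since the original frame is not shown). Comparing to ‘Good’ gives booleans, and the mean of booleans per group is the fraction of Good rows:

```python
import pandas as pd

# Hypothetical sample data; the 'Quality' column name is an assumption.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Banana", "Banana", "Banana", "Banana"],
    "Quality": ["Good", "Bad", "Good", "Good", "Good", "Bad"],
})

# True/False per row, averaged within each fruit, scaled to a percentage.
pct_good = (
    df["Quality"].eq("Good")
    .groupby(df["Fruit"])
    .mean()
    .mul(100)
    .rename("Percent Good")
    .reset_index()
)
print(pct_good)
```

With one Good and one Bad Apple, the Apple row comes out at 50%, matching the note in the question.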
Tag: pandas-groupby
Get values from dataframe with MultiIndex index containing NaNs
I cannot access the values of an index position that has a nan in it and wonder how I could solve this. (In my project this index has a very special meaning and I really need to keep it, otherwise I would need to make some dirty manual modifications: “there is always a solution” even if it is a very
Apply multiple criteria to select current and prior row – Pandas
I have a dataframe as shown below. I would like to select rows based on the criteria below: criteria 1 – pick all rows where source-system = I; criteria 2 – pick the prior row (n-1) only when source-system of the (n-1)th row is O and diff is zero. Criteria 2 should be applied only when the nth row has source-system =
Group by calculation pandas
I have a dataframe after applying groupby: On this, I want to add a new column with the calculation: 10 / (no of items per category). For the example data, this would be: How can this be done? Answer Use Series.value_counts with Series.map: Or:
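The code for the answer is not preserved in this excerpt, so the following is only a sketch of how `Series.value_counts` with `Series.map` could produce the column. The column names `category` and `item` and the sample rows are hypothetical. The second variant, `groupby(...).transform("size")`, is one possible alternative and not necessarily the one the answer gave:

```python
import pandas as pd

# Hypothetical data: three categories with 2, 1 and 3 items.
df = pd.DataFrame({
    "category": ["a", "a", "b", "c", "c", "c"],
    "item": ["x1", "x2", "y1", "z1", "z2", "z3"],
})

# Count rows per category, map the counts back onto each row, then divide.
counts = df["category"].value_counts()
df["share"] = 10 / df["category"].map(counts)

# Alternative: broadcast the group size to every row with transform.
df["share_alt"] = 10 / df.groupby("category")["category"].transform("size")
print(df)
```

Both columns hold 10 divided by the number of rows in that row's category.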
perform df.loc to groupby df
I have a df consisting of person, origin and destination. The df: I grouped the df with df_grouped = df.groupby(['O', 'D']) and matched it against another dataframe, taxi, which I similarly grouped by its O and D. Then I merged them after aggregating and counting the PersonID and TaxiID per O-D pair. I did it to see how
How to get all last rows at second level in MultiIndex DataFrame whose second level has variable length
I have this dataframe: And I want to keep all the last second level rows, meaning that: For thread_id==0 I want to keep the row message_id_in_thread==1 For thread_id==1 I want to keep the row message_id_in_thread==2 For thread_id==2 I want to keep the row message_id_in_thread==1 This can easily be achieved by doing df.iterrows(), but I would like to know if there
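One loop-free possibility, sketched here on made-up data, is to group on the first index level and take the last row of each group with `tail(1)`. The `text` column is hypothetical, and the sketch assumes rows are already sorted by `message_id_in_thread` within each thread:

```python
import pandas as pd

# Hypothetical frame: threads 0, 1, 2 with 2, 3 and 2 messages.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)],
    names=["thread_id", "message_id_in_thread"],
)
df = pd.DataFrame({"text": list("abcdefg")}, index=index)

# Keep the last row of every thread; the original MultiIndex is preserved.
last_rows = df.groupby(level="thread_id").tail(1)
print(last_rows)
```

If the rows are not sorted, calling `df.sort_index()` first would be needed for "last" to mean the highest message id.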
Pandas groupby collapse 1st rows of group
I have a system that lets me export data in a table of this format: where there are many columns like ‘data’ and they can have any values that don’t necessarily follow a pattern. I need to get the data into this format: I’ve tried reading the documentation on groupby and searching similar questions, but I can’t find a
Group by Issue with Years Pandas
I’m following the answer for this StackOverflow post to group a column of years by decades to make it easier for me to visualize later, but I’m not getting the same results. It seems like when DSM did it, it yielded integers for years, while mine is yielding floats for years. I’ve implemented: My Results: Picture of Results Answer You
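The answer is cut off here, so the cause in the asker's data is unknown. One common reason a year column comes out as floats is that it contains missing values, which makes pandas store it as `float64`. Under that assumption only, a sketch of bucketing into decades while keeping integers uses the nullable `Int64` dtype (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical years; the missing value forces a float64 dtype.
years = pd.Series([1985, 1992, None, 2003], name="year")

# Floor to the decade, then convert to the nullable integer dtype
# so the missing entry stays missing instead of forcing floats.
decades = (years // 10 * 10).astype("Int64")
print(decades)
```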
Shift column position to right based on criteria using Pandas
I have a dataframe that looks like the one below. I would like to shift the values one cell to the right if there is NA in the column dep_id. I tried the below but it wasn’t working. Is there an efficient and elegant approach to shift column position on big data? For example, I expect my output to be as shown below
Applying abbreviation to the column of a dataframe based on another column of the same dataframe
I have two columns in the dataframe, one of which is a class and another is a description. In the description I have some abbreviations. I want to expand these abbreviations based on the class value. I have a dictionary with class as key and in the value I have another dictionary with abbreviations and its full form. Since these
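As a rough sketch of the setup described, a row-wise function can look up the inner dictionary for each row's class and replace whole-word abbreviations in the description. The dictionary contents, the column names `class` and `description`, and the helper `expand` are all hypothetical:

```python
import re
import pandas as pd

# Hypothetical nested mapping: class -> {abbreviation: full form}.
abbreviations = {
    "network": {"IP": "Internet Protocol"},
    "legal": {"IP": "Intellectual Property"},
}

df = pd.DataFrame({
    "class": ["network", "legal", "other"],
    "description": ["IP address conflict", "IP dispute filed", "IP unchanged"],
})

def expand(row):
    """Expand abbreviations using the dictionary for this row's class."""
    mapping = abbreviations.get(row["class"], {})
    if not mapping:
        return row["description"]  # no dictionary for this class
    # Match any known abbreviation as a whole word only.
    pattern = r"\b(" + "|".join(map(re.escape, mapping)) + r")\b"
    return re.sub(pattern, lambda m: mapping[m.group(1)], row["description"])

df["expanded"] = df.apply(expand, axis=1)
print(df)
```

Row-wise `apply` is simple but slow on large frames; grouping by class and calling `Series.str.replace` once per class would be a reasonable variation.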