I have a pandas DataFrame in the following format, where (version, branch) is a MultiIndex. Problem description: I want to group by version and set the value in column X for the branch "overall" to the sum of the X values of the remaining branches with the same version, weighted by the values in column N.
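A minimal sketch of one way to read this request. The sample values below are hypothetical, since the original table was not preserved, and "weighted by N" is interpreted as the plain weighted sum of X * N (divide by the summed N if a weighted mean is wanted instead).

```python
import pandas as pd

# Hypothetical data: the original input table is not available.
idx = pd.MultiIndex.from_tuples(
    [("v1", "a"), ("v1", "b"), ("v1", "overall"),
     ("v2", "a"), ("v2", "b"), ("v2", "overall")],
    names=["version", "branch"],
)
df = pd.DataFrame(
    {"X": [1.0, 3.0, 0.0, 2.0, 4.0, 0.0], "N": [1, 3, 4, 2, 2, 4]},
    index=idx,
)

# Boolean mask marking the "overall" rows.
is_overall = df.index.get_level_values("branch") == "overall"

# Weighted sum of X over the remaining branches, per version.
parts = df[~is_overall]
weighted = (parts["X"] * parts["N"]).groupby(level="version").sum()

# Write each version's result back into its "overall" row.
versions = df.index[is_overall].get_level_values("version")
df.loc[is_overall, "X"] = weighted.reindex(versions).to_numpy()
```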
Tag: aggregate
How to split in train and test by month
I have a dataframe structured like this. I have data for all days and months from 2018 to 2021, with around 50k observations. How can I aggregate all the data for the same month and perform a train-test split for each month, i.e. for all the data of the months of January, February, March and so on? Answer: try this:
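The answer itself is cut off here, so the following is only a sketch of one possible approach, not the original answer. It assumes a hypothetical frame with a "date" column and a "value" column, pools each calendar month across all years, and uses an 80/20 random split.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering 2018 to 2021.
dates = pd.date_range("2018-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"date": dates, "value": np.arange(len(dates))})

# Calendar month (1..12), so all Januaries end up in the same group.
month = df["date"].dt.month

splits = {}
for m, group in df.groupby(month):
    # Draw 20% of the month's rows as the test set; the rest is training data.
    test = group.sample(frac=0.2, random_state=0)
    train = group.drop(test.index)
    splits[int(m)] = (train, test)
```

After this, splits[1] holds the (train, test) pair for January, splits[2] for February, and so on.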
Pandas: using groupby to calculate a ratio by specific values
Hi, I have a dataframe that looks like this: and I want to calculate a ratio in the column ‘count_number’, based on the values in the column ‘tone’, by this formula: (‘blue’ + ‘grey’) / ‘red’ for each unique combination of ‘participant_id’, ‘session’, ‘block’ – here is part of my dataset as text, the left column ‘RATIO’ is my expected output: participant_id session block
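A hedged sketch of how such a ratio could be computed. The rows below are made up, because the dataset in the question is truncated; only the column names come from the question.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the question.
df = pd.DataFrame({
    "participant_id": [1, 1, 1, 1, 1, 1],
    "session": [1, 1, 1, 1, 1, 1],
    "block": [1, 1, 1, 2, 2, 2],
    "tone": ["blue", "grey", "red", "blue", "grey", "red"],
    "count_number": [2, 4, 3, 1, 1, 4],
})

keys = ["participant_id", "session", "block"]

# One row per key combination, one column per tone.
wide = df.pivot_table(index=keys, columns="tone",
                      values="count_number", aggfunc="sum")

# (blue + grey) / red for each combination.
ratio = ((wide["blue"] + wide["grey"]) / wide["red"]).rename("RATIO")

# Attach the ratio back to every original row.
out = df.merge(ratio.reset_index(), on=keys, how="left")
```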
Can we use iterables in pandas groupby agg function?
I have a pandas groupby function. I have another input in the form of a dict which has a {column: aggfunc} structure, as shown below: I want to use this dict to apply aggregate functions as follows: Is there some way I can achieve this using the input dict d (maybe by using a dict comprehension)? Answer: If dictionary contains columns name and
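As an illustration (the frame and the dict contents are hypothetical, since the question's examples are missing), a {column: aggfunc} dict can be passed straight to agg, and a dict comprehension can turn it into named aggregations:

```python
import pandas as pd

# Hypothetical frame and mapping; the question's own examples were not preserved.
df = pd.DataFrame({"key": ["a", "a", "b"],
                   "x": [1, 2, 3],
                   "y": [10.0, 20.0, 30.0]})
d = {"x": "sum", "y": "mean"}

# Passing the dict directly applies each function to its column.
result = df.groupby("key").agg(d)

# Dict comprehension building named aggregations such as x_sum=("x", "sum").
named = df.groupby("key").agg(
    **{f"{col}_{func}": (col, func) for col, func in d.items()}
)
```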
Pandas – What datatype should a duration column (mm:ss) be to use aggregates on it?
I’m doing some NBA analysis and have a “Minutes Played” column for players in a mm:ss format. What dtype should this column be to perform aggregate functions (mean, min, max, etc…) on it? The df has over 20,000 rows, so here is a sample of the column in question: I ran this code to change the format to datetime –
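One option, sketched here with made-up values, is to store the column as a timedelta rather than a datetime, since a duration has no calendar date. The column name "MP" is an assumption.

```python
import pandas as pd

# Hypothetical sample of a "Minutes Played" column in mm:ss format.
minutes = pd.Series(["34:12", "05:30", "12:48"], name="MP")

# Split into minutes and seconds, then convert total seconds to timedelta.
parts = minutes.str.split(":", expand=True).astype(int)
td = pd.to_timedelta(parts[0] * 60 + parts[1], unit="s")

# Aggregates now work directly on the timedelta column.
mean_played = td.mean()
max_played = td.max()
min_played = td.min()
```

Splitting by hand, instead of prefixing "00:" and parsing as hh:mm:ss, also handles minute counts above 59.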
How to find $avg and $sum for fields which contain NaN values in MongoDB?
I can find and filter out fields that contain NaN values before the $group stage in MongoDB when I use the mongo CLI or JavaScript. However, when I use Python and its main library, “pymongo”, I cannot seem to do the same. As the following code shows, NaN is not part of Python syntax, whereas it is easy and straightforward
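A sketch of how the pipeline might be written in Python: NaN has no literal, but float("nan") produces the value, which PyMongo should encode as a BSON double. The field names "price" and "category" are hypothetical, and the assumption that $ne against NaN excludes NaN documents should be checked against the MongoDB comparison rules for your server version.

```python
import math

# Python has no NaN literal; float("nan") (or math.nan) gives the value.
nan = float("nan")

# Hypothetical field names; adapt them to the real collection.
pipeline = [
    # Assumption: drops documents whose "price" is NaN before grouping.
    {"$match": {"price": {"$ne": nan}}},
    {"$group": {
        "_id": "$category",
        "avg_price": {"$avg": "$price"},
        "total_price": {"$sum": "$price"},
    }},
]

# With a PyMongo collection object (not created here), this would run as:
# results = list(collection.aggregate(pipeline))
```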
Aggregate data with two conditions
I have a data frame that looks something like this: What I would like to do is aggregate the data if the dates are the same – but only if the name is different. So the above data frame should actually become: Currently I am almost doing it with: However, this will also aggregate the ones where the name is
How to sort aggregated numpy array?
This is my first post on Stack Overflow, and I am very new to programming. Apologies in advance for any poor formatting and missing information. :) I aggregated two columns in a csv file (one column of seller names, the other of transactional amounts) to find how much each seller has made in total: I want to sort it in descending order to find
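The question's own code is missing, so this is only a sketch with invented seller names and amounts. It totals the amounts per seller with NumPy and then sorts the totals in descending order using argsort.

```python
import numpy as np

# Hypothetical data standing in for the two CSV columns.
sellers = np.array(["ann", "bob", "ann", "cy", "bob"])
amounts = np.array([10.0, 5.0, 7.0, 30.0, 1.0])

# Unique seller names plus, for each row, the index of its seller.
names, inverse = np.unique(sellers, return_inverse=True)

# Sum the amounts that fall on each seller index.
totals = np.bincount(inverse, weights=amounts)

# argsort is ascending, so reverse it for descending totals.
order = np.argsort(totals)[::-1]
ranked = list(zip(names[order].tolist(), totals[order].tolist()))
```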
Pandas: groupby followed by aggregate – unexpected behaviour when joining strings
Having a pandas data frame containing two columns of type str: which is created as follows: df = pd.DataFrame({"group": [1, 2, 2, 1], "sc": ["A", "B", "C", "D"], "wc": ["word1", "word2", "word3", "word4"]}) When grouping by group and joining the individual columns, I can use: However, when specifying a single column (wc) to perform this operation on: which appears to be a join on the column names. But why is it handled
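The calls from the question are not preserved, so the following is a sketch of one likely explanation rather than a confirmed diagnosis: iterating over a DataFrame yields its column names, so a string join applied to a whole frame joins the labels, while the same join applied to a Series joins the values.

```python
import pandas as pd

df = pd.DataFrame({"group": [1, 2, 2, 1],
                   "sc": ["A", "B", "C", "D"],
                   "wc": ["word1", "word2", "word3", "word4"]})

# Selecting a single column gives a SeriesGroupBy, so join sees the values.
joined = df.groupby("group")["wc"].agg(" ".join)

# Joining a DataFrame iterates over its column labels, not its cells.
names_only = " ".join(df[["wc"]])
```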
PySpark Dataframe melt columns into rows
As the subject describes, I have a PySpark Dataframe that I need to melt three columns into rows. Each column essentially represents a single fact in a category. The ultimate goal is to aggregate the data into a single total per category. There are tens of millions of rows in this dataframe, so I need a way to do the