
Tag: group-by

groupby with diff function

I have a groupby with a diff function, but I want to add an extra mean column for heart rate. What is the best way to do this? This is the code; where should I add the piece that calculates the average heart rate? The output will be the number of seconds in the high power zone and then …
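A minimal sketch of one way to add a mean alongside an existing groupby aggregation, using named aggregation. The column names (`athlete`, `power_zone`, `heart_rate`) and the sample data are assumptions, since the original code is not shown:

```python
import pandas as pd

# Hypothetical per-second samples: power zone label plus heart rate
df = pd.DataFrame({
    "athlete": ["a", "a", "a", "b", "b"],
    "power_zone": ["high", "high", "low", "high", "low"],
    "heart_rate": [150, 155, 120, 160, 130],
})

# Named aggregation lets one .agg() call produce both the seconds-in-zone
# count and the mean heart rate, instead of two separate groupbys
out = (
    df[df["power_zone"] == "high"]
    .groupby("athlete")
    .agg(seconds_high=("power_zone", "size"),
         mean_hr=("heart_rate", "mean"))
    .reset_index()
)
```

The same `mean_hr=(...)` entry can be appended to whatever `.agg()` the existing code already performs.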

Creating time delta diff column based on groupby id

I have the following sample df. I want to group by Id and get the timedelta difference between the timestamps. I managed to get something similar to the wanted series through this code, but it is taking quite a long time; is there a way to do it more efficiently? Wanted series. Answer: here is one way to do it; btw, if …
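A vectorized sketch: `groupby(...)["timestamp"].diff()` computes per-group timedeltas in one pass, avoiding the Python-level `apply` that is usually the slow part. Column names and sample data here are assumptions:

```python
import pandas as pd

# Hypothetical sample: two ids with irregular timestamps
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2022-01-01 00:00:00", "2022-01-01 00:00:30", "2022-01-01 00:01:00",
        "2022-01-01 00:00:00", "2022-01-01 00:02:00",
    ]),
})

# diff() within each group; the first row of each Id becomes NaT
df["delta"] = df.groupby("Id")["timestamp"].diff()
```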

Expand rows based on an integer column, splitting into the number of months between dates

I have the following dataframe:

id  date_start           date_end             reporting_month  reporting_month_number  months_length
1   2022-03-31 23:56:22  2022-05-01 23:56:22  2022-03          1                       3
2   2022-03-31 23:48:48  2022-06-01 23:48:48  2022-03          1                       4
3   2022-03-31 23:47:36  2022-08-01 23:47:36  2022-03          1                       6

I would like to split each id row so I can have a row for each of the months_length, starting on the date of reporting_month, …
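One common pattern for this kind of expansion is `index.repeat` plus `cumcount`. A sketch against a trimmed version of the frame above (only the columns the expansion needs; the `month_offset` helper column is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "date_start": pd.to_datetime(["2022-03-31 23:56:22", "2022-03-31 23:48:48"]),
    "months_length": [3, 4],
})

# Repeat each row months_length times, then number the copies per id
out = df.loc[df.index.repeat(df["months_length"])].copy()
out["month_offset"] = out.groupby("id").cumcount()

# Shift the reporting month forward by the copy number
out["reporting_month"] = (
    out["date_start"].dt.to_period("M") + out["month_offset"]
).astype(str)
out["reporting_month_number"] = out["month_offset"] + 1
out = out.reset_index(drop=True)
```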

Apply function to each unique value of column separately

I have a dataframe with more than 500 cities which looks like this:

city    value  datetime
london  23     2022-03-25 17:59:18
dubai   12     2022-03-25 17:59:36
berlin  5      2022-03-25 17:59:42
london  25     2022-03-25 18:01:18
dubai   12     2022-03-25 18:02:18
berlin  5      2022-03-25 18:03:18

I have a function called rolling_mean which creates a new column ‘rolling_mean’ that calculates the last-hour rolling average. However, …

Divide into groups according to the specified attribute

I need to group the data in such a way that if the difference between adjacent values in column a1 equals a pre-specified value, then they belong to the same group. If the difference between two adjacent elements is anything else, then all subsequent data belong to a different group. For example, I have such data as …
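A sketch of the usual trick: mark every position where the adjacent difference deviates from the pre-specified value, then `cumsum` the marks to get group ids. The step value of 1 and the sample data are assumptions:

```python
import pandas as pd

step = 1  # the pre-specified difference (assumed value)
df = pd.DataFrame({"a1": [1, 2, 3, 7, 8, 20]})

# diff() != step is True at every break (and at the first row, since
# NaN != step); the running sum of breaks is the group id
df["group"] = (df["a1"].diff() != step).cumsum()
```

Here rows 1–3 form group 1 (consecutive steps of 1), the jump to 7 starts group 2, and the jump to 20 starts group 3.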

Taking the recency time of client purchase with PySpark

The sample of the dataset I am working on: I’d like to take the customer’s most recent purchase (the customer’s date.max()) and the previous maximum purchase (the penultimate one), then take the difference between the two (I’m assuming the same product for all purchases). I still haven’t found anything in PySpark that does this. One example of my idea was …
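In PySpark this is typically done with a `Window` partitioned by customer plus `lag` or `row_number` over the purchase date. As a runnable sketch of the same logic in pandas (column names and data are assumptions): keep the last two purchases per customer and diff them:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["a", "a", "a", "b", "b"],
    "date": pd.to_datetime([
        "2022-01-01", "2022-02-01", "2022-03-15",
        "2022-01-10", "2022-01-20",
    ]),
})

# Last two purchases per customer (the max and the penultimate date)
last_two = df.sort_values("date").groupby("customer").tail(2)

# Gap between the most recent and the penultimate purchase
recency = (
    last_two.groupby("customer")["date"]
    .agg(lambda s: s.max() - s.min())
    .rename("gap")
    .reset_index()
)
```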
