pandas groupby column to list and keep certain values

I have the following dataframe: id occupations 111 teacher 111 student 222 analyst 333 cook 111 driver 444 lawyer I create a new column with a list of all the …
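The excerpt is cut off before it says which values should be kept, so the following is only a minimal sketch of the first half of the question: collecting each id's occupations into a list and attaching that list as a new column. The column name `occupation_list` is an assumption made for illustration.

```python
import pandas as pd

# Data taken from the question excerpt.
df = pd.DataFrame({
    "id": [111, 111, 222, 333, 111, 444],
    "occupations": ["teacher", "student", "analyst", "cook", "driver", "lawyer"],
})

# One list of occupations per id, in original row order.
lists_per_id = df.groupby("id")["occupations"].agg(list)

# Map the per-id list back onto every row (hypothetical column name).
df["occupation_list"] = df["id"].map(lists_per_id)

print(df)
```

Filtering the lists down to "certain values" would be an extra step on `lists_per_id`; the excerpt does not state the rule, so it is left out here.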

pandas groupby dataframes, calculate diffs between consecutive rows

Using pandas, I open some csv files in a loop and set the index to the cycleID column, except the cycleID column is not unique. See below: for filename in all_files: abfdata = pd.read_csv(filename,…
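The excerpt is truncated, so the column being differenced is unknown. As a rough sketch, assuming a non-unique `cycleID` column and a hypothetical numeric column `value`, per-group differences between consecutive rows can be computed with `groupby(...).diff()`:

```python
import pandas as pd

# Hypothetical sample data; only the cycleID name comes from the question.
df = pd.DataFrame({
    "cycleID": [1, 1, 1, 2, 2],
    "value": [10, 13, 19, 5, 4],
})

# Difference to the previous row within the same cycleID.
# The first row of each group has no predecessor and becomes NaN.
df["value_diff"] = df.groupby("cycleID")["value"].diff()

print(df)
```

Grouping on the column (rather than setting it as the index first) sidesteps the non-unique index; if `cycleID` is already the index, `df.groupby(level=0)` is the equivalent grouping.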

How to groupby 2 columns but order descending by count()

I have a dataframe and want to group by 2 columns, which is working fine. df.groupby(["Sektor, CustomerID"]).count().head(10) _Order_ID_ Order_timezone Order_weight …
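One possible way to order the groups by their size, sketched with made-up data: only the column names `Sektor` and `CustomerID` come from the question, while `Order_ID` and the values are assumptions. Note that the two grouping keys are passed as separate strings in the list.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Sektor": ["A", "A", "A", "B", "B", "B", "C"],
    "CustomerID": [1, 1, 2, 3, 3, 3, 4],
    "Order_ID": [10, 11, 12, 13, 14, 15, 16],
})

counts = (
    df.groupby(["Sektor", "CustomerID"])    # two separate keys
      .size()                               # number of rows per group
      .reset_index(name="count")            # back to a flat DataFrame
      .sort_values("count", ascending=False)
)

print(counts.head(10))
```

`size()` counts rows per group once, whereas `count()` returns a non-null count for every remaining column; either can be sorted the same way.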

Simple calculation on table. Please help me to make my code more effective

Please help me to make my code more effective. This is my df: df = pd.DataFrame([['A', 80], ['A', 64], ['A', 55], ['B', 56], ['B', 89], ['B', 73], ['C', 78], ['C', 100], ['C', 150], ['C', 76], ['C', …

pandas group by and fill in the missing time interval sequence

I have a data frame as shown below: df = pd.DataFrame({'person_id': [11,11,11,21,21,21,31,31,31,31,31], 'time' :[-1,5,17,11,25,39,46,4,100,150,1], 'value':[…

How to set value of first several rows in a Pandas Dataframe for each Group

I am a noob to groupby methods in Pandas and can’t seem to get my head wrapped around it. I have data with ~2M records and my current code will take 4 days to execute – due to the inefficient use of ‘…

Pandas Grouping by Hostname. Average of Sessions(on host) by Hour

The dataframe looks like this. datetime hostname sessions 0 2020-10-27 00:00:05 server001 22 1 2020-10-27 00:00:10 server001 25 2 2020-10-27 00:00:15 server001 …
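A minimal sketch of averaging sessions per host and hour of day. The first two rows follow the excerpt; the remaining rows and the output column names `hour` and `avg_sessions` are assumptions for illustration.

```python
import pandas as pd

# First two rows follow the question; the rest are made up.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-10-27 00:00:05", "2020-10-27 00:00:10", "2020-10-27 00:00:15",
        "2020-10-27 01:00:05", "2020-10-27 00:00:05",
    ]),
    "hostname": ["server001", "server001", "server001", "server001", "server002"],
    "sessions": [22, 25, 28, 30, 10],
})

# Group by host and by the hour component of the timestamp.
hour = df["datetime"].dt.hour.rename("hour")
hourly = (
    df.groupby(["hostname", hour])["sessions"]
      .mean()
      .reset_index(name="avg_sessions")
)

print(hourly)
```

This groups by hour of day, so the same hour on different dates is pooled together. If each calendar hour should be kept separate, grouping on the date as well would be one option.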

How to get unique counts based on a different column, with pandas groupby

I have the following dataframe: df = pd.DataFrame({ 'user': ['user122', 'user122', 'user124', 'user125', 'user125', 'user126', 'user126'], 'effortduration' : ['2 weeks', np.nan, '2 weeks', '3 …

How to extract elements from a filename and move them to different columns?

I have filenames which I converted into a list. The list has the following elements: My goal is to extract elements from this list and fill out a dataframe, which should look like this: Link to the Google Sheets containing the image above: https://docs.google.com/spreadsheets/d/1kuX3M4RFCNWtNoE7Hm1ejxWMwF-Cs4p8SsjA3JzdidA/edit?usp=sharing What I've done so far is the following code: But this one does not leave empty spaces, thus not doing what I needed. Thank you very much in advance. Answer As per a number of the comments: it's a pain because the tokens in the filename are not in a fully fixed format. Quite a lot of conditional

How to use pandas to create a column that stores count of first occurrences on a group-by?

Q1. Given data frame 1, I am trying to get, grouped by month, the count of newly occurring unique IDs, plus another column that gives the count of already existing IDs per month. Expected output: the count of newly added unique IDs per group, and the running sum of existing IDs. Note: Mar-2020 ID_Count is zero because IDs 1, 2, and 3 were present in previous months. Note: the existing count is 0 for Jan-2020 because there were no IDs before January. The existing count for Feb-2020 is 1 because before February there was only 1 ID. Mar-2020 has an existing count of 3 because it adds January and February, and so on. Answer
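The original data frame is not shown, so the sketch below uses hypothetical rows chosen to match the notes in the question (one new ID in January, two more in February, none in March). The column names `Month` and `ID` and the output name `Existing_Count` are assumptions; `ID_Count` is the name used in the question.

```python
import pandas as pd

# Hypothetical data consistent with the notes in the question.
df = pd.DataFrame({
    "Month": ["Jan-2020", "Feb-2020", "Feb-2020", "Feb-2020",
              "Mar-2020", "Mar-2020", "Mar-2020"],
    "ID": [1, 1, 2, 3, 1, 2, 3],
})

# Parse the month labels so that they sort chronologically.
df["Month"] = pd.to_datetime(df["Month"], format="%b-%Y")
df = df.sort_values("Month")

# An ID is new only in the first row where it appears.
df["is_new"] = ~df["ID"].duplicated()

out = df.groupby("Month")["is_new"].sum().rename("ID_Count").to_frame()

# Existing IDs = all IDs first seen in earlier months.
out["Existing_Count"] = out["ID_Count"].cumsum().shift(fill_value=0)

print(out)
```

The existing count is simply the cumulative sum of new IDs shifted down by one month, which is why January starts at 0.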