I want to select all the rows of a dataset where exactly M columns satisfy a condition based on N columns (where N >= M). Consider the following dataset. The code below selects rows where at least one (or more) of the columns (y0, y1, y2, y3) is True. However, I want to select rows where exactly 2 (analogously 1
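A minimal sketch of the "exactly M of N" selection, using hypothetical boolean columns y0–y3: summing booleans row-wise counts how many conditions hold, and comparing that count with M gives the mask.

```python
import pandas as pd

# Hypothetical sample data: four boolean indicator columns.
df = pd.DataFrame({
    "y0": [True, True, False, False],
    "y1": [True, False, False, True],
    "y2": [False, True, False, False],
    "y3": [False, False, False, True],
})

cols = ["y0", "y1", "y2", "y3"]
# Summing booleans along axis=1 counts how many columns are True per row;
# compare the count with the desired M (here M = 2).
exactly_two = df[df[cols].sum(axis=1) == 2]
```

Changing `== 2` to `== 1` (or any other M) selects rows where exactly that many conditions hold.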
Tag: dataframe
Grouping all the rows with close timestamps in pandas dataframe
I have a df that looks like this; it contains frequencies recorded at specific times and places. I want to group all the rows which are just 2 seconds apart (for example, the 3 rows at index 5–7 have a time difference of just 2 seconds). Similarly, index 8–10 also have the same difference and I want to place
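One common pattern for this, sketched on made-up timestamps (the question's actual data is not shown): start a new group wherever the gap to the previous row exceeds 2 seconds, and label the groups with a cumulative sum of those break points.

```python
import pandas as pd

# Hypothetical data: some rows only 2 seconds apart.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-01 10:00:00", "2022-01-01 10:00:02", "2022-01-01 10:00:04",
        "2022-01-01 10:01:00", "2022-01-01 10:01:02",
    ]),
    "freq": [5.0, 5.1, 5.2, 7.0, 7.1],
})

# A new group starts wherever the gap to the previous row exceeds 2 seconds.
gap = df["time"].diff() > pd.Timedelta(seconds=2)
df["group"] = gap.cumsum()

# e.g. average frequency per group of close timestamps
per_group = df.groupby("group")["freq"].mean()
```

`diff()` yields `NaT` for the first row, which compares as False, so the first row correctly lands in group 0.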
Dataframe – for each row, compare values of two columns, get value of third column on match
I have a pandas dataframe in Python that contains a list of different stocks by ticker symbol, and for each one, it also records current price and a low and high price alert threshold value. Below shows a sample of the dataframe:

TICKER  CURRENT PRICE($)  ALERT PRICE HIGH($)  ALERT PRICE LOW($)
AMZN    114               180                  105
APPL    140               110
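A hedged sketch of the row-wise comparison, with illustrative tickers and prices and shortened column names (the original table is truncated): `np.select` checks each row's current price against both thresholds and returns the matching label.

```python
import pandas as pd
import numpy as np

# Sample resembling the question's table (values and column names illustrative).
df = pd.DataFrame({
    "TICKER": ["AMZN", "APPL", "GOOG"],
    "CURRENT": [114, 140, 95],
    "HIGH": [180, 110, 120],
    "LOW": [105, 100, 100],
})

# np.select takes a list of conditions and a list of matching labels;
# the first condition that holds per row wins.
df["ALERT"] = np.select(
    [df["CURRENT"] >= df["HIGH"], df["CURRENT"] <= df["LOW"]],
    ["HIGH", "LOW"],
    default="OK",
)
```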
Column as a sum of its cumulative value only if other column meets condition
I am struggling to produce the df['res'] below without a loop, i.e. within pandas. Loop implementation of df['res'] In pandas, it could be something like The issue is that df['res'] is previously empty. Any hint on how to think about these decompositions? Answer As per your requirement, the value for temp will be reset as soon as we reach a 0 in a.
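The reset-on-zero idea can be sketched as follows, with hypothetical columns a and b (the question's actual frame is not shown): each 0 in 'a' starts a new run, a cumulative sum of that flag labels the runs, and a grouped cumsum restarts the running total in each run.

```python
import pandas as pd

# Hypothetical columns: accumulate b, resetting whenever a == 0.
df = pd.DataFrame({
    "a": [1, 1, 0, 1, 1, 0, 1],
    "b": [2, 3, 4, 5, 6, 7, 8],
})

# Each 0 in 'a' starts a new run; cumsum of that flag labels the runs.
runs = df["a"].eq(0).cumsum()
# A grouped cumsum restarts the running total at the start of each run.
df["res"] = df["b"].groupby(runs).cumsum()
```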
How to calculate a cumulative subtraction with a threshold, resetting after the threshold, within groups in a pandas DataFrame in Python?
This is a dataframe with 4 columns. The primary dataframe contains two columns, trip and timestamps; I calculated 'TimeDistance', the difference between consecutive timestamps, and 'cum', the cumulative sum over the TimeDistance column, in order to reach my goal, but I could not. This is the output: This output is not my desired output, I
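Because the reset depends on the running value itself, a plain cumsum cannot express it; one workable sketch (threshold and sample values hypothetical) applies a small loop per trip via `groupby(...).transform`:

```python
import pandas as pd

# Hypothetical trips: restart the running total of TimeDistance
# whenever it exceeds a threshold (here 10 seconds).
df = pd.DataFrame({
    "trip": [1, 1, 1, 1, 2, 2],
    "TimeDistance": [4, 5, 3, 2, 8, 7],
})

def cum_with_reset(values, threshold):
    # Plain loop: each running total is emitted, then zeroed once it
    # crosses the threshold, so the next row starts a fresh sum.
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
        if total > threshold:
            total = 0  # reset after crossing the threshold
    return out

df["cum"] = (
    df.groupby("trip")["TimeDistance"]
      .transform(lambda s: cum_with_reset(s.tolist(), threshold=10))
)
```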
Python Dataframe For loop (if i.contains)
I need a for loop on a specific column. With this for loop, I will assign the value 'Normal' to a list if the column contains 'Themes'. But I don't know how to write it here. I would be glad if you help. Thanks in advance :) Dataset For loops that I try Answer You can use np.where which is
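The np.where approach mentioned in the answer looks roughly like this, with a made-up column name and values since the dataset isn't shown: `str.contains` builds a boolean mask and `np.where` maps it to labels, with no explicit for loop.

```python
import pandas as pd
import numpy as np

# Hypothetical column: label rows whose text mentions 'Themes'.
df = pd.DataFrame({"category": ["Themes/Dark", "Widgets", "Themes/Light"]})

# str.contains gives a boolean mask per row; np.where maps
# True -> 'Normal' and False -> 'Other'.
df["label"] = np.where(df["category"].str.contains("Themes"), "Normal", "Other")
```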
How to convert a decimal to an integer without rounding in Python?
I have a DataFrame as shown below; I want to convert the decimal numbers to integers without rounding. Desired output: Answer Option 1 One way to do that is to cast to the type int as follows Option 2 One can also do it with .applymap() as follows Or using custom lambda functions Option 3 Another way is using .apply()
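Option 1 (the cast) can be sketched on illustrative values as follows: casting a float column to int truncates toward zero rather than rounding.

```python
import pandas as pd

# Illustrative values: astype(int) truncates toward zero, so 1.9 -> 1,
# 2.5 -> 2, and -3.7 -> -3 (no rounding).
df = pd.DataFrame({"val": [1.9, 2.5, -3.7]})

df["as_int"] = df["val"].astype(int)
```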
How to assign values to multiple columns using conditions for values from other multiple columns?
Dataset is something like this (there will be duplicate rows in the original): Code: Output should be this: Code: 'series1' column values start row by row as 0, 1, 2, and so on but reset to 0 when: 'email_ID' column value changes. 'screen' column value == 'rewardapp.PaymentFinalConfirmationActivity'. 'series2' column values start with 0 and increment by 1 whenever 'series1' resets.
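One way to sketch this (sample rows hypothetical, and assuming the reset applies on the confirmation-screen row itself): mark the reset rows, number the runs with a cumulative sum, then count within each run.

```python
import pandas as pd

# Hypothetical event log: series1 counts rows within a run and restarts
# on an email change or on the confirmation screen; series2 numbers the runs.
df = pd.DataFrame({
    "email_ID": ["a", "a", "a", "b", "b"],
    "screen": ["s1", "s2", "rewardapp.PaymentFinalConfirmationActivity",
               "s1", "s2"],
})

# A reset happens when the email changes or the confirmation screen appears.
reset = (
    df["email_ID"].ne(df["email_ID"].shift())
    | df["screen"].eq("rewardapp.PaymentFinalConfirmationActivity")
)
run = reset.cumsum()          # labels each run 1, 2, 3, ...
df["series1"] = df.groupby(run).cumcount()  # 0, 1, 2, ... within a run
df["series2"] = run - 1       # runs renumbered from 0
```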
Sum of the column values if the rows meet the conditions
I am trying to calculate the sum of sales for stores in the same neighborhood based on their geographic coordinates. I have sample data:

ID  SALE  X   Y   X_MIN  Y_MIN  X_MAX  Y_MAX
1   100   23  44  22     43     24     45
2   120   22  45  21     44     23     46
3   110   21  41  20     40     22     42
4   95    24  46
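A sketch of one approach on the first three (complete) sample rows: cross-join every store with every other store, keep pairs whose (X, Y) fall inside the row's bounding box, then sum SALE per ID. The NEIGHBOR_SALES column name is made up for illustration.

```python
import pandas as pd

# Illustrative subset of the question's table.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "SALE": [100, 120, 110],
    "X": [23, 22, 21], "Y": [44, 45, 41],
    "X_MIN": [22, 21, 20], "Y_MIN": [43, 44, 40],
    "X_MAX": [24, 23, 22], "Y_MAX": [45, 46, 42],
})

# Cross-join every store with every other store (including itself), keep
# pairs whose (X, Y) fall inside the row's bounding box, sum SALE per ID.
pairs = df.merge(df, how="cross", suffixes=("", "_other"))
inside = pairs[
    pairs["X_other"].between(pairs["X_MIN"], pairs["X_MAX"])
    & pairs["Y_other"].between(pairs["Y_MIN"], pairs["Y_MAX"])
]
df["NEIGHBOR_SALES"] = (
    inside.groupby("ID")["SALE_other"].sum().reindex(df["ID"]).to_numpy()
)
```

A cross join is quadratic in the number of stores, so for large data a spatial index (e.g. via geopandas) would be the more scalable route.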
Python: Calculate max profit by day and after current timestamp
I have a dataframe like this: The df contains a price value for every minute of the day from 9:30 through 16:00. I am looking to get the maximum possible profit for each minute of the day. I am currently doing something like this: This gives me the profit percentage for each row relative to the highest value
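A sketch of the "best price still to come" idea on a few hypothetical minutes: within each day, a cumulative max over the reversed price series gives, for every minute, the maximum price from that minute to the close, from which the profit percentage follows.

```python
import pandas as pd

# Hypothetical intraday prices (a handful of minutes for illustration).
df = pd.DataFrame({
    "date": ["2022-01-03"] * 4,
    "time": ["09:30", "09:31", "09:32", "09:33"],
    "price": [100.0, 102.0, 99.0, 101.0],
})

# Reversing, taking cummax, and reversing back yields, per minute,
# the maximum price from that minute to the end of the day.
future_max = (
    df.groupby("date")["price"]
      .transform(lambda s: s[::-1].cummax()[::-1])
)
df["max_profit_pct"] = (future_max / df["price"] - 1) * 100
```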