I have a data frame and an integer like this: I want to check whether the given number (17) falls between the min and max columns of any row. If the number is between the min and max columns, then the max column value in that row should be replaced by that integer. In the example, the integer 17 exists between 13
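A minimal sketch of that check and replacement, assuming the columns are literally named min and max and using a made-up frame in place of the real data:

```python
import pandas as pd

# Hypothetical frame; the real 'min'/'max' column names may differ.
df = pd.DataFrame({"min": [13, 25, 1], "max": [20, 30, 5]})
n = 17

# Rows where n falls between min and max (inclusive).
mask = (df["min"] <= n) & (df["max"] >= n)

# Replace the max value with n only in those rows.
df.loc[mask, "max"] = n
print(df)
```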
Tag: dataframe
Join two dataframes based on index where the second dataframe has repeated indexes related to the first dataframe
I have two data frames, where the first dataframe has an index starting from zero. The second dataframe has repeated indexes, also starting from zero. I want to join the two dataframes based on their indexes. The first dataframe is like this. The second dataframe is: I want to join these two dataframes based on index, i.e. the new dataframe should look like
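Since the first frame's index is unique and the second frame repeats those index values, a plain join on the index already produces the repeated rows. A small sketch with invented frames:

```python
import pandas as pd

# Hypothetical frames: df1 has a unique RangeIndex, df2 repeats those index values.
df1 = pd.DataFrame({"name": ["a", "b", "c"]})                        # index 0, 1, 2
df2 = pd.DataFrame({"value": [10, 11, 20, 30, 31]}, index=[0, 0, 1, 2, 2])

# join() aligns on the index; each df1 row is repeated for every matching df2 row.
joined = df1.join(df2, how="inner")
print(joined)
```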
Divide dataframe into list of rows containing all columns
From a dataframe structured like this, I need to get a list like this: Answer It looks like you want: example: output: other options
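The answer fragment above most likely points at converting each row into a plain list; a sketch of that, plus the "other options" it alludes to, using a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})  # hypothetical data

# One list element per row, keeping every column.
rows = df.values.tolist()
print(rows)            # [[1, 'x'], [2, 'y'], [3, 'z']]

# Other options: keep column labels per row, or keep dtypes via itertuples.
records = df.to_dict("records")              # [{'a': 1, 'b': 'x'}, ...]
tuples = list(df.itertuples(index=False))    # [Pandas(a=1, b='x'), ...]
```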
Pandas: str.extract() giving unexpected NaN
I have a data set which has a column that looks like this. I need only the numbers. Here’s my code: I was expecting an output like: but I got: Just to test, I dumped the dataframe to a .csv and read it back with pd.read_csv(). That gave me just the numbers, as I need (though of course that’s not
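Without the original pattern it is hard to say exactly why the NaNs appear, but the usual culprit with str.extract() is a regex without a capturing group (or anchored to the wrong position). A sketch with a hypothetical column:

```python
import pandas as pd

# Hypothetical column mixing text and numbers, e.g. "abc 123".
df = pd.DataFrame({"raw": ["abc 123", "def 456", "789 ghi"]})

# str.extract needs a capturing group; without parentheses every row
# comes back as NaN even when the digits are clearly present.
df["number"] = df["raw"].str.extract(r"(\d+)", expand=False)
print(df)
```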
How to compare each value of column B with the value of column A?
Compare each value in column B with the first value in column A until it is greater than it, then set the expected column to true. Then compare the following column B values with the column A value from the row where the expected column is true, until a B value is greater than it, then set the expected column to true again. Input: Expected Output: Answer You must
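Because the comparison threshold moves every time a match is found, this is easiest to express as a row-by-row loop. A sketch with invented numbers, since the question's input and expected output are not shown here:

```python
import pandas as pd

# Hypothetical data; the exact threshold logic depends on the expected
# output in the question, so this only sketches the row-by-row approach.
df = pd.DataFrame({"A": [5, 7, 9, 12, 15], "B": [3, 8, 4, 13, 10]})

expected = []
threshold = df["A"].iloc[0]          # start by comparing B against the first A
for a, b in zip(df["A"], df["B"]):
    if b > threshold:
        expected.append(True)
        threshold = a                # once matched, compare against this row's A
    else:
        expected.append(False)

df["expected"] = expected
print(df)
```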
Pandas – How to use multiple cols for mapping (without merging)?
I have a dataframe like the one below. I would like to do the below: a) attach the location column from key_df to data_df based on two fields – p_id and company. So, I tried the below, but this resulted in an error like the one below: KeyError: "None of [Index(['p_id', 'company'], dtype='object')] are in the [columns]" How can I map based on multiple index columns?
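That KeyError usually means the lookup columns were not present where the code expected them. One merge-free pattern is to build a Series keyed by a MultiIndex and map it; a sketch with made-up values for p_id, company, and location:

```python
import pandas as pd

# Hypothetical frames; column names are taken from the question.
data_df = pd.DataFrame({"p_id": [1, 2, 3], "company": ["x", "y", "z"]})
key_df = pd.DataFrame({"p_id": [1, 2, 3],
                       "company": ["x", "y", "z"],
                       "location": ["NY", "LA", "SF"]})

# Build a lookup Series keyed by (p_id, company), then map the same
# pair of columns from data_df onto it -- no merge needed.
lookup = key_df.set_index(["p_id", "company"])["location"]
data_df["location"] = data_df.set_index(["p_id", "company"]).index.map(lookup)
print(data_df)
```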
Add additional timestamp to Pandas DataFrame items based on item timestamp/index
I have a large time-indexed Pandas DataFrame with time-series data from a couple of devices. The structure of this DataFrame (in the code below, self._combined_data_frame) looks like this: The DateTimeIndex and device_name are filled for every row; the other columns contain NaN values. Sample data is available on Google Drive: Sample data set Then there is a list with reference timestamps
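The details depend on the sample data, but attaching the nearest reference timestamp to every row is the kind of job pandas.merge_asof handles. A sketch with a hypothetical tiny frame and an invented reference list:

```python
import pandas as pd

# Hypothetical reduced version of the time-indexed device data.
idx = pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:05", "2021-01-01 00:10"])
df = pd.DataFrame({"device_name": ["dev1", "dev1", "dev2"]}, index=idx)

# Hypothetical list of reference timestamps.
refs = pd.DataFrame({"reference_ts": pd.to_datetime(["2021-01-01 00:03",
                                                     "2021-01-01 00:09"])})

# merge_asof attaches, for every row, the closest reference timestamp at or
# before the row's own timestamp (both sides must be sorted on the keys).
left = df.sort_index().reset_index().rename(columns={"index": "ts"})
out = pd.merge_asof(left, refs.sort_values("reference_ts"),
                    left_on="ts", right_on="reference_ts")
print(out)
```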
Creating another column in pandas based on a pre-existing column
I have a third column in my data frame from which I want to create a fourth column that looks almost the same, except it has no double quotes and there is a 'user/' prefix before each ID in the list. Also, sometimes it is just a single ID rather than a list of IDs (as shown in the example DF).
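Since the example DF is not shown here, the sketch below assumes each cell is either a single ID or a comma-separated list of quoted IDs; a small apply with a string-cleaning helper covers both cases:

```python
import pandas as pd

# Hypothetical column: a single ID or a comma-separated list of quoted IDs.
df = pd.DataFrame({"ids": ['"abc"', '"abc", "def"', 'xyz']})

def add_prefix(cell):
    # Strip double quotes, split on commas, prefix each ID with 'user/'.
    parts = [p.strip().strip('"') for p in str(cell).split(",")]
    return ", ".join("user/" + p for p in parts)

df["ids_prefixed"] = df["ids"].apply(add_prefix)
print(df)
```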
Update column based on grouped date values
Edited/reposted with correct sample output. I have a dataframe that looks like the following: This dataframe is split into groups by ID. I would like to make an updated combined column based on whether df['bool'] == True, but only if df['bool'] == True AND there is another 'finished' row in the same group with a LATER (not the same) year.
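One way to express "True only when the row's bool is True and some other 'finished' row in the same ID group has a strictly later year" is a per-group check. The column names below (status, year) are assumptions based on the description, not the actual sample data:

```python
import pandas as pd

# Hypothetical frame; 'status' and 'year' are assumed column names.
df = pd.DataFrame({"ID": [1, 1, 1, 2, 2],
                   "year": [2018, 2019, 2020, 2019, 2020],
                   "status": ["open", "finished", "finished", "open", "finished"],
                   "bool": [True, False, True, True, False]})

def has_later_finished(group):
    # For each row: does this group contain a 'finished' row with a later year?
    later = group["year"].apply(
        lambda y: ((group["status"] == "finished") & (group["year"] > y)).any())
    return group["bool"] & later

df["combined"] = df.groupby("ID", group_keys=False).apply(has_later_finished)
print(df)
```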
In Python pandas, how do you eliminate rows of data that fail to meet a condition of grouped data?
I have a data set that contains hourly data for marketing campaigns. There are several campaigns, and not all of them are active during the 24 hours of the day. My goal is to eliminate all rows of active campaign hours where I don’t have all 24 data rows for a single day. The raw data contains a lot of
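A common way to do this is to group by campaign and calendar day and keep only the groups that have a full 24 hourly rows. The column names in the sketch (campaign, timestamp) are guesses at the real schema:

```python
import pandas as pd

# Hypothetical hourly campaign data; 'campaign' and 'timestamp' are assumed names.
rng = pd.date_range("2021-01-01", periods=30, freq="h")
df = pd.DataFrame({"campaign": "A", "timestamp": rng, "clicks": range(30)})

# Group by campaign and calendar day, keep only groups with all 24 hours.
df["date"] = df["timestamp"].dt.date
complete = (df.groupby(["campaign", "date"])
              .filter(lambda g: len(g) == 24)
              .drop(columns="date"))
print(complete)
```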