Tag: pandas

Adding an increment to duplicates within a python dataframe

I’m looking to concatenate two columns in a data frame and, where there are duplicates, append an integer at the end. The wrinkle here is that I will keep receiving feeds of data, and the increment needs to be aware of historical values that were generated and not reuse them. I’ve been trying …
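One possible way to approach this, as a minimal sketch only: keep a dictionary of per-key counters that outlives each feed, so later feeds continue numbering where earlier ones stopped. The column names `a` and `b`, the `_` separator, the `key` output column, and the helper `add_unique_keys` are all hypothetical, and the sketch assumes a suffixed key never collides with a genuine base key.

```python
import pandas as pd

def add_unique_keys(df, counters, col_a="a", col_b="b"):
    """Concatenate two columns and suffix repeated keys with a running integer.

    `counters` maps each base key to the number of times it has been seen.
    It is updated in place, so passing the same dict for the next feed
    continues the numbering instead of reusing old suffixes.
    """
    base = df[col_a].astype(str) + "_" + df[col_b].astype(str)
    keys = []
    for value in base:
        seen = counters.get(value, 0)
        # First occurrence keeps the plain key; repeats get _1, _2, ...
        keys.append(value if seen == 0 else f"{value}_{seen}")
        counters[value] = seen + 1
    return df.assign(key=keys)

# Hypothetical feeds arriving at different times
counters = {}
feed_1 = pd.DataFrame({"a": ["x", "x", "y"], "b": ["1", "1", "2"]})
feed_2 = pd.DataFrame({"a": ["x", "z"], "b": ["1", "3"]})

out_1 = add_unique_keys(feed_1, counters)
out_2 = add_unique_keys(feed_2, counters)
```

In a real pipeline the `counters` dict would need to be persisted between runs (for example to a file or table); that part is left out here.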

Pandas union with parent ids in the same dataframe

pandas python

I have a pandas dataframe that looks like this: I want to return the folder structure for each line in a separate column. For example, line 1: A is the root, there is no parent, all ids are 0 = A; line 2: B is under A, id = 1, so the path is A/B; line 3: C is under

Iterate through a dictionary and update dataframe values

dataframe pandas python

I have a dictionary and a df column that contains country codes (“BHR”, “SAU”, “ARE”, etc.). How can I update the dataframe so that, if a row matches any of the dict keys, a new column [“TIMEZONE”] is set to the dict value for that row? I also want to add an if statement so that if the row is not equal to …
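A minimal sketch of the dictionary lookup part, assuming the codes live in a column called `COUNTRY` (a hypothetical name) and using illustrative timezone values. `Series.map` looks each code up in the dict without an explicit loop. How unmatched rows should be handled is cut off in the excerpt, so the sketch simply leaves them as NaN.

```python
import pandas as pd

# Illustrative mapping; the real dictionary is not shown in the question
tz_by_country = {
    "BHR": "Asia/Bahrain",
    "SAU": "Asia/Riyadh",
    "ARE": "Asia/Dubai",
}

# "COUNTRY" is an assumed column name
df = pd.DataFrame({"COUNTRY": ["BHR", "SAU", "ARE", "XXX"]})

# Each code is replaced by its dict value; codes missing from the dict become NaN
df["TIMEZONE"] = df["COUNTRY"].map(tz_by_country)
```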

Python/Pandas time series correlation on values vs differences

correlation numpy pandas python statistics

I am familiar with the Pandas Series corr function to compute the correlation between two Series, for example: This will compute the correlation in the VALUES of the two series, but if I’m working with a Time Series, I might want to compute the correlation on changes (absolute changes or percentage changes …
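As a rough illustration of the distinction being asked about, the sketch below computes all three correlations on made-up data. `Series.diff()` gives absolute changes and `Series.pct_change()` gives percentage changes; `Series.corr` ignores the leading NaN that both produce. The series values are invented for the example.

```python
import pandas as pd

# Made-up time series for illustration
s1 = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0])
s2 = pd.Series([50.0, 51.0, 50.5, 52.0, 53.5])

# Correlation of the raw values
corr_values = s1.corr(s2)

# Correlation of absolute period-to-period changes
corr_abs_changes = s1.diff().corr(s2.diff())

# Correlation of percentage changes
corr_pct_changes = s1.pct_change().corr(s2.pct_change())
```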

Create Pandas DF by searching for multiple record values across multiple columns

dataframe pandas python

I am trying to create a new dataframe that can pull rows based on multiple terms across multiple columns. I have a huge Excel file (65k rows) that I am pulling into a df so that I can pull out new priority reports. As an example, this is what I am using to search for multiple terms across 1 column
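One way to extend a single-column search to several columns, sketched under assumptions: the terms, the column names `summary` and `notes`, and the sample rows are all hypothetical. The idea is to build one regular expression from the terms, test each chosen column with `str.contains`, and keep rows where any column matches.

```python
import re
import pandas as pd

# Hypothetical data standing in for the Excel import
df = pd.DataFrame({
    "summary": ["Urgent fix", "routine", "routine"],
    "notes": [None, "critical path", "nothing"],
    "owner": ["ann", "bob", "cy"],
})

terms = ["urgent", "critical"]   # assumed search terms
cols = ["summary", "notes"]      # assumed columns to search

# Escape each term so it is matched literally, then OR them together
pattern = "|".join(re.escape(t) for t in terms)

# One boolean column per searched column; missing cells count as no match
matches = df[cols].apply(
    lambda s: s.str.contains(pattern, case=False, na=False)
)

# Keep rows where at least one searched column matched
priority = df[matches.any(axis=1)]
```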

Converting a named aggregate prior to pandas/python3

pandas python

For the below: What would be the proper way to do this before named aggregates were introduced, including the aliasing of columns? Answer As mentioned by Henry Yik, use .agg() followed by .rename(). For example:
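The code from the original question and answer is not reproduced in this excerpt, so the following is only a sketch of the `.agg()` plus `.rename()` pattern on invented data. Named aggregation arrived in pandas 0.25; before that, one could aggregate with a list of functions and then rename the resulting columns. The column names `group`, `value`, `total`, and `average` are hypothetical.

```python
import pandas as pd

# Invented data for illustration
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})

# Named aggregation (pandas 0.25+), shown only for comparison:
# df.groupby("group").agg(total=("value", "sum"), average=("value", "mean"))

# Equivalent result without named aggregation:
# aggregate first, then alias the columns with rename()
result = (
    df.groupby("group")["value"]
      .agg(["sum", "mean"])
      .rename(columns={"sum": "total", "mean": "average"})
)
```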

Python dataframe vectorizing for loop

pandas python vectorization

I would like to vectorize this piece of Python code, which has a for loop conditioned on the current state, for speed and efficiency. Values for df_B are computed based on the current state (state) AND the corresponding df_A value. Any ideas would be appreciated. Answer This seems overkill. Your state variable basically is the pr…

pandas countif negative using where()

dataframe pandas python

Below is the code and output. What I’m trying to get is shown in the “exp” column. As you can see, the “countif” column just counts 5 columns, but I want it to only count negative values. So for example: index 0, df1[0] should equal 2. What am I doing wrong? Python Output Answer Fi…
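The question's own code and the accepted answer are cut off here, so this is not a reconstruction of either. As a sketch of one common alternative to `where()`, a boolean mask can be summed across each row to count the negative entries. The frame and the column names `c1` to `c3` are made up.

```python
import pandas as pd

# Made-up data; the original frame is not shown in the excerpt
df = pd.DataFrame({
    "c1": [-1, 2, -3],
    "c2": [4, -5, -6],
    "c3": [-7, 8, -9],
})

value_cols = ["c1", "c2", "c3"]

# (df < 0) yields True/False per cell; summing along axis=1
# counts the True values, i.e. the negatives, in each row
df["countif"] = (df[value_cols] < 0).sum(axis=1)
```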

Sort categorical x-axis in a seaborn scatter plot

matplotlib pandas python scatter-plot seaborn

I am trying to plot the top 30 percent of values in a data frame using a seaborn scatter plot as shown below. The reproducible code for the same plot: Here, I want to sort the x-axis in order = ['virginica', 'setosa', 'versicolor']. When I tried to use order as one of the parameter…