Tag: pandas

Compare two pandas series and remove duplicates

I have two series, ser1 and ser2. I want to compare them, remove the duplicates, and put the result into ser1. I tried pd.concat, but that only gave me the combination of the two series without removing the duplicates.
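The expected output is not shown in the excerpt, so "remove the duplicates" can be read in at least two ways. The sketch below uses made-up sample values (an assumption, not the asker's data) and shows both readings with pd.concat followed by drop_duplicates.

```python
import pandas as pd

# Hypothetical sample data; the original series are not shown in the question.
ser1 = pd.Series([1, 2, 3, 4])
ser2 = pd.Series([3, 4, 5, 6])

stacked = pd.concat([ser1, ser2], ignore_index=True)

# Reading 1: keep each distinct value once.
combined = stacked.drop_duplicates().reset_index(drop=True)

# Reading 2: drop every value that occurs more than once in the concatenation.
only_unique = stacked.drop_duplicates(keep=False).reset_index(drop=True)

print(combined.tolist())
print(only_unique.tolist())
```

Assigning either result back to ser1 gives the "put the result into ser1" step.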

How to group by a dataframe and concatenate by a column

dataframe pandas python

I have a df with the following structure:

Store  Sku  Value
1      A    20
2      A    20
1      B    10
2      B    25

I have to transform it so that stores with the same Sku and the same Value end up concatenated in one cell with a “-” separator, like the following:

Store  Sku  Value
1 – 2  A
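One way this could be done, as a minimal sketch: group by Sku and Value, then join the Store values of each group into a single string. The " - " separator and the final column order are assumptions based on the partial expected output.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Store": [1, 2, 1, 2],
        "Sku": ["A", "A", "B", "B"],
        "Value": [20, 20, 10, 25],
    }
)

# Join the stores of each (Sku, Value) group into one string.
out = (
    df.groupby(["Sku", "Value"])["Store"]
    .agg(lambda stores: " - ".join(stores.astype(str)))
    .reset_index()
)

# Restore the original column order.
out = out[["Store", "Sku", "Value"]]
print(out)
```

Stores that do not share a Sku and Value with any other store stay on their own row with a single store number.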

Compare two dataframe columns on a histogram

matplotlib pandas python

I have a dataframe that looks similar to: I am required to give a visual comparison of true and estimated distances. My actual df shape is: How do I show true_distance side by side with estimated_distance on a plot, so that the difference in each row is easy to see, considering the size of my df_actual?

Determine Consecutive Values (Invoices) in Pandas

invoice pandas python supplier

I have a dataset with suppliers and their invoices, and I need to determine which of the invoices are consecutive, marking each with a 1 or a 0. For example: And what I want is a third column like this: EDIT Thanks for your answers, these options work great, but when I tried them in a real database I realized
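The sample data and the expected column are missing from the excerpt, so the following is only a sketch under assumptions: the column names supplier and invoice are hypothetical, invoice numbers are integers, and an invoice counts as consecutive when the same supplier also has the number directly before or directly after it.

```python
import pandas as pd

# Hypothetical data; the question's own example is not shown.
df = pd.DataFrame(
    {
        "supplier": ["A", "A", "A", "B", "B"],
        "invoice": [1, 2, 5, 10, 11],
    }
)

df = df.sort_values(["supplier", "invoice"]).reset_index(drop=True)

# Difference to the previous and to the next invoice of the same supplier.
diff_prev = df.groupby("supplier")["invoice"].diff()
diff_next = df.groupby("supplier")["invoice"].diff(-1)

# 1 if the invoice has a direct neighbour on either side, else 0.
df["consecutive"] = ((diff_prev == 1) | (diff_next == -1)).astype(int)
print(df)
```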

Select rows from a pandas dataframe using a set of values

pandas python

I have a dataframe with a column named label_id, which holds string values. I also have a set of label_id values in required_labels. I would like to select the rows of the dataframe where the label_id value is contained in the set. I understand that I need to use df.loc for this, but when I try to generate a
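A common way to build the boolean mask for df.loc in this situation is Series.isin, which accepts a set. The sample rows and the extra value column below are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the names label_id and required_labels come from the question.
df = pd.DataFrame(
    {
        "label_id": ["a", "b", "c", "a"],
        "value": [1, 2, 3, 4],
    }
)
required_labels = {"a", "c"}

# Boolean mask: True where label_id is a member of the set.
mask = df["label_id"].isin(required_labels)
selected = df.loc[mask]
print(selected)
```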

AttributeError: ‘numpy.ndarray’ object has no attribute

numpy pandas python scikit-learn

I am applying the SelectKBest feature selection technique, but it gives me the following error: Here is the relevant portion of my code: (Note: the original data is in CSV format) Answer X is a numpy array, and the .columns attribute is only available on a dataframe. You need to convert to a dataframe first, then call the…
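The conversion the answer describes might look like the sketch below. The array contents and the feature names are hypothetical, since the asker's code is not shown; in practice the names would usually come from the CSV header.

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix standing in for X.
X = np.array([[1.0, 2.0], [3.0, 4.0]])

# A plain ndarray carries no column labels.
print(hasattr(X, "columns"))

# Hypothetical column names; wrap the array to get a .columns attribute.
feature_names = ["feat_a", "feat_b"]
X_df = pd.DataFrame(X, columns=feature_names)
print(list(X_df.columns))
```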

Pandas Dataframe – Sum values for a specific date then divide by the count of that date

pandas python

I have a pandas dataframe with several dates and several values for each date. I’m trying to sum the values for each date, then divide by the number of records for that same date. Example:

date        value
2022-09-16  1
2022-09-16  2
2022-09-16  3
2022-09-15  6
2022-09-15  2
2022-09-15  2
2022-09-14  7

The expected r…
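Summing per date and dividing by the record count per date is the per-date mean, so a groupby is one plausible approach. The sketch below uses the example rows from the question, keeps the dates as plain strings (an assumption), and computes the result both ways.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": [
            "2022-09-16", "2022-09-16", "2022-09-16",
            "2022-09-15", "2022-09-15", "2022-09-15",
            "2022-09-14",
        ],
        "value": [1, 2, 3, 6, 2, 2, 7],
    }
)

# Explicit version: sum and count per date, then divide.
per_date = df.groupby("date")["value"].agg(["sum", "count"])
per_date["result"] = per_date["sum"] / per_date["count"]

# Equivalent shortcut: the mean per date.
mean_per_date = df.groupby("date")["value"].mean()
print(per_date)
```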

Skip NA while removing consecutive identical numbers in a column in Python

dataframe pandas python

I have to remove consecutive identical numbers from a column in a dataframe. I am able to remove the number once, but when I try to do it a second time the loop does not work because there is an NA value in between. The dataframe is: I don't know why the last line of code, i.e. elif
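A loop-free alternative, sketched under assumptions: the data below is made up, NA values are treated as transparent (each number is compared with the previous non-NA number), and "remove" is taken to mean dropping the repeated entries.

```python
import numpy as np
import pandas as pd

# Hypothetical column; the question's dataframe is not shown.
s = pd.Series([1, 1, np.nan, 1, 2, 2, np.nan, 3])

# Ignore NA values so they do not break up a run of equal numbers.
non_na = s.dropna()

# Keep a value only if it differs from the previous non-NA value.
keep = non_na.ne(non_na.shift())
deduped = non_na[keep]
print(deduped)
```

The original index labels are preserved, so the kept values can be mapped back to their rows in the dataframe.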

Python pandas dataframe with daily data – keep first and last rows per month

dataframe pandas python

I have a Python pandas dataframe that looks like this: I want to keep the first and the last row per month. How can I do that? I tried using the following code: but I don’t get the results I want. Answer pandas groupby operations don’t sort each group prior to aggregation, which is why ‘firs…
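Following the point about ordering, one possible sketch sorts by date first and then keeps the first and last row of each month. The column names date and value and the sample rows are assumptions, since the question's dataframe is not shown.

```python
import pandas as pd

# Hypothetical daily data, deliberately out of order.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-01-15", "2022-01-03", "2022-02-28",
             "2022-01-31", "2022-02-01", "2022-02-14"]
        ),
        "value": [2, 1, 6, 3, 4, 5],
    }
)

# Sort explicitly so that "first" and "last" mean earliest and latest date.
df = df.sort_values("date").reset_index(drop=True)
month = df["date"].dt.to_period("M")

# A row is kept if it is the first or the last occurrence of its month.
is_first = ~month.duplicated(keep="first")
is_last = ~month.duplicated(keep="last")
result = df[is_first | is_last]
print(result)
```

A month with a single row is kept once, because the same row is both first and last.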

Append new level to DataFrame column

dataframe pandas python

Given a DataFrame, how can I add a new level to the columns based on an iterable given by the user? In other words, how do I append a new level? The question “How to simply add a column level to a pandas dataframe” shows how to add a new level given a single value, so it doesn’t cover this
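One way this might be approached is to rebuild the columns with pd.MultiIndex.from_arrays, passing the existing levels plus the new iterable. The sample frame and the new_level values are hypothetical, and the sketch assumes the iterable has exactly one entry per column and that "append" means adding the new level as the innermost one.

```python
import pandas as pd

# Hypothetical frame and user-supplied iterable.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
new_level = ["x", "y"]

# Collect the existing column levels (works for flat and MultiIndex columns).
existing = [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]

# Append the new values as an additional, innermost level.
df.columns = pd.MultiIndex.from_arrays(existing + [list(new_level)])
print(df.columns.tolist())
```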