
Tag: pandas

How to combine DataFrame columns of strings into a single column?

I have a DataFrame with about 50 columns. Five of them contain strings that I want to combine into a single column, separated by commas, while keeping the spaces within each string. Some values are missing (NaN). The last requirement is to remove duplicates if they exist. So I h…
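
A minimal sketch of one way to do this, assuming the five columns hold plain strings or NaN. The column names `c1`..`c3` and the sample values are hypothetical stand-ins. Each row's values are joined after dropping NaN and removing duplicates in first-seen order; spaces inside each string are not touched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; c1..c3 stand in for the question's string columns.
df = pd.DataFrame({
    "c1": ["red apple", "green pear", np.nan],
    "c2": ["red apple", np.nan, "blue berry"],
    "c3": ["ripe", "ripe", np.nan],
})
cols = ["c1", "c2", "c3"]

def join_unique(row: pd.Series) -> str:
    # Drop missing values, then de-duplicate while preserving order.
    values = row.dropna().astype(str)
    return ", ".join(dict.fromkeys(values))

df["combined"] = df[cols].apply(join_unique, axis=1)
print(df["combined"].tolist())
# ['red apple, ripe', 'green pear, ripe', 'blue berry']
```

A row where every selected column is NaN ends up as an empty string; change the `", "` separator if a bare comma is wanted.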

Average for similar-looking data in a column using Pandas

I’m working on a large dataset with more than 60K rows. One column holds continuous measurements of current. Each code is measured for one second, during which the equipment takes 14, 15, 16, or 17 readings depending on its speed; the measurement then moves to the next code and again measures for 14/15/…
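
One common pattern for averaging each consecutive block of readings is sketched below, assuming a column (hypothetically named `code`) identifies what is being measured and another (hypothetically `current`) holds the readings. A new run starts whenever the code changes from the previous row, so the cumulative sum of those change points gives each run its own id. Unlike a plain `groupby("code")`, this keeps two separate visits to the same code apart.

```python
import pandas as pd

# Hypothetical data: code "A" is measured twice, in two separate runs.
df = pd.DataFrame({
    "code":    ["A", "A", "A", "B", "B", "A", "A"],
    "current": [1.0, 2.0, 3.0, 10.0, 20.0, 5.0, 7.0],
})

# True where the code differs from the row above; cumsum turns that into run ids.
run_id = (df["code"] != df["code"].shift()).cumsum().rename("run")

averages = (
    df.groupby(run_id)
      .agg(code=("code", "first"),
           mean_current=("current", "mean"),
           n=("current", "count"))
      .reset_index(drop=True)
)
print(averages)
```

The `n` column shows how many readings each run contained, which makes it easy to check the 14 to 17 readings-per-second assumption on real data.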

Selecting the row with a given datetime index

There are two datasets, data_pred and data_test, that I want to use to compute an evaluation score. data_test is the data used to check the accuracy, and it looks like this: data_pred comes from an ARIMA prediction and looks like this: The reason I can’t find the MSE score between these…
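
A hedged sketch of selecting by datetime label and scoring only the timestamps the two datasets share. It assumes both are pandas Series with a `DatetimeIndex`; the dates and values are made up, and the question's actual reason for the failure is truncated, so the range mismatch shown here is only one plausible cause.

```python
import pandas as pd

# Hypothetical stand-ins for data_test and data_pred covering different ranges.
data_test = pd.Series(
    [10.0, 12.0, 11.0, 13.0],
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)
data_pred = pd.Series(
    [11.0, 12.5, 12.0],
    index=pd.date_range("2021-01-02", periods=3, freq="D"),
)

# Select a single row by its datetime label.
print(data_test.loc[pd.Timestamp("2021-01-03")])  # 11.0

# Score only on timestamps present in both series.
common = data_test.index.intersection(data_pred.index)
errors = data_test.loc[common] - data_pred.loc[common]
mse = (errors ** 2).mean()
print(mse)
```

Aligning on the index intersection, rather than truncating by position, keeps each prediction paired with the observation at the same timestamp.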

How to select rightmost column with a value?

I have a DataFrame df with country statistics for the years 2014 to 2018. Some countries have values for every year, while others are missing some. The DataFrame looks like this: I want to keep only the most recent data value, so for the DataFrame above, the result should be: Answe…
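
One way to pick the rightmost non-missing value per row is sketched below, assuming one row per country and one column per year in chronological order. The country names and numbers are hypothetical. Forward-filling along the columns carries each value rightward, so the last column ends up holding the most recent available value; `last_valid_index` reports which year it came from.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per country, one column per year (left to right).
df = pd.DataFrame(
    {
        "2014": [1.0, 5.0, np.nan],
        "2015": [2.0, np.nan, 7.0],
        "2016": [np.nan, np.nan, 8.0],
        "2017": [3.0, np.nan, np.nan],
        "2018": [np.nan, np.nan, np.nan],
    },
    index=["CountryA", "CountryB", "CountryC"],
)

# Forward-fill across columns; the last column is then the latest known value.
latest = df.ffill(axis=1).iloc[:, -1]

# Year label of that value (None for a row with no data at all).
latest_year = df.apply(lambda row: row.last_valid_index(), axis=1)
print(pd.DataFrame({"year": latest_year, "value": latest}))
```

This relies on the columns already being sorted by year; sort them first (e.g. `df.sort_index(axis=1)`) if they are not.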

Removing min, max and calculating average

I have columns of numbers, and I need to remove only one minimum and one maximum and then calculate the average of the numbers that remain. The catch is that the min/max could be anywhere in the column, some rows may be blank (null) or contain a zero, and a column might have only 3 values. All numbers…
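
A minimal sketch, assuming nulls should be skipped while zeros count as real values (the question leaves this open; call `.replace(0, np.nan)` on the column first if zeros should be treated as blanks). Sorting and slicing off the first and last element removes exactly one minimum and one maximum even when values are tied. The function name `trimmed_average` and the sample columns are hypothetical.

```python
import numpy as np
import pandas as pd

def trimmed_average(values) -> float:
    """Drop one min and one max, then average the rest (NaN if < 3 values)."""
    clean = pd.Series(values, dtype="float64").dropna()  # nulls are ignored
    if len(clean) < 3:
        return float("nan")
    ordered = np.sort(clean.to_numpy())
    # Remove exactly one smallest and one largest value, even with ties.
    return float(ordered[1:-1].mean())

# Hypothetical data with blanks (NaN) and a zero in column "x".
df = pd.DataFrame({
    "x": [4, np.nan, 0, 10, 6],
    "y": [3, 9, 7, np.nan, np.nan],
})
result = df.apply(trimmed_average)  # one result per column
print(result)
```

With only three values the result is simply the middle one, which matches the "column might have only 3 values" case.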

How to flag outliers/anomalies in selected columns in Python?

In the dataset df below, I want to flag the anomalies in all columns except A, B, C, and L. Any value less than 1500 or greater than 400000 is regarded as an anomaly. Attempt: Result of the code: Desired output should look like this: Thanks for the effort! Answer If you set the subset as the argument of the app…
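
A hedged sketch using a plain boolean mask rather than the styling-based approach the truncated answer appears to start with. It assumes every column other than A, B, C, and L is numeric; the sample data is hypothetical, and since the desired output is not shown, this produces both per-cell flags and one per-row flag.

```python
import pandas as pd

# Hypothetical data; A, B, C and L are excluded from the check.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["x", "y", "z"],
    "C": [0, 0, 0],
    "D": [2000, 1000, 500000],
    "E": [1600, 3000, 399999],
    "L": [5, 6, 7],
})

excluded = ["A", "B", "C", "L"]
checked = df.columns.difference(excluded)

# Per-cell flags: True where a value is below 1500 or above 400000.
is_anomaly = (df[checked] < 1500) | (df[checked] > 400000)

# Per-row flag: True if any checked column is anomalous.
df["anomaly"] = is_anomaly.any(axis=1)
print(df)
```

Using `columns.difference` means new columns are checked automatically; only the excluded list needs maintaining.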