Tag: pandas

Closing figures from previous sessions

I am running Python code that produces some figures with Matplotlib and pandas. After a few runs of the code, I am getting the following error: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed a…
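A minimal sketch of the usual fix is to close each figure once it has been saved, with `plt.close(fig)` (or `plt.close("all")` at the end of a run). The DataFrame and the loop below are hypothetical stand-ins for the asker's plotting code, and the non-interactive Agg backend is an assumption that only fits if the figures are saved rather than shown.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: figures are only saved, never displayed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for whatever the real code plots.
df = pd.DataFrame({"x": range(10), "y": [v * v for v in range(10)]})

buffers = []
for run in range(30):  # more than the default 20-figure warning threshold
    ax = df.plot(x="x", y="y", title=f"run {run}")
    fig = ax.figure
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # save to memory instead of a file
    buffers.append(buf)
    plt.close(fig)  # release the figure so pyplot no longer keeps it alive
```

Because every figure is closed inside the loop, pyplot never holds more than one open figure and the RuntimeWarning is not triggered.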

Pandas: Check each row for condition and insert row below if condition is met

dataframe datetime pandas python

This is my first question here, as I really couldn’t figure it out from related answers: I have a list of dataframes, “df_list”, with one dataframe per user, which basically looks like this: … I would like to go through all the dataframes in my df_list and inside each df I would like to add…
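Since the data itself is not shown, here is a minimal sketch under assumptions: each dataframe has a datetime column and a numeric column (the names `time` and `value` are hypothetical), and both the condition and the row to insert can be computed from the current row. The helper `insert_below` is also hypothetical, not a pandas function.

```python
import pandas as pd


def insert_below(df, condition, make_row):
    """Hypothetical helper: return a new DataFrame in which make_row(row)
    is inserted directly after every row where condition(row) is True."""
    records = []
    for _, row in df.iterrows():
        records.append(row.to_dict())
        if condition(row):
            records.append(make_row(row))
    return pd.DataFrame(records, columns=df.columns)


# Hypothetical per-user dataframe.
user_df = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01 08:00", "2021-01-01 09:00", "2021-01-01 10:00"]),
    "value": [5, 12, 7],
})
df_list = [user_df]

# Assumed rule: after any row with value > 10, add a row 30 minutes later with value 0.
df_list = [
    insert_below(
        d,
        condition=lambda r: r["value"] > 10,
        make_row=lambda r: {"time": r["time"] + pd.Timedelta(minutes=30), "value": 0},
    )
    for d in df_list
]
```

Building a list of records and creating the DataFrame once at the end avoids repeated `pd.concat` calls inside the loop, which get slow for long dataframes.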

PySpark: create all possible combinations of column values of a dataframe

pandas pyspark python

I want to get all possible combinations of size 2 of a column in a PySpark dataframe. My PySpark dataframe looks like this: … One way would be to collect the values into a Python iterable (list, pandas DataFrame) and use itertools.combinations to generate all combinations. However, I want to avoid collecting th…
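A common way to avoid collecting is a self-join that keeps only the pairs whose left value sorts before the right value, so each unordered pair appears exactly once. The sketch below shows that join logic in pandas so it can be run locally; in PySpark the same idea would be written with `DataFrame.alias`, `crossJoin` and `filter`. The column name `value` is hypothetical, and the values are assumed to be distinct and orderable.

```python
import pandas as pd

# Hypothetical single-column dataframe.
df = pd.DataFrame({"value": ["a", "b", "c", "d"]})

# Cross join the column with itself (every left value with every right value).
cross = df.merge(df, how="cross", suffixes=("_1", "_2"))

# Keep each unordered pair once and drop self-pairs such as (a, a).
pairs = cross[cross["value_1"] < cross["value_2"]].reset_index(drop=True)
```

The strict `<` filter is what turns the n×n cross product into n·(n−1)/2 combinations; in a distributed engine the join runs on the executors, so nothing has to be collected to the driver.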

Calculate sum of squares with rows above

dataframe mean numpy pandas python

I have a dataset that looks like this: … I want to iterate through each row and calculate a sum of squares over the rows above (only where the Type matches), and put this value in the X.sq column. For example, in the first row there’s nothing above, so the value is only (-1.975767 × -1.975767). In the seco…
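An explicit row loop is probably not needed: square the values, then take a cumulative sum within each Type group. This sketch assumes the values are in a column named `X` (hypothetical; only `Type` and `X.sq` are named in the question), and the running total includes the current row, which matches the first-row example.

```python
import pandas as pd

# Hypothetical data: only -1.975767 comes from the question.
df = pd.DataFrame({
    "Type": ["A", "B", "A", "A"],
    "X": [-1.975767, 0.5, 2.0, -1.0],
})

# Square each value, then accumulate within each Type in the existing row order,
# so every row gets the sum of squares of itself and all earlier rows of that Type.
df["X.sq"] = df["X"].pow(2).groupby(df["Type"]).cumsum()
```

If the current row should be excluded, subtract `df["X"].pow(2)` from the result.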

Why is pd.to_datetime() only changing type if utc is True?

jupyter jupyter-notebook pandas python

After loading my CSV file into my notebook in VS Code, I wanted to change the type of some columns from object to datetime. So I did the following: … These are example values in the columns: … After converting the columns I wanted to check if everything worked …
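One common cause, assumed here because the example values are not shown: if the strings carry different UTC offsets (for instance on both sides of a daylight-saving change), no single timezone-aware dtype can hold them. Without `utc=True`, pandas then either leaves the column as object dtype (older versions, possibly with a FutureWarning) or raises an error (newer versions), depending on the version. With `utc=True` every value is converted to UTC, so the column gets a proper datetime dtype. The sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical values: same local clock time, but two different UTC offsets
# because the dates straddle a daylight-saving change.
s = pd.Series(["2023-03-25 12:00:00+01:00", "2023-03-27 12:00:00+02:00"])

# utc=True normalizes every offset to UTC, giving one tz-aware datetime dtype.
converted = pd.to_datetime(s, utc=True)

# Optional: convert to a named zone for display; the offsets stay correct per row.
local = converted.dt.tz_convert("Europe/Berlin")
```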

Missing categorical data should be encoded with an all-zero one-hot vector

data-science machine-learning pandas python scikit-learn

I am working on a machine learning project with very sparsely labeled data. There are several categorical features, resulting in roughly one hundred different classes across the features. For example: … After I put these through scikit-learn’s OneHotEncoder, I am expecting the missing data to be encoded as 00, …
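One approach, assuming a recent scikit-learn: pass the categories explicitly, leaving NaN out, and set `handle_unknown="ignore"`. A missing value is then treated as an unknown category and encoded as an all-zero block. The column names and values below are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sparse categorical data with missing entries.
df = pd.DataFrame({
    "color": ["red", np.nan, "blue"],
    "size": ["S", "M", np.nan],
})

# Known categories per column, taken from the non-missing values only.
categories = [sorted(df[col].dropna().unique()) for col in df.columns]

# NaN is not among the categories, so handle_unknown="ignore" encodes it as all zeros.
enc = OneHotEncoder(categories=categories, handle_unknown="ignore")
encoded = enc.fit_transform(df).toarray()  # default output is sparse; densify for display
```

For a pandas-only pipeline, `pd.get_dummies(df)` behaves the same way by default: rows with NaN get all zeros unless `dummy_na=True` is passed.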

Creating Dataframes for different clusters

dataframe pandas python

I have a dataset: … Using this dataset, I clustered it based on the number of times “System” is repeated for a particular “Name”. In the above example, Names A, B and D have one “AZ” “Subset”, while C and E have two “AY” subsets and F has two AZ so…
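Because the dataset is not shown, this is a sketch built on hypothetical data with the columns `Name`, `System` and `Subset`. It assumes each Name belongs to a single Subset and that a cluster is defined by the pair (Subset, number of System rows). Each Name gets a cluster key, and `groupby` then splits the rows into one DataFrame per cluster.

```python
import pandas as pd

# Hypothetical data shaped after the description in the question.
df = pd.DataFrame({
    "Name":   ["A", "B", "C", "C", "D", "E", "E", "F", "F"],
    "System": ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9"],
    "Subset": ["AZ", "AZ", "AY", "AY", "AZ", "AY", "AY", "AZ", "AZ"],
})

# Per Name: its Subset (assumed unique per Name) and how many System rows it has.
signature = df.groupby("Name").agg(Subset=("Subset", "first"), Count=("System", "count"))

# Cluster key such as "AZ_1" or "AY_2", mapped back onto every row.
key = signature["Subset"] + "_" + signature["Count"].astype(str)
df["cluster"] = df["Name"].map(key)

# One DataFrame per cluster, keyed by the cluster label.
clusters = {label: group.drop(columns="cluster") for label, group in df.groupby("cluster")}
```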

CET timezone strings to datetime

datetime pandas python

I have a DataFrame with columns similar to this: … This data is imported from SQL into a pandas DataFrame, and when I print out the dtypes I can see that the date_cet column is object. Since I need it later on, I want to convert it to a datetime object. However, the stuff I’ve tried just …
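A minimal sketch, assuming the `date_cet` strings are naive `YYYY-MM-DD HH:MM:SS` wall-clock times in Central European time (the real format is not shown). Parse them first, then attach the timezone with `tz_localize`. "Europe/Berlin" is used as an example zone that follows CET in winter and CEST in summer.

```python
import pandas as pd

# Hypothetical values; the real strings from SQL may look different.
df = pd.DataFrame({"date_cet": ["2021-03-01 08:15:00", "2021-07-01 08:15:00"]})

# 1) Parse the strings into naive datetime64 values.
df["date_cet"] = pd.to_datetime(df["date_cet"], format="%Y-%m-%d %H:%M:%S")

# 2) Attach the timezone. Wall times that are ambiguous or nonexistent around
#    DST changes would need the ambiguous=/nonexistent= arguments of tz_localize.
df["date_cet"] = df["date_cet"].dt.tz_localize("Europe/Berlin")

# 3) Optionally derive UTC values for storage or comparison.
df["date_utc"] = df["date_cet"].dt.tz_convert("UTC")
```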

Return columns that are binary in ndarray?

numpy pandas python

So … does the same thing for a pandas DataFrame. I want to know how to do this when it is an ndarray.

1 40 0 0 0 0 0 0
2 58 0 0 1 0 0 0
3 41 0 1 1 0 0 1
4 45 0 0 1 1 0 1
5 60 0 1 0…
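With a plain NumPy array, a column is binary if every value in it is 0 or 1, and `np.isin` plus `all(axis=0)` gives a boolean mask over the columns. The array below uses the four complete rows from the question (the fifth is cut off) and treats the first column as data rather than as a row index, which is an assumption.

```python
import numpy as np

# The four complete rows shown in the question.
arr = np.array([
    [1, 40, 0, 0, 0, 0, 0, 0],
    [2, 58, 0, 0, 1, 0, 0, 0],
    [3, 41, 0, 1, 1, 0, 0, 1],
    [4, 45, 0, 0, 1, 1, 0, 1],
])

# True for each column whose values are all in {0, 1}.
is_binary = np.isin(arr, [0, 1]).all(axis=0)

binary_idx = np.flatnonzero(is_binary)  # positions of the binary columns
binary_part = arr[:, is_binary]         # the binary columns themselves
```

Note that a column of all zeros counts as binary under this test; require both values to appear if that is not wanted.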

Pandas combine rows in groups to get rid of Nans

pandas python

I want to do something similar to what DataFrame.combine_first() does, but as a row-wise operation performed on a shared index. I also want to add a new column in place of the old ones, while keeping the original values of shared column names. In this case the ‘ts’ column is one that I want to replac…
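For the part about merging rows that share an index label, one possible sketch is `GroupBy.first()`, which takes the first non-null value of each column within each group. That is equivalent to chaining `combine_first` over all rows with the same label. The data is hypothetical, and the replacement of the `ts` column is not covered because the rest of the question is cut off.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows sharing an index label each hold part of the values.
df = pd.DataFrame(
    {
        "ts": [1.0, np.nan, 3.0, np.nan],
        "a": [np.nan, 10.0, np.nan, 30.0],
        "b": [5.0, 6.0, np.nan, 8.0],
    },
    index=["x", "x", "y", "y"],
)

# For each index label, keep the first non-null value per column (NaNs are skipped).
combined = df.groupby(level=0).first()
```

Where both rows have a value (column `b` for label `x`), the earlier row wins, just as `combine_first` keeps the calling frame's values.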