I am running Python code that produces some figures with Matplotlib and Pandas. After a few runs of the code, I get the following warning: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed a…
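A minimal sketch of one common remedy, assuming the figures are created in a loop and are no longer needed once drawn or saved. The loop, the plotted values, and the commented-out file name are hypothetical. The `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display is needed)
import matplotlib.pyplot as plt

for i in range(25):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [i, i + 1, i + 2])
    # fig.savefig(f"figure_{i}.png")  # hypothetical output path
    plt.close(fig)  # explicitly close the figure so pyplot stops retaining it

# List of figure numbers pyplot still holds; empty once every figure is closed
open_figures = plt.get_fignums()
```

Closing each figure as soon as it is finished keeps the open-figure count below the threshold set by the `figure.max_open_warning` rcParam (20 by default), which is what triggers the warning.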
Tag: pandas
Pandas: Check each row for condition and insert row below if condition is met
This is my first question here, as I really couldn’t figure it out from related answers: I have a list of dataframes “df_list”, for each user I have a dataframe which basically looks like: Data: I would like to go through all the dataframes in my df_list and inside each df I would like to add…
pyspark create all possible combinations of column values of a dataframe
I want to get all the possible combinations of size 2 of a column in a pyspark dataframe. My pyspark dataframe looks like One way would be to collect the values and get them into a Python iterable (list, pandas df) and use itertools.combinations to generate all combinations. However, I want to avoid collecting th…
calculate sum of squares with rows above
I have a dataset that looks like this: I want to iterate through each row and calculate a sum of squares value for each row above (only if the Type matches). I want to put this value in the X.sq column. So for example, in the first row, there’s nothing above. So only (-1.975767 x -1.975767). In the seco…
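One way to read the question is as a running sum of squares within each Type, where each row includes its own value. A minimal sketch under that reading: the value column name `X` and every number except -1.975767 are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the first value comes from the question
df = pd.DataFrame({
    "Type": ["A", "A", "B", "A", "B"],
    "X": [-1.975767, 0.5, 2.0, 1.0, -3.0],
})

# Square each value, then accumulate the squares separately for each Type
df["X.sq"] = (df["X"] ** 2).groupby(df["Type"]).cumsum()
```

The first row of each Type holds only its own square, and every later row adds its square to the total of the matching rows above it.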
Why is pd.to_datetime() only changing type if utc is True?
After loading my CSV file into my notebook in VS Code, I wanted to change the column type from object to datetime for some columns. So I did the following: object values of columns These are example values in the columns. check convert check After converting the columns I wanted to check if everything worked …
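The truncated excerpt does not show the data, so the cause cannot be confirmed. One common situation with this symptom is a column whose strings carry different UTC offsets, for example across a daylight-saving change. A minimal sketch under that assumption, with made-up values:

```python
import pandas as pd

# Hypothetical strings with two different UTC offsets
raw = pd.Series(["2021-03-28 01:00:00+01:00", "2021-03-28 03:00:00+02:00"])

# utc=True converts every value to UTC, so one timezone-aware dtype fits the column
converted = pd.to_datetime(raw, utc=True)
```

Because a single datetime dtype carries only one timezone, mixed offsets need to be normalised to a common one, and `utc=True` does exactly that.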
Missing categorical data should be encoded with an all-zero one-hot vector
I am working on a machine learning project with very sparsely labeled data. There are several categorical features, resulting in roughly one hundred different classes between the features. For example: After I put these through scikit’s OneHotEncoder I am expecting the missing data to be encoded as 00, …
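A minimal sketch of the desired encoding using `pandas.get_dummies` rather than scikit's OneHotEncoder. This is a swapped-in technique, not the encoder the question uses. The column name and category values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical feature with one missing entry
colors = pd.Series(["red", np.nan, "blue", "red"], name="color")

# With the default dummy_na=False, NaN gets no column of its own,
# so the missing row is encoded as all zeros
encoded = pd.get_dummies(colors, dtype=int)
```

Here the second row of `encoded` is zero in both the `blue` and `red` columns, which matches the all-zero vector the question asks for.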
Creating Dataframes for different clusters
I have a dataset Using this dataset, I clustered the dataset based on the number of times “System” is repeated for a particular “Name”. In the above example, Names A, B and D have one “AZ” “Subset” while C, E have two “AY” subsets and F has two AZ so…
CET timezone strings to datetime
I have a data frame that has columns looking similar to this. This data is imported from SQL into a Pandas data frame, and when I print out the dtypes I can see that the date_cet column is object. Since I need it further on, I want to convert it to a datetime object. However, the stuff I’ve tried just
Return columns that are binary in ndarray?
I know how to do this for a pandas DataFrame. How can I do the same thing when the data is an ndarray? 1 40 0 0 0 0 0 0 2 58 0 0 1 0 0 0 3 41 0 1 1 0 0 1 4 45 0 0 1 1 0 1 5 60 0 1 0
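A minimal sketch with NumPy, using the four complete rows from the excerpt and assuming all eight numbers on each row are array columns. If the first number is really a row label, it would be dropped first. The variable names are illustrative.

```python
import numpy as np

# The four complete rows quoted in the question
arr = np.array([
    [1, 40, 0, 0, 0, 0, 0, 0],
    [2, 58, 0, 0, 1, 0, 0, 0],
    [3, 41, 0, 1, 1, 0, 0, 1],
    [4, 45, 0, 0, 1, 1, 0, 1],
])

# A column counts as binary when every entry in it is 0 or 1
is_binary = np.isin(arr, [0, 1]).all(axis=0)
binary_idx = np.flatnonzero(is_binary)  # positions of the binary columns
binary_cols = arr[:, is_binary]         # the binary columns themselves
```

Note that a column containing only zeros also passes this test, since all of its values lie in {0, 1}.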
Pandas combine rows in groups to get rid of NaNs
I want to do something similar to what pd.combine_first() does, but as a row-wise operation performed on a shared index. And to also add a new column in place of the old ones – while keeping the original_values of shared column names. In this case the ‘ts’ column is one that I want to replac…
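A minimal sketch of the row-combining part only, assuming rows that share an index label should collapse into one row that keeps the first non-null value per column. The data and the `value` column are hypothetical, and the extra column the question describes is not covered.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each index label appears twice with complementary NaNs
df = pd.DataFrame(
    {"ts": [1.0, np.nan, np.nan, 4.0], "value": [np.nan, 10.0, 20.0, np.nan]},
    index=["a", "a", "b", "b"],
)

# GroupBy.first() returns the first non-null entry of each column per group
combined = df.groupby(level=0).first()
```

This gives one row per index label, with NaNs filled from the other rows in the same group where a value exists.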