Tag: pandas

Testing string membership using the (in) keyword in Python is very slow

I have the following text dataset: 4 million paragraphs, each between 10 and 60 words long. I also have a set of 30,000 unique sentences, and I want to check whether ANY of the sentences in the set appear in those 4 million paragraphs. If any of those 30,000 sentences is in one of those paragraphs, I want to keep that …
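A minimal sketch of one common way to avoid a nested Python loop over every (paragraph, sentence) pair: compile all sentences into a single regular-expression alternation and scan each paragraph once. The names `build_matcher` and `keep_matching` and the sample data are hypothetical, and the sketch has not been benchmarked on data of the size described in the question.

```python
import re

def build_matcher(sentences):
    """Compile every sentence into one alternation pattern (literal matching)."""
    # Longest first, so a longer sentence is tried before a shorter prefix of it.
    ordered = sorted(set(sentences), key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))

def keep_matching(paragraphs, sentences):
    """Return the paragraphs that contain at least one of the sentences."""
    if not sentences:
        return []  # an empty pattern would match every paragraph
    matcher = build_matcher(sentences)
    return [p for p in paragraphs if matcher.search(p)]

# Hypothetical sample data, only to show the call.
sentences = {"the cat sat", "hello world"}
paragraphs = [
    "Yesterday the cat sat on the mat.",
    "Nothing to see here.",
    "He said hello world twice.",
]
kept = keep_matching(paragraphs, sentences)
```

This moves the inner loop into the regex engine. For tens of thousands of patterns, a dedicated multi-pattern algorithm such as Aho-Corasick (available through third-party packages) may scale better.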

Subplotting of Pandas.DataFrameGroupBy[group_name] does not yield expected results

dataframe matplotlib pandas python

This is a re-opening of my initial question with the same title, which was closed as a duplicate. As none of the suggested duplicates helped me solve my problem, I am posting this question again. I have a DataFrame with time series related to some devices, which come from an HDF file: This produces the following outp…

How to use the value in a variable as the name to create a pandas DataFrame?

dataframe pandas python

In [182]: colname
Out[182]: 'col1'
In [183]: x = 'df_' + colname
In [184]: x
Out[184]: 'df_col1'

How can I create a new pandas DataFrame from x, such that the new DataFrame's name is df_col1? Answer: You can use the locals() function as given below,
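As a hedged alternative to creating variables dynamically, a dictionary keyed by the generated name is often easier to work with. The sketch below is not the answer from the excerpt (which points to locals()); the `frames` dictionary and the placeholder data are assumptions made for illustration.

```python
import pandas as pd

colname = "col1"
x = "df_" + colname  # 'df_col1'

# Keep the DataFrames in a dict instead of creating variables by name.
frames = {}
frames[x] = pd.DataFrame({colname: [1, 2, 3]})  # placeholder data (hypothetical)

# Look the frame up by its generated name.
df_col1 = frames["df_col1"]
```

Note that assigning into locals() inside a function is not guaranteed to create a local variable, which is one reason the dictionary approach is commonly preferred.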

Python: for loop that drops a column to meet condition

pandas python

I have a dataframe that looks as follows: Beta is calculated as ((sum of each row)^2)/10. I want to keep dropping columns until Beta is less than or equal to 1 for all rows. So far I have: How can I stop the loop when all values of Beta are less than or equal to 1? Answer: First of all, if
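The stopping condition described in the question can be sketched as a loop that recomputes Beta after each drop and exits once every row satisfies Beta <= 1. The excerpt does not say which column should be dropped at each step, so dropping the last column is an assumption here, as are the function name `drop_until_beta_ok` and the sample data.

```python
import pandas as pd

def drop_until_beta_ok(df, threshold=1.0):
    """Drop columns until ((row sum)^2)/10 <= threshold for every row."""
    out = df.copy()
    while out.shape[1] > 0:
        beta = out.sum(axis=1) ** 2 / 10
        if (beta <= threshold).all():
            break  # every row meets the condition, so stop dropping
        out = out.iloc[:, :-1]  # assumption: drop the last column each time
    return out

# Hypothetical sample data.
df = pd.DataFrame({"a": [1.0, 0.5], "b": [1.0, 1.0], "c": [2.0, 3.0]})
result = drop_until_beta_ok(df)
```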

How to fit a power law to the dataframe and plot it?

matplotlib pandas python scipy

I have two columns (rcs, range) in a dataframe.

rcs    range
-40    12.9
-35    14.9
-30    22.9
-25    35.44
-20    43.48
-15    62.4
-10    92.4
-5     132.99
0      182.6
5      252.99

I want to plot a curve with the equation rcs = range^4. I tried the following: 1. as a polynomial curve fitting; in the above plot, the curve is not a smooth curve and
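One possible way to fit a power law to these values, sketched under a loud assumption: that rcs is expressed in decibels, so a power law in linear units becomes a straight line against log10(range), i.e. rcs = 10 * n * log10(range) + c. The excerpt does not state the units, and the variable names below are illustrative.

```python
import numpy as np

# Values copied from the question.
rcs = np.array([-40, -35, -30, -25, -20, -15, -10, -5, 0, 5], dtype=float)
rng = np.array([12.9, 14.9, 22.9, 35.44, 43.48, 62.4, 92.4, 132.99, 182.6, 252.99])

# Assumption: rcs is in dB, so fit a straight line in log10(range).
slope, intercept = np.polyfit(np.log10(rng), rcs, 1)
exponent = slope / 10.0  # implied power-law exponent under the dB assumption

# Evaluate the fit on a dense, log-spaced grid so the plotted curve is smooth.
x_smooth = np.logspace(np.log10(rng.min()), np.log10(rng.max()), 200)
rcs_fit = slope * np.log10(x_smooth) + intercept
```

Plotting `x_smooth` against `rcs_fit` with matplotlib (for example `plt.plot(x_smooth, rcs_fit)`) gives a smooth curve because the fit is evaluated on 200 points rather than only on the ten measurements.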

Add column with a specific sequence of numbers depending on value

dataframe pandas python running-count

I have this dataframe: I want to add a new column Sequence with a sequence of numbers. The condition is that once the first True appears in the Condition column, the following rows must contain the sequence 1, 2, 3, 1, 2, 3… until another True appears, at which point the sequence restarts. Fu…
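A minimal sketch of the restart-on-True counter using a cumulative sum to label blocks and a per-block running count. Two details are not visible in the truncated excerpt and are assumed here: the row holding True itself receives 1, and rows before the first True receive 0. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"Condition": [False, True, False, False, False, False, True, False]}
)

# Block id: 0 before the first True, then incremented at every True.
block = df["Condition"].cumsum()

# Position inside each block, starting at 0.
pos = df.groupby(block).cumcount()

# Cycle 1, 2, 3 within a block; assumption: rows before the first True get 0.
df["Sequence"] = (pos % 3 + 1).where(block > 0, 0)
```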

Pandas array filter NaN and keep the first value in group

pandas python

I have the following pandas dataframe. It contains many NaN values (I skipped most of them to make it look shorter). I would like to filter out all the NaN values and keep only the first value of each group of non-NaN values (e.g. from index 27-29 there are three values, I would like to keep
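The filtering described above can be sketched by keeping a value only when it is not NaN and the value before it is NaN (or it is the first row). This assumes, following the title, that the first value of each run is the one to keep; the sample Series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two runs of non-NaN values.
s = pd.Series([np.nan, 1.0, 2.0, 3.0, np.nan, np.nan, 5.0, 6.0])

# A value starts a run if it is present and its predecessor is missing.
is_first = s.notna() & s.shift().isna()

first_values = s[is_first]
```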

Days between dates into minimum non-date measurement

pandas python

I have a column that represents the number of days from an event until today. I am trying to figure out a way to represent this as a string such that it shows the rounded number of days / weeks / months / years. However, I would like it to choose "D"/"W"/"M"/"Y"…
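A rough sketch of one way to pick the unit: use the largest unit whose size fits into the day count at least once. The rule for choosing the unit is cut off in the excerpt, so this rule, the unit lengths (7, 30 and 365 days) and the function name `humanize_days` are all assumptions.

```python
def humanize_days(days):
    """Format a day count as a rounded number plus a D/W/M/Y suffix."""
    # Assumed unit lengths in days; months and years are approximations.
    units = [("Y", 365), ("M", 30), ("W", 7), ("D", 1)]
    for label, size in units:
        if days >= size:
            return f"{round(days / size)}{label}"
    return "0D"  # fewer than one day
```

On a pandas column, this could be applied with something like `df["age"].apply(humanize_days)`, where the column name `age` is hypothetical.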

Updating values within python column based on date

numpy pandas python

I have a dataset where I would like to replace and update values within a column when a date condition is met. Data Desired Doing Still researching, any suggestion is appreciated. Perhaps I need to convert quarters to datetime longdate and base the condition off of this column. Answer: Here is one way to do it…