I exported the following dataframe in Google Colab. Whichever method I use, when I import it later, my dataframe appears as pandas.core.series.Series, not as an array. After importing, the dataframe looks like the one below. Note: The first and second images can be in a different order in terms of numbers (It can be…
Tag: pandas
select non-NaN rows with multiple conditions from a pandas dataframe
Assume there is a dataframe. I would like to select non-NaN rows based on multiple conditions: (1) col1 < 4 and (2) non-NaN in col2. The following is my code, but I have no idea why I did not get the first two rows. Any idea? Thanks. Answer: Because of the operator precedence (bitwise operators, e…
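A minimal sketch of the precedence issue the answer refers to. The dataframe contents are hypothetical, since the original data is not shown; only the column names col1 and col2 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the original dataframe is not shown in the excerpt.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": [10.0, 20.0, np.nan, 40.0, np.nan],
})

# Wrap each condition in parentheses. The bitwise & binds more tightly
# than <, so `df["col1"] < 4 & df["col2"].notna()` would be parsed as
# `df["col1"] < (4 & df["col2"].notna())`, which is not the intended filter.
selected = df[(df["col1"] < 4) & (df["col2"].notna())]
print(selected)
```

With this sample data, only the rows where col1 is below 4 and col2 is present are kept.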
Merging segments from the same trips into a single trip for analysis
In the MWE below, I show my attempt to line-plot trips (from my df aggregated per month). I realised that in my df some trips contain jumps (maybe due to data logging), so they should be merged into a single trip before aggregation. In the df example given above (before grouping), user 154 undertakes 2 trips, not 3…
Manipulating DataFrame
I have the following dataframe df with 3 columns: Date, value and topic. I want to create a new dataframe df1 that is indexed by day, with one column per topic holding that topic’s value for each day. My problem is that I don’t know how to match the value to the topic per day.
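One way to get this shape is a pivot. The sketch below assumes each (Date, topic) pair occurs at most once; the sample rows are made up, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"]
    ),
    "value": [1.0, 2.0, 3.0, 4.0],
    "topic": ["a", "b", "a", "b"],
})

# One row per day, one column per topic, cells filled from "value".
df1 = df.pivot(index="Date", columns="topic", values="value")
print(df1)
```

If a day can hold several rows for the same topic, pivot raises an error; pivot_table with an explicit aggfunc (for example "sum" or "mean") is the usual alternative in that case.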
Python looping over a list to check if any of the list elements are equal to variable values in pandas dataframe
I have a pandas dataframe and I want to create a new dummy variable based on whether the values of a variable in my dataframe equal values in a list. How can I create a new dummy variable for the dataframe, called variable 3, that equals 1 if variable 2 is present in the list and 0 if not? I…
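No explicit loop is needed for this kind of membership test. A small sketch using Series.isin, with hypothetical column names (variable1, variable2, variable3) and made-up values:

```python
import pandas as pd

# Hypothetical data and list; the question does not show the real ones.
df = pd.DataFrame({
    "variable1": [1, 2, 3, 4],
    "variable2": ["x", "y", "z", "w"],
})
values = ["x", "z"]

# isin gives a boolean Series; casting to int yields the 1/0 dummy.
df["variable3"] = df["variable2"].isin(values).astype(int)
print(df)
```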
How to remove features from regression results using Bonferroni correction results?
I implemented a regression model. After fitting it, I ran a Bonferroni correction and got arrays of results. I want to use these arrays to remove the features in model_a that are False and create a new model ‘train_simplified’. I’m using the following manual a…
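The libraries used in the question are not shown, so the following is only a sketch under stated assumptions: the p-values are available as a Series indexed by feature name (p_values), the features live in a dataframe (X), and the names X_simplified and alpha are hypothetical. The Bonferroni rule itself is computed by hand as p < alpha / m. Refitting the model on the reduced feature set is left out.

```python
import pandas as pd

# Hypothetical feature matrix and per-feature p-values.
X = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3],
    "f2": [1.0, 0.5, 0.7],
    "f3": [3.0, 2.0, 1.0],
})
p_values = pd.Series({"f1": 0.001, "f2": 0.04, "f3": 0.01})

alpha = 0.05
# Bonferroni: compare each p-value against alpha divided by the number of tests.
reject = p_values < alpha / len(p_values)

# Keep only the features whose null hypothesis is rejected (mask is True).
kept = p_values.index[reject]
X_simplified = X[kept]
print(list(X_simplified.columns))
```

The boolean mask replaces any manual listing of column names, so the selection stays in sync with the correction results.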
Pandas apply function to each row by calculating multiple columns
I am stuck on an easy question, and my question title might be inappropriate. I want to calculate (df.amount * df.con)/df.groupby('name').agg({'amount':'sum'}).reset_index().loc(df.name==i).amount) (Sorry, this line will return an error, but what I want is to calculat…
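Reading the expression as "amount times con, divided by the total amount for that row's name", one possible approach is groupby with transform, which returns the group total aligned to every original row. The data and the result column name are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "name": ["a", "a", "b"],
    "amount": [10.0, 30.0, 20.0],
    "con": [1.0, 2.0, 3.0],
})

# transform("sum") broadcasts each group's total back onto its rows,
# so no per-row lookup or loop over names is needed.
group_total = df.groupby("name")["amount"].transform("sum")
df["result"] = df["amount"] * df["con"] / group_total
print(df)
```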
what is the best way to create running total columns in pandas
What is the most pandastic way to create running total columns at various levels (without iterating over the rows)? The test column can only contain X’s or NaNs. The number of consecutive X’s is random. In the ‘desired_output_level_1’ column, I am trying to count up the numbe…
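The desired output is cut off in the excerpt, so this sketch covers only one plausible reading: a counter that increases along each run of consecutive X's and resets at every NaN. The sample column and the name run_count are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input: runs of "X" separated by NaN.
df = pd.DataFrame({"test": ["X", "X", np.nan, "X", "X", "X", np.nan]})

is_x = df["test"].eq("X")

# Every non-X row starts a new group id, so each run of X's
# shares one id with the NaN that precedes it.
run_id = (~is_x).cumsum()

# Cumulative sum of the 0/1 flags within each group: counts up along
# a run of X's and is 0 on the NaN rows.
df["run_count"] = is_x.astype(int).groupby(run_id).cumsum()
print(df)
```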
how to drop rows with ‘nan’ in a column in a pandas dataframe?
I have a dataframe (denoted as ‘df’) where some values are missing in a column (denoted as ‘col1’). I applied the set function to find the unique values in the column. I am trying to drop these ‘nan’ rows from the dataframe, but after my attempt the column rows remain…
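The attempted code is not shown, so the cause here is a guess. Two common reasons for rows "remaining" are that the result of dropna was not assigned back, and that the column holds the literal string 'nan' rather than a real missing value. A sketch covering both, with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one real NaN and one literal string "nan".
df = pd.DataFrame({
    "col1": ["a", np.nan, "b", "nan"],
    "col2": [1, 2, 3, 4],
})

# dropna returns a new dataframe; the result must be assigned to take effect.
cleaned = df.dropna(subset=["col1"])

# A string "nan" is not a missing value, so it needs an explicit filter.
cleaned = cleaned[cleaned["col1"] != "nan"]
print(cleaned)
```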
adding legend to lineplot according to matplotlib’s axvspan
OK, I have this line plot of a data trend over this period. I want to add a legend entry for each coloured period, such that the green area from 2021-03 to 2021-06 is labelled spring, the blue area from 2021-06 to 2021-09 is labelled summer, and the magenta area from 2021-09 to 2021-12 is labelled winter. Answ…
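The answer is truncated, so the following is only one possible approach: pass a label to each axvspan call and let the legend pick the patches up. The plotted values are made up; the periods, colours and season names follow the question.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical monthly data for March to December 2021.
dates = [datetime(2021, month, 1) for month in range(3, 13)]
values = list(range(len(dates)))

fig, ax = plt.subplots()
ax.plot(dates, values, color="black", label="trend")

# (start, end, colour, legend label) for each shaded period.
periods = [
    (datetime(2021, 3, 1), datetime(2021, 6, 1), "green", "spring"),
    (datetime(2021, 6, 1), datetime(2021, 9, 1), "blue", "summer"),
    (datetime(2021, 9, 1), datetime(2021, 12, 1), "magenta", "winter"),
]
for start, end, colour, name in periods:
    # The label on each span is what makes it appear in the legend.
    ax.axvspan(start, end, color=colour, alpha=0.3, label=name)

handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, labels)
fig.savefig("trend_with_seasons.png")
```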