Tag: pandas

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

I have this DataFrame and want only the records whose EPS column is not NaN: …i.e. something like df.drop(….) to get this resulting DataFrame: How do I do that? Answer: Don’t drop, just take the rows where EPS is not NA:
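
The frame from the question is not shown in this excerpt, so the following is only a minimal sketch on made-up data. Only the EPS column name comes from the question; the `ticker` column and the values are assumptions. It shows the "take the rows where EPS is not NA" idea next to the equivalent `dropna` call.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the "EPS" column name is taken from the question.
df = pd.DataFrame(
    {"ticker": ["a", "b", "c", "d"], "EPS": [np.nan, 1.2, np.nan, 0.4]}
)

# Boolean mask: keep the rows where EPS is present.
kept = df[df["EPS"].notna()]

# Equivalent: drop rows that have NaN in the EPS column specifically.
same = df.dropna(subset=["EPS"])
```

Both forms return a new DataFrame and leave `df` unchanged.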

T-test in Pandas

hypothesis-test pandas python scipy statistics

If I want to calculate the mean of two categories in Pandas, I can do it like this: I have a lot of data formatted this way, and now I need to do a T-test to see whether the means of cat1 and cat2 are statistically different. How can I do that? Answer: It depends what sort of t-test you
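
Since the answer is cut off, here is one possible sketch rather than the answer's own method. It assumes the two categories are independent samples and uses Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`); a paired design would call for `scipy.stats.ttest_rel` instead. The column names `category` and `value` and the numbers are made up.

```python
import pandas as pd
from scipy import stats

# Hypothetical long-format data with one row per observation.
df = pd.DataFrame(
    {
        "category": ["cat1"] * 5 + ["cat2"] * 5,
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    }
)

# Per-category means, as in the question.
means = df.groupby("category")["value"].mean()

# Split the values by category.
cat1 = df.loc[df["category"] == "cat1", "value"]
cat2 = df.loc[df["category"] == "cat2", "value"]

# Welch's t-test: independent samples, equal variances NOT assumed.
result = stats.ttest_ind(cat1, cat2, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue
```

A small `p_value` suggests the two means differ; which threshold to use depends on the analysis.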

pandas pivot dataframe to 3d data

pandas python

There seem to be a lot of possibilities to pivot flat table data into a 3d array but I’m somehow not finding one that works: Suppose I have some data with columns=[‘name’, ‘type’, ‘date’, ‘value’]. When I try to pivot via I get Am I reading docs from dev p…
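
The pivot call and the error from the question are missing here, so this is a sketch of one alternative, not a fix for that call. It assumes each (name, type, date) combination occurs at most once, and it builds a NumPy array with one axis per key column. The sample rows are made up; only the four column names come from the question.

```python
import itertools

import pandas as pd

# Hypothetical flat table with the columns named in the question.
keys = list(
    itertools.product(["a", "b"], ["x", "y"], ["2020-01-01", "2020-01-02"])
)
df = pd.DataFrame(keys, columns=["name", "type", "date"])
df["value"] = range(len(df))
df = df.iloc[:-1]  # drop one combination to show how gaps are handled

# Index the values by the three key columns.
cube = df.set_index(["name", "type", "date"])["value"]

# Reindex onto the full cartesian product so missing cells become NaN.
levels = cube.index.levels
full = cube.reindex(pd.MultiIndex.from_product(levels, names=cube.index.names))

# Reshape to (n_names, n_types, n_dates); the axes follow the sorted levels.
arr = full.to_numpy().reshape([len(level) for level in levels])
```

Here `arr[i, j, k]` is the value for the i-th name, j-th type and k-th date in sorted order.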

How to add a new column to an existing DataFrame?

chained-assignment dataframe pandas python

I have the following indexed DataFrame with named columns and non-continuous row numbers: I would like to add a new column, ‘e’, to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame). How can I add colum…
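
A minimal sketch of one way to do this, on made-up data with a non-continuous index. It uses `DataFrame.assign`, which returns a new frame and leaves the original untouched; plain `df["e"] = ...` would modify `df` in place. The column `a`, the index labels and the values are assumptions.

```python
import pandas as pd

# Hypothetical frame whose row labels are not 0, 1, 2, ...
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[10, 11, 15])
new_values = [0.5, 1.5, 2.5]

# Give the new Series the frame's own index so values line up row by row.
out = df.assign(e=pd.Series(new_values, index=df.index))

# Pitfall: a Series with the default 0..n-1 index is aligned by label,
# so none of its labels match here and the new column is all NaN.
misaligned = df.assign(e=pd.Series(new_values))
```

Passing a plain list or NumPy array instead of a Series is matched by position, which also avoids the alignment problem.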

Remove duplicates by column A, keeping the row with the highest value in column B

duplicates pandas python

I have a dataframe with repeat values in column A. I want to drop duplicates, keeping the row with the highest value in column B. So this: Should turn into this: I’m guessing there’s probably an easy way to do this—maybe as easy as sorting the DataFrame before dropping duplicates—but I don’t…
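
The sorting idea mentioned in the question can be sketched as follows. The data are made up; only the column names A and B come from the question. A `groupby` with `idxmax` is shown as an alternative that assumes unique index labels.

```python
import pandas as pd

# Hypothetical data with repeated values in column A.
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})

# Sort so the largest B comes first, keep the first row per A,
# then restore the original row order.
out = (
    df.sort_values("B", ascending=False)
    .drop_duplicates(subset="A")
    .sort_index()
)

# Alternative: select the row label of the maximum B within each A.
alt = df.loc[df.groupby("A")["B"].idxmax()]
```

If several rows tie for the highest B within one A, both approaches keep a single row, but not necessarily the same one.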

Pandas: create two new columns in a dataframe with values calculated from a pre-existing column

pandas python

I am working with the pandas library and I want to add two new columns to a dataframe df with n columns (n > 0). These new columns result from the application of a function to one of the columns in the dataframe. The function to apply is like: One method for creating a new column for a function returning
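
The function from the question is not included in this excerpt, so the sketch below uses a hypothetical `calculate` that returns a pair of values. It shows one common idiom: map the function over the column, then unzip the resulting pairs into two new columns. The column names are assumptions.

```python
import pandas as pd


def calculate(x):
    """Hypothetical stand-in for the question's function: returns two values."""
    return x * 2, x * 3


df = pd.DataFrame({"a": [1, 2, 3]})

# map() yields one (first, second) tuple per row; zip(*...) transposes
# those tuples into two sequences, one per new column.
df["a_times_2"], df["a_times_3"] = zip(*df["a"].map(calculate))
```

This unpacking assumes the frame has at least one row; on an empty column there is nothing to unzip.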

pandas: filter rows of DataFrame with operator chaining

dataframe pandas python

Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc.), but the only way I’ve found to filter rows is via normal bracket indexing. This is unappealing as it requires that I assign df to a variable before being able to filter on its values. Is there something more…
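
As a sketch of two chain-friendly options: `.loc` accepts a callable that receives the frame at that point in the chain, and `DataFrame.query` takes a filter expression as a string. The frame and the column names `a` and `b` are made up.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

out = (
    df
    # The lambda receives the intermediate frame, so no named variable
    # is needed to build the boolean mask.
    .loc[lambda d: d["a"] > 1]
    # query() evaluates the expression against the frame's columns.
    .query("b < 6")
)
```

Either step can sit anywhere in a longer chain, for example after a `groupby(...).agg(...)`.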

Create a Pandas Dataframe by appending one row at a time

append dataframe pandas python

How do I create an empty DataFrame, then add rows, one by one? I created an empty DataFrame: Then I can add a new row at the end and fill a single field with: It works for only one field at a time. What is a better way to add a new row to df? Answer: You can use df.loc[i], where
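
The rest of the answer is cut off, so this is only a sketch of the `df.loc[i]` idea it mentions, with made-up column names and values. It also shows an alternative that is usually faster for many rows: collect the rows in a list and build the DataFrame once.

```python
import pandas as pd

# Hypothetical column names.
columns = ["name", "x", "y"]

# Row-by-row: assigning to a new label with .loc appends a full row.
df = pd.DataFrame(columns=columns)
for i in range(3):
    df.loc[i] = [f"name{i}", i, i * 2]

# Alternative: accumulate plain rows, then construct the frame in one go.
rows = [[f"name{i}", i, i * 2] for i in range(3)]
built = pd.DataFrame(rows, columns=columns)
```

Growing a frame one row at a time copies data repeatedly, so the list-based form tends to scale better.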

How to add hovering annotations to a plot

matplotlib mplcursors pandas python seaborn

I am using matplotlib to make scatter plots. Each point on the scatter plot is associated with a named object. I would like to be able to see the name of an object when I hover my cursor over the point on the scatter plot associated with that object. In particular, it would be nice to be able to quickly
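
One way to approach this with plain matplotlib is sketched below. It is not taken from an answer on this page. It keeps a single hidden annotation and, on every mouse-move event, checks whether the cursor is over a point and shows that point's name. The names and coordinates are made up, and `on_hover` is a hypothetical helper.

```python
import matplotlib.pyplot as plt

# Hypothetical named points.
names = ["alpha", "beta", "gamma"]
xs = [1.0, 2.0, 3.0]
ys = [1.0, 4.0, 9.0]

fig, ax = plt.subplots()
scatter = ax.scatter(xs, ys)

# One reusable annotation, hidden until the cursor is over a point.
annot = ax.annotate(
    "",
    xy=(0, 0),
    xytext=(10, 10),
    textcoords="offset points",
    bbox=dict(boxstyle="round", fc="w"),
)
annot.set_visible(False)


def on_hover(event):
    """Show the name of the point under the cursor, or hide the label."""
    if event.inaxes is not ax:
        return
    hit, info = scatter.contains(event)
    if hit:
        idx = info["ind"][0]  # first point under the cursor
        annot.xy = scatter.get_offsets()[idx]
        annot.set_text(names[idx])
        annot.set_visible(True)
    else:
        annot.set_visible(False)
    fig.canvas.draw_idle()


# Run the handler on every mouse movement over the figure.
fig.canvas.mpl_connect("motion_notify_event", on_hover)
```

Call `plt.show()` with an interactive backend to try it. The mplcursors package listed in the tags wraps this kind of logic behind a higher-level interface.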