Tag: statistics

Constructing a co-occurrence matrix in python pandas

I know how to do this in R, but is there a function in pandas that transforms a dataframe into an n×n co-occurrence matrix containing the counts of two aspects co-occurring? For example, a matrix df: would yield: Since the matrix is mirrored on the diagonal, I guess there would be a way to optimize the code. Answe…
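The excerpt does not show the input data, so the following is only a minimal sketch under an assumption: the dataframe holds 0/1 indicators, with one row per observation and one column per aspect. The column names `a`, `b`, `c` and the values are hypothetical. Under that assumption, multiplying the transposed frame by itself gives the pairwise counts.

```python
import pandas as pd

# Hypothetical indicator data (not from the original post):
# rows are observations, columns are aspects, 1 = aspect present.
df = pd.DataFrame(
    {
        "a": [1, 0, 1, 1],
        "b": [1, 1, 0, 1],
        "c": [0, 1, 0, 1],
    }
)

# For 0/1 data, the (i, j) entry of X^T X is the number of rows
# in which both column i and column j are 1.
cooc = df.T.dot(df)

print(cooc)
```

The result is symmetric, and the diagonal holds how often each aspect occurs on its own, so only the upper triangle carries distinct pair counts.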

Boxplots in matplotlib: Markers and outliers

I have some questions about boxplots in matplotlib: Question A. What do the markers that I highlighted below with Q1, Q2, and Q3 represent? I believe Q1 is maximum and Q3 are outliers, but what is Q2? Question B. How does matplotlib identify outliers? (i.e. how does it know that they are…
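For Question B, a rough sketch of the rule matplotlib applies with its default `whis=1.5` setting: points lying more than 1.5 times the interquartile range below the first quartile or above the third quartile are drawn as fliers. The sample data here is made up for illustration, and the computation uses NumPy directly rather than matplotlib internals.

```python
import numpy as np

# Hypothetical sample (not from the original post).
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles and interquartile range.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (the default whis=1.5 rule).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are treated as outliers (fliers).
outliers = data[(data < lower_fence) | (data > upper_fence)]

# Whiskers end at the most extreme data points still inside the fences.
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(outliers, whisker_low, whisker_high)
```

Passing a different `whis` value to `boxplot` changes the multiplier, so the set of points flagged as outliers depends on that choice.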

T-test in Pandas

If I want to calculate the mean of two categories in Pandas, I can do it like this: I have a lot of data formatted this way, and now I need to do a t-test to see if the means of cat1 and cat2 are statistically different. How can I do that? Answer: it depends what sort of t-test you…
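Since the answer is cut off, the following is only one possible reading: an independent two-sample test using `scipy.stats.ttest_ind`, with `equal_var=False` selecting Welch's variant. The column names `category` and `value` and the numbers are hypothetical; only the labels `cat1` and `cat2` come from the question. A paired design would call for a different test.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical long-format data (not from the original post).
df = pd.DataFrame(
    {
        "category": ["cat1"] * 5 + ["cat2"] * 5,
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0, 10.0],
    }
)

# Split the values by category.
cat1 = df.loc[df["category"] == "cat1", "value"]
cat2 = df.loc[df["category"] == "cat2", "value"]

# Welch's t-test: does not assume the two groups share a variance.
t_stat, p_value = ttest_ind(cat1, cat2, equal_var=False)

print(t_stat, p_value)
```

A small p-value would suggest the two group means differ; the threshold for "small" is a choice the analyst has to make.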