
Tag: statistics

Pandas: sum of counts per percentile of rows

Here is a link to a working example on Google Colaboratory. I have a dataset that represents the reviews (ratings from 0.0 to 10.0) that users have left on various books. It looks like this: The first rows have 1 review while the last ones have thousands. I want to see the distribution of the reviews across the user population. I
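A minimal sketch of one way to get the sum of review counts per percentile of users. It assumes the reviews are in long format (one row per review) with hypothetical columns `user_id` and `rating`; the real dataset's layout is not shown. Users are ordered from lightest to heaviest reviewer, placed on a percentile scale, bucketed with `pd.cut`, and the counts in each bucket are summed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per review (the real column names may differ).
reviews = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "rating":  [7.0, 8.5, 6.0, 9.0, 5.5, 7.5, 10.0, 3.0, 8.0, 6.5],
})

# Number of reviews per user, lightest reviewers first.
counts = reviews.groupby("user_id").size().sort_values()

# Position of each user in the population, as a percentile in (0, 100].
user_pct = pd.Series(
    np.arange(1, len(counts) + 1) / len(counts) * 100, index=counts.index
)

# Bucket users into quartiles; on real data a finer grid such as
# range(0, 101, 10) (deciles) is probably more informative.
bucket = pd.cut(user_pct, bins=[0, 25, 50, 75, 100])

# Sum of review counts per percentile bucket, and each bucket's share of all reviews.
per_bucket = counts.groupby(bucket, observed=False).sum()
share = per_bucket / per_bucket.sum()
print(share)
```

Users with the same review count are ordered arbitrarily by `sort_values`, so on real data a few users near a bucket boundary may land on either side of it. Taking `share.cumsum()` gives a Lorenz-curve style view of how concentrated the reviews are.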

How can I find dates with incorrect syntax and fix them?

I am new to Python. I converted my dataset to a DataFrame, and all my dates are now stored as objects. I need to convert them to dates in order to find the age of patients. The DataFrame is 3400×14, and some of the date values have incorrect syntax, but I cannot find them. Is there a way to find them?
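A minimal sketch, assuming the dates are strings in a single column (the column names `patient_id` and `birth_date` and the expected format `%Y-%m-%d` are hypothetical). `pd.to_datetime(..., errors="coerce")` turns every value it cannot parse into `NaT` instead of raising, so the malformed rows are the ones that had a value but came out as `NaT`.

```python
import pandas as pd

# Hypothetical data: a date column stored as strings (dtype object), some malformed.
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "birth_date": ["1980-05-17", "1975-13-02", "1990/07/01", "not a date"],
})

# First pass with the expected format; unparseable strings become NaT.
parsed = pd.to_datetime(df["birth_date"], format="%Y-%m-%d", errors="coerce")

# Offending rows: a value is present in the raw data but NaT after parsing.
bad = df[parsed.isna() & df["birth_date"].notna()]
print(bad)

# One way to fix some of them: try a second known format and fill only the gaps.
alt = pd.to_datetime(df["birth_date"], format="%Y/%m/%d", errors="coerce")
parsed = parsed.fillna(alt)

# Whatever is still NaT needs manual correction (e.g. month 13, free text).
still_bad = df[parsed.isna() & df["birth_date"].notna()]
print(still_bad)

# Approximate age in years at a fixed reference date (NaN where the date is missing).
age = (pd.Timestamp("2024-01-01") - parsed).dt.days / 365.25
```

The reference date and the `365.25` divisor are illustrative choices; for exact ages in whole years, compare year, month, and day explicitly instead.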

Simulating expectation of continuous random variable

I want to generate samples to estimate the expectation and variance of a random variable with probability density function f(x) = {2x, 0 <= x <= 1; 0 otherwise}. I already found that E(X) = 2/3 and Var(X) = 1/18 (my detailed solution is from here). But here is what I get when simulating in Python: What am I doing
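The analytic values check out: E(X) = ∫₀¹ x·2x dx = 2/3, E(X²) = ∫₀¹ 2x³ dx = 1/2, so Var(X) = 1/2 − (2/3)² = 1/18 ≈ 0.0556. One standard way to simulate X is inverse-transform sampling: the CDF is F(x) = x² on [0, 1], so X = √U for U uniform on [0, 1]. The sketch below shows this technique; it is not a reconstruction of the question's original code, which is not shown. The seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000_000

# Inverse-transform sampling: F(x) = x**2 on [0, 1], so F^{-1}(u) = sqrt(u).
u = rng.uniform(0.0, 1.0, size=n)
x = np.sqrt(u)

mean_est = x.mean()  # should be close to 2/3
var_est = x.var()    # should be close to 1/18
print(mean_est, var_est)
```

As a cross-check, `rng.triangular(0.0, 1.0, 1.0, size=n)` draws from the triangular distribution with mode 1 on [0, 1], whose density is also 2x.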

Processing multiple modes in pandas

I’m obviously dealing with slightly more complex and realistic data, but to showcase my trouble, let’s assume we have these data: I want to find modal values of purchases by date: agg_mode will show that for user_id 100 we have two modal values: [cookies, jam]. This is totally fine with me, when it comes to real data we’ve come up
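A minimal sketch of one way to keep every modal value per group and then process the ties, assuming a hypothetical purchase log with columns `user_id` and `product` (the question's actual data is not shown; grouping by `["user_id", "date"]` works the same way if there is a date column). `Series.mode` returns all tied values in sorted order, so wrapping it in a list gives a consistent shape whether there is one mode or several.

```python
import pandas as pd

# Hypothetical purchase log.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 100, 200, 200, 200],
    "product": ["cookies", "jam", "cookies", "jam", "bread", "bread", "milk"],
})

# Always return a list, so single and multiple modes have the same type.
agg_mode = df.groupby("user_id")["product"].agg(lambda s: s.mode().tolist())
print(agg_mode)

# Option 1: one row per (user, modal value), convenient for counting or joins.
exploded = agg_mode.explode()

# Option 2: collapse ties with a deterministic rule (here: first in sorted order).
first_mode = agg_mode.str.get(0)
```

Which option fits depends on the downstream analysis: exploding keeps all ties visible, while picking the first value is simple but hides the fact that a tie existed.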