# Tag: statistics

## Pandas sum of count per percentile of rows

Here is a link to a working example on Google Colaboratory. I have a dataset that represents the reviews (between 0.0 and 10.0) that users have left on various books. It looks like this: The first rows have 1 review while the last ones have thousands. I want to see the distribution of the reviews across the user population. I

## How to sample data points for two variables that have the highest (close to +1) or lowest (close to zero) correlation coefficient?

Let’s assume that we have N data points (N=212 in this case) for both variables A and B. I have to sample n data points (n=50 in this case) from A and B such that, within that sample, A and B have either the highest possible positive correlation coefficient or the lowest one (close to zero).

## Show Cancer Specific Survival at exact time (Kaplan Meier in Lifelines)

shows me Cancer Specific Survival (CSS) of my cohort at different times (0, 4, 6…128 months). How can CSS be shown at exactly 120 months? Answer: The survival_function_at_times() method will get you that value. Here is an example with a sample dataset:
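To make the idea of reading the curve at an exact time concrete, here is a minimal pure-Python sketch that does not use lifelines. The Kaplan-Meier estimate is a step function, so its value at 120 months is the value at the last event time at or before 120. The durations, event flags, and the helper names `kaplan_meier` and `survival_at` are hypothetical and only serve the illustration.

```python
def kaplan_meier(durations, events):
    """Return the Kaplan-Meier steps as a list of (time, survival) pairs.

    events[i] is 1 if the event was observed at durations[i], 0 if censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            steps.append((t, survival))
        n_at_risk -= leaving
    return steps


def survival_at(steps, t):
    """Read the step function at time t (1.0 before the first event)."""
    value = 1.0
    for time, surv in steps:
        if time > t:
            break
        value = surv
    return value


# Hypothetical cohort: follow-up in months and event indicator.
durations = [4, 6, 6, 30, 60, 100, 119, 121, 128, 128]
events = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]

steps = kaplan_meier(durations, events)
css_120 = survival_at(steps, 120)
print(f"Estimated survival at 120 months: {css_120:.4f}")
```

No subject has an event at exactly 120 months here, so the value reported is the one set by the event at 119 months.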

## How can I find a date with incorrect syntax and fix it?

I am new to Python. I have a dataset that I converted to a DataFrame. All my dates are objects now. I need to convert them into dates in order to find the age of patients. My DataFrame's dimensions are 3400×14. Some of the date values have incorrect syntax, and I cannot find them. Is there a way to find them?
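One way to locate such values is to try parsing each one and collect those that fail. The sketch below is a minimal standard-library version under stated assumptions: the dates are expected in `YYYY-MM-DD` form, the sample values are made up, and `find_bad_dates` is a hypothetical helper name.

```python
from datetime import datetime


def find_bad_dates(values, fmt="%Y-%m-%d"):
    """Return (position, value) pairs for entries that do not parse with fmt."""
    bad = []
    for idx, value in enumerate(values):
        try:
            datetime.strptime(value, fmt)
        except (ValueError, TypeError):
            # ValueError: wrong layout or impossible date; TypeError: not a string.
            bad.append((idx, value))
    return bad


# Hypothetical column of date strings; the expected format is an assumption.
raw_dates = ["1980-05-17", "1975-13-02", "19/07/1990", "2001-02-29", "1962-11-30"]

bad_rows = find_bad_dates(raw_dates)
for idx, value in bad_rows:
    print(f"row {idx}: {value!r} is not a valid date")
```

In pandas, `pd.to_datetime(column, errors="coerce")` gives a similar result by turning unparseable values into `NaT`, which can then be selected with `isna()`.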

## How can I find the mode (a number) of a KDE histogram in Python?

I want to determine the X value that has the highest peak in the histogram. The code to print the histogram: Histogram and value wanted (in fact, I would like all 4): Answer: You will need to retrieve the underlying x and y data for your lines using matplotlib methods. If you are using displot, as in your excerpt, then

## Simulating expectation of continuous random variable

I want to generate some samples of a random variable to estimate its expectation and variance. Given the probability density function: f(x) = {2x, 0 <= x <= 1; 0 otherwise} I already found that E(X) = 2/3, Var(X) = 1/18; my detailed solution is here: https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable But here is what I have when simulating using Python: What am I doing
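One standard way to draw from this density is inverse transform sampling, which may or may not be what the question's own code attempted. The CDF is F(x) = x² on [0, 1], so if U is uniform on [0, 1) then X = sqrt(U) has density 2x. A minimal sketch, with the sample size and seed chosen arbitrarily:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n_samples = 100_000

# Inverse transform: F(x) = x**2 on [0, 1], hence X = sqrt(U) for uniform U.
samples = [random.random() ** 0.5 for _ in range(n_samples)]

sample_mean = sum(samples) / n_samples
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples

print(f"mean     ~ {sample_mean:.4f} (theory: {2 / 3:.4f})")
print(f"variance ~ {sample_var:.4f} (theory: {1 / 18:.4f})")
```

With this many samples, both estimates should land close to the analytical values 2/3 and 1/18.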

## Processing multiple modes in pandas

I’m obviously dealing with slightly more complex and realistic data, but to showcase my trouble, let’s assume we have these data: I want to find modal values of purchases by date: agg_mode will show that for user_id 100 we have two modal values: [cookies, jam]. This is totally fine with me, when it comes to real data we’ve come up
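As a minimal illustration of how ties produce several modal values, the sketch below groups purchases per user and uses `statistics.multimode` from the standard library (Python 3.8+) rather than pandas. The purchase records are hypothetical and only mirror the `[cookies, jam]` tie described above.

```python
from collections import defaultdict
from statistics import multimode

# Hypothetical (user_id, item) purchase records.
purchases = [
    (100, "cookies"),
    (100, "jam"),
    (100, "cookies"),
    (100, "jam"),
    (101, "rice"),
    (101, "rice"),
    (101, "cookies"),
]

# Collect each user's purchases in order of appearance.
by_user = defaultdict(list)
for user_id, item in purchases:
    by_user[user_id].append(item)

# multimode returns every value tied for the highest count.
modes = {user_id: multimode(items) for user_id, items in by_user.items()}
print(modes)
```

Each user maps to a list, so a downstream step still has to decide how to handle lists with more than one entry.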

## How do I distribute a value between numbers in a list

I am creating a biased dice-rolling simulator. I want the user to: 1. Input the number whose probability they would like to change. 2. Input the probability (in decimal form). Then I would like my program to distribute the remainder between the other values. This is my first post, so let me know if any other info is needed. My
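A minimal sketch of one way to do this, assuming the remainder is split equally among the other faces (the question does not say how it should be split). The helper name `biased_weights` and the example inputs are hypothetical.

```python
import random


def biased_weights(face, prob, sides=6):
    """Give `face` probability `prob` and share the rest equally among other faces."""
    if not 1 <= face <= sides:
        raise ValueError("face must be between 1 and sides")
    if not 0 <= prob <= 1:
        raise ValueError("prob must be between 0 and 1")
    remainder = (1 - prob) / (sides - 1)
    return [prob if value == face else remainder for value in range(1, sides + 1)]


# Example: make a 6 come up half of the time.
weights = biased_weights(face=6, prob=0.5)
print(weights)

# random.choices draws with replacement according to the weights.
rolls = random.choices(range(1, 7), weights=weights, k=10)
print(rolls)
```

In a real program, `face` and `prob` would come from `input()` and be converted with `int()` and `float()` before the call.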

## How to generate random values for a predefined function?

I have a predefined function, for example this: How can I generate random values against it so I can plot the results of the function using matplotlib? Answer: If you want to plot, don’t use random x values but rather a range. Also, you should use numpy.exp, which can take a vector as input, and your y in the lambda

## Create a for loop of Wilcoxon rank sum tests in Python to generate a list of p-values?

I have a dataframe that follows this format: It is much larger (it has about 1000 genes, i.e., columns). Each number corresponds to an mRNA abundance value. I need to compare AC and SCC subtypes for each gene using the Wilcoxon rank sum test. I need to do this for every gene in my dataset, so I essentially need to