Let’s assume that we have N (N=212 in this case) data points for each of two variables, A and B. I have to sample n (n=50 in this case) data points from A and B such that, for that sample, A and B have either the highest possible positive correlation coefficient or the lowest correlation coefficient (close to zero).

# Tag: statistics

## Show Cancer Specific Survival at exact time (Kaplan Meier in Lifelines)

… shows me the Cancer Specific Survival (CSS) of my cohort at different times (0, 4, 6…128 months). How can CSS be shown at exactly 120 months? Answer: The survival_function_at_times() method will get you that value. Here is an example with a sample dataset:
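The sample dataset from the answer is not reproduced in this excerpt. As a rough illustration of what querying a Kaplan-Meier curve at one exact time means, here is a minimal pure-Python sketch. It does not call lifelines. The durations, the event flags and the helper name `kaplan_meier_at` are all made up for the example. The estimate is a step function, so the value at 120 months is the product of the survival factors for every event time up to and including 120.

```python
def kaplan_meier_at(durations, events, t):
    """Kaplan-Meier survival estimate S(t) for a single time t.

    durations: follow-up time per patient (months)
    events:    1 if the event was observed, 0 if the patient was censored
    """
    survival = 1.0
    # Distinct times at which at least one event was observed, up to t
    event_times = sorted({d for d, e in zip(durations, events) if e and d <= t})
    for ti in event_times:
        at_risk = sum(1 for d in durations if d >= ti)
        observed = sum(1 for d, e in zip(durations, events) if d == ti and e)
        survival *= 1 - observed / at_risk
    return survival


# Hypothetical cohort: follow-up in months, 1 = cancer-specific death, 0 = censored
durations = [4, 6, 6, 50, 120, 128, 130]
events = [1, 1, 0, 1, 0, 1, 0]

css_at_120 = kaplan_meier_at(durations, events, 120)
print(f"CSS at 120 months: {css_at_120:.4f}")
```

With a fitted lifelines KaplanMeierFitter, the method named in the answer takes the query time (or a list of times) in the same way.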

## How can I find a date with incorrect syntax and fix it?

I am new to Python. I have a dataset that I converted to a DataFrame. All my dates are objects now. I need to convert them into dates in order to find the age of patients. The data has dimensions 3400×14. Some of the date values have incorrect syntax, and I cannot find them. Is there a way to find them?
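One possible approach, sketched here under assumptions: parse the column with pandas.to_datetime using errors="coerce", so that anything that does not match the expected format becomes NaT, then select those rows. The column name `birth_date`, the day/month/year format and the sample values are hypothetical, since the excerpt does not show the actual data.

```python
import pandas as pd

# Hypothetical data: the real column name and date format are not shown in the question
df = pd.DataFrame(
    {"birth_date": ["17/05/1980", "02/13/1975", "30-01-1990", "29/02/2001", "08/11/1964"]}
)

# Values that do not match the format (or are impossible dates) become NaT
parsed = pd.to_datetime(df["birth_date"], format="%d/%m/%Y", errors="coerce")

# Rows that had a value but failed to parse
bad_rows = df[parsed.isna() & df["birth_date"].notna()]
print(bad_rows)
```

Once the offending rows are listed, they can be corrected by hand or with a second parsing pass that uses a different format.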

## How can I find the mode (a number) of a KDE histogram in Python

I want to determine the X value that has the highest peak in the histogram. The code to print the histogram: Histogram and value wanted (in fact, I would like all 4): Answer: You will need to retrieve the underlying x and y data for your lines using matplotlib methods. If you are using displot, as in your excerpt, then …
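The answer's approach reads the x and y data back from the plotted lines. The sketch below swaps in a different technique to stay self-contained: it evaluates a Gaussian KDE directly with NumPy on a grid and takes the grid point with the highest density. The sample data, the bandwidth of 0.4 and the helper name `kde_on_grid` are assumptions, so the curve will not match a seaborn plot exactly. The same argmax step applies to x and y arrays taken from a plotted line.

```python
import numpy as np

# Hypothetical bimodal sample standing in for the data in the question
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0, 1, 2000), rng.normal(6, 1, 500)])


def kde_on_grid(data, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (len(data) * bandwidth * np.sqrt(2 * np.pi))


grid = np.linspace(samples.min(), samples.max(), 1000)
density = kde_on_grid(samples, grid, bandwidth=0.4)  # bandwidth is an assumed value

# Global mode: x position of the highest point of the curve
mode_x = grid[np.argmax(density)]

# All local peaks: interior points higher than both neighbours
is_peak = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
peak_x = grid[1:-1][is_peak]

print("mode:", mode_x)
print("local peaks:", peak_x)
```

When one figure holds several curves, repeating the argmax per curve gives one mode per curve.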

## Simulating expectation of continuous random variable

I want to generate samples of a random variable to estimate its expectation and variance. Given the probability density function f(x) = {2x, 0 <= x <= 1; 0 otherwise}, I already found that E(X) = 2/3 and Var(X) = 1/18; my detailed solution is here: https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable But here is what I get when simulating in Python: What am I doing …
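The asker's simulation code is not shown here, so the following is only one way to check the stated values, not a diagnosis of their attempt. It uses inverse transform sampling: the CDF of this density is F(x) = x^2 on [0, 1], so X = sqrt(U) with U uniform on [0, 1) has the required distribution. The sample size and seed are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_x(rng):
    # Inverse transform: F(x) = x**2 on [0, 1], so X = sqrt(U) for U ~ Uniform(0, 1)
    return rng.random() ** 0.5


n = 200_000
samples = [sample_x(rng) for _ in range(n)]

mean_estimate = sum(samples) / n
var_estimate = sum((x - mean_estimate) ** 2 for x in samples) / n

print(f"E(X)   estimate: {mean_estimate:.4f}  (exact 2/3  = {2 / 3:.4f})")
print(f"Var(X) estimate: {var_estimate:.4f}  (exact 1/18 = {1 / 18:.4f})")
```

A common pitfall with this kind of simulation is averaging 2x over uniform x, which estimates the integral of the density rather than E(X).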

## Processing multiple modes in pandas

I’m obviously dealing with slightly more complex and realistic data, but to showcase my trouble, let’s assume we have these data: I want to find modal values of purchases by date: agg_mode will show that for user_id 100 we have two modal values: [cookies, jam]. This is totally fine with me. When it comes to real data, we’ve come up …
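The original data and the body of agg_mode are not shown in this excerpt. The sketch below is one plausible reconstruction: a grouped aggregation that returns every modal value as a list, so ties are kept rather than dropped. The column names, the sample rows and grouping by `user_id` are assumptions.

```python
import pandas as pd

# Hypothetical purchases; the question's real data is not shown
df = pd.DataFrame(
    {
        "user_id": [100, 100, 100, 100, 101, 101, 101],
        "purchase": ["cookies", "jam", "cookies", "jam", "tea", "tea", "jam"],
    }
)


def agg_mode(values: pd.Series) -> list:
    # Series.mode() returns all tied modal values, sorted
    return values.mode().tolist()


modes = df.groupby("user_id")["purchase"].agg(agg_mode)
print(modes)
```

Returning a list keeps ties visible. Later steps then have to handle cells that hold more than one value.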

## How do I distribute a value between numbers in a list

I am creating a biased dice-rolling simulator. I want the user to: 1. Input the number whose probability they would like to change. 2. Input the probability (in decimal form). Then I would like my program to distribute the remainder between the other values. This is my first post; let me know if any other info is needed. My …
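A minimal sketch of the idea described above, assuming the remainder should be split equally among the other faces (the question does not say how to split it). The function name `biased_die_weights` and the six-sided default are illustrative choices.

```python
import random


def biased_die_weights(face, prob, sides=6):
    """Give `face` probability `prob`; split the remainder equally among the other faces."""
    if not 1 <= face <= sides:
        raise ValueError("face must be between 1 and the number of sides")
    if not 0 <= prob <= 1:
        raise ValueError("prob must be between 0 and 1")
    remainder = (1 - prob) / (sides - 1)
    return [prob if f == face else remainder for f in range(1, sides + 1)]


# Example: make a 6 come up half of the time
weights = biased_die_weights(face=6, prob=0.5)
rolls = random.choices(range(1, 7), weights=weights, k=10)

print(weights)
print(rolls)
```

In the simulator, the values for `face` and `prob` would come from `input()`, converted with `int()` and `float()`.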

## How to generate random values for a predefined function?

I have a predefined function, for example this: How can I generate random values against it so I can plot the results of the function using matplotlib? Answer: If you want to plot, don’t use random x values; use a range instead. Also, you should use numpy.exp, which can take a vector as input, and your y in the lambda …
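The asker's function is not included in the excerpt, so the sketch below uses a hypothetical stand-in (a logistic curve) purely to show the pattern from the answer: evaluate the function over an evenly spaced range with numpy.exp, which works element-wise on arrays. The plotting call is left as a comment so the snippet runs without a display.

```python
import numpy as np


def f(x):
    # Hypothetical stand-in; the function from the question is not shown
    return 1.0 / (1.0 + np.exp(-x))


# Evenly spaced x values instead of random ones
x = np.linspace(-5, 5, 201)
y = f(x)  # np.exp accepts the whole array at once

# To draw the curve:
# import matplotlib.pyplot as plt
# plt.plot(x, y)
# plt.show()
print(x[:3], y[:3])
```

An evenly spaced range gives a smooth line. Random x values would have to be sorted before plotting.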

## Two parameter non-linear function for modeling a 3-D surface

I’m interested in modeling this surface with a simple equation that takes two parameters (x, y) and produces a z value, ideally an equation that has a simple form. I have tried Monkey Saddle, …

## How to compare two columns and get the mean value of the 3rd column for all matching items in the two in a Python pandas DataFrame?

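I have the following table named Rides:

| start_id | end_id | eta |
|----------|--------|-----|
| A        | B      | 5   |
| B        | C      | 4   |
| A        | C      | 6   |
| A        | B      | 5   |
| B        | A      | 3   |
| C        | A      | 3   |
| B        | C      | 6   |
| C        | A      | 5   |
| A        | B      | 8   |

From the Rides table, I want to create a new table which should look like …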
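The desired output table is cut off in this excerpt. Going by the title, a reasonable reading is the mean eta for each distinct (start_id, end_id) pair, with direction kept (A to B is treated as different from B to A). Under that assumption, a minimal pandas sketch looks like this:

```python
import pandas as pd

# The Rides table from the question
rides = pd.DataFrame(
    {
        "start_id": ["A", "B", "A", "A", "B", "C", "B", "C", "A"],
        "end_id": ["B", "C", "C", "B", "A", "A", "C", "A", "B"],
        "eta": [5, 4, 6, 5, 3, 3, 6, 5, 8],
    }
)

# Mean eta for every distinct (start_id, end_id) pair
mean_eta = rides.groupby(["start_id", "end_id"], as_index=False)["eta"].mean()
print(mean_eta)
```

If A to B and B to A should count as the same route, the two id columns would need to be sorted row-wise before grouping.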