I have the following dataframe, and I want each NaN value to be filled with the conditional mean of the previous and next values in the same column. For example, the value 6 is the mean of 5 and 7. This is only a small part of my dataframe, so I need to replace all the NaN values. Answer EDIT: For replace
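The answer is cut off in this excerpt, so the following is only a minimal sketch of one way to do it, not the original solution. It assumes a single numeric column (here called `value`, a made-up name with made-up data) and takes "previous and next value" to mean the nearest non-NaN values on either side of each gap.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the original dataframe is not shown in the excerpt.
df = pd.DataFrame({"value": [5.0, np.nan, 7.0, np.nan, np.nan, 10.0]})

# ffill carries the previous valid value forward, bfill carries the next one back;
# averaging the two gives the mean of the neighbours around each gap.
neighbour_mean = (df["value"].ffill() + df["value"].bfill()) / 2

# Only the NaN positions are replaced; existing values are left untouched.
df["filled"] = df["value"].fillna(neighbour_mean)
```

A NaN at the very start or end of the column has only one neighbour, so it stays NaN in this sketch.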
Tag: dataframe
Concat string in column values where it is missing in Python
I have a dataframe, and I want to add the string chr to the values in column CHROM where it’s missing. I can do it in R with grepl and paste, but wanted to try it in Python. I came up with these two commands, but I’m not sure how to index the column because pd.Series is generating NaNs. Answer String operations in pandas are not optimized,
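Since the answer is truncated here, this is just a rough sketch of one possible approach. It assumes that `chr` belongs at the start of each value (as in `chr1`) and uses made-up sample values; only the column name CHROM comes from the question.

```python
import pandas as pd

# Hypothetical values; only the column name CHROM comes from the question.
df = pd.DataFrame({"CHROM": ["chr1", "2", "chrX", "MT"]})

chrom = df["CHROM"].astype(str)

# True where the value does not already start with "chr".
missing = ~chrom.str.startswith("chr")

# Keep values that already have the prefix, add it to the others.
df["CHROM"] = chrom.where(~missing, "chr" + chrom)
```

Working on the whole column with a boolean mask avoids building a separate, differently indexed Series, which is a common source of unexpected NaNs on assignment.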
Pandas Join Two Dataframes According to Range and Date
I have two dataframes like this: I want to bring the RATE values into the second df according to the DATE. In addition, the AMOUNT and DAY values for the relevant DATE must fall within the appropriate range (MAX_AMOUNT & MIN_AMOUNT, MAX_DAY & MIN_DAY). The desired output looks like this: Could you please help me with this? Answer Use merge first with
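The answer starts with a merge but is cut off, so the sketch below is only a guess at the general shape: merge on DATE, then keep the rows whose AMOUNT and DAY fall inside the ranges. The column names come from the question; the frame names `rates` and `deals` and all the values are invented for illustration.

```python
import pandas as pd

# Hypothetical rate table: one row per DATE and range bucket.
rates = pd.DataFrame({
    "DATE": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "MIN_AMOUNT": [0, 1000, 0],
    "MAX_AMOUNT": [999, 5000, 5000],
    "MIN_DAY": [1, 1, 1],
    "MAX_DAY": [30, 30, 30],
    "RATE": [0.10, 0.12, 0.15],
})

# Hypothetical second dataframe that should receive RATE.
deals = pd.DataFrame({
    "DATE": ["2021-01-01", "2021-01-02"],
    "AMOUNT": [1500, 200],
    "DAY": [10, 5],
})

# Step 1: pair every deal with every rate row of the same DATE.
merged = deals.merge(rates, on="DATE", how="left")

# Step 2: keep only the pairs where AMOUNT and DAY are inside the bucket.
in_range = (
    merged["AMOUNT"].between(merged["MIN_AMOUNT"], merged["MAX_AMOUNT"])
    & merged["DAY"].between(merged["MIN_DAY"], merged["MAX_DAY"])
)
result = merged.loc[in_range, ["DATE", "AMOUNT", "DAY", "RATE"]].reset_index(drop=True)
```

Rows with no matching bucket are dropped by the filter in this sketch; merging `result` back onto `deals` with `how="left"` would keep them with a missing RATE.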
replace whitespace with comma in multiline string (doc string), but keeping end-of-line
I have a multiline string (not a text file) like this: The whitespace between the columns is uneven. I want to replace the whitespace with a comma, but keep the end-of-line. So the result would look like this: …or alternatively as a pandas dataframe. What I have tried: I can use replace(”) with different spaces, but need to count the
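As a small sketch of one way to do this (the sample string is invented, since the original is not shown), each line can be handled separately so that the line breaks survive, with every run of spaces or tabs collapsed into a single comma:

```python
import re

# Hypothetical multiline string with uneven column spacing.
text = """name   age    city
Alice  30     Paris
Bob       25  Rome"""

# Process line by line so the end-of-line is preserved; within each line,
# replace every run of spaces/tabs with one comma.
csv_text = "\n".join(
    re.sub(r"[ \t]+", ",", line.strip()) for line in text.splitlines()
)
```

This assumes no value contains a space itself. For the dataframe variant, `pd.read_csv(io.StringIO(text), sep=r"\s+")` reads the same kind of string directly.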
How to check if a row of a Pandas dataframe has a cell with a specific value and if it does modify the last cell?
I have a dataframe df:

name     age_5_9          age_10_14        age_15_19
Alice    no bones broken  no bones broken  broke 1 bone
Bob      no bones broken  broke 2 bones    no bones broken
Charles  no bones broken  no bones broken  no bones broken

I would like to create a column broke_a_bone that is 1 when any of the rows has a value ‘broke 1
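The condition is cut off in the excerpt, so this sketch assumes that any cell starting with "broke" in one of the age_ columns should set broke_a_bone to 1. That rule is an assumption, not something the question confirms.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charles"],
    "age_5_9": ["no bones broken"] * 3,
    "age_10_14": ["no bones broken", "broke 2 bones", "no bones broken"],
    "age_15_19": ["broke 1 bone", "no bones broken", "no bones broken"],
})

# Only look at the age_* columns, not at name.
age_cols = [c for c in df.columns if c.startswith("age_")]

# Assumed rule: a cell counts if its text starts with "broke".
broke = df[age_cols].apply(lambda col: col.str.startswith("broke")).any(axis=1)

# Convert the row-wise boolean into the 0/1 flag column.
df["broke_a_bone"] = broke.astype(int)
```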
Vectorization assign the newest value based on datetime
I have two dataframes. The first dataframe has only one column, email, and is a complete list of emails. The second dataframe has three columns: email, subscribe_or_unsubscribe, date. It is a history of users subscribing to or unsubscribing from the email system. The second dataframe is sorted by date with the oldest date at index
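One vectorized way to approach this, sketched under the assumption that the goal is to attach each email's most recent status: keep the last history row per email, then left-merge onto the full list. The column names come from the question; the frame names and sample rows are made up.

```python
import pandas as pd

# Hypothetical complete list of emails.
emails = pd.DataFrame({"email": ["a@x.com", "b@x.com", "c@x.com"]})

# Hypothetical history, oldest entry first.
history = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "subscribe_or_unsubscribe": ["subscribe", "subscribe", "unsubscribe"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-05", "2021-02-01"]),
})

# Sort by date (a no-op if already sorted) and keep the newest row per email.
latest = history.sort_values("date").drop_duplicates("email", keep="last")

# Left merge keeps every email, including those with no history at all.
result = emails.merge(latest, on="email", how="left")
```

Emails that never appear in the history end up with missing values in the status and date columns.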
efficient way to find the most recent entry in another dataframe for each entry of a dataframe indexed by datetime in pandas
I have two dataframes, both indexed by datetime. For example, dataframe 1 looks something like this: and dataframe 2 looks like: For each entry in dataframe 1, I want to find the most recent entry in dataframe 2 and create a new column in dataframe 1 to set up the relationship between the two dataframes.
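A sketch of how this might look with `pd.merge_asof`, which matches each left row to the last right row at or before its key. The frames, column names, and timestamps below are invented for illustration, and both indexes are assumed to be sorted.

```python
import pandas as pd

# Hypothetical dataframe 1, indexed by datetime.
df1 = pd.DataFrame(
    {"a": [1, 2, 3]},
    index=pd.to_datetime(["2021-01-01 10:00", "2021-01-01 12:00", "2021-01-02 09:00"]),
)

# Hypothetical dataframe 2, indexed by datetime.
df2 = pd.DataFrame(
    {"b": [10, 20]},
    index=pd.to_datetime(["2021-01-01 09:30", "2021-01-01 11:00"]),
)

# Copy df2's index into a column so the matched timestamp survives the merge.
right = df2.assign(df2_time=df2.index)

# direction="backward" picks the most recent df2 row at or before each df1 row.
result = pd.merge_asof(
    df1, right, left_index=True, right_index=True, direction="backward"
)
```

The new `df2_time` column records which entry of dataframe 2 each row of dataframe 1 was linked to.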
Pandas convert dummies to a new column
I have a dataframe that discretizes the customers into different Q’s, which looks like: What I want to do is add a new column, Q, to the dataframe that shows which sector each customer is in, so it looks like: The only way I can think of is using a for loop, but that will give me a mess. Any other
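A loop-free sketch, assuming the Q's are stored as 0/1 indicator columns with exactly one 1 per row (the column names Q1 to Q3 and the data are hypothetical): `idxmax(axis=1)` returns, for each row, the name of the column holding the largest value.

```python
import pandas as pd

# Hypothetical dummy-encoded data; the original columns are not shown.
df = pd.DataFrame({
    "customer": ["A", "B", "C"],
    "Q1": [1, 0, 0],
    "Q2": [0, 0, 1],
    "Q3": [0, 1, 0],
})

q_cols = ["Q1", "Q2", "Q3"]

# For each row, the label of the column that contains the 1.
df["Q"] = df[q_cols].idxmax(axis=1)
```

A row of all zeros would be labelled with the first column, so that case needs separate handling if it can occur.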
Merge Dataframe rows based on the date
I have a dataframe that looks like this. It has the name of the company, the date, and the title of a headline that was published regarding that company on that day. There are multiple headlines published on a single day, and every one of those headlines takes up a different row, even for the same date. What I
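The desired output is cut off in this excerpt. Assuming the goal is one row per company and date with that day's headlines combined, a sketch could look like this (column names, separator, and data are all invented):

```python
import pandas as pd

# Hypothetical data: one row per headline.
df = pd.DataFrame({
    "company": ["Acme", "Acme", "Acme", "Globex"],
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-01"],
    "title": ["Headline 1", "Headline 2", "Headline 3", "Headline 4"],
})

# Collapse rows that share company and date, joining their titles into one string.
merged = df.groupby(["company", "date"], as_index=False)["title"].agg(" | ".join)
```

Passing `list` instead of `" | ".join` would keep the headlines as a list per row rather than a single string.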
Groupby several columns, summing them up based on the presence of a sub-string
Context: I’m trying to sum all values based on a list, but only if they start with or contain a string. So with a config file like this: And a dataframe like this: How can I group them if they all start with a given substring present in the granularity_suffix_list? Desired output: Attempts: I was trying this: But it doesn’t work.
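The config file and dataframe are not shown, so the following is only an illustrative sketch. It assumes a text column (here `key`) whose values may start with one of the strings in granularity_suffix_list, and a numeric column (here `value`) to sum. Both column names, the helper `matching_prefix`, and the list contents are hypothetical.

```python
import pandas as pd

# The list name comes from the question; its contents are hypothetical.
granularity_suffix_list = ["sales", "cost"]

# Hypothetical dataframe.
df = pd.DataFrame({
    "key": ["sales_eu", "sales_us", "cost_eu", "other"],
    "value": [10, 20, 5, 1],
})

def matching_prefix(key):
    """Return the first list entry that `key` starts with, or None."""
    for prefix in granularity_suffix_list:
        if key.startswith(prefix):
            return prefix
    return None

# Label each row with its matching substring, then sum per label.
df["group"] = df["key"].map(matching_prefix)
totals = df.dropna(subset=["group"]).groupby("group")["value"].sum()
```

Replacing `key.startswith(prefix)` with `prefix in key` would switch from a starts-with match to a contains match.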