Consider the dataframe below. Assuming my index is unique, I’m looking to retrieve the unique values per index row, to an output like the one below. I wish to keep the empty rows. I have a working, albeit slow, solution; see below. The order of the output numbers is not relevant, as long as all values are presente…
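The original dataframe and the slow solution are not preserved, so here is a minimal vectorized sketch under stated assumptions. The column names `a`, `b`, `c`, the row labels `r1` to `r3`, and the use of NaN for empty cells are all hypothetical. The frame is reshaped to long form and the unique values are collected per index label. Rows that were entirely empty are then reindexed back in as empty lists.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN marks an empty cell, row r3 is completely empty
df = pd.DataFrame(
    {"a": [1, 2, np.nan], "b": [1, 3, np.nan], "c": [4, 3, np.nan]},
    index=["r1", "r2", "r3"],
)

# Long format that keeps the original index labels, with empty cells dropped
long = df.melt(ignore_index=False)["value"].dropna()

# Unique values per original row (one array per label, in order of appearance)
uniq = long.groupby(level=0).unique().reindex(df.index)

# Rows with no values come back as NaN after reindex; turn them into empty lists
result = uniq.apply(lambda v: list(v) if isinstance(v, np.ndarray) else [])
print(result)
```

Because `reindex(df.index)` restores the original row order and labels, the empty rows are kept, which the question asks for.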
Tag: dataframe
How to add prefix to the selected records in python pandas df
I have a df where some of the records in the column contain a prefix and some do not. I would like to update only the records without the prefix. Unfortunately, my script adds the desired prefix to every record in df: How can I omit records that already have the prefix? I’ve tried with df[df['ids'].str.contains(prefix)…
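One way to prefix only the records that lack it is to build a boolean mask and assign through `.loc`. This is a sketch: the prefix value `"ID_"` and the sample ids are hypothetical, and it uses `str.startswith` rather than the `str.contains` the question tried. `str.contains` treats its pattern as a regular expression by default and matches anywhere in the string, which is not quite "has this prefix".

```python
import pandas as pd

prefix = "ID_"  # hypothetical prefix
df = pd.DataFrame({"ids": ["ID_001", "002", "ID_003", "004"]})

# True for rows that do NOT already start with the prefix
missing = ~df["ids"].str.startswith(prefix)

# Prepend the prefix only on those rows; the others are left untouched
df.loc[missing, "ids"] = prefix + df.loc[missing, "ids"]
print(df["ids"].tolist())  # ['ID_001', 'ID_002', 'ID_003', 'ID_004']
```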
How can I select top k rows based on another dataframe in python?
I have data as follows. Users are 1001 to 1004 (but the actual data has one million users). Each user has corresponding probabilities for the variables AT1 to AT6. I would like to select the top 3 users for each choice based on the following data. In the output, top1 to top3 are the top 3 users based on probabili…
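A vectorized sketch of "top k users per column". The probabilities below are made up, only AT1 to AT3 are shown, and the values are chosen without ties. NumPy sorts each column by descending probability, and the resulting row positions are mapped back to user IDs.

```python
import numpy as np
import pandas as pd

# Made-up probabilities for four users and three of the AT columns
df = pd.DataFrame(
    {
        "AT1": [0.10, 0.40, 0.30, 0.20],
        "AT2": [0.50, 0.10, 0.25, 0.15],
        "AT3": [0.05, 0.20, 0.60, 0.35],
    },
    index=[1001, 1002, 1003, 1004],
)

k = 3
# Row positions ordered by descending probability, computed per column (axis=0)
order = np.argsort(-df.to_numpy(), axis=0)[:k]  # shape (k, n_columns)

# Translate row positions into user IDs, one output row per AT column
top = pd.DataFrame(
    df.index.to_numpy()[order].T,
    index=df.columns,
    columns=[f"top{i}" for i in range(1, k + 1)],
)
print(top)
```

With a million users, a full `argsort` per column may be more work than needed. `np.argpartition` followed by a sort of only the top k is a possible optimization, but I have not benchmarked it here.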
Accessing pandas cell value using df.itertuples() and column name gives AttributeError
I have the following dataframe from which I want to retrieve cell values using index and column names. The left column holds the index values, and the column names are 1 to 5. This is a small dummy dataframe, but going forward I will be using this code to access a dataframe with 100…
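A likely explanation for the AttributeError: `itertuples()` builds namedtuples. Column labels that are not valid Python identifiers, such as the integers 1 to 5, are renamed to positional field names like `_1`, so `getattr(row, "1")` fails. The sketch below uses a hypothetical 2×5 frame to show the renamed fields and two workarounds: positional tuple access, or skipping iteration and using label-based `df.at`.

```python
import pandas as pd

# Dummy frame with integer column labels 1..5 (values are made up)
df = pd.DataFrame(
    [[10, 11, 12, 13, 14], [20, 21, 22, 23, 24]],
    index=["a", "b"],
    columns=[1, 2, 3, 4, 5],
)

# Integer labels are not valid identifiers, so the namedtuple fields get renamed
first_row = next(df.itertuples())
print(first_row._fields)  # ('Index', '_1', '_2', '_3', '_4', '_5')

# Option 1: positional access; +1 skips the leading Index field
pos = df.columns.get_loc(3) + 1
values = [row[pos] for row in df.itertuples()]
print(values)  # [12, 22]

# Option 2: label-based scalar lookup, no iteration needed
print(df.at["b", 3])  # 22
```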
Pandas: Unable to merge on two date columns
I have two dataframes that look like: df1: df2: Both date columns were created with the pd.to_datetime() method, and both supposedly have <M8[ns] data types according to df1.Date.dtype and df2.Date.dtype. However, when trying to merge the dataframes with pd.merge(df, hpi, how="left", on=…
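The question is truncated, so the real cause is unknown. One common reason a merge on two datetime columns finds no matches, even though both have a datetime64 dtype, is a hidden time-of-day component on one side. The frames below are hypothetical and use the names `df` and `hpi` from the question's `pd.merge` call. The sketch shows the symptom and one fix with `.dt.normalize()`.

```python
import pandas as pd

# Hypothetical data: df carries a time of day, hpi holds plain dates
df = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-01-01 09:30", "2020-02-01 14:00"]), "x": [1, 2]}
)
hpi = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-01-01", "2020-02-01"]), "hpi": [100.0, 101.5]}
)

# Same kind of dtype on both sides, but 09:30 != 00:00, so nothing matches
no_match = pd.merge(df, hpi, how="left", on="Date")["hpi"].isna().all()
print(no_match)  # True

# Drop the time-of-day component, then merge again
df["Date"] = df["Date"].dt.normalize()
merged = pd.merge(df, hpi, how="left", on="Date")
print(merged["hpi"].tolist())  # [100.0, 101.5]
```

Printing a few raw values from each `Date` column, not just the dtype, is usually the quickest way to confirm whether this is the issue.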
Check if value from one dataframe exists in another dataframe and create column
I am looking to compare the values of columns in two different datasets and create a column with the values that matched. DF1: DF2: Expected Result DF1: I can see the matches using the following code: And I can return a 1/0 with the following: However, I cannot seem to combine the two. Answer Output:
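Both results can come from a single `isin` mask. The mask gives the 1/0 flag, and `Series.where` keeps the matched value while leaving NaN elsewhere. The column name `name`, the new column names `flag` and `matched`, and the data are hypothetical because the original tables are not preserved.

```python
import pandas as pd

# Hypothetical data standing in for DF1 and DF2
df1 = pd.DataFrame({"name": ["apple", "pear", "kiwi", "plum"]})
df2 = pd.DataFrame({"name": ["kiwi", "apple", "fig"]})

# True where the DF1 value also appears anywhere in DF2's column
hit = df1["name"].isin(df2["name"])

df1["flag"] = hit.astype(int)            # 1/0 indicator
df1["matched"] = df1["name"].where(hit)  # matched value, NaN otherwise
print(df1)
```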
Pandas: look for a substring, then write in another column
So I’m trying to search a specific column of my CSV file for a partial string. If a value meets a certain condition, I want to write something else in a different column. For example: The “Percentage” column will always have the same format as “Ninety Five Percent” that is numb…
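A minimal sketch of "if the column contains a substring, write something in another column", combining `str.contains` with `np.where`. The substring `"Ninety"`, the target column `Grade`, and its labels are hypothetical, since the actual condition is truncated. Passing `regex=False` treats the pattern literally, and `na=False` treats missing cells as non-matching.

```python
import numpy as np
import pandas as pd

# In practice this frame would come from pd.read_csv(...)
df = pd.DataFrame(
    {"Percentage": ["Ninety Five Percent", "Eighty Percent", "Ninety Percent"]}
)

# Hypothetical rule: rows mentioning "Ninety" get "High", the rest "Other"
mask = df["Percentage"].str.contains("Ninety", regex=False, na=False)
df["Grade"] = np.where(mask, "High", "Other")
print(df)
```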
Create New Columns Using Multiple Conditions And Time Difference
I have the following dataframe with a tricky problem: I have to make 4 columns (0-90 days, 91-180 days, 181-270 days, 271-360 days) based on the following conditions: Desired output: What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks! Answer You can write a custom functio…
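The actual conditions and data are not preserved, so this is only a sketch of one vectorized alternative to a custom row function. It assumes each row has hypothetical `start` and `end` dates, and that the day difference between them decides which of the four columns gets a 1. `pd.cut` assigns the bucket and `pd.get_dummies` spreads it into the four columns.

```python
import pandas as pd

# Hypothetical data: the day gap between start and end drives the bucket
df = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-01"]),
    "end": pd.to_datetime(["2021-02-15", "2021-06-01", "2021-12-01"]),
})

days = (df["end"] - df["start"]).dt.days  # 45, 151, 334

labels = ["0-90 days", "91-180 days", "181-270 days", "271-360 days"]
# Right-closed bins: (-1, 90], (90, 180], (180, 270], (270, 360]
bucket = pd.cut(days, bins=[-1, 90, 180, 270, 360], labels=labels)

# One column per bucket (all four, even unused ones), as 1/0 integers
df = df.join(pd.get_dummies(bucket).astype(int))
print(df)
```

Rows with a gap above 360 days fall outside every bin and get 0 in all four columns. Whether that is correct depends on the real conditions.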
: aggregate() missing 1 required positional argument: 'func_or_funcs'
I am trying to aggregate per ptid based on diag_date and calculate the max, min, and visit count of diag_date: However, when I do the above (following all the rules for agg), it does not work, and I get the following error: Any ideas are greatly appreciated! Answer To answer my questions, after getting va…
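The parameter name `func_or_funcs` suggests an older pandas release. In those versions `SeriesGroupBy.aggregate` required a positional function argument, and keyword-only "named aggregation" was not yet supported. As far as I know, named aggregation arrived in pandas 0.25. On a current version, a sketch of max, min, and visit count per `ptid` could look like this. The output column names `first_visit`, `last_visit`, `visits` and the data are hypothetical.

```python
import pandas as pd

# Hypothetical visits: ptid 1 has three diagnoses, ptid 2 has two
df = pd.DataFrame({
    "ptid": [1, 1, 1, 2, 2],
    "diag_date": pd.to_datetime(
        ["2020-01-05", "2020-03-01", "2020-02-10", "2021-07-07", "2021-01-01"]
    ),
})

# Named aggregation: new_column=(source_column, aggregation)
summary = df.groupby("ptid").agg(
    first_visit=("diag_date", "min"),
    last_visit=("diag_date", "max"),
    visits=("diag_date", "count"),
)
print(summary)
```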
Replace Values of Multiple Columns in Pandas Dataframe More Efficiently
I have a DataFrame, df, in which I would like to replace several values:

user1  user2  user3
apple  yoo    apple
mango  ram    mango

Instead of doing … to get the final DataFrame of

user1  user2  user3
0      2      0
1      3      1

Is there any way to make the code above more efficient, such that I can change the values of apple,
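One vectorized option is a single mapping dict applied to every column at once. The mapping below is read off the before/after tables in the question (apple→0, mango→1, yoo→2, ram→3). The asker's original code is not preserved, so this is a sketch, not a drop-in replacement. `df.replace(codes)` also works, but some recent pandas versions may warn about implicit downcasting from object to integer. `Series.map` avoids that, with one caveat: any value missing from the dict becomes NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "user1": ["apple", "mango"],
    "user2": ["yoo", "ram"],
    "user3": ["apple", "mango"],
})

# Mapping taken from the question's before/after tables
codes = {"apple": 0, "mango": 1, "yoo": 2, "ram": 3}

# Map every column through the same dict in one pass
out = df.apply(lambda col: col.map(codes))
print(out)
```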