Tag: dataframe

Pandas unique values per row, variable number of columns with data

Consider the dataframe below: Assuming my index is unique, I’m looking to retrieve the unique values per index row, producing an output like the one below. I wish to keep the empty rows. I have a working, albeit slow, solution; see below. The order of the output numbers is not relevant, as long as all values are presente…
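The question's data and its slow solution are not shown in this excerpt, so the following is only a minimal sketch on a made-up frame (the column names `a`, `b`, `c` and the index labels are assumptions). It collects the unique non-missing values of each row and keeps all-empty rows as empty lists.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the frame from the question is not shown in the excerpt.
df = pd.DataFrame(
    {"a": [1, 2, np.nan], "b": [1, 3, np.nan], "c": [np.nan, 2, np.nan]},
    index=["x", "y", "z"],
)

# Iterate over the raw NumPy rows rather than DataFrame.apply(axis=1):
# drop NaN cells, deduplicate with a set, and keep one list per index label.
# Rows with no data yield an empty list, so they are preserved.
unique_per_row = pd.Series(
    [sorted(set(row[~pd.isna(row)])) for row in df.to_numpy()],
    index=df.index,
)
```

Sorting is optional here, since the question says the order of the values does not matter.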

How to add prefix to the selected records in python pandas df

dataframe pandas python

I have a df where some of the records in a column contain a prefix and some do not. I would like to update the records without the prefix. Unfortunately, my script adds the desired prefix to every record in the df: How can I omit the records that already have the prefix? I’ve tried with df[df['ids'].str.contains(prefix)…
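One possible approach, sketched under assumptions: the column name `ids` comes from the excerpt, while the prefix value and the sample records are invented for illustration. A boolean mask selects only the rows that lack the prefix, and the assignment is restricted to those rows.

```python
import pandas as pd

prefix = "ID_"  # hypothetical prefix; the real one is not shown in the excerpt
df = pd.DataFrame({"ids": ["ID_1", "2", "ID_3", "4"]})  # made-up records

# str.startswith matches the literal prefix at the start of each string,
# unlike str.contains, which matches anywhere and treats the pattern as a regex.
has_prefix = df["ids"].str.startswith(prefix)

# Update only the rows that do not already carry the prefix.
df.loc[~has_prefix, "ids"] = prefix + df.loc[~has_prefix, "ids"]
```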

How can I select top k rows based on another dataframe in python?

dataframe numpy pandas python

I have data as follows. Users are 1001 to 1004 (but actual data has one million users). Each user has corresponding probabilities for the variables AT1 to AT6. I would like to select the top 3 users for each choice based on the following data. In the output, top1 to top3 are the top 3 users based on probabili…
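The excerpt does not include the data or the second dataframe, so this is only a rough sketch with invented probabilities for two of the variables (`AT1`, `AT2`) and the users 1001 to 1004 as the index. It uses `Series.nlargest` to pick the top 3 users per column.

```python
import pandas as pd

# Hypothetical probabilities; only the user IDs and the AT* names come from the question.
probs = pd.DataFrame(
    {"AT1": [0.10, 0.40, 0.30, 0.20], "AT2": [0.70, 0.05, 0.15, 0.10]},
    index=[1001, 1002, 1003, 1004],
)

k = 3
# For each column, nlargest(k) returns the k highest values; their index labels
# are the user IDs, ordered from highest to lowest probability.
top = pd.DataFrame(
    {col: probs[col].nlargest(k).index.to_list() for col in probs.columns},
    index=[f"top{i}" for i in range(1, k + 1)],
).T  # one row per variable, columns top1..top3
```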

Accessing pandas cell value using df.itertuples() and column name gives AttributeError

dataframe loops pandas python

I have the following dataframe, from which I want to retrieve the cell values using index and column names. The left column indicates the index values, whereas the column names are from 1 to 5. This is a dummy dataframe which looks small, but going forward I will be using this code to access a dataframe with 100…
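The failing code is not shown in the excerpt. One plausible cause is that integer column labels such as 1 to 5 are not valid Python identifiers, so they cannot be used as attribute names on the named tuples that `itertuples()` yields. A minimal sketch on made-up data that avoids attribute access altogether:

```python
import pandas as pd

# Hypothetical small frame with integer column labels, as in the question.
df = pd.DataFrame([[10, 20], [30, 40]], index=["a", "b"], columns=[1, 2])

cells = {}
# name=None makes itertuples() yield plain tuples: (index, value1, value2, ...).
for row in df.itertuples(name=None):
    idx, values = row[0], row[1:]
    # Pair each value with its real column label instead of relying on attributes.
    for col, value in zip(df.columns, values):
        cells[(idx, col)] = value

# For a single cell, label-based scalar access also works.
single = df.at["b", 1]
```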

Pandas: Unable to merge on two date columns

dataframe numpy pandas python python-datetime

I have two dataframes that look like: df1: df2: Both date columns have been made using the pd.to_datetime() method, and they both supposedly have <M8[ns] data types when using df1.Date.dtype and df2.Date.dtype. However, when trying to merge the dataframes with pd.merge(df,hpi,how="left",on=…
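The excerpt is cut off before the error, so the actual cause is unknown. One common reason for date merges failing to match, offered here only as an assumption, is that one column carries a time-of-day component. The sketch below uses invented values (only the column name `Date` comes from the question) and normalizes both columns to midnight before merging.

```python
import pandas as pd

# Hypothetical frames; the second row of df1 has a time-of-day component.
df1 = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-01-01 00:00", "2020-02-01 09:30"]), "x": [1, 2]}
)
df2 = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-01-01", "2020-02-01"]), "y": [10, 20]}
)

# Strip the time-of-day so that equal calendar dates compare equal.
df1["Date"] = df1["Date"].dt.normalize()
df2["Date"] = df2["Date"].dt.normalize()

merged = pd.merge(df1, df2, how="left", on="Date")
```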

Check if value from one dataframe exists in another dataframe and create column

dataframe pandas python

I am looking to compare values of columns in two different datasets and create a column with the results that have matched. DF1: DF2 = Expected Result DF1: I can see the matches using the following code: And I can return a 1/0 with the following: However, I cannot seem to merge them both. Answer Output:
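Neither the data nor the answer's code survives in this excerpt. As a hedged illustration only, with the column name `key` and all values invented, `Series.isin` can produce the 1/0 flag directly as a new column:

```python
import pandas as pd

# Hypothetical frames standing in for DF1 and DF2.
df1 = pd.DataFrame({"key": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": ["b", "c", "d"]})

# isin returns True where the value also appears in df2; cast to 1/0.
df1["match"] = df1["key"].isin(df2["key"]).astype(int)
```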

Pandas look for substring then write in another

dataframe pandas python

So I’m trying to look down a specific column of my csv file for a partial string. If that meets a certain condition, it’ll write something else in a different column. For example: The “Percentage” column will always have the same format of “Ninety Five Percent” that is numb…
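The condition and the target column are not visible in the truncated excerpt. The sketch below assumes a literal substring test on the "Percentage" column and writes to a made-up column `Result`, and the label `"High"` is likewise hypothetical.

```python
import pandas as pd

# Hypothetical rows in the "Ninety Five Percent" style described in the question.
df = pd.DataFrame(
    {"Percentage": ["Ninety Five Percent", "Sixty Percent", "Ninety Percent"]}
)

# regex=False treats the pattern as a plain substring.
mask = df["Percentage"].str.contains("Ninety", regex=False)

df["Result"] = ""                 # default for rows that do not match
df.loc[mask, "Result"] = "High"   # write only where the substring was found
```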

Create New Columns Using Multiple Conditions And Time Difference

dataframe pandas python timedelta

I have the following dataframe with a tricky problem: I have to make 4 columns (0-90 days, 91-180 days, 181-270 days, 271-360 days) based on the following conditions: Desired output: What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks! Answer You can write a custom functio…
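The conditions and the answer's custom function are not shown in this excerpt. As an alternative sketch, assuming the four columns simply flag which range a day difference falls into, `pd.cut` can bin the differences. The column names `start`, `end` and `bucket` and the dates are invented.

```python
import pandas as pd

# Hypothetical dates; differences are 45, 100 and 360 days.
df = pd.DataFrame(
    {
        "start": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-01"]),
        "end": pd.to_datetime(["2021-02-15", "2021-04-11", "2021-12-27"]),
    }
)

days = (df["end"] - df["start"]).dt.days

labels = ["0-90 days", "91-180 days", "181-270 days", "271-360 days"]
# Right-closed bins: (-1, 90], (90, 180], (180, 270], (270, 360].
# Differences above 360 fall outside every bin and become NaN.
df["bucket"] = pd.cut(days, bins=[-1, 90, 180, 270, 360], labels=labels)

# One 0/1 indicator column per range.
flags = pd.get_dummies(df["bucket"], dtype=int)
```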

TypeError: aggregate() missing 1 required positional argument: ‘func_or_funcs’

dataframe pandas python

I am trying to aggregate per ptid and calculate the max, min and visit counts based on diag_date: However, when I do the above (following all the rules for agg), it does not seem to work, as I get the following error: Any ideas are greatly appreciated! Answer To answer my questions, after getting va…
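The original call and the answer are both cut off here. The sketch below is not the author's fix. It only shows one form of the aggregation that passes the functions positionally as a list, using the column names `ptid` and `diag_date` from the question and invented rows.

```python
import pandas as pd

# Hypothetical visits; only the column names come from the question.
df = pd.DataFrame(
    {
        "ptid": [1, 1, 2],
        "diag_date": pd.to_datetime(["2020-01-05", "2020-03-10", "2020-02-01"]),
    }
)

# Passing a list of function names positionally yields one column per function.
summary = df.groupby("ptid")["diag_date"].agg(["min", "max", "count"])
```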

Replace Values of Multiple Columns in Pandas Dataframe More Efficiently

dataframe pandas python

I have a DataFrame, df, where I would like to replace several values:

user1  user2  user3
apple  yoo    apple
mango  ram    mango

Instead of doing to get the final DataFrame of

user1  user2  user3
0      2      0
1      3      1

Is there any way I can make the code above more efficient such that I can change the values of apple,
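The question's own replacement code is missing from the excerpt. A minimal sketch of one way to do all replacements in a single pass, using the values shown in the question, is to apply a single mapping dictionary to every column (the name `mapping` is an assumption).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user1": ["apple", "mango"],
        "user2": ["yoo", "ram"],
        "user3": ["apple", "mango"],
    }
)

# One dictionary holds every replacement, so adding a value is a one-line change.
mapping = {"apple": 0, "mango": 1, "yoo": 2, "ram": 3}

# Series.map looks each cell up in the dictionary, column by column.
# Values missing from the mapping would become NaN.
result = df.apply(lambda col: col.map(mapping))
```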