I want to have a function that creates a new dataframe from two dataframes. I want to show the mismatched columns based on id number and a given column. Dataframes as input: Expected output: Answer STEP 1 // add the table name prefix to each column name STEP 2 // concat both df STEP 3 // using a lambda function, find out which
Tag: dataframe
Convert nested dictionary to pandas dataframe
I have a nested dictionary as below: I need to convert it into a dataframe like below: I have tried the following code from this answer: I am getting the dataframe like below: This is the closest I got to the desired output. What changes do I need to make to the code to get the desired dataframe? Answer Change
Using Pandas df.loc
I have a DataFrame of a csv file which is being read by pandas. What I am attempting to do is use df.loc to add a new column but only insert values into the column when values from another column, called “SKU” end with “-RF” and “-NEW”. The code I was working on is below. It has the csv file
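A minimal sketch of the approach the question describes, assuming the goal is to fill a new column only for rows whose SKU ends with either suffix. The sample data, the column name `Condition`, and the value `"flagged"` are hypothetical; the original CSV and code are not shown in the excerpt.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({"SKU": ["A100-RF", "B200-NEW", "C300", "D400-RF"]})

# Boolean mask: True where the SKU ends with "-RF" or "-NEW".
mask = df["SKU"].str.endswith("-RF") | df["SKU"].str.endswith("-NEW")

# df.loc creates the column if it is missing and fills only the masked rows;
# all other rows are left as NaN.
df.loc[mask, "Condition"] = "flagged"
```

Combining the two masks with `|` treats the suffixes as alternatives, since a single SKU cannot end with both.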
Substitute numbers in a list of type object pandas
I have a dataframe df looking as follows: What I would like to do is to substitute into df[‘cited_ids’] 0 whenever the corresponding id has d=0 (i) and replace d=1 if there is at least one 0 in the list of df[‘cited_ids’] and the previous d was not 0 (ii). In other words, the first step (i) would result in:
Pandas take number out string
In my data, I have this column “price_range”. Dummy dataset: I am using pandas. What is the most efficient way to get the upper and lower bound of the price range in separate columns? Answer Alternatively, you can parse the string accordingly (if you want the limits for each row, rather than the total range): Output:
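One possible way to split the bounds per row, sketched under the assumption that each `price_range` value is a string holding two numbers separated by non-digit characters. The dummy dataset is not shown in the excerpt, so the sample values and the output column names `lower` and `upper` are hypothetical.

```python
import pandas as pd

# Hypothetical sample values; the real format of price_range is an assumption.
df = pd.DataFrame({"price_range": ["10-20", "$5 - $15.50"]})

# Named groups become the column names of the extracted frame.
pattern = r"(?P<lower>\d+(?:\.\d+)?)\D+(?P<upper>\d+(?:\.\d+)?)"
bounds = df["price_range"].str.extract(pattern).astype(float)

# Attach the two numeric columns to the original frame.
df = df.join(bounds)
```

A regex keeps the parsing vectorized; if the real strings use thousands separators or a different layout, the pattern would need adjusting.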
Python, comparing dataframe rows, adding new column – Truth Value Error?
I am quite new to Python, and somewhat stuck here. I just want to compare floats with a previous or forward row in a dataframe, and mark a new column accordingly. FWIW, I have 6000 rows to compare. I need to output a string or int as the result. My Python code: I get the error: ValueError: The truth value
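This error typically appears when a whole Series is used in a plain `if` statement. A hedged sketch of a vectorized alternative follows, assuming the comparison is against the previous row; the column names `value` and `trend` and the labels are hypothetical, since the original code is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the 6000-row frame.
df = pd.DataFrame({"value": [1.0, 2.5, 2.0, 3.0]})

# shift(1) aligns each row with the value from the row before it;
# use shift(-1) to compare against the following row instead.
prev = df["value"].shift(1)

# np.select evaluates the conditions element-wise, so no Python `if` is needed.
# The first row has no predecessor and gets its own label.
df["trend"] = np.select(
    [prev.isna(), df["value"] > prev],
    ["n/a", "up"],
    default="down",
)
```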
Identify pairs of events then calculate the time elapsed between events
I have a dataframe with messages sent and received. I want to calculate the time it took for someone to reply to the message. The method I thought of using was identifying pairs, so if sent =A and received =B, then there should be another entry with sent=B and received =A. Then once I identify the pairs, I can calculate
Create multiple columns at once based off of existing columns
I have this dataframe df: I want to create two new columns at once which are simply the character length of the existing two columns. The result should look like this: I’ve tried to use a for comprehension to generate the lists at once, like so: but I get I would like to do more complex operations and create lots
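A minimal sketch of creating both length columns in one step, assuming two string columns. The names `col1` and `col2`, the sample values, and the `_len` suffix are hypothetical because the original dataframe is not shown.

```python
import pandas as pd

# Hypothetical sample data with two string columns.
df = pd.DataFrame({"col1": ["a", "bb", "ccc"], "col2": ["dddd", "ee", "f"]})

# Apply the same column-wise operation to both columns, then rename the results.
lengths = df[["col1", "col2"]].apply(lambda s: s.str.len()).add_suffix("_len")

# join adds all derived columns at once.
df = df.join(lengths)
```

The same pattern extends to more complex operations: any function that maps a column to a column can replace the `lambda`.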
Fastest way to get all first-matched rows given a sequence of column values in Pandas
Say I have a Pandas dataframe with 10 rows and 2 columns. Now suppose I am given a sequence of ‘col1’ values in a numpy array: I want to find the rows that have the first occurrence of 3, 1 and 2 in ‘col1’, and then get the corresponding ‘col2’ values in order. Right now I am using a list
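One vectorized alternative to a list-based lookup, sketched with hypothetical data (the original dataframe and array are not shown): keep the first row for each `col1` value, then index that result with the requested sequence.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row frame; only the column names come from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    "col2": list("abcdefghij"),
})
wanted = np.array([3, 1, 2])

# drop_duplicates keeps the first occurrence of each col1 value by default.
first = df.drop_duplicates("col1").set_index("col1")["col2"]

# Label-based lookup returns the col2 values in the order given by `wanted`.
result = first.loc[wanted].to_numpy()
```

`first.loc[wanted]` raises a `KeyError` if a requested value never occurs in `col1`; `first.reindex(wanted)` would return NaN for such values instead.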
Using np.select to change mix data types (int and str) in a Pandas column
I’ve been trying to map a column from my df into 4 categories (binning), but the column contains mixed values, int and str. It looks something like this: The categories I’ve been trying to change them to: This is how I’ve been trying to solve this: But I get this error: ValueError: shape mismatch: objects cannot