Tag: dataframe

Add column to existing pandas dataframe with values as ‘Top’ and ‘Bottom’

I want to create a column in the existing dataframe with values ‘Top’ and ‘Bottom’. The catch is that the size of the dataframe changes according to calculations. For example: I will always have an even number of rows. Please suggest a solution, thanks! Answer I don’t know exactly what your data looks like, but you can try something like this: Hypothetical data: Creating
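One possible reading of this question can be sketched as follows. It assumes that the first half of the rows should be labelled ‘Top’ and the second half ‘Bottom’. The column names `value` and `position` and the sample data are hypothetical, since the original example is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical data with an even number of rows (not from the original post).
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]})

# Assumption: the first half of the rows is 'Top', the second half is 'Bottom'.
half = len(df) // 2

# np.repeat repeats each label `half` times, so the result has len(df) entries
# whenever the row count is even.
df["position"] = np.repeat(["Top", "Bottom"], half)

print(df)
```

Because the labels are derived from `len(df)`, the same lines should keep working when the number of rows changes, as long as it stays even.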

Pandas compare and sum values between two DataFrame with different size

dataframe pandas python

Suppose I have two DataFrames with different sizes: to which I have: and: Now I want to add a third column to df1, say total_volume, which is the sum of the volume values that lie between xlow and xup for each row of df1. I can do this using: we can check the value of, say, the second row as:
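A minimal sketch of one vectorized way to compute such a column is shown below. It assumes df2 has a position column named `x` (a hypothetical name) next to `volume`, that both bounds are inclusive, and it uses made-up data because the original frames are not shown.

```python
import pandas as pd

# Hypothetical data: the original frames are not shown in the excerpt.
df1 = pd.DataFrame({"xlow": [0.0, 1.0, 2.0], "xup": [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"x": [0.5, 1.2, 1.8, 2.5, 5.0],
                    "volume": [10, 20, 30, 40, 50]})

x = df2["x"].to_numpy()                 # shape (m,)
vol = df2["volume"].to_numpy()          # shape (m,)
low = df1["xlow"].to_numpy()[:, None]   # shape (n, 1)
up = df1["xup"].to_numpy()[:, None]     # shape (n, 1)

# Broadcasting gives an (n, m) boolean mask: mask[i, j] is True when
# df2.x[j] lies inside [xlow[i], xup[i]] (assumed inclusive on both ends).
mask = (x >= low) & (x <= up)

# Sum the volumes selected by each row of the mask.
df1["total_volume"] = (mask * vol).sum(axis=1)

print(df1)
```

The mask holds n × m entries, so this trades memory for speed; for very large frames a loop over df1 or an interval-based approach may be the better fit.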

python pandas get distinct matches in columns

dataframe pandas python

I have a dataframe which looks a bit like what this code gives: What I want to end up with is a list of lists or dataframe or something similar which tells me the distinct matches across both columns in both directions. It’d be something like this: I have tried to do it but I can’t get it to go

Listing path and data from an XML file to store in a dataframe

dataframe elementtree python xml xpath

Here is an XML file: I want to save in a dataframe: 1) the path and 2) the text of the elements corresponding to the path. To build this dataframe, I am thinking of using a dictionary to store both. So first I would like to get a dictionary like this (where I have the values associated to

Apply transformation only on string columns with Pandas, ignoring numeric data

dataframe pandas python

So, I have a pretty large dataframe with 85 columns and almost 90,000 rows and I wanted to use str.lower() on all of them. However, there are several columns containing numerical data. Is there an easy solution for this? Then, after using something like df.applymap(str.lower) I would get: Currently it’s showing this error message: Answer From pandas 1.X you can
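Independently of the truncated answer, here is a minimal sketch of one way to lowercase only the text columns. The sample data and column names are hypothetical, and the approach (skip numeric columns, lowercase only values that are actual strings) is an assumption about what the asker wants.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data mixing text and numeric columns.
df = pd.DataFrame({
    "name": ["Alice", "BOB"],
    "city": ["NYC", "Paris"],
    "age": [30, 25],
})

for col in df.columns:
    # Leave numeric columns untouched.
    if is_numeric_dtype(df[col]):
        continue
    # Lowercase only real strings, so NaN or other objects pass through as-is.
    df[col] = df[col].map(lambda v: v.lower() if isinstance(v, str) else v)

print(df)
```

Checking `isinstance(v, str)` per value avoids the error that `str.lower` raises when it is handed a number.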

How to replicate same values based on the index value of other column in python

dataframe if-statement pandas python string

I have a dataframe like below and I want to add another column that is replicated until a certain condition is met. Now I want to add another column which contains additional information about the dataframe. For instance, I want to replicate Yes until id is B and No when it is below B and Yes from C to D and

Dataframe increase speed of for loop for set value of column

dataframe pandas python

I have a pandas dataframe (import pandas as pd). I want to increment the count in ‘C3’ after each rising edge (a rising edge starts when C1 = 1 and C2 = 0). I tried iterrows() on a dataframe with 300,000 rows, but it is a bit slow. Is there a simple way to make it faster? Thanks a lot for your help! Answer You can:
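The original answer is cut off here. As a separate, hedged sketch of a loop-free approach: assume a rising edge is a row where the condition (C1 == 1 and C2 == 0) holds but did not hold in the previous row, and that ‘C3’ should be the running count of such edges. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data; the original frame is not shown in the excerpt.
df = pd.DataFrame({
    "C1": [0, 1, 1, 0, 1, 1, 0],
    "C2": [0, 0, 0, 0, 1, 0, 0],
})

# Condition from the question: C1 is 1 and C2 is 0.
cond = (df["C1"] == 1) & (df["C2"] == 0)

# Assumption: a rising edge is a row where the condition is True
# and was False on the previous row (the first row has no predecessor).
prev = cond.shift(1, fill_value=False).astype(bool)
rising = cond & ~prev

# Running count of rising edges seen so far.
df["C3"] = rising.cumsum()

print(df)
```

`shift` and `cumsum` operate on whole columns, which usually scales much better than `iterrows()` on a few hundred thousand rows.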

How to take specific columns in pandas dataframe only if they exist (different CSVs)

csv dataframe pandas python

I downloaded a bunch of football data from the internet in order to analyze it (around 30 CSV files). Each season’s game data is saved as a CSV file with different data columns. Some data columns are common to all files e.g. Home team, Away team, Full time result, ref name, etc… Earlier years CSV data columns picture – These
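One way to keep only the columns that happen to exist in each file can be sketched with the callable form of `usecols` in `pandas.read_csv`. The column names in `wanted` and the two in-memory CSV snippets are hypothetical stand-ins for the real season files.

```python
import io

import pandas as pd

# Hypothetical list of columns of interest; not every file has all of them.
wanted = ["HomeTeam", "AwayTeam", "FTR", "Referee"]

# Two in-memory stand-ins for season files with different column sets.
csv_old = io.StringIO("HomeTeam,AwayTeam,FTR\nArsenal,Chelsea,H\n")
csv_new = io.StringIO(
    "HomeTeam,AwayTeam,FTR,Referee,Shots\nLeeds,Everton,D,A Smith,12\n"
)

# A callable passed to usecols is evaluated per column name, so columns that
# are absent from a file are simply skipped instead of raising an error.
frames = [
    pd.read_csv(f, usecols=lambda name: name in wanted)
    for f in (csv_old, csv_new)
]

# Columns missing from a given file show up as NaN after concatenation.
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Passing a plain list to `usecols` would fail on files that lack one of the names, which is why the callable form is used in this sketch.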

Most efficient way to find shared members of a list inside a dataframe?

dataframe pandas python

Hello experts: I’m looking at so-called ‘COVID-19 bubbles’ inside pro cycling – I’ve compiled a list of riders for each team and a list of each race they’ve done. There are about 30 riders per team, and there have been a few dozen races after the sport started up again in July. I’m stumped right now on how to proceed

Trouble when adding values for NaN in DataFrame

dataframe for-loop nan python

I have this DataFrame: And I want to fill the NaN values with keywords taken from the description. To that end I created a list with the keywords I want: Finally, I want to loop over each row in the DataFrame, split the contents of the “description” column in each row and, if that word is also in the “keyword”