Tag: dataframe

How can I create a percentage matrix based on a dataframe?

I have a dataframe that looks like this:

Place A    Place B   Type   Number
New York   Paris     A      34
Oslo       London    B      42
Oslo       London    A      24

I need the percentage of each type for each route. I don’t know which command to use to get a dataframe that looks like this: xxx Paris Oslo
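One possible approach, assuming the goal is the share of each Type within each route (the desired output is cut off in the excerpt, so this layout is a guess): pivot so that routes are rows and types are columns, then divide each row by its total. The frame below only re-types the three sample rows from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Place A": ["New York", "Oslo", "Oslo"],
    "Place B": ["Paris", "London", "London"],
    "Type": ["A", "B", "A"],
    "Number": [34, 42, 24],
})

# One row per route, one column per type, missing combinations filled with 0.
counts = df.pivot_table(
    index=["Place A", "Place B"],
    columns="Type",
    values="Number",
    aggfunc="sum",
    fill_value=0,
)

# Divide every row by its own total to get percentages per route.
percent = counts.div(counts.sum(axis=1), axis=0) * 100
print(percent.round(1))
```

If the percentages should instead be relative to each type's total, dividing by `counts.sum(axis=0)` with `axis=1` would be the analogous step.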

How to convert Excel file to json using pandas?

dataframe excel json pandas python

I would like to parse an Excel file that has several sheets and save the data in a JSON file. I don’t want to parse the first, second, or last sheet. I want to parse the ones in between; the number of those sheets varies, and their names are not always the same.
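A minimal sketch of one way to do this, assuming the sheets to skip are always at positions 0, 1, and last: read the sheet names from `pd.ExcelFile`, slice the list by position, and parse only what remains. The helper names `middle_sheets` and `excel_to_json`, and the JSON layout (one key per sheet, a list of row records per key), are illustrative choices, not something the question specifies.

```python
import json

import pandas as pd


def middle_sheets(sheet_names):
    """Drop the first two sheets and the last one, whatever they are called."""
    return sheet_names[2:-1]


def excel_to_json(xlsx_path, json_path):
    """Parse only the in-between sheets and write them to one JSON file."""
    book = pd.ExcelFile(xlsx_path)
    wanted = middle_sheets(book.sheet_names)
    # One entry per sheet; each sheet becomes a list of row dictionaries.
    payload = {
        name: book.parse(name).to_dict(orient="records") for name in wanted
    }
    with open(json_path, "w", encoding="utf-8") as fh:
        # default=str turns values json cannot encode (e.g. timestamps) into text.
        json.dump(payload, fh, default=str, indent=2)
```

Because the selection is positional, it does not matter how many sheets sit in the middle or what they are named.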

Trying to compare two values in a pandas dataframe for max value

dataframe max pandas python

I’ve got a pandas dataframe, and I’m trying to fill a new column in the dataframe, which takes the maximum value of two values situated in another column of the dataframe, iteratively. I’m trying to build a loop to do this, and save time with computation as I realise I could probably do it w…
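The excerpt does not say which two values are compared, so the following is only a sketch under one assumption: that each new entry is the larger of the current row and the previous row of the source column. In that case a rolling window replaces the loop. The column names `value` and `pair_max` are made up for the example.

```python
import pandas as pd

# Hypothetical data; the question's actual frame is not shown.
df = pd.DataFrame({"value": [3, 1, 4, 1, 5]})

# Max over a sliding window of two consecutive rows.
# The first row has no predecessor, so its result is NaN.
df["pair_max"] = df["value"].rolling(2).max()
print(df)
```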

Filter pyspark DataFrame by string match

dataframe pandas pyspark python

I would like to check for a substring match between the comments and keyword columns and find whether any of the keywords is present in that particular row. Input: Expected output: Answer: The most efficient approach here is to loop; you can use set intersection: Output: Used input: With a minor variation you could check for substring matc…
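The answer's code is not included in the excerpt. A rough pandas sketch of the loop-plus-set-intersection idea might look like this, assuming whitespace-separated comments and comma-separated keywords; the data, the column names, and the helper `has_keyword` are all hypothetical.

```python
import pandas as pd

# Hypothetical input; the question's real data is not shown in the excerpt.
df = pd.DataFrame({
    "comments": ["the battery died fast", "great screen"],
    "keywords": ["battery, charger", "camera"],
})


def has_keyword(comment, keywords):
    """True if any keyword appears as a whole word in the comment."""
    words = set(comment.lower().split())
    keys = {k.strip().lower() for k in keywords.split(",")}
    return bool(words & keys)  # non-empty intersection means a match


# Plain Python loop over the two columns, row by row.
df["match"] = [
    has_keyword(c, k) for c, k in zip(df["comments"], df["keywords"])
]
print(df)
```

This matches whole words only; a substring variant would replace the intersection with `any(k in comment for k in keys)`.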

For every row, find the last column with value 1 in a binary data frame

dataframe pandas python python-3.x

Consider a data frame of binary numbers: how do I find, for each row, the rightmost column in which a 1 is observed? So for the dataframe above it would be: Answer: One option is to reverse the column order, then use idxmax: Output:
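The answer's code was stripped from the excerpt; a small sketch of the reverse-then-`idxmax` idea, on made-up data, could look like this. `idxmax` returns the first column holding the row maximum, so scanning the columns right to left yields the rightmost 1.

```python
import pandas as pd

# Hypothetical binary frame; the question's data is not shown.
df = pd.DataFrame({
    "a": [1, 0, 1],
    "b": [1, 1, 0],
    "c": [0, 1, 0],
})

# Reverse the column order, then take the first column with the max (1).
last_one = df.iloc[:, ::-1].idxmax(axis=1)
print(last_one)
```

One caveat: for a row that contains no 1 at all, `idxmax` still returns a label (the last column), so such rows would need a separate check such as `df.eq(1).any(axis=1)`.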

Sum values of a column in specific rows in a dataframe

dataframe jupyter jupyter-notebook python

I would like to learn how to specify a “subset-sum” in a dataframe. My dataframe looks like this: The Data/Time column is the dataframe’s index. With I get the total sum of column A. My aim is to sum up a subset of rows only, like between 2022-03-18 07:37:51 and 2022-03-18 07:37:55, so that I ge…
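A minimal sketch, assuming the index is a sorted `DatetimeIndex`: slice the rows by label with `.loc` and sum the slice. Both endpoints of a label slice are included. The values in column A are invented for the example; only the two timestamps come from the question.

```python
import pandas as pd

# Hypothetical values; one row per timestamp, sorted by time.
index = pd.to_datetime([
    "2022-03-18 07:37:50",
    "2022-03-18 07:37:51",
    "2022-03-18 07:37:53",
    "2022-03-18 07:37:55",
    "2022-03-18 07:37:56",
])
df = pd.DataFrame({"A": [1, 2, 3, 4, 5]}, index=index)

# Label-based slice on the datetime index, inclusive on both ends.
subset_sum = df.loc["2022-03-18 07:37:51":"2022-03-18 07:37:55", "A"].sum()
print(subset_sum)
```

If the index holds strings rather than timestamps, converting it first with `pd.to_datetime(df.index)` is one way to make this slicing reliable.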

How to count letter based similarity on pandas dataframe

dataframe pandas python similarity

Here’s my first dataframe, df1: Here’s my second dataframe, df2: Similarity matrix (columns are Id from df1, rows are Id from df2): Note: the value is 0 in (1,1), (2,1) and (3,2) because no letters are shared; the value is 0.25 in (3,1) because only 1 letter from raUw is available in the 4-letter ‘dnag’ (1/4 equals 0.2…

How do rsuffix and lsuffix work while joining multiple dataframes?

dataframe join pandas python

I have written the following code; however, I am unable to understand how to name the rsuffix and lsuffix parameters. All my dfs have the same column names, for example: When I print dfs_list[2].reset_index() I do get my expected output, but I am unable to comprehend the suffix names. How do we define them? output: Ca…
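The question's code is not shown, so here is a small stand-in on invented frames. In `DataFrame.join`, `lsuffix` is appended to overlapping column names from the calling (left) frame and `rsuffix` to those from the frame passed in (right); columns that do not clash are left unchanged. That last point is a likely source of confusion when chaining joins, because after the first join the names may no longer overlap.

```python
import pandas as pd

# Three hypothetical frames sharing the same column name and index.
left = pd.DataFrame({"price": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"price": [10, 20]}, index=["x", "y"])
third = pd.DataFrame({"price": [100, 200]}, index=["x", "y"])

# "price" clashes, so both sides receive their suffix.
two = left.join(right, lsuffix="_left", rsuffix="_right")

# Now the left side has price_left / price_right, so "price" from
# the third frame does not clash and keeps its name without a suffix.
three = two.join(third, lsuffix="_left", rsuffix="_right")

print(list(two.columns))
print(list(three.columns))
```

Renaming the columns of each frame before joining (for example with `add_suffix`) is one way to get predictable names across many frames.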

split a string representation with ranges into a list of dates

dataframe pandas python split

I have this pandas dataframe column with time ranges (02.07.2021 – 07.07.2021) and single days (04.08.2021) as a list. Dates ‘02.07.2021 – 07.07.2021 , 04.08.2021, 19.06.2021 – 21.06.2021’ ‘13.02.2021 – 15.02.2021 , 03.03.2021 ‘ NaN NaN I want this: Dates 02.07.…
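The desired output is truncated in the excerpt, so the following is only a sketch under the assumption that every range should be expanded into its individual days and returned in the same DD.MM.YYYY format. The helper name `expand_dates` is made up; it splits each cell on commas, treats a dash (hyphen or en dash) as a range separator, and fills the range with `pd.date_range`.

```python
import re

import pandas as pd

DATE_FMT = "%d.%m.%Y"


def expand_dates(cell):
    """Turn '02.07.2021 – 04.07.2021 , 04.08.2021' into a list of days."""
    if pd.isna(cell):
        return []  # NaN cells become an empty list
    days = []
    for part in cell.split(","):
        part = part.strip()
        if not part:
            continue
        # A single day yields one bound; a range yields two.
        bounds = [p.strip() for p in re.split(r"[–-]", part)]
        start = pd.to_datetime(bounds[0], format=DATE_FMT)
        end = pd.to_datetime(bounds[-1], format=DATE_FMT)
        days.extend(d.strftime(DATE_FMT) for d in pd.date_range(start, end))
    return days


dates = pd.Series(["02.07.2021 – 04.07.2021 , 04.08.2021", float("nan")])
print(dates.apply(expand_dates))
```

Following the result with `Series.explode()` would give one date per row, if that is the shape wanted.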