I have a dataframe that looks like this:

Place A    Place B   Type   Number
New York   Paris     A      34
Oslo       London    B      42
Oslo       London    A      24

I need the percentage of each type for each route. I don’t know which command to use to get a dataframe that looks like this: xxx Paris Oslo
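The expected output is cut off in the excerpt, so the following is only a sketch under one assumption: "percentage" means each type's share of the total Number on its route (Place A to Place B). The `Percent` column name is made up for illustration.

```python
import pandas as pd

# Data copied from the question above.
df = pd.DataFrame(
    {
        "Place A": ["New York", "Oslo", "Oslo"],
        "Place B": ["Paris", "London", "London"],
        "Type": ["A", "B", "A"],
        "Number": [34, 42, 24],
    }
)

# Total Number per route, broadcast back onto every row of that route.
route_total = df.groupby(["Place A", "Place B"])["Number"].transform("sum")

# Assumption: the wanted percentage is each row's share of its route total.
df["Percent"] = df["Number"] / route_total * 100
```

If a wide layout is wanted instead, `df.pivot_table(index=["Place A", "Place B"], columns="Type", values="Percent")` may be a reasonable next step.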
Tag: dataframe
How to convert an Excel file to JSON using pandas?
I would like to parse an Excel file which has several sheets and save the data in a JSON file. I don’t want to parse the first and second sheets, nor the last one. I want to parse the sheets in between; their number is not always the same, and neither are their names.
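One possible approach, sketched here with hypothetical helper names (`middle_sheet_names`, `excel_to_json`), is to select sheets by position rather than by name, since the names vary. It assumes an Excel engine such as openpyxl is installed for `pd.ExcelFile`.

```python
import json

import pandas as pd


def middle_sheet_names(sheet_names):
    """Drop the first two sheets and the last one, whatever they are called."""
    return list(sheet_names)[2:-1]


def excel_to_json(excel_path, json_path):
    """Write the rows of every 'middle' sheet to one JSON file, keyed by sheet name."""
    workbook = pd.ExcelFile(excel_path)
    selected = middle_sheet_names(workbook.sheet_names)
    data = {
        # to_json handles dates and NaN; json.loads turns it back into plain objects.
        name: json.loads(workbook.parse(name).to_json(orient="records"))
        for name in selected
    }
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    return selected
```

A workbook with three or fewer sheets yields an empty selection, so the output file would contain an empty object.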
Trying to compare two values in a pandas dataframe for max value
I’ve got a pandas dataframe, and I’m trying to fill a new column in the dataframe, which takes the maximum value of two values situated in another column of the dataframe, iteratively. I’m trying to build a loop to do this, and save time with computation as I realise I could probably do it w…
Filter pyspark DataFrame by string match
I would like to check for substring matches between the comments and keyword columns and find whether any of the keywords is present in that particular row. Input: Expected output: Answer: The most efficient approach here is to loop; you can use set intersection: Output: Used input: With a minor variation you could check for substring matc…
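The answer's code is not included in the excerpt. As a rough illustration of the set-intersection idea it mentions, here is a sketch in plain pandas rather than PySpark; the sample rows, the comma-separated keyword format, and the helper names are all assumptions.

```python
import pandas as pd

# Hypothetical data; the question's real input is not shown in the excerpt.
df = pd.DataFrame(
    {
        "comments": ["the battery died fast", "great screen", "slow delivery"],
        "keyword": ["battery,charger", "camera", "delivery,shipping"],
    }
)


def has_keyword(comment, keywords):
    """True if any comma-separated keyword equals a whole word of the comment."""
    words = set(comment.lower().split())
    wanted = {k.strip().lower() for k in keywords.split(",")}
    return bool(words & wanted)


def has_keyword_substring(comment, keywords):
    """Variation: True if any keyword occurs anywhere inside the comment."""
    text = comment.lower()
    return any(k.strip().lower() in text for k in keywords.split(","))


# Loop over the two columns row by row.
df["match"] = [has_keyword(c, k) for c, k in zip(df["comments"], df["keyword"])]
```

In PySpark the same functions could be wrapped in a UDF, though a built-in column expression would usually be preferable there.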
for every row find the last column with value 1 in binary data frame
Consider a data frame of binary numbers: how do I find, for each row, the rightmost column in which a 1 is observed? So for the dataframe above it would be: Answer: One option is to reverse the column order, then use idxmax: Output:
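The question's data is not shown in the excerpt, so this sketch of the reverse-then-`idxmax` option uses a made-up frame with hypothetical column names `c1` to `c3`.

```python
import pandas as pd

# Hypothetical binary data; one row per observation.
df = pd.DataFrame({"c1": [1, 0, 1], "c2": [0, 1, 1], "c3": [1, 0, 0]})

# Reverse the column order, then take the first column holding the row maximum.
# Because the columns are reversed, "first" means rightmost in the original frame.
rightmost = df.iloc[:, ::-1].idxmax(axis=1)
```

Note that a row containing no 1 at all would still get a label (the last column), so such rows may need to be masked separately, for example with `df.any(axis=1)`.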
Sum values of a column in specific rows in a dataframe
I would like to learn how to specify a “subset-sum” in a dataframe. My dataframe looks like this: The Data/Time column is the dataframe’s index. With I get the total sum of column A. My aim is to sum up a subset of rows only, such as those between 2022-03-18 07:37:51 and 2022-03-18 07:37:55, so that I ge…
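A minimal sketch of one way to do this, assuming the index is a sorted `DatetimeIndex`; the values in column `A` are invented because the original data is not shown.

```python
import pandas as pd

# Hypothetical one-second data around the timestamps named in the question.
index = pd.to_datetime(
    [
        "2022-03-18 07:37:50",
        "2022-03-18 07:37:51",
        "2022-03-18 07:37:52",
        "2022-03-18 07:37:53",
        "2022-03-18 07:37:54",
        "2022-03-18 07:37:55",
        "2022-03-18 07:37:56",
    ]
)
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7]}, index=index)

# Sum of the whole column.
total = df["A"].sum()

# Label-based slicing on a DatetimeIndex includes both end points.
subset = df.loc["2022-03-18 07:37:51":"2022-03-18 07:37:55", "A"].sum()
```

If the index holds strings rather than timestamps, converting it first with `pd.to_datetime` should make the slice behave as shown.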
How to count letter based similarity on pandas dataframe
Here’s my first dataframe, df1. Here’s my second dataframe, df2. Similarity matrix: the columns are the Id from df1, the rows are the Id from df2. Note: the value is 0 in (1,1), (2,1) and (3,2) because no letters are similar; the value is 0.25 in (3,1) because only 1 letter from raUw is available in the 4-letter ‘dnag’ (1/4 equals 0.2…
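The dataframes themselves are missing from the excerpt, so this only sketches the scoring rule as far as the note describes it: the share of letters in one word that also occur in the other. The function name, the argument order, and the case-sensitive comparison are assumptions.

```python
def letter_similarity(source, target):
    """Share of the letters in `target` that also occur in `source`.

    Assumption: matching is case-sensitive and the denominator is len(target).
    """
    if not target:
        return 0.0
    available = set(source)
    hits = sum(1 for letter in target if letter in available)
    return hits / len(target)


# The only pair quoted in the question: one shared letter out of four.
score = letter_similarity("raUw", "dnag")
```

A full matrix could then be built by applying this function to every (df2 row, df1 row) pair.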
All cells getting updated in pandas df using loc
So I create an empty pandas df, where I initialize all the cell values to empty lists, except the diagonals, which are set to math.inf. The indexes are the start position, and the column headers are the end position. I want to get the start and end positions, and the difference between the days to get from star…
How do rsuffix and lsuffix work while joining multiple dataframes?
I have written the following code; however, I am unable to understand how to name the rsuffix and lsuffix parameters. All my dfs have the same column names, for example: When I print dfs_list[2].reset_index() I do get my expected output, but I am unable to comprehend the suffix names. How do we define them? Output: Ca…
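The question's code is not part of the excerpt. As a small illustration of how `DataFrame.join` applies the two suffixes, here is a sketch with made-up frames and column names (`price`, `volume`).

```python
import pandas as pd

# Hypothetical frames sharing the same column names and index.
df1 = pd.DataFrame({"price": [1, 2], "volume": [10, 20]}, index=["x", "y"])
df2 = pd.DataFrame({"price": [3, 4], "volume": [30, 40]}, index=["x", "y"])
df3 = pd.DataFrame({"price": [5, 6], "volume": [50, 60]}, index=["x", "y"])

# lsuffix is appended to overlapping columns of the calling (left) frame,
# rsuffix to overlapping columns of the frame passed in (right).
joined = df1.join(df2, lsuffix="_left", rsuffix="_right")

# Suffixes are only used where names collide. After the first join no column
# is called "price" or "volume" any more, so df3's columns stay unsuffixed.
joined_three = joined.join(df3, lsuffix="_l", rsuffix="_r")
```

This may explain confusing names when joining a list of frames in a loop: renaming each frame's columns up front, or using `pd.concat(..., axis=1, keys=...)`, tends to give more predictable labels.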
split a string representation with ranges into a list of dates
I have this pandas dataframe column with timeranges (02.07.2021 – 07.07.2021 ) and single days (04.08.2021) as a list. Dates ‘02.07.2021 – 07.07.2021 , 04.08.2021, 19.06.2021 – 21.06.2021’ ‘13.02.2021 – 15.02.2021 , 03.03.2021 ‘ NaN NaN I want this: Dates 02.07.…
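The desired output is truncated in the excerpt, so this sketch assumes the goal is to expand every range into its individual days and keep single days as they are. The helper name `expand_dates` is hypothetical, and missing values are assumed to map to an empty list.

```python
import re
from datetime import datetime, timedelta

DATE_FORMAT = "%d.%m.%Y"


def expand_dates(text):
    """Turn '02.07.2021 – 04.07.2021 , 04.08.2021' into a flat list of day strings."""
    if not isinstance(text, str):
        # NaN or None: nothing to expand.
        return []
    days = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        # A range uses an en dash (or a hyphen); a single day has no separator.
        bounds = [
            datetime.strptime(piece, DATE_FORMAT)
            for piece in re.split(r"\s*[–-]\s*", part)
        ]
        current, end = bounds[0], bounds[-1]
        while current <= end:
            days.append(current.strftime(DATE_FORMAT))
            current += timedelta(days=1)
    return days
```

Applied to the column, this would look like `df["Dates"].apply(expand_dates)`, assuming the column is named `Dates` as in the question.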