So I want to get the count of ‘expert’ and ‘user’ from every row of the DataFrame and from every list. After getting the count of experts and users, I want to store the respective ids in another list. I have tried converting them into a dictionary and calculating using the key, but it is not working. Can anyone help?
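A minimal sketch of one way to do this. The question's actual column and key names are not shown, so `members`, `id`, and `role` here are assumptions; the idea is to apply a function per row that splits the list by role and returns both counts and id lists.

```python
import pandas as pd

# Hypothetical data: each row holds a list of {'id': ..., 'role': ...} dicts.
df = pd.DataFrame({
    "members": [
        [{"id": 1, "role": "expert"}, {"id": 2, "role": "user"}, {"id": 3, "role": "expert"}],
        [{"id": 4, "role": "user"}],
    ]
})

def split_roles(members):
    """Count experts/users in one row's list and collect their ids."""
    expert_ids = [m["id"] for m in members if m["role"] == "expert"]
    user_ids = [m["id"] for m in members if m["role"] == "user"]
    return pd.Series({
        "expert_count": len(expert_ids),
        "user_count": len(user_ids),
        "expert_ids": expert_ids,
        "user_ids": user_ids,
    })

result = df.join(df["members"].apply(split_roles))
print(result[["expert_count", "user_count", "expert_ids", "user_ids"]])
```

`apply` returning a `Series` expands into columns, which is then joined back onto the original frame by index.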
Tag: dataframe
Elegant way to write np.where for different values in a column
I have a dataframe like the one shown below. I would like to apply 2 rules to the logout_date column. Rule 1 – If person type is B, C, D, or E AND logout_date is NaN, then copy the login date value. Rule 2 – If person type is A AND logout_date is NaN, then add 2 days to the login date. I tried
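One elegant alternative to chained np.where calls is np.select, which takes a list of conditions and a matching list of choices. The column names below follow the question; the sample values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "person_type": ["A", "B", "A", "C"],
    "login_date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"]),
    "logout_date": pd.to_datetime(["2021-01-05", pd.NaT, pd.NaT, "2021-01-06"]),
})

is_nan = df["logout_date"].isna()
conditions = [
    (is_nan & df["person_type"].isin(["B", "C", "D", "E"])).to_numpy(),  # Rule 1
    (is_nan & (df["person_type"] == "A")).to_numpy(),                    # Rule 2
]
choices = [
    df["login_date"].to_numpy(),                             # Rule 1: copy login date
    (df["login_date"] + pd.Timedelta(days=2)).to_numpy(),    # Rule 2: login date + 2 days
]

# Rows matching no condition keep their existing logout_date.
df["logout_date"] = np.select(conditions, choices, default=df["logout_date"].to_numpy())
print(df)
```

np.select evaluates the conditions in order, so adding a third rule later is just one more entry in each list.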
How to create dataframe and set index with dictionary of dictionaries?
I want to create a DataFrame with the columns as the days of the week, and each person’s name and corresponding start/end times. So far I can get the data from the dictionary into the DataFrame, but I am struggling to get the index correct. I managed to get a bit of help from this question Python – how to
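A sketch of one way to get a (name, start/end) index with days as columns. The question's exact dictionary is not shown, so the assumed shape here is outer key = person, inner key = day, value = (start, end):

```python
import pandas as pd

schedule = {
    "Alice": {"Mon": ("09:00", "17:00"), "Tue": ("10:00", "18:00")},
    "Bob":   {"Mon": ("08:00", "16:00"), "Tue": ("09:00", "17:00")},
}

# Flatten into {(person, "start"/"end"): {day: time}} so each person gets two rows.
rows = {
    (person, which): {day: times[i] for day, times in days.items()}
    for person, days in schedule.items()
    for i, which in enumerate(["start", "end"])
}
df = pd.DataFrame.from_dict(rows, orient="index")
df.index = pd.MultiIndex.from_tuples(rows.keys(), names=["name", "time"])
print(df)
```

from_dict with orient="index" keeps the insertion order of the keys, so the MultiIndex built from the same keys lines up row for row.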
Sum pandas dataframe column values grouped by another column then update row with sum and remove duplicates
I’m trying to sum two columns (in the below example, Seasons and Rating) in a pandas df for each Actor. I then want the totals to be stored per Actor and any other rows containing that Actor to be removed. In the below example, the ‘Name’ that is retained or discarded is not important. For Example
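A minimal sketch with made-up data: a single groupby/agg both sums the numeric columns and collapses the duplicate Actor rows, so no separate de-duplication step is needed. Since the surviving ‘Name’ is not important, "first" is used for it.

```python
import pandas as pd

df = pd.DataFrame({
    "Actor": ["A", "A", "B"],
    "Name": ["Show1", "Show2", "Show3"],
    "Seasons": [3, 2, 5],
    "Rating": [8.0, 7.0, 9.0],
})

# One row per Actor: keep any Name, sum Seasons and Rating.
out = (df.groupby("Actor", as_index=False)
         .agg({"Name": "first", "Seasons": "sum", "Rating": "sum"}))
print(out)
```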
Optimal way to acquire percentiles of DataFrame rows
Problem: I have a pandas DataFrame df. My desired output, i.e. new_df, contains the 9 different percentiles including the median, and should have the following format: Attempt: The following was my initial attempt: However, instead of returning the percentiles of all columns, it calculated these percentiles for each val column and therefore returned 1000 columns. As it calculated the percentiles
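A sketch of the row-wise approach: DataFrame.quantile with axis=1 computes all requested percentiles across the columns of each row in one call, so the result has exactly 9 columns rather than one set of percentiles per val column. The data below is invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 10)), columns=[f"val{i}" for i in range(10)])

# 9 percentiles including the median (0.5).
percentiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# quantile(..., axis=1) puts the percentiles on the index; transpose so each
# original row keeps one column per percentile.
new_df = df.quantile(percentiles, axis=1).T
new_df.columns = [f"p{int(p * 100)}" for p in percentiles]
print(new_df)
```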
Pivoting DataFrame with fixed column names
Let’s say I have the below dataframe, and by design each user has 3 rows. I want to turn my DataFrame into: I was trying to groupBy(col('user')) and then pivot by ticker, but it returns as many columns as there are distinct tickers, so instead I wish I could have a fixed number of columns. Is there any other Spark operator I
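The usual trick is to pivot on the row's position within its user group rather than on the ticker value, which fixes the column set at 3 regardless of which tickers appear. The question is about Spark; as a sketch, here is the same idea in pandas (cumcount plays the role of a row_number window), with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2", "u2"],
    "ticker": ["AAPL", "MSFT", "GOOG", "TSLA", "AMZN", "META"],
})

# Number each user's rows 1..3, then pivot on that position instead of the ticker.
df["pos"] = df.groupby("user").cumcount() + 1
wide = df.pivot(index="user", columns="pos", values="ticker")
wide.columns = [f"ticker{p}" for p in wide.columns]
print(wide.reset_index())
```

In Spark the analogous steps would be a row_number() over a window partitioned by user, then groupBy("user").pivot on that position column.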
Pandas: Replace value in column by using another column, if condition is true
I have the following dataframe: I would like to replace the value in column Sector_y by using column Sector_x, if Sector_y = ” so that I get the following result: I tried using the code but it didn’t deliver the result I wanted. Any suggestions on how to solve the problem? Answer: Fix np.where
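A minimal sketch of the np.where fix: the condition tests Sector_y against the empty string, takes Sector_x where it holds, and keeps Sector_y otherwise. Column names follow the question; the sample values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sector_x": ["Tech", "Energy", "Health"],
    "Sector_y": ["", "Utilities", ""],
})

# Where Sector_y is empty, fall back to Sector_x.
df["Sector_y"] = np.where(df["Sector_y"] == "", df["Sector_x"], df["Sector_y"])
print(df)
```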
pandas: Create new column by comparing DataFrame rows with columns of another DataFrame
Assume I have df1: And a df2: I’m looking for a way to create a new column in df2 that gets the number of rows based on a condition where all columns in df1 have values greater than their counterparts in df2, for each row. For example: To elaborate, at row 0 of df2, df1.alligator_apple has 4 rows whose values are
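A sketch of a vectorized approach using NumPy broadcasting: compare every df1 row against every df2 row in one shot, require all columns to be strictly greater, and sum the matches per df2 row. The column names and values below are invented (the question's column alligator_apple is reused).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"alligator_apple": [4, 5, 6], "banana": [7, 8, 9]})
df2 = pd.DataFrame({"alligator_apple": [3, 6], "banana": [6, 10]})

# Shape the arrays so they broadcast to (len(df1), len(df2), ncols).
a = df1.to_numpy()[:, None, :]   # (len(df1), 1, ncols)
b = df2.to_numpy()[None, :, :]   # (1, len(df2), ncols)

# all(axis=2): every column greater; sum(axis=0): count qualifying df1 rows.
df2["count"] = (a > b).all(axis=2).sum(axis=0)
print(df2)
```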
Set indices without manually typing them (too many), regular numerical sequences as indices
I have a pandas dataframe with 1111 rows and want to reindex the rows, giving them the following names: First 11 rows: Next 100 rows: Next 1000 rows: Additionally, for the last 900 rows, I need the block above, substituting the first 1s (the 1s after the p) with 2s, the next block with 3s, the next block with 4s, …, last block with
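The exact label patterns are truncated in the excerpt, so the block sizes and `p1`/`p1.1`-style names below are purely illustrative assumptions; the sketch only shows the general technique of generating the labels with comprehensions and assigning them as the index rather than typing them out.

```python
import pandas as pd

# Assumed label blocks: 11 + 100 + 1000 = 1111 labels in total.
labels = (
    [f"p{i}" for i in range(1, 12)]            # first 11 rows (assumed pattern)
    + [f"p1.{i}" for i in range(1, 101)]       # next 100 rows (assumed pattern)
    + [f"p1.1.{i}" for i in range(1, 1001)]    # next 1000 rows (assumed pattern)
)

df = pd.DataFrame({"x": range(len(labels))})
df.index = labels                              # assign in one step, no manual typing
print(df.head())
```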
Combine two tables, one with header only, another with table values for bs4
I want to combine two <table> elements, one with the header only and another with the table values: the first table consists of <table>, <thead>, and a <tbody> with no values, containing header information only; the second table consists of <table>, a <thead> with no values, and a <tbody> with the table values only. HTML code Python Code Execution Result Expected Result (5 columns) Answer Output: Or applying to
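A minimal sketch of the idea with invented two-column HTML (the question's actual markup has 5 columns): parse both tables with BeautifulSoup, take the <th> texts from the header-only table and the <td> rows from the body-only table, then combine them into one DataFrame.

```python
import pandas as pd
from bs4 import BeautifulSoup

html = """
<table id="head"><thead><tr><th>a</th><th>b</th></tr></thead><tbody></tbody></table>
<table id="body"><thead></thead><tbody><tr><td>1</td><td>2</td></tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
head_table, body_table = soup.find_all("table")

# Header cells from the first table, data rows from the second.
headers = [th.get_text(strip=True) for th in head_table.find_all("th")]
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in body_table.tbody.find_all("tr")]

df = pd.DataFrame(rows, columns=headers)
print(df)
```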