So I want to get the count of ‘expert’ and ‘user’ from every row of the DataFrame and from every list. After getting the count of experts and users, I want to store the respective ids in another list. I have tried converting them into a dictionary and calculating using the key, but it is not working. Can anyone help?
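A minimal sketch of one way to do this. The question's actual column and key names are not shown, so `members`, `id`, and `role` here are assumptions; the idea is to apply a function per row that splits the list by role and returns both counts and id lists.

```python
import pandas as pd

# Hypothetical data: each row holds a list of {'id': ..., 'role': ...} dicts.
df = pd.DataFrame({
    "members": [
        [{"id": 1, "role": "expert"}, {"id": 2, "role": "user"}, {"id": 3, "role": "expert"}],
        [{"id": 4, "role": "user"}],
    ]
})

def split_roles(members):
    """Count experts/users in one row's list and collect their ids."""
    expert_ids = [m["id"] for m in members if m["role"] == "expert"]
    user_ids = [m["id"] for m in members if m["role"] == "user"]
    return pd.Series({
        "expert_count": len(expert_ids),
        "user_count": len(user_ids),
        "expert_ids": expert_ids,
        "user_ids": user_ids,
    })

result = df.join(df["members"].apply(split_roles))
print(result[["expert_count", "user_count", "expert_ids", "user_ids"]])
```

`apply` returning a `Series` expands into columns, which is then joined back onto the original frame by index.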
Tag: dataframe
Elegant way to write np.where for different values in a column
I have a dataframe like the one shown below. I would like to apply 2 rules to the logout_date column. Rule 1 – If person type is B, C, D, or E AND logout_date is NaN, then copy the login date value. Rule 2 – If person type is A AND logout_date is NaN, then add 2 days to the login date. I tried
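One elegant alternative to chained np.where calls is np.select, which takes a list of conditions and a matching list of choices. The column names below follow the question; the sample values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "person_type": ["A", "B", "A", "C"],
    "login_date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"]),
    "logout_date": pd.to_datetime(["2021-01-05", pd.NaT, pd.NaT, "2021-01-06"]),
})

is_nan = df["logout_date"].isna()
conditions = [
    (is_nan & df["person_type"].isin(["B", "C", "D", "E"])).to_numpy(),  # Rule 1
    (is_nan & (df["person_type"] == "A")).to_numpy(),                    # Rule 2
]
choices = [
    df["login_date"].to_numpy(),                             # Rule 1: copy login date
    (df["login_date"] + pd.Timedelta(days=2)).to_numpy(),    # Rule 2: login date + 2 days
]

# Rows matching no condition keep their existing logout_date.
df["logout_date"] = np.select(conditions, choices, default=df["logout_date"].to_numpy())
print(df)
```

np.select evaluates the conditions in order, so adding a third rule later is just one more entry in each list.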
How to create dataframe and set index with dictionary of dictionaries?
I want to create a DataFrame with the columns as the days of the week, and each person’s name and corresponding start/end times. So far I can get the data from the dictionary into the DataFrame, but I am struggling to get the index correct. I managed to get a bit of help from this question Python – how to
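A sketch of one way to get a (name, start/end) index with days as columns. The question's exact dictionary is not shown, so the assumed shape here is outer key = person, inner key = day, value = (start, end):

```python
import pandas as pd

schedule = {
    "Alice": {"Mon": ("09:00", "17:00"), "Tue": ("10:00", "18:00")},
    "Bob":   {"Mon": ("08:00", "16:00"), "Tue": ("09:00", "17:00")},
}

# Flatten into {(person, "start"/"end"): {day: time}} so each person gets two rows.
rows = {
    (person, which): {day: times[i] for day, times in days.items()}
    for person, days in schedule.items()
    for i, which in enumerate(["start", "end"])
}
df = pd.DataFrame.from_dict(rows, orient="index")
df.index = pd.MultiIndex.from_tuples(rows.keys(), names=["name", "time"])
print(df)
```

from_dict with orient="index" keeps the insertion order of the keys, so the MultiIndex built from the same keys lines up row for row.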
Sum pandas dataframe column values grouped by another column then update row with sum and remove duplicates
I’m trying to sum two columns (in the below example, Seasons and Rating) in a pandas df for each Actor. I then want the totals to be stored per Actor and any other rows containing that Actor to be removed. In the below example, the ‘Name’ that is retained or discarded is not important. For Example
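A minimal sketch with made-up data: a single groupby/agg both sums the numeric columns and collapses the duplicate Actor rows, so no separate de-duplication step is needed. Since the surviving ‘Name’ is not important, "first" is used for it.

```python
import pandas as pd

df = pd.DataFrame({
    "Actor": ["A", "A", "B"],
    "Name": ["Show1", "Show2", "Show3"],
    "Seasons": [3, 2, 5],
    "Rating": [8.0, 7.0, 9.0],
})

# One row per Actor: keep any Name, sum Seasons and Rating.
out = (df.groupby("Actor", as_index=False)
         .agg({"Name": "first", "Seasons": "sum", "Rating": "sum"}))
print(out)
```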
Optimal way to acquire percentiles of DataFrame rows
Problem: I have a pandas DataFrame df. My desired output, i.e. new_df, contains the 9 different percentiles including the median, and should have the following format: Attempt: The following was my initial attempt: However, instead of returning the percentiles of all columns, it calculated these percentiles for each val column and therefore returned 1000 columns. As it calculated the percentiles
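A sketch of the row-wise approach: DataFrame.quantile with axis=1 computes all requested percentiles across the columns of each row in one call, so the result has exactly 9 columns rather than one set of percentiles per val column. The data below is invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 10)), columns=[f"val{i}" for i in range(10)])

# 9 percentiles including the median (0.5).
percentiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# quantile(..., axis=1) puts the percentiles on the index; transpose so each
# original row keeps one column per percentile.
new_df = df.quantile(percentiles, axis=1).T
new_df.columns = [f"p{int(p * 100)}" for p in percentiles]
print(new_df)
```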
Pivoting DataFrame with fixed column names
Let’s say I have the below dataframe, and by design each user has 3 rows. I want to turn my DataFrame into: I was trying to groupBy(col('user')) and then pivot by ticker, but it returns as many columns as there are distinct tickers, so instead I wish I could have a fixed number of columns. Is there any other Spark operator I
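The usual trick is to pivot on the row's position within its user group rather than on the ticker value, which fixes the column set at 3 regardless of which tickers appear. The question is about Spark; as a sketch, here is the same idea in pandas (cumcount plays the role of a row_number window), with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2", "u2"],
    "ticker": ["AAPL", "MSFT", "GOOG", "TSLA", "AMZN", "META"],
})

# Number each user's rows 1..3, then pivot on that position instead of the ticker.
df["pos"] = df.groupby("user").cumcount() + 1
wide = df.pivot(index="user", columns="pos", values="ticker")
wide.columns = [f"ticker{p}" for p in wide.columns]
print(wide.reset_index())
```

In Spark the analogous steps would be a row_number() over a window partitioned by user, then groupBy("user").pivot on that position column.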
Pandas: Replace value in column by using another column, if condition is true
I have the following dataframe: I would like to replace the value in column Sector_y by using column Sector_x, if Sector_y = ” so that I get the following result: I tried using the code but it didn’t deliver the result I wanted. Any suggestions on how to solve the problem? Answer: Fix np.where
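A minimal sketch of the np.where fix: the condition tests Sector_y against the empty string, takes Sector_x where it holds, and keeps Sector_y otherwise. Column names follow the question; the sample values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sector_x": ["Tech", "Energy", "Health"],
    "Sector_y": ["", "Utilities", ""],
})

# Where Sector_y is empty, fall back to Sector_x.
df["Sector_y"] = np.where(df["Sector_y"] == "", df["Sector_x"], df["Sector_y"])
print(df)
```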
pandas: Create new column by comparing DataFrame rows with columns of another DataFrame
Assume I have df1: And a df2: I’m looking for a way to create a new column in df2 that gets the number of rows based on a condition where all columns in df1 have values greater than their counterparts in df2, for each row. For example: To elaborate, at row 0 of df2, df1.alligator_apple has 4 rows whose values are
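A sketch of a vectorized approach using NumPy broadcasting: compare every df1 row against every df2 row in one shot, require all columns to be strictly greater, and sum the matches per df2 row. The column names and values below are invented (the question's column alligator_apple is reused).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"alligator_apple": [4, 5, 6], "banana": [7, 8, 9]})
df2 = pd.DataFrame({"alligator_apple": [3, 6], "banana": [6, 10]})

# Shape the arrays so they broadcast to (len(df1), len(df2), ncols).
a = df1.to_numpy()[:, None, :]   # (len(df1), 1, ncols)
b = df2.to_numpy()[None, :, :]   # (1, len(df2), ncols)

# all(axis=2): every column greater; sum(axis=0): count qualifying df1 rows.
df2["count"] = (a > b).all(axis=2).sum(axis=0)
print(df2)
```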
Set indices without manually typing them (too many), regular numerical sequences as indices
I have a pandas dataframe with 1111 rows and want to reindex the rows, giving them the following names: First 11 rows: Next 100 rows: Next 1000 rows: Additionally, for the last 900 rows, I need the block above, substituting the first 1s (the 1s after the p) with 2s, the next block with 3s, the next block with 4s, …, last block with
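The exact label patterns are truncated in the excerpt, so the block sizes and `p1`/`p1.1`-style names below are purely illustrative assumptions; the sketch only shows the general technique of generating the labels with comprehensions and assigning them as the index rather than typing them out.

```python
import pandas as pd

# Assumed label blocks: 11 + 100 + 1000 = 1111 labels in total.
labels = (
    [f"p{i}" for i in range(1, 12)]            # first 11 rows (assumed pattern)
    + [f"p1.{i}" for i in range(1, 101)]       # next 100 rows (assumed pattern)
    + [f"p1.1.{i}" for i in range(1, 1001)]    # next 1000 rows (assumed pattern)
)

df = pd.DataFrame({"x": range(len(labels))})
df.index = labels                              # assign in one step, no manual typing
print(df.head())
```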
Combine two tables, one with header only, another with table values for bs4
I want to combine two <table> elements, one with the header only and another with the table values: the first table consists of <table>, <thead>, and a <tbody> with no values, containing header information only; the second table consists of <table>, a <thead> with no values, and a <tbody> with the table values only. HTML code Python Code Execution Result Expected Result (5 columns) Answer Output: Or applying to
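A minimal sketch of the idea with invented two-column HTML (the question's actual markup has 5 columns): parse both tables with BeautifulSoup, take the <th> texts from the header-only table and the <td> rows from the body-only table, then combine them into one DataFrame.

```python
import pandas as pd
from bs4 import BeautifulSoup

html = """
<table id="head"><thead><tr><th>a</th><th>b</th></tr></thead><tbody></tbody></table>
<table id="body"><thead></thead><tbody><tr><td>1</td><td>2</td></tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
head_table, body_table = soup.find_all("table")

# Header cells from the first table, data rows from the second.
headers = [th.get_text(strip=True) for th in head_table.find_all("th")]
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in body_table.tbody.find_all("tr")]

df = pd.DataFrame(rows, columns=headers)
print(df)
```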