I’m looking for an algorithm to create a new column based on values from other columns while respecting pre-established rules. Here’s an example (artificial data). The goal is to create a new_column based on the values of col_1, col_2, and col_3. The rules are: if the value ‘Yes’ is present in any of the columns, the value of
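A minimal sketch of the first rule, assuming pandas and NumPy. The excerpt cuts off before saying what value the rule assigns, so the two output labels below are placeholders, not the asker's values; the sample rows are invented as well.

```python
import numpy as np
import pandas as pd

# Invented sample data using the column names from the question.
df = pd.DataFrame({
    "col_1": ["Yes", "No", "No"],
    "col_2": ["No", "No", "Yes"],
    "col_3": ["No", "No", "No"],
})

# True for every row where at least one of the three columns equals 'Yes'.
has_yes = df[["col_1", "col_2", "col_3"]].eq("Yes").any(axis=1)

# Placeholder labels: the excerpt does not state what the rule assigns.
df["new_column"] = np.where(has_yes, "rule_matched", "no_rule_matched")
print(df)
```

With more than one rule, `np.select` accepts a list of such boolean masks and a matching list of values, which may read more clearly than nested `np.where` calls.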
Tag: data-wrangling
Create a List in Python from Value of Max Data
I want to create a list in Python from the dataset below: After some data wrangling, transposing the data with this code: The data frame looks like this: The final result I expect is a list consisting of the maximum value from each column: Answer You can try this: Output:
If there is a second column present then populate second column values, else populate first column values in Dataframe
I have a dataframe as seen below: I now need two columns, column A and column B. Conditions summarized: The required dataframe should be as follows: Answer Try: The != '' comparison will work if the cell is truly empty (as opposed to NaN, etc.). If you have actual NaN values, use:
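The answer's code is missing from the excerpt, so the following is only a sketch of the two cases it describes. The column names `first`, `second`, and `result` are hypothetical, and the sketch fills a single result column.

```python
import numpy as np
import pandas as pd

# Case 1: missing values are empty strings.
df = pd.DataFrame({"first": ["a", "b", "c"], "second": ["x", "", "z"]})
# Take the second column where it is non-empty, otherwise the first.
df["result"] = np.where(df["second"] != "", df["second"], df["first"])

# Case 2: missing values are real NaN.
df_nan = pd.DataFrame({"first": ["a", "b", "c"], "second": ["x", np.nan, "z"]})
# fillna() keeps the second column and falls back to the first where it is NaN.
df_nan["result"] = df_nan["second"].fillna(df_nan["first"])

print(df)
print(df_nan)
```

The empty-string comparison does not catch NaN, because `NaN != ""` evaluates to True, which is why the two cases need different handling.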
How to split data in a column into some separate columns in Python?
So, I have a data frame given below: I want to split the single-line strings in the original dataframe into separate values, such as [107.625764, -6.910353], [107.625871, -6.910358] split into 107.625764, -6.910353. The details of the expected results are in the picture below. Expected Results All I know is that we can apply the str.split method by specifying any specific
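A minimal sketch of one way to do this with `str.split`, assuming each cell holds a single bracketed pair as a string and that the separator is exactly a comma followed by a space. The column names `coords`, `x`, and `y` are hypothetical.

```python
import pandas as pd

# Sample values taken from the question; the column name is invented.
df = pd.DataFrame({
    "coords": ["[107.625764, -6.910353]", "[107.625871, -6.910358]"],
})

# Drop the surrounding brackets, then split each string into two columns.
parts = df["coords"].str.strip("[]").str.split(", ", expand=True)

# Convert the two string pieces to floats and store them as new columns.
df["x"] = parts[0].astype(float)
df["y"] = parts[1].astype(float)
print(df)
```

If the cells hold actual Python lists rather than strings, `pd.DataFrame(df["coords"].tolist())` would be the simpler route.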
Can Pandas output inferred schema for a CSV file?
Is there a method I can use to output the inferred schema of a large CSV using pandas? In addition, is there any way to have it tell me, along with the type, whether a column is nullable/blank based on the CSV? The file is about 500k rows with 250 columns. With my new job, I’m constantly being handed CSV files with zero format documentation.
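One possible approach, sketched with a small in-memory CSV standing in for the real file: let `read_csv` infer the types, then summarize `dtypes` next to a per-column missing-value check. The sample data and the `schema` layout are invented for illustration.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for the real file.
csv_text = "id,name,score\n1,alice,9.5\n2,,7.0\n3,carol,\n"
df = pd.read_csv(io.StringIO(csv_text))

# One row per column: inferred dtype, whether any value is missing, and how many.
schema = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "nullable": df.isna().any(),
    "null_count": df.isna().sum(),
})
print(schema)
```

The result describes only the rows that were actually read. Passing `nrows` to `read_csv` speeds up a first look at a large file, but types and nullability inferred from a sample may not hold for the rest of the data.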
Most efficient way to combine large Pandas DataFrames based on multiple column values
I am processing information in several Pandas DataFrames with 10,000+ rows. I have: df1 (student information) and df2 (student responses). I want: a DataFrame with columns for the class number, student ID, and unique assignment titles. The assignment columns should contain the students’ highest score for that assignment. There can be 20+ assignments / columns. A student can have many different
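A sketch of one way to build that table, assuming df2 has one row per response. The column names `student_id`, `class_number`, `assignment`, and `score` and the sample rows are all hypothetical. It shows a pivot followed by a merge and makes no claim to be the most efficient option.

```python
import pandas as pd

# Invented stand-ins for df1 (student information) and df2 (student responses).
df1 = pd.DataFrame({"student_id": [1, 2], "class_number": [101, 102]})
df2 = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2],
    "assignment": ["hw1", "hw1", "hw2", "hw1", "hw2"],
    "score": [70, 85, 90, 60, 75],
})

# One row per student, one column per assignment, keeping the highest score.
best = df2.pivot_table(
    index="student_id", columns="assignment", values="score", aggfunc="max"
)

# Attach the class number from df1; a left merge keeps students with no responses.
result = df1.merge(best.reset_index(), on="student_id", how="left")
print(result)
```

Students missing from df2 would end up with NaN in every assignment column, which also turns those columns into floats.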