I am pulling data from a Google Sheet (26 columns) into a pandas DataFrame. Four columns, A, B, C and D, hold percentage values (e.g. 15.6%) and also contain some rows with N/A values. I am trying to convert these columns to numbers so that I can use them in other calculations, but am having problems doing so.
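A minimal sketch of one way to do the conversion, assuming the cells arrive as strings such as "15.6%" and "N/A". The sample frame and the choice to divide by 100 are illustrative assumptions, not part of the original question:

```python
import pandas as pd

# Hypothetical sample standing in for two of the sheet's percentage columns
df = pd.DataFrame({
    "A": ["15.6%", "N/A", "3%"],
    "B": ["0.5%", "100%", "N/A"],
})

pct_cols = ["A", "B"]  # assumed list of columns holding percentage strings
for col in pct_cols:
    # Drop surrounding whitespace and the trailing percent sign
    cleaned = df[col].astype(str).str.strip().str.rstrip("%")
    # Anything that is not a number (such as "N/A") becomes NaN
    df[col] = pd.to_numeric(cleaned, errors="coerce") / 100
```

Dividing by 100 stores 15.6% as a fraction near 0.156; skip that step if the calculations expect percentage points instead.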
Tag: dataframe
Over and under sample multi-class training examples (rows) in a pandas dataframe to specified values
I would like to make a multi-class pandas dataframe more balanced for training. A simplified version of my training set looks as follows: Imbalanced dataframe: the counts for classes 0, 1 and 2 are 7, 3 and 1 respectively. I made this with the code: Now I would like to randomly under-sample the majority class(es) and randomly over-sample the
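A rough sketch of resampling each class to a chosen row count, assuming a label column named label and per-class targets of 4 rows. Both names and numbers are hypothetical; only the class counts 7, 3 and 1 come from the question:

```python
import pandas as pd

# Hypothetical data reproducing the class counts 7, 3 and 1
df = pd.DataFrame({
    "feature": range(11),
    "label": [0] * 7 + [1] * 3 + [2] * 1,
})

# Hypothetical target number of rows per class
targets = {0: 4, 1: 4, 2: 4}

parts = []
for label, n in targets.items():
    group = df[df["label"] == label]
    # Sample with replacement only when the class has fewer rows than requested
    parts.append(group.sample(n=n, replace=len(group) < n, random_state=0))

# Concatenate the resampled classes and shuffle the rows
balanced = pd.concat(parts).sample(frac=1, random_state=0).reset_index(drop=True)
```

Over-sampled classes contain duplicated rows, so any train/validation split should happen before resampling.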
Python Pandas Style: highlight specific cells in each column with a different condition
I’m trying to highlight specific cells in each column, using a different condition per column, wherever a row’s value matches that column’s condition. The image below shows what I want to achieve: (image: the table I am trying to achieve) I searched Google and Stack Overflow, but nothing I found meets my requirement. Could anyone who’s familiar with Pandas Style assist? Below are the
Count value pairings from different columns in a DataFrame with Pandas
I have a df like this one: df: I want to transform this into a df that looks like this: So for every item I want a row for each possible combination of cup and size, and an additional column with the frequency. What is the proper way to do this using pandas? Answer Let’s try: Add a frequency column
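One possible way to count pairings, sketched with groupby and size. The column names item, cup and size are taken from the question text, while the sample values are invented for illustration:

```python
import pandas as pd

# Hypothetical sample; only the column names come from the question
df = pd.DataFrame({
    "item": ["x", "x", "x", "y"],
    "cup": ["A", "A", "B", "A"],
    "size": ["S", "S", "M", "S"],
})

# One row per observed (item, cup, size) combination, with its row count
counts = (
    df.groupby(["item", "cup", "size"])
    .size()
    .reset_index(name="frequency")
)
```

This lists only the combinations that actually occur in the data; combinations with zero occurrences would need a separate reindexing step.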
Pandas dataframe slice left assignment
I want to do a left assignment of one column’s values between DataFrame slices where the indexes don’t match. Is there a single expression that will work whether the left slice’s indexes are a subset or a superset of the right slice’s? The following attempt fails when left is a subset: Answer If you want the missing index in the
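A hedged sketch of one approach that behaves the same whether the left index is a subset or a superset of the right one: restrict both sides to the shared labels before assigning. The frames, the column name v and the labels are hypothetical, and this is not necessarily the approach the truncated answer takes:

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap
left = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"v": [20.0, 30.0, 40.0]}, index=["b", "c", "d"])

# Labels present on both sides
common = left.index.intersection(right.index)

# Assign only where a matching label exists; other left rows are untouched
left.loc[common, "v"] = right.loc[common, "v"]
```

Rows of left without a counterpart in right keep their old values rather than becoming NaN.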
GroupBy Pandas with ratio
I am working on a dataset which looks something like this: I am trying to do two things: find the length of the longest sequence of each type, and find the ratio of A/B and B/A for those sequences for each ID. Ratio attribute explanation: Calculate the total amount in the longest sequence for each ID (say length n). If the sequence is that
Automatic data wrangling on Pandas with multiple dataframes using lists and loops
For professional purposes I need to produce some reports that include new entries every week. I have 16 dataframes with the same column names (the dataframes are named week1, week2… week16). I created a list of the dataframes and then a loop. I wanted to test renaming the column at index 1, but I did not succeed. I am forced to
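A small sketch of renaming the column at position 1 across a list of dataframes. The frames and the new name amount are hypothetical. A common pitfall is that rebinding the loop variable does not change the list, so the renamed frame is written back by index:

```python
import pandas as pd

# Hypothetical stand-ins for the weekly dataframes (three instead of sixteen)
weeks = [
    pd.DataFrame({"id": [1, 2], "value": [10 * i, 20 * i]})
    for i in range(1, 4)
]

for i, frame in enumerate(weeks):
    # rename returns a new frame; store it back in the list
    weeks[i] = frame.rename(columns={frame.columns[1]: "amount"})
```

Variables such as week1 that still point at the original objects keep the old column name; only the list entries are replaced.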
How to find all columns that contain a string and put them in new columns?
I was wondering how I could find all values that start with ‘orange’ across all the columns and put them into new columns. Expected output: Answer Let’s try stack, then filter by str.contains: df1: Or melt, for the same order as the OP: df1: regex ^orange: ^ asserts position at the start of a line; orange matches the characters orange literally (case
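The stack-then-filter idea could look roughly like this. The sample data and the output column names orange_0 and orange_1 are assumptions made for illustration, since the original frames are not shown:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "a": ["orange juice", "apple", "kiwi"],
    "b": ["pear", "orange peel", "plum"],
    "c": ["orange zest", "fig", "orange soda"],
})

# Stack into one long Series indexed by (row, column)
stacked = df.stack()

# Keep only the values that start with "orange"
matches = stacked[stacked.str.contains(r"^orange")]

# Drop the column level so the index is just the original row label
matches = matches.reset_index(level=1, drop=True)

# Number the matches within each row, then spread them into columns
wide = (
    matches.to_frame("value")
    .assign(n=matches.groupby(level=0).cumcount().to_numpy())
    .set_index("n", append=True)["value"]
    .unstack()
    .add_prefix("orange_")
)

result = df.join(wide)
```

Rows with fewer matches than the widest row get NaN in the extra columns.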
Shift column position to right based on criteria using Pandas
I have a dataframe that looks like the one below. I would like to shift values one cell to the right if there is an NA in the column dep_id. I tried the code below, but it wasn’t working. Is there an efficient and elegant approach to shifting column positions on big data? For example, I expect my output to look like the one shown below
How to get all the rows with the same values on a certain set of columns as another specified row in Pandas?
In a setup similar to this: My question is how to get ALL the rows in the dataframe that have the same values on a certain set of columns (let’s say, for example, {B,C}) as another specified row (for example, the row with index 3). I want this (index 3, set {B,C}): The problem now is that in
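A minimal sketch of the selection, assuming a small hypothetical frame with columns A, B and C. Only the column set {B,C} and the reference index 3 come from the question:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["x", "y", "x", "y", "y"],
    "C": [10, 20, 10, 20, 30],
})

cols = ["B", "C"]        # columns that must match
ref = df.loc[3, cols]    # values of the reference row

# True where every selected column equals the reference value
mask = df[cols].eq(ref, axis="columns").all(axis=1)
same = df[mask]
```

The reference row itself is included in the result, and NaN never compares equal to NaN, so rows with missing values in the selected columns will not match.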