I have a df with a race column, which has 4 categories. However, I would like to have only three categories by combining the last two. This is what my current df looks like: I want to consolidate race == 3 and race == 4 into one value (race == 3). So my new df output would look somet…
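A minimal sketch of merging category 4 into category 3, assuming `race` is stored as integers (the sample values below are hypothetical). `Series.replace` with a mapping rewrites only the listed values and leaves the rest untouched:

```python
import pandas as pd

# Hypothetical sample data with four race categories
df = pd.DataFrame({"race": [1, 2, 3, 4, 4, 3]})

# Map category 4 onto category 3; categories 1-3 are unchanged
df["race"] = df["race"].replace({4: 3})

# Equivalent boolean-mask form:
# df.loc[df["race"] == 4, "race"] = 3
```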
Tag: pandas
Fastest way to use if/else statements when looping through dataframe with pandas
I am trying to run conditional statements while iterating through pandas df rows, and the code is very slow. For example: The df is only about 40k rows long and it's very slow, and this is only one of the statements I am trying to incorporate into this loop. Can you help with a faster way to do…
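The usual alternative to a row-by-row `if/else` loop is to evaluate the condition on whole columns at once. A sketch using `numpy.where` for a two-way branch and `numpy.select` for an if/elif/else chain; the `score` column and the grade thresholds are hypothetical, since the original example is not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the 40k-row frame
df = pd.DataFrame({"score": [95, 72, 50, 88]})

# Two-way branch: "yes" where the condition holds, "no" elsewhere
df["passed"] = np.where(df["score"] >= 60, "yes", "no")

# if / elif / else: the first true condition wins, otherwise the default
conditions = [df["score"] >= 90, df["score"] >= 70]
choices = ["A", "B"]
df["grade"] = np.select(conditions, choices, default="C")
```

Because each condition is a single vectorized comparison, the cost grows with the number of conditions rather than with one Python-level step per row.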
Split Pandas Dataframe With Equal Amount of Rows for each Column Value
This is for a machine learning project. I have a CSV file which I have read in as a Pandas dataframe. The CSV looks like this: I have decreased the sample size and equalized the data, so that I have a dataframe with 60,000 rows: 30,000 with label 1 and 30,000 with label 0. I now want to split the dataframe…
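The excerpt is cut off before the exact split is described, so this sketch assumes the goal is two parts that each keep an equal number of rows per label. `DataFrameGroupBy.sample` (pandas 1.1+) draws the same fraction from every label group; the `feature`/`label` columns and the 50/50 fraction are assumptions:

```python
import pandas as pd

# Hypothetical balanced data: 4 rows with label 0, 4 rows with label 1
df = pd.DataFrame({"feature": range(8), "label": [0, 1] * 4})

# Sample half of each label group for the first part
first = df.groupby("label").sample(frac=0.5, random_state=0)

# Everything not sampled goes to the second part
second = df.drop(first.index)
```

Changing `frac` (for example to 0.8 for a train/test split) keeps the per-label proportions equal in both parts.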
Select dataframe index derived from comparing a dataframe column and a list
Considering the instrument_ticker dataframe and tickers list below: I split each value and keep the first item from each line: Now, how can I get the index of instrument_ticker where Reduced_Ticker contains a ticker_reduced item? I need to save this in a list, like the foll…
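A sketch using `Series.isin` to select matching index labels. The names `instrument_ticker`, `tickers`, `Reduced_Ticker` and `ticker_reduced` come from the question; the `Ticker` column, the sample values and the whitespace split are hypothetical. It also assumes "contains" means an exact match against a list item, not a substring match:

```python
import pandas as pd

# Hypothetical sample data
instrument_ticker = pd.DataFrame(
    {"Ticker": ["AAA US Equity", "BBB US Equity", "CCC US Equity"]}
)
tickers = ["BBB US", "CCC US"]

# Keep only the first whitespace-separated item of each value
instrument_ticker["Reduced_Ticker"] = instrument_ticker["Ticker"].str.split().str[0]
ticker_reduced = [t.split()[0] for t in tickers]

# Index labels of rows whose Reduced_Ticker is in ticker_reduced, as a list
matching_index = instrument_ticker.index[
    instrument_ticker["Reduced_Ticker"].isin(ticker_reduced)
].tolist()
```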
pandas df apply condition on multiple columns
I have a df that looks like this: I want to create a new column named promoted from the columns master_feature, epic, and feature. The value of promoted will be "master feature" if the adjacent master_feature value is not null, "feature" if the adjacent feature value is not null, and likewise for epic, somethi…
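A sketch with successive `.loc` assignments. It assumes that when several columns are non-null, master_feature takes precedence over feature, and feature over epic (the excerpt does not say). The assignments run from lowest to highest priority so later ones overwrite earlier ones. The sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; None marks a null cell
df = pd.DataFrame({
    "master_feature": ["MF-1", None, None, None],
    "feature": [None, "F-2", None, None],
    "epic": [None, None, "E-3", None],
})

# Start with no value, then fill from lowest to highest priority
df["promoted"] = None
df.loc[df["epic"].notna(), "promoted"] = "epic"
df.loc[df["feature"].notna(), "promoted"] = "feature"
df.loc[df["master_feature"].notna(), "promoted"] = "master feature"
```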
Change certain values in a dataframe column based on conditions on several columns
Let’s take this sample dataframe: I would like to replace the “B” values in Category with “B2” where there is a C or a D in Subcategory. I tried the following but I get the error “The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()…
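That error is typically raised when a boolean Series is used with Python's `and`/`or` or inside an `if`, which need a single True/False. Combining the conditions elementwise with `&` (each side in parentheses) and assigning through `.loc` avoids it. The sample data below is hypothetical:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Category": ["A", "B", "B", "B"],
    "Subcategory": ["C", "C", "D", "E"],
})

# Elementwise AND of two boolean Series (not Python's `and`)
mask = (df["Category"] == "B") & df["Subcategory"].isin(["C", "D"])

# Change only the rows selected by the mask
df.loc[mask, "Category"] = "B2"
```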
Better (maybe more SQL-ish) way to populate pandas dataframe column from row and meta data than iterating over rows, please
My data looks like this: Because I used a pandas.groupby() process to generate my metadata, it looks like this: Now, if my metadata looked like: I could easily write: I feel that there should be a different, pandas-oriented, way to directly use the metadata in the meta_df dataframe format that I have, and tha…
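The SQL-ish route is a left join: turn the grouped key back into a column with `reset_index()` and `merge` it onto the row data, much like `LEFT JOIN meta_df USING (group)`. The `group`/`value` columns and the mean/count aggregates are hypothetical stand-ins for the original data and `meta_df`:

```python
import pandas as pd

# Hypothetical row data
df = pd.DataFrame({"group": ["x", "x", "y", "y", "y"], "value": [1, 3, 2, 4, 6]})

# Metadata built with groupby, indexed by the group key
meta_df = df.groupby("group")["value"].agg(["mean", "count"])

# Left join: every row of df gets its group's metadata
result = df.merge(meta_df.reset_index(), on="group", how="left")

# For a single per-group statistic, transform avoids the join entirely
df["group_mean"] = df.groupby("group")["value"].transform("mean")
```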
How to count the same (identical) titles in a row
Good day everyone. I’m trying to count the titles of soccer teams in a list of dictionaries. There are 10 (sometimes 20) games in the list. The home_team values in the list look like: {Team 1, … }, {Team 1, … }, {Team 2, … }, {Team 2, … }, {Team 1, … }, {Team 2, … }, {Tea…
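Assuming "in a row" means consecutive runs of the same home_team, a common pandas idiom marks each place where the value changes, numbers the runs with `cumsum`, and then measures each run. The list of dicts below is a hypothetical reduction of the real game data to its `home_team` key:

```python
import pandas as pd

# Hypothetical games, keeping only the home_team key
games = [
    {"home_team": "Team 1"}, {"home_team": "Team 1"},
    {"home_team": "Team 2"}, {"home_team": "Team 2"},
    {"home_team": "Team 1"}, {"home_team": "Team 2"},
]
df = pd.DataFrame(games)

# True wherever the team differs from the previous row; cumsum gives a run id
run_id = df["home_team"].ne(df["home_team"].shift()).cumsum()

# One row per consecutive run: which team, and how many games in a row
runs = df.groupby(run_id)["home_team"].agg(["first", "size"])
```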
Mapping from a different dataframe
I have a dataset of patients, e.g.: and a dataset of the diseases of each patient (by ICD code): How can I flag each patient who has a history of a specific ICD code? Desired output: I am currently doing it with iteration, but this takes too long… Answer: If you need indicators, meaning only 0 and 1 values …
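Two loop-free sketches of the 0/1 indicators: `isin` to flag a single ICD code, and `pd.crosstab` clipped to 1 to get an indicator column for every code at once. The `patient_id`/`icd` column names and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data
patients = pd.DataFrame({"patient_id": [1, 2, 3]})
diseases = pd.DataFrame({"patient_id": [1, 1, 3], "icd": ["I10", "E11", "I10"]})

# One specific code: 1 if the patient ever had it, else 0
had_i10 = diseases.loc[diseases["icd"] == "I10", "patient_id"]
patients["I10_flag"] = patients["patient_id"].isin(had_i10).astype(int)

# Every code at once: count per patient/code, clipped to 0/1 indicators
flags = pd.crosstab(diseases["patient_id"], diseases["icd"]).clip(upper=1)

# Attach to all patients; those with no diagnoses get 0 instead of NaN
all_flags = (
    patients[["patient_id"]]
    .join(flags, on="patient_id")
    .fillna(0)
    .astype(int)
)
```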
Pandas string concatenation of row values of a column that have an implicit hierarchy
I have a dataframe showing the temperature over three days for different regions in India. It’s given in the following image (original_dataframe). I need to generate another column in the same dataframe that concatenates the string values of the state and the city, which is seen in ‘Col5’ as …
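The image is not available, so this sketch rests on a loudly stated assumption: a state row comes first and its cities follow, and state rows are the ones with no temperature (NaN in a hypothetical `Day1` column). Forward-filling the state name down to its cities makes the implicit hierarchy explicit. The `Region` column, the sample rows and the `_` separator are also assumptions; only `Col5` comes from the question:

```python
import pandas as pd

# Hypothetical data: each state row is followed by its cities
df = pd.DataFrame({
    "Region": ["Kerala", "Kochi", "Thiruvananthapuram", "Punjab", "Amritsar"],
    "Day1": [None, 31.0, 32.0, None, 35.0],
})

# Assumed rule: a row with no temperature is a state header
is_state = df["Region"].notna() & df["Day1"].isna()

# Carry each state name down to the city rows below it
state = df["Region"].where(is_state).ffill()

# City rows get "State_City"; state rows keep just the state name
df["Col5"] = (state + "_" + df["Region"]).where(~is_state, df["Region"])
```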