Tag: pandas

Consolidating categories in columns

I have a df with a race column that has four categories, but I would like only three, combining the last two. This is what my current df looks like: I want to consolidate race == 3 and race == 4 into one value (which would be race == 3). So my new df output would look somet…
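One way to merge the two codes is a value replacement on the column. The sketch below is only an illustration: the question's table is not shown here, so the sample values are hypothetical and only the column name race comes from the excerpt.

```python
import pandas as pd

# Hypothetical sample data; the original table is not shown in the excerpt.
df = pd.DataFrame({"race": [1, 2, 3, 4, 4, 1]})

# Map category 4 onto category 3; every other value is left unchanged.
df["race"] = df["race"].replace({4: 3})

print(df["race"].tolist())  # [1, 2, 3, 3, 3, 1]
```

If the column uses a categorical dtype rather than plain integers, the unused category may need to be dropped separately.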

Fastest way to use if/else statements when looping through dataframe with pandas

conditional-statements dataframe loops pandas python

I am trying to run conditional statements while iterating through the rows of a pandas df, and the resulting code is very slow. For example: The df is only about 40k rows long and it’s very slow, and this is only one of the statements I am trying to incorporate into this loop. Can you help with a faster way to do…

Split Pandas Dataframe With Equal Amount of Rows for each Column Value

dataframe machine-learning pandas python

This is for a machine learning project. I have a CSV file which I have read in as a Pandas dataframe. The CSV looks like this: I have decreased the sample size and equalized the data, so that I have a dataframe with 60,000 rows: 30,000 rows with label 1 and 30,000 with label 0. I now want to split the dataframe…

Select dataframe index derived from comparing a dataframe column and a list

pandas python

Considering the instrument_ticker dataframe and tickers list below: I split each value and keep the first item on each line: Now, how can I get the index of instrument_ticker where Reduced_Ticker contains a ticker_reduced item? I need to save this in a list, like the foll…
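A possible approach is to build a boolean mask with isin and use it to select index labels. This is a sketch under assumptions: the sample values and the Ticker column name are hypothetical, while instrument_ticker, tickers, Reduced_Ticker and ticker_reduced are the names used in the excerpt. It also assumes "contains" means an exact match on the reduced ticker.

```python
import pandas as pd

# Hypothetical sample data; only the variable names come from the question.
instrument_ticker = pd.DataFrame(
    {"Ticker": ["AAA US Equity", "BBB LN Equity", "CCC US Equity"]}
)
tickers = ["AAA UN", "CCC UW"]

# Keep the first whitespace-separated item of each value.
instrument_ticker["Reduced_Ticker"] = instrument_ticker["Ticker"].str.split().str[0]
ticker_reduced = [t.split()[0] for t in tickers]

# Boolean mask of rows whose reduced ticker appears in the list,
# then the matching index labels as a plain Python list.
mask = instrument_ticker["Reduced_Ticker"].isin(ticker_reduced)
matching_index = instrument_ticker.index[mask].tolist()

print(matching_index)  # [0, 2]
```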

pandas df apply condition on multiple columns

numpy pandas python

I have a df that looks like this: I want to create a new column named promoted from the columns master_feature, epic, and feature. The value of promoted will be: master_feature if the adjacent master_feature column value is not null, feature if the adjacent feature column value is not null, and likewise for epic somethi…
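A "first non-null value wins" rule like this can be sketched with combine_first. The sample rows below are hypothetical, and the priority order (master_feature, then feature, then epic) is an assumption read from the order in which the excerpt lists the conditions.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {
        "master_feature": ["MF1", None, None, None],
        "epic": [None, "E1", None, None],
        "feature": [None, None, "F1", None],
    }
)

# Assumed priority: master_feature, then feature, then epic.
# combine_first fills nulls in the left Series with values from the right one.
df["promoted"] = (
    df["master_feature"]
    .combine_first(df["feature"])
    .combine_first(df["epic"])
)

print(df["promoted"].tolist()[:3])  # ['MF1', 'E1', 'F1']
```

Rows where all three columns are null stay null in promoted.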

Change certain values in a dataframe column based on conditions on several columns

dataframe numpy pandas python where-clause

Let’s take this sample dataframe: I would like to replace the “B” values in Category with “B2” where there is a C or a D in Subcategory. I tried the following but I get the error “The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()”…
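That error typically appears when a whole Series is used where Python expects a single True or False, for example with `and`, `or`, or a plain `if`. A minimal sketch of an element-wise alternative follows; the sample rows are hypothetical, and only the column names and the B, B2, C and D values come from the excerpt.

```python
import pandas as pd

# Hypothetical sample data; the original table is not shown in the excerpt.
df = pd.DataFrame(
    {
        "Category": ["A", "B", "B", "B"],
        "Subcategory": ["C", "C", "D", "E"],
    }
)

# Combine conditions element-wise with & (not `and`); each condition
# needs its own parentheses because & binds tighter than ==.
mask = (df["Category"] == "B") & (df["Subcategory"].isin(["C", "D"]))

# Assign only to the rows selected by the mask.
df.loc[mask, "Category"] = "B2"

print(df["Category"].tolist())  # ['A', 'B2', 'B2', 'B']
```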

Better (maybe more SQL-ish) way to populate pandas dataframe column from row and meta data than iterating over rows, please

dataframe pandas python

My data looks like this: Because I used a pandas.groupby() process to generate my metadata, it looks like this: Now, if my metadata looked like: I could easily write: I feel that there should be a different, pandas-oriented, way to directly use the metadata in the meta_df dataframe format that I have, and tha…

How to count the same (identical) titles in a row

collections count pandas python

Good day everyone. I’m trying to count the titles of soccer teams in a list of dictionaries. There are 10 (sometimes 20) games in the list with data. So, home_team in the list looks like: {Team 1, … }, {Team 1, … }, {Team 2, … }, {Team 2, … }, {Team 1, … }, {Team 2, … }, {Tea…
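The excerpt does not make clear whether "in a row" means total occurrences or consecutive runs, so the sketch below shows both readings. It assumes each game is a dictionary with a home_team key holding the team name; the other keys are elided in the question and are left out here.

```python
from collections import Counter
from itertools import groupby

# Assumed structure: one dict per game, with the team name under "home_team".
games = [
    {"home_team": "Team 1"},
    {"home_team": "Team 1"},
    {"home_team": "Team 2"},
    {"home_team": "Team 2"},
    {"home_team": "Team 1"},
    {"home_team": "Team 2"},
]

names = [game["home_team"] for game in games]

# Reading 1: total number of appearances per team.
totals = Counter(names)

# Reading 2: lengths of consecutive runs of the same team.
runs = [(name, len(list(group))) for name, group in groupby(names)]

print(dict(totals))  # {'Team 1': 3, 'Team 2': 3}
print(runs)  # [('Team 1', 2), ('Team 2', 2), ('Team 1', 1), ('Team 2', 1)]
```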

Mapping from a different dataframe

dataframe pandas python

I have a dataset of patients, e.g.: and a dataset of diseases of each patient (by ICD code): How can I flag each patient if they had a history of a specific ICD code, desired output: I am currently doing it with iteration but this takes too long…. Answer: If you need indicators, meaning only 0 and 1 values …
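A 0/1 indicator of this kind can be built without iteration by checking membership with isin. This is a sketch rather than the answer from the original page: the column names patient_id and icd_code, the sample rows, and the code "E11" are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data and column names; the original tables are not shown.
patients = pd.DataFrame({"patient_id": [1, 2, 3]})
diseases = pd.DataFrame(
    {
        "patient_id": [1, 1, 2],
        "icd_code": ["E11", "I10", "J45"],
    }
)

target_code = "E11"  # hypothetical ICD code of interest

# IDs of patients with at least one record of the target code.
ids_with_code = diseases.loc[diseases["icd_code"] == target_code, "patient_id"]

# True/False membership converted to a 0/1 indicator column.
patients["has_E11"] = patients["patient_id"].isin(ids_with_code).astype(int)

print(patients["has_E11"].tolist())  # [1, 0, 0]
```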

Pandas string concatenation of row values of a column that have an implicit hierarchy

dataframe pandas python row string-concatenation

I have a dataframe giving the temperature over three days for different regions in India. It’s given in the following image (original_dataframe). I need to generate another column in the same dataframe that concatenates the string values of the state and the city which is seen in ‘Col5’ as …