I have the following df and would like to get the total number of males who either Agree or Disagree. I expect the answer to be 3 but instead get 9. I have done this through other methods and I’m trying to understand list comprehension better. Answer: Let’s unpack this a bit. The original stateme…
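As a rough illustration of the kind of count described here, the sketch below uses a generator expression over two zipped columns. The column names `Gender` and `Response` and the sample rows are assumptions, since the original df is not shown; this is not the truncated answer's code.

```python
import pandas as pd

# Hypothetical data: the original df is not shown in the excerpt.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Response": ["Agree", "Agree", "Disagree", "Neutral", "Disagree", "Agree"],
})

# Count each row exactly once: zip pairs the two columns row by row,
# so there is no nested loop that could multiply the matches.
male_opinions = sum(
    1
    for gender, response in zip(df["Gender"], df["Response"])
    if gender == "Male" and response in ("Agree", "Disagree")
)

# The same count with a boolean mask, as a cross-check.
mask_count = int((df["Gender"].eq("Male") & df["Response"].isin(["Agree", "Disagree"])).sum())
```

With this sample data both expressions count three rows.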
Tag: pandas
conditional groupby and update column – python, pandas, groupby
I have a df to which I want to add a column that shows the student who holds place (1) within each group (‘subject’, ‘class’), updating that column whenever a new place (1) appears. Code: ╔═════════╦═════════╦═════════╦═══════╗ ║ subject ║ class ║ student ║ place ║ ╠═════════╬═════════╬═════════╬═…
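One possible reading of this question can be sketched as follows: keep the student name only on rows where `place` is 1, then forward-fill within each (`subject`, `class`) group. The sample rows and the new column name `leader` are assumptions; the original table is truncated.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the excerpt.
df = pd.DataFrame({
    "subject": ["math", "math", "math", "art", "art"],
    "class": ["A", "A", "A", "B", "B"],
    "student": ["Ann", "Bob", "Cy", "Dee", "Eli"],
    "place": [1, 2, 1, 3, 1],
})

# Keep the name only where the student is in place 1, NaN elsewhere.
first_place = df["student"].where(df["place"].eq(1))

# Carry the latest place-1 student forward within each (subject, class) group.
df["leader"] = first_place.groupby([df["subject"], df["class"]]).ffill()
```

Rows that come before any place-1 row in their group stay empty (NaN) under this approach.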
How to filter a dataframe and each row, based on the presence of strings (from another list) in different columns and add a new column with annotation
I have a dataframe (df1) where I would like to search each row for items from listA. If the dataframe has a row that contains ‘positive’ and one or more of the items from listA, I would like to generate another dataframe (df2) by adding a column called result, listing the listA item + present. Ite…
Pandas – What datatype should a duration column (mm:ss) be to use aggregates on it?
I’m doing some NBA analysis and have a “Minutes Played” column for players in a mm:ss format. What dtype should this column be to perform aggregate functions (mean, min, max, etc…) on it? The df has over 20,000 rows, so here is a sample of the column in question: I ran this code to cha…
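One common option is a timedelta column, which supports `mean`, `min`, and `max` directly. The sketch below assumes the values are plain `mm:ss` strings; the series name `MP` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values; the original column is not shown in the excerpt.
minutes_played = pd.Series(["32:15", "08:40", "41:05"], name="MP")

# Prefix a zero hour so each string is parsed as hh:mm:ss rather than guessed.
duration = pd.to_timedelta("00:" + minutes_played)

# Aggregates now work; convert to minutes as a float when needed.
mean_minutes = duration.mean().total_seconds() / 60
longest = duration.max()
```

Converting to total seconds as a plain integer column is an alternative if timedelta formatting gets in the way downstream.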
How do I create a new dataframe column based on two other columns?
I want to create a binary column which indicates 1 if the values of both columns in the following table are within the same range. For example, if the value in cat_1 is between 5 and 10 and the value in cat_2 is also between 5 and 10, then it should indicate 1; otherwise, it should be 0. So far, I have tried the
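A minimal sketch for the single 5 to 10 range given in the example, using `Series.between` (inclusive on both ends by default). The sample values and the output column name `both_in_range` are assumptions; handling several ranges at once would need a different approach, such as binning both columns.

```python
import pandas as pd

# Hypothetical values; only the column names cat_1 and cat_2 come from the excerpt.
df = pd.DataFrame({"cat_1": [6, 3, 9, 12], "cat_2": [7, 8, 2, 6]})

# True only where both columns fall inside [5, 10].
same_range = df["cat_1"].between(5, 10) & df["cat_2"].between(5, 10)

# Convert the boolean mask to the requested 1/0 indicator.
df["both_in_range"] = same_range.astype(int)
```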
Dividing 24h into working shifts in Python Pandas
I am dealing with dividing a day into working shifts. Let’s have a look at my sample code: I’d like to divide the time into 3 shifts, 00:00 to 08:00 is Shift1, 08:00 to 16:00 will be Shift2 and till 00:00 will be Shift3. The result I get is correct, but I would like to know if there is any elegant
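One compact way to label the three shifts is `pd.cut` on the hour of each timestamp. This is a sketch with made-up timestamps, not the asker's code; it assumes each shift includes its start hour and excludes its end hour.

```python
import pandas as pd

# Hypothetical timestamps covering the shift boundaries.
times = pd.Series(pd.to_datetime([
    "2021-01-01 00:00", "2021-01-01 07:59", "2021-01-01 08:00",
    "2021-01-01 15:30", "2021-01-01 16:00", "2021-01-01 23:59",
]))

# right=False makes the bins [0, 8), [8, 16), [16, 24), so 08:00 belongs to Shift2.
shifts = pd.cut(
    times.dt.hour,
    bins=[0, 8, 16, 24],
    right=False,
    labels=["Shift1", "Shift2", "Shift3"],
)
```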
Identify and count segments between a start and an end marker
The goal is to fill values only between two values (start and end) with unique numbers (will be used in a groupby later on), notice how the values between end and start are still None in the desired output: Code: Answer Usually problems like these are solved by fiddling with cumsum and shift. The main idea fo…
Running a for loop or .apply with a pandas series
I’m trying to run a for loop or .apply using lambdas for my pandas series. Here’s the code: What I’m trying to achieve is for each word in df[‘Filtered_text’], apply the analyzer.polarity_scores(x[‘Filtered_text’]) through the column. An example of what is stored in d…
Pandas DataFrame Groupby two columns and get different relation in same keys insert list
I have this table: I have to create a dictionary with Head as the key and values equal to the relations, without repeats, and for each relation I have to insert the corresponding tails. Example: I don’t really know how to do it. Is there someone who can help me? Second example input: the output:…
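A possible shape for this, sketched under assumptions: the column names `Head`, `Relation`, and `Tail` and the sample rows are hypothetical, since the original table is not shown. Grouping by two keys yields each (head, relation) pair once, which avoids repeated relations.

```python
import pandas as pd

# Hypothetical triples; the original table is not included in the excerpt.
df = pd.DataFrame({
    "Head": ["a", "a", "a", "b"],
    "Relation": ["r1", "r1", "r2", "r1"],
    "Tail": ["x", "y", "z", "w"],
})

# Build {head: {relation: [tails, ...]}} from the grouped Tail column.
result = {}
for (head, relation), tails in df.groupby(["Head", "Relation"])["Tail"]:
    result.setdefault(head, {})[relation] = tails.tolist()
```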
While merging 100+ CSV files, how to fill NaN in a column if it doesn’t exist in “usecols”?
Considering that I have CSV files that look roughly like this I am using the following script which was suggested here Most of the files have all three columns, while a few of them do not have ColC. This will give an error (understandably) which is as follows: ValueError: Usecols do not match columns, columns…
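One way to sidestep this error is to pass a callable to `usecols`, which simply skips names that are absent, and then `reindex` so the missing column is added as NaN. The sketch below uses in-memory CSV text in place of real files; `ColA`, `ColB`, and `ColD` are assumed names (only `ColC` appears in the excerpt), and this is not the script the question refers to.

```python
import io
import pandas as pd

wanted = ["ColA", "ColB", "ColC"]

# Stand-ins for two files on disk: the second one lacks ColC.
csv_full = "ColA,ColB,ColC,ColD\n1,2,3,4\n"
csv_missing = "ColA,ColB,ColD\n5,6,7\n"

frames = []
for text in (csv_full, csv_missing):
    # A callable filters by name and does not raise when a wanted column is absent.
    part = pd.read_csv(io.StringIO(text), usecols=lambda name: name in wanted)
    # reindex adds any missing wanted column, filled with NaN.
    frames.append(part.reindex(columns=wanted))

merged = pd.concat(frames, ignore_index=True)
```

For real files, the loop would iterate over paths instead of strings; the `usecols` and `reindex` steps stay the same.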