Tag: pandas

Speeding up a for loop that sets column values in a DataFrame

I have a pandas DataFrame (import pandas as pd) and I want to add 1 to a counter in ‘C3’ after each rising edge (a rising edge starts when C1 = 1 and C2 = 0). I tried iterrows() on a DataFrame with 300,000 rows, but it is a bit slow. Is there a simple way to make it faster? Thanks a lot for your help! A…
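A vectorised sketch of the counter described in this question, under assumptions the excerpt does not settle: a rising edge is taken to be a row where C1 == 1 and C2 == 0 holds while it did not hold on the previous row, and C3 is taken to be the running number of edges seen so far. The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the question's real frame is not shown.
df = pd.DataFrame({
    "C1": [0, 1, 1, 0, 1, 1, 0],
    "C2": [0, 0, 0, 0, 0, 1, 0],
})

# Condition named in the question: C1 == 1 and C2 == 0.
cond = (df["C1"] == 1) & (df["C2"] == 0)

# Assumption: an edge is a row where the condition is True
# and was False on the previous row.
prev = cond.shift(1, fill_value=False)
rising = cond & ~prev

# Running count of edges, computed without a Python-level loop.
df["C3"] = rising.cumsum()
print(df)
```

Boolean masks combined with shift() and cumsum() work on whole columns at once, which is usually much faster than iterrows() on a few hundred thousand rows.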

Non-fixed rolling window

pandas python rolling-computation

I am looking to implement a rolling window on a list, but instead of a fixed window length, I would like to provide a list of window lengths. Something like this: and the result would be: 6.67 is calculated as the average of the 3 elements 10, 2, 8. I implemented a slow solution, and every idea is welcome to make it…
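One possible way to avoid a slow loop here is a cumulative-sum trick. The sketch below assumes trailing windows (each window ends at the current element), window lengths of at least 1, and windows clipped at the start of the list. The function name `variable_rolling_mean` and the sample inputs are hypothetical; only the 10, 2, 8 average comes from the question.

```python
import numpy as np

def variable_rolling_mean(values, windows):
    """Mean of the last windows[i] values ending at position i (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    windows = np.asarray(windows, dtype=int)
    idx = np.arange(len(values))
    # First index of each window, clipped so it never goes below 0.
    start = np.maximum(idx - windows + 1, 0)
    # csum[k] holds the sum of values[:k], so a window sum is a difference.
    csum = np.concatenate(([0.0], np.cumsum(values)))
    totals = csum[idx + 1] - csum[start]
    counts = idx + 1 - start
    return totals / counts

# The third entry averages 10, 2, 8 as in the question.
result = variable_rolling_mean([10, 2, 8, 4], [1, 2, 3, 2])
print(np.round(result, 2))
```

For very long inputs with large magnitudes, differences of cumulative sums can accumulate floating-point error, so it may be worth spot-checking against a plain loop.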

How can I print a statement in Python to show the results of a mathematical operation on a Pandas Dataframe?

pandas python slice

So I’ve got a simple summary dataframe of sales by gender that goes: All I want to do now is to return a line in Python that reads: The mean gap in the amount sold is 16.67% This is simply (30 – 25) divided by 30, multiplied by 100, and I want a % sign at the end. I…
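A minimal sketch of the arithmetic and formatting described in this question. The dataframe is not shown in the excerpt, so the column names `gender` and `amount` are assumptions; the values 30 and 25 and the target sentence come from the question.

```python
import pandas as pd

# Hypothetical summary frame; only the values 30 and 25 are from the question.
sales = pd.DataFrame({"gender": ["M", "F"], "amount": [30, 25]})

high = sales["amount"].max()
low = sales["amount"].min()

# (30 - 25) / 30 * 100
gap = (high - low) / high * 100

# ".2f" rounds to two decimals; the "%" is a literal character in the f-string.
message = f"The mean gap in the amount sold is {gap:.2f}%"
print(message)
```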

Error from re.findall() when iterating over a list from a pandas dataframe

iteration pandas python

I converted a column of a pandas df to a list: I wanted to split strings like ‘BuyMeADrink’ into ‘Buy Me A Drink’, so I used one of the following: or or All of the above returned this: TypeError: expected string or bytes-like object I understand that findall() needs strings …
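The TypeError quoted in this question is what re.findall() raises when it receives a non-string value. One plausible cause, which the excerpt does not confirm, is a missing value (None or NaN) in the column. The sketch below guards against that; the helper name `split_camel`, the regex, and the sample data other than ‘BuyMeADrink’ are assumptions, since the question's own patterns are elided.

```python
import re
import pandas as pd

def split_camel(value):
    """Insert spaces between capitalised words; pass non-strings through (hypothetical helper)."""
    if not isinstance(value, str):
        # Missing values are returned unchanged instead of reaching re.findall().
        return value
    # Each match is one capital letter followed by any lowercase letters.
    # Note: characters before the first capital letter would be dropped.
    return " ".join(re.findall(r"[A-Z][a-z]*", value))

# Hypothetical column with a missing entry.
names = pd.Series(["BuyMeADrink", None, "HelloWorld"])
spaced = names.map(split_camel)
print(spaced.tolist())
```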

How to take specific columns in pandas dataframe only if they exist (different CSVs)

csv dataframe pandas python

I downloaded a bunch of football data from the internet in order to analyze it (around 30 CSV files). Each season’s game data is saved as a CSV file with different data columns. Some data columns are common to all files e.g. Home team, Away team, Full time result, ref name, etc… Earlier years CSV …
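Two ways this kind of selection is sometimes handled, sketched with made-up data. The column names in `wanted` and the CSV text are hypothetical stand-ins for the real files. Passing a callable to `usecols` keeps only the wanted columns that are present, while `reindex` returns every wanted column and fills absent ones with NaN.

```python
import io
import pandas as pd

# Hypothetical column names and file contents.
wanted = ["HomeTeam", "AwayTeam", "FTR", "Referee"]
csv_text = "HomeTeam,AwayTeam,FTR,Attendance\nArsenal,Chelsea,H,38000\n"

# Option 1: read only the wanted columns that exist in this file.
present = pd.read_csv(io.StringIO(csv_text), usecols=lambda c: c in wanted)

# Option 2: read everything, then align to the wanted list;
# columns missing from the file ("Referee" here) become all-NaN.
full = pd.read_csv(io.StringIO(csv_text))
aligned = full.reindex(columns=wanted)

print(list(present.columns))
print(list(aligned.columns))
```

The second option gives every season the same column layout, which can make it easier to concatenate the files afterwards.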

Most efficient way to find shared members of a list inside a dataframe?

dataframe pandas python

Hello experts: I’m looking at so-called ‘COVID-19 bubbles’ inside pro cycling – I’ve compiled a list of riders for each team and a list of each race they’ve done. There are about 30 riders per team, and there have been a few dozen races after the sport started up again in J…

How to get unique counts based on a different column, with pandas groupby

pandas pandas-groupby python

I have the following dataframe: I would like to groupby effortduration and get the count of each column based on the unique count of the user column. This is what I have tried so far: However, that is again not what I am looking for because the values of callbacks and applications are not based on the user co…
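The dataframe and the attempted code are elided in this excerpt, so the following is only one reading of the request: for each effortduration, count the distinct users that have a non-missing value in each of the other columns. The sample data is invented; the column names are the ones mentioned in the question.

```python
import pandas as pd

# Hypothetical data using the column names mentioned in the question.
df = pd.DataFrame({
    "effortduration": ["weeks", "weeks", "weeks", "months", "months"],
    "user": ["u1", "u1", "u2", "u3", "u3"],
    "callbacks": [1, 1, None, 1, None],
    "applications": [1, None, 1, 1, 1],
})

value_cols = ["callbacks", "applications"]

# Reshape to one row per (effortduration, user, column) and drop missing values.
long = df.melt(id_vars=["effortduration", "user"], value_vars=value_cols)
long = long.dropna(subset=["value"])

# Count distinct users per effortduration and per original column.
result = (
    long.groupby(["effortduration", "variable"])["user"]
    .nunique()
    .unstack(fill_value=0)
)
print(result)
```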

Pandas get column values based on duplicate rows

numpy pandas python python-3.x

I have a sample DF: OP: I am trying to get the values of columns “A” and “B” wherever there are duplicate values in column col. For example, column col has the value Apple at indexes 0, 1, 3, 5, and I am trying to get the respective values in columns A and B, i.e. I have …
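A small sketch of one way to pull out A and B for repeated values of col. The sample frame is invented so that Apple sits at indexes 0, 1, 3, 5 as in the question; the expected output format is elided in the excerpt, so collecting the values into lists is an assumption.

```python
import pandas as pd

# Hypothetical frame; Apple appears at indexes 0, 1, 3, 5 as in the question.
df = pd.DataFrame({
    "col": ["Apple", "Apple", "Kiwi", "Apple", "Mango", "Apple"],
    "A": [10, 20, 30, 40, 50, 60],
    "B": ["a", "b", "c", "d", "e", "f"],
})

# keep=False marks every occurrence of a repeated value, not only the later ones.
dupes = df[df.duplicated("col", keep=False)]

# Assumption: gather the matching A and B values into one list per col value.
grouped = dupes.groupby("col")[["A", "B"]].agg(list)
print(grouped)
```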

Why can’t I define column names when I create the dataframe with pandas?

columnname dataframe pandas python

Sometimes, when I have a dictionary (not very large) and try to convert it into a dataframe, the code above yields one with every cell being NaN. Yet the code below works fine. What could be the difference? Answer: What are your dictionary keys? I am guessing the keys don’t align …
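The answer's guess about misaligned keys can be illustrated with a short sketch. The question's own code is elided, so the dictionary and the names below are hypothetical. When a dict is passed to pd.DataFrame, the `columns` argument selects dictionary keys rather than renaming them, so names that match no key carry no data over.

```python
import pandas as pd

# Hypothetical dictionary; the question's real data is not shown.
data = {"a": [1, 2], "b": [3, 4]}

# "x" and "y" match no key in data, so no values are carried over.
mismatched = pd.DataFrame(data, columns=["x", "y"])

# Build from the dict first, then rename the columns.
renamed = pd.DataFrame(data).rename(columns={"a": "x", "b": "y"})

print(mismatched)
print(renamed)
```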

Python MemoryError: Unable to allocate 10.8 TiB for an array with shape () and data type int64

pandas python

I am trying to combine two data sets. Using the code as follows, it returns an error: MemoryError: Unable to allocate 10.8 TiB for an array with shape (1483050607760,) and data type int64 How can I solve this if my laptop is a 500GB+8GB one? Thank you in advance. Answer: Try dask, then you can convert it to panda…
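The combining code is elided in this excerpt, so the cause of the error is not known. If the operation is a merge, one common source of a huge result is repeated key values on both sides, because each key contributes (rows on the left) × (rows on the right) output rows. The sketch below estimates the size of an inner merge before running it; the frames and the column name `key` are hypothetical.

```python
import pandas as pd

# Hypothetical frames with repeated keys on both sides.
left = pd.DataFrame({"key": [1, 1, 2, 3], "x": range(4)})
right = pd.DataFrame({"key": [1, 1, 1, 2, 4], "y": range(5)})

# Number of rows per key on each side.
left_counts = left["key"].value_counts()
right_counts = right["key"].value_counts()

# An inner merge yields count_left * count_right rows per shared key;
# fill_value=0 makes keys present on only one side contribute nothing.
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())
print(expected_rows)

# On small data the estimate can be checked against the real merge.
merged = left.merge(right, on="key")
print(len(merged))
```

If the estimate is far larger than either input, deduplicating or aggregating the key column before merging may help more than switching libraries.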