Starting from a DataFrame with date and user columns, I’d like to add a third count_past_5_days column to indicate the rolling count of occurrences of each row’s user during the past 5 days:

date        user  count_past_5_days
2020-01-01  abc   1
2020-01-01  def   1
2020-01-02  abc   2
2020-01-03  abc   3
2020-01-04…
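One possible approach is a time-based rolling window per user. This is a minimal sketch, assuming "past 5 days" means the window (date - 5 days, date], which is the default `closed="right"` behaviour of a `"5D"` window. The helper name `add_rolling_count`, the temporary `_one` column and the 2020-01-08 row are hypothetical, for illustration only.

```python
import pandas as pd


def add_rolling_count(df: pd.DataFrame, days: int = 5) -> pd.DataFrame:
    """Hypothetical helper: count rows of the same user in (date - days, date]."""
    # Sort so each user's dates are monotonic (required by time-based windows)
    out = df.sort_values(["user", "date"])
    out["_one"] = 1  # temporary column: every row counts as one occurrence
    counts = (
        out.set_index("date")
        .groupby("user")["_one"]
        .rolling(f"{days}D")  # time-based window, closed="right" by default
        .sum()
    )
    # groupby emits groups in sorted key order, matching the sort above
    out[f"count_past_{days}_days"] = counts.to_numpy().astype(int)
    # Drop the helper column and restore the original row order
    return out.drop(columns="_one").sort_index()


df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-08"]
    ),
    "user": ["abc", "def", "abc", "abc", "abc"],  # last row is hypothetical
})
result = add_rolling_count(df)
print(result)
```

With the right-closed window, the 2020-01-08 row no longer sees 2020-01-03, so its count drops back to 1. Pass `closed="both"` to `rolling` if the boundary day should be included.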
Tag: pandas
How do I reduce a set of columns along another set of columns, holding all other columns?
I think this is a simple operation, but for some reason I’m not finding any obvious pointers in my quick perusal of the pandas docs. I have prototype working code below, but it seems kinda dumb IMO. I’m sure that there are much better ways to do this, and concepts to describe it. Is there a better…
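The question does not show the data, so this sketch rests on an assumption: "reduce along" is read as grouping by one set of columns, aggregating a second set, and carrying every remaining column along. A `groupby(...).agg` with a per-column dict does that in one call. The function name `reduce_keeping_others` and the sample columns are hypothetical.

```python
import pandas as pd


def reduce_keeping_others(df, group_cols, reduce_cols, how="sum"):
    """Hypothetical helper: aggregate reduce_cols per group, keep the other columns."""
    other_cols = [c for c in df.columns if c not in group_cols + reduce_cols]
    # Reduced columns get `how`; the others keep their first value per group
    agg = {**{c: how for c in reduce_cols}, **{c: "first" for c in other_cols}}
    return df.groupby(group_cols, as_index=False).agg(agg)


df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "x": [1, 2, 5],
    "y": [10, 20, 7],
    "label": ["p", "p", "q"],
})
result = reduce_keeping_others(df, ["key"], ["x", "y"])
print(result)
```

Using `"first"` for the held columns only makes sense if they are constant within each group; otherwise they should be added to `group_cols`.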
Remove characters from column
I am trying to remove “0” and “:” from a column in a dataframe. The code I use is: Output: The result does not remove “0” and “:”. How can I go about this? Answer: You’re missing the assignment of the replacement back to the original column: Though you can ca…
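The answer's point can be shown with a short sketch: `Series.str.replace` returns a new Series and leaves the column untouched unless the result is assigned back. The column name `time` and its values are hypothetical, and a regex character class `[0:]` is one way to strip both characters at once.

```python
import pandas as pd

df = pd.DataFrame({"time": ["10:30", "00:05", "12:00"]})  # hypothetical data

# Returns a new Series; df["time"] itself is NOT modified by this line
df["time"].str.replace(r"[0:]", "", regex=True)

# Assign the result back to the column to keep the change
df["time"] = df["time"].str.replace(r"[0:]", "", regex=True)
print(df)
```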
Pandas merge stop at first match like vlookup instead of duplicating
I have two tables, PO data and commodity code data. Some genius decided that some material group codes should be the same as they are differentiated at a lower level by GL accounts. Because of that, I can’t merge on material groups, as I’ll get duplicate rows. Assume the following: As you can see,…
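To mimic VLOOKUP's "first match wins" behaviour, one option is to deduplicate the lookup table on the join key before merging. This is a sketch; the column names `po`, `material_group` and `commodity_code` and the sample rows are hypothetical.

```python
import pandas as pd

po = pd.DataFrame({"po": [1, 2, 3], "material_group": ["M1", "M2", "M1"]})
codes = pd.DataFrame({
    "material_group": ["M1", "M1", "M2"],  # M1 appears twice
    "commodity_code": ["C10", "C11", "C20"],
})

# Keep only the first row per key, like VLOOKUP returning the first hit
first_match = codes.drop_duplicates(subset="material_group", keep="first")

# validate="many_to_one" raises MergeError if the lookup side still has duplicates
merged = po.merge(first_match, on="material_group", how="left", validate="many_to_one")
print(merged)
```

Which row counts as "first" depends on the lookup table's current order, so sort it first if a particular code should win.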
How can I create a percentage matrix based on a dataframe
I have a dataframe that looks like this:

Place A   Place B  Type  Number
New York  Paris    A     34
Oslo      London   B     42
Oslo      London   A     24

I need the percentage of each type for each route. I don’t know which command to use to get a dataframe that looks like this: xxx Paris Oslo
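The desired layout is cut off in the question, so this sketch assumes one row per route (Place A, Place B), one column per Type, and each cell holding that type's share of the route's total Number. `pivot_table` builds the matrix and a row-wise division turns counts into percentages.

```python
import pandas as pd

df = pd.DataFrame({
    "Place A": ["New York", "Oslo", "Oslo"],
    "Place B": ["Paris", "London", "London"],
    "Type": ["A", "B", "A"],
    "Number": [34, 42, 24],
})

# Routes as rows, types as columns; missing route/type combinations become 0
table = df.pivot_table(
    index=["Place A", "Place B"], columns="Type",
    values="Number", aggfunc="sum", fill_value=0,
)

# Divide each row by its total to get per-route percentages
pct = table.div(table.sum(axis=1), axis=0).mul(100)
print(pct.round(2))
```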
How to convert Excel file to json using pandas?
I would like to parse an Excel file that has several sheets and save the data to a JSON file. I don’t want to parse the first and second sheets, or the last one. I want to parse the sheets in between; the number of those sheets varies, and their names are not always the same.
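Because the middle sheets vary in count and name, selecting them by position rather than by name avoids hard-coding anything. This sketch relies on `pd.ExcelFile.sheet_names` listing sheets in workbook order; the helper names `middle_sheet_names` and `excel_middle_sheets_to_json` are hypothetical, and reading `.xlsx` files needs an Excel engine such as openpyxl installed.

```python
import json

import pandas as pd


def middle_sheet_names(sheet_names, skip_start=2, skip_end=1):
    """Hypothetical helper: drop the first skip_start and last skip_end names."""
    end = len(sheet_names) - skip_end  # avoids the [2:-0] empty-slice pitfall
    return list(sheet_names[skip_start:end])


def excel_middle_sheets_to_json(xlsx_path, json_path):
    """Hypothetical helper: write the in-between sheets to one JSON file."""
    with pd.ExcelFile(xlsx_path) as xls:
        names = middle_sheet_names(xls.sheet_names)
        # to_json handles NaN -> null and dates -> ISO strings; loads back to dicts
        data = {
            name: json.loads(xls.parse(name).to_json(orient="records", date_format="iso"))
            for name in names
        }
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
```

Usage would be `excel_middle_sheets_to_json("input.xlsx", "output.json")`, with both paths as placeholders.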
Trying to compare two values in a pandas dataframe for max value
I’ve got a pandas dataframe, and I’m trying to fill a new column that takes the maximum of two values from another column of the dataframe, iteratively. I’m trying to build a loop to do this, and save time with computation as I realise I could probably do it w…
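The snippet does not say which two values are compared, so this sketch assumes they are the current row and the previous row of the same column; a two-row rolling window computes that without a loop. If "iteratively" instead means a running maximum carried down the column, `cummax` is the vectorized equivalent. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"value": [3, 1, 4, 1, 2]})  # hypothetical data

# Max of the current and previous row (first row has no predecessor)
df["max_with_prev"] = df["value"].rolling(window=2, min_periods=1).max()

# Running maximum over all rows seen so far
df["running_max"] = df["value"].cummax()
print(df)
```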
Filter pyspark DataFrame by string match
I would like to check for substring matches between the comments and keyword columns and find whether any of the keywords is present in that particular row. Input: Expected output: Answer: The most efficient approach here is to loop; you can use set intersection: Output: Used input: With a minor variation you could check for substring matc…
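The answer's two ideas (whole-word set intersection, and the substring variation) can be sketched in pandas with a list comprehension over the two columns. The column names `comments` and `keywords` and the sample rows are assumptions; the rows are chosen so the two checks disagree on "valves".

```python
import pandas as pd

df = pd.DataFrame({
    "comments": ["pump failed again", "valves replaced", "all good"],
    "keywords": [["pump", "valve"], ["valve"], ["pump"]],
})

# Whole-word match: any keyword equal to a whitespace-separated word
df["word_match"] = [
    bool(set(c.split()) & set(k)) for c, k in zip(df["comments"], df["keywords"])
]

# Substring match: any keyword appearing anywhere in the comment
df["substring_match"] = [
    any(w in c for w in k) for c, k in zip(df["comments"], df["keywords"])
]
print(df)
```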
Python pandas – can I sort 1 column dataset into rows of matching data in another dataset
I have written half of the code and got stuck on the second half. I have pulled info from a text document and placed it into a pandas DataFrame column with data like: I have pulled all unique categories into another DataFrame and made them into columns. Now I want to cycle through the first DataFrame and fill out each
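Since the example data is missing, this is only a sketch under an assumption: each parsed line carries a record identifier, a category and a value, and the goal is one row per record with one column per unique category. `DataFrame.pivot` does this in one step, no manual loop needed. The column names `record`, `category` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data, one category/value pair per row
long = pd.DataFrame({
    "record": [1, 1, 2],
    "category": ["name", "age", "name"],
    "value": ["Ann", "34", "Bob"],
})

# One row per record, one column per unique category; gaps become NaN
wide = long.pivot(index="record", columns="category", values="value")
print(wide)
```

`pivot` raises an error if a record has the same category twice; `pivot_table` with an `aggfunc` handles that case.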
two conditions multiplication in pandas
I have the following dataframe, and I am trying to compute a revenue column by multiplying either columnA or columnB by columnC. The conditions are: if columnB is NaN, then revenue = columnA * columnC; if columnB is not NaN, then revenue = columnB * columnC. How do I get this revenue column…
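Both conditions collapse into one expression: use columnB where it is present, fall back to columnA where it is NaN, then multiply by columnC. `fillna` with a Series aligns on the index, so this works row by row. The sample values are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "columnA": [1.0, 2.0],
    "columnB": [np.nan, 5.0],
    "columnC": [10.0, 10.0],
})

# columnB if not NaN, otherwise columnA; then multiply by columnC
df["revenue"] = df["columnB"].fillna(df["columnA"]) * df["columnC"]
print(df)
```

An equivalent form is `np.where(df["columnB"].isna(), df["columnA"], df["columnB"]) * df["columnC"]`, which makes the condition explicit.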