I have a dataframe similar to the following: If the value in the topic column is -1, then I want to look at columns 0 to 2 in the same row and change the value in topic to the header of the column holding the max value. As an example, in the first row in the table above, the column topic has a value of
Tag: pandas
How can I parallelize a function with multiple arguments
I have written a function create_time_series(input_df1, info_df1, unit_name, start_date, end_date), which aims to create a time series based on log files saved in input_df1. The problem with my function is that it runs slowly, so I thought of parallelizing it. The following code is my attempt at uti…
Replace XML variables which have the same text with another text variable in Python
I’m using Python 3 with beautifulsoup4, pandas, and Counter to convert one XML file to a CSV file. There are several thousand products in this XML. I have trouble with one particular problem. Many of the products in the XML are children of a parent product, but the parent product is not itself in the XML. Each of these children …
Pandas adding rows to dataframe
I’m trying to add more rows or records to my data frame. Let’s say it looks like this: and I have a CSV file stored in another data frame without headers. Now I want a new data frame that looks like this. I have tried using append and concat, but I didn’t get the result that I wanted. Answer Ass…
Count occurrences in last 30 days with Pandas Dataframe
I have a pandas Dataframe with an ID column and a date column (YYYY-MM-DD),

ID   Date
001  2022-01-01
001  2022-01-04
001  2022-02-07
002  2022-01-02
002  2022-01-03
002  2022-01-28

There may be gaps in the date field, as shown. I would like to have a new column, “occurrences_last_month” where it counts t…
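One possible approach, sketched here under assumptions: the excerpt is cut off, so this assumes the new column should count rows with the same ID whose date falls within the 30 days up to and including the current row's date. The helper column name `one` is introduced only for illustration.

```python
import pandas as pd

# Sample data taken from the question excerpt.
df = pd.DataFrame({
    "ID": ["001", "001", "001", "002", "002", "002"],
    "Date": pd.to_datetime([
        "2022-01-01", "2022-01-04", "2022-02-07",
        "2022-01-02", "2022-01-03", "2022-01-28",
    ]),
})

# A time-based rolling window needs dates sorted within each ID.
df = df.sort_values(["ID", "Date"]).reset_index(drop=True)

# Sum a column of ones over a 30-day window per ID.
# The window covers (Date - 30 days, Date], so the current row is counted.
rolled = (
    df.assign(one=1)              # "one" is a hypothetical helper column
      .set_index("Date")
      .groupby("ID")["one"]
      .rolling("30D")
      .sum()
)

# The result is ordered by ID, then Date, which matches the sorted frame.
df["occurrences_last_month"] = rolled.to_numpy().astype(int)
print(df)
```

If the current row should not count itself, subtracting 1 from the result would be one way to adjust it.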
Make a customized filter on a grouped dataframe with multiple conditions
Please find below my input/desired output: INPUT OUTPUT (desired) The goal is firstly to have one line per Id in the output. The output will be made based on this simple statement: This is what I’ve tried so far: Do you have any suggestions, please? Any help would be much appreciated!…
Remove duplicates and keep row that certain column is Yes in a pandas dataframe
I have a dataframe with duplicated values in column “ID”, like this one: I need a way to remove duplicates (by “ID”) but keep the rows where the column Primary is “Yes” (all unique values have “Yes” in that column and duplicated values have one record as “Y…
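A minimal sketch of one way to do this, assuming each duplicated ID has at most one row with Primary equal to “Yes”. The sample data and the helper column `_is_primary` are made up for illustration; only the column names “ID” and “Primary” come from the question.

```python
import pandas as pd

# Hypothetical sample data; IDs 2 and 3 are duplicated.
df = pd.DataFrame({
    "ID": [1, 2, 2, 3, 3, 4],
    "Primary": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
})

deduped = (
    df.assign(_is_primary=df["Primary"].eq("Yes"))
      # Put the "Yes" rows first; a stable sort keeps the original order otherwise.
      .sort_values("_is_primary", ascending=False, kind="stable")
      # Keep the first row per ID, which is the "Yes" row when one exists.
      .drop_duplicates("ID", keep="first")
      .drop(columns="_is_primary")
      # Restore the original row order.
      .sort_index()
)
print(deduped)
```

Sorting before `drop_duplicates` means an ID with no “Yes” row still keeps one record instead of disappearing, which may or may not be what is wanted.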
NotImplementedError when calling pandas_profiling.ProfileReport.to_widgets() inside Apache Zeppelin
I’m trying to use the pandas_profiling package to automagically describe some data frames from inside Apache Zeppelin. The code I’m running is: My result is: Any way to work around this? Any hope of working around it from inside Zeppelin? Answer The NotImplementedError is being raised from check_d…
Finding all the rows with the approximate values that match a condition in a dataframe
I have a pandas dataframe that is something like this. The el1, el2, and el3 do not matter at all. I want to find the row with the x nearest to x=20, so I do which gives me the index where x=19.3 is. So far so good. Now, the real problem: imagine I have a dataframe where x goes from
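The code for the first step is missing from the excerpt; one common way to find the row nearest to a target value is sketched below. The sample values and the tolerance `tol` are assumptions made for illustration, and the sketch does not attempt the truncated “real problem”.

```python
import pandas as pd

# Hypothetical data; only the column name "x" and the value 19.3 come from the question.
df = pd.DataFrame({"x": [10.1, 15.2, 19.3, 24.8, 30.0]})

target = 20

# Index label of the row whose x is closest to the target.
nearest_idx = (df["x"] - target).abs().idxmin()
nearest_row = df.loc[nearest_idx]

# All rows within a tolerance of the target (tol is an assumed value).
tol = 1.0
close_rows = df[(df["x"] - target).abs() <= tol]

print(nearest_idx, nearest_row["x"])
print(close_rows)
```

`idxmin` returns a single label, so when several rows are equally close or approximate matches are wanted, the tolerance mask is the more general form.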
How to filter a dataframe column having multiple values in Python
I have a data frame that sometimes has multiple values in a cell, like this: Now I want to filter the data frame to the rows that have apple among their values. So my output should look like this: I used str.contains(‘apple’), but this is not returning the ideal result. Can anyone help me with how I can get this …
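The excerpt does not say why str.contains fell short. One plausible cause is that it matches substrings (so “pineapple” also matches) or trips on missing values. The sketch below assumes comma-separated cells and a column named `fruit`, both hypothetical, and matches whole values only.

```python
import pandas as pd

# Hypothetical data: cells hold one or more comma-separated values.
df = pd.DataFrame({
    "fruit": ["apple, banana", "pineapple", "cherry", "banana, apple", None],
})

# Split each cell into a list of values; missing cells become [""].
tokens = df["fruit"].fillna("").str.split(",")

# Keep rows where one of the stripped values is exactly "apple".
mask = tokens.apply(lambda items: "apple" in [item.strip() for item in items])
result = df[mask]
print(result)
```

If substring matches are acceptable and only missing values were the issue, `df["fruit"].str.contains("apple", na=False)` would be a shorter alternative.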