Tag: dataframe

Combining dummies and count for pandas dataframe

count dataframe dummy-variable pandas python

I have a pandas dataframe like this (as plain text): {'id;sub_id;value;total_stuff related to id and sub_id': ['aaa;1;cat;10', 'aaa;1;cat;10', 'aaa;1;dog;10', 'aaa;2;cat;7', 'aaa;2;dog;7', 'aaa;3;cat;5', 'bbb;1;panda;20', 'bbb;1;cat;20', 'bbb;2;panda;12']} The desired output I want is this. Note that many different "values" are possible, so I need to automate the creation of the dummy variables (nb_animals). But these dummy variables must contain the
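A minimal sketch, assuming the desired output is one row per (id, sub_id) with one `nb_<value>` count column per distinct value (the actual expected output is cut off above). `pd.crosstab` builds the dummy-style count columns automatically, so new animals need no extra code. Treating `total` as a grouping key assumes it is constant for each (id, sub_id), as the column name suggests.

```python
import pandas as pd

# Rows from the question, one 'id;sub_id;value;total' string each
raw = ['aaa;1;cat;10', 'aaa;1;cat;10', 'aaa;1;dog;10', 'aaa;2;cat;7', 'aaa;2;dog;7',
       'aaa;3;cat;5', 'bbb;1;panda;20', 'bbb;1;cat;20', 'bbb;2;panda;12']
df = pd.DataFrame([r.split(';') for r in raw], columns=['id', 'sub_id', 'value', 'total'])
df[['sub_id', 'total']] = df[['sub_id', 'total']].astype(int)

# One count column per distinct value; 'total' is assumed constant per (id, sub_id)
counts = (
    pd.crosstab([df['id'], df['sub_id'], df['total']], df['value'])
      .add_prefix('nb_')
      .reset_index()
)
counts.columns.name = None  # drop the leftover 'value' label on the column axis
print(counts)
```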

Pandas – Repeat row if found in list and count

dataframe pandas python

I need help repeating rows whose value is found in a list. If the value is found in the list, increment the count; if more than one instance is found, repeat the row, incrementing the count each time. The dataframe looks like: Input list: I need output like: I tried something like this to get the first matching index so I can repeat the row, but I'm not sure how to do
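Since the actual dataframe and list are not shown, here is a sketch on hypothetical data (`item`, `count`, and `lookup` are made-up names). It counts how often each row's value occurs in the list, repeats the row that many times with `Index.repeat`, and numbers the copies with `groupby(...).cumcount()`. It assumes rows that never appear in the list should be dropped.

```python
import pandas as pd

# Hypothetical frame and list: the question's actual data is not shown
df = pd.DataFrame({'item': ['a', 'b', 'c'], 'count': [0, 0, 0]})
lookup = ['a', 'c', 'c', 'x']

# How many times each row's item appears in the list (0 when absent)
hits = df['item'].map(pd.Series(lookup).value_counts()).fillna(0).astype(int)

# Repeat each row once per occurrence; rows with 0 hits drop out
out = df.loc[df.index.repeat(hits.to_numpy())].copy()

# Number the copies of each original row 1, 2, ...
out['count'] = out.groupby(level=0).cumcount() + 1
out = out.reset_index(drop=True)
print(out)
```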

How can I split one row of a dataset into several rows while repeating some of the variables on each line?

dataframe pandas python

I have a dataset where each row contains information that needs to be separated onto different rows, but I need to keep the company name on each new row. These are the headers: These are two rows of data: I need to split one line into as many rows as needed. Some companies
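One common way to do this is `DataFrame.melt`, which turns columns into rows while repeating the `id_vars` columns on every output row. The sketch below assumes a wide layout with hypothetical `product_1`/`product_2` columns, since the real headers are not shown; empty slots (companies with fewer items) are dropped.

```python
import pandas as pd

# Hypothetical wide layout: one company per row, several items per row
df = pd.DataFrame({
    'company': ['Acme', 'Globex'],
    'product_1': ['bolts', 'lasers'],
    'product_2': ['nuts', None],
})

# melt keeps 'company' on every output row and turns the other columns into rows
long = (
    df.melt(id_vars='company', value_name='product')
      .dropna(subset=['product'])            # companies with fewer items leave gaps
      .drop(columns='variable')              # original column name is not needed
      .sort_values('company', kind='stable') # group each company's rows together
      .reset_index(drop=True)
)
print(long)
```

If each item is instead packed into a single cell (for example a comma-separated string), splitting it into a list and calling `DataFrame.explode` on that column gives the same one-row-per-item shape.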

Python, Pandas: I want to use pandas with a user login form

dataframe pandas python

I have made a form with Tkinter that has two entries (username, password). I also have a CSV file that contains my user info. I am importing the file like so: I want to get the username and password that the user gives me and check whether they are in my dataframe. I figured out that I
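A minimal sketch of the lookup, assuming the CSV has `username` and `password` columns (hypothetical names; the real file is not shown). Combining the two comparisons with `&` ensures both values match on the same row, not just somewhere in each column. `credentials_match` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical user table; in the question this comes from a CSV file
users = pd.DataFrame({'username': ['alice', 'bob'], 'password': ['pw1', 'pw2']})

def credentials_match(df, username, password):
    """Return True if a single row has both this username and this password."""
    mask = (df['username'] == username) & (df['password'] == password)
    return bool(mask.any())

# In the Tkinter callback these values would come from each Entry widget's .get()
print(credentials_match(users, 'alice', 'pw1'))
```

When loading the file with `pd.read_csv`, passing `dtype=str` keeps numeric-looking passwords as strings, so they compare equal to the text returned by the entry widgets.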

Apply a function containing an if statement to each row of a pandas dataframe without a for loop

apply dataframe pandas python

Given a dataframe, I want to take the nonzero values of each row and find the minimum of their absolute values. I want a user-defined function that does this for me. I also do not want to use a for loop, since the data is big. My attempt: I get ValueError: The truth value of a Series
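That `ValueError` typically comes from using a Series in a plain Python `if`. A vectorized sketch avoids both the `if` and the loop: `where` masks zeros as NaN element-wise, and `min(axis=1)` skips NaN by default. The sample data and the function name `min_abs_nonzero` are hypothetical. Rows that are entirely zero come out as NaN.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'a': [0, -3, 0], 'b': [2, 5, 0], 'c': [-1, 0, 0]})

def min_abs_nonzero(frame):
    # Absolute values, with zeros masked to NaN, then the row-wise minimum (NaN skipped)
    return frame.abs().where(frame != 0).min(axis=1)

res = min_abs_nonzero(df)
print(res)
```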

How to remove a group of specific rows from a dataframe?

dataframe pandas python

I have a dataframe with 7581 rows and 3 columns (id, text, label), and a subset of it with 794 rows. I need to remove that 794-row subset (same labels) from the full 7581-row dataframe. This is what the subset looks like: I have tried this: But the following
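Two sketches on hypothetical data. If the subset was sliced from the big dataframe, its index labels still point into it, so `drop(index=...)` removes exactly those rows. If the subset was built separately, matching on a key column with `isin` works instead, assuming `id` is unique.

```python
import pandas as pd

# Hypothetical stand-in for the 7581-row dataframe
df = pd.DataFrame({'id': [1, 2, 3, 4], 'text': ['a', 'b', 'c', 'd'], 'label': [0, 1, 0, 1]})
sub = df[df['label'] == 1]  # stand-in for the 794-row subset

# Option 1: the subset came from df, so its index labels identify the rows to drop
rest = df.drop(index=sub.index)

# Option 2: match on a unique key column instead of the index
rest_by_id = df[~df['id'].isin(sub['id'])]
print(rest)
```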

Pandas dataframe – fillna with last of next month

dataframe datetime pandas python

I’ve been staring at this way too long and I think I’ve lost my mind; it really shouldn’t be as complicated as I’m making it. I have a df:

    Date1       Date2
    2022-04-01  2022-06-17
    2022-04-15  2022-04-15
    2022-03-03  NaT
    2022-04-22  NaT
    2022-05-06  2022-06-06

I want to fill the blanks in 'Date2', keeping the values from 'Date2' where they are present
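A sketch assuming "last of next month" means the last day of the month after `Date1` (the rest of the question is cut off). `MonthBegin(1)` moves to the first of the next month and `MonthEnd(0)` rolls forward to that month's end; this also works when `Date1` is itself a month end. `fillna` only touches the NaT cells, so existing `Date2` values are kept.

```python
import pandas as pd

df = pd.DataFrame({
    'Date1': pd.to_datetime(['2022-04-01', '2022-04-15', '2022-03-03', '2022-04-22', '2022-05-06']),
    'Date2': pd.to_datetime(['2022-06-17', '2022-04-15', None, None, '2022-06-06']),
})

# Assumed rule: last day of the month after Date1.
# MonthBegin(1) jumps to the 1st of next month; MonthEnd(0) rolls to that month's end.
end_of_next_month = df['Date1'] + pd.offsets.MonthBegin(1) + pd.offsets.MonthEnd(0)

# fillna only fills NaT cells, so existing Date2 values stay as they are
df['Date2'] = df['Date2'].fillna(end_of_next_month)
print(df)
```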

Is it possible to switch between similar data analysis libraries in Python within the same code?

dataframe modin pandas python

I use the Modin library for multiprocessing. While the library is great for faster processing, it fails at merge, and I would like to switch back to plain pandas partway through the code. I understand that, per PEP 8 (E402), imports should be declared once at the top of the code; however, my case would need otherwise. Then I

Faster alternative to groupby, unstack then fillna

dataframe fillna pandas python

I’m currently performing the following operations on a dataframe (A) made of two columns, each with several thousand unique values. The operations performed on this dataframe are: The output is a table (B) with the unique values of col1 as rows and the unique values of col2 as columns, where each cell is the count of rows from the original
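A sketch assuming each cell of B is the number of rows in A with that (col1, col2) pair. `groupby(...).size()` counts the pairs directly, and `unstack(fill_value=0)` fills missing pairs during the reshape, so no separate `fillna` pass is needed. `pd.crosstab` produces the same table. No timing claims are made here; with thousands of unique values on each side, the dense result has n(col1) × n(col2) cells either way.

```python
import pandas as pd

# Hypothetical small stand-in for dataframe A
A = pd.DataFrame({'col1': ['x', 'x', 'y', 'z', 'z', 'z'],
                  'col2': ['p', 'q', 'p', 'q', 'q', 'r']})

# size() counts rows per (col1, col2); fill_value=0 replaces the separate fillna step
B = A.groupby(['col1', 'col2']).size().unstack(fill_value=0)

# Equivalent one-liner
B_ct = pd.crosstab(A['col1'], A['col2'])
print(B)
```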

Replace value based on a corresponding value but keep value if criteria not met

apply arrays dataframe pandas python

Given the following dataframe:

INPUT df:

    Cost_centre  Pool_costs
    90272        A
    92705        A
    98754        A
    91350        A

Replace the Pool_costs value with 'B' based on the Cost_centre value, but keep the existing Pool_costs value if the Cost_centre value does not appear in the list.

OUTPUT df:

    Cost_centre  Pool_costs
    90272        B
    92705        A
    98754        A
    91350        B

Current code: This code works up until the else
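No `else` branch is needed if only the matching rows are assigned: `.loc` with an `isin` mask overwrites `Pool_costs` where `Cost_centre` is in the list and leaves every other row untouched. The list itself is not shown in the question, so `b_centres` below is a hypothetical name whose contents are inferred from the expected output.

```python
import pandas as pd

df = pd.DataFrame({'Cost_centre': [90272, 92705, 98754, 91350],
                   'Pool_costs': ['A', 'A', 'A', 'A']})

# Inferred from the expected output; the question's actual list is not shown
b_centres = [90272, 91350]

# Assign only where the mask is True; all other rows keep their current value
df.loc[df['Cost_centre'].isin(b_centres), 'Pool_costs'] = 'B'
print(df)
```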