I have two dataframes. Below, I am trying to fill df["Possibilites"] with df2["first_name"] if key_words is in df2["first_name"], using this code: It returns what I expect but also gives a warning: "SettingWithCopyWarning: A value is trying to be set on a copy of a slice fr…
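The code from this question is not shown, so the following is only a minimal sketch of one usual remedy. The warning typically appears when the frame being written to was itself created by slicing another frame; taking an explicit `.copy()` and assigning the whole column at once avoids it. The sample data, the `raw` frame, the `keep` column, and the substring-matching rule are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data; only "Possibilites", "first_name" and "key_words" come from the question.
raw = pd.DataFrame({"key_words": ["ann", "bob", "zed"], "keep": [True, True, True]})
df2 = pd.DataFrame({"first_name": ["Joanna", "Bobby", "Carl"]})

# An explicit copy makes df an independent frame rather than a view-like slice of raw.
df = raw[raw["keep"]].copy()

def first_match(key_word, names):
    """Return the first name containing key_word (case-insensitive), else None."""
    for name in names:
        if key_word.lower() in name.lower():
            return name
    return None

# Build the full column in one step and assign it to the copied frame.
df["Possibilites"] = df["key_words"].map(lambda kw: first_match(kw, df2["first_name"]))
```

Rows whose key word matches no name are left empty in this sketch.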
Tag: dataframe
How do I succinctly create a new dataframe column based on matching existing column values with a list of values?
I want to create a new column in a dataframe by matching the values in an existing column against a predefined list of values. I have two approaches to this below. Both run but don't give me exactly what I want. I prefer the first approach over the second, but I am not sure where I am going wrong with bo…
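Neither of the asker's two approaches is visible here, so this is a separate sketch of one compact option: build a regular expression from the list and let `str.extract` pull out the first listed value found in each row. The column names and the list contents are hypothetical.

```python
import re
import pandas as pd

# Hypothetical data and list of values.
df = pd.DataFrame({"description": ["red apple pie", "green pear", "blue sky"]})
fruits = ["apple", "pear", "plum"]

# Escape each value so it is matched literally, then join them into one capture group.
pattern = "(" + "|".join(re.escape(f) for f in fruits) + ")"

# expand=False returns a Series; rows with no match become NaN.
df["fruit"] = df["description"].str.extract(pattern, expand=False)
```

If the existing column must equal a list entry exactly rather than contain it, `df["description"].isin(fruits)` gives a boolean column instead.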
Is it possible to append to a df (declared outside) from within a function?
I am trying to append to a pandas df that is defined outside of a function. Here, I want to append df2 (inside the function) to df (located outside of the function). I am getting an "UnboundLocalError: local variable 'df' referenced before assignment" error (and that is expected because of the variable scop…
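Assigning to `df` inside the function makes Python treat it as a local name, which is what triggers the error. One way around it, sketched below with made-up data and a hypothetical function name, is to pass the outer frame in and return the combined result instead of rebinding the outer name from inside.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})  # frame defined outside the function

def append_rows(target):
    """Return target with the rows of a locally built df2 appended."""
    df2 = pd.DataFrame({"a": [3, 4]})  # frame built inside the function
    return pd.concat([target, df2], ignore_index=True)

# Rebind the outer name at the call site; no global statement is needed.
df = append_rows(df)
```

Declaring `global df` inside the function would also work, but returning the new frame keeps the function free of hidden side effects.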
How to explode and keep a fair proportion of a numeric value for each new row in pandas
I have this dataframe: I would like to use the explode function on column "A" and keep a fair proportion of the value in column "B" for each exploded row. So the result should look like this: Would this be possible with the explode function? I would manage to come to this re…
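The original dataframe and expected result are not shown, so the sketch below assumes that "A" holds lists, "B" holds numbers, and that a "fair proportion" means an equal share per exploded row. Under those assumptions, dividing "B" by the list length before exploding is enough.

```python
import pandas as pd

# Hypothetical data: A holds lists, B holds a numeric total per row.
df = pd.DataFrame({"A": [["x", "y"], ["z"]], "B": [10.0, 6.0]})

# Split B evenly across the items of A, so the shares still sum to the original value.
df["B"] = df["B"] / df["A"].str.len()

# Each list item becomes its own row and carries its share of B.
out = df.explode("A", ignore_index=True)
```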
Pandas selection of rows not working properly
I am trying to delete rows of a df whose values do not appear in a column of another table. For further explanation: I have a table with transactions including material numbers and another table with production information, also including material numbers. I want to delete every row where a materialnumbe…
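A minimal sketch of this kind of filter, assuming the goal is to keep only transactions whose material number also appears in the production table. The table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share a material_number column.
transactions = pd.DataFrame({"material_number": [100, 200, 300], "qty": [1, 2, 3]})
production = pd.DataFrame({"material_number": [100, 300]})

# True for rows whose material number exists in the production table.
mask = transactions["material_number"].isin(production["material_number"])

# Keep the matching rows; use ~mask instead to keep the non-matching ones.
transactions = transactions[mask].reset_index(drop=True)
```

If the two columns have different dtypes (for example strings in one table and integers in the other), `isin` finds no matches, which is worth checking when a selection appears not to work.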
Joining dataframes using rust polars in Python
I am experimenting with polars and would like to understand why polars is slower than pandas on a particular example: Answer: A pandas join uses the indexes, which are cached. A comparison where they do the same:
Finding similar rows in two dataframes using pandas
I have two data frames. The first is the root data frame; the second is obtained from the first (based on a pattern in which "Name" must be repeated 3 times and "Subset" must follow the pattern shown in dataframe 2 below). Based on these two dataframes, I need to add a …
How to split a column based on string index positions, using an efficient method to parse the whole DataFrame
I have a column filled with string values:

col_1
10500
25020
35640
45440
50454
62150
75410

I want to create two other columns with string values split from the first, and I want an efficient way to do that. Expected result:

col_1  col_2  col_3
10500  10     500
25020  25     020
35640  35
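Judging by the rows shown, the split falls after the second character. Assuming that holds for every row, vectorised string slicing handles the whole column without a Python-level loop:

```python
import pandas as pd

# The col_1 values from the question, kept as strings so leading zeros survive.
df = pd.DataFrame({"col_1": ["10500", "25020", "35640", "45440", "50454", "62150", "75410"]})

# Slice every string at a fixed position: first two characters, then the rest.
df["col_2"] = df["col_1"].str[:2]
df["col_3"] = df["col_1"].str[2:]
```

If the column is stored as integers, converting it first with `astype(str)` is needed before slicing.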
Why doesn’t str.replace replace ALL values in selected pandas dataframe column?
I’m working on a huge file that has names in columns that contain extraneous characters (like "|") that I want to remove, but for some reason my str.replace call only seems to apply to some rows in the column. My column in the dataframe summary looks something like this: As you can s…
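The cause is not visible in this excerpt. One thing worth ruling out is pattern handling: "|" is the alternation operator in regular expressions, so when the pattern is interpreted as a regex it does not mean a literal bar. A minimal sketch with made-up names that forces a literal match:

```python
import pandas as pd

# Hypothetical names containing stray "|" characters.
df = pd.DataFrame({"name": ["Al|ice", "Bob", "|Carol|"]})

# regex=False treats "|" as a plain character rather than regex alternation.
df["name"] = df["name"].str.replace("|", "", regex=False)
```

The result must be assigned back, as above, because `str.replace` returns a new Series rather than modifying the column in place.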
How can I handle invalid phone numbers using python’s phonenumbers package and apply?
I have a dataframe containing a variety of phone numbers that I want to extract the time zone for. I am using apply to loop over the series in the dataframe as follows: This works just fine as long as x.external_number doesn’t contain an invalid phone number; however, if one singl…
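A common pattern for this situation is to wrap the per-row lookup in a small function that catches the failure and returns a placeholder, so one bad value does not abort the whole `apply`. The sketch below uses a hypothetical stand-in, `lookup_time_zone`, in place of the real phonenumbers calls so that it runs on its own. With the real package, the `except` clause would catch the package's parse exception instead of `ValueError`; check the exact exception name against the phonenumbers documentation.

```python
import pandas as pd

def lookup_time_zone(number):
    """Hypothetical stand-in for the phonenumbers-based lookup; raises on bad input."""
    if not number.startswith("+"):
        raise ValueError(f"invalid number: {number}")
    return "Europe/London" if number.startswith("+44") else "Unknown"

def safe_time_zone(number):
    """Return the time zone, or None when the lookup raises."""
    try:
        return lookup_time_zone(number)
    except ValueError:
        return None

# Hypothetical data; external_number is the column name used in the question.
df = pd.DataFrame({"external_number": ["+442083661177", "not a number"]})
df["time_zone"] = df["external_number"].apply(safe_time_zone)
```

Rows with invalid numbers end up with an empty time zone and can be inspected or dropped afterwards.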