Tag: pandas

Splitting strings containing newline command and outputting to two columns

A minimal example of my data looks as follows: I am looking for a way (using pandas, preferably) to identify rows in which the address column contains “C/O”, split the string at the newline (\n) character, and output the part of the string before the newline to the corresponding row in th…
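A minimal sketch of one way to do this, assuming the target column is called `care_of` (a hypothetical name, since the excerpt is cut off before naming it) and that the remainder of the string should stay in `address`. The sample rows are invented for illustration.

```python
import pandas as pd

# Invented sample data; only the "address" column name comes from the question.
df = pd.DataFrame({
    "address": [
        "C/O Jane Doe\n1 Main St",
        "2 Side Rd",
        "C/O Acme Ltd\n3 High St",
    ],
})

# Rows whose address mentions "C/O" (literal match, missing values treated as no match).
mask = df["address"].str.contains("C/O", regex=False, na=False)

# Split only those rows at the first newline into two pieces.
parts = df.loc[mask, "address"].str.split("\n", n=1, expand=True)

# Assignment aligns on the index, so rows without "C/O" get a missing value.
df["care_of"] = parts[0]          # hypothetical output column
df.loc[mask, "address"] = parts[1]
```

Rows that do not contain “C/O” keep their original address and get a missing value in the new column.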

Pandas selection of rows not working properly

data-manipulation dataframe pandas python

I am trying to delete rows of a df whose values do not appear in a column of another table. For further explanation: I have a table with transactions including material numbers and another table with production information also including material numbers. I want to delete every row where a materialnumbe…
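A minimal sketch of this kind of filter, assuming the goal is to keep only transactions whose material number also appears in the production table. The frame names, the column name `materialnumber`, and the data are assumptions for illustration.

```python
import pandas as pd

# Invented sample tables; names and values are assumptions.
transactions = pd.DataFrame({
    "materialnumber": [100, 200, 300, 400],
    "qty": [1, 2, 3, 4],
})
production = pd.DataFrame({"materialnumber": [200, 400, 500]})

# Boolean mask: True where the transaction's material number exists in production.
keep = transactions["materialnumber"].isin(production["materialnumber"])

# Keep matching rows; use ~keep instead to keep the non-matching ones.
filtered = transactions[keep].reset_index(drop=True)
```

If a filter like this appears to match nothing, it is worth checking that both columns have the same dtype, since the integer `100` and the string `"100"` do not compare equal.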

Different output from pandas iterrows if .csv column is headed “name” rather than other text

pandas python

I’m a beginner using pandas to look at a csv. I’m using .iterrows() to see if a given record matches today’s date, so far so good. However, when calling (row.name) for a .csv with a column headed ‘name’ I get different output than if I rename the column and edit the (row.”co…
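A short sketch of the likely cause: each row yielded by `.iterrows()` is a `Series`, and `Series.name` is a built-in attribute holding the row's index label, so attribute access never reaches a column called `name`. The sample data is invented.

```python
import pandas as pd

# Invented sample data with a column literally called "name".
df = pd.DataFrame({
    "name": ["alice", "bob"],
    "date": ["2024-01-01", "2024-01-02"],
})

# row.name is the Series attribute, i.e. the row's index label.
by_attribute = [row.name for _, row in df.iterrows()]

# Bracket access always looks up the column.
by_key = [row["name"] for _, row in df.iterrows()]
```

Bracket access (`row["name"]`) is the safer habit for any column whose header collides with an existing attribute or method.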

How do I rearrange the order of pie slices in pandas plot?

matplotlib pandas python

I am trying to rearrange the order of the pie slices in my pie chart. I am relatively new to Python and am having trouble figuring this out. Here is my current code: The pie chart slices are arranged in alphabetical order by country name, even though my dataframe is ordered differently. So, how do I change the…

Joining dataframes using rust polars in Python

dataframe pandas python python-polars rust-polars

I am experimenting with polars and would like to understand why using polars is slower than using pandas on a particular example. Answer: A pandas join uses the indexes, which are cached. A comparison where they do the same:

Finding similar rows in two dataframes using pandas

dataframe logical-operators pandas python

I have two data frames. The first one is the root data frame; the second one is obtained from the first (based on a pattern in which “Name” must be repeated 3 times and “Subset” must follow the pattern shown in dataframe 2 below). Based on these two dataframes, I need to add a &…

How to fix memory error while importing a very large csv file to mongodb in python?

csv json mongodb pandas python

Given below is the code for importing a pipe-delimited csv file to MongoDB. Below is the error I get when running the above code. If I modify the code with some indents under the for loop, the same data gets imported into MongoDB over and over without stopping. Answer: The memory issue can be solved by in…
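The answer above is cut off, so the following is not the original solution. It is only a sketch of one common way to bound memory use: read the pipe-delimited file in chunks with the `chunksize` argument of `pandas.read_csv`. The in-memory sample data and the `batches` list (standing in for the database write) are assumptions.

```python
import io

import pandas as pd

# Small in-memory stand-in for a very large pipe-delimited file.
raw = io.StringIO("id|name\n1|a\n2|b\n3|c\n4|d\n5|e\n")

batches = []  # stand-in for the database; a real import would insert here

# chunksize makes read_csv return an iterator of DataFrames of at most 2 rows.
for chunk in pd.read_csv(raw, sep="|", chunksize=2):
    records = chunk.to_dict("records")  # list of row dicts for this chunk only
    batches.append(records)
```

With pymongo, each `records` list could be handed to `collection.insert_many(records)` inside the loop, so only one chunk is held in memory at a time.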

How to split a column based on the index of the string in the column while using an efficient method to parse the whole DataFrame

dataframe pandas python vectorization

I have a column filled with a string value: col_1 10500 25020 35640 45440 50454 62150 75410 I want to be able to create two other columns with string values that have been split from the first. Also I want an efficient way to do that. Supposed result : col_1 col_2 col_3 10500 10 500 25020 25 020 35640 35
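A minimal sketch using vectorized string slicing, assuming `col_1` holds strings and the split point is always after the second character, as in the expected result shown above.

```python
import pandas as pd

# Values taken from the question; stored as strings so leading zeros survive.
df = pd.DataFrame({"col_1": ["10500", "25020", "35640"]})

# .str[...] slices every string in the column at once, with no Python-level loop.
df["col_2"] = df["col_1"].str[:2]   # first two characters
df["col_3"] = df["col_1"].str[2:]   # everything after them
```

If `col_1` is numeric rather than string, it would need `.astype(str)` first, and the new columns should stay as strings to preserve values such as `020`.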

Why doesn’t str.replace replace ALL values in selected pandas dataframe column?

dataframe pandas python replace

I’m working on a huge file that has names in columns that contain extraneous values (like the “|” key) that I want to remove, but for some reason my str.replace function only seems to apply to some rows in the column. My column in the dataframe summary looks something like this: As you can s…
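The excerpt is cut off before showing the data, so the actual cause is unknown. One possibility worth ruling out is that `|` is being treated as a regular-expression operator (alternation) rather than a literal character. A sketch, with invented sample names, that forces literal matching:

```python
import pandas as pd

# Invented sample data; the real column contents are not shown in the excerpt.
df = pd.DataFrame({"name": ["Smith|", "|Jones", "Lee"]})

# regex=False makes "|" a plain character instead of the regex alternation operator.
df["name"] = df["name"].str.replace("|", "", regex=False)
```

Another thing to check is whether the untouched rows hold non-string values, since the `.str` accessor returns missing values for those instead of replacing anything.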

Shape error while concatenating columns after Principal Component Analysis on csv data

data-science pandas pca python python-3.x

I am applying PCA to my csv data. After normalization, PCA seems to be working. I want to plot the projection using 4 components, but I am stuck with this error : This is my code: I guess I am getting the error while concatenating my components and df[‘type’]. Can I get an idea of how to get rid of this error? Thank you.…