Tag: pandas

Replace all values in a column with zero if that column's sum is greater than 1?

I have CSV files like the following. I want to replace the Result1, Result2, and Result3 columns with all zeros if the sum of their column values is greater than 1. I tried this script. When I do this operation, the Result3 column is dropped since it has all zero values. How do I do this operation only on the columns which sat…
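The original CSV and script are not shown in the excerpt, so the following is only a minimal sketch with made-up data. It assumes the goal is to zero out each Result column whose sum exceeds 1 while leaving every other column (including an all-zero one) in place. The column names come from the question; the values and the `ID` column are hypothetical.

```python
import pandas as pd

# Hypothetical data; the real CSV contents are not shown in the question.
df = pd.DataFrame({
    "ID": ["a", "b", "c"],
    "Result1": [1, 1, 0],  # sum is 2, so this column should be zeroed
    "Result2": [0, 1, 0],  # sum is 1, so this column is left alone
    "Result3": [0, 0, 0],  # sum is 0, so this column is left alone (not dropped)
})

result_cols = ["Result1", "Result2", "Result3"]

# Column-wise sums, then the labels of the columns whose sum exceeds 1.
sums = df[result_cols].sum()
to_zero = sums[sums > 1].index

# Assign 0 in place; no column is removed from the frame.
df.loc[:, to_zero] = 0
```

Selecting the labels first and assigning through `.loc` avoids any filtering step that could drop an all-zero column.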

Adding a full stop to text when missing

dataframe numpy pandas python text

How can I add a full stop to a text, please? I am not able to get the desired combined text. Answer: You can use where and cat. Result: (this also works for the case when text1 is given and text2 is NaN)
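The answer's code is not included in the excerpt. The sketch below is one way to combine `Series.where` and `Series.str.cat` for this, not necessarily what the answer used. The column names `text1` and `text2` come from the excerpt; the sample values, the helper `add_stop`, and the `combined` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the last row has a missing text2.
df = pd.DataFrame({
    "text1": ["Hello world", "Already done.", "Only first"],
    "text2": ["second part", "more text", None],
})

def add_stop(s):
    """Append '.' to strings that do not already end with one; missing values stay missing."""
    # na=True makes the condition True for missing values, so where() keeps them as-is.
    return s.where(s.str.endswith(".", na=True), s + ".")

t1 = add_stop(df["text1"])
t2 = add_stop(df["text2"])

# na_rep="" treats a missing part as an empty string; strip() removes the
# trailing separator that is left behind in that case.
df["combined"] = t1.str.cat(t2, sep=" ", na_rep="").str.strip()
```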

Problems with DataFrame indexing with pandas

dataframe dataset indexing pandas python

Using pandas, I have to modify a DataFrame so that it only has the indexes that are also present in a vector, which was acquired by performing operations on one of the df’s columns. Here’s the specific line of code used for that (please do not mind me picking the name ‘dataset’ instead…

How to remove white space from strings in a data frame column?

pandas python strip

I am trying to loop through a column in a pandas data frame to remove unnecessary white space at the beginning and end of the strings within the column. My data frame looks like this: I tried the answer here, but it did not work either. The reason I need to remove the white space from the strings in this
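The data frame itself is not shown in the excerpt. As a minimal sketch, leading and trailing white space can usually be removed without an explicit loop by using the vectorized `Series.str.strip`. The column name `name` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical column with stray white space on either side.
df = pd.DataFrame({"name": ["  alice", "bob  ", "  carol  "]})

# str.strip() works element-wise and returns a new Series,
# so the result has to be assigned back to the column.
df["name"] = df["name"].str.strip()
```

Forgetting to assign the result back is a common reason such an attempt appears to have no effect.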

Update columns with duplicate values from the DataFrame in Pandas

data-cleaning dataframe pandas python

I have a data set which has values for different columns as different entries with first name to identify the respective columns. For instance James’s gender is in first row and James’s age is in 5th row. DataFrame df1= Index First Name Age Gender Weight in lb Height in cm 0 James Male 1 John 175 …
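The excerpt is truncated, so the desired output is not known. Under one possible reading (collapse the scattered rows so that each first name has a single row holding its first non-missing value per column), a sketch could look like the following. The data are made up, and it is an assumption that the blank cells are real missing values (NaN) rather than empty strings.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each person's attributes are spread over several rows.
df1 = pd.DataFrame({
    "First Name": ["James", "John", "James", "John"],
    "Age": [np.nan, np.nan, 30, 41],
    "Gender": ["Male", np.nan, np.nan, "Male"],
})

# GroupBy.first() takes the first non-null value of each column within a group,
# which merges the partial rows into one row per name.
merged = df1.groupby("First Name", as_index=False).first()
```

If the blanks are empty strings instead, they would need to be converted to NaN first for this to work.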

Reshaping long format dataframe to wide format according to the number of elements in columns [closed]

dataframe pandas python

I have the following pandas dataframe X in long format: And I would like to change to the f…

How to perform an operation between two rows at a different index, repeatedly

pandas python

I have an Excel file that can calculate the last column with a formula. I am trying to replicate this in Python code. How can I perform an operation to achieve such results? Essentially, the last column is supposed to take, for example, row 2 and subtract row 0 from it (because they share the same industr…

How to read most recent file with Pandas? Output path is undefined?

excel glob pandas python undefined

I’m trying to read the two latest sheets in my folders READ1 and READ2 with pandas. Usually when I read files, the file name has to be formatted as ‘File.xlsx’, but the method I’m using prints it in the terminal as File.xlsx. I tried changing the format with: Which outputs as [“&#…
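The asker's code is not shown. A common approach, sketched here under assumptions, is to let `glob` list the candidate files and pick the one with the newest modification time, which yields a plain path string that can be passed to a reader. The helper name `latest_file` and the `*.xlsx` pattern are hypothetical; the folder names READ1 and READ2 come from the question.

```python
import glob
import os

def latest_file(folder, pattern="*.xlsx"):
    """Return the path of the most recently modified match in folder, or None if there is none."""
    # glob.glob returns a list of path strings, not a single path.
    paths = glob.glob(os.path.join(folder, pattern))
    # Pick the single newest path by modification time.
    return max(paths, key=os.path.getmtime) if paths else None
```

The returned string could then be used as, for example, `pd.read_excel(latest_file("READ1"))`, after checking that it is not `None`.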

Python DataFrame String replace accidentally Returning NaN

dataframe pandas python python-3.x

I encountered a weird problem in Python pandas: when I read an Excel file and replace the character “k”, the result gives me NaN for the rows without “K” (see the image below). It should return 173 on row #4 instead of NaN, but if I create a brand new Excel file and type the same number, it works. or…
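The file and code are not shown, so the cause can only be guessed. One behaviour that matches the description is a column of mixed types: `.str` methods return NaN for elements that are not strings, so cells that Excel stored as numbers turn into NaN. The sketch below reproduces that with made-up values and shows one possible workaround, converting to strings first.

```python
import pandas as pd

# Hypothetical mixed-type column, as can result from reading an Excel sheet
# where some cells are text ("12k") and others are true numbers (173).
s = pd.Series(["12k", 173, "5k"], dtype=object)

# .str.replace leaves NaN wherever the element is not a string.
broken = s.str.replace("k", "", regex=False)

# Converting every element to str first keeps the numeric rows.
fixed = s.astype(str).str.replace("k", "", regex=False)
numbers = pd.to_numeric(fixed)
```

This only strips the letter; whether "k" should also scale the value by 1000 depends on the data and is not addressed here.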

Dataframe – Find sum of all values from dictionary column (row-wise) and then create new column for that Sum

dataframe pandas pyspark python

My PySpark DataFrame has two columns, ID and count; the count column is a dict/Map<str,int>. I want to create another column which is the total of all values of count. I have: I want something like: the sum of all the values of the count column. My approach: But I am getting grouped by individual Key and then agg…