Tag: data-cleaning

Delete rows where any column contains a certain string

I am dealing with a dataset that uses “..” as a placeholder for null values. These null values span across all of my columns. My dataset looks as follows:

Country Code  Year  GDP growth (%)  GDP (constant)
AFG           2010  3.5             ..
AFG           2011  ..              2345
AFG           2012  1.4             3372
ALB           2010  ..              4567
ALB           2011  ..              5678
ALB           2012  4.2
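A minimal sketch of one way to drop such rows, assuming the data is already in a pandas DataFrame and that the placeholder fills the whole cell. The sample frame is rebuilt from the rows in the excerpt (the truncated last row is left out), and the names `has_placeholder` and `cleaned` are only illustrative.

```python
import pandas as pd

# Sample rows taken from the excerpt; the truncated last row is omitted.
df = pd.DataFrame({
    "Country Code": ["AFG", "AFG", "AFG", "ALB", "ALB"],
    "Year": [2010, 2011, 2012, 2010, 2011],
    "GDP growth (%)": ["3.5", "..", "1.4", "..", ".."],
    "GDP (constant)": ["..", "2345", "3372", "4567", "5678"],
})

# Compare as strings so numeric columns do not interfere with the check.
# True for every row in which at least one cell is exactly "..".
has_placeholder = df.astype(str).eq("..").any(axis=1)

# Keep only the rows that contain no placeholder at all.
cleaned = df[~has_placeholder].reset_index(drop=True)
print(cleaned)
```

If the data comes from a CSV file, passing `na_values=[".."]` to `pd.read_csv` and then calling `dropna()` may be a simpler route to the same result.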

Creating another column in pandas df based on partially empty columns

data-cleaning pandas python

I want to create a third column in my pandas dataframe that is based on cols 1 and 2. They are always matching, but I want to make it so that the third column takes whichever value is available. If I just go off of id1, sometimes it is blank, so the third col will end up being blank as
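One possible approach is to coalesce the two columns. The sketch below assumes the second column is called `id2` and that a "blank" may be either an empty string or NaN; both the column name and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row has the ID in id1, in id2, or in both.
df = pd.DataFrame({
    "id1": ["a1", "", "c3", np.nan],
    "id2": ["a1", "b2", "", "d4"],
})

# Treat empty strings as missing so both kinds of blank behave the same.
id1 = df["id1"].replace("", np.nan)
id2 = df["id2"].replace("", np.nan)

# Take id1 where it is present, otherwise fall back to id2.
df["id3"] = id1.combine_first(id2)
print(df)
```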

Creating another column in pandas based on a pre-existing column

data-cleaning dataframe pandas python

My data frame has a third column, and I want to create a fourth column that looks almost the same, except that it has no double quotes and each ID in the list has a ‘user/’ prefix. Also, sometimes a cell holds just a single ID rather than a list of IDs (as shown in the example DF).
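The example DF is not reproduced here, so the following is only a sketch under an assumption: each cell is a string holding one or more double-quoted IDs separated by commas. The column names `ids` and `user_ids` and the helper `add_user_prefix` are hypothetical.

```python
import pandas as pd

# Hypothetical cells: a quoted list of IDs and a single quoted ID.
df = pd.DataFrame({"ids": ['"123", "456"', '"789"']})

def add_user_prefix(cell: str) -> str:
    # Split on commas, strip whitespace and double quotes from each piece,
    # then prefix every non-empty ID with "user/".
    ids = [part.strip().strip('"') for part in cell.split(",")]
    return ", ".join(f"user/{i}" for i in ids if i)

df["user_ids"] = df["ids"].apply(add_user_prefix)
print(df)
```

If the cells hold real Python lists rather than strings, the same idea applies with a list comprehension over the list instead of `split`.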

How to clean survey data in pandas

data-cleaning dataframe numpy pandas python

Here’s the data:

d = {'Morning': ["Didn't answer", "Didn't answer", "Didn't answer", 'Morning', "Didn't answer"],
     'Afternoon': ["Didn't answer", 'Afternoon', "Didn't answer", 'Afternoon', "Didn't answer"],
     'Night': ["Didn't answer", 'Night', "Didn't answer", 'Night', 'Night'],
     'Sporadic': ["Didn't answer", "Didn't answer", 'Sporadic', "Didn't answer", "Didn't answer"],
     'Constant': ["Didn't answer", "Didn't answer", "Didn't answer", 'Constant', "Didn't answer"]}

I want the output to be:

Update columns with duplicate values from the DataFrame in Pandas

data-cleaning dataframe pandas python

I have a data set in which the values for different columns are spread across separate rows, with the first name identifying which person each row belongs to. For instance, James’s gender is in the first row and James’s age is in the 5th row. DataFrame df1= Index First Name Age Gender Weight in lb Height in cm 0 James Male 1 John 175 2 Patricia 23 5
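A sketch of one way to collapse such rows, assuming the blanks are stored as NaN and that each person has at most one real value per column. The sample values below are invented for illustration, since the excerpt's table is incomplete.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each person's attributes are scattered over several rows.
df1 = pd.DataFrame({
    "First Name": ["James", "John", "James", "John"],
    "Age": [np.nan, np.nan, 30, 41],
    "Gender": ["Male", np.nan, np.nan, "Male"],
})

# GroupBy.first() returns the first non-null value of every column
# within each group, which merges the partial rows into one per person.
merged = df1.groupby("First Name", as_index=False).first()
print(merged)
```

If the blanks are empty strings instead of NaN, they would need to be replaced with NaN first for this to work.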

How to delete empty spaces from pandas DataFrame rows until first populated field?

data-cleaning dataframe pandas python rows

Let’s say I imported some really messy data from a PDF and I’m cleaning it. I have something like this:

Name   Type   Date   other1  other2  other3
Name1  ""     ""     Type1   ""      Date1
Name2  ""     ""     ""      Type2   Date2
Name3  ""     ""     Type3   Date3   ""
Name4  ""     Type4  ""      ""      Date4
Name5  Type5  ""     Date5   ""      ""

And so on. As
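The desired result is cut off in the excerpt, so this sketch assumes the goal is to slide the populated cells of each row to the left, so that every row ends up as Name, Type, Date. It also assumes the empty cells are empty strings. The helper `shift_left` is a hypothetical name.

```python
import pandas as pd

columns = ["Name", "Type", "Date", "other1", "other2", "other3"]
df = pd.DataFrame(
    [
        ["Name1", "", "", "Type1", "", "Date1"],
        ["Name2", "", "", "", "Type2", "Date2"],
        ["Name3", "", "", "Type3", "Date3", ""],
        ["Name4", "", "Type4", "", "", "Date4"],
        ["Name5", "Type5", "", "Date5", "", ""],
    ],
    columns=columns,
)

def shift_left(row: pd.Series) -> pd.Series:
    # Keep the populated cells in their original order,
    # then pad on the right so the row keeps its length.
    values = [v for v in row if v != ""]
    padded = values + [""] * (len(row) - len(values))
    return pd.Series(padded, index=row.index)

shifted = df.apply(shift_left, axis=1)

# After shifting, the first three columns hold the real data.
tidy = shifted[["Name", "Type", "Date"]]
print(tidy)
```

Note that this removes every empty cell in a row, not only the ones before the first populated field; rows with legitimate gaps would need a different rule.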

How to delete a certain value in a cell in columns of csv using pandas

csv data-cleaning excel pandas python

I need help deleting “None”, along with the extra comma, from language columns that contain one or more languages. Here is the existing csv: Where f now looks like: And the result should be like this: There are also other columns that have only ‘None’ values in the language column, so I can’t just use the replace function in Excel, and
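The csv itself is not shown in the excerpt, so the following is a sketch under assumptions: the column is named `language`, its cells are comma-separated strings, and a cell that contains only "None" should stay as it is. The helper `drop_none` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical cells covering the cases described in the question.
df = pd.DataFrame({
    "language": ["English, None", "None, French, German", "None", "Spanish"]
})

def drop_none(cell: str) -> str:
    # Split into individual entries and discard the "None" ones.
    parts = [p.strip() for p in cell.split(",")]
    kept = [p for p in parts if p and p != "None"]
    # If the cell held nothing but "None", leave it unchanged.
    return ", ".join(kept) if kept else cell

df["language"] = df["language"].apply(drop_none)
print(df)
```

Rebuilding the string from the kept entries removes the stray comma as a side effect, which a plain text replace would not do.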

Replacing HTML but saving the word sticking at the end

data-cleaning dataframe python python-re

I am working with text data and want to remove any HTML code, that is, anything between “<” and “>”. For example: << HTML > < p style="text-align:justify" >Labour Solutions Australia (LSA) is a national labour hire and sourcing. So I use the following code. Executing the code gives the following result: Solutions Australia LSA
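The asker's code is not included in the excerpt, so this is not a fix of it but a sketch of one common pattern: match only the tag itself, from "<" to the next ">", and replace it with a space so the word that touches the tag survives. The sample string is a normalized version of the example above, and `strip_tags` is a hypothetical helper name.

```python
import re

text = (
    '<p style="text-align:justify">Labour Solutions Australia (LSA) '
    "is a national labour hire and sourcing"
)

# "<", then any run of characters that are not ">", then ">".
TAG = re.compile(r"<[^>]*>")

def strip_tags(s: str) -> str:
    # Replace each tag with a space so neighbouring words do not merge,
    # then collapse repeated whitespace.
    no_tags = TAG.sub(" ", s)
    return re.sub(r"\s+", " ", no_tags).strip()

clean = strip_tags(text)
print(clean)
```

A regular expression is only adequate for simple fragments like this one; for full HTML documents a real parser is usually the safer choice.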

How to find which doctor a patient is using, when only given a list of doctor’s patients? (code improvement request) [closed]

data-cleaning python

I need to create a dataframe which lists all patients and their matching doctors. I have a txt file with
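The layout of the txt file is cut off in the excerpt, so this sketch skips the parsing step and assumes the file has already been read into a mapping from each doctor to a list of patients. The names in `doctor_patients` are invented for illustration.

```python
import pandas as pd

# Hypothetical result of parsing the txt file: doctor -> list of patients.
doctor_patients = {
    "Dr. A": ["patient1", "patient2"],
    "Dr. B": ["patient3"],
}

# Invert the mapping into one (patient, doctor) pair per row.
rows = [
    (patient, doctor)
    for doctor, patients in doctor_patients.items()
    for patient in patients
]

lookup = (
    pd.DataFrame(rows, columns=["patient", "doctor"])
    .sort_values("patient")
    .reset_index(drop=True)
)
print(lookup)
```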

Is there a way to make new columns from cells and have their values be from another column

data-cleaning pandas python

I am trying to find a way to take the information from one column in a pandas DataFrame and have each of its unique values become a new column, with the matching score as the value in that column. For example:

Index  Product  Test        Score
0      A        Protection  5
1      A        Comfort     6
2      B        Protection  6
3      B        Comfort     7

And the
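The desired output is cut off, but the description matches a long-to-wide reshape. A minimal sketch with `DataFrame.pivot`, assuming each Product and Test pair occurs only once (the name `wide` is illustrative):

```python
import pandas as pd

# Data from the excerpt.
df = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "Test": ["Protection", "Comfort", "Protection", "Comfort"],
    "Score": [5, 6, 6, 7],
})

# One row per Product, one column per unique Test, filled with Score.
wide = df.pivot(index="Product", columns="Test", values="Score")

# Turn Product back into a regular column and drop the "Test" axis label.
wide = wide.reset_index().rename_axis(columns=None)
print(wide)
```

If a Product and Test pair can repeat, `pivot` raises an error; `pivot_table` with an aggregation function such as `"mean"` would be one alternative in that case.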