Tag: pandas

How to create dummies for certain columns with pandas.get_dummies()

I want dummies only for columns A and D, not for column B. If I use pd.get_dummies(df), all columns are turned into dummies. I want the final result to contain all of the columns, which means columns B and C still exist, like 'A_x', 'A_y', 'B', 'C', 'D_j', 'D_l'. Answer: It can be done without concatenation, using get_dummies() with the required parameters
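As a minimal sketch of what such a call might look like, the example below passes the `columns` parameter of `pd.get_dummies()` so that only A and D are encoded. The sample data is hypothetical, chosen only to match the column names in the question.

```python
import pandas as pd

# Hypothetical data: A and D are categorical, B and C should stay untouched.
df = pd.DataFrame({
    "A": ["x", "y", "x"],
    "B": ["p", "q", "p"],
    "C": [1, 2, 3],
    "D": ["j", "l", "j"],
})

# `columns` restricts the encoding to the listed columns only;
# every other column is carried over unchanged.
result = pd.get_dummies(df, columns=["A", "D"])
print(result.columns.tolist())
```

The untouched columns B and C remain in the result alongside the new A_* and D_* indicator columns, so no separate concatenation step is needed.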

pandas dataframe str.contains() AND operation

dataframe pandas python string

I have a df (pandas DataFrame) with three rows: The function df.col_name.str.contains("apple|banana") will catch all of the rows: How do I apply an AND operator to the str.contains() method, so that it only grabs strings that contain BOTH “apple” & “banana”? I’d like to grab strings that contain 10-20 different words (grape, watermelon, berry, orange, …, etc.) Answer: You can do
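One possible approach, not necessarily the one the truncated answer goes on to give, is to build one boolean mask per word and combine the masks with `&`. The sketch below assumes hypothetical row contents and a hypothetical word list.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical rows for illustration.
df = pd.DataFrame({
    "col_name": [
        "apple is delicious",
        "banana is delicious",
        "apple and banana both are delicious",
    ]
})

# Words that must ALL be present; extend this list as needed.
words = ["apple", "banana"]

# One mask per word (plain substring match), then AND them together.
masks = (df["col_name"].str.contains(w, regex=False) for w in words)
both = df[reduce(operator.and_, masks)]
print(both)
```

Because the word list drives the loop, the same code scales to 10-20 words without writing a long regular expression by hand.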

Pandas replace all items in a row with NaN if one value is NaN

pandas python

I want to get rid of some records with NaNs. This works perfectly: However, it changes the shape of my dataframe, and the index is no longer uniformly spaced. Therefore, I’d like to replace all items in these rows with np.nan. Is there a simple way to do this? I was thinking about resampling the dataframe after dropna, but that
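A minimal sketch of one way to blank out such rows in place, assuming a small hypothetical frame with float columns: find the rows that contain any NaN, then assign NaN to every column of those rows. The shape and index stay the same.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# True for every row that has at least one NaN.
has_nan = df.isna().any(axis=1)

# Overwrite all columns of those rows; no rows are dropped.
df.loc[has_nan, :] = np.nan
print(df)
```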

Fill empty cells in column with value of other columns

pandas python

I have a HC list in which every entry should have an ID, but some entries do not have one. I would like to fill those empty cells by combining the first name column and the last name column. How would I go about this? I tried googling for fillna and the like but couldn’t get it to
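As a sketch of how `fillna` could be used here, the example below passes a Series built from the two name columns, so only the missing IDs are replaced. The column names `ID`, `first_name`, and `last_name` and the sample values are assumptions, as is the idea that empty cells are stored as missing values rather than empty strings.

```python
import pandas as pd

# Hypothetical data: the second entry has no ID.
df = pd.DataFrame({
    "ID": ["A17", None, "C42"],
    "first_name": ["Ann", "Bob", "Cy"],
    "last_name": ["Lee", "Ray", "Poe"],
})

# fillna accepts a Series; values are taken row by row where ID is missing.
df["ID"] = df["ID"].fillna(df["first_name"] + df["last_name"])
print(df)
```

If the empty cells are actually empty strings, they would first need to be converted to missing values, for example with `df["ID"].replace("", pd.NA)`.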

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

boolean dataframe filtering pandas python

I want to filter my dataframe with an or condition to keep rows with a particular column’s values that are outside the range [-0.25, 0.25]. I tried: But I get the error: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() Answer: The Python statements “or” and “and” require truth values. For pandas, these are considered
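A short sketch of the element-wise alternative: use the `|` operator instead of `or`, with each comparison wrapped in parentheses because `|` binds more tightly than `<` and `>`. The column name `col` and the values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})

# Element-wise OR; the parentheses around each comparison are required.
outside = df[(df["col"] < -0.25) | (df["col"] > 0.25)]
print(outside)
```

The same pattern applies to `&` in place of `and` and `~` in place of `not`.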

label-encoder encoding missing values

pandas python scikit-learn

I am using the label encoder to convert categorical data into numeric values. How does LabelEncoder handle missing values? Output: For the above example, label encoder changed NaN values to a category. How would I know which category represents missing values? Answer: Don’t use LabelEncoder with missing values. I don’t know which version of scikit-learn you’re using, but in 0.17.1
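One alternative that keeps missing values identifiable, swapped in here as an illustration and not taken from the answer above, is `pd.factorize`, which by default marks missing values with the sentinel code -1. The sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data with one missing value.
s = pd.Series(["a", "b", np.nan, "a"])

# codes holds one integer per element; uniques holds the categories.
# Missing values are not given a category and get the code -1.
codes, uniques = pd.factorize(s)
print(codes)
print(list(uniques))
```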

dataframe to long format

pandas python

I have the following df: I would like to change it so that it looks like this: The reason is that I have a df that is similarly shaped and I need to merge the two dfs. I have recently had similar df shaping issues that I have been unable to find simple, quick solutions for in Python. Does anyone know
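The excerpt does not show the original frame, so the following is only a generic sketch of a wide-to-long reshape with `DataFrame.melt`, using hypothetical column names (`id`, `x`, `y`) and values.

```python
import pandas as pd

# Hypothetical wide-format data: one row per id, one column per measurement.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# Keep "id" as the identifier; stack the remaining columns into
# a pair of (variable, value) columns.
long = wide.melt(id_vars="id", var_name="variable", value_name="value")
print(long)
```

Once both frames share this long layout, they can be merged on the identifier and variable columns.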

In Pandas, how to create a unique ID based on the combination of many columns?

pandas python

I have a very large dataset that looks like: I need to create an ID variable that is unique for every B-C combination. That is, the output should be: I actually don’t care about whether the index starts at zero or not, and whether the value for the missing columns is 0 or any other number. I just want something
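A minimal sketch of one way to number the combinations, assuming hypothetical values and no missing entries in B or C: `groupby(...).ngroup()` assigns one integer per distinct B-C pair.

```python
import pandas as pd

# Hypothetical data; rows 0, 1 and 4 share the same (B, C) pair.
df = pd.DataFrame({
    "B": [1, 1, 2, 2, 1],
    "C": ["a", "a", "b", "c", "a"],
})

# ngroup() numbers the groups in sorted key order, starting at 0.
df["ID"] = df.groupby(["B", "C"]).ngroup()
print(df)
```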

Pandas Dataframe datetime slicing with Index vs MultiIndex

dataframe datetime pandas python slice

With a single-indexed dataframe I can do the following: Datetime slicing works when you give it a complete day (e.g. 2016-01-01), and it also works when you give it a partial date, like just the year and month (2016-01). All this works great, but when you introduce a MultiIndex, it only works for complete dates. The partial date slicing
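For reference, the single-index behaviour described above can be sketched as follows, using a hypothetical daily index and a hypothetical `value` column. The sketch covers only the plain DatetimeIndex case, not the MultiIndex case the question goes on to ask about.

```python
import pandas as pd

# Hypothetical daily data covering January and most of February 2016.
idx = pd.date_range("2016-01-01", periods=60, freq="D")
df = pd.DataFrame({"value": range(60)}, index=idx)

# Partial date: selects every row in January 2016.
january = df.loc["2016-01"]

# Complete date, written as a slice so the result stays a DataFrame.
first_day = df.loc["2016-01-01":"2016-01-01"]

print(len(january), len(first_day))
```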

How to divide two columns element-wise in a pandas dataframe

dataframe pandas python

I have two columns in my pandas dataframe. I’d like to divide column A by column B, value by value, and show it as follows: The columns: And the expected result: How do I do this? Answer: Just divide the columns:
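A minimal sketch of that division, with hypothetical values and a hypothetical name for the new column:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [12, 14, 16, 20], "B": [2, 7, 8, 5]})

# Arithmetic between two columns is applied element by element.
df["Result"] = df["A"] / df["B"]
print(df)
```

`df["A"].div(df["B"])` is the equivalent method form.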