I am working on a machine learning project with very sparsely labeled data. There are several categorical features, with roughly one hundred distinct classes across them. For example: After I put these through scikit-learn's OneHotEncoder, I expect the missing data to be encoded as 00, since the docs state that handle_unknown='ignore' causes the encoder to return an
Tag: data-science
Shape error while concatenating columns after principal component analysis on CSV data
I am applying PCA to my CSV data. After normalization, PCA seems to work. I want to plot the projection using 4 components, but I am stuck with this error: This is my code: I guess the error occurs when I concatenate my components with df['type']. Any ideas on how to get rid of this error? Thank you. Answer
I'm trying to write a condition on a Python list
I tried writing a Python program that accepts a list of integers from the user, starting with 0 and ending with 20. Each integer in the list either differs from the previous one by two or is four times the previous one. The program should return True or False. Can someone correct the condition I have written? Answer You can use
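The condition described in the question can be sketched as a small check. This is a minimal sketch and not the truncated answer's code: the function name is_valid_sequence is hypothetical, and "differs by two" is assumed to mean an absolute difference of 2.

```python
def is_valid_sequence(nums):
    """Return True if nums starts at 0, ends at 20, and every step is allowed."""
    # Reject empty lists and lists with the wrong endpoints.
    if not nums or nums[0] != 0 or nums[-1] != 20:
        return False
    # Assumption: "differs by two" means an absolute difference of 2.
    return all(
        abs(cur - prev) == 2 or cur == 4 * prev
        for prev, cur in zip(nums, nums[1:])
    )


print(is_valid_sequence([0, 2, 4, 16, 18, 20]))  # True
print(is_valid_sequence([0, 2, 5, 20]))          # False
```

Pairing each element with its predecessor through zip(nums, nums[1:]) avoids manual index arithmetic, which is a common source of off-by-one mistakes in this kind of condition.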
If there is a second column present then populate second column values, else populate first column values in Dataframe
I have a dataframe as seen below: I need two columns now, column A and column B. Conditions summarized: The required dataframe should be as follows: Answer Try: The != '' (empty string) comparison will work if you truly have nothing in the cell (as opposed to a NaN, etc.). If you have actual NaN values use:
How to convert rows to columns in a Pandas groupby?
I have a table containing price data for a set of products over 6 months. Each product has a unique id (sku_id) and can be from size 6-12. We measured the price each day, and generated a table similar to the example below. Source indicates what website the price was on (can be 1-4). Now, I want to perform some
How to plot a histogram for the DataFrame below
For example, this is the DataFrame: I want to plot a histogram with the X axis showing the countries and the Y axis showing the amounts. Is this possible? Answer For a dataframe that looks like this: You can plot the graph you want like so: This will produce the following plot: You can check more about
Pandas: how to explode several list items into each new row
I have a dataframe: I want to explode it such that every 3 elements from each list in the column l become a new row, with a column for the triplet index within the original list. So I will get: What is the best way to do so? Answer Break the list elements into chunks first and then explode: If you
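The chunk-then-explode idea can be sketched as follows. The original dataframe and the answer's code are not shown, so the sample data, the id column, the chunk helper, and the triplet_index column name are all hypothetical.

```python
import pandas as pd

# Hypothetical data: only the list column "l" is named in the question.
df = pd.DataFrame({"id": [1, 2], "l": [[1, 2, 3, 4, 5, 6], [7, 8, 9]]})


def chunk(values, size=3):
    """Split a list into consecutive sublists of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Replace each list by a list of chunks, then give every chunk its own row.
out = df.assign(l=df["l"].apply(chunk)).explode("l")

# explode keeps the original row label, so counting within each label
# yields the position of the triplet inside its source list.
out["triplet_index"] = out.groupby(level=0).cumcount()
out = out.reset_index(drop=True)

print(out)
```

A list whose length is not a multiple of 3 ends with a shorter final chunk in this sketch. Whether that is acceptable depends on the data.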
Create new pd dataframe column that gives a date based on day and week starting data
I have a pandas dataframe with two columns: the first is 'Week Starting' and the other is 'Day'. I want to create a new column that uses the data from the other two columns to give a full date. For example, from the table below, the first entry of the new column should be 5/04/2021 and the second should
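One way to combine the two columns is to add a day offset to the week start date. The table is not shown, so this sketch rests on several assumptions: 'Day' holds weekday names, weeks start on Monday, and 'Week Starting' uses day/month/year order. The sample rows, the day_offsets mapping, and the Date column name are hypothetical.

```python
import pandas as pd

# Hypothetical rows: the original table is not shown.
df = pd.DataFrame({
    "Week Starting": ["5/04/2021", "5/04/2021", "12/04/2021"],
    "Day": ["Monday", "Wednesday", "Sunday"],
})

# Assumption: weeks start on Monday, so Monday has offset 0.
day_offsets = {
    "Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
    "Friday": 4, "Saturday": 5, "Sunday": 6,
}

# Assumption: dates are written as day/month/year.
week_start = pd.to_datetime(df["Week Starting"], format="%d/%m/%Y")
df["Date"] = week_start + pd.to_timedelta(df["Day"].map(day_offsets), unit="D")

print(df)
```

If 'Day' instead holds numbers, the mapping step can be dropped and the numeric offset passed to pd.to_timedelta directly.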
Python/Pandas: searching data in a DataFrame
I want to explain my question with an example. I have a dataset that includes average avocado prices and many features related to these prices (I guess the avocado prices dataset is very popular, idk). There is a feature called “region” that shows where the avocados were grown. I wrote this line of code to select the avocados grown in “west”. my
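Selecting rows by region is usually done with a boolean mask. The asker's own line of code is not shown, so this is only a sketch: the sample rows and the AveragePrice column name are hypothetical, and the comparison is made case-insensitive because the exact spelling of the region value is not known.

```python
import pandas as pd

# Hypothetical sample; only the "region" column is named in the question.
df = pd.DataFrame({
    "region": ["West", "Albany", "West"],
    "AveragePrice": [1.10, 1.33, 0.98],
})

# Build a boolean mask and use it to keep only the matching rows.
# Lowercasing both sides sidesteps "West" versus "west" mismatches.
west = df[df["region"].str.lower() == "west"]

print(west)
```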
How to remove URLs from text in pandas dataframe rows?
I am trying to solve an NLP problem. In my dataframe, the text column has many rows containing URLs like http.somethingsomething. Some of the URLs have no space between them and the surrounding text, for example ':http:\something', ';http:\something', ',http:\something'. So sometimes there is a , directly before the URL with no space, and sometimes something else, but mostly one of , . : ;. And the URL is either at
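A regular expression that does not require whitespace before the URL can handle the glued cases described above. This is a minimal sketch with hypothetical sample text and column names, and it assumes every URL starts with http: or https: and runs until the next whitespace character.

```python
import pandas as pd

# Hypothetical sample rows; the original data is not shown.
df = pd.DataFrame({"text": [
    "great read:http://example.com/a more",
    "see,https://example.org",
    "no link here",
]})

# Assumption: a URL starts with "http:" or "https:" and ends at whitespace.
# No leading \b or space is required, so ":http..." and ",http..." match too.
url_pattern = r"https?:\S+"

df["clean"] = (
    df["text"]
    .str.replace(url_pattern, " ", regex=True)  # drop the URL itself
    .str.replace(r"\s+", " ", regex=True)       # collapse leftover whitespace
    .str.strip()
)

print(df["clean"].tolist())
```

The punctuation before the URL is kept in this sketch. Text glued to the end of a URL without a space would be removed along with it, so the pattern may need tightening for such rows.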