I have the following dataframe (a heavily reduced sample of my original one). I’m trying to get the top 2 ids for each year AND month. So, for example, the idea was to obtain the df below. My main problem here is getting the top n along with the dates, because the nlargest method applies to a
Tag: pandas
How can I use value_counts() only for certain values?
I want to extract how many positive reviews per brand are in a dataset which includes reviews of thousands of products. I used this code and got a table including the percentage of positive and non-positive reviews. How can I get only the percentage of positive reviews by brand? I only want the “TrueR…
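One possible way to get only the positive share per brand is sketched below. The question's data and code are not shown in the excerpt, so the frame and the column names `brand` and `positive` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names "brand" and "positive"
# are assumptions, not taken from the original question.
reviews = pd.DataFrame({
    "brand": ["A", "A", "A", "B", "B"],
    "positive": [True, False, True, True, True],
})

# The mean of a boolean column is the fraction of True values,
# so this yields the percentage of positive reviews per brand
# without producing a row for the non-positive share.
positive_pct = reviews.groupby("brand")["positive"].mean() * 100

print(positive_pct)
```

Taking the mean of a boolean column avoids having to filter the output of `value_counts(normalize=True)` afterwards.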
Python pandas conditional ffill. Fill the month beginning value till that month’s end
The code below produces a sample dataframe. The value on the 1st of December is as follows, which outputs 9. My question is how to use the “ffill” method to have this value 9 for all days of December. I want the month-beginning value to be filled until the end of that month. Answer Replace values for all days except…
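A minimal sketch of the idea, assuming a daily `DatetimeIndex` and a hypothetical column named `value` (the original sample code is not included in the excerpt): blank out every day that is not a month start, then forward-fill.

```python
import pandas as pd

# Hypothetical daily data spanning a month boundary; the column name
# "value" and the numbers are assumptions for illustration.
idx = pd.date_range("2020-11-29", "2020-12-03", freq="D")
df = pd.DataFrame({"value": [5, 6, 9, 7, 8]}, index=idx)

# Keep the value only on the first day of each month (NaN elsewhere),
# then forward-fill so the month-start value runs until the next month start.
month_start_only = df["value"].where(df.index.is_month_start)
df["filled"] = month_start_only.ffill()

print(df)
```

In this sketch the two November rows stay NaN, because the sample contains no November month-start value to carry forward.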
Pandas unique values per row, variable number of columns with data
Consider the dataframe below: Assuming my index is unique, I’m looking to retrieve the unique values per index row, in an output like the one below. I wish to keep the empty rows. I have a working, albeit slow, solution, see below. The order of the output numbers is not relevant, as long as all values are presente…
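One way this could be approached without a row-wise `apply` is sketched below. The frame is a hypothetical stand-in, since the question's data is not shown, and whether this is faster than the asker's solution is not something the excerpt lets us confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a variable number of filled columns per row;
# row "z" is entirely empty.
df = pd.DataFrame(
    {"a": [1, 2, np.nan], "b": [1, 3, np.nan], "c": [np.nan, 2, np.nan]},
    index=["x", "y", "z"],
)

# Move the columns into the index and drop the empty cells, so each
# remaining entry is one (row label, value) pair.
stacked = df.stack().dropna()

# Group by the original row label and collect the distinct values.
uniques = stacked.groupby(level=0).unique()

# Rows that were entirely empty disappeared above; reindex restores them
# (as NaN) so the result has one entry per original row.
uniques = uniques.reindex(df.index)

print(uniques)
```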
How to combine dataframes based on index column name
Hello, I am new to Python. I have 2 dfs and a list of tickers, and I would like to combine the 2 dfs based on the list of tickers. My second df had the tickers imported from an Excel sheet, so the column names in the index are in a different order. I am not sure if
Expecting integer values in calculation, but getting
I am working on implementing an ID3 algorithm in python. In order to get past the first step I need to calculate the information gain per column. The comments are self-explanatory. The issue that I am trying to resolve is From the simple program shown below. The Test set for ID3.csv The Training set for ID3.c…
Error while using str.contains for checking numeric values in a column using regex
I have a dataframe. I want to check if a particular column has numeric values or not using regex matching. When I use str.contains it shows an error like below. What is the correct way to check if all the values in a column have numeric values or not? Answer You can use With .astype(str), you will be able to
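The `.astype(str)` suggestion can be sketched as follows. The column name `col`, the sample values, and the regex are assumptions for illustration; the answer's actual code is not included in the excerpt.

```python
import pandas as pd

# Hypothetical column mixing numbers and text; "col" is an assumed name.
df = pd.DataFrame({"col": [123, "45", "abc", 6.7]})

# Cast to str first so the .str accessor can handle entries that are
# ints or floats rather than strings.
as_text = df["col"].astype(str)

# Anchored pattern: optional sign, digits, optional decimal part.
# The group is non-capturing so str.contains does not warn about groups.
is_numeric = as_text.str.contains(r"^[+-]?\d+(?:\.\d+)?$", regex=True)

# True only if every value in the column looks numeric.
all_numeric = bool(is_numeric.all())

print(is_numeric.tolist(), all_numeric)
```

The pattern here is deliberately simple; it would need extending to accept forms such as exponents or thousands separators.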
How do I use pandas to open file and process each row for that file?
So I have a dataframe containing image filenames and multiple (X, Y) coordinates for each image, like this:
file   x    y
File1  1    2
File1  103  6
File2  6    3
File2  4    9
I want to open each file and draw the points on it. I want to open each file in the file column once only. (I know
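A rough sketch of handling each file once is to group the rows by the `file` column and process one group at a time. The `draw_points` helper below is a hypothetical placeholder for whatever image library would actually open the file and draw on it.

```python
import pandas as pd

# The sample rows from the question.
df = pd.DataFrame({
    "file": ["File1", "File1", "File2", "File2"],
    "x": [1, 103, 6, 4],
    "y": [2, 6, 3, 9],
})


def draw_points(filename, points):
    """Hypothetical stand-in: a real version would open the image once
    and draw every point in `points` before saving it."""
    return f"{filename}: {len(points)} points"


results = []
# groupby yields one (filename, sub-frame) pair per distinct file,
# so each file is visited exactly once.
for filename, group in df.groupby("file"):
    points = list(zip(group["x"], group["y"]))
    results.append(draw_points(filename, points))

print(results)
```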
scikit-learn LinearRegression IndexError
I am working on a LinearRegression model to fill the null values for the feature Rupeepersqft. When I run the code, I receive this error: This is the code which gives me the error: This is what the data looks like: Can anyone help me out with this? Answer To assign values to a column in Pandas.DataFrame y…
pandas dataframe add a column computing the median from the first row
I have a dataframe with a column filled with floats. I want to add a column which computes the median from the first row to the current row. I do not want to compute a rolling median but the median with all the information known at each step. Answer You can check with expanding
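As a small illustration of the `expanding` suggestion, assuming a hypothetical column named `value`:

```python
import pandas as pd

# Hypothetical float column; the name "value" is an assumption.
df = pd.DataFrame({"value": [1.0, 3.0, 2.0, 10.0]})

# expanding() grows the window from the first row to the current row,
# so each result is the median of everything seen so far.
df["running_median"] = df["value"].expanding().median()

print(df)
```

Unlike `rolling`, which uses a fixed-size window, `expanding` always starts at the first row.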