I have an output file like this from a pandas function. I’m trying to get an output with just the second column, i.e., by deleting the top and bottom rows and the first column. How do I do that? Answer You want just the .values attribute: You can convert to a list or access each value:
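The original output is not shown here, so the following is only a minimal sketch of what the answer describes, assuming the function returned a pandas Series. The name `result` and its data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the function's output: the index plays the role
# of the "first column" and the values are the "second column".
result = pd.Series([0.25, 0.5, 0.75], index=["a", "b", "c"], name="score")

values = result.values     # NumPy array holding only the values
as_list = result.tolist()  # the same values as a plain Python list
first = result.iloc[0]     # a single value, accessed by position
```

Printing `values` or `as_list` shows the numbers without the index, the name, or the dtype footer.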
Tag: pandas
Plotting correlation heatmaps with Seaborn FacetGrid
I am trying to create a single image with heatmaps representing the correlation of features of data points for each label separately. With seaborn I can create a heatmap for a single class like so: And I get this, which makes sense: But then I try to make a list of all the labels like so: And sadly I get
Naturally sorting Pandas DataFrame
I have a pandas DataFrame with indices I want to sort naturally. Natsort doesn’t seem to work. Sorting the indices prior to building the DataFrame doesn’t seem to help because the manipulations I do to the DataFrame seem to mess up the sorting in the process. Any thoughts on how I can re-sort the indices naturally? Answer If you want
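As a rough illustration of re-sorting after all other manipulations are done, here is a sketch that uses only the standard library and pandas rather than natsort. The helper `natural_key`, the sample labels, and the assumption that the index labels are unique are all mine, not the original poster's.

```python
import re
import pandas as pd

def natural_key(label):
    """Split a label into text and integer chunks so that 'a10' sorts after 'a2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", str(label))]

# Hypothetical data with an index that plain string sorting would misorder.
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["a10", "a2", "a1", "b3"])

# Reorder the rows as the last step, once the DataFrame is in its final shape.
# reindex requires unique index labels, which is assumed here.
df_sorted = df.reindex(sorted(df.index, key=natural_key))
```

Doing the reordering last avoids the problem described in the question, where later operations undo an earlier sort.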
Shuffle DataFrame rows
I have the following DataFrame: The DataFrame is read from a CSV file. All rows which have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc. I would like to shuffle the order of the DataFrame’s rows so that all Types are mixed. A possible result could be: How
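One common way to shuffle rows is `DataFrame.sample` with `frac=1`. The sketch below uses made-up data, since the original frame is not shown; the column name `Col1` and the `random_state` value are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with rows grouped by Type, as described in the question.
df = pd.DataFrame({"Col1": [1, 2, 3, 4, 5, 6],
                   "Type": [1, 1, 2, 2, 3, 3]})

# frac=1 draws every row exactly once without replacement, which amounts to
# a random permutation. random_state only makes the shuffle reproducible.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
```

Dropping `reset_index(drop=True)` keeps the original row labels, which can be useful for tracing rows back to the CSV file.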
How to create a pivot table on extremely large dataframes in Pandas
I need to create a pivot table of 2000 columns by around 30-50 million rows from a dataset of around 60 million rows. I’ve tried pivoting in chunks of 100,000 rows, and that works, but when I try to recombine the DataFrames by doing a .append() followed by .groupby('someKey').sum(), all my memory is taken up and Python eventually crashes. How
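One possible alternative to appending every chunk and grouping at the end is to keep a running total and add each chunk's pivot to it. This is only a sketch on toy data: the column names `column` and `value` and the function `pivot_in_chunks` are hypothetical, and the running total still has the size of the final table, so it may or may not fit in memory at the scale described.

```python
import pandas as pd

# Toy data; only "someKey" is a name taken from the question.
data = pd.DataFrame({
    "someKey": ["r1", "r2", "r1", "r3", "r2", "r1"],
    "column":  ["c1", "c1", "c2", "c2", "c1", "c1"],
    "value":   [1, 2, 3, 4, 5, 6],
})

def pivot_in_chunks(frame, chunk_size):
    """Pivot each chunk separately and accumulate the sums in one running table."""
    total = None
    for start in range(0, len(frame), chunk_size):
        chunk = frame.iloc[start:start + chunk_size]
        part = chunk.pivot_table(index="someKey", columns="column",
                                 values="value", aggfunc="sum", fill_value=0)
        # add() aligns on both axes; fill_value=0 covers cells present in
        # only one of the two tables.
        total = part if total is None else total.add(part, fill_value=0)
    # Cells missing from every chunk are still NaN at this point.
    return total.fillna(0)

result = pivot_in_chunks(data, chunk_size=2)
```

Only one chunk pivot and the running total are alive at any time, so the intermediate list of per-chunk DataFrames never builds up.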
Max and Min value for each column of one DataFrame
Given this dataframe ‘x’: How could I get a list of pairs with the min and max of each column? The result would be: Answer You could define a function and call apply passing the function name, this will create a df with min and max as the index names: If you insist on a list of lists we can
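The approach described in the answer might look roughly like the sketch below. The contents of `x` and the function name `min_max` are hypothetical, and the final conversion to a list of lists is one possible way to do it, not necessarily the one the answer goes on to give.

```python
import pandas as pd

# Hypothetical contents for the dataframe 'x'.
x = pd.DataFrame({"a": [1, 5, 3], "b": [10, -2, 7]})

def min_max(column):
    """Return the minimum and maximum of one column as a labelled Series."""
    return pd.Series([column.min(), column.max()], index=["min", "max"])

# apply calls min_max once per column; the result has 'min' and 'max' as
# index labels and one column per original column.
summary = x.apply(min_max)

# One [min, max] pair per original column, as plain Python lists.
pairs = summary.T.values.tolist()
```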
Python Pandas replace NaN in one column with value from corresponding row of second column
I am working with this Pandas DataFrame in Python. I need to replace all NaNs in the Temp_Rating column with the value from the Farheit column. This is what I need: If I do a Boolean selection, I can pick out only one of these columns at a time. The problem is if I then try to join them, I
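A short sketch of one way to do this is `Series.fillna` with another Series as the argument, which fills each missing entry from the row with the same index label. The column names come from the question; the values are made up.

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names are taken from the question.
df = pd.DataFrame({
    "Temp_Rating": [np.nan, 3.0, np.nan],
    "Farheit": [70.0, 65.0, 80.0],
})

# Passing a Series to fillna fills each NaN from the matching row label,
# leaving the existing non-NaN ratings untouched.
df["Temp_Rating"] = df["Temp_Rating"].fillna(df["Farheit"])
```

This avoids selecting the two columns separately and joining them afterwards.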
Pandas secondary y axis for boxplots
I’d like to use a secondary y-axis for some boxplots in pandas, but it doesn’t seem available. Now, using the default line plot it’s easy enough to plot to a second y-axis: But if I use boxplot style, it doesn’t work: Is there any way (maybe through matplotlib) I can get pandas to plot 2 axes for boxplot? Using the
Faster way to read Excel files to pandas dataframe
I have a 14MB Excel file with five worksheets that I’m reading into a Pandas dataframe, and although the code below works, it takes 9 minutes! Does anyone have suggestions for speeding it up? Answer As others have suggested, CSV reading is faster. So if you are on Windows and have Excel, you could call a VBScript to convert the
How to get last group in Pandas’ groupBy?
I wish to get the last group of my group by: but that gives the error: KeyError: -1 Using get_group is useless as I don’t know the last group’s value (unless there’s a specific way to get that value?). Also I might want to get the last 2 groups, etc. How do I do this? Answer You can call last
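The answer is cut off here, so the sketch below does not reproduce it. It shows a different, simple approach: collect the group keys by iterating the GroupBy object, then pass the last key or keys to get_group. The data and the column names `key` and `val` are hypothetical.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"key": ["a", "b", "a", "c", "b"],
                   "val": [1, 2, 3, 4, 5]})
grouped = df.groupby("key")

# Iterating a GroupBy yields (key, sub-frame) pairs; with the default
# sort=True the keys arrive in sorted order.
keys = [key for key, _ in grouped]

last_group = grouped.get_group(keys[-1])               # the last group
last_two = [grouped.get_group(k) for k in keys[-2:]]   # the last two groups
```

Slicing `keys` generalizes to any number of trailing groups without knowing their values in advance.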