Tag: pandas

Remove name, dtype from pandas output of dataframe or series

dataframe output-formatting pandas python series

I have an output file like this from a pandas function. I’m trying to get output with just the second column, i.e., by deleting the top and bottom rows and the first column. How do I do that? Answer: You want just the .values attribute. You can convert to a list or access each value.
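The code from the original answer is not reproduced in this excerpt, so the following is only a minimal sketch of the idea, assuming a small made-up Series named `s` (the data, index labels, and name are illustrative, not from the question).

```python
import pandas as pd

# Hypothetical Series standing in for the asker's output.
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

values = s.values      # NumPy array: no index, no Name/dtype footer
as_list = s.tolist()   # plain Python list
first = s.iloc[0]      # a single value by position

# Printing only the values, one per line, without the index column.
print(s.to_string(index=False))
```

`.values` and `.tolist()` drop the index along with the name and dtype footer, which is usually what is wanted when only the second column of the printed output matters.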

Plotting correlation heatmaps with Seaborn FacetGrid

pandas plot python seaborn

I am trying to create a single image with heatmaps representing the correlation of features of data points for each label separately. With seaborn I can create a heatmap for a single class like so: And I get this, which makes sense: But then I try to make a list of all the labels like so: And sadly I get

Naturally sorting Pandas DataFrame

natsort pandas python python-2.7 sorting

I have a pandas DataFrame with indices I want to sort naturally. Natsort doesn’t seem to work. Sorting the indices prior to building the DataFrame doesn’t seem to help because the manipulations I do to the DataFrame seem to mess up the sorting in the process. Any thoughts on how I can re-sort the indices naturally? Answer: If you want
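The answer is cut off here, so what follows is not necessarily the approach it recommends. It is a sketch of one way to re-sort an index naturally after the DataFrame has been built, using only the standard library and `reindex`. The helper `natural_key`, the sample labels, and the assumption that index labels are unique are all illustrative.

```python
import re

import pandas as pd


def natural_key(label):
    # Split "item10" into ["item", 10, ""] so digit runs compare as numbers.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", str(label))]


# Hypothetical data; assumes index labels are unique.
df = pd.DataFrame({"value": [1, 2, 3]}, index=["item10", "item2", "item1"])

# Sort the labels naturally, then reorder the rows to match.
sorted_df = df.reindex(sorted(df.index, key=natural_key))
```

Because the sort is applied as the last step, earlier manipulations that disturb the row order no longer matter.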

Shuffle DataFrame rows

dataframe pandas permutation python shuffle

I have the following DataFrame: The DataFrame is read from a CSV file. All rows which have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc. I would like to shuffle the order of the DataFrame’s rows so that all Types are mixed. A possible result could be: How
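One common way to do this, sketched here under the assumption of a small made-up frame (the `Col1` column and its values are hypothetical; only `Type` comes from the question), is to sample every row without replacement and then rebuild the index.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data, grouped by Type as described.
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4, 5, 6],
    "Type": [1, 1, 2, 2, 3, 3],
})

# frac=1 samples all rows in random order; random_state makes it repeatable.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
```

Dropping `random_state` gives a different order on each run, and dropping `reset_index(drop=True)` keeps the original row labels attached to the shuffled rows.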

How to create a pivot table on extremely large dataframes in Pandas

pandas pivot-table python python-3.x

I need to create a pivot table of 2000 columns by around 30-50 million rows from a dataset of around 60 million rows. I’ve tried pivoting in chunks of 100,000 rows, and that works, but when I try to recombine the DataFrames by doing a .append() followed by .groupby('someKey').sum(), all my memory is taken up and Python eventually crashes. How
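One possible alternative to appending every chunk and grouping at the end is to fold each chunk's pivot into a running total as it is produced. The sketch below only illustrates that pattern on toy data. The column names `column` and `value`, the chunk size, and the in-memory `data` frame are assumptions, and whether this fits in memory at the scale described is not something the example can show.

```python
import pandas as pd

# Hypothetical stand-in for the large dataset; only 'someKey' is from the question.
data = pd.DataFrame({
    "someKey": ["a", "b", "a", "c", "b", "a"],
    "column": ["x", "x", "y", "y", "x", "x"],
    "value": [1, 2, 3, 4, 5, 6],
})

chunk_size = 2  # tiny for the demo; the question uses 100,000
running_total = None
for start in range(0, len(data), chunk_size):
    chunk = data.iloc[start:start + chunk_size]
    partial = chunk.pivot_table(index="someKey", columns="column",
                                values="value", aggfunc="sum")
    if running_total is None:
        running_total = partial
    else:
        # Align on rows and columns; a cell missing on one side counts as 0.
        running_total = running_total.add(partial, fill_value=0)

# Cells never seen in any chunk are still NaN; treat them as 0.
result = running_total.fillna(0)
```

Only one accumulated table and one chunk pivot are held at a time. This works for sums because adding partial sums gives the overall sum; other aggregations such as means would need a different combination step.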

Python Pandas replace NaN in one column with value from corresponding row of second column

dataframe fillna nan pandas python

I am working with this Pandas DataFrame in Python. I need to replace all NaNs in the Temp_Rating column with the value from the Farheit column. This is what I need: If I do a Boolean selection, I can pick out only one of these columns at a time. The problem is if I then try to join them, I
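A minimal sketch of one way to do this, assuming a small made-up frame with the two column names from the question (the values are illustrative): `fillna` accepts another Series and fills each NaN from the row with the same index label.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "Temp_Rating": [np.nan, 4.5, np.nan],
    "Farheit": [70.0, 72.0, 68.0],
})

# NaNs in Temp_Rating take the Farheit value from the same row;
# existing Temp_Rating values are left untouched.
df["Temp_Rating"] = df["Temp_Rating"].fillna(df["Farheit"])
```

This avoids selecting the two columns separately and joining them back together.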

Pandas secondary y axis for boxplots

boxplot pandas python

I’d like to use a secondary y-axis for some boxplots in pandas, but it doesn’t seem available. Now, using the default line plot it’s easy enough to plot to a second y-axis: But if I use boxplot style, it doesn’t work: Is there any way (maybe through matplotlib) I can get pandas to plot 2 axes for boxplot? Using the

Faster way to read Excel files to pandas dataframe

import-from-excel pandas python

I have a 14 MB Excel file with five worksheets that I’m reading into a pandas DataFrame, and although the code below works, it takes 9 minutes! Does anyone have suggestions for speeding it up? Answer: As others have suggested, CSV reading is faster. So if you are on Windows and have Excel, you could call a VBScript to convert the

How to get last group in Pandas’ groupBy?

pandas python

I wish to get the last group of my group by: but that gives the error: KeyError: -1. Using get_group is useless as I don’t know the last group’s value (unless there’s a specific way to get that value?). Also, I might want to get the last 2 groups, etc. How do I do this? Answer: You can call last
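The answer is truncated, so the sketch below is not necessarily what it goes on to suggest. It shows one way to find the last group's key without knowing it in advance: iterate over the groupby to collect the keys in group order, then pass the ones you want to `get_group`. The frame and the column names `grp` and `val` are hypothetical.

```python
import pandas as pd

# Hypothetical data; column names are illustrative.
df = pd.DataFrame({
    "grp": ["a", "b", "c", "b", "c"],
    "val": [1, 2, 3, 4, 5],
})

grouped = df.groupby("grp")

# Iterating a groupby yields (key, sub-frame) pairs in group order
# (sorted by key with the default sort=True).
keys = [key for key, _ in grouped]

last_group = grouped.get_group(keys[-1])

# The last two groups, stacked into one frame.
last_two = pd.concat([grouped.get_group(k) for k in keys[-2:]])
```

Slicing `keys` makes it easy to take the last n groups rather than just one.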