Tag: pandas

How to create groupby subplots in Pandas?

matplotlib pandas python seaborn subplot

I’ve got a dataframe with time-series data of crime with a facet on offence (which looks like the format below). What I’d like is to perform a groupby plot on the dataframe so that it’s possible to explore trends in crime over time. I’ve got some code which does the job, but it’s a bit clumsy and it loses the time

How to use word_tokenize in a data frame

nltk pandas python

I have recently started using the nltk module for text analysis, and I am stuck at one point. I want to use word_tokenize on a dataframe to obtain all the words used in a particular row of the dataframe. Basically, I want to separate all the words and find the length of each text in the dataframe. I know

Is there a better, more readable way to coalesce columns in pandas?

pandas python

I often need a new column that is the best I can achieve from other columns, and I have a specific list of preference priorities. I am willing to take the first non-null value. This code works (and the results are what I want), but it is not very fast. I get to pick my priorities if I
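One way to take the first non-null value across a priority-ordered list of columns is to back-fill along the row axis and keep the first column. This is only a minimal sketch: the column names `preferred`, `fallback`, and `last_resort` are hypothetical, and it is not the code from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "preferred": [1.0, np.nan, np.nan, np.nan],
    "fallback": [10.0, 20.0, np.nan, np.nan],
    "last_resort": [100.0, 200.0, 300.0, np.nan],
})

# Columns listed from highest to lowest priority.
priority = ["preferred", "fallback", "last_resort"]

# bfill(axis=1) pulls the next non-null value leftwards within each row,
# so the first column ends up holding the first non-null value by priority.
df["best"] = df[priority].bfill(axis=1).iloc[:, 0]
```

Rows where every priority column is null stay null in `best`. Because the operation is vectorised, it is likely to be faster than a row-wise `apply`, although that depends on the data.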

Pandas: Assign Datetime object to time intervals

datetime pandas python

I’m trying to create a new variable in which datetime64[ns] objects are assigned to 5-minute intervals. The new interval variable should span every 5-minute period from 00:00 to 23:55. The criterion for assignment is whether the time of the datetime64[ns] object falls within the corresponding 5-minute interval. My actual data has numerous dates in the DateTime variable,
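A possible approach, sketched here under the assumption that the column is called `DateTime` and that each interval is labelled by its start time, is to floor every timestamp to a 5-minute boundary and then keep only the time of day. The sample timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical timestamps spread over more than one date.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2016-01-01 00:03:10",
        "2016-01-01 00:05:00",
        "2016-01-02 23:57:45",
    ])
})

# Round each timestamp down to the start of its 5-minute interval.
df["interval_start"] = df["DateTime"].dt.floor("5min")

# Drop the date so that the same time of day on different dates
# falls into the same interval label (00:00, 00:05, ..., 23:55).
df["interval"] = df["interval_start"].dt.strftime("%H:%M")
```

Labelling by the interval's start is a design choice; `pd.cut` on seconds since midnight would be an alternative if explicit interval objects are needed.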

Imputation of missing values for categories in pandas

pandas python

The question is how to fill NaNs with the most frequent level for a category column in a pandas dataframe. In the R randomForest package there is the na.roughfix option: A completed data matrix or data frame. For numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains
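A rough pandas analogue of that behaviour might look like the sketch below. The function name `rough_fix` and the example columns are hypothetical, and one difference is worth flagging: ties are not broken at random here, because `Series.mode()` returns the tied values in sorted order and the first one is taken. The sketch also assumes every column has at least one non-null value.

```python
import numpy as np
import pandas as pd

def rough_fix(frame):
    """Fill numeric NaNs with the column median and other NaNs with the mode.

    Hypothetical helper for illustration; ties in the mode are resolved by
    taking the first value returned by Series.mode(), not at random.
    """
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

# Hypothetical example data with one categorical and one numeric column.
df = pd.DataFrame({
    "colour": pd.Categorical(["red", "blue", "red", np.nan, "red"]),
    "size": [1.0, np.nan, 3.0, 4.0, 100.0],
})

fixed = rough_fix(df)
```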

Search and filter pandas dataframe with regular expressions

pandas python regex

I’d appreciate your help. I have a pandas dataframe. I want to search 3 columns of the dataframe using a regular expression, then return all rows that meet the search criteria, sorted by one of my columns. I would like to write this as a function so I can implement this logic with other criteria if possible, but am not
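A reusable function along those lines could be sketched as follows. The name `search_rows`, its parameters, and the example columns are assumptions made for illustration, and the sketch assumes the searched columns hold strings (missing values are treated as non-matches).

```python
import pandas as pd

def search_rows(frame, columns, pattern, sort_by):
    """Return rows where any of `columns` matches the regex `pattern`,
    sorted by `sort_by`. Hypothetical helper for illustration."""
    mask = pd.Series(False, index=frame.index)
    for col in columns:
        # na=False makes missing values count as "no match".
        mask = mask | frame[col].str.contains(pattern, regex=True, na=False)
    return frame[mask].sort_values(sort_by)

# Hypothetical example data.
df = pd.DataFrame({
    "title": ["Red car", "Blue bike", "Green bus"],
    "notes": ["fast", "red frame", None],
    "owner": ["Ann", "Bob", "Fred"],
    "price": [300, 100, 200],
})

# Case-insensitive search for the whole word "red" in three columns.
hits = search_rows(df, ["title", "notes", "owner"], r"(?i)\bred\b", "price")
```

The word boundaries in the pattern keep "Fred" from matching; a different pattern or sort column can be passed in to reuse the same logic with other criteria.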

How to get value counts for multiple columns at once in Pandas DataFrame?

numpy pandas python

Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time? For example, suppose I generate a DataFrame as follows: I can get a DataFrame like this: How do I conveniently get the value counts for every column and obtain the following
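One commonly used approach is to apply `pd.Series.value_counts` to every column, which yields one row per distinct value and one column per original column. The example frame below is hypothetical, since the one from the question is not shown.

```python
import pandas as pd

# Hypothetical 0/1 data; the original example frame is not reproduced here.
df = pd.DataFrame({
    "a": [0, 1, 1, 0, 1],
    "b": [1, 1, 1, 1, 1],
    "c": [0, 0, 1, 0, 0],
})

# value_counts is run per column; a value missing from a column shows up
# as NaN after alignment, so it is replaced with a count of 0.
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)
```

Here `counts.loc[1, "a"]` gives the number of ones in column `a`, and so on for the other columns.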

Merge a few pivot tables in pandas

pandas python python-3.x

How can I merge two pandas pivot tables? When I try to run my code I get an error: KeyError. Answer: The answer to my question is:

How to multiply a pandas dataframe with a numpy array with broadcasting

array-broadcasting numpy pandas python

I have a dataframe of shape (4, 3) as follows: I want to multiply each column of the dataframe with a numpy array of shape (4,): In numpy, the following broadcasting trick works: However, it doesn’t work in the case of a pandas dataframe; I get the following error: Any suggestions? Answer: I found an alternative way to do the multiplication
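For reference, one way to multiply every column by a length-4 array is `DataFrame.mul` with `axis=0`, which matches the array against the rows. This is a sketch with made-up data and is not necessarily the alternative found in the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical (4, 3) frame and (4,) array.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])
arr = np.array([1, 10, 100, 1000])

# axis=0 lines the array up with the index, so row i is scaled by arr[i].
result = df.mul(arr, axis=0)

# The same numbers via plain NumPy broadcasting on the underlying values:
# arr[:, None] has shape (4, 1) and broadcasts across the 3 columns.
expected = df.to_numpy() * arr[:, None]
```

A plain `df * arr` tries to match the array against the columns instead, which is why the shapes (3 columns versus 4 values) do not line up.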