Tag: pandas

How to create groupby subplots in Pandas?

I’ve got a dataframe with timeseries data of crime with a facet on offence (which looks like the format below). I’d like to perform a groupby plot on the dataframe so that it’s possible to explore trends in crime over time. I’ve got some code which does the job, but it’s a bit clumsy and it loses the time

How to use word_tokenize in a data frame

I have recently started using the nltk module for text analysis and I am stuck at one point. I want to use word_tokenize on a dataframe, so as to obtain all the words used in a particular row of the dataframe. Basically, I want to separate all the words and find the length of each text in the dataframe. I know

Pandas: Assign Datetime object to time intervals

I’m trying to create a new variable in which datetime64[ns] objects are assigned to 5-minute intervals. The new interval variable should span every 5-minute period from 00:00 to 23:55. The criterion for assignment is whether the time of the datetime64[ns] object falls within the corresponding 5-minute interval. My actual data has numerous dates in the DateTime variable,
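One possible approach, as a minimal sketch: floor each timestamp to the start of its 5-minute bin and keep only the time of day, so that rows from different dates share the same interval label. The sample data and the `interval` column name are made up for illustration; only the `DateTime` column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; the real frame has many dates in DateTime.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2015-01-01 00:03:10",
        "2015-01-01 00:07:45",
        "2015-01-02 23:58:00",
    ])
})

# Floor to the start of the 5-minute bin, then drop the date part so the
# label depends only on the time of day (00:00:00 through 23:55:00).
df["interval"] = df["DateTime"].dt.floor("5min").dt.time

print(df)
```

Because the label is a plain `datetime.time`, grouping on `interval` pools observations from every date that fall in the same 5-minute slot.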

Imputation of missing values for categories in pandas

The question is how to fill NaNs with the most frequent level for a category column in a pandas dataframe. In R’s randomForest package there is the na.roughfix option: A completed data matrix or data frame. For numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains
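A rough pandas analogue might look like the sketch below. The helper name `rough_fix` and the sample columns are hypothetical, and there is one deliberate difference from the R behaviour quoted above: ties between equally frequent levels are resolved by taking the first value returned by `Series.mode()`, not at random.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one categorical and one numeric column.
df = pd.DataFrame({
    "colour": pd.Categorical(["red", "blue", "red", np.nan]),
    "size": [1.0, np.nan, 3.0, 5.0],
})

def rough_fix(frame):
    """Fill numeric NaNs with the column median, other NaNs with the mode."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode()  # NaN is ignored by default
            if not modes.empty:
                # First mode wins on ties (not random, unlike na.roughfix).
                out[col] = out[col].fillna(modes.iloc[0])
    return out

fixed = rough_fix(df)
print(fixed)
```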

Search and filter pandas dataframe with regular expressions

I’d appreciate your help. I have a pandas dataframe. I want to search 3 columns of the dataframe using a regular expression, then return all rows that meet the search criteria, sorted by one of my columns. I would like to write this as a function so I can implement this logic with other criteria if possible, but am not
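A reusable helper along these lines could be sketched as follows. The function name `search_rows`, its parameters, and the sample columns are all hypothetical, and the sketch assumes the searched columns hold strings (missing values are treated as non-matches).

```python
import pandas as pd

def search_rows(frame, pattern, columns, sort_by):
    """Return rows where any of `columns` matches `pattern`, sorted by `sort_by`."""
    mask = pd.Series(False, index=frame.index)
    for col in columns:
        # na=False makes missing values count as "no match".
        mask |= frame[col].str.contains(pattern, regex=True, na=False)
    return frame[mask].sort_values(sort_by)

# Hypothetical sample data.
df = pd.DataFrame({
    "title": ["Red apple", "Green pear", "Blue plum"],
    "notes": ["fresh", "ripe apple", "sour"],
    "origin": ["Spain", "Chile", "Appleton"],
    "price": [3, 1, 2],
})

# Rows where any of the three text columns ends with "apple", cheapest first.
result = search_rows(df, r"apple$", ["title", "notes", "origin"], "price")
print(result)
```

Passing the pattern, the column list, and the sort key as arguments means the same function can be reused with other search criteria.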

Iterating through pandas groupby groups

I have a pandas dataframe school_df that looks like this: Each row represents one project by that school. I’d like to add two columns: for each unique school_id, a count of how many projects were posted before that date and a count of how many projects were completed before that date. The code below works, but I have ~300,000 unique
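For the first of the two counts, one vectorised option that avoids looping over each school is a sort followed by `groupby(...).cumcount()`. This is only a sketch: the `date_posted` column name, the `n_posted_before` column name, and the sample data are assumptions (only `school_df` and `school_id` appear in the question), and projects posted on the same date are counted in row order, not treated as simultaneous.

```python
import pandas as pd

# Hypothetical sample data: one row per project.
school_df = pd.DataFrame({
    "school_id": ["a", "b", "a", "a", "b"],
    "date_posted": pd.to_datetime([
        "2014-03-01", "2014-05-01", "2014-01-01", "2014-02-01", "2014-04-01",
    ]),
})

# Sort within each school by posting date; cumcount then numbers the rows
# 0, 1, 2, ... per school, which equals the number of earlier projects.
school_df = school_df.sort_values(["school_id", "date_posted"])
school_df["n_posted_before"] = school_df.groupby("school_id").cumcount()

# Restore the original row order.
school_df = school_df.sort_index()
print(school_df)
```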
