I’ve got a dataframe with timeseries data of crime, with a facet on offence (which looks like the format below). What I’d like is to perform a groupby plot on the dataframe so that it’s possible to explore trends in crime over time. I’ve got some code which does the job, but it’s a bit clumsy and it loses the time
Tag: pandas
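A minimal sketch of one way to do this without losing the time index: pivot the long data so each offence becomes its own column over a DatetimeIndex (the column names `date`, `offence`, and `count` here are assumptions, not from the question).

```python
import pandas as pd

# Made-up long-format crime data: one row per (date, offence) with a count.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01"]),
    "offence": ["burglary", "theft", "burglary", "theft"],
    "count": [10, 12, 8, 15],
})

# Pivot to wide format: one column per offence, indexed by date.
# The DatetimeIndex is preserved, so plotting keeps the time axis.
wide = df.pivot_table(index="date", columns="offence", values="count")

# wide.plot() would then draw one trend line per offence.
```

Because `pivot_table` groups and reshapes in one step, there is no need for an explicit loop over groups.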
how to use word_tokenize on a data frame
I have recently started using the nltk module for text analysis. I am stuck at a point. I want to use word_tokenize on a dataframe, so as to obtain all the words used in a particular row of the dataframe. Basically, I want to separate all the words and find the length of each text in the dataframe. I know
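The row-wise pattern is the same whichever tokenizer is used: `apply` the tokenizer to the text column, then take the length of each token list. A sketch using `str.split` as a lightweight stand-in (swapping in `nltk.word_tokenize` inside the same `.apply` is the only change, after downloading the `punkt` models):

```python
import pandas as pd

# Made-up text column for illustration.
df = pd.DataFrame({"text": ["hello world", "one two three"]})

# Tokenize each row; str.split stands in for nltk.word_tokenize here.
df["tokens"] = df["text"].apply(lambda s: s.split())

# Length of each text, measured in tokens.
df["n_tokens"] = df["tokens"].str.len()
```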
Is there a better, more readable way to coalesce columns in pandas?
I often need a new column that is the best I can achieve from other columns, and I have a specific list of preference priorities. I am willing to take the first non-null value. This code works (and the results are what I want), but it is not very fast. I get to pick my priorities if I
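One readable, vectorised way to coalesce in priority order: back-fill across the columns and take the first one. A sketch with made-up columns, where the priority is `a` over `b` over `c`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan],
    "b": [2.0, np.nan, np.nan],
    "c": [3.0, 4.0, 5.0],
})

# bfill along the columns pulls the first non-null value leftwards;
# the first column then holds the coalesced result.
df["best"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
```

Changing the priority order is just a matter of reordering the column list.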
Pandas: Assign Datetime object to time intervals
I’m trying to create a new variable in which datetime64[ns] objects are assigned to 5 minute intervals. The new interval variable should span every 5 minute period from 00:00 to 23:55. The criteria for assignment is whether the time of the datetime64[ns] object falls within the corresponding 5 min interval. My actual data has numerous dates in the DateTime variable,
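Binning timestamps into fixed 5-minute intervals can be done in one vectorised step with `dt.floor`, which maps each timestamp to the start of its interval (the column names below are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "ts": pd.to_datetime(["2021-01-01 00:03:20", "2021-01-01 23:57:00"]),
})

# Floor each timestamp to its 5-minute bin: 00:03:20 -> 00:00, 23:57 -> 23:55.
df["interval"] = df["ts"].dt.floor("5min")
```

This covers every interval from 00:00 to 23:55 automatically and works across multiple dates, since the date part of each timestamp is preserved.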
Imputation of missing values for categories in pandas
The question is how to fill NaNs with the most frequent level for a category column in a pandas dataframe. In R’s randomForest package there is the na.roughfix option: A completed data matrix or data frame. For numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains
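The pandas equivalent for the categorical half of `na.roughfix` is `fillna` with the column's mode. A minimal sketch on a single made-up column:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", np.nan, np.nan])

# mode() returns the most frequent value(s); take the first to break ties.
filled = s.fillna(s.mode()[0])
```

Applied per column (mode for categoricals, median for numerics), this reproduces the roughfix behaviour, except that ties go to the first mode rather than being broken at random.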
Search and filter pandas dataframe with regular expressions
I’d appreciate your help. I have a pandas dataframe. I want to search 3 columns of the dataframe using a regular expression, then return all rows that meet the search criteria, sorted by one of my columns. I would like to write this as a function so I can implement this logic with other criteria if possible, but am not
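One way to package this as a reusable function: build a boolean mask by OR-ing `str.contains` over the chosen columns, then sort the matching rows. The dataframe and column names below are made up for illustration:

```python
import pandas as pd

def regex_filter(df, pattern, cols, sort_by):
    """Return rows where `pattern` matches in any of `cols`, sorted by `sort_by`."""
    mask = pd.Series(False, index=df.index)
    for c in cols:
        # na=False treats missing values as non-matches.
        mask |= df[c].astype(str).str.contains(pattern, regex=True, na=False)
    return df[mask].sort_values(sort_by)

df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "city": ["york", "leeds", "new york"],
    "score": [3, 1, 2],
})

hits = regex_filter(df, r"york", ["name", "city"], "score")
```

Because the pattern, columns, and sort key are all parameters, the same function serves other search criteria.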
How to get value counts for multiple columns at once in Pandas DataFrame?
Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time? For example, suppose I generate a DataFrame as follows: I can get a DataFrame like this: How do I conveniently get the value counts for every column and obtain the following
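One convenient way to get all the value counts at once is to `apply` `value_counts` column-wise, which returns one counts column per original column. A sketch with made-up 0/1 data:

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 1, 1], "y": [1, 1, 0]})

# Each column's value_counts becomes a column of the result,
# indexed by the category values; fillna(0) covers values
# absent from some columns.
counts = df.apply(pd.Series.value_counts).fillna(0)
```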
Iterating through pandas groupby groups
I have a pandas dataframe school_df that looks like this: Each row represents one project by that school. I’d like to add two columns: for each unique school_id, a count of how many projects were posted before that date and a count of how many projects were completed before that date. The code below works, but I have ~300,000 unique
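With ~300,000 unique schools, a Python loop over groups is the bottleneck; a vectorised alternative is to sort within each school by date and use `groupby(...).cumcount()`, which gives the number of earlier rows per group in one pass. A sketch for the posted-count half (the completed-count works the same way on the completion-date column; column names are assumptions):

```python
import pandas as pd

school_df = pd.DataFrame({
    "school_id": [1, 1, 1, 2],
    "date_posted": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-01-15"]
    ),
})

# Sort so that within each school, earlier projects come first.
school_df = school_df.sort_values(["school_id", "date_posted"])

# cumcount() numbers rows within each group starting at 0, i.e. the
# number of projects that school posted before this one.
school_df["n_prior_posted"] = school_df.groupby("school_id").cumcount()
```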
merge few pivot tables in pandas
How can I merge two pandas pivot tables? When I try to run my code I get a KeyError. Answer The answer to my question is:
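A likely cause of the KeyError: after `pivot_table`, the grouping key lives in the index, not in a column, so merging `on` that key fails. Merging on the index works. A sketch with made-up data:

```python
import pandas as pd

sales = pd.DataFrame({
    "shop": ["a", "a", "b"],
    "item": ["x", "y", "x"],
    "qty": [1, 2, 3],
})

# Two pivot tables over the same key; "shop" becomes the index of each.
p1 = sales.pivot_table(index="shop", values="qty", aggfunc="sum")
p2 = sales.pivot_table(index="shop", values="qty", aggfunc="count")

# Merging on="shop" would raise KeyError (no such column);
# merge on the index instead.
merged = p1.merge(p2, left_index=True, right_index=True,
                  suffixes=("_sum", "_count"))
```

Alternatively, `reset_index()` on each pivot table turns the key back into a column so an ordinary `merge(on="shop")` works.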
how to multiply pandas dataframe with numpy array with broadcasting
I have a dataframe of shape (4, 3) as follows: I want to multiply each column of the dataframe with a numpy array of shape (4,): In numpy, the following broadcasting trick works: However, it doesn’t work in the case of a pandas dataframe; I get the following error: Any suggestions? Answer I found an alternative way to do the multiplication
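The reason plain `df * arr` fails is that pandas aligns a 1-D array against the columns (length 3 here), not the rows (length 4). `DataFrame.mul` with `axis=0` broadcasts along the rows instead. A sketch matching the shapes in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((4, 3)), columns=list("abc"))
arr = np.array([1, 2, 3, 4])  # shape (4,), one value per row

# df * arr would try to align arr with the 3 columns and fail;
# mul(..., axis=0) broadcasts arr down the rows instead.
out = df.mul(arr, axis=0)
```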