Tag: python

Using Sklearn’s TfidfVectorizer transform

I am trying to get the tf-idf vector for a single document using Sklearn’s TfidfVectorizer object. I create a vocabulary based on some training documents and use fit_transform to train the TfidfVectorizer. Then, I want to find the tf-idf vectors for any given testing document. The problem is that this r…
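
A minimal sketch of the workflow the question describes: fit on training documents, then vectorize one unseen document. The sample documents are made up for illustration. It assumes the usual pattern of calling `transform` on a one-element list rather than on a bare string.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus, for illustration only.
train_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

vectorizer = TfidfVectorizer()
# fit_transform learns the vocabulary and IDF weights from the training set.
train_matrix = vectorizer.fit_transform(train_docs)

# transform expects an iterable of documents, so a single document is
# wrapped in a list. It reuses the fitted vocabulary and IDF weights.
test_doc = "the cat chased the dog"
test_vector = vectorizer.transform([test_doc])

print(test_vector.shape)  # one row, one column per vocabulary term
```

Calling `fit_transform` again on the test document would rebuild the vocabulary from that document alone, so the result would no longer line up with the training matrix.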

Matplotlib connect scatterplot points with line – Python

I have two lists, dates and values. I want to plot them using matplotlib. The following creates a scatter plot of my data. plt.plot(dates, values) creates a line graph. But what I really want is a scatterplot where the points are connected by a line. Similar to in R: , which gives me a scatterplot of points o…
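
One common way to get connected points is to pass a format string that combines a line style and a marker to `plot`. The sketch below uses made-up `dates` and `values`. The `Agg` backend line is an assumption for running without a display and can be dropped in an interactive session.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless environment; optional otherwise
import matplotlib.pyplot as plt

# Hypothetical data standing in for the question's two lists.
dates = [datetime.date(2020, 1, day) for day in (1, 2, 3, 4)]
values = [3.0, 1.5, 4.2, 2.8]

fig, ax = plt.subplots()
# "-o" means: solid line ("-") with a circle marker ("o") at each point.
(line,) = ax.plot(dates, values, "-o")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

The same result can be spelled out with keyword arguments, `ax.plot(dates, values, linestyle="-", marker="o")`.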

What is the most efficient way of counting occurrences in pandas?

I have a large (about 12M rows) DataFrame df: The following ran in a timely fashion: However, this is taking an unexpectedly long time to run: What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame? ran pretty well, so I really did not expect this Occurrences_of_Words Data…
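
The question's own code is not shown in this excerpt, so the following is only a sketch of one vectorized way to count occurrences. The column name `word` and the sample data are hypothetical. It counts each distinct value once with `value_counts` and maps the counts back onto the rows, which avoids a Python-level loop over the frame.

```python
import pandas as pd

# Hypothetical frame; the real one has about 12M rows.
df = pd.DataFrame({"word": ["a", "b", "a", "c", "a", "b"]})

# One pass over the column: how often each distinct value occurs.
counts = df["word"].value_counts()

# Attach the per-value count to every row by looking it up in `counts`.
df["occurrences"] = df["word"].map(counts)

print(counts)
print(df)
```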

Pandas dataframe get first row of each group

I have a pandas DataFrame like the following: I want to group this by ["id", "value"] and get the first row of each group: Expected outcome: I tried the following, which only gives the first row of the DataFrame. Any help regarding this is appreciated. Answer If you need id as column: To get n …
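
The answer's code is cut off in this excerpt, so the sketch below is an assumption about what it may have looked like, not a reproduction. It shows two standard pandas approaches on a made-up frame. The `extra` column is hypothetical.

```python
import pandas as pd

# Hypothetical data; only "id" and "value" are named in the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "value": ["first", "first", "second", "first", "second", "third"],
    "extra": [10, 11, 12, 13, 14, 15],
})

# as_index=False keeps "id" and "value" as ordinary columns.
# Note: first() takes the first non-null entry per column in each group.
first_rows = df.groupby(["id", "value"], as_index=False).first()

# head(n) keeps the first n original rows of each group, index included.
first_n = df.groupby(["id", "value"]).head(1)

print(first_rows)
print(first_n)
```

With no missing values the two approaches select the same data. They differ in the resulting index and in how nulls are handled.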

Convert one row of a pandas dataframe into multiple rows

I want to turn this: Into this: Context: I have data stored with one value coded for all ages (age = 99). However, the application I am developing for needs the value explicitly stated for every id-age pair (id = 1, age = 25, 50, and 75). There are simple solutions to this: iterate over ids and append a …
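
As an alternative to iterating and appending, here is a sketch that uses `DataFrame.explode` (available in pandas 0.25 and later). The original tables are not shown in this excerpt, so the `value` column and the sample rows are hypothetical. The ages 25, 50, and 75 and the all-ages code 99 come from the question.

```python
import pandas as pd

# Hypothetical input: one row per id, with age = 99 meaning "all ages".
df = pd.DataFrame({"id": [1, 2], "age": [99, 99], "value": [0.5, 0.7]})

ALL_AGES_CODE = 99
explicit_ages = [25, 50, 75]

# Swap the all-ages code for the list of explicit ages, then give each
# list element its own row. Other ages are kept as single-item lists.
expanded = (
    df.assign(
        age=df["age"].map(
            lambda a: explicit_ages if a == ALL_AGES_CODE else [a]
        )
    )
    .explode("age")
    .reset_index(drop=True)
)
# explode leaves an object column, so restore the integer dtype.
expanded["age"] = expanded["age"].astype(int)

print(expanded)
```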