Tag: text

How to count word similarity between two pandas dataframes

Here’s my first dataframe, df1. Here’s my second dataframe, df2. Similarity matrix: columns are Id from df1, rows are Id from df2. Note: the 0 values in (1,1) and (3,2) are because no text is similar; the 1 value in (3,1) is because Bersatu and Kita (Id 1 on df2) is available in Id 3 on df1; 0.33 is counted because o…
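The excerpt is cut off before the scoring rule is fully stated, so the following is only a sketch under an assumption: the score is taken to be the share of words in a df2 row that also appear in a df1 row. The sample texts, the column names `Id` and `Text`, and the helper `overlap` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (the real data is not shown).
df1 = pd.DataFrame({"Id": [1, 2, 3],
                    "Text": ["Aku suka kopi", "Hari ini hujan", "Bersatu kita teguh"]})
df2 = pd.DataFrame({"Id": [1, 2],
                    "Text": ["Bersatu kita", "hujan deras sekali"]})

def words(text):
    """Lower-case a text and split it into a set of whitespace-separated words."""
    return set(text.lower().split())

def overlap(query, reference):
    """Assumed metric: fraction of the query's words that occur in the reference."""
    query_words = words(query)
    if not query_words:
        return 0.0
    return len(query_words & words(reference)) / len(query_words)

# Rows are Ids from df2, columns are Ids from df1.
similarity = pd.DataFrame(
    [[overlap(t2, t1) for t1 in df1["Text"]] for t2 in df2["Text"]],
    index=pd.Index(df2["Id"], name="df2_Id"),
    columns=pd.Index(df1["Id"], name="df1_Id"),
)
print(similarity.round(2))
```

If the intended measure is symmetric (for example Jaccard similarity), only `overlap` would need to change.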

Python regular expression: how to deal with multiple backslashes

python regex text

I’m dealing with text data and having a problem erasing multiple backslashes. I found out that using .sub works quite well, so I coded as below to erase a backslash followed by r, n, t, f, or v. However, the code above can’t deal with the string below, so I coded it like this: But it’s showing a result like this. I don’t know why this hap…
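The original code and strings are not shown, so this is a sketch under an assumption: the text contains literal sequences of one or more backslashes followed by r, n, t, f, or v (plus possibly real control characters), and all of them should become a single space. The name `strip_escapes` and the sample string are hypothetical.

```python
import re

# Either one or more literal backslashes followed by r, n, t, f, or v,
# or an actual control character (carriage return, newline, tab, ...).
ESCAPE_PATTERN = re.compile(r"\\+[rntfv]|[\r\n\t\f\v]")

def strip_escapes(text: str) -> str:
    """Replace escape-like sequences with a space, then collapse repeated spaces."""
    cleaned = ESCAPE_PATTERN.sub(" ", text)
    return re.sub(r" {2,}", " ", cleaned).strip()

# Two literal backslashes + n, one literal backslash + t, and a real newline.
sample = "first line\\\\nsecond\\tthird\nfourth"
print(strip_escapes(sample))
```

A raw string (`r"..."`) keeps the pattern readable: `\\+` in the raw pattern means "one or more backslash characters". Note that this pattern would also alter text such as a Windows path containing `\n` or `\t`, so it suits only data where those sequences are known to be noise.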

Problems Removing Duplicated Words from Pandas Row

pandas python text

I am working on an NLP assignment and having some problems removing duplicated strings from a pandas column. The data I am using is tagged, so some of the rows of data were repeated because the same comment could have multiple tags. So what I did was group the data by ID and Comment and aggregated based on ta…

Convert Text File into Pandas Dataframe

pandas python text

I want to create a dataframe from a text file. I scraped some data from a website and wrote it into a .txt file. There are 10 ‘columns’, as shown in the first 10 lines of the text file. Can anyone help me with separating the lines into the respective columns in a pandas dataframe format? Much appre…

I’m trying to create a table from text

dataframe pandas python text

I want to create a table with two columns separated by “:”, with the capitalized words as the first column and everything after the “:” as the second column. I originally tried to do this from a PDF, but that wasn’t working, so I copied it to a text file thinking it might be easi…
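One way to approach this, sketched under the assumption that each record sits on a single line and that only the first “:” separates the two columns. The sample lines, the column names `field` and `value`, and the helper `lines_to_table` are hypothetical.

```python
import pandas as pd

def lines_to_table(lines):
    """Split each line on its first ':' into a two-column DataFrame."""
    rows = []
    for line in lines:
        if ":" not in line:
            continue  # skip lines without a separator
        key, value = line.split(":", 1)  # maxsplit=1 keeps later colons in the value
        rows.append({"field": key.strip(), "value": value.strip()})
    return pd.DataFrame(rows, columns=["field", "value"])

# Hypothetical sample; in practice the lines would come from open(path).read().splitlines().
sample_lines = ["NAME: Jane Doe", "ADDRESS: 12 Main St: Unit 4", "no separator here"]
table = lines_to_table(sample_lines)
print(table)
```

Values that wrap across several lines in the source file would need to be joined before this step.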

Split a nested XML string to get a string using parser

parsing python text xml xml-parsing

I have this string: My goal is to extract Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un m² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1., that is, the text between <run> and </run>. I did it with a regular expression, but it doesn’…
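The string itself is not shown, so the XML below is a hypothetical stand-in. As a sketch, the standard-library parser can collect the text of every `run` element without a regular expression, assuming the string is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the string in the question.
xml_string = (
    "<root><paragraph>"
    "<run>Panneau de polystyrène expansé (PSE)</run>"
    "<run> ignifugé</run>"
    "</paragraph></root>"
)

root = ET.fromstring(xml_string)

# iter() walks the whole tree, so the nesting depth of <run> does not matter;
# itertext() also gathers text from any child elements inside a <run>.
run_texts = ["".join(run.itertext()) for run in root.iter("run")]
print("".join(run_texts))
```

If the inner XML is stored as escaped text inside another element, it would first have to be unescaped and parsed a second time.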

How to remove urls between texts in pandas dataframe rows?

data-science dataframe pandas python text

I am trying to solve an NLP problem. In the dataframe, the text column has lots of rows filled with urls like http.somethingsomething. Some of the urls and other texts have no space between them, for example ‘:http:\something’, ‘;http:\something’, ‘,http:\something’. so there some…
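A minimal sketch, assuming every URL starts with `http:` or `https:` and runs until the next whitespace, so it can be matched even when it is glued to preceding punctuation. The sample rows and the output column name `clean` are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the real data is not shown in the question.
df = pd.DataFrame({"text": [
    "great read:http://example.com/a ok",
    "see;https://example.com now",
    "no link here",
]})

# Match from "http:" or "https:" up to the next whitespace character.
URL_PATTERN = r"https?:\S+"

df["clean"] = (
    df["text"]
    .str.replace(URL_PATTERN, " ", regex=True)  # drop the URL itself
    .str.replace(r"\s+", " ", regex=True)       # collapse leftover whitespace
    .str.strip()
)
print(df["clean"].tolist())
```

Because the pattern does not require a word boundary before `http`, it also catches URLs attached to `:`, `;`, or `,`. Text glued directly after a URL with no space would be removed along with it.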

Transparency problem displaying text with Pygame

pygame pygame-surface python text transparency

I would like to display transparent text on a surface that is sized based on the length of the text. The problem is that the text has a black background even though “None” is specified as the background in the “render” command. I tried to apply the solutions…

Meaningless Spacy Nouns

python spacy text wordnet

I am using Spacy for extracting nouns from sentences. These sentences are grammatically poor and may contain some spelling mistakes as well. Here is the code that I am using: Code Output: Similarly for sentence “fast foward2”, I get Spacy noun as Which shows that these nouns have some meaningless …