Tag: text-processing

Python re.findall regex and text processing

postgresql python sql-server text-processing

I’m looking to find and modify some SQL syntax around the convert function. I want basically any convert(A,B) or CONVERT(A,B) in all my files to be selected and converted to B::A. So far I have tried selecting them with re.findall(r"\bconvert\b(.*?,.*)", l, re.IGNORECASE), but it’s only returning a small selection of what I want, and I also have trouble actually manipulating
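One possible approach to the convert(A,B) rewrite, as a minimal sketch rather than a definitive answer: capture both arguments with a regex and swap them with re.sub. The names CONVERT_RE and rewrite_convert are hypothetical, and the pattern assumes that neither argument contains a comma or a nested parenthesis. Calls that break that assumption, such as convert(varchar(10), x), are left untouched.

```python
import re

# Assumption: each argument is free of commas and parentheses.
# Group 1 is A (the target type), group 2 is B (the value).
CONVERT_RE = re.compile(
    r"\bconvert\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)",
    re.IGNORECASE,
)


def rewrite_convert(sql: str) -> str:
    """Rewrite every simple convert(A, B) call as B::A."""
    return CONVERT_RE.sub(r"\2::\1", sql)


print(rewrite_convert("SELECT CONVERT(int, price) FROM t"))
# prints: SELECT price::int FROM t
```

CONVERT_RE.findall(text) returns the (A, B) pairs if you only want to list the matches first. If B can be a compound expression, the replacement probably needs parentheses around it, because :: binds tightly in PostgreSQL.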

How to use python to replace a value in a field in a text file

python text-processing

I have a file that looks like this: I am trying to use python to change the values in the second last and last field under the [ atomtypes ] title. The deal is that I am running a code which iteratively updates this file, so I specifically want that field to be targeted, not the regular expression “2.931E-01” or
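Since the file itself is not shown here, the following is only a sketch under stated assumptions: section titles sit on their own line in the form [ name ], data rows are whitespace-separated, and the row to change can be identified by its first field. The function name set_last_two_fields and the sample values in the usage note are hypothetical.

```python
def set_last_two_fields(lines, section, row_name, new_second_last, new_last):
    """Return a copy of lines with the last two fields of one row replaced.

    Only rows inside the given section whose first field equals row_name
    are changed. Lines are expected without trailing newlines.
    """
    in_section = False
    result = []
    for line in lines:
        stripped = line.strip()
        # A header such as "[ atomtypes ]" switches the current section.
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped[1:-1].strip() == section
            result.append(line)
            continue
        fields = stripped.split()
        if in_section and len(fields) >= 2 and fields[0] == row_name:
            fields[-2:] = [new_second_last, new_last]
            # Note: the original column alignment is not preserved.
            result.append(" ".join(fields))
        else:
            result.append(line)
    return result
```

For example, with lines = text.splitlines(), calling set_last_two_fields(lines, "atomtypes", "OW", "2.0E-01", "5.0E-01") targets the field by position rather than by its current value, so it keeps working after each iteration changes the numbers.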

How to change the same keys related to different articles in bibtex?

bibtex pybtex python text-processing

With the help of some plugin, I get a .bib file with information about scientific articles. Sometimes it turns out that the same keys appear in different records. For example: I am using pybtex library to parse a file. This library ignores duplicate entries with the same keys. Before using this library, I need to somehow process the file so
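One way to preprocess the file before handing it to pybtex, sketched under assumptions: every entry starts on its own line as @type{key, and a repeated key can simply get a numeric suffix. The name dedupe_bibtex_keys and the _2, _3 suffix scheme are hypothetical, and the sketch does not check whether a suffixed key collides with a key that already exists.

```python
import re
from collections import Counter

# Matches the start of an entry, e.g. "@article{smith2020,".
# Group 2 is the citation key.
ENTRY_RE = re.compile(r"^([ \t]*@\w+[ \t]*\{[ \t]*)([^,\s]+)([ \t]*,)", re.MULTILINE)


def dedupe_bibtex_keys(text: str) -> str:
    """Append _2, _3, ... to the second and later occurrences of a key."""
    seen = Counter()

    def rename(match):
        key = match.group(2)
        seen[key] += 1
        if seen[key] == 1:
            return match.group(0)  # first occurrence stays unchanged
        return f"{match.group(1)}{key}_{seen[key]}{match.group(3)}"

    return ENTRY_RE.sub(rename, text)
```

The result is plain text, so it can be written to a temporary file or parsed directly from the string afterwards.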

What is Keras’ Tokenizer fit_on_sequences used for?

keras python tensorflow text-processing tokenize

I’m familiar with the method ‘fit_on_texts’ from the Keras’ Tokenizer. What does ‘fit_on_sequences’ do and when is it useful? According to the documentation, it “Updates internal vocabulary based on a list of sequences.”, and it takes as input: ‘A list of sequence. A “sequence” is a list of integer word indices.’. When is this useful? For fitting on texts, I

Why does Keras.preprocessing.sequence pad_sequences process characters instead of words?

keras nlp python speech-to-text text-processing

I’m working on transcribing speech to text and ran into an issue (I think) when using pad_sequences in Keras. I pretrained a model which used pad_sequences on a dataframe and it fit the data into an array with the same number of columns & rows for each value. However when I used pad_sequences on transcribing text, the number of characters

How can I loop through blocks of lines in a file?

python text-processing

I have a text file that looks like this, with blocks of lines separated by blank lines: How can I loop through the blocks and process the data in each block? Eventually I want to gather the name, family name and age values into three columns, like so: Answer Here’s another way, using itertools.groupby. The function groupby iterates through lines
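The itertools.groupby idea can be sketched as follows. This is not the original answer's code, which is cut off above, and the "Label: value" line format in the sample is an assumption, as is the helper name iter_blocks.

```python
from itertools import groupby


def iter_blocks(lines):
    """Yield each run of consecutive non-blank lines as a list of stripped lines."""
    # groupby splits the input into runs that share the same key; here the
    # key is True for lines with text and False for blank separator lines.
    for has_text, group in groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:
            yield [line.strip() for line in group]


# Hypothetical sample data in an assumed "Label: value" layout.
sample = [
    "Name: John\n", "Family name: Smith\n", "Age: 20\n",
    "\n",
    "Name: Jane\n", "Family name: Doe\n", "Age: 30\n",
]

# One row per block, keeping only the value after the first colon.
rows = [[line.split(":", 1)[1].strip() for line in block]
        for block in iter_blocks(sample)]
print(rows)
# prints: [['John', 'Smith', '20'], ['Jane', 'Doe', '30']]
```

An open file object can be passed in place of sample, because groupby only needs an iterable of lines.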